Notable performance improvements have been made using the techniques below. There is more to do, see the Performance page for current issues and thoughts.
When I last profiled the I2P code, the vast majority of time was spent within one function: java.math.BigInteger's modPow. Rather than try to tune this method, we'll call out to GNU MP - an insanely fast math library (with tuned assembler for many architectures). (Editor: see NativeBigInteger for faster public key cryptography)
ugha and duck are working on the C/JNI glue code, and the existing java code is already deployed with hooks for that whenever its ready. Preliminary results look fantastic - running the router with the native GMP modPow is providing over a 800% speedup in encryption performance, and the load was cut in half. This was just on one user's machine, and things are nowhere near ready for packaging and deployment, yet.
Garlic wrapping a "reply" LeaseSet[implemented but needs tuning]
This algorithm tweak will only be relevant for applications that want their peers to reply to them (though that includes everything that uses I2PTunnel or mihi's ministreaming lib):
Previously, when Alice sent Bob a message, when Bob replied he had to do a lookup in the network database - sending out a few requests to get Alice's current LeaseSet. If he already has Alice's current LeaseSet, he can instead just send his reply immediately - this is (part of) why it typically takes a little longer talking to someone the first time you connect, but subsequent communication is faster. Currently - for all clients - we wrap the sender's current LeaseSet in the garlic that is delivered to the recipient, so that when they go to reply, they'll always have the LeaseSet locally stored - completely removing any need for a network database lookup on replies. This trades off a large portion of the sender's bandwidth for that faster reply. If we didn't do this very often, overall network bandwidth usage would decrease, since the recipient doesn't have to do the network database lookup.
For unpublished LeaseSets such as "shared clients", this is the only way to get the LeaseSet to Bob. Unfortunately this bundling every time adds almost 100% overhead to a high-bandwidth connection, and much more to a connection with smaller messages.
Changes scheduled for release 0.6.2 will bundle the LeaseSet only when necessary, at the beginning of a connection or when the LeaseSet changes. This will substantially reduce the total overhead of I2P messaging.
More efficient TCP rejection[implemented]
At the moment, all TCP connections do all of their peer validation after going through the full (expensive) Diffie-Hellman handshaking to negotiate a private session key. This means that if someone's clock is really wrong, or their NAT/firewall/etc is improperly configured (or they're just running an incompatible version of the router), they're going to consistently (though not constantly, thanks to the banlist) cause a futile expensive cryptographic operation on all the peers they know about. While we will want to keep some verification/validation within the encryption boundary, we'll want to update the protocol to do some of it first, so that we can reject them cleanly without wasting much CPU or other resources.
Adjust the tunnel testing[implemented]
Rather than going with the fairly random scheme we have now, we should use a more context aware algorithm for testing tunnels. e.g. if we already know its passing valid data correctly, there's no need to test it, while if we haven't seen any data through it recently, perhaps its worthwhile to throw some data its way. This will reduce the tunnel contention due to excess messages, as well as improve the speed at which we detect - and address - failing tunnels.
Persistent Tunnel / Lease Selection
Outbound tunnel selection implemented in 0.6.1.30, inbound lease selection implemented in release 0.6.2.
Selecting tunnels and leases at random for every message creates a large incidence of out-of-order delivery, which prevents the streaming lib from increasing its window size as much as it could. By persisting with the same selections for a given connection, the transfer rate is much faster.
Compress some data structures[implemented]
The I2NP messages and the data they contain is already defined in a fairly compact structure, though one attribute of the RouterInfo structure is not - "options" is a plain ASCII name = value mapping. Right now, we're filling it with those published statistics - around 3300 bytes per peer. Trivial to implement GZip compression would nearly cut that to 1/3 its size, and when you consider how often RouterInfo structures are passed across the network, that's significant savings - every time a router asks another router for a networkDb entry that the peer doesn't have, it sends back 3-10 RouterInfo of them.
Update the ministreaming protocol[replaced by full streaming protocol]
Currently mihi's ministreaming library has a fairly simple stream negotiation protocol - Alice sends Bob a SYN message, Bob replies with an ACK message, then Alice and Bob send each other some data, until one of them sends the other a CLOSE message. For long lasting connections (to an IRC server, for instance), that overhead is negligible, but for simple one-off request/response situations (an HTTP request/reply, for instance), that's more than twice as many messages as necessary. If, however, Alice piggybacked her first payload in with the SYN message, and Bob piggybacked his first reply with the ACK - and perhaps also included the CLOSE flag - transient streams such as HTTP requests could be reduced to a pair of messages, instead of the SYN+ACK+request+response+CLOSE.
Implement full streaming protocol[implemented]
The ministreaming protocol takes advantage of a poor design decision in the I2P client protocol (I2CP) - the exposure of "mode=GUARANTEED", allowing what would otherwise be an unreliable, best-effort, message based protocol to be used for reliable, blocking operation (under the covers, its still all unreliable and message based, with the router providing delivery guarantees by garlic wrapping an "ACK" message in with the payload, so once the data gets to the target, the ACK message is forwarded back to us [through tunnels, of course]).
As I've said, having I2PTunnel (and the ministreaming lib) go this route was the best thing that could be done, but more efficient mechanisms are available. When we rip out the "mode=GUARANTEED" functionality, we're essentially leaving ourselves with an I2CP that looks like an anonymous IP layer, and as such, we'll be able to implement the streaming library to take advantage of the design experiences of the TCP layer - selective ACKs, congestion detection, nagle, etc.