Number
111
Author
EinMByte, zzz
Editor
manas, str4d
Created
Thread
http://zzz.i2p/topics/1577
Last updated
Status
Open
Supercedes
106

Overview

This proposal describes an authenticated key agreement protocol to improve the resistance of [NTCP] to various forms of automated identification and attacks.

The proposal is organized as follows: the security goals are presented, followed by a discussion of the basic protocol. Next, a complete specification of all protocol messages is given. Finally, router addresses and version identification are discussed. An appendix discussing a generic attack on common padding schemes is also included, as well as an appendix containing a number of candidates for the authenticated cipher.

Motivation

[NTCP] data is encrypted after the first message (and the first message appears to be random data), thus preventing protocol identification through "payload analysis". It is still vulnerable to protocol identification through "flow analysis". That's because the first 4 messages (i.e. the handshake) are fixed length (288, 304, 448, and 48 bytes).

By adding random amounts of random data to each of the messages, we can make it a lot harder.

The authors acknowledge that standard security practices would suggest to use an existing protocol such as TLS, but this is [Prop104] and it has problems of its own. Wherever appropriate, "future work" paragraphs have been added to indicate missing features or subjects of discussion.

Design Goals

  • Support NTCP 1 and 2 on a single port, auto-detect, and published as a single "transport" (i.e. [RouterAddress]) in the [NetDB].
  • Publish support for version 1 only, 2 only, or 1+2 in the NetDB in a separate field, and default to version 1 only (don't bind version support to a particular router version)
  • Ensure that all implementations (java/i2pd/kovri/go) can add version 2 support (or not) on their own schedules
  • Add random padding to all NTCP messages including handshake and data messages (i.e. length obfuscation so all messages aren't a multiple of 16 bytes) Provide options mechanism for both sides to request min and max padding and/or padding distribution. Specifics of the padding distribution are implementation-dependent and may or may not be specified in the protocol itself.
  • Obfuscate the contents of messages that aren't encrypted (1 and 2), sufficiently so that DPI boxes and AV signatures can't easily classify them. Also ensure that the messages going to a single peer or set of peers do not have a similar pattern of bits.
  • Fix loss of bits in DH due to Java format [Ticket1112], possibly (probably?) by switching to X25519.
  • Switch to a real key derivation function (KDF) rather than using the DH result as-is?
  • Add "probing resistance" (as Tor calls it); this includes replay resistance.
  • Maintain 2-way authenticated key exchange (2W-AKE). 1W-AKE is not sufficient for our application.
  • Continue to use the variable-type, variable-length signatures (from the published [RouterIdentity] signing key) for authentication. Do not rely on a separate published key.
  • Add options/version in handshake for future extensibility.
  • Add resistance to malicious MitM TCP segmentation if possible.
  • Don't add significantly to CPU required for connection setup; if possible, reduce it significantly.
  • Add message authentication (MAC), possibly HMAC-SHA256 or Poly1305 (see alternatives below), and remove Adler checksum.
  • If possible, reduce the 4-message, two-round-trip handshake to a 3-message, one-round-trip handshake, as in [SSU]. This would require moving Bob's signature in message 4 to message 2. Research the reason for 4 messages in the ten-year-old email/status/meeting archives.
  • Maintain timestamps for replay and skew detection.
  • Avoid any year 2038 issues in timestamps, must work until at least 2106.
  • Increase max message size from 16K to 32K or 64K.
  • Any new crypto should be readily available in libraries for use in Java (1.7), C++, and Go router implementations.
  • Include representatives of Java, C++, and Go router developers in the design.
  • Minimize changes (but there will still be a lot).

Goals to be clarified further:

  • Support both versions in a common set of code (this may not be possible and is implementation-dependent in any case).

Non-Goals

  • Bullet-proof DPI resistance... that would be pluggable transports, [Prop109].
  • A TLS-based (or HTTPS-lookalike) transport... that would be [Prop104].
  • Don't change the symmetric stream crypto, continue to use AES-CBC-256 (but do add flags in the handshake so we can change later). However, why is using the last 16 bytes of the previous message as the IV better than just using counter mode? To be researched. Salsa 20 also an option (see alternatives below).
  • Timing-based DPI resistance (inter-message timing/delays can be implementation-dependent; intra-message delays can be introduced at any point, including before sending the random padding, for example). Artificial delays (what obfs4 calls IAT or inter-arrival time) are independent of the protocol itself.
  • Deniability of participating in a session (there's signatures in there).

Non-goals that may be partially reconsidered or discussed:

  • The degree of protection against Deep Packet Inspection (DPI)
  • Post-Quantum (PQ) security
  • Deniability

Security Goals

We consider three parties:

  • Alice, who wishes to establish a new session.
  • Bob, with whom Alice wishes to establish a session.
  • Mallory, the "man in the middle" between Alice and Bob.

At most two participants can engage in active attacks.

Alice and Bob are both in possession of a static key pair, which is contained in their [RouterIdentity].

The proposed protocol attempts to allow Alice and Bob to agree on a shared secret key (in the sequel denoted K) under the following requirements:

  1. Private key security: neither Bob nor Mallory learns anything about Alice's static private key. Symmetrically, Alice does not learn anything about Bob's static private key.
  2. The session key K is only known by Alice and Bob.
  3. Perfect forward secrecy: the agreed upon session key remains secret in the future, even when the static private keys of Alice and/or Bob are revealed after the key has been agreed upon.
  4. Two-way authentication: Alice is certain that she has established a session with Bob, and vice versa.
  5. Protection against straightforward DPI: it is not trivial to detect that Alice and Bob are engaged in the protocol using only straightforward deep packet inspection (DPI) techniques.
  6. Limited deniability: neither Alice nor Bob can deny participation in the protocol, but if either leaks the shared key the other party can deny the authenticity of the contents of the transmitted data.

The present proposal attempts to provide all five requirements based on the Station-To-Station (STS) protocol [STS]. Note that this protocol is also the basis for the [SSU] protocol.

The notion of "straightforward DPI" is here taken to include the following adversary capabilities:

  1. The ability to inspect all data sent or received by the target.
  2. The ability to perform operations on the observed data, such as applying block ciphers or hash functions.
  3. The ability to store and compare with previously sent messages.
  4. The ability to modify, delay or fragment packets.

However, the attacker has the following restrictions:

  1. The inability to map IP addresses to router hashes. While this is trivial, it would require a DPI system specifically designed to target I2P.
  2. The inability to use timing information to detect the protocol.
  3. Generally speaking, the DPI toolbox shouldn't contain any built-in tools that are specifically designed for I2P detection. This includes creating "honeypots", which would for example include nonrandom padding in their messages. Note that this does not exclude machine learning systems or highly configurable DPI tools as long as they meet the other requirements.

To counter payload analysis, it is ensured that all messages are indistinguishable from random. This also requires their length to be random, which is more complicated than just adding random padding. In fact, in Appendix A, the authors argue that a naive (i.e. uniform) padding scheme does not resolve the problem. Appendix A therefore proposes to include either random delays or to develop an alternate padding scheme that can provide reasonable protection for the proposed attack.

To protect against the sixth entry above, implementations should include random delays in the protocol. Such techniques are not covered by this proposal, but they could also resolve the padding length issues. In summary, the proposal provides good protection against payload analysis (when the considerations in Appendix A are taken into account), but only limited protection against flow analysis.

Future work:

  • Consider the behaviour of the protocol when packets are dropped or reordered by an attacker. Recent interesting work in this area can be found in [IACR-1150].
  • Provide a more accurate classification of DPI systems, taking into account the existing literature related to the subject.
  • Discuss the formal security of the proposed protocol, ideally taking into account the DPI attacker model.

Basic Protocol

The protocol consists of two phases:

  1. Key exchange, based on Diffie-Hellman (in principle over an arbitrary group).
  2. Signature exchange, to provide authentication.

For the group used by (1), we will use multiplicative notation despite the fact that additive notation is more common for elliptic curve groups. Finally, note that the protocol allows the integration of post-quantum key exchange mechanisms such as supersingular isogeny key exchange [SIDH]. However, full post-quantum security would also require introducing new signature types in the RouterInfo, and is outside of the scope of this proposal.

The signatures used for (2) can be any signature supported by the RI structure [SigningPublicKey].

Let g be the (known) generator, x and y private keys, and public keys X = g^x, Y = g^y. X and Y are elements of an the group used by (1).

The STS protocol proceeds as follows:

Alice                           Bob

X -------------------------------->
<-----------------Y, E_K(S_B(X, Y))
E_K(S_A(X, Y))-------------------->

where S_J(M) is the signature of M by party J, and the session key is K = KDF(X^y) = KDF(Y^x). KDF is assumed to be a key derivation function, the choice of which is in principle arbitrary.

Some notes on the above protocol, which are also discussed in [STS] are listed below:

  • It is prudent to let both parties sign both X and Y, but in principle this is not necessary.
  • The signatures are encrypted to prevent Mallory from substituting the (then unencrypted) signature in the last message with his own signature. This is further discussed in Trac [Ticket1849].
  • NTCP2 adds various options, as well as timestamps to this protocol.

Future work:

  • The original NTCP protocol requires four messages in the establishment sequence, for unclear reasons. This proposal does not provide an explanation for this yet.

Messages

The establishment sequence is as follows:

Alice                           Bob

SessionRequest ------------------->
<------------------- SessionCreated
SessionConfirmed ----------------->

Once a session has been established, Alice and Bob can exchange Data messages. Approximately every 15 minutes, TimeSync messages are transmitted.

All message types (SessionRequest, SessionCreated, SessionConfirmed, Data and TimeSync) are specified in this section.

Some notations:

- RH_A = Router hash Alice
- RH_B = Router hash Bob

1) SessionRequest

Raw contents:

+----+----+----+----+----+----+----+----+
|       AES-CBC-256 encrypted data      |
+     (length implied by packet size)   +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+
~                padding                ~
+----+----+----+----+----+----+----+----+

data :: AES-256-CBC encrypted options, X and padding
        key: RH_B
        iv: 0x0000 0000 0000 0000

padding :: 0-15 bytes

Unencrypted data:

+----+----+----+----+----+----+----+----+
|                                       |
+              options                  +
|                                       |
+----+----+----+----+----+----+----+----+
|              ext_options              |
+       (number implied by options)     +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+
|                   X                   |
+       (length implied by options)     +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+
|      Arbitrary amount of padding      |
+      (length implied by options)      +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+

options :: options block

ext_options :: additional options blocks, format currently undefined
               length: multiple of 16 bytes

X :: padded to multiple of 16

Options block:

+----+----+----+----+----+----+----+----+
|   ver   |    KE   |   auth  |  padLen |
+----+----+----+----+----+----+----+----+
|        tsA        | NO | Reserved (0) |
+----+----+----+----+----+----+----+----+

ver :: Protocol version (currently 2)

KE  :: Key-exchange mechanism used
          0: Diffie-Hellman in Z/pZ [RFC-3526], 2048 bit p
             KDF = SHA256 (possibly truncated)
          1: Diffie-Hellman over curve 25519 (X25519)
             KDF = SHA256 (possibly truncated)

auth :: Authenticated encryption mode
        Key = K, to be agreed upon using KE
        0: AES-CBC-256/HMAC-MD5 [RFC-2104]
           IV  = included before the encrypted data and MAC (for first
                 message)
               = last encrypted block of (your own) previous message
                 [XXX: alternative IV approaches to be investigated]
        ... (Proposed alternatives are listed in Appendix B.)

padLen :: Length of the padding
          (Distribution to be determined, see Appendix A.)

tsA :: Unix timestamp
       Wraps around in 2106

NO :: Number of following option blocks.

Notes

  • The timestamp and padding length in the initial AES block ensure that the ciphertext is different for every session, even with IV = 0.
    • [XXX: The simple assumption is that Alice will not send multiple different SessionRequest messages to the same Bob within a second. This assumption could potentially be broken by a system time change, but the packets are still protected if there is sufficient randomness in the padding length, which will depend on the padding algorithm.]
    • [XXX: Alternatively, the SessionRequest message could be prepended with a random IV. This would ensure cryptographic indistinguishability, but at the expense of packet size identifiability: the base packet size would be 16 bytes larger, reducing the range of potential packet sizes that the padding algorithm could generate. Given the fact that additional options blocks may be included, the random IV may in fact be negligible overhead - to be investigated.]
  • Reserved options must be set to zero if ver = 2. This increases the accuracy of version detection.
  • Diffie-Hellman parameters may never be sent twice to avoid DPI attacks.
  • The "KE" and "auth" options must be compatible, i.e. the shared secret K must be of the appropriate size. If more "auth" options are added, this could implicitly change the meaning of the "KE" flag to use a different KDF or a different truncation size.
  • KE = 0 is not exactly the same as in NTCP 1, where X was represented in Java's BigInteger format. NTCP2 uses the regular representation of X.
  • auth = 0 is not exactly the same as in NTCP 1, since it includes a MAC (HMAC-MD5). The author suggests that this should only be used as a transitional option, for reasons discussed below.
  • The options block and X are encrypted to ensure payload indistinguishably, which is a necessary DPI countermeasure. We use AES to achieve obfuscation, rather than more complicated and slower alternatives such as elligator2 (which would apply to X25519). The padding does not need to be encrypted by Alice [XXX: Is this valid?], but should be decrypted by Bob to inhibit timing attacks.

Authenticated encryption

In subsequent messages, B will be the block size (in bytes) of the cipher used for authenticated encryption (as specified in the "auth" field).

Encrypted/authenticated data will be represented as

+----+----+----+----+----+----+----+----+
|   Encrypted and authenticated data    |
+   (mode determined by auth option)    +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+

For AES-CBC-256/HMAC-MD5 this has the following specific format

+----+----+----+----+----+----+----+----+
|                                       |
+                 MAC                   +
|                                       |
+----+----+----+----+----+----+----+----+
|       AES-CBC-256 encrypted data      |
+                                       +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+

The first encrypted and authenticated data (separate for Alice and Bob) starts with a random IV:

+----+----+----+----+----+----+----+----+
|                                       |
+                  IV                   +
|                                       |
+----+----+----+----+----+----+----+----+
|                                       |
+                 MAC                   +
|                                       |
+----+----+----+----+----+----+----+----+
|       AES-CBC-256 encrypted data      |
+                                       +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+

Future work

[RFC-6151] states:

The attacks on HMAC-MD5 do not seem to indicate a practical vulnerability when used as a message authentication code. Considering that the distinguishing-H attack is different from a distinguishing-R attack, which distinguishes an HMAC from a random function, the practical impact on HMAC usage as a pseudorandom function (PRF) such as in a key derivation function is not well understood.

Therefore, it may not be urgent to remove HMAC-MD5 from the existing protocols. However, since MD5 must not be used for digital signatures, for a new protocol design, a ciphersuite with HMAC-MD5 should not be included. Options include HMAC-SHA256 [HMAC] [HMAC-SHA256] and [AES-CMAC] when AES is more readily available than a hash function.

Hence, alternative authenticated ciphers should be explored for the final NTCP2 proposal. Plenty of options (other than the ones listed here) are available and should be researched.

Consider candidates from the currently ongoing competition [CAESAR].

2) SessionCreated

Raw contents:

+----+----+----+----+----+----+----+----+
|                   Y                   |
+       (length implied by options)     +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+
|   Encrypted and authenticated data    |
+   (mode determined by auth option)    +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+
|      Arbitrary amount of padding      |
+                                       +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+

Unencrypted data:

+----+----+----+----+----+----+----+----+
|     ts B     | padLen  |              |
+----+----+----+----+----+              +
|               Signature               |
+   (length determined by RI sigtype)   +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+
|         Padding as necessary          |
+         (fewer than B bytes)          +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+

ts B :: Unix timestamp
        Wraps around in 2106

padLen :: Length of the padding
          (Distribution to be determined, see Appendix A.)

The signature is computed over the following data:

+----+----+----+----+----+----+----+----+
|                   X                   |
+                                       +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+
|                   Y                   |
+                                       +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+
|                                       |
+            Options blocks             +
|                                       |
+----+----+----+----+----+----+----+----+
|        tsB        |
+----+----+----+----+

Notes

  • The main reason for signature encryption is to counter DPI. For unknown key-share attacks this does not seem to be necessary. (It is necessary in the SessionConfirmed message.)
  • Timestamps are included to avoid replay attacks and to detect high clock skew.
  • The entire options block is signed to avoid version downgrade attacks.

Future work

  • Is it good practice to include the IP and port of both parties in the signature to avoid replay attacks within the bounds of what is undetectable with timestamps? This is what SSU does, but it doesn't seem to be necessary as X and Y also have to match.
  • Unlike in NTCP, Bob is not able to sign Alice's RI. This should not be an issue, but further investigations would be appropriate.
  • The arbitrary padding is neither encrypted nor authenticated. This appears to be unnecessary, but it should be investigated. The same applies to all other messages with random padding.

3) SessionConfirmed

Raw contents:

+----+----+----+----+----+----+----+----+
|   Encrypted and authenticated data    |
+   (mode determined by auth option)    +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+
|      Arbitrary amount of padding      |
+                                       +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+

Unencrypted data:

+----+----+----+----+----+----+----+----+
|   size  |                             |
+----+----+                             +
|            Alice's RouterInfo         |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+
|  padLen |                             |
+----+----+                             +
|               Signature               |
+   (length determined by RI sigtype)   +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+
|                                       |
+           Padding as necessary        +
|               (< B bytes)             |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+

size :: Alice's RouterInfo size

padLen :: Length of the padding
          (Maximum to be determined)

The signature is computed over the following data:

+----+----+----+----+----+----+----+----+
|                   X                   |
+                                       +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+
|                   Y                   |
+                                       +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+
|                                       |
+            Options blocks             +
|                                       |
+----+----+----+----+----+----+----+----+
|        tsB        |
+----+----+----+----+

Notes

  • As pointed out in "Basic Protocol", both X and Y are included for reasons of robustness.
  • The reason for signature encryption is to avoid trivial DPI, and to counter unknown key-share attacks.
  • Timestamps are included to avoid replay attacks.

Future work

  • Similar note as for SessionCreated with respect to including the IP and port of both parties in the signature.
  • Unlike in NTCP, Alice does not sign Bob's RI (see also SessionCreated). This should not be an issue, but it can be included if desired.

4) Data and TimeSync

Raw contents:

+----+----+----+----+----+----+----+----+
|   Encrypted and authenticated data    |
+   (mode determined by auth option)    +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+
|      Arbitrary amount of padding      |
+                                       +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+

Unencrypted data:

+----+----+----+----+----+----+----+----+
|  size   | padLen  |       Data        |
+----+----+----+----+                   +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+
|          Padding as necessary         |
+              (< B bytes)              +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+

Special case for time synchronization:

+----+----+----+----+----+----+----+----+
|  size=0 | padLen  |     timestamp     |
+----+----+----+----+----+----+----+----+
|          Padding as necessary         |
+              (< B bytes)              +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+

Future work

  • The padding length is either to be decided on a per-message basis and estimates of the length distribution, or random delays should be added. These countermeasures are to be included to resist DPI, as message sizes would otherwise reveal that I2P traffic is being carried by the transport protocol. The exact padding scheme is an area of future work, Appendix A provides more information on the topic.

Version detection

NTCP and NTCP2 can run on the same port, but the supported protocol versions should be advertised in the RouterAddress.

The RouterAddress transport identifier is "NTCP" for both protocol versions. Routers would publish "ver=1,2" in the RouterAddress (not the RouterInfo) if they support both NTCP 1 and NTCP 2 on the same port. "ver=1" is NTCP 1 only. This is the default if no "ver" is present.

"ver=2" is NTCP 2 only. This can't be used for a long time, as it's not backwards-compatible. But sometime in the future, implementers could support version 2 only.

If new versions are added, this should also be indicated using the "ver" flag in the RouterAddress.

To detect the version of an incoming NTCP connection, Bob proceeds as follows:

  • Decrypt the first 16 bytes of the SessionRequest packet using AES-256 with key RH_B.
  • Check whether the first 2 bytes match a meaningful version number. This fails with probability N / 2^16, where N is the number of protocol versions.
  • If ver = 2, additionally check whether the last 4 bytes are all zero. This fails with probability 1 / 2^24, such that errors are very unlikely. For ver > 2, the procedure will be similar unless the reserved bytes are used.

Appendix A: Padding Scheme

This section discusses an attack on typical padding schemes that allows to reveal the probability distribution of the length of the unpadded messages, by only observing the length of the padded messages. Let N be a random variable describing the number of unpadded bytes, and P likewise for the number of padding bytes. The total message size is then N + P.

Assume that for an unpadded size of n, at least P_min(n) >= 0 and at most P_max(n) >= P_min(n) bytes of padding are added in a padding scheme. The obvious scheme uses padding of length P uniformly chosen at random:

Pr[P = p | N = n] = 1 / (P_max(n) - P_min(n)) if P_min(n) <= p <= P_max(n),
                    0                         otherwise.

A naive padding scheme would simply ensure that the size of the padded message does not exceed N_max:

P_max(n) = N_max - n, n <= N_max
P_min(n) = 0.

However, this leaks information about the unpadded length.

An attacker can easily estimate Pr[x <= N + P <= y], for example by means of a histogram.

  • From this, he can also try to estimate Pr[n_1 <= N <= n_2], indeed:
Pr[N + P = m] = Σ_n Pr[N = n] Pr[P = m - n | N = n].

In the naive scheme,

Pr[N + P = m] = Σ_{n <= m} Pr[N = n] / (N_max - n).

It's pretty obvious, as it was before doing the above calculation, that this leaks information about Pr[N = n]: if the length of packets is almost always more than m, then N + P <= m will almost never be observed. This is not the largest issue though, although being able to observe the minimum message length can be considered to be a problem by itself.

A bigger issue is that it is possible to determine Pr[N = n] exactly:

Pr[N + P = m] - Pr[N + P = m-1] = Pr[N = m] / (N_max - m),

that is

Pr[N = n] = (N_max - n)(Pr[N + P = n] - Pr[N + P = n - 1])

To distinguish NTCP2, then, the attacker can use any of the following:

  • Estimate Pr[kB <= N <= (k + 1)B - 1] for positive integers k. It will always be zero for NTCP2.
  • Estimate Pr[N = kB] and compare with a standard I2P profile.

This simple attack hence partially destroys the purpose of padding, which attempts to obfuscate the size distribution of the unpadded messages. The amount of messages that the attacker has to observe to distinguish the protocol depends on the desired accuracy and on the minimum and maximum unpadded message sizes that occur in practice. Note that it is easy to gather many messages for the attacker, since he can use all traffic sent from and to the particular port that the target is using.

In some forms (e.g. estimation of Pr[kB <= N <= (k + 1)B - 1]) the attack requires only a few bytes of memory (one integer is enough) and it could be argued that such an attack might be included in many slightly more advanced but nevertheless standard DPI frameworks.

This proposal suggests using one of the following countermeasures:

  • Develop an alternate padding scheme that takes into account the (estimated) distribution of N by using a non-uniform padding length distribution. A good padding scheme would probably require maintaining a histogram of the number of blocks per message.
  • Add random delays between (randomly sized) fragments of messages.

The second option is more generally preferred, because it can be simultaneously used as a countermeasure against flow analysis. However, such delays may be out of scope for the NTCP2 protocol, such that the first option, which is also easier to implement, may be preferred instead.

Appendix B: Authenticated Cipher Candidates

1) ChaCha20 or Salsa20/Poly1305

Encrypted and authenticated data format:

+----+----+----+----+----+----+----+----+
|                                       |
+    Random nonce   +----+----+----+----+
|                   |                   |
+----+----+----+----+                   +
|                                       |
+   Poly1305 Tag    +----+----+----+----+
|                   |                   |
+----+----+----+----+                   +
|                                       |
+    ChaCha20/Salsa20 encrypted data    +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+

For ChaCha20, what is described here corresponds to [RFC-7539], which is also used similarly in TLS [RFC-7905].

Notes

  • Since Salsa20 and ChaCha20 are stream ciphers, plaintexts need not be padded. Additional keystream bytes are discarded.
  • The key for the cipher (256 bits) is agreed upon by means of the KDF defined by the KE field. The one-time key for Poly1305 is generated pseudorandomly as in [RFC-7539], i.e. using the Salsa20 or the ChaCha20 block function.

Future work

  • Decide on using Salsa20 or ChaCha20
  • Do not generate the full nonce at random every time.

2) AES-CBC-256/HMAC-SHA1

To be specified.

3) AES-GCM-256

Encrypted and authenticated data format:

+----+----+----+----+----+----+----+----+
|                                       |
+                GMAC Tag               +
|                                       |
+----+----+----+----+----+----+----+----+
|                                       |
+       AES-GCM-256 encrypted data      +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+

The IV used in AES-GCM-256 equals the last 12 bytes of the last encrypted block of the previously sent message.

The initial IV is contained in the first encrypted and authenticated message:

+----+----+----+----+----+----+----+----+
|                                       |
+     Random IV     +----+----+----+----+
|                   |                   |
+----+----+----+----+                   +
|                                       |
+     GMAC Tag      +----+----+----+----+
|                   |                   |
+----+----+----+----+                   +
|                                       |
+       AES-GCM-256 encrypted data      +
|                                       |
~               .   .   .               ~
|                                       |
+----+----+----+----+----+----+----+----+

Notes

  • GCM does not require the IV to be random, it only needs to be unique. This justifies the use of the last 12 bytes of the last encrypted block of the previous message as the IV.
  • "Associated data" is not used, i.e. all data in the AES-GCM-256 block is both encrypted and authenticated.

Appendix C: AEAD Cipher Selection

Which one?

  • ChaCha20/Poly1305
  • IETF implementation [5]
  • AES-GCM

Performance [4]

  • ChaCha20/Poly1305 256 -> 90MBps on phone hardware
  • AES-GCM 256 -> 20MBps on phone hardware

General comments:

  • AES-GCM is potentially more vulnerable to cache timing attacks for software implementations due to using lookup tables [1]
  • AES seems to be universally considered unpleasant [2]
  • AES-GCM is vulnerable to nonce re-use attacks [2]
  • ChaCha20/Poly1305 is not vulnerable to nonce re-use attacks due to fully implicit nonce based on record number, if implemented as in TLS 1.3 [2]
  • Poly1305/ChaCha20 is considered secure if nonces are handled properly [3]

So based on these facts, ChaCha20/Poly1305 seems like the option that is considered better by the cryptographer community.

References

[CAESAR]https://competitions.cr.yp.to/caesar.html
[IACR-1150]https://eprint.iacr.org/2015/1150
[NetDB]https://geti2p.net/en/docs/how/network-database
[NTCP](1, 2) https://geti2p.net/en/docs/transport/ntcp
[Prop104](1, 2) https://geti2p.net/spec/proposals/104-tls-transport
[Prop109]https://geti2p.net/spec/proposals/109-pt-transport
[RFC-2104]https://tools.ietf.org/html/rfc2104
[RFC-3526]https://tools.ietf.org/html/rfc3526
[RFC-6151]https://tools.ietf.org/html/rfc6151
[RFC-7539](1, 2) https://tools.ietf.org/html/rfc7539
[RFC-7905]https://tools.ietf.org/html/rfc7905
[RouterAddress]https://geti2p.net/spec/common-structures#struct-routeraddress
[RouterIdentity](1, 2) https://geti2p.net/spec/common-structures#struct-routeridentity
[SIDH]De Feo, Luca; Jao, Plut., Towards quantum-resistant cryptosystems from supersingular elliptic curve isogenies
[SigningPublicKey]https://geti2p.net/spec/common-structures#type-signingpublickey
[SSU](1, 2) https://geti2p.net/en/docs/transport/ssu
[STS](1, 2) Diffie, W.; van Oorschot P. C.; Wiener M. J., Authentication and Authenticated Key Exchanges
[Ticket1112]https://trac.i2p2.de/ticket/1112
[Ticket1849]https://trac.i2p2.de/ticket/1849
[1]http://www.chesworkshop.org/ches2009/presentations/01_Session_1/CHES2009_ekasper.pdf
[2](1, 2, 3) https://www.blackhat.com/docs/us-16/materials/us-16-Devlin-Nonce-Disrespecting-Adversaries-Practical-Forgery-Attacks-On-GCM-In-TLS.pdf
[3]https://eprint.iacr.org/2014/613.pdf
[4]https://www.imperialviolet.org/2013/10/07/chacha20.html
[5]https://tools.ietf.org/html/rfc7539