Commit d9f46e6

add autonat v2 spec (#538)
1 parent 25eabc9 commit d9f46e6

7 files changed

Lines changed: 347 additions & 3 deletions

autonat/README.md

Lines changed: 7 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,7 @@
1+
# NAT Discovery <!-- omit in toc -->
2+
> How we detect if we're behind a NAT.
3+
4+
5+
Specifications:
6+
- [autonat v1](autonat-v1.md)
7+
- [autonat v2](autonat-v2.md)

autonat/autonat-v1.md

Lines changed: 0 additions & 3 deletions
@@ -1,6 +1,3 @@
1-
# NAT Discovery <!-- omit in toc -->
2-
> How we detect if we're behind a NAT.
3-
41
| Lifecycle Stage | Maturity | Status | Latest Revision |
52
|-----------------|----------------|--------|-----------------|
63
| 3A | Recommendation | Active | r1, 2023-02-16 |
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
@startuml
participant Cli
participant Srv

skinparam sequenceMessageAlign center
skinparam defaultFontName monospaced


== Amplification Attack Prevention ==

Cli -> Srv: [conn1: stream: dial] DialRequest:{nonce: 0xabcd, addrs: (addr1, addr2, addr3)}
Srv -> Cli: [conn1: stream: dial] DialDataRequest:{addrIdx: 1, numBytes: 120k}
Cli -> Srv: [conn1: stream: dial] DialDataResponse:{data: 4k bytes},DialDataResponse:{data: 4k bytes},...
Srv -> Cli: [conn2: stream: dial-back]addr2 DialBack:{nonce: 0xabcd}
Srv -> Cli: [conn1: stream: dial] DialResponse:{status: OK, addrIdx: 1, dialStatus: DIAL_STATUS_OK}
@enduml
Lines changed: 1 addition & 0 deletions

autonat/autonat-v2.md

Lines changed: 301 additions & 0 deletions
@@ -0,0 +1,301 @@
1+
# AutonatV2: spec
2+
3+
4+
| Lifecycle Stage | Maturity | Status | Latest Revision |
5+
|-----------------|--------------------------|--------|-----------------|
6+
| 1A | Working Draft | Active | r2, 2023-04-15 |
7+
8+
Authors: [@sukunrt]
9+
10+
Interest Group: [@marten-seemann], [@marcopolo], [@mxinden]
11+
12+
[@sukunrt]: https://github.com/sukunrt
13+
[@marten-seemann]: https://github.com/marten-seemann
14+
[@mxinden]: https://github.com/mxinden
15+
[@marcopolo]: https://github.com/marcopolo
16+
17+
18+
## Overview
19+
20+
A priori, a node cannot know if it is behind a NAT / firewall or if it is
21+
publicly reachable. Moreover, the node may be publicly reachable on some of its
22+
addresses and not on others. Knowing reachability for its addresses is essential
23+
for the node to be well-behaved in the network: A node doesn't need to advertise
24+
its unreachable addresses to the rest of the network, preventing superfluous
25+
dials from other peers. Furthermore, in case it has no publicly reachable
26+
addresses, it might actively seek to improve its connectivity by finding a relay
27+
server, which would allow other peers to establish a relayed connection.
28+
29+
In `autonat v2`, the client sends a request with a priority-ordered list of addresses
30+
and a nonce. On receiving this request the server dials the first address in the
31+
list that it is capable of dialing and provides the nonce. Upon completion of
32+
the dial, the server responds to the client with a response containing the
33+
dial outcome.
34+
35+
As the server dials _exactly_ one address from the list, `autonat v2` allows
36+
nodes to determine reachability for individual addresses. Using `autonat v2`
37+
nodes can build an address pipeline where they can test individual addresses
38+
discovered by different sources like identify, upnp mappings, circuit addresses
39+
etc. for reachability. Having a priority-ordered list of addresses provides the
40+
ability to verify low priority addresses. Implementations can generate low
41+
priority address guesses and add them to requests for high priority addresses as
42+
a nice to have. This is especially helpful when introducing a new transport.
43+
Initially, such a transport will not be widely supported in the network.
44+
Requests for verifying such addresses can be reused to get information about
45+
other addresses.
46+
47+
The client can verify the server did successfully dial an address of the same
48+
transport as it reported in the response by checking the local address of the
49+
connection on which the nonce was received.
50+
51+
Compared to `autonat v1`, there are three major differences:
52+
1. `autonat v1` allowed testing reachability for the node. `autonat v2` allows
53+
testing reachability for an individual address.
54+
2. `autonat v2` provides a mechanism for nodes to verify whether the peer
55+
actually successfully dialled an address.
56+
3. `autonat v2` provides a mechanism for nodes to dial an IP address different
57+
from the requesting node's observed IP address without risking amplification
58+
attacks. `autonat v1` disallowed such dials to prevent amplification attacks.
59+
60+
61+
## AutoNAT V2 Protocol
62+
63+
![Autonat V2 Interaction](autonat-v2.svg)
64+
65+
A client node wishing to determine reachability of its addresses sends a
66+
`DialRequest` message to a server on a stream with protocol ID
67+
`/libp2p/autonat/2/dial-request`. Each `DialRequest` is sent on a new stream.
68+
69+
This `DialRequest` message has a list of addresses and a fixed64 `nonce`. The
70+
list is ordered in descending order of priority for verification. AutoNAT V2 is
71+
primarily for testing reachability on the public Internet. The client SHOULD NOT send any
72+
private address as defined in [RFC
73+
1918](https://datatracker.ietf.org/doc/html/rfc1918#section-3) in the list. The server SHOULD NOT dial any private address.
74+
75+
Upon receiving this request, the server selects an address from the list to
76+
dial. The server SHOULD use the first address it is willing to dial. The server
77+
MUST NOT dial any address other than this one. If this selected address has an
78+
IP address different from the requesting node's observed IP address, the server
79+
initiates the Amplification attack prevention mechanism (see [Amplification
80+
Attack Prevention](#amplification-attack-prevention)). On completion, the
81+
server proceeds to the next step. If the selected address has the same IP
82+
address as the client's observed IP address, the server proceeds to the next step
83+
skipping Amplification Attack Prevention steps.
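
The selection rule above can be sketched as follows. This is a minimal illustration, not part of the specification: addresses are modelled as plain IP strings rather than multiaddrs, and `select_address` and `can_dial` are hypothetical names. Python's `is_private` check is broader than RFC 1918 (it also covers loopback, link-local and documentation ranges), so a real implementation may want a narrower filter.

```python
import ipaddress
from typing import Callable, Optional, Sequence, Tuple


def select_address(
    addrs: Sequence[str],
    observed_ip: str,
    can_dial: Callable[[str], bool] = lambda ip: True,
) -> Optional[Tuple[int, bool]]:
    """Pick the first address the server is willing to dial.

    Returns (addrIdx, needs_dial_data), or None if every address is refused.
    needs_dial_data is True when the selected IP differs from the client's
    observed IP, i.e. when amplification attack prevention applies.
    """
    observed = ipaddress.ip_address(observed_ip)
    for idx, raw in enumerate(addrs):
        ip = ipaddress.ip_address(raw)
        # Assumption: private addresses and addresses rejected by the
        # caller-supplied can_dial predicate are skipped, not dialed.
        if ip.is_private or not can_dial(raw):
            continue
        # Exactly one address is selected; no other address is ever dialed.
        return idx, ip != observed
    return None
```

For example, `select_address(["10.0.0.1", "8.8.8.8"], "1.1.1.1")` skips the private address and selects index 1, flagging that dial data should be requested first.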
84+
85+
The server dials the selected address, opens a stream with Protocol ID
86+
`/libp2p/autonat/2/dial-back` and sends a `DialBack` message with the nonce
87+
received in the request. The client on receiving this message replies with
88+
a `DialBackResponse` message with the status set to `OK`. The client MUST
89+
close this stream after sending the response. The dial back response provides
90+
the server assurance that the message was delivered so that it can close the
91+
connection.
92+
93+
Upon completion of the dial back, the server sends a `DialResponse` message to
94+
the client node on the `/libp2p/autonat/2/dial-request` stream. The response
95+
contains `addrIdx`, the index of the address the server selected to dial and
96+
`DialStatus`, a dial status indicating the outcome of the dial back. The
97+
`DialStatus` for an address is set according to [Requirements for
98+
DialStatus](#requirements-for-dialstatus). The response also contains an
99+
appropriate `ResponseStatus` set according to [Requirements For
100+
ResponseStatus](#requirements-for-responsestatus).
101+
102+
The client MUST check that the nonce received in the `DialBack` is the same as
103+
the nonce it sent in the `DialRequest`. If the nonce is different, it MUST
104+
discard this response.
105+
106+
The server MUST close the stream after sending the response. The client MUST
107+
close the stream after receiving the response.
108+
109+
110+
### Requirements for DialStatus
111+
112+
On receiving a `DialRequest`, the server first selects an address that it will
113+
dial.
114+
115+
If the server chooses not to dial any of the requested addresses, `ResponseStatus`
116+
is set to `E_DIAL_REFUSED`. The fields `addrIdx` and `DialStatus` are
117+
meaningless in this case. See [Requirements For
118+
ResponseStatus](#requirements-for-responsestatus).
119+
120+
If the server selects an address for dialing, `addrIdx` is set to the
121+
index (zero-based) of the address in the list and the `DialStatus` is set
122+
according to the following consideration:
123+
124+
If the server was unable to connect to the client on the selected address,
125+
`DialStatus` is set to `E_DIAL_ERROR`, indicating the selected address is not
126+
publicly reachable.
127+
128+
If the server was able to connect to the client on the selected address, but an
129+
error occurred while sending the nonce on the `/libp2p/autonat/2/dial-back`
130+
stream, `DialStatus` is set to `E_DIAL_BACK_ERROR`. This might happen in case of
131+
resource limited situations on client or server, or when either the client or
132+
the server is misconfigured.
133+
134+
If the server was able to connect to the client and successfully send a nonce on
135+
the `/libp2p/autonat/2/dial-back` stream, `DialStatus` is set to `OK`.
136+
137+
### Requirements for ResponseStatus
138+
139+
The `ResponseStatus` sent by the server in the `DialResponse` message MUST be
140+
set according to the following requirements:
141+
142+
`E_REQUEST_REJECTED`: The server didn't serve the request because of rate
143+
limiting, resource limit reached or blacklisting.
144+
145+
`E_DIAL_REFUSED`: The server didn't dial back any address because it was
146+
incapable of dialing or unwilling to dial any of the requested addresses.
147+
148+
`E_INTERNAL_ERROR`: An error not classified within the above error codes occurred on the
149+
server preventing it from completing the request.
150+
151+
`OK`: The server completed the request successfully. A request is considered
152+
a success when the server selects an address to dial and dials it, successfully or unsuccessfully.
153+
154+
Implementations MUST discard responses with status codes they do not understand.
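
The status rules can be summarised in a short sketch. The enum values mirror the protobuf definitions in this document; `build_response`, `known_status` and the `DialResponse` tuple are hypothetical helpers introduced only for illustration, and rate limiting and internal errors are left out.

```python
from enum import IntEnum
from typing import NamedTuple, Optional


class DialStatus(IntEnum):
    UNUSED = 0
    E_DIAL_ERROR = 100
    E_DIAL_BACK_ERROR = 101
    OK = 200


class ResponseStatus(IntEnum):
    E_INTERNAL_ERROR = 0
    E_REQUEST_REJECTED = 100
    E_DIAL_REFUSED = 101
    OK = 200


class DialResponse(NamedTuple):
    status: ResponseStatus
    addr_idx: int
    dial_status: DialStatus


def build_response(addr_idx: Optional[int], connected: bool, nonce_sent: bool) -> DialResponse:
    """Server side: derive both status fields from the dial outcome."""
    if addr_idx is None:
        # No address selected: addrIdx and DialStatus are meaningless here.
        return DialResponse(ResponseStatus.E_DIAL_REFUSED, 0, DialStatus.UNUSED)
    if not connected:
        dial_status = DialStatus.E_DIAL_ERROR
    elif not nonce_sent:
        dial_status = DialStatus.E_DIAL_BACK_ERROR
    else:
        dial_status = DialStatus.OK
    # The request counts as OK whether or not the dial itself succeeded.
    return DialResponse(ResponseStatus.OK, addr_idx, dial_status)


def known_status(code: int) -> Optional[ResponseStatus]:
    """Client side: return None for unknown codes so the response is discarded."""
    try:
        return ResponseStatus(code)
    except ValueError:
        return None
```

Note that a failed dial still yields `ResponseStatus.OK`; only the nested `DialStatus` tells the client whether the address was reachable.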
155+
156+
### Amplification Attack Prevention
157+
158+
![Interaction](autonat-v2-amplification-attack-prevention.svg)
159+
160+
When a client asks a server to dial an address that is not the client's observed
161+
IP address, the server asks the client to send a non-trivial number of bytes
162+
as a cost to dial a different IP address. To make amplification attacks
163+
unattractive, servers SHOULD ask for 30k to 100k bytes. Since most handshakes
164+
cost less than 10k bytes in bandwidth, 30kB is sufficient to make attacks
165+
unattractive.
166+
167+
On receiving a `DialRequest`, the server selects the first address it is capable
168+
of dialing. If this selected address has an IP different from the client's
169+
observed IP, the server sends a `DialDataRequest` message with the selected
170+
address's index (zero-based) and `numBytes` set to a sufficiently large value on
171+
the `/libp2p/autonat/2/dial-request` stream.
172+
173+
Upon receiving a `DialDataRequest` message, the client decides whether to accept
174+
or reject the cost of dial. If the client rejects the cost, the client resets
175+
the stream and the `DialRequest` is considered aborted. If the client accepts
176+
the cost, the client starts transferring `numBytes` bytes to the server. The
177+
client transfers these bytes wrapped in `DialDataResponse` protobufs where the
178+
`data` field in each individual protobuf is limited to 4096 bytes in length.
179+
This allows implementations to use a small buffer for reading and sending the
180+
data. Only the size of the `data` field of `DialDataResponse` protobufs is
181+
counted towards the bytes transferred. Once the server has received at least
182+
`numBytes` bytes, it proceeds to dial the selected address. Servers SHOULD allow
183+
the last `DialDataResponse` message received from the client to be larger than
184+
the minimum required amount. This allows clients to serialize their
185+
`DialDataResponse` message once and reuse it for all requests.
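
A minimal sketch of the byte accounting described above, assuming the `data` payloads have already been extracted from the `DialDataResponse` protobufs. The names `dial_data_chunks` and `received_enough` are hypothetical, and treating an oversized `data` field as a failure is an assumption of this sketch rather than a requirement stated here.

```python
from typing import Iterable, Iterator

MAX_DATA_LEN = 4096  # per-message limit on the `data` field


def dial_data_chunks(num_bytes: int) -> Iterator[bytes]:
    """Client side: yield `data` payloads until at least num_bytes are sent.

    The same full-size chunk is reused for every message, so the final
    message may overshoot num_bytes.
    """
    chunk = bytes(MAX_DATA_LEN)
    sent = 0
    while sent < num_bytes:
        yield chunk
        sent += len(chunk)


def received_enough(payloads: Iterable[bytes], num_bytes: int) -> bool:
    """Server side: count only `data` bytes and tolerate a final overshoot."""
    total = 0
    for data in payloads:
        if len(data) > MAX_DATA_LEN:
            return False  # assumption: oversized payloads abort the request
        total += len(data)
        if total >= num_bytes:
            return True
    return False
```

With `numBytes` set to 30000, the client in this sketch sends eight 4096-byte payloads (32768 bytes in total), and the server proceeds to dial once the eighth one arrives.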
186+
187+
If an attacker asks a server to dial a victim node, the only benefit the
188+
attacker gets is forcing the server and the victim to do a cryptographic
189+
handshake which costs some bandwidth and compute. The attacker by itself can do
190+
a lot of handshakes with the victim without spending any compute by using the
191+
same key repeatedly. The only benefit of going via the server to do this attack
192+
is not spending bandwidth required for a handshake. So the prevention mechanism
193+
only focuses on bandwidth costs. There is a minor benefit of bypassing IP
194+
blocklists, but that's made unattractive by the fact that servers may ask 5x
195+
more data than the bandwidth cost of a handshake.
196+
197+
#### Related Work
198+
199+
UDP-based protocols, like QUIC and DNS-over-UDP, need to prevent similar amplification attacks caused by IP spoofing. To verify that received packets don't have a spoofed IP, the server sends a random token to the client, which echoes the token back. For example, in QUIC, an attacker can use the victim's IP in the initial packet to make the victim process a much larger `ServerHello` packet. QUIC servers use a Retry Packet containing a token to validate that the client can receive packets at the address it claims. See [QUIC Address Validation](https://datatracker.ietf.org/doc/html/rfc9000#name-address-validation) for details of the scheme.
200+
201+
## Implementation Suggestions
202+
203+
For any given address, client implementations SHOULD do the following:
204+
- Periodically recheck reachability status.
205+
- Query multiple servers to determine reachability.
206+
207+
The suggested heuristic for implementations is to consider an address reachable
208+
if more than 3 servers report a successful dial and to consider an address
209+
unreachable if more than 3 servers report unsuccessful dials. Implementations
210+
are free to use different heuristics than this one.
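
The suggested heuristic could look roughly like this. The function name `reachability` and the handling of the case where both counts exceed the threshold (reported here as undetermined) are assumptions of this sketch; the text above does not define that case.

```python
from typing import Iterable, Optional


def reachability(results: Iterable[bool], threshold: int = 3) -> Optional[bool]:
    """Classify one address from per-server dial results.

    results holds one boolean per server: True for a successful dial back.
    Returns True (reachable), False (unreachable) or None (undetermined).
    """
    successes = 0
    failures = 0
    for ok in results:
        if ok:
            successes += 1
        else:
            failures += 1
    reachable = successes > threshold
    unreachable = failures > threshold
    if reachable == unreachable:
        # Either too few reports, or conflicting evidence (assumption).
        return None
    return reachable
```

Because the comparison is strict, three successful reports are not yet enough in this sketch; the fourth one flips the address to reachable.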
211+
212+
Servers SHOULD NOT reuse their listening port when making a dial back. In case
213+
the client has reused their listen port when dialing out to the server, not
214+
reusing the listen port for attempts prevents accidental hole punches. Clients
215+
SHOULD only rely on the nonce and not on the peerID for verifying the dial back
216+
as the server is free to use a separate peerID for the dial backs.
217+
218+
Servers SHOULD determine whether they have IPv6 and IPv4 connectivity. IPv4 only servers SHOULD refuse requests for dialing IPv6 addresses and IPv6 only
219+
servers SHOULD refuse requests for dialing IPv4 addresses.
220+
221+
222+
## RPC Messages
223+
224+
All RPC messages sent over a stream are prefixed with the message length in
225+
bytes, encoded as an unsigned variable length integer as defined by the
226+
[multiformats unsigned-varint spec][uvarint-spec].
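
As an illustration of the framing, the sketch below length-prefixes an already serialized message with an unsigned varint. The helper names are hypothetical, and the 9-byte cap on the prefix follows my reading of the unsigned-varint spec; treat it as an assumption.

```python
from typing import Tuple


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, least significant first."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes) -> Tuple[int, int]:
    """Return (value, bytes_consumed) for the varint at the start of buf."""
    value = 0
    for i, byte in enumerate(buf):
        if i >= 9:
            raise ValueError("uvarint longer than 9 bytes")
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("truncated uvarint")


def frame(payload: bytes) -> bytes:
    """Prefix a serialized message with its length."""
    return encode_uvarint(len(payload)) + payload


def unframe(buf: bytes) -> Tuple[bytes, bytes]:
    """Split buf into (first message payload, remaining bytes)."""
    length, n = decode_uvarint(buf)
    end = n + length
    if end > len(buf):
        raise ValueError("incomplete message")
    return buf[n:end], buf[end:]
```

For instance, a 300-byte message gets the two-byte prefix `0xAC 0x02`, and `unframe(frame(b"abc") + b"rest")` yields `(b"abc", b"rest")`.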
227+
228+
All RPC messages on stream `/libp2p/autonat/2/dial-request` are of type
229+
`Message`. A `DialRequest` message is sent as a `Message` with the `msg` field
230+
set to `DialRequest`. `DialResponse`, `DialDataRequest` and `DialDataResponse` are handled
231+
similarly.
232+
233+
On stream `/libp2p/autonat/2/dial-back`, a `DialBack` message is sent
234+
directly.
235+
236+
```proto3

message Message {
    oneof msg {
        DialRequest dialRequest = 1;
        DialResponse dialResponse = 2;
        DialDataRequest dialDataRequest = 3;
        DialDataResponse dialDataResponse = 4;
    }
}


message DialRequest {
    repeated bytes addrs = 1;
    fixed64 nonce = 2;
}


message DialDataRequest {
    uint32 addrIdx = 1;
    uint64 numBytes = 2;
}


enum DialStatus {
    UNUSED = 0;
    E_DIAL_ERROR = 100;
    E_DIAL_BACK_ERROR = 101;
    OK = 200;
}


message DialResponse {
    enum ResponseStatus {
        E_INTERNAL_ERROR = 0;
        E_REQUEST_REJECTED = 100;
        E_DIAL_REFUSED = 101;
        OK = 200;
    }

    ResponseStatus status = 1;
    uint32 addrIdx = 2;
    DialStatus dialStatus = 3;
}


message DialDataResponse {
    bytes data = 1;
}


message DialBack {
    fixed64 nonce = 1;
}

message DialBackResponse {
    enum DialBackStatus {
        OK = 0;
    }

    DialBackStatus status = 1;
}
```
299+
300+
[uvarint-spec]: https://github.com/multiformats/unsigned-varint
301+
