fix: use pub_key for PeerId equality to fix NAT connection lookup #2163

sanity · 2025-11-27T04:11:26Z

Summary

This PR improves PeerId semantics by basing equality/hashing on public key rather than socket address.

Status: Draft - This is a defensive correctness fix but does NOT resolve the PUT timeout issue under Docker NAT that motivated the investigation.

Changes

Change PeerId Hash/PartialEq to use pub_key instead of addr
Remove the PEER_ID thread-local cache (each random() call now generates unique keypairs)
Update Ord implementation with a comment clarifying it's only for ordering, not equality
Fix runtime panic in freenet-test-network Docker cleanup (use try_current() instead of creating nested runtime)

Rationale

Even though this doesn't fix the PUT timeout:

Semantic correctness: The public key IS the cryptographic identity - addresses are mutable (especially under NAT)
Defensive fix: Prevents potential collisions if two peers ever share an address
Future-proofing: As NAT handling evolves, address-based equality could cause subtle bugs

Testing

Unit tests pass
Docker NAT connectivity test passes (100% connectivity achieved)
River message flow test still fails - PUT timeouts persist, indicating a different root cause

Next Steps

The real PUT timeout issue is still being investigated. The failure occurs even with this fix applied, suggesting the root cause is elsewhere in the PUT operation flow.

🤖 Generated with Claude Code

…n completion When a PUT operation completes with subscribe=true, a child subscription operation is spawned. Previously, the response waited for this subscription to complete before returning to the client, causing PUT timeouts. Sub-operations like subscriptions are "fire and forget" from the client's perspective - they want to know their PUT succeeded, not wait for the subscription to complete. This change returns the finalized state immediately regardless of pending sub-operations. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

PeerId equality was based on socket address, causing connection lookup failures when peers behind NAT report different addresses than what the gateway observes. Changed Hash/PartialEq to use pub_key, which is the stable identifier for a peer regardless of NAT. Also updated PeerId::random() and Arbitrary impl to generate unique keypairs, since distinct peers must have distinct pub_keys for proper equality semantics. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

When a NAT peer receives its externally-observed address from a gateway during the connect handshake, update the peer's own PeerId.addr field. Without this fix, NAT peers embed their local address (127.0.0.1) in operation messages. When remote peers receive these messages and try to send responses back, they attempt to connect to 127.0.0.1 which fails. This complements the recent PeerId equality fix (pub_key based) - that fix allows lookups to succeed regardless of address, but we still need the correct address for establishing new connections. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

When a peer receives a PutForward and is the final destination (e.g., has no ring location yet), try_to_broadcast was hitting the catch-all error case because no prior operation state existed. This fix adds a specific match arm to handle this scenario by sending SuccessfulPut back to the upstream peer. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

iduartgomez · 2025-11-27T08:37:18Z

crates/core/src/node/mod.rs

    }
 }

-thread_local! {


the reason this exists is because for many tests where we dont care about the peerid (e.g. no integration or e2e tests) generating a large amount of peers is dramatic (comptutationally expensive). For the cases we care about we use the explicit PeerId::random

Arbitrary is used for the most part for unit tests. Be careful with the impact of this change.

Good point. I investigated this and found that the only problematic usage was in aof.rs tests, which generate 10k-100k log entries, each previously getting a unique PeerId. Those tests validate log serialization/deserialization—they don't actually need unique peer identities.

I've pushed a fix that reuses a single PeerId per test in the three affected functions (test_aof_read_write_complex, test_aof_complex_reconstruction, test_aof_sequential_ids_reconstruction). All other usages in the codebase are small counts (5-45 peers per test), where the overhead is negligible.

With this fix, the cache is no longer needed.

[AI-assisted - Claude]

These tests validate log I/O, not peer identity. Generating 10k-100k unique keypairs was unnecessary and would slow tests significantly after the PeerId::random() change to generate unique keys. [AI-assisted - Claude]

sanity · 2025-11-29T17:25:02Z

Closing this PR as superseded by the PR stack #2167 → #2169 → #2171 → #2172.

That stack takes a more comprehensive approach to NAT routing:

Introduces PeerAddr::Known/Unknown enum (peers behind NAT have Unknown addresses)
Adds ObservedAddr newtype for externally-observed addresses
Uses upstream_addr for connection-based routing

This keeps identity (pub_key) and routing (observed address) as separate concerns rather than changing PeerId equality semantics globally.

[AI-assisted - Claude]

sanity and others added 2 commits November 26, 2025 20:55

sanity marked this pull request as draft November 27, 2025 04:20

sanity and others added 2 commits November 26, 2025 23:05

iduartgomez reviewed Nov 27, 2025

View reviewed changes

perf: reuse single PeerId in aof tests

1a920e2

These tests validate log I/O, not peer identity. Generating 10k-100k unique keypairs was unnecessary and would slow tests significantly after the PeerId::random() change to generate unique keys. [AI-assisted - Claude]

sanity closed this Nov 29, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

fix: use pub_key for PeerId equality to fix NAT connection lookup #2163

fix: use pub_key for PeerId equality to fix NAT connection lookup #2163

Uh oh!

sanity commented Nov 27, 2025 •

edited

Loading

Uh oh!

iduartgomez Nov 27, 2025

Uh oh!

sanity Nov 28, 2025

Uh oh!

sanity commented Nov 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

fix: use pub_key for PeerId equality to fix NAT connection lookup #2163

fix: use pub_key for PeerId equality to fix NAT connection lookup #2163

Uh oh!

Conversation

sanity commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Rationale

Testing

Next Steps

Uh oh!

iduartgomez Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

sanity Nov 28, 2025

Choose a reason for hiding this comment

Uh oh!

sanity commented Nov 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

sanity commented Nov 27, 2025 •

edited

Loading