

@sanity sanity commented Nov 27, 2025

Summary

This PR improves PeerId semantics by basing equality/hashing on public key rather than socket address.

Status: Draft - This is a defensive correctness fix but does NOT resolve the PUT timeout issue under Docker NAT that motivated the investigation.

Changes

  • Change PeerId Hash/PartialEq to use pub_key instead of addr
  • Remove the PEER_ID thread-local cache (each random() call now generates unique keypairs)
  • Update the Ord implementation with a comment clarifying that it is only for ordering, not equality
  • Fix a runtime panic in freenet-test-network Docker cleanup (use try_current() instead of creating a nested runtime)
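A minimal sketch of key-based identity, assuming a simplified PeerId with only `addr` and `pub_key` fields and a stand-in `PublicKey` byte wrapper (the real freenet types differ). Hash and PartialEq both look only at the key, which keeps them consistent (equal values must hash equally). The sketch also orders by key, because std's `Ord` contract expects `cmp` to return `Equal` exactly when `==` holds, and an address-based `Ord` would break that for one peer seen at two addresses.

```rust
use std::cmp::Ordering;
use std::hash::{Hash, Hasher};
use std::net::SocketAddr;

/// Stand-in for the real public key type (assumption: raw key bytes).
#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct PublicKey(pub Vec<u8>);

/// Simplified PeerId: `addr` says where, `pub_key` says who.
#[derive(Clone, Debug)]
pub struct PeerId {
    pub addr: SocketAddr,
    pub pub_key: PublicKey,
}

// Two PeerIds are equal iff their keys match, whatever address each carries.
impl PartialEq for PeerId {
    fn eq(&self, other: &Self) -> bool {
        self.pub_key == other.pub_key
    }
}

impl Eq for PeerId {}

// Hash must agree with PartialEq, so it hashes only the key as well.
impl Hash for PeerId {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.pub_key.hash(state);
    }
}

impl PartialOrd for PeerId {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Ordered by key so that `a.cmp(&b) == Equal` exactly when `a == b`.
impl Ord for PeerId {
    fn cmp(&self, other: &Self) -> Ordering {
        self.pub_key.cmp(&other.pub_key)
    }
}
```

With this, a `HashMap<PeerId, _>` entry inserted under a peer's local address is still found when the same peer shows up under its NAT-mapped address.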

Rationale

Even though this doesn't fix the PUT timeout:

  1. Semantic correctness: The public key IS the cryptographic identity - addresses are mutable (especially under NAT)
  2. Defensive fix: Prevents potential collisions if two peers ever share an address
  3. Future-proofing: As NAT handling evolves, address-based equality could cause subtle bugs

Testing

  • Unit tests pass
  • Docker NAT connectivity test passes (100% connectivity achieved)
  • River message flow test still fails - PUT timeouts persist, indicating a different root cause

Next Steps

The real PUT timeout issue is still being investigated. The failure occurs even with this fix applied, suggesting the root cause is elsewhere in the PUT operation flow.

🤖 Generated with Claude Code

sanity and others added 2 commits November 26, 2025 20:55
…n completion

When a PUT operation completes with subscribe=true, a child subscription
operation is spawned. Previously, the response waited for this subscription
to complete before returning to the client, causing PUT timeouts.

Sub-operations like subscriptions are "fire and forget" from the client's
perspective - they want to know their PUT succeeded, not wait for the
subscription to complete. This change returns the finalized state immediately
regardless of pending sub-operations.
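A sketch of the "fire and forget" behaviour described above, using std threads as a stand-in for freenet's async executor. `PutResult` and `complete_put` are hypothetical names; the returned receiver only exists so a caller can observe the subscription if it wants to, and the PUT result never waits on it.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;
use std::time::Duration;

/// Hypothetical result of a finished PUT (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub struct PutResult {
    pub key: String,
    pub state: Vec<u8>,
}

/// When `subscribe` is true, spawn the child subscription in the background
/// and return the finalized PUT state immediately, without joining it.
pub fn complete_put(result: PutResult, subscribe: bool) -> (PutResult, Option<Receiver<String>>) {
    let sub_done = if subscribe {
        let (tx, rx) = mpsc::channel();
        let key = result.key.clone();
        thread::spawn(move || {
            // Stand-in for the subscription round trip.
            thread::sleep(Duration::from_millis(200));
            let _ = tx.send(format!("subscribed to {key}"));
        });
        Some(rx)
    } else {
        None
    };
    // The client gets its answer now, regardless of the pending sub-operation.
    (result, sub_done)
}
```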

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

PeerId equality was based on socket address, causing connection lookup
failures when peers behind NAT report different addresses than what the
gateway observes. Changed Hash/PartialEq to use pub_key, which is the
stable identifier for a peer regardless of NAT.

Also updated PeerId::random() and Arbitrary impl to generate unique
keypairs, since distinct peers must have distinct pub_keys for proper
equality semantics.
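To illustrate why a per-thread key cache conflicts with key-based equality, here is a sketch with hypothetical helpers (`fresh_key`, `cached_key`, `random_peer`, `cached_peer`). A process-wide counter stands in for real keypair generation, since uniqueness is the property that matters here; with a cached key, every "random" peer created on a thread shares one identity even though the addresses differ.

```rust
use std::cell::Cell;
use std::net::{Ipv4Addr, SocketAddr};
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in key type; a real implementation would generate a keypair.
pub type PubKey = [u8; 8];

#[derive(Clone, Debug)]
pub struct PeerId {
    pub addr: SocketAddr,
    pub pub_key: PubKey,
}

static NEXT_KEY: AtomicU64 = AtomicU64::new(1);

/// Cheap stand-in for keypair generation: every call yields a distinct key.
fn fresh_key() -> PubKey {
    NEXT_KEY.fetch_add(1, Ordering::Relaxed).to_be_bytes()
}

thread_local! {
    // Cache-style generator: one key per thread, reused by every call.
    static CACHED_KEY: Cell<Option<PubKey>> = Cell::new(None);
}

fn cached_key() -> PubKey {
    CACHED_KEY.with(|c| {
        let k = c.get().unwrap_or_else(fresh_key);
        c.set(Some(k));
        k
    })
}

fn local_addr(port: u16) -> SocketAddr {
    SocketAddr::from((Ipv4Addr::LOCALHOST, port))
}

/// Distinct peers get distinct keys, as key-based equality requires.
pub fn random_peer(port: u16) -> PeerId {
    PeerId { addr: local_addr(port), pub_key: fresh_key() }
}

/// What a cached generator produces: different addresses, same key.
pub fn cached_peer(port: u16) -> PeerId {
    PeerId { addr: local_addr(port), pub_key: cached_key() }
}
```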

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@sanity sanity marked this pull request as draft November 27, 2025 04:20
sanity and others added 2 commits November 26, 2025 23:05
When a NAT peer receives its externally-observed address from a gateway
during the connect handshake, update the peer's own PeerId.addr field.

Without this fix, NAT peers embed their local address (127.0.0.1) in
operation messages. When remote peers receive these messages and try to
send responses back, they attempt to connect to 127.0.0.1 which fails.

This complements the recent PeerId equality fix (pub_key based) - that
fix allows lookups to succeed regardless of address, but we still need
the correct address for establishing new connections.
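A sketch of the address update, assuming hypothetical `Node`, `ConnectResponse` and `on_connect_response` names (the real handshake types differ). The key, and therefore the identity, stays the same; only the address that ends up in outgoing operation messages changes.

```rust
use std::net::SocketAddr;

#[derive(Clone, Debug)]
pub struct PeerId {
    pub addr: SocketAddr,
    pub pub_key: Vec<u8>,
}

/// Hypothetical handshake reply carrying the address the gateway observed.
pub struct ConnectResponse {
    pub observed_addr: SocketAddr,
}

pub struct Node {
    pub own_id: PeerId,
}

impl Node {
    /// Adopt the externally observed address so that PeerIds embedded in
    /// later messages point somewhere reachable instead of 127.0.0.1.
    pub fn on_connect_response(&mut self, resp: &ConnectResponse) {
        self.own_id.addr = resp.observed_addr;
    }

    /// The PeerId that would be embedded in outgoing operation messages.
    pub fn id_for_messages(&self) -> PeerId {
        self.own_id.clone()
    }
}
```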

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

When a peer receives a PutForward and is the final destination (e.g., has
no ring location yet), try_to_broadcast was hitting the catch-all error
case because no prior operation state existed. This fix adds a specific
match arm to handle this scenario by sending SuccessfulPut back to the
upstream peer.
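A heavily simplified sketch of the new match arm. `PutMsg`, `PutState`, the `u64` upstream identifier and this `try_to_broadcast` signature are all hypothetical stand-ins; arms for operations already in progress are left out, so anything else falls through to the catch-all error that this scenario used to hit.

```rust
/// Hypothetical, simplified PUT messages.
#[derive(Debug, PartialEq)]
pub enum PutMsg {
    PutForward { key: String, upstream: u64 },
    SuccessfulPut { key: String, target: u64 },
}

/// Hypothetical prior operation state.
#[derive(Debug)]
pub struct PutState {
    pub upstream: u64,
}

pub fn try_to_broadcast(
    prior: Option<PutState>,
    msg: PutMsg,
    is_final_destination: bool,
) -> Result<PutMsg, String> {
    match (prior, msg) {
        // New arm: no prior state and this peer is the final destination,
        // so acknowledge the upstream peer directly.
        (None, PutMsg::PutForward { key, upstream }) if is_final_destination => {
            Ok(PutMsg::SuccessfulPut { key, target: upstream })
        }
        // Catch-all (other arms omitted from this sketch).
        (prior, msg) => Err(format!("unexpected PUT state {prior:?} for {msg:?}")),
    }
}
```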

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
}
}

thread_local! {

This exists because, in many tests where we don't care about the PeerId (e.g. anything other than integration or e2e tests), generating a large number of peers is computationally expensive. For the cases where we do care, we use PeerId::random explicitly.

Arbitrary is mostly used in unit tests, so be careful with the impact of this change.


Good point. I investigated this and found that the only problematic usage was in the aof.rs tests, which generate 10k-100k log entries, each previously getting a unique PeerId. Those tests validate log serialization/deserialization; they don't actually need unique peer identities.

I've pushed a fix that reuses a single PeerId per test in the three affected functions (test_aof_read_write_complex, test_aof_complex_reconstruction, test_aof_sequential_ids_reconstruction). All other usages in the codebase create only small numbers of peers (5-45 per test), so the overhead is negligible.

With this fix, the cache is no longer needed.
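The test-side change can be sketched like this, with hypothetical `LogEntry` and `make_entries` names standing in for the aof.rs helpers. The peer identity is created once and cloned into every entry, so key generation happens once per test rather than once per entry, while the entries still differ in the fields a serialization round trip exercises.

```rust
use std::net::SocketAddr;

#[derive(Clone, Debug, PartialEq)]
pub struct PeerId {
    pub addr: SocketAddr,
    pub pub_key: Vec<u8>,
}

/// Hypothetical log entry with only the fields this example needs.
#[derive(Clone, Debug, PartialEq)]
pub struct LogEntry {
    pub id: u64,
    pub peer: PeerId,
    pub payload: Vec<u8>,
}

/// Build `n` entries that all share one PeerId; only `id` and `payload` vary.
pub fn make_entries(peer: &PeerId, n: u64) -> Vec<LogEntry> {
    (0..n)
        .map(|id| LogEntry { id, peer: peer.clone(), payload: id.to_le_bytes().to_vec() })
        .collect()
}
```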

[AI-assisted - Claude]

These tests validate log I/O, not peer identity. Generating 10k-100k
unique keypairs was unnecessary and would slow tests significantly
after the PeerId::random() change to generate unique keys.

[AI-assisted - Claude]

sanity commented Nov 29, 2025

Closing this PR as superseded by the PR stack #2167, #2169, #2171, #2172.

That stack takes a more comprehensive approach to NAT routing:

  • Introduces a PeerAddr::Known/Unknown enum (peers behind NAT have Unknown addresses)
  • Adds an ObservedAddr newtype for externally observed addresses
  • Uses upstream_addr for connection-based routing

This keeps identity (pub_key) and routing (observed address) as separate concerns rather than changing PeerId equality semantics globally.
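A rough sketch of that separation. Only the names `PeerAddr::Known/Unknown`, `ObservedAddr` and `upstream_addr` come from this comment; `PeerRef` and `reply_target` are hypothetical, and the definitions in the #2167 stack may look quite different.

```rust
use std::net::SocketAddr;

/// Where a peer can be reached, if we know. Peers behind NAT start as Unknown.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PeerAddr {
    Known(SocketAddr),
    Unknown,
}

/// Address observed on the wire (e.g. the source of an incoming packet),
/// wrapped so it cannot be mixed up with a self-reported address.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObservedAddr(pub SocketAddr);

/// Hypothetical peer reference: `pub_key` is identity, `addr` is routing.
#[derive(Clone, Debug)]
pub struct PeerRef {
    pub pub_key: Vec<u8>,
    pub addr: PeerAddr,
}

/// Prefer a known address; otherwise reply over the connection the request
/// arrived on (`upstream_addr`). Equality of peers is never consulted here.
pub fn reply_target(peer: &PeerRef, upstream_addr: Option<ObservedAddr>) -> Option<SocketAddr> {
    match peer.addr {
        PeerAddr::Known(a) => Some(a),
        PeerAddr::Unknown => upstream_addr.map(|ObservedAddr(a)| a),
    }
}
```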

[AI-assisted - Claude]

@sanity sanity closed this Nov 29, 2025