Skip to content

refactor: Restructure peer identity and address handling #2164

@sanity

Description

@sanity

Peer Identity and Address Restructuring

Problem Statement

Current architecture conflates peer identity (pub_key) with network address (SocketAddr) in PeerId. This causes:

  1. Peers embed their own (often incorrect) address in messages
  2. NAT peers report loopback/unspecified addresses before learning their external address
  3. Connection tracking uses PeerId as key, but transport layer identifies by SocketAddr
  4. Workarounds like observed_addr and address correction logic in message handling

Motivation: This refactor was motivated by failures in the Docker NAT test (FREENET_TEST_DOCKER_NAT=1 cargo test --test message_flow river_message_flow_over_freenet) where subscribe operations failed because NAT peers embedded incorrect loopback addresses in messages.

Design Principles

  1. Identity vs Address separation: pub_key is the true identity; SocketAddr is "how to reach them now"
  2. Connections keyed by address: Transport identifies packets by source address
  3. Location derived from address: Ring location is a function of address, computed on demand
  4. Request-response routing: Responses route back over the same connection the request came from - no embedded addresses needed
  5. Connect is special: Only the Connect protocol needs embedded addresses for NAT hole-punching (establishing new direct connections)

Key Insight: Two Categories of Operations

Category 1: Request-Response (Get, Put, Subscribe, Update)

  • Request travels hop-by-hop toward target
  • Response travels back the same path via connection-based routing (each hop maintains operation state tracking which connection the request came from)
  • No embedded originator address needed - each hop just responds to whoever sent the request
  • The "sender" at each hop is just the packet source

Category 2: Connection Establishment (Connect only)

  • Joiner's address must travel to acceptor for NAT hole-punching
  • Acceptor's address must travel back to joiner
  • Embedded addresses required because the goal is to establish a new direct connection, not respond over existing path
  • Use PeerAddr::Unknown when originator creates message, first recipient fills from packet source

Proposed Type Hierarchy

/// Explicit representation of peer address state
pub enum PeerAddr {
    /// Address unknown - will be filled in by first recipient from packet source
    Unknown,
    /// Known address  
    Known(SocketAddr),
}

/// Peer info for routing - identity + address
pub struct PeerKeyLocation {
    pub pub_key: TransportPublicKey,  // The true peer identity
    pub addr: PeerAddr,                // Unknown = "fill in from packet source"
}

impl PeerKeyLocation {
    /// Location is derived from address, computed on demand
    pub fn location(&self) -> Option<Location> {
        match &self.addr {
            PeerAddr::Known(addr) => Some(Location::from_address(addr)),
            PeerAddr::Unknown => None,
        }
    }
}

Key changes:

  • PeerId removed - was conflating identity with address
  • Use TransportPublicKey directly (no wrapper type needed)
  • addr uses PeerAddr enum for explicit semantics (Unknown vs Known)
  • location removed as field - now a getter that computes from addr

Note on Ord: Current PeerId implements Ord by address as a workaround because TransportPublicKey doesn't implement Ord. The BTreeMap<PeerId, Location> usages can be changed to HashMap<TransportPublicKey, ...> since the ordering isn't semantically important.

Connection Management Changes

p2p_protoc.rs

// Before
connections: HashMap<PeerId, PeerConnChannelSender>

// After
connections: HashMap<SocketAddr, ConnectionEntry>

struct ConnectionEntry {
    sender: PeerConnChannelSender,
    pub_key: Option<TransportPublicKey>,  // Learned from first message
}

⚠️ Race Condition Caution: The current code has careful synchronization around connection access (see TOCTOU handling at lines 380-401). The refactoring must preserve these guarantees when changing the HashMap key.

connection_manager.rs

// Address-based lookups (for skip lists - track addresses we've tried)
skip_lists: HashSet<SocketAddr>,

// Peer info by address (changed from BTreeMap to HashMap - ordering not needed)
peers_by_addr: HashMap<SocketAddr, PeerKeyLocation>,

// Reverse lookup: identity to current address
addr_by_pub_key: HashMap<TransportPublicKey, SocketAddr>,

Message Changes by Operation

Connect (needs embedded addresses for NAT hole-punching)

ConnectRequest {
    joiner: PeerKeyLocation,  // joiner.addr = Unknown when joiner creates, filled by gateway
    target: PeerKeyLocation,  // target.addr = Known (we know where we're sending)
}

ConnectResponse {
    acceptor: PeerKeyLocation,  // acceptor.addr = Unknown when acceptor creates, filled by first recipient
    joiner: PeerKeyLocation,    // joiner.addr = Known (known from request)
}

Protocol for Connect:

  1. Joiner creates ConnectRequest with joiner.addr = PeerAddr::Unknown
  2. Gateway receives packet, fills joiner.addr = PeerAddr::Known(packet_source) before forwarding
  3. Request reaches acceptor with correct joiner address
  4. Acceptor creates ConnectResponse with acceptor.addr = PeerAddr::Unknown
  5. First recipient fills acceptor.addr = PeerAddr::Known(packet_source) before forwarding
  6. Response reaches joiner with correct acceptor address
  7. Both can now attempt NAT hole-punching with correct addresses

Get, Put, Subscribe, Update (no embedded originator addresses needed)

These operations use connection-based routing:

  • Each peer maintains operation state (keyed by transaction ID) that tracks which connection the request came from
  • When a response arrives, the peer looks up the transaction and forwards the response back over the original connection
  • No embedded address fields needed - routing is implicit via connection state
// Example: Subscribe
RequestSub {
    key: ContractKey,
    target: PeerKeyLocation,  // Where we're routing toward
    // No subscriber field needed - subscriber is tracked via connection state
}

ReturnSub {
    key: ContractKey,
    subscribed: bool,
    // No target field needed - response goes back via connection state
}

Implementation Phases

Phase 1: New Types

  1. Create PeerAddr enum with Unknown and Known(SocketAddr) variants
  2. Update PeerKeyLocation: replace peer: PeerId with pub_key: TransportPublicKey, use PeerAddr for addr, remove location field and add location() getter
  3. Change BTreeMap<PeerId, ...> to HashMap<TransportPublicKey, ...> where applicable
  4. Add type alias type PeerId = TransportPublicKey temporarily for gradual migration

Phase 2: Connection Management

  1. Change connections HashMap key from PeerId to SocketAddr
  2. Add pub_key: Option<TransportPublicKey> to connection entry
  3. Update skip lists to use HashSet<SocketAddr>
  4. Remove placeholder pub_key logic for transient connections
  5. Carefully preserve synchronization guarantees around connection access

Phase 3: Message Handling - Connect

  1. Update Connect message types to use PeerAddr for joiner/acceptor
  2. Add address fill-in logic for Connect messages only
  3. Remove observed_addr workaround (no longer needed)

Phase 4: Message Handling - Other Operations

  1. Remove unnecessary sender/subscriber/requester fields from Get, Put, Subscribe, Update
  2. Ensure response routing uses connection state, not embedded addresses
  3. Remove existing address correction/fixup logic

Phase 5: Cleanup

  1. Remove PeerId type alias
  2. Update all remaining call sites (~600 usages across 30+ files)

Critical Files

  • crates/core/src/node/mod.rs - PeerId definition (to be removed)
  • crates/core/src/ring/peer_key_location.rs - PeerKeyLocation changes
  • crates/core/src/ring/connection_manager.rs - Connection tracking restructure
  • crates/core/src/node/network_bridge/p2p_protoc.rs - Connection HashMap, address fill-in logic
  • crates/core/src/operations/connect.rs - Connect message changes
  • crates/core/src/operations/subscribe.rs - Remove subscriber field
  • crates/core/src/operations/get.rs - Remove sender field
  • crates/core/src/operations/put.rs - Remove sender/origin fields
  • crates/core/src/operations/update.rs - Remove sender field

Testing

Primary Validation

The Docker NAT test that motivated this refactor must pass:

cd ~/code/freenet/river/main/cli
FREENET_TEST_DOCKER_NAT=1 FREENET_CORE_PATH=/path/to/freenet-core cargo test --test message_flow river_message_flow_over_freenet -- --nocapture

This test:

  • Spins up gateway and peers in Docker containers with NAT simulation
  • Tests that peers behind NAT can subscribe and exchange messages via riverctl
  • Validates the complete flow: connect → subscribe → put → get

Additional Tests

  1. Existing unit tests should continue to pass
  2. freenet-test-network multi-peer tests validate routing still works
  3. Local mode tests (without NAT) ensure no regressions

Post-Implementation Verification Checklist

After implementation, verify the following invariants hold:

Address Resolution

  • No Unknown addresses after first hop: Any message with PeerAddr::Unknown must have it resolved to PeerAddr::Known before forwarding. Add debug assertions that panic if a message is about to be forwarded with Unknown still present.
  • Address fill-in happens exactly once: The first recipient fills in the address; subsequent hops should never see Unknown.
  • No loopback/unspecified addresses in forwarded messages: Ensure 127.0.0.1 or 0.0.0.0 never appear in messages after the first hop.

Connection Management

  • Connection HashMap consistency: connections (keyed by SocketAddr) and any reverse lookups (by pub_key) stay in sync during insert/remove operations.
  • No placeholder pub_keys: The old pattern of using gateway's own pub_key as placeholder for unknown peers should be completely removed.
  • Transient connections properly tracked: Transient connections identified by address, with pub_key learned from first message.

Operation State

  • Transaction state tracks return path: Each operation correctly stores which connection the request arrived on, enabling response routing.
  • Connection drops handled gracefully: If a connection drops mid-operation, the operation fails cleanly rather than routing to wrong peer.

Message Format

  • No orphaned address fields: Sender/subscriber/requester fields removed from non-Connect operations actually used connection-based routing.
  • Connect messages have correct addresses: After full message traversal, joiner and acceptor addresses are correct for NAT hole-punching.

End-to-End Validation

  • Docker NAT test passes: FREENET_TEST_DOCKER_NAT=1 cargo test --test message_flow river_message_flow_over_freenet completes successfully with both peers able to subscribe and exchange messages.
  • Multi-peer network test passes: freenet-test-network tests with multiple peers work correctly.

Logging/Debugging

  • Trace logs show address resolution: Add tracing for when UnknownKnown transition happens.
  • Connection identity updates logged: When a connection's pub_key is learned, log it for debugging.

[AI-assisted - Claude]

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-networkingArea: Networking, ring protocol, peer discoveryE-hardExperience needed to fix/implement: Hard / a lotS-needs-designStatus: Needs architectural design or RFCT-enhancementType: Improvement to existing functionality

    Type

    No type

    Projects

    Status

    Triage

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions