Skip to content

fix: use transport peer ID as P2PEvent source for correct message routing#9

Merged
dirvine merged 3 commits intomainfrom
fix/p2p-event-source-peer-id
Jan 30, 2026
Merged

fix: use transport peer ID as P2PEvent source for correct message routing#9
dirvine merged 3 commits intomainfrom
fix/p2p-event-source-peer-id

Conversation

@mickvandijke
Copy link
Collaborator

@mickvandijke mickvandijke commented Jan 30, 2026

Summary

  • P2PEvent::Message was using the logical from field from the JSON message payload as the source, but send_message() looks up peers by their transport-level peer ID
  • This mismatch meant consumers receiving a P2PEvent::Message could not reply using the source field
  • Now converts the transport PeerId to the hex string used in the peers map and uses that as the event source

Test plan

  • Verify cargo test passes
  • Verify cargo clippy -- -D warnings -D clippy::unwrap_used -D clippy::expect_used passes
  • Confirm round-trip message routing works with the corrected peer ID

🤖 Generated with Claude Code

@claude
Copy link

claude bot commented Jan 30, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

@mickvandijke mickvandijke force-pushed the fix/p2p-event-source-peer-id branch from e328261 to 5fd659e Compare January 30, 2026 10:01
…ting

Use the transport-level peer ID (hex-encoded ant-quic PeerId) as
P2PEvent::Message.source instead of the "from" field from the JSON
protocol message. This fixes a routing bug where consumers couldn't
pass the event source back to send_message() because the ID namespace
didn't match the peers map / active_connections keys.

Refactors:
- Hoist ant_peer_id_to_string import to module level
- Promote KEEPALIVE_PAYLOAD to module-level constant
- Extract parse_protocol_message() helper (DRY)
- Remove dead handle_received_message method
- Move inline tokio::time imports to module level

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@mickvandijke mickvandijke force-pushed the fix/p2p-event-source-peer-id branch from 5fd659e to 49421d8 Compare January 30, 2026 10:07
@mickvandijke
Copy link
Collaborator Author

@claude please review this PR

@claude
Copy link

claude bot commented Jan 30, 2026

Claude finished @mickvandijke's task in 1m 36s —— View job


Code Review ✅

Summary

This PR correctly fixes a critical message routing bug. The change ensures that P2PEvent::Message.source uses the transport-level peer ID (from the authenticated QUIC connection) instead of the logical "from" field from the JSON payload. This enables consumers to reply using send_message() with the event source and eliminates a potential identity spoofing vector.

What Changed

  1. New helper function parse_protocol_message() (lines 1750-1784):

    • Extracts protocol message parsing into a reusable function
    • Takes transport peer ID as a parameter and uses it as the authoritative source
    • Still validates that "from" field exists (required by wire protocol) but doesn't use it for routing
    • Returns Option<P2PEvent> for clean error handling
    • Excellent documentation explaining the security rationale
  2. Refactored message receive loop (lines 1229-1256):

    • Converts ant-quic PeerId to hex string at reception point using ant_peer_id_to_string()
    • Passes transport peer ID to parse_protocol_message()
    • Simplified error handling with cleaner logging
  3. Code cleanup:

    • Removed unused handle_received_message() method (54 lines)
    • Moved KEEPALIVE_PAYLOAD constant to module level for reusability
    • Consolidated imports for interval and Duration
    • Simplified repeated ant_peer_id_to_string() calls throughout
  4. Comprehensive test coverage (98 lines of new tests):

    • Regression test confirming transport peer ID is used as source
    • Tests for all required field validations (from, protocol, data)
    • Edge case: invalid JSON, empty payloads
    • Clear documentation of the spoofing prevention requirement

CLAUDE.md Compliance ✅

  • No panics in production code: All .unwrap(), .expect(), and panic!() calls are confined to test functions (#[test])
  • Proper error handling: Uses Option<P2PEvent> return type with the ? operator for clean error propagation
  • Clean code: No println!() - uses tracing macros (debug!, info!, warn!, trace!)

Code Quality ✅

  • Security improvement: Prevents peer ID spoofing by using authenticated transport layer identity
  • Better separation of concerns: Protocol parsing now isolated in a dedicated function
  • Excellent documentation: The function doc comments clearly explain the security rationale
  • Strong test coverage: Tests validate both the fix and guard against regressions
  • No duplication: Eliminated repeated inline parsing logic

Issues Found

None. This is a well-implemented fix with proper testing and documentation.

Recommendations

This PR is ready to merge. The fix correctly addresses the routing mismatch while improving code quality and security.

--- |

…ew() to start()

P2PNode::new() and start() both called start_network_listeners() and
start_message_receiving_system(), spawning duplicate accept and receive
loops competing on the same QUIC endpoint. This caused messages to be
randomly split between loops, leading to flaky delivery.

Now new() only constructs the struct; start() is the single place where
background loops are spawned. All callers updated to call start() after
new().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…loops

fix: remove duplicate receive loops in P2PNode
Nice one Mick,
@dirvine dirvine merged commit e48fc9c into main Jan 30, 2026
16 of 18 checks passed
@dirvine
Copy link
Collaborator

dirvine commented Jan 30, 2026

Code Review: PR #9 - fix: use transport peer ID as P2PEvent source

Hey @mickvandijke, excellent security fix! Here's my comprehensive review including ADR compliance check.


📝 Summary

This PR fixes a security issue where P2PEvent::Message.source was using the from field from the JSON payload (application layer), allowing peer ID spoofing. Now it correctly uses the transport-level peer ID from the authenticated QUIC connection.


✅ ADR Compliance Check

ADR-001: Multi-Layer Architecture ✅ COMPLIANT

The fix aligns perfectly with the layer separation principle:

"Dependencies flow downward only: Application → Identity → Trust/Placement → DHT → Adaptive → Transport"

Before: P2PEvent source used application-layer from field (could be spoofed)
After: Uses transport-layer peer ID from ant-quic (authenticated by QUIC handshake)

This correctly respects the boundary between layers - the transport layer's authenticated peer ID is the source of truth, not application-layer claims.

ADR-002: Delegated Transport via ant-quic ✅ COMPLIANT

The fix properly uses the adapter layer:

use crate::transport::ant_quic_adapter::{DualStackNetworkNode, ant_peer_id_to_string};
// ...
let transport_peer_id = ant_peer_id_to_string(&peer_id);

This follows the ADR-002 pattern: "The adapter layer translates between saorsa-core abstractions and ant-quic primitives."


🔒 Security Improvements

This is a significant security fix. The old code trusted the from field in the JSON payload:

// OLD (vulnerable)
source: from.to_string(),  // Attacker could spoof any peer ID

The new code uses the authenticated transport peer ID:

// NEW (secure)
source: source.to_string(),  // From QUIC connection, can't be spoofed

The docstring correctly documents this:

"This eliminates a spoofing vector where a peer could claim an arbitrary identity via the JSON payload."


✅ What Looks Good

  1. Clean refactoring: New parse_protocol_message() function is a pure function with clear responsibilities
  2. Comprehensive tests: 6 new unit tests covering:
    • Transport peer ID used as source (regression test)
    • Missing from field rejected
    • Missing protocol field rejected
    • Missing data field rejected
    • Invalid JSON rejected
    • Empty payload handled
  3. Dead code removal: Removed unused handle_received_message function
  4. Constant extraction: KEEPALIVE_PAYLOAD moved to module-level constant
  5. Lifecycle separation: new() vs start() pattern (builds foundation for PR fix: remove duplicate receive loops in P2PNode #10)

⚠️ Minor Observations

1. The from field is still required in wire protocol

// "from" is a required protocol field (messages without it are dropped),
// but we use the transport peer ID as the authoritative source.
let logical_from = value.get("from")?.as_str()?;

This is correct - requiring from maintains backward compatibility with the wire protocol while ignoring its value for security purposes. The debug log shows both for debugging:

debug!("... source: {} (logical: {})", source, logical_from, ...);

2. Import organization

The interval import was moved from local scope to module level - this is a minor cleanup that's fine.


📋 Merge Checklist

  • Builds cleanly
  • Clippy passes (strict mode)
  • Tests pass
  • ADR-001 compliant (layer separation)
  • ADR-002 compliant (transport adapter usage)
  • Security improvement (prevents spoofing)
  • CI integration tests

✅ Verdict

Approved - This is a well-executed security fix that:

  1. Prevents peer ID spoofing attacks
  2. Correctly respects the multi-layer architecture (ADR-001)
  3. Properly uses the transport adapter (ADR-002)
  4. Includes comprehensive regression tests
  5. Cleans up dead code

This should be merged before PR #10, which is correctly stacked on top of this branch.

🤖 Reviewed by Claude

claude bot pushed a commit that referenced this pull request Feb 12, 2026
…ing)

Implements 11 hardening improvements from PR #21 review comments:

1. Request Leak/Cleanup:
   - Added RAII RequestCleanupGuard for automatic cleanup on drop/cancel
   - Atomic check+insert under single write lock to prevent races

2. Input Validation:
   - Added TransportError::ValidationError for input validation errors
   - validate_protocol_name() now uses ValidationError instead of StreamError

3. Timeout Handling:
   - Added MIN_REQUEST_TIMEOUT (100ms) to prevent Duration::ZERO immediate timeout
   - send_request() clamps timeout to [100ms, 5min] range

4. Response Routing:
   - Improved logging for failed pending.send() to clarify timeout scenario

5. Documentation:
   - Added documentation to ReplicationResult clarifying remote-only counts
   - Fixed brittle error assertion in tests to check error variant

6. Testing:
   - New tests/request_response_e2e_test.rs with 7 comprehensive tests:
     * Successful request/response routing
     * Timeout cleanup behavior
     * Invalid protocol rejection (empty, /, \, \0)
     * Protocol validation in send_response()
     * Minimum timeout enforcement
     * Trust reporting on failure

Items already correct (verified):
- #3: Protocol validation in send_response() already present
- #6: Response-origin mismatch uses get() before remove()
- #7: Unmatched /rr/ responses already suppressed
- #9: Trust reporting on send_message() failure already implemented
- #10: PeerStoreOutcome docs correct (no latency mention)

Closes #23

Co-authored-by: David Irvine <dirvine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants