Conversation

@sanity (Collaborator)

sanity commented on Nov 30, 2025

Problem

In small network tests, peers form a star topology (all connected only to gateway) instead of a mesh topology. The root cause is serialized connection acquisition in the maintenance loop.

Location: crates/core/src/ring/mod.rs

The code used a single live_tx: Option<Transaction> to track pending connections, meaning only one connection attempt could proceed at a time. If a connection takes 60 seconds (OPERATION_TTL), all other attempts are blocked for those 60 seconds.

With a min_connections target of 25 and serial acquisition, reaching the minimum takes far longer than test windows allow: in the worst case, 25 attempts at 60 seconds each is roughly 25 minutes.

Solution

Replace the single live_tx tracking with a concurrent limit using live_tx_tracker.active_transaction_count():

  • Add MAX_CONCURRENT_CONNECTIONS = 3 constant
  • Allow new connection attempts while active_count < MAX_CONCURRENT_CONNECTIONS
  • Remove the now-unused still_alive method from LiveTransactionTracker
  • Add active_transaction_count() method to count all active transactions
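
The gate this introduces can be sketched as follows. This is a minimal stand-in, not the real LiveTransactionTracker (which wraps a concurrent map), and may_start_connection is a hypothetical name for the check performed in the maintenance loop:

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

const MAX_CONCURRENT_CONNECTIONS: usize = 3;

/// Stand-in for the real tracker: maps each peer to its in-flight
/// transaction ids (the real type uses a concurrent map).
#[derive(Default)]
struct LiveTransactionTracker {
    tx_per_peer: HashMap<SocketAddr, Vec<u64>>,
}

impl LiveTransactionTracker {
    /// Total number of active transactions across all peers.
    fn active_transaction_count(&self) -> usize {
        self.tx_per_peer.values().map(|txs| txs.len()).sum()
    }
}

/// Hypothetical gate: a new attempt may start only while under the limit.
fn may_start_connection(tracker: &LiveTransactionTracker) -> bool {
    tracker.active_transaction_count() < MAX_CONCURRENT_CONNECTIONS
}

fn main() {
    let mut tracker = LiveTransactionTracker::default();
    assert!(may_start_connection(&tracker)); // nothing in flight

    let peer: SocketAddr = "127.0.0.1:8080".parse().unwrap();
    tracker.tx_per_peer.insert(peer, vec![1, 2, 3]);
    assert!(!may_start_connection(&tracker)); // at the limit of 3
}
```

With the old single live_tx, the equivalent gate was effectively active_transaction_count() < 1, which is what serialized acquisition.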

Testing

  • All 9 connect unit tests pass
  • Compile check passes with no errors

Stack

This PR stacks on #2172 (seeding/subscriber NAT routing).

Part of #2173 (Priority 1).

[AI-assisted - Claude]

Copilot AI left a comment

Pull request overview

This PR aims to enable parallel connection attempts (up to 3 concurrent) to improve network formation speed by replacing a single-transaction tracking mechanism with a concurrent limit approach. However, the implementation does not achieve the intended parallelism.

Key Changes:

  • Added MAX_CONCURRENT_CONNECTIONS constant (set to 3) to limit concurrent connection attempts
  • Replaced single live_tx: Option<Transaction> with active_transaction_count() method for tracking
  • Removed unused still_alive() method from LiveTransactionTracker

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • crates/core/src/ring/mod.rs — Added concurrent connection limit constant and modified maintenance loop to check active transaction count instead of single transaction tracking
  • crates/core/src/ring/live_tx.rs — Removed still_alive() method and added active_transaction_count() to sum all active transactions
Comments suppressed due to low confidence (1)

crates/core/src/ring/mod.rs:486

  • The loop only processes one location per iteration, which prevents achieving the desired parallelism. To actually run up to MAX_CONCURRENT_CONNECTIONS concurrent connections, the code should loop while active_count < MAX_CONCURRENT_CONNECTIONS and attempt to start multiple connections until the limit is reached or pending_conn_adds is empty.

Suggested fix:

// Acquire new connections up to MAX_CONCURRENT_CONNECTIONS limit
let mut active_count = live_tx_tracker.active_transaction_count();
while active_count < MAX_CONCURRENT_CONNECTIONS {
    if let Some(ideal_location) = pending_conn_adds.pop_first() {
        // ... existing connection attempt code ...
        if tx.is_some() {
            active_count += 1; // Update count for next iteration
        }
    } else {
        break; // No more pending connections
    }
}
The code under review at that location:

            if let Some(ideal_location) = pending_conn_adds.pop_first() {
                if active_count < MAX_CONCURRENT_CONNECTIONS {
                    tracing::info!(
                        active_connections = active_count,
                        max_concurrent = MAX_CONCURRENT_CONNECTIONS,
                        "Attempting to acquire new connection for location: {:?}",
                        ideal_location
                    );
                    let tx = self
                        .acquire_new(
                            ideal_location,
                            &skip_list,
                            &notifier,
                            &live_tx_tracker,
                            &op_manager,
                        )
                        .await
                        .map_err(|error| {
                            tracing::error!(
                                ?error,
                                "FATAL: Connection maintenance task failed - shutting down"
                            );
                            error
                        })?;
                    if tx.is_none() {
                        let conns = self.connection_manager.connection_count();
                        tracing::warn!(
                            "acquire_new returned None - likely no peers to query through (connections: {})",
                            conns
                        );
                    } else {
                        tracing::info!(
                            active_connections = active_count + 1,
                            "Successfully initiated connection acquisition"
                        );
                    }
                } else {
                    tracing::debug!(
                        active_connections = active_count,
                        max_concurrent = MAX_CONCURRENT_CONNECTIONS,
                        "At max concurrent connections, re-queuing location {}",
                        ideal_location
                    );
                    pending_conn_adds.insert(ideal_location);
                }


Comment on lines +54 to +60
/// Returns the total number of active transactions across all peers.
pub(crate) fn active_transaction_count(&self) -> usize {
    self.tx_per_peer
        .iter()
        .map(|entry| entry.value().len())
        .sum()
}
Copilot AI Nov 30, 2025

The new active_transaction_count() method lacks test coverage. Consider adding unit tests to verify:

  • Returns 0 when no transactions are tracked
  • Correctly counts a single transaction for one peer
  • Correctly sums transactions across multiple peers
  • Updates correctly after transactions are removed via remove_finished_transaction()

Example test:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn active_transaction_count_empty() {
        let tracker = LiveTransactionTracker::new();
        assert_eq!(tracker.active_transaction_count(), 0);
    }

    #[test]
    fn active_transaction_count_multiple_peers() {
        let tracker = LiveTransactionTracker::new();
        let addr1 = "127.0.0.1:8080".parse().unwrap();
        let addr2 = "127.0.0.1:8081".parse().unwrap();
        
        tracker.add_transaction(addr1, Transaction::new::<()>());
        tracker.add_transaction(addr1, Transaction::new::<()>());
        tracker.add_transaction(addr2, Transaction::new::<()>());
        
        assert_eq!(tracker.active_transaction_count(), 3);
    }
}

@sanity (Collaborator, Author)

sanity commented Nov 30, 2025

Re: Copilot's suggestion for active_transaction_count() tests

Added comprehensive unit tests for active_transaction_count() in commit 864f72a:

  • active_transaction_count_empty - verifies it returns 0 when no transactions are tracked
  • active_transaction_count_single_peer - verifies the correct count for a single peer with multiple transactions
  • active_transaction_count_multiple_peers - verifies the correct sum across multiple peers
  • active_transaction_count_after_removal - verifies the count updates correctly after remove_finished_transaction()

All 4 tests pass.

[AI-assisted - Claude]

@sanity force-pushed the fix/seeding-subscriber-nat-2164 branch from 4cac273 to 44f2a6d (November 30, 2025 22:20)
@sanity force-pushed the fix/parallel-connect-priority1-2173 branch from d596c99 to 872bee8 (November 30, 2025 22:20)
@sanity force-pushed the fix/seeding-subscriber-nat-2164 branch from 44f2a6d to caf6ce8 (November 30, 2025 23:02)
@sanity force-pushed the fix/parallel-connect-priority1-2173 branch from 872bee8 to 26e7660 (November 30, 2025 23:04)
@sanity force-pushed the fix/seeding-subscriber-nat-2164 branch from caf6ce8 to bf00c2c (November 30, 2025 23:11)
@sanity force-pushed the fix/parallel-connect-priority1-2173 branch from 26e7660 to e960382 (November 30, 2025 23:14)
@sanity force-pushed the fix/seeding-subscriber-nat-2164 branch from bf00c2c to 075fb1f (November 30, 2025 23:36)
@sanity force-pushed the fix/parallel-connect-priority1-2173 branch from e960382 to 99ab273 (November 30, 2025 23:37)
@sanity force-pushed the fix/seeding-subscriber-nat-2164 branch from 075fb1f to 9b8f5db (November 30, 2025 23:54)
@sanity force-pushed the fix/parallel-connect-priority1-2173 branch from 99ab273 to 01943b7 (November 30, 2025 23:54)
@sanity force-pushed the fix/seeding-subscriber-nat-2164 branch from 9b8f5db to 21f2780 (December 1, 2025 00:36)
@sanity force-pushed the fix/parallel-connect-priority1-2173 branch from 01943b7 to 3359306 (December 1, 2025 00:51)
sanity added a commit that referenced this pull request Dec 1, 2025
- Add ObservedAddr type to transport/mod.rs for NAT address tracking
- Update add_subscriber wrapper in ring/mod.rs to accept upstream_addr param
- Add upstream_addr (None) to all local subscription call sites
- Remove duplicate get_peer_location_by_addr functions
- Fix socket_addr() calls (use dereference instead)
- Remove extra is_gateway argument from initiate_join_request

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@sanity force-pushed the fix/parallel-connect-priority1-2173 branch from 3359306 to d01029d (December 1, 2025 01:28)
@sanity force-pushed the fix/seeding-subscriber-nat-2164 branch from 1c5b683 to daa05b9 (December 1, 2025 02:01)
@sanity force-pushed the fix/parallel-connect-priority1-2173 branch from d01029d to 0be445d (December 1, 2025 02:01)
@sanity force-pushed the fix/seeding-subscriber-nat-2164 branch from daa05b9 to 9db2219 (December 1, 2025 02:36)
sanity and others added 13 commits November 30, 2025 20:58
This commit applies all wire protocol cleanup changes from PR #2169
on top of the rebased PR #2167 base:

- Remove sender field from GetMsg, PutMsg, SubscribeMsg, UpdateMsg, ConnectMsg
- Use upstream_addr for routing responses instead of embedded sender fields
- Delete transient_manager.rs (no longer needed)
- Update freenet-macros code generation for new message structure

The routing logic now derives the response target from the connection's
observed address (upstream_addr) rather than trusting sender fields in
messages. This is more reliable for NAT traversal scenarios.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Phase 1.3 of peer identity restructuring (issue #2164).

Uses rust-analyzer SSR to convert:
- .peer.pub_key -> .pub_key()
- .peer.addr -> .addr() (for read accesses)

Assignment operations (.peer.addr = x) are kept as direct field access
for now since the addr() method returns a copy, not a reference.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
… address

Key changes:
- Replace `peer: PeerId` with `pub_key: TransportPublicKey` + `peer_addr: PeerAddr`
- Add PeerAddr enum with Unknown/Known variants for explicit address state
- Add accessor methods: pub_key(), addr(), socket_addr(), peer()
- Add constructors: new(), with_unknown_addr(), with_location()
- Implement Ord/PartialOrd based on socket address

This separates cryptographic identity (pub_key) from network address (peer_addr),
enabling proper handling of peers behind NAT who don't know their external address.

Part of #2164 peer identity restructuring.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Addresses Nacho's architectural feedback to avoid raw SocketAddr in
protocol layer. Uses ObservedAddr newtype to wrap transport-layer
addresses, making the address semantics explicit at the type level.

Changes:
- Add ObservedAddr newtype in transport/mod.rs
- Update Operation trait to use Option<ObservedAddr> for source_addr
- Update all operation implementations (connect, get, put, subscribe, update)
- Update node/mod.rs and p2p_protoc.rs to use ObservedAddr
- Wrap incoming source_addr in ObservedAddr::new() at transport boundary
- Convert back to SocketAddr at network send boundaries

The conversion to raw SocketAddr happens at transport boundaries:
- build_op_result() converts for target_addr
- network_bridge.send() calls use .socket_addr()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
This commit applies all wire protocol cleanup changes from PR #2169
on top of the rebased PR #2167 base:

- Remove sender field from GetMsg, PutMsg, SubscribeMsg, UpdateMsg, ConnectMsg
- Use upstream_addr for routing responses instead of embedded sender fields
- Delete transient_manager.rs (no longer needed)
- Update freenet-macros code generation for new message structure

The routing logic now derives the response target from the connection's
observed address (upstream_addr) rather than trusting sender fields in
messages. This is more reliable for NAT traversal scenarios.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Previously the connection maintenance loop only allowed one connection
attempt at a time. If a connection took 60 seconds to complete, all other
attempts were blocked. This caused star topology formation in small
networks since peers couldn't form connections fast enough.

Now we allow up to MAX_CONCURRENT_CONNECTIONS (3) parallel connection
attempts by using live_tx_tracker.active_transaction_count() instead
of tracking a single live_tx.

Fixes the serialized connection acquisition from #2173.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Tests verify the method:
- Returns 0 when no transactions are tracked
- Correctly counts transactions for a single peer
- Correctly sums transactions across multiple peers
- Updates correctly after transactions are removed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Add ObservedAddr type to transport/mod.rs for NAT address tracking
- Update add_subscriber wrapper in ring/mod.rs to accept upstream_addr param
- Add upstream_addr (None) to all local subscription call sites
- Remove duplicate get_peer_location_by_addr functions

The add_subscriber method now takes an optional ObservedAddr parameter
to track the transport-level address for NAT peers. For local subscriptions
(no remote upstream), None is passed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The acceptor's PeerAddr is intentionally Unknown (NAT scenario) since
acceptors don't know their external address. Tracing calls now use
pub_key() instead of peer() to avoid panics on unknown addresses.

This fix was needed after rebase because a prior commit replayed changes
that reintroduced the old code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@sanity force-pushed the fix/parallel-connect-priority1-2173 branch from 0be445d to bf5356f (December 1, 2025 03:01)
@sanity requested a review from Copilot on December 1, 2025 05:18
Copilot finished reviewing on behalf of sanity on December 1, 2025 05:23
Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.



Comment on lines 410 to 424
// Fill in subscriber's external address from transport layer if unknown.
// This is the key step where the first recipient (gateway) determines the
// subscriber's external address from the actual packet source address.
let mut subscriber = subscriber.clone();

Removed (old behavior, unconditional):

if let Some(addr) = source_addr {
    subscriber.set_addr(addr);
    tracing::debug!(
        tx = %id,
        %key,
        subscriber_addr = %addr,
        "subscribe: using transport source_addr for subscriber"
    );
}

Added (new behavior, only when the address is unknown):

if subscriber.peer_addr.is_unknown() {
    if let Some(addr) = source_addr {
        subscriber.set_addr(addr);
        tracing::debug!(
            tx = %id,
            %key,
            subscriber_addr = %addr,
            "subscribe: filled subscriber address from source_addr"
        );
    }
}
Copilot AI Dec 1, 2025

This behavioral change from the previous implementation removes protection against known-but-incorrect NAT addresses. The old code (visible in removed lines elsewhere in this diff) would always use the transport source address when available, specifically to handle cases where NAT peers embed known-but-wrong addresses (e.g., 127.0.0.1:31337 or internal network addresses like 192.168.x.x).

The new conditional logic only fills in addresses marked as PeerAddr::Unknown, meaning if a NAT peer mistakenly sends PeerAddr::Known(127.0.0.1:31337) instead of PeerAddr::Unknown, the loopback address will not be corrected.

While this change aligns subscribe.rs with the pattern used in connect.rs and put.rs, it removes a safety mechanism that was explicitly documented as "critical for NAT peers" in the old code comments. This could break existing clients behind NAT that haven't been updated to send PeerAddr::Unknown when they don't know their external address.

Consider either:

  1. Reverting to unconditional address correction for backward compatibility, or
  2. Adding validation to reject obviously incorrect addresses (loopback, private ranges) when received from remote peers, or
  3. Documenting this as a breaking change requiring client updates
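
Option 2 could look roughly like the sketch below. is_plausible_external_addr is a hypothetical helper (not code from this PR) built only on std::net address classification:

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical validation: reject addresses a remote peer could not
/// plausibly be reached at from the public internet (loopback, RFC 1918
/// private ranges, link-local).
fn is_plausible_external_addr(addr: &SocketAddr) -> bool {
    match addr.ip() {
        IpAddr::V4(v4) => !v4.is_loopback() && !v4.is_private() && !v4.is_link_local(),
        IpAddr::V6(v6) => !v6.is_loopback(),
    }
}

fn main() {
    // The known-bad examples from the review comment above.
    assert!(!is_plausible_external_addr(&"127.0.0.1:31337".parse().unwrap()));
    assert!(!is_plausible_external_addr(&"192.168.1.5:8080".parse().unwrap()));
    // A routable (documentation-range) address passes.
    assert!(is_plausible_external_addr(&"203.0.113.7:443".parse().unwrap()));
}
```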

sanity and others added 2 commits December 1, 2025 16:07
Merges branch fix/seeding-subscriber-nat-2164 to bring in Nacho's
address validation fixes:
- Better tracing for subscribe address updates
- Clearer comments for subscription registration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
PR #2172 added detailed tracing that calls subscriber.peer() BEFORE
filling in the address from source_addr. This causes a panic when the
subscriber has PeerAddr::Unknown (NAT peers).

The correct order (from PR #2171) is:
1. Check if address is unknown
2. Fill in address from source_addr
3. THEN call .peer() for tracing

This regression was introduced when merging PR #2172's tracing
improvements without preserving the correct order of operations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@sanity (Collaborator, Author)

sanity commented Dec 1, 2025

Re: Copilot's suggestion for active_transaction_count() tests

Already addressed - comprehensive unit tests for active_transaction_count() were added in commit 864f72a:

  • active_transaction_count_empty - verifies it returns 0 when no transactions are tracked
  • active_transaction_count_single_peer - verifies the correct count for a single peer with multiple transactions
  • active_transaction_count_multiple_peers - verifies the correct sum across multiple peers

The test coverage Copilot suggested is already in place.

Re: NAT address handling concern

Copilot raises a valid concern about backward compatibility with clients that might send PeerAddr::Known(127.0.0.1:31337) instead of PeerAddr::Unknown.

However, this PR is part of a larger refactor (#2164) that introduces the explicit PeerAddr::Unknown type specifically to handle this cleanly. The design decision is:

  1. Correct clients send PeerAddr::Unknown when behind NAT (they don't know their external address)
  2. The first recipient fills in the address from the transport source
  3. This is consistent across connect.rs, put.rs, and subscribe.rs

The "old code" that always overwrote addresses was a workaround for not having an explicit "unknown" state. With PeerAddr::Unknown, we no longer need that workaround.

If backward compatibility with very old clients is a concern, that would be a separate discussion - but for the core network protocol being refactored, using explicit state is the cleaner approach.
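
That design can be illustrated with a small sketch (the real PeerAddr in the crate has more accessors and constructors; fill_from_source is a hypothetical name for the first-recipient step):

```rust
use std::net::SocketAddr;

/// Sketch of the explicit address state described above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PeerAddr {
    /// Peer is behind NAT and does not know its external address.
    Unknown,
    /// Address as observed or configured.
    Known(SocketAddr),
}

impl PeerAddr {
    fn is_unknown(&self) -> bool {
        matches!(self, PeerAddr::Unknown)
    }

    /// First-recipient rule: fill in the transport source address only
    /// when the peer declared its address Unknown; a Known address,
    /// even a wrong one, is deliberately left alone.
    fn fill_from_source(&mut self, source: SocketAddr) {
        if self.is_unknown() {
            *self = PeerAddr::Known(source);
        }
    }
}

fn main() {
    let observed: SocketAddr = "203.0.113.7:31337".parse().unwrap();

    let mut nat_peer = PeerAddr::Unknown;
    nat_peer.fill_from_source(observed);
    assert_eq!(nat_peer, PeerAddr::Known(observed));

    let mut claimed = PeerAddr::Known("192.0.2.1:9000".parse().unwrap());
    claimed.fill_from_source(observed);
    assert_ne!(claimed, PeerAddr::Known(observed));
}
```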

[AI-assisted - Claude]

@sanity merged commit 970b172 into fix/seeding-subscriber-nat-2164 on Dec 2, 2025 (8 checks passed)
@sanity deleted the fix/parallel-connect-priority1-2173 branch on December 2, 2025 at 00:17
sanity added a commit that referenced this pull request Dec 2, 2025
Consolidates changes from PRs #2172, #2174, and #2175:

This builds on PR #2191 (wire protocol cleanup) and adds:
- Fix seeding/subscribe operations to handle PeerAddr::Unknown for NAT scenarios
- Gateway properly fills in observed addresses from packet source
- Improved subscriber address tracking in seeding manager
- Update live_tx and connection tests for new address model

NOTE: This PR requires review - previous PRs (#2174, #2175) had
CHANGES_REQUESTED from Nacho. Submitting consolidated changes for
fresh review.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
sanity added a commit that referenced this pull request Dec 2, 2025
Adds NAT address handling to subscribe/seeding operations:

- Subscribers with PeerAddr::Unknown have their address filled in by gateway
- Gateway observes real UDP source address and updates subscriber address
- SeedingManager tracks subscriber addresses properly
- live_tx tests updated for new address model
- In-memory testing infrastructure updated for PeerAddr

Supersedes PRs #2172, #2174, #2175 (which had changes requested).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>