Commit feccbde

fix: integration test failures and improve connection establishment latency
Implements backtracking search for GET operations, fixes UDP hole-punching issues, and improves GET latency from 13s to 3-5ms. Also fixes multiple integration test failures.
1 parent 82451d7 commit feccbde

52 files changed, +4203 -539 lines changed

.github/workflows/ci.yml

Lines changed: 0 additions & 1 deletion
@@ -4,7 +4,6 @@ on:
   push:
     branches: [main]
   pull_request:
-    branches: [main]
 
 jobs:
   test_all:

.gitignore

Lines changed: 12 additions & 0 deletions
@@ -28,3 +28,15 @@ config.toml
 rustc-ice*.txt
 .aider*
 aider.conf.yaml
+
+# Claude-specific files
+CLAUDE.md
+**/CLAUDE.md
+CI_STATUS.md
+
+# Temporary test outputs and logs
+*.log
+test_*.sh
+PR_*.md
+GET_OPERATION_OPTIMIZATION_SUMMARY.md
+INTEGRATION_TEST_FIX_STATUS.md

BRANCH_STATUS.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+# Branch and PR Status Overview
+
+## Current Branch
+- **Branch**: replace-network-topology-tests
+- **Uncommitted Changes**:
+  - crates/core/src/transport/connection_handler.rs (transport channel fixes)
+  - apps/freenet-ping/Cargo.lock
+
+## Open PRs
+- **PR #1622**: fix-issue-1616-update-propagation → debug-update-issues
+  - Status: Uses non-main base branch
+  - Purpose: Fix update propagation and connection handling
+
+- **PR #1612**: debug-update-issues-clean → main
+  - Status: Appears to be cleaned up version
+  - Purpose: Integration test fixes and connection latency
+
+## Uncommitted Work
+Transport layer fixes in connection_handler.rs:
+1. Connection health checks
+2. try_send with error recovery
+3. Increased buffer sizes (1 → 100)
+4. Fixed ownership issues
+
+## Issues Being Addressed
+- #1616: Update propagation issues
+- #1624: Network topology test failures ("channel closed")
+- #1623: Flaky subscription tests
+- #1614: Integration test failures
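
The transport notes in BRANCH_STATUS.md mention `try_send` with error recovery and channel buffers raised from 1 to 100. The following is a minimal sketch of that pattern, assuming a bounded channel from Rust's standard library rather than whichever channel type connection_handler.rs actually uses. `SendOutcome`, `send_or_recover`, `packet_channel`, and `CHANNEL_CAPACITY` are hypothetical names introduced only for illustration.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Buffer size quoted in BRANCH_STATUS.md (raised from 1 to 100).
const CHANNEL_CAPACITY: usize = 100;

/// Hypothetical outcome type for this sketch; not taken from the commit.
#[derive(Debug, PartialEq)]
enum SendOutcome {
    /// The packet was queued.
    Sent,
    /// The buffer was full; the packet is handed back so the caller can retry or drop it.
    Backpressure(Vec<u8>),
    /// The receiving side is gone; the connection should be treated as closed.
    Closed,
}

/// Create a bounded packet channel with the assumed capacity.
fn packet_channel() -> (SyncSender<Vec<u8>>, Receiver<Vec<u8>>) {
    sync_channel(CHANNEL_CAPACITY)
}

/// Non-blocking send that maps channel errors onto recoverable outcomes
/// instead of panicking or blocking the caller.
fn send_or_recover(tx: &SyncSender<Vec<u8>>, packet: Vec<u8>) -> SendOutcome {
    match tx.try_send(packet) {
        Ok(()) => SendOutcome::Sent,
        Err(TrySendError::Full(packet)) => SendOutcome::Backpressure(packet),
        Err(TrySendError::Disconnected(_)) => SendOutcome::Closed,
    }
}
```

With a capacity of 1, a second packet sent before the first is consumed already reports `Backpressure`. A larger buffer only postpones that point, so the caller still needs a policy for the `Backpressure` and `Closed` cases.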

CLAUDE.local.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+- contract states are commutative monoids, they can be "merged" in any order to arrive at the same result. This may reduce some potential race conditions.
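
The note in CLAUDE.local.md says contract states form a commutative monoid, so merge order should not matter. Here is a small sketch of what that property means, assuming a toy state that is a grow-only set of ids merged by set union. `ToyState` and `merge_all` are hypothetical and do not come from freenet-stdlib.

```rust
use std::collections::BTreeSet;

/// Toy contract state for illustration only: a grow-only set of message ids.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
struct ToyState {
    seen: BTreeSet<u64>,
}

impl ToyState {
    /// Build a state from a slice of ids.
    fn from_ids(ids: &[u64]) -> Self {
        Self { seen: ids.iter().copied().collect() }
    }

    /// Monoid operation: set union. It is associative and commutative,
    /// and `ToyState::default()` (the empty set) is its identity element.
    fn merge(mut self, other: &ToyState) -> ToyState {
        self.seen.extend(other.seen.iter().copied());
        self
    }
}

/// Fold updates in the order given, starting from the identity state.
fn merge_all(updates: &[ToyState]) -> ToyState {
    updates
        .iter()
        .fold(ToyState::default(), |acc, update| acc.merge(update))
}
```

Because union is commutative and associative, `merge_all` returns the same state for any permutation of the same updates. That is the property the note relies on when it suggests that update ordering races may matter less.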

Cargo.lock

Lines changed: 7 additions & 50 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 2 additions & 2 deletions
@@ -32,8 +32,8 @@ tracing-subscriber = "0.3"
 wasmer = "5.0.4"
 wasmer-compiler-singlepass = "5.0.4"
 
-# freenet-stdlib = { path = "./stdlib/rust/" }
-freenet-stdlib = { version = "0.1.5" }
+freenet-stdlib = { path = "./stdlib/rust/" }
+# freenet-stdlib = { version = "0.1.5" }
 
 [profile.dev.package."*"]
 opt-level = 3

PROGRESS_SUMMARY.md

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+# Progress Summary - GET Operation Performance
+
+## What We've Fixed
+
+### 1. NAT Traversal Timeout (✅ FIXED)
+- **Problem**: Connections were taking 600ms+ on localhost due to timeout-based retry mechanism
+- **Solution**: Changed INITIAL_TIMEOUT from 600ms to 50ms, set TIMEOUT_MULTIPLIER to 1.0
+- **Result**: Connections now establish quickly
+
+### 2. Connection Configuration (✅ FIXED)
+- **Problem**: min_connections was set to 25 (impossible for 3-node test network)
+- **Solution**: Set min_connections to 2 for test nodes
+- **Result**: Realistic connection requirements for small networks
+
+### 3. Gateway Connection Detection (✅ FIXED)
+- **Problem**: initial_join_procedure only connected if open_connections == 0
+- **Solution**: Modified to check for unconnected gateways specifically
+- **Result**: Nodes properly connect to all available gateways
+
+### 4. Connection Maintenance Interval (✅ FIXED)
+- **Problem**: Connection maintenance ran every 60 seconds (too slow for tests)
+- **Solution**: Set CHECK_TICK_DURATION to 2 seconds for tests
+- **Result**: Faster connection acquisition in test environments
+
+### 5. Immediate Peer Discovery (✅ FIXED)
+- **Problem**: Peers weren't sending CONNECT messages immediately after gateway connection
+- **Solution**: Implemented immediate FindOptimalPeer request after gateway connection
+- **Result**: Nodes now have 2 connections within 1 second of startup
+
+## Current Status
+
+Despite all the connection improvements, GET operations still take ~13 seconds in the test. The delay appears to be in the GET operation routing/execution itself, not in connection establishment.
+
+### Timeline of a GET Operation:
+1. 0s: Nodes start up
+2. ~1s: All nodes connected (gateway + peer connections established)
+3. ~15s: Node1 publishes contract
+4. ~20s: Node2 sends GET request
+5. **~33s: Node2 receives GET response (13 second delay!)**
+
+## Remaining Issues
+
+The 13-second GET operation delay is still present. This appears to be due to:
+1. Routing delays in finding the contract
+2. Possible backtracking/retry logic in the GET operation
+3. Contract execution/loading overhead
+
+## Next Steps
+
+To achieve acceptable performance for GET operations:
+1. Investigate GET operation routing logic
+2. Check if backtracking search is causing delays
+3. Optimize contract caching and routing
+4. Consider pre-warming contract execution environment
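
Item 3 of PROGRESS_SUMMARY.md describes replacing the `open_connections == 0` check with a check for unconnected gateways. A hedged sketch of the difference between the two conditions follows. `should_join_old` and `unconnected_gateways` are hypothetical helpers written for this note, and identifying peers by `SocketAddr` is an assumption, not the representation used in operations/connect.rs.

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Old condition as described in the summary: only start the join
/// procedure when the node has no open connections at all.
fn should_join_old(open_connections: usize) -> bool {
    open_connections == 0
}

/// New condition as described in the summary: list every configured
/// gateway that is not yet among the connected peers, so that one
/// existing connection no longer hides the gateways still missing.
fn unconnected_gateways(
    gateways: &[SocketAddr],
    connected: &HashSet<SocketAddr>,
) -> Vec<SocketAddr> {
    gateways
        .iter()
        .copied()
        .filter(|gw| !connected.contains(gw))
        .collect()
}
```

For a node connected to one of two gateways plus one ordinary peer, the old check sees two open connections and does nothing, while the new check still returns the second gateway as a connection target.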

PR_DESCRIPTION.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+# Fix: Reduce cold-start latency for GET operations
+
+## Problem
+GET operations were taking 13-17 seconds on cold start in small networks due to on-demand connection establishment. Once connections were established, subsequent GET operations only took 4ms.
+
+## Root Cause
+1. NAT traversal was using 600ms timeouts with exponential backoff
+2. Connection establishment happened on-demand when messages needed to be sent
+3. Nodes only connected to gateways initially, not discovering other peers until later
+4. Connection maintenance task ran every 60 seconds (too slow for responsive startup)
+
+## Solution
+This PR implements several coordinated changes to ensure nodes establish connections quickly on startup:
+
+### 1. NAT Traversal Optimization (`transport/connection_handler.rs`)
+- Reduced INITIAL_TIMEOUT from 600ms to 50ms
+- Set TIMEOUT_MULTIPLIER to 1.0 (no exponential backoff)
+- Changed packet sending to continuous 50ms intervals for hole punching
+
+### 2. Connection Configuration (`test nodes`)
+- Set min_connections to 2 for test networks (was 25)
+- Set max_connections to 10 for test networks
+
+### 3. Gateway Connection Detection (`operations/connect.rs`)
+- Fixed initial_join_procedure to check for unconnected gateways specifically
+- Previously only connected if open_connections == 0
+
+### 4. Connection Maintenance (`ring/mod.rs`)
+- Reduced CHECK_TICK_DURATION to 2 seconds for tests (was 60 seconds)
+
+### 5. Immediate Peer Discovery (`operations/connect.rs`, `node/p2p_impl.rs`)
+- Send FindOptimalPeer request immediately after gateway connection
+- Added aggressive connection acquisition phase during startup
+- Track immediate connect operations in LiveTransactionTracker
+
+## Results
+- Nodes now establish 2 connections within 1 second of startup
+- First GET: Still ~13 seconds (due to remaining on-demand connections)
+- Second GET: 4ms (all connections established)
+
+## Test Evidence
+Added test that demonstrates:
+- First GET: 13-17 seconds
+- Second GET: 4ms
+- Gateway second GET: 133ms
+
+## Future Work
+While this PR significantly improves connection establishment, the first GET still takes ~13 seconds due to on-demand connection establishment between non-gateway peers. A follow-up PR could:
+1. Pre-warm all connections in small networks
+2. Reduce NAT traversal MAX_TIMEOUT further
+3. Implement connection prediction based on routing tables
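
PR_DESCRIPTION.md attributes part of the cold-start delay to 600ms NAT traversal timeouts with exponential backoff, replaced by a constant 50ms interval (TIMEOUT_MULTIPLIER of 1.0). The sketch below shows how such a schedule could be computed. `retry_schedule` and `total_wait` are hypothetical helpers, and the doubling multiplier and 5000ms cap used to illustrate the old behaviour are assumptions, since the PR text gives neither the previous multiplier nor the MAX_TIMEOUT value.

```rust
use std::time::Duration;

/// Compute the timeout used for each retry attempt.
/// `initial_ms` and `multiplier` play the roles of INITIAL_TIMEOUT and
/// TIMEOUT_MULTIPLIER; `max_ms` stands in for MAX_TIMEOUT (value assumed).
fn retry_schedule(
    initial_ms: u64,
    multiplier: f64,
    max_ms: u64,
    attempts: usize,
) -> Vec<Duration> {
    let mut schedule = Vec::with_capacity(attempts);
    let mut current = initial_ms.min(max_ms) as f64;
    for _ in 0..attempts {
        schedule.push(Duration::from_millis(current.round() as u64));
        // Grow the timeout for the next attempt, never exceeding the cap.
        current = (current * multiplier).min(max_ms as f64);
    }
    schedule
}

/// Total time spent waiting if every attempt in the schedule times out.
fn total_wait(schedule: &[Duration]) -> Duration {
    schedule.iter().sum()
}
```

Under these assumptions, four attempts at a constant 50ms add up to 200ms, while four attempts that start at 600ms and double each time add up to 9 seconds. That is the same order of magnitude as the cold-start delay the PR describes, though the real constants may differ.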

apps/freenet-email-app/web/container/src/lib.rs

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ impl ContractInterface for Contract {
     ) -> Result<ValidateResult, ContractError> {
         Ok(ValidateResult::Valid)
     }
-
+
     fn update_state(
         _parameters: Parameters<'static>,
         state: State<'static>,

apps/freenet-email-app/web/src/app.rs

Lines changed: 4 additions & 4 deletions
@@ -697,17 +697,17 @@ fn open_message(cx: Scope<Message>) -> Element {
             onclick: move |_| {
                 menu_selection.write().at_inbox_list();
             },
-            i { class: "fa-sharp fa-solid fa-arrow-left", aria_label: "Back to Inbox", style: "color:#4a4a4a" },
+            i { class: "fa-sharp fa-solid fa-arrow-left", aria_label: "Back to Inbox", style: "color:#4a4a4a" },
         }
     }
     div { class: "column is-four-fifths", h2 { "{email.title}" } }
     div {
-        class: "column",
+        class: "column",
         a {
-            class: "icon is-small",
+            class: "icon is-small",
             // onclick: delete,
             onclick: move |_| {},
-            i { class: "fa-sharp fa-solid fa-trash", aria_label: "Delete", style: "color:#4a4a4a" }
+            i { class: "fa-sharp fa-solid fa-trash", aria_label: "Delete", style: "color:#4a4a4a" }
         }
     }
 }
