Skip to content

fix: clock skew tolerance 30s -> 5min (proven by Mac upload to NAT testnet)#77

Merged
jacderida merged 1 commit intosaorsa-labs:rc-2026.4.1from
grumbach:fix/clock-skew-tolerance
Apr 11, 2026
Merged

fix: clock skew tolerance 30s -> 5min (proven by Mac upload to NAT testnet)#77
jacderida merged 1 commit intosaorsa-labs:rc-2026.4.1from
grumbach:fix/clock-skew-tolerance

Conversation

@grumbach
Copy link
Copy Markdown
Collaborator

@grumbach grumbach commented Apr 8, 2026

Proven by live testnet upload from Mac

3/3 uploads from a macOS client succeeded (61s cold, 30-33s warm) to a 14-node NAT testnet across London, Amsterdam, NYC, and San Francisco. The Mac had 31 seconds of clock skew against the VPS nodes. Without this fix: 0/3 uploads.

The change

- const MAX_FUTURE_SECS: u64 = 30;
+ const MAX_FUTURE_SECS: u64 = 300;

One constant. A decentralized network cannot reject messages from devices with slightly off clocks. 30 seconds was causing the network to partition for any node behind NTP by even half a minute.

Testnet details

  • 7 VMs on DigitalOcean (lon1, ams3, nyc1, sfo3)
  • 14 node processes (3 bootstrap, 6 NAT, 2 public)
  • 3 NAT nodes behind port-restricted iptables namespace simulation
  • Upload client: macOS ARM (this Mac, 31s behind NTP)
  • File: 10MB -> 3 chunks via self-encryption
  • Payment: EVM batch on Arbitrum Sepolia
  • 3 chunks stored on 4 peers each

Reproduction

See saorsa-testnet PR for full reproduction scripts and documented results.

Greptile Summary

Widens the future-timestamp clock-skew window in parse_protocol_message from 30 s to 300 s, making it symmetric with the existing 5-minute past window (MAX_MESSAGE_AGE_SECS). The change is backed by measured 31–42 s of skew between a macOS client and NTP-synced VPS nodes, and validated by 3/3 successful testnet uploads that previously failed 0/3.

Confidence Score: 5/5

Safe to merge — testnet-proven fix with no logic errors; remaining findings are P2 style suggestions only.

Both changes are correct and well-motivated. The only open findings are a hardcoded literal in dead code and missing boundary tests — neither blocks merge.

No files require special attention.

Vulnerabilities

The wider future window (30 s → 5 min) marginally increases the replay-attack surface: a captured message can now be re-submitted up to 5 minutes after it was originally sent (matching the existing past window). However, the QUIC transport layer provides its own replay protection at the connection level, and 5 minutes is a standard industry tolerance (e.g., Kerberos). No higher-severity concerns identified.

Important Files Changed

Filename Overview
src/network.rs Increases MAX_FUTURE_SECS from 30 → 300 (5 min) and expands the constant's doc comment; MAX_MESSAGE_AGE_SECS was already 300, so both windows are now symmetric. No logic errors; boundary tests are absent (P2).
src/adaptive/dht.rs Doc-only fix: replaces broken rustdoc link syntax [MAX_CONSUMER_WEIGHT] with backtick-only MAX_CONSUMER_WEIGHT. No functional change.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming WireMessage bytes] --> B[Deserialize via postcard]
    B --> C{Timestamp < now - 300s?}
    C -- Yes --> D[Reject: stale message]
    C -- No --> E{Timestamp > now + 300s?}
    E -- Yes --> F[Reject: future-dated]
    E -- No --> G{Signature present?}
    G -- Yes --> H[Verify ML-DSA-65 signature]
    H -- Fail --> I[Reject: bad signature]
    H -- Pass --> J[Emit P2PEvent::Message with authenticated PeerId]
    G -- No --> K[Emit P2PEvent::Message source = transport peer ID]
Loading

Comments Outside Diff (1)

  1. src/validation.rs, line 461 (link)

    P2 Hardcoded 300 duplicates the new constant

    validation.rs uses a literal 300 for the same future-timestamp guard that is now backed by MAX_FUTURE_SECS in network.rs. If the tolerance is tuned again, this copy will silently diverge. The struct is #[allow(dead_code)] today, but it's worth keeping the two in sync by referencing the shared constant (or at minimum adding an inline comment tying the value to MAX_FUTURE_SECS).

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/validation.rs
    Line: 461
    
    Comment:
    **Hardcoded `300` duplicates the new constant**
    
    `validation.rs` uses a literal `300` for the same future-timestamp guard that is now backed by `MAX_FUTURE_SECS` in `network.rs`. If the tolerance is tuned again, this copy will silently diverge. The struct is `#[allow(dead_code)]` today, but it's worth keeping the two in sync by referencing the shared constant (or at minimum adding an inline comment tying the value to `MAX_FUTURE_SECS`).
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/validation.rs
Line: 461

Comment:
**Hardcoded `300` duplicates the new constant**

`validation.rs` uses a literal `300` for the same future-timestamp guard that is now backed by `MAX_FUTURE_SECS` in `network.rs`. If the tolerance is tuned again, this copy will silently diverge. The struct is `#[allow(dead_code)]` today, but it's worth keeping the two in sync by referencing the shared constant (or at minimum adding an inline comment tying the value to `MAX_FUTURE_SECS`).

```suggestion
        if self.timestamp > now + 300 {
            // 5 minutes — matches network::MAX_FUTURE_SECS
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/network.rs
Line: 1627-1629

Comment:
**No test coverage for boundary conditions on the new 5-minute future window**

The existing tests only exercise the happy path (current timestamp). There are no tests that verify a message timestamped at `now + 299` is accepted and one at `now + 301` is rejected. Adding two boundary tests here would lock in the new value and catch any future accidental regression:

```rust
#[test]
fn test_parse_message_just_inside_future_window_is_accepted() {
    let ts = current_timestamp() + MAX_FUTURE_SECS - 1;
    let bytes = make_wire_bytes("test/v1", vec![1], "sender", ts);
    assert!(parse_protocol_message(&bytes, "peer").is_some());
}

#[test]
fn test_parse_message_beyond_future_window_is_rejected() {
    let ts = current_timestamp() + MAX_FUTURE_SECS + 10;
    let bytes = make_wire_bytes("test/v1", vec![1], "sender", ts);
    assert!(parse_protocol_message(&bytes, "peer").is_none());
}
```

(`MAX_FUTURE_SECS` would need to be `pub(crate)` or the literal `300` used in the test.)

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix: increase clock skew tolerance from ..." | Re-trigger Greptile

Greptile also left 1 inline comment on this PR.

A decentralized network cannot require participants to have accurate
clocks. Consumer devices commonly drift by minutes (no NTP, suspended
laptops, VMs without guest additions). The future-dated message window
was only 30 seconds while the stale window was 5 minutes. This
asymmetry caused nodes with slightly slow clocks to reject messages
from every node with an accurate clock, breaking identity exchange.

Measured 31-42s of clock skew between macOS client and NTP-synced VPS
nodes during testnet testing. Both windows now symmetric at 5 minutes.
grumbach added a commit to grumbach/ant-client that referenced this pull request Apr 8, 2026
Point at branches with 9 proven fixes (clock skew tolerance +
8 transport fixes) that enabled the first successful Mac upload
to a NAT-protected testnet.

Tested: 3/3 uploads from macOS (31s clock skew) to 14-node
NAT testnet across lon1/ams3/nyc1/sfo3. 30-33s warm uploads.

Dependencies:
- saorsa-core: grumbach/fix/clock-skew-tolerance (PR saorsa-labs/saorsa-core#77)
- saorsa-transport: grumbach/round4-combined (PR saorsa-labs/saorsa-transport#55)

Remove the [patch] section once those PRs are merged into rc-2026.4.1.
Comment on lines 1627 to +1629
const MAX_MESSAGE_AGE_SECS: u64 = 300;
/// Maximum allowed future timestamp (30 seconds to account for clock drift)
const MAX_FUTURE_SECS: u64 = 30;
/// Maximum allowed future timestamp — symmetric with the past window.
const MAX_FUTURE_SECS: u64 = 300;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 No test coverage for boundary conditions on the new 5-minute future window

The existing tests only exercise the happy path (current timestamp). There are no tests that verify a message timestamped at now + 299 is accepted and one at now + 301 is rejected. Adding two boundary tests here would lock in the new value and catch any future accidental regression:

#[test]
fn test_parse_message_just_inside_future_window_is_accepted() {
    let ts = current_timestamp() + MAX_FUTURE_SECS - 1;
    let bytes = make_wire_bytes("test/v1", vec![1], "sender", ts);
    assert!(parse_protocol_message(&bytes, "peer").is_some());
}

#[test]
fn test_parse_message_beyond_future_window_is_rejected() {
    let ts = current_timestamp() + MAX_FUTURE_SECS + 10;
    let bytes = make_wire_bytes("test/v1", vec![1], "sender", ts);
    assert!(parse_protocol_message(&bytes, "peer").is_none());
}

(MAX_FUTURE_SECS would need to be pub(crate) or the literal 300 used in the test.)

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/network.rs
Line: 1627-1629

Comment:
**No test coverage for boundary conditions on the new 5-minute future window**

The existing tests only exercise the happy path (current timestamp). There are no tests that verify a message timestamped at `now + 299` is accepted and one at `now + 301` is rejected. Adding two boundary tests here would lock in the new value and catch any future accidental regression:

```rust
#[test]
fn test_parse_message_just_inside_future_window_is_accepted() {
    let ts = current_timestamp() + MAX_FUTURE_SECS - 1;
    let bytes = make_wire_bytes("test/v1", vec![1], "sender", ts);
    assert!(parse_protocol_message(&bytes, "peer").is_some());
}

#[test]
fn test_parse_message_beyond_future_window_is_rejected() {
    let ts = current_timestamp() + MAX_FUTURE_SECS + 10;
    let bytes = make_wire_bytes("test/v1", vec![1], "sender", ts);
    assert!(parse_protocol_message(&bytes, "peer").is_none());
}
```

(`MAX_FUTURE_SECS` would need to be `pub(crate)` or the literal `300` used in the test.)

How can I resolve this? If you propose a fix, please make it concise.

@jacderida jacderida merged commit 112e404 into saorsa-labs:rc-2026.4.1 Apr 11, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants