
feat(monero-rpc-pool): Add randomness to node selection, prefer established TCP circuits#508

Merged
binarybaron merged 3 commits into master from fix/rpc-pool-efficient-conn-chosing on Aug 10, 2025

Conversation


@binarybaron binarybaron commented Aug 10, 2025

Summary by CodeRabbit

  • New Features

    • The Monero RPC pool now prioritizes nodes with pre-established TCP connections and introduces hedged requests with a soft timeout to improve reliability and responsiveness.
  • Improvements

    • Node selection now incorporates both recent success rates and randomness, balancing reliability and distribution.
    • Proxy logic fully buffers responses before processing, enhancing error handling and bandwidth tracking.
    • Increased proxy node pool size for better load distribution.
    • Prioritized nodes with available connections for faster request handling.
    • Logging adjustments for improved observability.
  • Bug Fixes

    • Improved error handling for JSON-RPC responses and partial reads.
  • Removals

    • Removed composite reliability scoring in favor of simpler success rate metrics.
  • Chores

    • Updated documentation and changelog entries.


coderabbitai bot commented Aug 10, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Caution

Review failed

The pull request is closed.

Walkthrough

This change set updates the Monero RPC pool's node selection to prioritize nodes with pre-established TCP connections and introduces hedged request logic with soft timeouts. It also replaces strict reliability scoring with success rate-based randomness, increases proxy pool size, and fully buffers responses before processing. Several methods and SQL queries were updated or replaced to support the new node selection and request handling strategies.

Changes

Cohort / File(s) Change Summary
Changelog Update
CHANGELOG.md
Updated unreleased changelog to note that Monero RPC pool now prioritizes nodes with pre-established TCP connections.
SQL Query Replacement
monero-rpc-pool/.sqlx/query-44ddff5bdf5b56e9c1a9848641181de4441c8974b2d1304804874cf620420ad4.json,
monero-rpc-pool/.sqlx/query-4ce7c42906ba69e0c8e1c0dad952956edd582a0edecd45710e22dcb28785eeab.json
Deleted old SQL query ordering nodes by success ratio; added new query that introduces randomness into node selection, balancing reliability with randomization, and returns node connection details.
Stress Test Timeout
monero-rpc-pool/src/bin/stress_test_downloader.rs
Increased HTTP client timeout from 3m30s to 10m30s for stress test downloader.
Connection Pool Enhancement
monero-rpc-pool/src/connection_pool.rs
Added async method has_available_connection to check for available (unlocked) connections for a given stream key.
Database Node Selection
monero-rpc-pool/src/database.rs
Modified node selection to use a CTE that combines success rate ("base_score") with a random factor, ordering nodes by the product, thus introducing randomness while biasing toward reliable nodes.
Proxy Hedged Requests and Prioritization
monero-rpc-pool/src/proxy.rs
Refined proxy logic: added SOFT_TIMEOUT for hedged requests, increased pool size, prioritized nodes with available connections, introduced proxy_to_node_with_hedge for racing requests, fully buffered responses before processing, updated error handling, and adjusted bandwidth and success/failure recording.
Reliability Score Removal
monero-rpc-pool/src/types.rs
Removed reliability_score method from NodeHealthStats and NodeRecord, leaving only the success_rate method.
Seed Choice Assignment Refactor
swap/src/cli/api.rs
Refactored assignment of seed_choice in open_monero_wallet for clarity; no change to logic or control flow.
Logging Category Update
swap/src/common/tracing_util.rs
Moved "monero_rpc_pool" from info-level to general crate logging category by updating OUR_CRATES and clearing INFO_LEVEL_CRATES.
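The success-rate-weighted randomized ordering summarized above can be sketched outside SQL. The following is a hypothetical standalone Rust port of the scoring rule (the real logic lives in the SQL CTE in database.rs); the tiny LCG exists only to keep the sketch dependency-free and is not part of the PR:

```rust
// Hypothetical Rust port of the SQL scoring rule described above:
// score = success_rate * max(r1, r2, r3), with each r uniform in [0, 1).
// Taking the max of three draws biases r toward 1, so reliable nodes
// usually stay near the top while still getting shuffled occasionally.

/// Tiny deterministic LCG so the sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    fn next_unit(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Top 53 bits give a uniform value in [0, 1).
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

fn selection_score(success_rate: f64, rng: &mut Lcg) -> f64 {
    let r = rng.next_unit().max(rng.next_unit()).max(rng.next_unit());
    success_rate * r
}

fn main() {
    let nodes = [("reliable", 0.95), ("flaky", 0.60), ("unproven", 0.0)];
    let mut rng = Lcg(42);
    // Score once per node, then sort descending -- mirrors ORDER BY (base_score * r) DESC.
    let mut scored: Vec<(&str, f64)> = nodes
        .iter()
        .map(|(name, sr)| (*name, selection_score(*sr, &mut rng)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    println!("{scored:?}");
}
```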

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Proxy
    participant NodePool
    participant NodeA
    participant NodeB

    Client->>Proxy: Send RPC request
    Proxy->>NodePool: Get prioritized node list (by available connection)
    NodePool-->>Proxy: Return sorted node list

    loop For each node pair (main, hedge)
        Proxy->>NodeA: Send request to main node
        Note right of Proxy: Wait SOFT_TIMEOUT
        Proxy->>NodeB: (if main not finished) Send request to hedge node
        alt Main node responds first
            NodeA-->>Proxy: Response
            Proxy-->>Client: Return response
        else Hedge node responds first
            NodeB-->>Proxy: Response
            Proxy-->>Client: Return response
        else Both fail
            Proxy-->>Client: Aggregate errors
        end
    end
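The hedging flow in the diagram can be sketched with std-only threads. This is only an illustration: the real implementation races pinned tokio futures with tokio::select!, and the request closure and latencies below are made up:

```rust
// Std-only sketch of the hedging flow in the diagram above: start the main
// request, wait up to SOFT_TIMEOUT for it, then also start the hedge and
// take whichever answers first. Threads and channels stand in for the
// tokio futures used in the actual proxy.
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

const SOFT_TIMEOUT: Duration = Duration::from_millis(50);

fn hedged(request: fn(&'static str, Duration) -> String) -> (String, &'static str) {
    let (tx, rx) = mpsc::channel();
    let tx_main = tx.clone();
    thread::spawn(move || {
        // Main node: deliberately slow in this sketch (200 ms).
        let _ = tx_main.send((request("main", Duration::from_millis(200)), "main"));
    });
    match rx.recv_timeout(SOFT_TIMEOUT) {
        Ok(winner) => winner, // main finished before the soft timeout
        Err(_) => {
            // Soft timeout fired: race a hedge request against the still-running main.
            thread::spawn(move || {
                let _ = tx.send((request("hedge", Duration::from_millis(10)), "hedge"));
            });
            rx.recv().expect("at least one request completes")
        }
    }
}

fn main() {
    let (resp, who) = hedged(|node, latency| {
        thread::sleep(latency);
        format!("response from {node}")
    });
    println!("{who} won: {resp}");
}
```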

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

  • feat(monero-rpc-pool): TCP pool #497: Directly related as both PRs modify the Monero RPC pool's connection caching and node selection strategies, focusing on TCP/Tor stream management and prioritization.

Suggested reviewers

  • Einliterflasche

Poem

In the warren where Monero nodes hop,
Connections are primed, no need to stop.
With hedged requests and clever sort,
The fastest bunny wins the port!
Responses buffered, errors tamed,
This pool’s new logic can’t be blamed.
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ae9b48 and e28556a.

📒 Files selected for processing (1)
  • monero-rpc-pool/src/proxy.rs (6 hunks)


@binarybaron (Author) commented

bugbot run

@binarybaron binarybaron marked this pull request as ready for review August 10, 2025 15:39

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (15)
monero-rpc-pool/.sqlx/query-4ce7c42906ba69e0c8e1c0dad952956edd582a0edecd45710e22dcb28785eeab.json (1)

3-3: Consider smoothing for unseen nodes (base_score=0 → exploration starvation)

Defaulting base_score to 0 fully deprioritizes nodes without recent checks, which can inhibit exploration. Consider Laplace smoothing or a neutral prior (e.g., treat “no data” as 0.5) so new/rarely-checked nodes occasionally get sampled.
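A minimal sketch of the suggested smoothing, assuming simple add-one (Laplace) counts; the function name is hypothetical and not part of the PR:

```rust
// Hypothetical smoothed score implementing the suggestion above: Laplace
// (add-one) smoothing gives nodes with no recent checks a neutral prior of
// 0.5 instead of base_score = 0, so they still get explored occasionally.
fn smoothed_success_rate(successes: u64, failures: u64) -> f64 {
    (successes as f64 + 1.0) / ((successes + failures) as f64 + 2.0)
}

fn main() {
    println!("no data       -> {}", smoothed_success_rate(0, 0)); // neutral 0.5
    println!("9/10 success  -> {}", smoothed_success_rate(9, 1)); // ~0.833
    println!("0/10 success  -> {}", smoothed_success_rate(0, 10)); // ~0.083
}
```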

monero-rpc-pool/src/bin/stress_test_downloader.rs (1)

146-147: Make HTTP timeout configurable and set an explicit connect timeout

Hard-coding 10m30s can be too high/low depending on environment. Expose as a CLI flag and set a connect timeout to fail fast on dead endpoints.

Patch within this hunk:

-            .timeout(std::time::Duration::from_secs(10 * 60 + 30)) // used in wallet2
+            .connect_timeout(std::time::Duration::from_secs(15))
+            .timeout(std::time::Duration::from_secs(args.http_timeout_secs))

And add a CLI option (apply outside this hunk):

// In Args
#[arg(long, default_value = "630")]
#[arg(help = "HTTP request timeout in seconds (default 10m30s)")]
http_timeout_secs: u64,
swap/src/common/tracing_util.rs (1)

79-80: monero_rpc_pool moved to “our crates” at TRACE/DEBUG levels — verify intended log volume

This promotes monero_rpc_pool from INFO-only to full TRACE/DEBUG in multiple layers. Useful while iterating on the pool, but it may flood logs and Tauri event sinks.

  • Confirm this is intended for production builds.
  • Optionally gate verbose pool logs behind an env var (e.g., MONERO_RPC_POOL_VERBOSE=1) or keep it in INFO_LEVEL_CRATES for the Tauri/terminal layers while retaining TRACE in file layers.

Also applies to: 82-83

CHANGELOG.md (1)

13-13: Clarify entry to reflect selection randomness and hedged requests

Current line mentions prioritizing pre-established TCP connections. Consider also noting randomized success-rate weighting and hedged requests for completeness.

Suggested wording:

  • GUI + CLI + ASB: Monero RPC pool now prioritizes nodes with pre-established TCP circuits, introduces success-rate-weighted randomized selection, and uses hedged requests with soft timeouts to reduce tail latency.
monero-rpc-pool/src/connection_pool.rs (1)

179-191: Non-blocking availability probe looks good; minor cleanup possible

Logic is correct and low-contention. Optional micro-cleanup for readability:

-            let vec = vec_lock.read().await;
-            for sender_mutex in vec.iter() {
-                if sender_mutex.try_lock().is_ok() {
-                    return true;
-                }
-            }
+            let vec = vec_lock.read().await;
+            if vec.iter().any(|m| m.try_lock().is_ok()) {
+                return true;
+            }
monero-rpc-pool/src/database.rs (2)

241-245: Comment clarifies intent; consider recording the randomness approach in changelog/spec

The “max-of-3” bias toward 1 is a sensible way to keep top nodes preferred while adding shuffle. Documenting this briefly in the node-selection docs/changelog will help future maintainers tune it knowingly (e.g., change the number of draws).


247-281: Randomization SQL is fine; validate the “recent 1000 checks” global window and consider per-node capping

  • The subquery limits to the latest 1000 health_checks globally, not per node. Low-activity nodes may end up with base_score=0.0 due to missing rows and then be ordered almost entirely by randomness. If that’s intentional, fine; otherwise consider per-node capping (e.g., last N per node) or a time window (e.g., last 24h) to avoid global skew.
  • The MAX(...) over three ABS(RANDOM())/const draws is valid in SQLite and computed once per row in the CTE, as intended.

If desired, switch to per-node capping:

LEFT JOIN (
  -- Aggregate the last N checks per node instead of global top 1000
  SELECT node_id,
         SUM(CASE WHEN was_successful THEN 1 ELSE 0 END) AS success_count,
         SUM(CASE WHEN NOT was_successful THEN 1 ELSE 0 END) AS failure_count
  FROM (
    SELECT node_id, was_successful
    FROM health_checks hc
    WHERE EXISTS (SELECT 1 FROM monero_nodes n WHERE n.id = hc.node_id AND n.network = ?)
    QUALIFY ROW_NUMBER() OVER (PARTITION BY node_id ORDER BY timestamp DESC) <= 100
  )
  GROUP BY node_id
) stats ON n.id = stats.node_id

Note: SQLite lacks window functions before 3.25; if not available, emulate via correlated subqueries or keep the current global cap.

monero-rpc-pool/src/proxy.rs (8)

24-26: Nit: wording

“we use half of that that” → “we use half of that.”

-/// We assume this is a reasonable timeout. We use half of that that.
+/// We assume this is a reasonable timeout. We use half of that.

37-37: Pool size bump to 20

Fine, but keep an eye on the additional concurrent outbound connections and memory (especially with full buffering). If you see pressure, consider making this configurable.


94-125: Availability prioritization is good; make the checks concurrent to avoid N sequential awaits

The per-node has_available_connection checks run sequentially. With 20 nodes this can add tail latency. Collect them concurrently:

-    let mut nodes_with_availability = Vec::new();
-    for node in nodes.iter() {
-        let key = (node.0.clone(), node.1.clone(), node.2, use_tor);
-        let has_connection = state.connection_pool.has_available_connection(&key).await;
-        nodes_with_availability.push((node.clone(), has_connection));
-    }
+    let keys: Vec<_> = nodes
+        .iter()
+        .map(|n| (n.0.clone(), n.1.clone(), n.2, use_tor))
+        .collect();
+    let checks = futures::future::join_all(
+        keys.iter().map(|k| state.connection_pool.has_available_connection(k))
+    ).await;
+    let mut nodes_with_availability: Vec<_> =
+        nodes.into_iter().zip(checks.into_iter()).collect();

Note: sort_by is stable, so equal-availability nodes keep original order (good).


142-187: Hedged pairing flow is solid; minor error aggregation nuance

When a hedge pair fails, both nodes are recorded as failures with the same physical error, even if the second was never attempted due to immediate failure before hedge start. If you want accuracy in health metrics, only record the hedge as failed if it actually ran (i.e., was started and returned Err).

-            // Pair failed ... Record both nodes.
-            push_error(&mut collected_errors, node, HandlerError::PhyiscalError(e.clone()));
-            if let Some(hedge_node) = next.clone() {
-                push_error(&mut collected_errors, hedge_node, HandlerError::PhyiscalError(e));
-            }
+            // Pair failed; record main. Record hedge only if it actually ran and failed.
+            push_error(&mut collected_errors, node, HandlerError::PhyiscalError(e.clone()));
+            if let Some(hedge_node) = next.clone() {
+                // Optionally track whether the hedge started in proxy_to_node_with_hedge and bubble it up.
+                push_error(&mut collected_errors, hedge_node, HandlerError::PhyiscalError(e));
+            }

208-219: Full buffering mitigates partial-read issues but can blow memory for large responses

Blocks and bulk responses can be large. Consider a size cap and fall back to stream after a small prefix (you already have StreamableResponse for this). For example, buffer up to N MB for error-inspection and then stream the rest while still tracking bandwidth.


221-231: Heuristic “2 prior JSON-RPC errors” bypass

This heuristic is pragmatic, but it mixes transport success with application-level error. If upstreams consistently return JSON-RPC errors, treating them as success may poison client behavior. Consider whitelisting specific idempotent methods or error codes for the bypass, or record a separate “logical error” metric.


482-497: Good: mark connection failed on body read error

This avoids reusing a possibly-corrupted connection. Consider also marking failed on body read timeout if you add a hard timeout.


511-512: JSON-RPC parse fallback

Not treating unparseable JSON as error is acceptable for mixed endpoints. If you only expect JSON-RPC here, you may want to log at debug level to aid troubleshooting.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ccbb18 and 3ae9b48.

📒 Files selected for processing (10)
  • CHANGELOG.md (1 hunks)
  • monero-rpc-pool/.sqlx/query-44ddff5bdf5b56e9c1a9848641181de4441c8974b2d1304804874cf620420ad4.json (0 hunks)
  • monero-rpc-pool/.sqlx/query-4ce7c42906ba69e0c8e1c0dad952956edd582a0edecd45710e22dcb28785eeab.json (1 hunks)
  • monero-rpc-pool/src/bin/stress_test_downloader.rs (1 hunks)
  • monero-rpc-pool/src/connection_pool.rs (1 hunks)
  • monero-rpc-pool/src/database.rs (1 hunks)
  • monero-rpc-pool/src/proxy.rs (7 hunks)
  • monero-rpc-pool/src/types.rs (0 hunks)
  • swap/src/cli/api.rs (1 hunks)
  • swap/src/common/tracing_util.rs (1 hunks)
💤 Files with no reviewable changes (2)
  • monero-rpc-pool/.sqlx/query-44ddff5bdf5b56e9c1a9848641181de4441c8974b2d1304804874cf620420ad4.json
  • monero-rpc-pool/src/types.rs
🧰 Additional context used
📓 Path-based instructions (1)
CHANGELOG.md

📄 CodeRabbit Inference Engine (AGENT.md)

The CHANGELOG.md file is frequently edited and should be reviewed for accurate and clear changelog entries

Files:

  • CHANGELOG.md
🧠 Learnings (1)
📚 Learning: 2025-07-23T20:01:42.422Z
Learnt from: CR
PR: eigenwallet/core#0
File: AGENT.md:0-0
Timestamp: 2025-07-23T20:01:42.422Z
Learning: Applies to swap/Cargo.toml : The `swap/Cargo.toml` file is frequently edited and should be reviewed carefully for dependency and configuration changes

Applied to files:

  • swap/src/common/tracing_util.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cursor Bugbot

@@ -0,0 +1,28 @@
{
"db_name": "SQLite",
"query": "\n WITH scored AS (\n SELECT \n n.scheme,\n n.host,\n n.port,\n CASE \n WHEN (COALESCE(stats.success_count, 0) + COALESCE(stats.failure_count, 0)) > 0 \n THEN CAST(COALESCE(stats.success_count, 0) AS REAL) / CAST(COALESCE(stats.success_count, 0) + COALESCE(stats.failure_count, 0) AS REAL)\n ELSE 0.0 \n END as base_score,\n MAX(\n ABS(RANDOM()) / CAST(0x7fffffffffffffff AS REAL),\n ABS(RANDOM()) / CAST(0x7fffffffffffffff AS REAL),\n ABS(RANDOM()) / CAST(0x7fffffffffffffff AS REAL)\n ) as r\n FROM monero_nodes n\n LEFT JOIN (\n SELECT \n node_id,\n SUM(CASE WHEN was_successful THEN 1 ELSE 0 END) as success_count,\n SUM(CASE WHEN NOT was_successful THEN 1 ELSE 0 END) as failure_count\n FROM (\n SELECT node_id, was_successful\n FROM health_checks \n ORDER BY timestamp DESC \n LIMIT 1000\n ) recent_checks\n GROUP BY node_id\n ) stats ON n.id = stats.node_id\n WHERE n.network = ?\n )\n SELECT scheme, host, port\n FROM scored\n ORDER BY (base_score * r) DESC, r DESC\n LIMIT ?\n ",

🛠️ Refactor suggestion

Harden random factor computation to avoid rare overflow edge case

Using ABS(RANDOM()) risks overflow when RANDOM() returns INT64_MIN. Prefer masking the sign bit instead of ABS, and normalize with a constant literal to keep it purely numeric.

Suggested SQL change (update the source SQL and re-run sqlx prepare):

-                    MAX(
-                        ABS(RANDOM()) / CAST(0x7fffffffffffffff AS REAL),
-                        ABS(RANDOM()) / CAST(0x7fffffffffffffff AS REAL),
-                        ABS(RANDOM()) / CAST(0x7fffffffffffffff AS REAL)
-                    ) as r
+                    MAX(
+                        (RANDOM() & 0x7fffffffffffffff) / 9223372036854775807.0,
+                        (RANDOM() & 0x7fffffffffffffff) / 9223372036854775807.0,
+                        (RANDOM() & 0x7fffffffffffffff) / 9223372036854775807.0
+                    ) as r
📝 Committable suggestion


Suggested change
"query": "\n WITH scored AS (\n SELECT \n n.scheme,\n n.host,\n n.port,\n CASE \n WHEN (COALESCE(stats.success_count, 0) + COALESCE(stats.failure_count, 0)) > 0 \n THEN CAST(COALESCE(stats.success_count, 0) AS REAL) / CAST(COALESCE(stats.success_count, 0) + COALESCE(stats.failure_count, 0) AS REAL)\n ELSE 0.0 \n END as base_score,\n MAX(\n ABS(RANDOM()) / CAST(0x7fffffffffffffff AS REAL),\n ABS(RANDOM()) / CAST(0x7fffffffffffffff AS REAL),\n ABS(RANDOM()) / CAST(0x7fffffffffffffff AS REAL)\n ) as r\n FROM monero_nodes n\n LEFT JOIN (\n SELECT \n node_id,\n SUM(CASE WHEN was_successful THEN 1 ELSE 0 END) as success_count,\n SUM(CASE WHEN NOT was_successful THEN 1 ELSE 0 END) as failure_count\n FROM (\n SELECT node_id, was_successful\n FROM health_checks \n ORDER BY timestamp DESC \n LIMIT 1000\n ) recent_checks\n GROUP BY node_id\n ) stats ON n.id = stats.node_id\n WHERE n.network = ?\n )\n SELECT scheme, host, port\n FROM scored\n ORDER BY (base_score * r) DESC, r DESC\n LIMIT ?\n ",
"query": "\n WITH scored AS (\n SELECT \n n.scheme,\n n.host,\n n.port,\n CASE \n WHEN (COALESCE(stats.success_count, 0) + COALESCE(stats.failure_count, 0)) > 0 \n THEN CAST(COALESCE(stats.success_count, 0) AS REAL) / CAST(COALESCE(stats.success_count, 0) + COALESCE(stats.failure_count, 0) AS REAL)\n ELSE 0.0 \n END as base_score,\n MAX(\n (RANDOM() & 0x7fffffffffffffff) / 9223372036854775807.0,\n (RANDOM() & 0x7fffffffffffffff) / 9223372036854775807.0,\n (RANDOM() & 0x7fffffffffffffff) / 9223372036854775807.0\n ) as r\n FROM monero_nodes n\n LEFT JOIN (\n SELECT \n node_id,\n SUM(CASE WHEN was_successful THEN 1 ELSE 0 END) as success_count,\n SUM(CASE WHEN NOT was_successful THEN 1 ELSE 0 END) as failure_count\n FROM (\n SELECT node_id, was_successful\n FROM health_checks \n ORDER BY timestamp DESC \n LIMIT 1000\n ) recent_checks\n GROUP BY node_id\n ) stats ON n.id = stats.node_id\n WHERE n.network = ?\n )\n SELECT scheme, host, port\n FROM scored\n ORDER BY (base_score * r) DESC, r DESC\n LIMIT ?\n ",
🤖 Prompt for AI Agents
In
monero-rpc-pool/.sqlx/query-4ce7c42906ba69e0c8e1c0dad952956edd582a0edecd45710e22dcb28785eeab.json
at line 3, the use of ABS(RANDOM()) can cause overflow if RANDOM() returns
INT64_MIN. To fix this, replace ABS(RANDOM()) with a bitwise operation that
masks the sign bit to ensure a non-negative value, then normalize by dividing by
a constant numeric literal. Update the SQL query accordingly and re-run sqlx
prepare to regenerate the query metadata.
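The overflow this comment warns about, and the masking fix, can be reproduced in Rust. This sketch mirrors the SQLite expression (SQLite's RANDOM() returns a signed 64-bit integer):

```rust
// Mirrors the review's point: |i64::MIN| is not representable in two's
// complement, so ABS() on a random signed 64-bit value has an overflow edge
// case. Masking the sign bit avoids it and still yields a value in [0, 1].
fn normalize_masked(raw: i64) -> f64 {
    // Same as SQLite's (RANDOM() & 0x7fffffffffffffff) / 9223372036854775807.0
    ((raw as u64) & 0x7fff_ffff_ffff_ffff) as f64 / 9_223_372_036_854_775_807.0
}

fn main() {
    // ABS(i64::MIN) would overflow; checked_abs makes that explicit.
    assert!(i64::MIN.checked_abs().is_none());
    // The masked version handles the same input without panicking.
    assert_eq!(normalize_masked(i64::MIN), 0.0);
    assert_eq!(normalize_masked(i64::MAX), 1.0);
    println!("masking avoids the ABS overflow edge case");
}
```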

💡 Verification agent

🧩 Analysis chain

Limit “recent checks” to the target network to avoid cross-network bias

The inner LIMIT 1000 orders by timestamp across all nodes. If another network produces many checks, stats for the selected network can be underrepresented, pushing many nodes to base_score=0.

Proposed approach (you’ll need to pass the network parameter to the inner subquery as well, or use a named/ordinal parameter so it can be reused):

LEFT JOIN (
  SELECT node_id,
         SUM(CASE WHEN was_successful THEN 1 ELSE 0 END) AS success_count,
         SUM(CASE WHEN NOT was_successful THEN 1 ELSE 0 END) AS failure_count
  FROM (
    SELECT hc.node_id, hc.was_successful
    FROM health_checks hc
    JOIN monero_nodes n2 ON n2.id = hc.node_id
    WHERE n2.network = ?              -- same network as outer query
    ORDER BY hc.timestamp DESC
    LIMIT 1000
  ) recent_checks
  GROUP BY node_id
) stats ON n.id = stats.node_id

Action to verify:

  • Ensure the Rust call binds the extra parameter (or uses a reusable named/ordinal parameter).
  • Re-run sqlx prepare to refresh this file.

🏁 Script executed:

#!/bin/bash
# Search for references to health_checks across the repository
rg -n "health_checks" --color=never

Length of output: 2193


Add network filter to the inner recent_checks subquery
Limit health checks to the target network in the inner query to avoid cross-network bias.

• In your SQL (in src/database.rs where you build the WITH scored AS (…) query), update the subquery to:

FROM (
-   SELECT node_id, was_successful
-   FROM health_checks
-   ORDER BY timestamp DESC
-   LIMIT 1000
+   SELECT hc.node_id, hc.was_successful
+   FROM health_checks hc
+   JOIN monero_nodes n2 ON n2.id = hc.node_id
+   WHERE n2.network = ?             -- same network parameter as outer query
+   ORDER BY hc.timestamp DESC
+   LIMIT 1000
) recent_checks

• Update the Rust call (e.g. your sqlx::query! or query_file! invocation) to bind the extra network parameter (or switch to a named/ordinal parameter so it’s reused).
• Re-run sqlx prepare to regenerate the JSON under .sqlx/ with the new placeholder.

🤖 Prompt for AI Agents
In monero-rpc-pool/src/database.rs around the query building for the WITH scored
AS (...) clause, the inner recent_checks subquery lacks a filter for the target
network, causing cross-network data mixing. Modify the inner subquery to include
a WHERE clause filtering health_checks by the same network parameter used in the
outer query. Then update the Rust sqlx query invocation to bind this additional
network parameter accordingly, ensuring consistent parameter usage. Finally, run
sqlx prepare to regenerate the updated query JSON file under .sqlx/.

Comment on lines +28 to +30
/// If the main node does not finish within this period, we start a hedged request.
static SOFT_TIMEOUT: Duration = TIMEOUT.checked_div(2).unwrap();


🛠️ Refactor suggestion

⚠️ Potential issue

Add a hard timeout for request execution, not just connection establishment

SOFT_TIMEOUT hedges the main request, but there’s no hard cap on the total duration of sending the request and reading the response body; if both main and hedge stall, the request can hang indefinitely. Wrap the entire single-node execution in a hard timeout.

Example (inside proxy_to_single_node):

-    // Forward the request to the node. URI stays relative, so no rewrite.
-    let response = match guarded_sender.send_request(request.to_request()).await {
+    // Forward the request and enforce a hard cap on response time.
+    let response = match tokio::time::timeout(TIMEOUT, guarded_sender.send_request(request.to_request())).await {
-        Ok(response) => response,
+        Ok(Ok(response)) => response,
+        Ok(Err(e)) => {
+            guarded_sender.mark_failed().await;
+            return Err(SingleRequestError::SendRequestError(e.to_string()));
+        }
+        Err(_) => {
+            // Timed out while awaiting response headers
+            guarded_sender.mark_failed().await;
+            return Err(SingleRequestError::Timeout("Request timed out".to_string()));
+        }
     };

And similarly time-bound the full-body collection:

-    let body_bytes = match body.collect().await {
+    let body_bytes = match tokio::time::timeout(TIMEOUT, body.collect()).await {
-        Ok(collected) => collected.to_bytes().to_vec(),
+        Ok(Ok(collected)) => collected.to_bytes().to_vec(),
+        Ok(Err(e)) => { /* existing error path */ }
+        Err(_) => {
+            guarded_sender.mark_failed().await;
+            return Err(SingleRequestError::Timeout("Body read timed out".to_string()));
+        }
     };

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In monero-rpc-pool/src/proxy.rs around lines 28 to 30, the current SOFT_TIMEOUT
only limits the connection establishment phase but does not enforce a hard
timeout on the entire request execution including sending and receiving data. To
fix this, introduce a hard timeout that wraps the entire single-node request
execution inside the proxy_to_single_node function. Use a timeout mechanism to
ensure the full request, including reading the response body, is bounded and
cannot hang indefinitely.

Comment on lines +297 to +383
/// Proxies a singular axum::Request to a given main node with a specified hedge node.
/// If the main node's response hasn't finished after SOFT_TIMEOUT, we proxy to the hedge node.
/// We then race the two responses and return the one that finishes first (and is not an error).
async fn proxy_to_node_with_hedge(
    state: &crate::AppState,
    request: CloneableRequest,
    main_node: &(String, String, u16),
    hedge_node: &(String, String, u16),
) -> Result<(Response, (String, String, u16)), SingleRequestError> {
    use std::future::Future;

    // Start the main request immediately
    let mut main_fut = Box::pin(proxy_to_single_node(state, request.clone(), main_node));

    // Hedge request will be started after the soft timeout, unless the main fails first
    let mut hedge_fut: Option<
        Pin<Box<dyn Future<Output = Result<Response, SingleRequestError>> + Send>>,
    > = None;

    // Timer to trigger the hedge request
    let mut soft_timer = Box::pin(tokio::time::sleep(SOFT_TIMEOUT));
    let mut soft_timer_armed = true;

    // If the main fails, keep its error to return if hedge also fails
    let mut main_error: Option<SingleRequestError> = None;

    loop {
        // A future that awaits the hedge if present; otherwise stays pending
        let mut hedge_wait = futures::future::poll_fn(|cx| {
            if let Some(f) = hedge_fut.as_mut() {
                f.as_mut().poll(cx)
            } else {
                std::task::Poll::Pending
            }
        });

        tokio::select! {
            res = &mut main_fut => {
                match res {
                    Ok(resp) => return Ok((resp, main_node.clone())),
                    Err(err) => {
                        // Start hedge immediately if not yet started
                        main_error = Some(err);
                        if hedge_fut.is_none() {
                            tracing::debug!("Starting hedge request");
                            hedge_fut = Some(Box::pin(proxy_to_single_node(state, request.clone(), hedge_node)));
                        }

                        // If hedge exists, await it and prefer its result
                        if let Some(hf) = &mut hedge_fut {
                            let hedge_res = hf.await;
                            return hedge_res
                                .map(|resp| (resp, hedge_node.clone()))
                                .or_else(|_| Err(main_error.take().unwrap()));
                        } else {
                            return Err(main_error.take().unwrap());
                        }
                    }
                }
            }

            // Start hedge after soft timeout if not already started
            _ = &mut soft_timer, if soft_timer_armed => {
                // Disarm timer so it does not keep firing
                soft_timer_armed = false;
                if hedge_fut.is_none() {
                    tracing::debug!("Starting hedge request");
                    hedge_fut = Some(Box::pin(proxy_to_single_node(state, request.clone(), hedge_node)));
                }
            }

            // If hedge is started, also race it
            res = &mut hedge_wait => {
                match res {
                    Ok(resp) => return Ok((resp, hedge_node.clone())),
                    Err(_hedge_err) => {
                        // Hedge failed; if main already failed, return main's error, otherwise keep waiting on main
                        if let Some(err) = main_error.take() {
                            return Err(err);
                        }
                        hedge_fut = None;
                    }
                }
            }
        }
    }
}

🛠️ Refactor suggestion

⚠️ Potential issue

Cancellation-safety risk: dropping the losing future may release the mutex while a request is mid-flight

Hedging drops the non-winning future when the winner returns. If the losing future is inside proxy_to_single_node and currently in send_request or reading the body, dropping it will drop the GuardedSender, releasing the per-connection mutex before the in-flight exchange is cleanly aborted. That can allow the same connection to be reused prematurely, risking protocol corruption.

Mitigations:

  • Make proxy_to_single_node cancellation-safe by catching cancellation and marking the connection as failed in a Drop guard (or ensure that any early drop marks the sender failed).
  • Or run each single-node request on a dedicated connection (opt-in for hedged paths) to avoid reusing a connection after abort.
  • Or spawn tasks and abort the loser; ensure aborted tasks mark their connection failed.

Minimal change suggestion: have proxy_to_single_node wrap its body with tokio::select! on a cancellation token, and on cancellation call guarded_sender.mark_failed().await before returning.
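The "mark the connection as failed in a Drop guard" mitigation can be sketched std-only; the guard type and the atomic flag standing in for the pool's failure state are hypothetical, not from the PR:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Hypothetical guard: if the request future is dropped (e.g. it lost the
// hedge race) before the exchange completed cleanly, mark the connection
// failed so the pool does not reuse it mid-protocol.
struct FailOnDrop {
    failed_flag: Arc<AtomicBool>,
    completed: bool,
}

impl FailOnDrop {
    fn new(failed_flag: Arc<AtomicBool>) -> Self {
        Self {
            failed_flag,
            completed: false,
        }
    }

    /// Call only after the response body has been fully read.
    fn finish(mut self) {
        self.completed = true;
    }
}

impl Drop for FailOnDrop {
    fn drop(&mut self) {
        if !self.completed {
            // Cancelled or errored mid-flight: poison the connection.
            self.failed_flag.store(true, Ordering::SeqCst);
        }
    }
}
```

In the real code the flag would be replaced by `guarded_sender.mark_failed()`; the point is that cancellation takes the same cleanup path as an error, so the per-connection mutex is never released around a half-finished exchange.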

🤖 Prompt for AI Agents
In monero-rpc-pool/src/proxy.rs between lines 297 and 383, the current hedging
logic drops the losing future immediately, which can prematurely release the
per-connection mutex inside proxy_to_single_node, risking protocol corruption.
To fix this, modify proxy_to_single_node to be cancellation-safe by wrapping its
main body in a tokio::select! that listens for a cancellation signal; upon
cancellation, ensure it calls guarded_sender.mark_failed().await before
returning to mark the connection as failed and prevent reuse. This minimal
change will safely handle dropped futures without corrupting connection state.

Comment on lines +863 to 866
seed_choice =
request_seed_choice(tauri_handle.clone().unwrap(), database)
.await?;
continue;

⚠️ Potential issue

Avoid unwrap on optional Tauri handle — potential panic path

This unwrap can panic if invariants drift (e.g., future refactor passes seed_choice=Some with tauri_handle=None). Handle gracefully to avoid crashing.

Apply:

-                                seed_choice =
-                                    request_seed_choice(tauri_handle.clone().unwrap(), database)
-                                        .await?;
+                                let handle = match &tauri_handle {
+                                    Some(h) => h.clone(),
+                                    None => {
+                                        tracing::error!("Cannot prompt for wallet selection without a Tauri handle");
+                                        return Err(anyhow::anyhow!("Missing Tauri handle while prompting for wallet selection"));
+                                    }
+                                };
+                                seed_choice = request_seed_choice(handle, database).await?;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
seed_choice =
request_seed_choice(tauri_handle.clone().unwrap(), database)
.await?;
continue;
let handle = match &tauri_handle {
Some(h) => h.clone(),
None => {
tracing::error!("Cannot prompt for wallet selection without a Tauri handle");
return Err(anyhow::anyhow!("Missing Tauri handle while prompting for wallet selection"));
}
};
seed_choice = request_seed_choice(handle, database).await?;
continue;
🤖 Prompt for AI Agents
In swap/src/cli/api.rs around lines 863 to 866, the code unwraps an optional
Tauri handle which can cause a panic if the handle is None. To fix this, replace
the unwrap with proper error handling by checking if the Tauri handle is Some
before calling request_seed_choice. If it is None, handle the error gracefully,
for example by returning an error or skipping the operation, to avoid crashing
the application.

@binarybaron binarybaron merged commit 3163ca7 into master Aug 10, 2025
3 of 31 checks passed

Bug: Connection Poisoning in Proxy Hedge

In proxy_to_node_with_hedge, dropping the "losing" request future releases its GuardedSender mutex without draining the in-flight HTTP/1 response or marking the connection as failed. This leaves a poisoned connection in the pool, allowing subsequent requests to reuse it, leading to intermittent protocol errors such as "end of file before message length reached" or send_request errors.

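An alternative to poisoning the connection outright is to drain the unread remainder of the HTTP/1 response before returning the connection to the pool. A std-only sketch of that idea (the helper name is hypothetical; the real code would operate on hyper's body stream):

```rust
use std::io::Read;

// Discard any unread body bytes so a keep-alive connection can be reused
// without the next request seeing leftover response data.
// Returns how many bytes were discarded.
fn drain_remaining<R: Read>(reader: &mut R) -> std::io::Result<u64> {
    std::io::copy(reader, &mut std::io::sink())
}
```

Draining only works when the remaining length is known and small; for an aborted mid-flight exchange, closing the connection (as the bug report suggests) is the safer default.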
