feat(monero-rpc-pool): Add randomness to node selection, prefer established TCP circuits (#508)
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Caution: Review failed. The pull request is closed.

Walkthrough

This change set updates the Monero RPC pool's node selection to prioritize nodes with pre-established TCP connections and introduces hedged request logic with soft timeouts. It also replaces strict reliability scoring with success-rate-based randomness, increases the proxy pool size, and fully buffers responses before processing. Several methods and SQL queries were updated or replaced to support the new node selection and request handling strategies.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Proxy
    participant NodePool
    participant NodeA
    participant NodeB
    Client->>Proxy: Send RPC request
    Proxy->>NodePool: Get prioritized node list (by available connection)
    NodePool-->>Proxy: Return sorted node list
    loop For each node pair (main, hedge)
        Proxy->>NodeA: Send request to main node
        Note right of Proxy: Wait SOFT_TIMEOUT
        Proxy->>NodeB: (if main not finished) Send request to hedge node
        alt Main node responds first
            NodeA-->>Proxy: Response
            Proxy-->>Client: Return response
        else Hedge node responds first
            NodeB-->>Proxy: Response
            Proxy-->>Client: Return response
        else Both fail
            Proxy-->>Client: Aggregate errors
        end
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
Actionable comments posted: 5
🧹 Nitpick comments (15)
monero-rpc-pool/.sqlx/query-4ce7c42906ba69e0c8e1c0dad952956edd582a0edecd45710e22dcb28785eeab.json (1)
3-3: Consider smoothing for unseen nodes (base_score=0 → exploration starvation)Defaulting base_score to 0 fully deprioritizes nodes without recent checks, which can inhibit exploration. Consider Laplace smoothing or a neutral prior (e.g., treat “no data” as 0.5) so new/rarely-checked nodes occasionally get sampled.
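A minimal sketch of the suggested smoothing, assuming hypothetical aggregated counts from health_checks; the function name `smoothed_score` and the add-one (Beta(1,1)) prior are illustrative, not the pool's API:

```rust
// Laplace (add-one) smoothing: an unseen node gets a neutral 0.5 prior
// instead of the 0.0 that fully starves it of traffic.
fn smoothed_score(successes: u64, failures: u64) -> f64 {
    (successes as f64 + 1.0) / ((successes + failures) as f64 + 2.0)
}

fn main() {
    assert_eq!(smoothed_score(0, 0), 0.5); // no data: neutral prior, still sampled
    assert!(smoothed_score(99, 1) > 0.9);  // reliable node stays strongly preferred
    assert!(smoothed_score(0, 10) > 0.0);  // failing node never hits a hard zero
    println!("ok");
}
```

With this shape, the `ELSE 0.0` branch of the SQL `CASE` would become `ELSE 0.5`, or the smoothing could be folded into the success-rate expression itself.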
monero-rpc-pool/src/bin/stress_test_downloader.rs (1)
146-147: Make HTTP timeout configurable and set an explicit connect timeoutHard-coding 10m30s can be too high/low depending on environment. Expose as a CLI flag and set a connect timeout to fail fast on dead endpoints.
Patch within this hunk:
```diff
- .timeout(std::time::Duration::from_secs(10 * 60 + 30)) // used in wallet2
+ .connect_timeout(std::time::Duration::from_secs(15))
+ .timeout(std::time::Duration::from_secs(args.http_timeout_secs))
```

And add a CLI option (apply outside this hunk):
```rust
// In Args
#[arg(long, default_value = "630")]
#[arg(help = "HTTP request timeout in seconds (default 10m30s)")]
http_timeout_secs: u64,
```
79-80: monero_rpc_pool moved to “our crates” at TRACE/DEBUG levels — verify intended log volumeThis promotes monero_rpc_pool from INFO-only to full TRACE/DEBUG in multiple layers. Useful while iterating on the pool, but it may flood logs and Tauri event sinks.
- Confirm this is intended for production builds.
- Optionally gate verbose pool logs behind an env var (e.g., MONERO_RPC_POOL_VERBOSE=1) or keep it in INFO_LEVEL_CRATES for the Tauri/terminal layers while retaining TRACE in file layers.
Also applies to: 82-83
CHANGELOG.md (1)
13-13: Clarify entry to reflect selection randomness and hedged requestsCurrent line mentions prioritizing pre-established TCP connections. Consider also noting randomized success-rate weighting and hedged requests for completeness.
Suggested wording:
- GUI + CLI + ASB: Monero RPC pool now prioritizes nodes with pre-established TCP circuits, introduces success-rate-weighted randomized selection, and uses hedged requests with soft timeouts to reduce tail latency.
monero-rpc-pool/src/connection_pool.rs (1)
179-191: Non-blocking availability probe looks good; minor cleanup possibleLogic is correct and low-contention. Optional micro-cleanup for readability:
```diff
- let vec = vec_lock.read().await;
- for sender_mutex in vec.iter() {
-     if sender_mutex.try_lock().is_ok() {
-         return true;
-     }
- }
+ let vec = vec_lock.read().await;
+ if vec.iter().any(|m| m.try_lock().is_ok()) {
+     return true;
+ }
```
241-245: Comment clarifies intent; consider recording the randomness approach in changelog/specThe “max-of-3” bias toward 1 is a sensible way to keep top nodes preferred while adding shuffle. Documenting this briefly in the node-selection docs/changelog will help future maintainers tune it knowingly (e.g., change the number of draws).
247-281: Randomization SQL is fine; validate the “recent 1000 checks” global window and consider per-node capping
- The subquery limits to the latest 1000 health_checks globally, not per node. Low-activity nodes may end up with base_score=0.0 due to missing rows and then be ordered almost entirely by randomness. If that’s intentional, fine; otherwise consider per-node capping (e.g., last N per node) or a time window (e.g., last 24h) to avoid global skew.
- The MAX(...) over three ABS(RANDOM())/const draws is valid in SQLite and computed once per row in the CTE, as intended.
If desired, switch to per-node capping:
```sql
LEFT JOIN (
    -- Aggregate the last N checks per node instead of the global top 1000
    SELECT node_id,
           SUM(CASE WHEN was_successful THEN 1 ELSE 0 END) AS success_count,
           SUM(CASE WHEN NOT was_successful THEN 1 ELSE 0 END) AS failure_count
    FROM (
        SELECT node_id, was_successful,
               ROW_NUMBER() OVER (PARTITION BY node_id ORDER BY timestamp DESC) AS rn
        FROM health_checks hc
        WHERE EXISTS (SELECT 1 FROM monero_nodes n WHERE n.id = hc.node_id AND n.network = ?)
    )
    WHERE rn <= 100
    GROUP BY node_id
) stats ON n.id = stats.node_id
```

Note: SQLite does not support QUALIFY, so the per-node cap uses ROW_NUMBER() in an inner select filtered by an outer WHERE; window functions require SQLite 3.25+. If not available, emulate via correlated subqueries or keep the current global cap.
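To see why the max-of-three draw keeps top nodes preferred while still shuffling, here is a small standalone simulation. The LCG is an illustrative stand-in for SQLite's RANDOM(); nothing here is the pool's code. The max of k uniform draws has CDF x^k, so its expected value is k/(k+1), biasing the random factor toward 1:

```rust
// Deterministic LCG (MMIX constants) standing in for SQLite's RANDOM(),
// normalized to [0, 1).
fn lcg(state: &mut u64) -> f64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    (*state >> 11) as f64 / (1u64 << 53) as f64
}

// Take the max of k uniform draws, as the SQL's MAX(..., ..., ...) does.
fn max_of_k(state: &mut u64, k: usize) -> f64 {
    (0..k).map(|_| lcg(state)).fold(0.0, f64::max)
}

fn main() {
    let mut s = 42u64;
    let n = 10_000;
    let mean_1: f64 = (0..n).map(|_| max_of_k(&mut s, 1)).sum::<f64>() / n as f64;
    let mean_3: f64 = (0..n).map(|_| max_of_k(&mut s, 3)).sum::<f64>() / n as f64;
    // E[max of 3 uniforms] = 3/4, versus E[single uniform] = 1/2.
    assert!(mean_3 > mean_1);
    assert!((mean_3 - 0.75).abs() < 0.05);
    println!("ok");
}
```

Tuning the number of draws directly tunes how strongly the randomness favors the top of the score ordering.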
monero-rpc-pool/src/proxy.rs (8)
24-26: Nit: wording“we use half of that that” → “we use half of that.”
```diff
-/// We assume this is a reasonable timeout. We use half of that that.
+/// We assume this is a reasonable timeout. We use half of that.
```
37-37: Pool size bump to 20Fine, but keep an eye on the additional concurrent outbound connections and memory (especially with full buffering). If you see pressure, consider making this configurable.
94-125: Availability prioritization is good; make the checks concurrent to avoid N sequential awaitsThe per-node has_available_connection checks run sequentially. With 20 nodes this can add tail latency. Collect them concurrently:
```diff
- let mut nodes_with_availability = Vec::new();
- for node in nodes.iter() {
-     let key = (node.0.clone(), node.1.clone(), node.2, use_tor);
-     let has_connection = state.connection_pool.has_available_connection(&key).await;
-     nodes_with_availability.push((node.clone(), has_connection));
- }
+ let keys: Vec<_> = nodes
+     .iter()
+     .map(|n| (n.0.clone(), n.1.clone(), n.2, use_tor))
+     .collect();
+ let checks = futures::future::join_all(
+     keys.iter().map(|k| state.connection_pool.has_available_connection(k))
+ ).await;
+ let mut nodes_with_availability: Vec<_> =
+     nodes.into_iter().zip(checks.into_iter()).collect();
```

Note: sort_by is stable, so equal-availability nodes keep original order (good).
142-187: Hedged pairing flow is solid; minor error aggregation nuanceWhen a hedge pair fails, both nodes are recorded as failures with the same physical error, even if the second was never attempted due to immediate failure before hedge start. If you want accuracy in health metrics, only record the hedge as failed if it actually ran (i.e., was started and returned Err).
```diff
- // Pair failed ... Record both nodes.
- push_error(&mut collected_errors, node, HandlerError::PhyiscalError(e.clone()));
- if let Some(hedge_node) = next.clone() {
-     push_error(&mut collected_errors, hedge_node, HandlerError::PhyiscalError(e));
- }
+ // Pair failed; record main. Record hedge only if it actually ran and failed.
+ push_error(&mut collected_errors, node, HandlerError::PhyiscalError(e.clone()));
+ if let Some(hedge_node) = next.clone() {
+     // Optionally track whether the hedge started in proxy_to_node_with_hedge and bubble it up.
+     push_error(&mut collected_errors, hedge_node, HandlerError::PhyiscalError(e));
+ }
```
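For intuition, the soft-timeout hedging flow can be sketched with plain threads. The latencies and the `hedged` helper are simulated and illustrative; the real pool races async futures rather than spawning threads:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Start the main request, wait a soft timeout, then launch the hedge and
// take whichever answers first. Node latencies are simulated sleeps.
fn hedged(main_ms: u64, hedge_ms: u64, soft_timeout: Duration) -> &'static str {
    let (tx, rx) = mpsc::channel();
    let tx_main = tx.clone();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(main_ms));
        let _ = tx_main.send("main");
    });
    // If the main node answers within the soft timeout, the hedge never starts.
    if let Ok(winner) = rx.recv_timeout(soft_timeout) {
        return winner;
    }
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(hedge_ms));
        let _ = tx.send("hedge");
    });
    // Race: first response on the channel wins.
    rx.recv().expect("at least one node responds")
}

fn main() {
    // Fast main: soft timeout never fires, no hedge traffic.
    assert_eq!(hedged(10, 10, Duration::from_millis(100)), "main");
    // Slow main, fast hedge: the hedge wins the race.
    assert_eq!(hedged(500, 20, Duration::from_millis(50)), "hedge");
    println!("ok");
}
```

The key property is that a fast main node costs nothing extra, while a stalled one only adds one soft-timeout's worth of latency before a second node is tried.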
208-219: Full buffering mitigates partial-read issues but can blow memory for large responsesBlocks and bulk responses can be large. Consider a size cap and fall back to stream after a small prefix (you already have StreamableResponse for this). For example, buffer up to N MB for error-inspection and then stream the rest while still tracking bandwidth.
221-231: Heuristic “2 prior JSON-RPC errors” bypassThis heuristic is pragmatic, but it mixes transport success with application-level error. If upstreams consistently return JSON-RPC errors, treating them as success may poison client behavior. Consider whitelisting specific idempotent methods or error codes for the bypass, or record a separate “logical error” metric.
482-497: Good: mark connection failed on body read errorThis avoids reusing a possibly-corrupted connection. Consider also marking failed on body read timeout if you add a hard timeout.
511-512: JSON-RPC parse fallbackNot treating unparseable JSON as error is acceptable for mixed endpoints. If you only expect JSON-RPC here, you may want to log at debug level to aid troubleshooting.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- CHANGELOG.md (1 hunks)
- monero-rpc-pool/.sqlx/query-44ddff5bdf5b56e9c1a9848641181de4441c8974b2d1304804874cf620420ad4.json (0 hunks)
- monero-rpc-pool/.sqlx/query-4ce7c42906ba69e0c8e1c0dad952956edd582a0edecd45710e22dcb28785eeab.json (1 hunks)
- monero-rpc-pool/src/bin/stress_test_downloader.rs (1 hunks)
- monero-rpc-pool/src/connection_pool.rs (1 hunks)
- monero-rpc-pool/src/database.rs (1 hunks)
- monero-rpc-pool/src/proxy.rs (7 hunks)
- monero-rpc-pool/src/types.rs (0 hunks)
- swap/src/cli/api.rs (1 hunks)
- swap/src/common/tracing_util.rs (1 hunks)
💤 Files with no reviewable changes (2)
- monero-rpc-pool/.sqlx/query-44ddff5bdf5b56e9c1a9848641181de4441c8974b2d1304804874cf620420ad4.json
- monero-rpc-pool/src/types.rs
🧰 Additional context used
📓 Path-based instructions (1)
CHANGELOG.md
📄 CodeRabbit Inference Engine (AGENT.md)
The CHANGELOG.md file is frequently edited and should be reviewed for accurate and clear changelog entries
Files:
CHANGELOG.md
🧠 Learnings (1)
📚 Learning: 2025-07-23T20:01:42.422Z
Learnt from: CR
PR: eigenwallet/core#0
File: AGENT.md:0-0
Timestamp: 2025-07-23T20:01:42.422Z
Learning: Applies to swap/Cargo.toml : The `swap/Cargo.toml` file is frequently edited and should be reviewed carefully for dependency and configuration changes
Applied to files:
swap/src/common/tracing_util.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor Bugbot
@@ -0,0 +1,28 @@ (new file). The "db_name" is "SQLite", and the "query" field on line 3 contains the following SQL (shown unescaped for readability):

```sql
WITH scored AS (
    SELECT
        n.scheme,
        n.host,
        n.port,
        CASE
            WHEN (COALESCE(stats.success_count, 0) + COALESCE(stats.failure_count, 0)) > 0
            THEN CAST(COALESCE(stats.success_count, 0) AS REAL) / CAST(COALESCE(stats.success_count, 0) + COALESCE(stats.failure_count, 0) AS REAL)
            ELSE 0.0
        END as base_score,
        MAX(
            ABS(RANDOM()) / CAST(0x7fffffffffffffff AS REAL),
            ABS(RANDOM()) / CAST(0x7fffffffffffffff AS REAL),
            ABS(RANDOM()) / CAST(0x7fffffffffffffff AS REAL)
        ) as r
    FROM monero_nodes n
    LEFT JOIN (
        SELECT
            node_id,
            SUM(CASE WHEN was_successful THEN 1 ELSE 0 END) as success_count,
            SUM(CASE WHEN NOT was_successful THEN 1 ELSE 0 END) as failure_count
        FROM (
            SELECT node_id, was_successful
            FROM health_checks
            ORDER BY timestamp DESC
            LIMIT 1000
        ) recent_checks
        GROUP BY node_id
    ) stats ON n.id = stats.node_id
    WHERE n.network = ?
)
SELECT scheme, host, port
FROM scored
ORDER BY (base_score * r) DESC, r DESC
LIMIT ?
```
🛠️ Refactor suggestion
Harden random factor computation to avoid rare overflow edge case
Using ABS(RANDOM()) risks overflow when RANDOM() returns INT64_MIN. Prefer masking the sign bit instead of ABS, and normalize with a constant literal to keep it purely numeric.
Suggested SQL change (update the source SQL and re-run sqlx prepare):
- MAX(
- ABS(RANDOM()) / CAST(0x7fffffffffffffff AS REAL),
- ABS(RANDOM()) / CAST(0x7fffffffffffffff AS REAL),
- ABS(RANDOM()) / CAST(0x7fffffffffffffff AS REAL)
- ) as r
+ MAX(
+ (RANDOM() & 0x7fffffffffffffff) / 9223372036854775807.0,
+ (RANDOM() & 0x7fffffffffffffff) / 9223372036854775807.0,
+ (RANDOM() & 0x7fffffffffffffff) / 9223372036854775807.0
+ ) as r
🤖 Prompt for AI Agents
In
monero-rpc-pool/.sqlx/query-4ce7c42906ba69e0c8e1c0dad952956edd582a0edecd45710e22dcb28785eeab.json
at line 3, the use of ABS(RANDOM()) can cause overflow if RANDOM() returns
INT64_MIN. To fix this, replace ABS(RANDOM()) with a bitwise operation that
masks the sign bit to ensure a non-negative value, then normalize by dividing by
a constant numeric literal. Update the SQL query accordingly and re-run sqlx
prepare to regenerate the query metadata.
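A quick standalone check of the overflow argument, in pure Rust mirroring the SQL semantics (nothing here is the pool's code): `abs()` of the minimum signed 64-bit value has no representable result, while masking the sign bit always yields a non-negative value that normalizes cleanly.

```rust
fn main() {
    // ABS(i64::MIN) overflows: there is no +2^63 in i64.
    // In Rust this is a checked-arithmetic None; in SQLite, a runtime error.
    assert!(i64::MIN.checked_abs().is_none());

    // Masking the sign bit maps the full range into [0, i64::MAX].
    let masked = i64::MIN & 0x7fff_ffff_ffff_ffff;
    assert_eq!(masked, 0);
    assert!(i64::MAX & 0x7fff_ffff_ffff_ffff >= 0);

    // Normalizing by i64::MAX as a float literal keeps the factor in [0, 1].
    let normalized = masked as f64 / 9223372036854775807.0;
    assert!((0.0..=1.0).contains(&normalized));
    println!("ok");
}
```

The failure is rare (RANDOM() returns INT64_MIN with probability 2^-64 per draw), but the masked form removes the edge case entirely at no cost.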
💡 Verification agent
🧩 Analysis chain
Limit “recent checks” to the target network to avoid cross-network bias
The inner LIMIT 1000 orders by timestamp across all nodes. If another network produces many checks, stats for the selected network can be underrepresented, pushing many nodes to base_score=0.
Proposed approach (you’ll need to pass the network parameter to the inner subquery as well, or use a named/ordinal parameter so it can be reused):
```sql
LEFT JOIN (
    SELECT node_id,
        SUM(CASE WHEN was_successful THEN 1 ELSE 0 END) AS success_count,
        SUM(CASE WHEN NOT was_successful THEN 1 ELSE 0 END) AS failure_count
    FROM (
        SELECT hc.node_id, hc.was_successful
        FROM health_checks hc
        JOIN monero_nodes n2 ON n2.id = hc.node_id
        WHERE n2.network = ?   -- same network as outer query
        ORDER BY hc.timestamp DESC
        LIMIT 1000
    ) recent_checks
    GROUP BY node_id
) stats ON n.id = stats.node_id
```

Action to verify:
- Ensure the Rust call binds the extra parameter (or uses a reusable named/ordinal parameter).
- Re-run sqlx prepare to refresh this file.
🏁 Script executed:

```shell
#!/bin/bash
# Search for references to health_checks across the repository
rg -n "health_checks" --color=never
```

Length of output: 2193
Add network filter to the inner recent_checks subquery
Limit health checks to the target network in the inner query to avoid cross-network bias.
• In your SQL (in src/database.rs where you build the WITH scored AS (…) query), update the subquery to:
```diff
 FROM (
-    SELECT node_id, was_successful
-    FROM health_checks
-    ORDER BY timestamp DESC
-    LIMIT 1000
+    SELECT hc.node_id, hc.was_successful
+    FROM health_checks hc
+    JOIN monero_nodes n2 ON n2.id = hc.node_id
+    WHERE n2.network = ? -- same network parameter as outer query
+    ORDER BY hc.timestamp DESC
+    LIMIT 1000
 ) recent_checks
```

• Update the Rust call (e.g. your sqlx::query! or query_file! invocation) to bind the extra network parameter (or switch to a named/ordinal parameter so it is reused).
• Re-run sqlx prepare to regenerate the JSON under .sqlx/ with the new placeholder.
🤖 Prompt for AI Agents
In monero-rpc-pool/src/database.rs around the query building for the WITH scored
AS (...) clause, the inner recent_checks subquery lacks a filter for the target
network, causing cross-network data mixing. Modify the inner subquery to include
a WHERE clause filtering health_checks by the same network parameter used in the
outer query. Then update the Rust sqlx query invocation to bind this additional
network parameter accordingly, ensuring consistent parameter usage. Finally, run
sqlx prepare to regenerate the updated query JSON file under .sqlx/.
```rust
/// If the main node does not finish within this period, we start a hedged request.
static SOFT_TIMEOUT: Duration = TIMEOUT.checked_div(2).unwrap();
```
🛠️ Refactor suggestion
Add a hard timeout for request execution, not just connection establishment
SOFT_TIMEOUT hedges the main request, but there’s no hard cap on the total duration of sending the request and reading the response body; if both main and hedge stall, the request can hang indefinitely. Wrap the entire single-node execution in a hard timeout.
Example (inside proxy_to_single_node):
- // Forward the request to the node. URI stays relative, so no rewrite.
- let response = match guarded_sender.send_request(request.to_request()).await {
+ // Forward the request and enforce a hard cap on response time.
+ let response = match tokio::time::timeout(TIMEOUT, guarded_sender.send_request(request.to_request())).await {
- Ok(response) => response,
+ Ok(Ok(response)) => response,
+ Ok(Err(e)) => {
+ guarded_sender.mark_failed().await;
+ return Err(SingleRequestError::SendRequestError(e.to_string()));
+ }
+ Err(_) => {
+ // Timed out while awaiting response headers
+ guarded_sender.mark_failed().await;
+ return Err(SingleRequestError::Timeout("Request timed out".to_string()));
+ }
  };

And similarly time-bound the full-body collection:
- let body_bytes = match body.collect().await {
+ let body_bytes = match tokio::time::timeout(TIMEOUT, body.collect()).await {
- Ok(collected) => collected.to_bytes().to_vec(),
+ Ok(Ok(collected)) => collected.to_bytes().to_vec(),
+ Ok(Err(e)) => { /* existing error path */ }
+ Err(_) => {
+ guarded_sender.mark_failed().await;
+ return Err(SingleRequestError::Timeout("Body read timed out".to_string()));
+ }
  };

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In monero-rpc-pool/src/proxy.rs around lines 28 to 30, the current SOFT_TIMEOUT
only limits the connection establishment phase but does not enforce a hard
timeout on the entire request execution including sending and receiving data. To
fix this, introduce a hard timeout that wraps the entire single-node request
execution inside the proxy_to_single_node function. Use a timeout mechanism to
ensure the full request, including reading the response body, is bounded and
cannot hang indefinitely.
```rust
/// Proxies a single axum::Request to a given main node with a specified hedge node.
/// If the main node's response hasn't finished after SOFT_TIMEOUT, we proxy to the hedge node.
/// We then race the two responses, and return the one that finishes first (and is not an error).
async fn proxy_to_node_with_hedge(
    state: &crate::AppState,
    request: CloneableRequest,
    main_node: &(String, String, u16),
    hedge_node: &(String, String, u16),
) -> Result<(Response, (String, String, u16)), SingleRequestError> {
    use std::future::Future;

    // Start the main request immediately
    let mut main_fut = Box::pin(proxy_to_single_node(state, request.clone(), main_node));

    // Hedge request will be started after the soft timeout, unless the main fails first
    let mut hedge_fut: Option<
        Pin<Box<dyn Future<Output = Result<Response, SingleRequestError>> + Send>>,
    > = None;

    // Timer to trigger the hedge request
    let mut soft_timer = Box::pin(tokio::time::sleep(SOFT_TIMEOUT));
    let mut soft_timer_armed = true;

    // If the main fails, keep its error to return if hedge also fails
    let mut main_error: Option<SingleRequestError> = None;

    loop {
        // A future that awaits the hedge if present; otherwise stays pending
        let mut hedge_wait = futures::future::poll_fn(|cx| {
            if let Some(f) = hedge_fut.as_mut() {
                f.as_mut().poll(cx)
            } else {
                std::task::Poll::Pending
            }
        });

        tokio::select! {
            res = &mut main_fut => {
                match res {
                    Ok(resp) => return Ok((resp, main_node.clone())),
                    Err(err) => {
                        // Start hedge immediately if not yet started
                        main_error = Some(err);
                        if hedge_fut.is_none() {
                            tracing::debug!("Starting hedge request");
                            hedge_fut = Some(Box::pin(proxy_to_single_node(state, request.clone(), hedge_node)));
                        }

                        // If hedge exists, await it and prefer its result
                        if let Some(hf) = &mut hedge_fut {
                            let hedge_res = hf.await;
                            return hedge_res
                                .map(|resp| (resp, hedge_node.clone()))
                                .or_else(|_| Err(main_error.take().unwrap()));
                        } else {
                            return Err(main_error.take().unwrap());
                        }
                    }
                }
            }

            // Start hedge after soft timeout if not already started
            _ = &mut soft_timer, if soft_timer_armed => {
                // Disarm timer so it does not keep firing
                soft_timer_armed = false;
                if hedge_fut.is_none() {
                    tracing::debug!("Starting hedge request");
                    hedge_fut = Some(Box::pin(proxy_to_single_node(state, request.clone(), hedge_node)));
                }
            }

            // If hedge is started, also race it
            res = &mut hedge_wait => {
                match res {
                    Ok(resp) => return Ok((resp, hedge_node.clone())),
                    Err(_hedge_err) => {
                        // Hedge failed; if main already failed, return main's error, otherwise keep waiting on main
                        if let Some(err) = main_error.take() {
                            return Err(err);
                        }
                        hedge_fut = None;
                    }
                }
            }
        }
    }
}
```
🛠️ Refactor suggestion
Cancellation-safety risk: dropping the losing future may release the mutex while a request is mid-flight
Hedging drops the non-winning future when the winner returns. If the losing future is inside proxy_to_single_node and currently in send_request or reading the body, dropping it will drop the GuardedSender, releasing the per-connection mutex before the in-flight exchange is cleanly aborted. That can allow the same connection to be reused prematurely, risking protocol corruption.
Mitigations:
- Make `proxy_to_single_node` cancellation-safe by catching cancellation and marking the connection as failed in a `Drop` guard (or ensure that any early drop marks the sender failed).
- Or run each single-node request on a dedicated connection (opt-in for hedged paths) to avoid reusing a connection after abort.
- Or spawn tasks and abort the loser; ensure aborted tasks mark their connection failed.
Minimal change suggestion: have proxy_to_single_node wrap its body with tokio::select! on a cancellation token, and on cancellation call guarded_sender.mark_failed().await before returning.
🤖 Prompt for AI Agents
In monero-rpc-pool/src/proxy.rs between lines 297 and 383, the current hedging
logic drops the losing future immediately, which can prematurely release the
per-connection mutex inside proxy_to_single_node, risking protocol corruption.
To fix this, modify proxy_to_single_node to be cancellation-safe by wrapping its
main body in a tokio::select! that listens for a cancellation signal; upon
cancellation, ensure it calls guarded_sender.mark_failed().await before
returning to mark the connection as failed and prevent reuse. This minimal
change will safely handle dropped futures without corrupting connection state.
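One way to realize the Drop-guard mitigation, sketched with illustrative stand-ins (`Conn`, `InFlightGuard`) for the pool's GuardedSender; the point is that dropping the guard without an explicit disarm poisons the connection so it is never reused:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Minimal connection stand-in: only a "failed" flag matters here.
struct Conn {
    failed: AtomicBool,
}

// RAII guard held for the duration of an in-flight request.
struct InFlightGuard {
    conn: Arc<Conn>,
    completed: bool,
}

impl InFlightGuard {
    fn new(conn: Arc<Conn>) -> Self {
        Self { conn, completed: false }
    }

    // Call once the response has been fully read; disarms the guard.
    fn disarm(mut self) {
        self.completed = true;
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        if !self.completed {
            // Dropped mid-flight (e.g. losing side of a hedge race):
            // mark the connection failed so the pool never reuses it.
            self.conn.failed.store(true, Ordering::SeqCst);
        }
    }
}

fn main() {
    // Losing hedge: guard dropped without disarming poisons the connection.
    let conn = Arc::new(Conn { failed: AtomicBool::new(false) });
    drop(InFlightGuard::new(conn.clone()));
    assert!(conn.failed.load(Ordering::SeqCst));

    // Winning request: disarming keeps the connection clean for reuse.
    let conn2 = Arc::new(Conn { failed: AtomicBool::new(false) });
    InFlightGuard::new(conn2.clone()).disarm();
    assert!(!conn2.failed.load(Ordering::SeqCst));
    println!("ok");
}
```

Because Drop runs even when a future is cancelled, this covers the hedge-race case without needing an explicit cancellation token, at the cost of discarding one connection per lost race.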
```rust
seed_choice =
    request_seed_choice(tauri_handle.clone().unwrap(), database)
        .await?;
continue;
```
Avoid unwrap on optional Tauri handle — potential panic path
This unwrap can panic if invariants drift (e.g., future refactor passes seed_choice=Some with tauri_handle=None). Handle gracefully to avoid crashing.
Apply:
- seed_choice =
- request_seed_choice(tauri_handle.clone().unwrap(), database)
- .await?;
+ let handle = match &tauri_handle {
+ Some(h) => h.clone(),
+ None => {
+ tracing::error!("Cannot prompt for wallet selection without a Tauri handle");
+ return Err(anyhow::anyhow!("Missing Tauri handle while prompting for wallet selection"));
+ }
+ };
+ seed_choice = request_seed_choice(handle, database).await?;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| seed_choice = | |
| request_seed_choice(tauri_handle.clone().unwrap(), database) | |
| .await?; | |
| continue; | |
| let handle = match &tauri_handle { | |
| Some(h) => h.clone(), | |
| None => { | |
| tracing::error!("Cannot prompt for wallet selection without a Tauri handle"); | |
| return Err(anyhow::anyhow!("Missing Tauri handle while prompting for wallet selection")); | |
| } | |
| }; | |
| seed_choice = request_seed_choice(handle, database).await?; | |
| continue; |
🤖 Prompt for AI Agents
In swap/src/cli/api.rs around lines 863 to 866, the code unwraps an optional
Tauri handle which can cause a panic if the handle is None. To fix this, replace
the unwrap with proper error handling by checking if the Tauri handle is Some
before calling request_seed_choice. If it is None, handle the error gracefully,
for example by returning an error or skipping the operation, to avoid crashing
the application.
Bug: Connection Poisoning in Proxy Hedge
In proxy_to_node_with_hedge, dropping the "losing" request future releases its GuardedSender mutex without draining the in-flight HTTP/1 response or marking the connection as failed. This leaves a poisoned connection in the pool, allowing subsequent requests to reuse it, leading to intermittent protocol errors such as "end of file before message length reached" or send_request errors.
Chores