allow 4000 unstaked connections in TPU #8144
Conversation
Hey @alexpyattaev, I have #6953 which does the same thing but also addresses other aspects due to the change.
Force-pushed from 6bd5b0e to 513f7fe
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff            @@
##           master    #8144     +/-  ##
========================================
- Coverage    81.9%    81.9%    -0.1%
========================================
  Files         862      862
  Lines      326864   326858       -6
========================================
- Hits       267777   267771       -6
  Misses      59087    59087
```
```rust
const EMA_WINDOW_MS: u64 = STREAM_LOAD_EMA_INTERVAL_MS * STREAM_LOAD_EMA_INTERVAL_COUNT;

/// Target TPS for unstaked connections
const UNSTAKED_MAX_TPS: u64 = 100;
```
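The TPS target and the EMA window in the diff above could be combined into a per-window stream allowance along these lines (a minimal sketch: the `STREAM_LOAD_EMA_INTERVAL_MS` value and the derivation are assumptions, not the actual streamer code):

```rust
// Illustrative only: STREAM_LOAD_EMA_INTERVAL_MS is an assumed value, and the
// derivation below is a sketch of how a TPS target could map to a quota.
const STREAM_LOAD_EMA_INTERVAL_MS: u64 = 5; // assumed
const STREAM_LOAD_EMA_INTERVAL_COUNT: u64 = 10;
const EMA_WINDOW_MS: u64 = STREAM_LOAD_EMA_INTERVAL_MS * STREAM_LOAD_EMA_INTERVAL_COUNT;

/// Target TPS for unstaked connections (per the diff above)
const UNSTAKED_MAX_TPS: u64 = 100;

/// Streams an unstaked connection could be allowed within one EMA window.
fn max_streams_per_window() -> u64 {
    UNSTAKED_MAX_TPS * EMA_WINDOW_MS / 1000
}

fn main() {
    println!(
        "window = {} ms, allowance = {} streams",
        EMA_WINDOW_MS,
        max_streams_per_window()
    );
}
```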
Can you explain how you arrived at this number?

Well, the original default was 200 TPS per connection, but with 2000 unstaked connections it would be a bit too much overall load if all of them suddenly wanted to use it (400K TPS, almost all of our overall 500K TPS quota). So I toned it down a bit.
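The quota arithmetic in that comment can be checked directly (values taken from the discussion; the constant names are illustrative):

```rust
// Worst-case aggregate load if every unstaked connection uses its full quota.
const PER_CONNECTION_TPS: u64 = 200; // original per-connection default
const MAX_UNSTAKED_CONNECTIONS: u64 = 2000;
const GLOBAL_TPS_QUOTA: u64 = 500_000; // overall quota cited in the thread

fn main() {
    let worst_case = PER_CONNECTION_TPS * MAX_UNSTAKED_CONNECTIONS;
    println!("worst-case unstaked load: {} TPS", worst_case); // 400000 TPS
    println!(
        "share of global quota: {}%",
        100 * worst_case / GLOBAL_TPS_QUOTA
    ); // 80%
}
```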
Bumped back to 200 (as it was before the changes) to keep things consistent.
Force-pushed from b7504b2 to 2d322f0
@alessandrod with 4 max connections per IP we get almost 2x fewer evictions, as expected.
Rough memory use numbers for unstaked clients, measured at 300 ms latency (maximum possible buffers, under max load): roughly 0.3 MB/client (measured via heaptrack, observing peak heap allocation throughout the run).
Bumped to 4000 unstaked connections allowed in the table, 4 connections per IP; we still see some evictions, despite this being the first leader slot since restart. It is far less of a problem than before, of course, but it is somewhat suspicious that we get evictions at all. @lijunwangs, do you have an idea why this could be happening? We were nowhere near 4000 connections in this test.
The pruning logic is simple: whenever we need to evict (table size >= the max connections in the table), we evict down to 90% of the desired table size. The open-connections metric may be reporting the count after the evictions. We need a max-table-size metric in the report window to understand the discrepancy between the eviction count and the open connection count.
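A minimal sketch of that pruning rule (hypothetical type and field names, not the actual streamer code):

```rust
/// Toy connection table illustrating evict-to-90% pruning.
struct ConnectionTable {
    /// Connection ids, oldest first (the real table tracks much more state).
    entries: Vec<u64>,
    max_size: usize,
}

impl ConnectionTable {
    fn try_insert(&mut self, id: u64) {
        if self.entries.len() >= self.max_size {
            // Prune down to 90% of the configured maximum, oldest first.
            let target = self.max_size * 9 / 10;
            let n_evict = self.entries.len() - target;
            self.entries.drain(..n_evict);
        }
        self.entries.push(id);
    }
}

fn main() {
    let mut table = ConnectionTable {
        entries: (0..2000).collect(),
        max_size: 2000,
    };
    // Table is full: the 200 oldest entries are evicted, then the new one is added.
    table.try_insert(2000);
    println!("size after insert: {}", table.entries.len()); // 1801
}
```

This shape also shows why the open-connections count and the eviction count can disagree within one report window: a burst near the limit triggers a bulk eviction, after which the table refills.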
Good point, the current metric is kinda useless... I will make a PR to fix it and re-measure this.
Force-pushed from 2d322f0 to a292230
I assume you don't see the full picture in the staked/unstaked metrics because the rate limiters trigger first.
Force-pushed from 755eaeb to 8b5d388
Showing evictions with a max unstaked connections table size of 2500.
@alexpyattaev, can we estimate the worst-case memory usage when increasing to 4000 unstaked connections?

~2 GB (assuming we merge the BDP PR as well)

How did you arrive at the 2 GB number?
With the current streamer each unstaked connection gets 128 KB of receive window => 512 MB for 4000 of them. There is also a bunch of bookkeeping that quinn holds on a per-stream basis (~1 KB per open stream), so we can approximately double those numbers to get a total memory consumption estimate. With staked connections the story is different: they are allowed 4x (8x once BDP is merged) the amount of RX window and 4x the streams, so having 4000 of them around would get fairly expensive memory-wise.
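The 512 MB figure follows directly from those numbers; the sketch below redoes the arithmetic (decimal units; the 2x factor is the rough bookkeeping doubling mentioned above, not a measured value):

```rust
// Back-of-envelope memory estimate for unstaked connections (illustrative).
const RX_WINDOW_KB: u64 = 128; // receive window per unstaked connection
const UNSTAKED_CONNECTIONS: u64 = 4000;

fn main() {
    let rx_total_mb = RX_WINDOW_KB * UNSTAKED_CONNECTIONS / 1000;
    println!("receive windows: {} MB", rx_total_mb); // 512 MB
    // Per-stream quinn bookkeeping (~1 KB per open stream) roughly doubles
    // the total; extra headroom (e.g. the BDP change) pushes it higher still.
    println!("rough total: ~{} MB", rx_total_mb * 2); // ~1024 MB
}
```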
Force-pushed from 8b5d388 to 0069fb8
@bw-solana addressed the comment, rebased for a clean history. Added a note about tpu-max-unstaked-connections in the description.
```rust
const STREAM_LOAD_EMA_INTERVAL_COUNT: u64 = 10;
const EMA_WINDOW_MS: u64 = STREAM_LOAD_EMA_INTERVAL_MS * STREAM_LOAD_EMA_INTERVAL_COUNT;

/// Maximum TPS for unstaked connections
```
nit: this comment still makes it a little ambiguous for the casual observer whether this TPS limit is per connection or in aggregate across all unstaked connections

I'm reworking SWQOS anyway; will address this there as well.
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.
(cherry picked from commit 5d4b65d)
This reverts commit 5d4b65d.
…-xyz#9277) This reverts commit 5d4b65d.





Problem
Currently operators may customize this setting via the hidden CLI argument --tpu-max-unstaked-connections, though most choose not to. If this change is not desired, it can be reverted at runtime by supplying the desired number via the CLI.
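For reference, reverting at runtime would look something like this (the flag comes from the description above; the binary name and the value are illustrative):

```shell
# Hypothetical invocation: restore the previous unstaked connection table size.
# The flag is hidden, so it does not appear in --help output.
agave-validator ... --tpu-max-unstaked-connections 2000
```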
Summary of Changes
`max_unstaked_connections` parameter. Actual measured TPS for unstaked connections with this PR is ~205 due to timing errors in the streamer implementation.
Combined staked node TPS quota is slightly increased with this change (20%).
Testing on a staked mainnet node shows that 2000 connection entries in the table are sufficient to accommodate current mainnet demand during leader slots (2 leader slots captured in the plot below):
