Implement Sequence Distribution feature with Striding #2094
vadlakondaswetha wants to merge 1 commit into
@vadlakondaswetha: which performs I/O within the zone at random offsets but restricted to a given zone until it's "full" (at which point it moves on to the next zone)? Is it that the precise order of access within a zone is directly user-specified rather than being sequential or random?
That's right. This allows data to be read in a precise order controlled by the user. Right now, repeating the same sequence across the entire file is controlled by the random_sequence_stride option. We can remove this and achieve the same with a combination of random_distribution and the zone option. It's a bit complex to configure. E.g.:
@vadlakondaswetha:
ankit-sam left a comment
Hi @vadlakondaswetha, I added a couple of review comments; please check them.
Apart from what @sitsofe mentioned about the documentation and example file, I have a few other concerns:
- The sequence values are not checked, so a user can pass duplicate values. This breaks norandommap=0 behavior.
- For mixed workloads, do we need separate sequences like random_sequence_read and random_sequence_write? With a single sequence, both read and write offsets will be identical every time.
- Not urgent, but I think it would be great to support ranges like 0-5, 10-20. We already have options like bsrange, plids, etc.
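The first and third concerns above can be illustrated together. What follows is a minimal Python sketch, not the PR's C code, of how a sequence specification might be validated with duplicate rejection and inclusive ranges. The function name `parse_sequence` and the `lo-hi` range syntax are hypothetical, chosen only to mirror the `0-5, 10-20` suggestion.

```python
def parse_sequence(spec: str) -> list[int]:
    """Parse a spec such as "2,0,1" or "0-2, 5" into block indices.

    Hypothetical helper: ranges are inclusive, duplicates are rejected.
    """
    indices: list[int] = []
    seen: set[int] = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            raise ValueError(f"empty entry in sequence spec {spec!r}")
        if "-" in token:
            # Inclusive range, e.g. "0-5" expands to 0, 1, 2, 3, 4, 5.
            lo_text, hi_text = token.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)
            if lo > hi:
                raise ValueError(f"descending range {token!r}")
            values = list(range(lo, hi + 1))
        else:
            values = [int(token)]
        for value in values:
            # A repeated index would mean the same block is visited twice
            # per cycle, which is the case the review comment warns about.
            if value in seen:
                raise ValueError(f"duplicate block index {value}")
            seen.add(value)
            indices.append(value)
    return indices
```

Under these assumptions, `parse_sequence("0-2, 5")` expands the range before the duplicate check runs, so overlapping ranges such as `0-3, 2-4` would also be rejected.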
This change extends fio to support a new random distribution pattern called `sequence`. It allows users to specify a fixed repeating sequence of block indices for I/O operations using the syntax `random_distribution=sequence:2,0,1`. Additionally, it introduces the `random_sequence_stride` boolean option. When enabled (1), the sequence progresses through the file as a Strided Block Group pattern (e.g., 2,0,1, 5,3,4, 8,6,7...), automatically advancing the base block index by the sequence length after each cycle. Integration tests are added in `t/sequence.py`.
Signed-off-by: Swetha Vadlakonda <swethv@google.com>
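The block ordering described in the commit message can be sketched in a few lines of Python. This is an illustration of the described behavior only, not the PR's implementation; the function name `sequence_blocks` and its parameters are hypothetical, and end-of-file handling is deliberately left out.

```python
from typing import Sequence


def sequence_blocks(seq: Sequence[int], count: int, stride: bool = False) -> list[int]:
    """Return the first `count` block indices for a repeating sequence.

    With stride=False the sequence simply repeats. With stride=True the
    base block index advances by len(seq) after each full cycle.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    blocks = []
    for i in range(count):
        cycle, pos = divmod(i, n)          # which cycle, and position in it
        base = cycle * n if stride else 0  # strided mode shifts the base
        blocks.append(base + seq[pos])
    return blocks


print(sequence_blocks([2, 0, 1], 9, stride=True))   # [2, 0, 1, 5, 3, 4, 8, 6, 7]
print(sequence_blocks([2, 0, 1], 6, stride=False))  # [2, 0, 1, 2, 0, 1]
```

A byte offset would then be the block index multiplied by the block size, for example `block * 4096` for 4 KiB blocks.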
1cd3aaa to 587715e
Thanks for the review.
Thanks for the review :) 1 & 2: Added required files
@sitsofe and @ankit-sam - PTAL, replied to your comments. Thanks.
Hi @vincentkfu / @axboe, gentle ping on this PR review. PTAL and let me know if there are any concerns or questions about adding this feature.
This PR introduces a new generic random distribution primitive: sequence.
While fio currently provides primitives for linear, pseudo-random
(randread), and strided access patterns, it lacks a mechanism to simulate
deterministic, local non-linearity within a repeating stride.
This access pattern is required to test modern workloads like:
- LLM Inference Weights Loading:
  Reading .safetensors model files which contain a list of tensor files.
  Even though the file is read in sequential mode, within a tensor, data
  will be requested in a particular order which is not purely sequential.
- Database Engine Log-Merging / LSM-Trees:
  Scenarios where specific block indices (like parity blocks or metadata
  headers) must be systematically read out-of-order within every repeating
  chunk or block group.
The existing option is to use read_iolog. Using a generated read_iolog
for multi-terabyte model benchmarks is sub-optimal, as the trace files
become massive and unscalable, and lack dynamic flexibility across varying
block sizes. The sequence distribution resolves this by calculating
offsets algorithmically on the fly.
NEW OPTIONS INTRODUCED
sequence distribution behavior.