Skip to content

[TENT] Fix io_uring stale status and GDS uninitialized transferred_bytes#2379

Open
jfeng18 wants to merge 1 commit into
kvcache-ai:mainfrom
jfeng18:fix/tent-transport-correctness
Open

[TENT] Fix io_uring stale status and GDS uninitialized transferred_bytes#2379
jfeng18 wants to merge 1 commit into
kvcache-ai:mainfrom
jfeng18:fix/tent-transport-correctness

Conversation

@jfeng18

@jfeng18 jfeng18 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Summary

  • io_uring getTransferStatus: status was captured at line 239 BEFORE CQE processing, so completions were always reported one poll cycle late. Fix: re-read status from the task after CQE processing.
  • GDS getTransferStatus: status.transferred_bytes was never initialized before the += accumulation loop, causing UB on indeterminate value. Fix: add status.transferred_bytes = 0; before the loop.

What was NOT changed

  • No changes to submitTransferTasks, allocateSubBatch, or freeSubBatch
  • No changes to other TENT transports (bufio, rdma, shm, tcp)

Test plan

  • CI build (TENT mode)
  • Code review: io_uring status refresh uses task_id-indexed access, GDS init is before loop

🤖 Generated with Claude Code

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request initializes and tracks transferred bytes for GDS and io_uring transports. In gds_transport.cpp, status.transferred_bytes is initialized to 0, and in io_uring_transport.cpp, the transfer status is updated with the task's status word and transferred bytes. The reviewer identified three key issues: first, adding a potentially negative event.ret to the unsigned status.transferred_bytes in GDS transport can cause an underflow; second, transferred_bytes in IOUringTask is uninitialized, leading to undefined behavior; and third, variable shadowing in io_uring_transport.cpp causes redundant lookups and should be resolved by renaming the inner variable.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

std::string("Failed to get GDS batch status: Code ") +
std::to_string(result.err) + LOC_MARK);
status.s = PENDING;
status.transferred_bytes = 0;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

While initializing status.transferred_bytes to 0 is correct, please note that in the subsequent loop (line 336), status.transferred_bytes += event.ret; is performed. Since event.ret is of type ssize_t and can be negative on failure (representing -errno), adding it directly to the unsigned status.transferred_bytes (size_t) will cause an underflow, resulting in an extremely large value.

To prevent this, you should only accumulate event.ret if it is positive:

if (event.ret > 0) {
    status.transferred_bytes += event.ret;
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid observation, but pre-existing. The event.ret negative-value accumulation exists on main. This PR only adds the = 0 initialization to fix the UB. The negative-value handling is a separate improvement.

Comment on lines +269 to +270
status = TransferStatus{outer_task.status_word,
outer_task.transferred_bytes};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The transferred_bytes field of IOUringTask is never initialized when the task is created in submitTransferTasks (lines 184-188) or in the IOUringTask struct definition.

As a result, if the task is still PENDING or if it fails (where transferred_bytes is not updated), outer_task.transferred_bytes will contain uninitialized garbage memory, leading to undefined behavior and reporting incorrect transferred bytes.

Please initialize transferred_bytes to 0 in the IOUringTask struct definition in io_uring_transport.h:

volatile size_t transferred_bytes = 0;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. On PENDING status, transferred_bytes is 0-initialized by the struct value-initialization in task_list.push_back(IOUringTask{}) (line 184 uses aggregate init). On COMPLETED, it is set at line 264. On FAILED, it is not set — but the caller checks status.s before using transferred_bytes, so the value is only meaningful when COMPLETED. Pre-existing pattern, not a regression.

}
}
io_uring_cqe_seen(&io_uring_batch->ring, cqe);
auto& outer_task = io_uring_batch->task_list[task_id];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The introduction of outer_task is redundant and only necessary because the inner variable task at line 248 shadows the outer task defined at line 238:

248:        auto task = (IOUringTask*)cqe->user_data;

To improve maintainability and avoid this shadowing confusion, please rename the inner task variable to something more descriptive, such as completed_task. This will allow you to directly use the outer task reference here without re-fetching it from the task_list.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree renaming the inner variable would be cleaner. However, this PR keeps the change minimal — only adding the status refresh. Renaming the shadow variable is a cleanup best done separately to avoid mixing fix and refactor.

@codecov-commenter

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

if (task_id < 0 || task_id >= (int)io_uring_batch->task_list.size())
return Status::InvalidArgument("Invalid task ID");
auto& task = io_uring_batch->task_list[task_id];
status = TransferStatus{task.status_word, task.transferred_bytes};

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify? On current main, status is set at line 239 (before CQE processing) and never refreshed after. When the CQE peek at line 248 finds a completion, the inner task (shadowed variable) gets updated, but the output status still holds the PENDING snapshot from line 239. Our fix adds a re-read after io_uring_cqe_seen to return the up-to-date status. If this is already addressed in another branch, happy to close this PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alogfans. I re-checked current upstream/main (as of 14ab37e7): io_uring_transport.cpp still assigns the output status once before the PENDING poll block and doesn't refresh it before the final return, so a PENDINGCOMPLETED transition discovered during that poll returns a stale PENDING (the CQE is also marked seen, so the call that reaps the completion reports stale). gds_transport.cpp still does status.transferred_bytes += event.ret without zero-initializing it first (TransferStatus has no default member initializer). Could you point me to the commit/branch you had in mind? Happy to close this if it's already covered somewhere I've missed.

- io_uring getTransferStatus: refresh output status after CQE
  processing. Previously status was captured before the CQE was
  peeked, so completions were always reported one poll cycle late.
- GDS getTransferStatus: initialize transferred_bytes to 0 before
  the accumulation loop. Without this, += operates on an
  indeterminate value (UB).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jfeng18 jfeng18 force-pushed the fix/tent-transport-correctness branch from 4452721 to c9362c0 Compare June 16, 2026 06:10
@jfeng18

jfeng18 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

CI note: build (3.10) red is from storage_backend_e2e_test.DiskOnlyReadAfterEviction (80/81 passed, exit code 8) — a timing-dependent eviction test unrelated to this PR. This PR only touches io_uring/GDS transport (4 lines); the failing test is TCP-only Store eviction and does not exercise either path. The same commit passed this test on build (3.12). A re-run should clear it.

@jfeng18 jfeng18 requested a review from alogfans June 20, 2026 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants