[TENT] Fix io_uring stale status and GDS uninitialized transferred_bytes#2379
[TENT] Fix io_uring stale status and GDS uninitialized transferred_bytes#2379jfeng18 wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request initializes and tracks transferred bytes for GDS and io_uring transports. In gds_transport.cpp, status.transferred_bytes is initialized to 0, and in io_uring_transport.cpp, the transfer status is updated with the task's status word and transferred bytes. The reviewer identified three key issues: first, adding a potentially negative event.ret to the unsigned status.transferred_bytes in GDS transport can cause an underflow; second, transferred_bytes in IOUringTask is uninitialized, leading to undefined behavior; and third, variable shadowing in io_uring_transport.cpp causes redundant lookups and should be resolved by renaming the inner variable.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| std::string("Failed to get GDS batch status: Code ") + | ||
| std::to_string(result.err) + LOC_MARK); | ||
| status.s = PENDING; | ||
| status.transferred_bytes = 0; |
There was a problem hiding this comment.
While initializing status.transferred_bytes to 0 is correct, please note that in the subsequent loop (line 336), status.transferred_bytes += event.ret; is performed. Since event.ret is of type ssize_t and can be negative on failure (representing -errno), adding it directly to the unsigned status.transferred_bytes (size_t) will cause an underflow, resulting in an extremely large value.
To prevent this, you should only accumulate event.ret if it is positive:
if (event.ret > 0) {
status.transferred_bytes += event.ret;
}There was a problem hiding this comment.
Valid observation, but pre-existing. The event.ret negative-value accumulation exists on main. This PR only adds the = 0 initialization to fix the UB. The negative-value handling is a separate improvement.
| status = TransferStatus{outer_task.status_word, | ||
| outer_task.transferred_bytes}; |
There was a problem hiding this comment.
The transferred_bytes field of IOUringTask is never initialized when the task is created in submitTransferTasks (lines 184-188) or in the IOUringTask struct definition.
As a result, if the task is still PENDING or if it fails (where transferred_bytes is not updated), outer_task.transferred_bytes will contain uninitialized garbage memory, leading to undefined behavior and reporting incorrect transferred bytes.
Please initialize transferred_bytes to 0 in the IOUringTask struct definition in io_uring_transport.h:
volatile size_t transferred_bytes = 0;There was a problem hiding this comment.
Good point. On PENDING status, transferred_bytes is 0-initialized by the struct value-initialization in task_list.push_back(IOUringTask{}) (line 184 uses aggregate init). On COMPLETED, it is set at line 264. On FAILED, it is not set — but the caller checks status.s before using transferred_bytes, so the value is only meaningful when COMPLETED. Pre-existing pattern, not a regression.
| } | ||
| } | ||
| io_uring_cqe_seen(&io_uring_batch->ring, cqe); | ||
| auto& outer_task = io_uring_batch->task_list[task_id]; |
There was a problem hiding this comment.
The introduction of outer_task is redundant and only necessary because the inner variable task at line 248 shadows the outer task defined at line 238:
248: auto task = (IOUringTask*)cqe->user_data;To improve maintainability and avoid this shadowing confusion, please rename the inner task variable to something more descriptive, such as completed_task. This will allow you to directly use the outer task reference here without re-fetching it from the task_list.
There was a problem hiding this comment.
Agree renaming the inner variable would be cleaner. However, this PR keeps the change minimal — only adding the status refresh. Renaming the shadow variable is a cleanup best done separately to avoid mixing fix and refactor.
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
| if (task_id < 0 || task_id >= (int)io_uring_batch->task_list.size()) | ||
| return Status::InvalidArgument("Invalid task ID"); | ||
| auto& task = io_uring_batch->task_list[task_id]; | ||
| status = TransferStatus{task.status_word, task.transferred_bytes}; |
There was a problem hiding this comment.
Could you clarify? On current main, status is set at line 239 (before CQE processing) and never refreshed after. When the CQE peek at line 248 finds a completion, the inner task (shadowed variable) gets updated, but the output status still holds the PENDING snapshot from line 239. Our fix adds a re-read after io_uring_cqe_seen to return the up-to-date status. If this is already addressed in another branch, happy to close this PR.
There was a problem hiding this comment.
Thanks @alogfans. I re-checked current upstream/main (as of 14ab37e7): io_uring_transport.cpp still assigns the output status once before the PENDING poll block and doesn't refresh it before the final return, so a PENDING→COMPLETED transition discovered during that poll returns a stale PENDING (the CQE is also marked seen, so the call that reaps the completion reports stale). gds_transport.cpp still does status.transferred_bytes += event.ret without zero-initializing it first (TransferStatus has no default member initializer). Could you point me to the commit/branch you had in mind? Happy to close this if it's already covered somewhere I've missed.
- io_uring getTransferStatus: refresh output status after CQE processing. Previously status was captured before the CQE was peeked, so completions were always reported one poll cycle late. - GDS getTransferStatus: initialize transferred_bytes to 0 before the accumulation loop. Without this, += operates on an indeterminate value (UB). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4452721 to
c9362c0
Compare
|
CI note: |
Summary
getTransferStatus: status was captured at line 239 BEFORE CQE processing, so completions were always reported one poll cycle late. Fix: re-read status from the task after CQE processing.getTransferStatus:status.transferred_byteswas never initialized before the+=accumulation loop, causing UB on indeterminate value. Fix: addstatus.transferred_bytes = 0;before the loop.What was NOT changed
Test plan
🤖 Generated with Claude Code