Implement safe chunked KV writes for CUDA graph capture #6
Summary
Changes
KV Cache
- `_slice_scale_segment` to support per-segment scale slicing based on entry length
- `DeferredWriteManager` with:
  - `_req_start_offsets` to map draft tokens to request offsets
  - `commit` now accepting a sequence of accepted counts instead of a boolean mask

GPU Model Runner
- `_build_nwor_acceptance_mask` now returns a tuple `(mask, counts)`
- … `counts`
- `_scv_vectorized_mask` now returns `(mask_work, accepted_counts)` when available
- … `accepted_counts` to `DeferredWriteManager.commit`

Tests
- … `manager.commit([1])` instead of a boolean mask
- … `_build_nwor_acceptance_mask` returns `(mask, counts)` and that `counts` match expectations

Why
API Changes
- `DeferredWriteManager.commit` signature changed from `commit(mask: Tensor)` to `commit(accepted_counts: Sequence[int])`
- `_build_nwor_acceptance_mask` now returns `(mask: Tensor, counts: list[int])` instead of a sole `mask`

Backward Compatibility
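For callers that still hold a boolean mask, the move to accepted counts can be pictured with a small conversion helper. This is a minimal sketch in plain Python, not code from this PR: `mask_to_accepted_counts` and `draft_lengths` are hypothetical names, and it assumes acceptance is prefix-based, meaning a request's accepted tokens are the leading `True` entries of its segment.

```python
from typing import Sequence


def mask_to_accepted_counts(
    mask: Sequence[bool], draft_lengths: Sequence[int]
) -> list[int]:
    """Hypothetical helper: count the leading True entries in each request's segment."""
    if sum(draft_lengths) != len(mask):
        raise ValueError("mask length must equal the total number of draft tokens")
    counts: list[int] = []
    start = 0
    for length in draft_lengths:
        accepted = 0
        for flag in mask[start:start + length]:
            if not flag:
                # Assumption: tokens after the first rejection are not accepted.
                break
            accepted += 1
        counts.append(accepted)
        start += length
    return counts


# Two requests with 2 and 3 draft tokens respectively.
counts = mask_to_accepted_counts([True, False, True, True, False], [2, 3])
print(counts)
```

The resulting list has one entry per request, which is the shape the new `commit(accepted_counts: Sequence[int])` signature describes.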
- … `accepted_counts` interface.

Testing Plan
- … `pytest tests/v1/test_deferred_writer.py`
- … `[1]` produce expected writes and that counts are correctly propagated
- … `_build_nwor_acceptance_mask` returns both mask and counts and that counts align with draft token segmentation

Notes for reviewers
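To make the relationship between per-request start offsets and accepted counts concrete, here is a rough sketch. It is an illustration under stated assumptions rather than the PR's implementation: `build_start_offsets` and `accepted_token_indices` are hypothetical names, and draft tokens are assumed to be laid out contiguously, one request after another.

```python
from itertools import accumulate
from typing import Sequence


def build_start_offsets(draft_lengths: Sequence[int]) -> list[int]:
    """Exclusive prefix sum: the index of each request's first draft token."""
    return [0, *accumulate(draft_lengths)][:-1]


def accepted_token_indices(
    draft_lengths: Sequence[int], accepted_counts: Sequence[int]
) -> list[int]:
    """Return flat draft-token indices that would be kept for each request."""
    if len(draft_lengths) != len(accepted_counts):
        raise ValueError("need exactly one accepted count per request")
    indices: list[int] = []
    offsets = build_start_offsets(draft_lengths)
    for start, length, count in zip(offsets, draft_lengths, accepted_counts):
        if not 0 <= count <= length:
            raise ValueError("accepted count must lie within the request's draft length")
        # Keep only the first `count` draft tokens of this request.
        indices.extend(range(start, start + count))
    return indices


# Three requests with 2, 3 and 1 draft tokens; 1, 3 and 0 of them are accepted.
print(build_start_offsets([2, 3, 1]))
print(accepted_token_indices([2, 3, 1], [1, 3, 0]))
```

Working from counts instead of a mask means the selected indices follow from simple arithmetic on host-side integers, which is one plausible reason a counts-based interface is easier to reason about segment by segment.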
🌿 Generated by Terry
📎 Task: https://www.terragonlabs.com/task/7ed05cf2-ef51-40c2-b908-a395d47b0386