@heheda12345 (Collaborator) commented Nov 2, 2025

Purpose

Clean up initialize_kv_cache, which has grown complicated during the fast development of various attention-related features over the previous months, to simplify the code and the dependencies between its steps. The cleanup mainly touches the logic of:

  1. kv sharing
  2. encoder only
  3. hybrid mamba

Key modifications:

  1. Some steps can be done right after model loading. Move them to the end of load_models. [may need some discussion]
  2. Many operations (e.g., computing kernel_block_size) only need the AttentionBackend type of each layer, so they no longer have to wait until the builder is initialized.
  3. kernel_block_size now includes EncoderOnlyAttention.
  4. Simplify the attn_group split logic.
  5. Start moving some logic into utils so that it can be used in ModelRunnerV2 in the future.
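
To illustrate item 2, here is a minimal sketch of how layers could be grouped and a kernel block size chosen from backend types alone, without constructing any builder. Every name in it (`FakeBackendA`, `FakeBackendB`, `supported_kernel_block_sizes`, `group_layers_by_backend`, `pick_kernel_block_size`) is a hypothetical stand-in for illustration and is not taken from the vLLM code base; the selection rule (largest size supported by all backends that divides the cache block size) is likewise an assumption, not necessarily what this PR implements.

```python
from collections import defaultdict


# Hypothetical stand-ins for attention backend types (NOT vLLM classes).
class FakeBackendA:
    supported_kernel_block_sizes = (16, 32, 64)


class FakeBackendB:
    supported_kernel_block_sizes = (16, 128)


def group_layers_by_backend(layer_backends):
    """Group layer names by backend type, keeping first-seen order."""
    groups = defaultdict(list)
    for layer_name, backend_cls in layer_backends.items():
        groups[backend_cls].append(layer_name)
    return dict(groups)


def pick_kernel_block_size(backend_classes, cache_block_size):
    """Return the largest size that every backend supports and that
    evenly divides cache_block_size (an assumed selection rule)."""
    backend_classes = list(backend_classes)
    if not backend_classes:
        raise ValueError("at least one backend type is required")
    common = set(backend_classes[0].supported_kernel_block_sizes)
    for backend_cls in backend_classes[1:]:
        common &= set(backend_cls.supported_kernel_block_sizes)
    candidates = [size for size in common if cache_block_size % size == 0]
    if not candidates:
        raise ValueError("no kernel block size is shared by all backends")
    return max(candidates)


# Only the backend *type* of each layer is needed as input.
layer_backends = {
    "layers.0.attn": FakeBackendA,
    "layers.1.attn": FakeBackendB,
    "layers.2.attn": FakeBackendA,
}
groups = group_layers_by_backend(layer_backends)
shared_size = pick_kernel_block_size(groups.keys(), cache_block_size=64)
```

Because both helpers take only types as input, they could live in a utility module and run before any per-layer builder object exists, which is the dependency simplification items 2 and 5 describe.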

I can split the above items into separate PRs if needed.

Depends on #27929 and #27753.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the v1 label Nov 2, 2025

mergify bot commented Nov 6, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @heheda12345.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 6, 2025
Signed-off-by: Chen Zhang <[email protected]>
@mergify mergify bot removed the needs-rebase label Nov 7, 2025
Signed-off-by: Chen Zhang <[email protected]>

mergify bot commented Nov 11, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @heheda12345.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 11, 2025