@qnixsynapse
Contributor

Describe Your Changes

The KV cache size calculation in estimate_kv_cache_internal now includes a fallback mechanism for models that do not explicitly define key_length and value_length in the GGUF metadata.

If these attention keys are missing, the head dimension (and thus the key and value length) is computed as embedding_length / total_heads. This improves robustness and compatibility with GGUF models whose metadata omits these keys.
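The fallback described above can be sketched as follows. This is a minimal illustration in Python, not the code from this PR: the function name `resolve_kv_lengths` and the helper `_get_int` are hypothetical, the metadata key names follow the usual GGUF `<arch>.attention.*` convention and are assumed to be the ones read here, and handling each missing key independently is also an assumption.

```python
from typing import Mapping, Optional, Tuple


def _get_int(meta: Mapping[str, object], key: str) -> Optional[int]:
    """Return the metadata value as an int, or None if the key is absent."""
    value = meta.get(key)
    return int(value) if value is not None else None


def resolve_kv_lengths(meta: Mapping[str, object], arch: str) -> Tuple[int, int]:
    """Return (key_length, value_length), falling back to the head dimension.

    Hypothetical helper: key names are assumed, not taken from the PR.
    """
    key_len = _get_int(meta, f"{arch}.attention.key_length")
    value_len = _get_int(meta, f"{arch}.attention.value_length")
    if key_len is not None and value_len is not None:
        # Both lengths are explicit in the metadata; no fallback needed.
        return key_len, value_len

    embedding_length = _get_int(meta, f"{arch}.embedding_length")
    total_heads = _get_int(meta, f"{arch}.attention.head_count")
    if not embedding_length or not total_heads:
        raise ValueError("cannot derive head dimension: missing embedding_length or head count")

    # Fallback: head dimension = embedding_length / total_heads.
    head_dim = embedding_length // total_heads
    return (
        key_len if key_len is not None else head_dim,
        value_len if value_len is not None else head_dim,
    )
```

For example, metadata with an embedding length of 4096 and 32 heads, and no explicit key or value length, resolves to a length of 128 for both.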

Also adds logging of the full model metadata for easier debugging of the estimation process.

Fixes Issues

Self Checklist

  • Added relevant comments, esp in complex areas
  • Updated docs (for bug fixes / features)
  • Created issues for follow-up changes or refactoring needed

@github-actions
Contributor

Barecheck - Code coverage report

Total: 29.66%

Your code coverage diff: 0.01% ▴

✅ All code changes are covered

@Minh141120
Member

LGTM on Windows, I'll proceed with the merge.

@Minh141120 Minh141120 merged commit 04fcd78 into release/v0.7.0 Sep 30, 2025
17 checks passed
@Minh141120 Minh141120 deleted the fix/6626 branch September 30, 2025 06:42
@github-project-automation github-project-automation bot moved this to QA in Jan Sep 30, 2025
@github-actions github-actions bot added this to the v0.7.0 milestone Sep 30, 2025
