Better CPU prompt processing performance for SWA models#702

Closed
ikawrakow wants to merge 4 commits into main from ik/cpu_swa_v1

@ikawrakow
Owner

This PR is a fixed version of #696; see that PR for details.

The crashes we were getting on #696 happened because the back-end does not allocate a buffer for the tensor containing the mask bounds when that tensor is not used in the graph. Now we create the mask bounds only if they are actually used (SWA models with FA enabled).
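The guard described above can be illustrated with a small standalone sketch. This is not the code from this PR: `MaskBounds`, `GraphInputs`, and `build_inputs` are hypothetical names, and the window convention (each query sees the last `window` positions, itself included) is an assumption made only for the example. The point is only that the bounds object is created when both conditions hold, and is left absent otherwise, so nothing downstream expects a buffer for an unused input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the mask-bounds tensor (not a type from the PR).
struct MaskBounds {
    std::vector<int32_t> lo; // first visible KV position for each query row
    std::vector<int32_t> hi; // one past the last visible KV position
};

// Hypothetical container for optional graph inputs.
struct GraphInputs {
    std::optional<MaskBounds> swa_bounds; // empty unless the graph will read it
};

// Create the bounds only for SWA models with FA enabled; otherwise leave
// them absent so that no unused input is ever handed to the back-end.
GraphInputs build_inputs(bool model_uses_swa, bool flash_attn,
                         int n_tokens, int n_past, int window) {
    GraphInputs in;
    if (!(model_uses_swa && flash_attn)) {
        return in; // bounds would be unused: do not create them
    }
    assert(window > 0 && n_tokens >= 0 && n_past >= 0);

    MaskBounds b;
    b.lo.reserve(n_tokens);
    b.hi.reserve(n_tokens);
    for (int i = 0; i < n_tokens; ++i) {
        const int32_t pos = n_past + i; // absolute position of this query
        // Assumed convention: the query sees positions (pos - window, pos].
        b.lo.push_back(std::max<int32_t>(0, pos - window + 1));
        b.hi.push_back(pos + 1);
    }
    in.swa_bounds = std::move(b);
    return in;
}
```

A caller would then check `swa_bounds.has_value()` before wiring the bounds into the attention op, which keeps the "created" and "used" conditions in one place.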

Iwan Kawrakow added 4 commits August 18, 2025 07:58
If we allocate the tensor for the mask bounds, but then don't use
it, we get a crash in the back-end. Hence, we only allocate the
bounds tensor when using FA.
@ikawrakow
Owner Author

Closing in favor of #757

@ikawrakow ikawrakow closed this Sep 4, 2025