@TheEpicDolphin (Collaborator) commented Sep 23, 2025

Benchmarks comparing standard and tree speculative decoding with the new FA2 + mask kernel: https://docs.google.com/spreadsheets/d/1imDQmv-5yPbDZwWRD7FslDUw5KQ794ET8RY7701jfmk/edit?usp=sharing

@mergify bot commented Sep 23, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @TheEpicDolphin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@foolusion

I had to make a change so that the drafter uses the tree, but it didn't seem to improve performance.

out.patch

@TheEpicDolphin force-pushed the test_tree_flash_attn branch 2 times, most recently from 3c94337 to afd134b, on September 25, 2025 01:15