New tuning for small K gemv#2620

Merged
jagrit06 merged 3 commits into main from gemv-small-k
Sep 23, 2025
@jagrit06
Member

@jagrit06 jagrit06 commented Sep 23, 2025

Proposed changes

  • Add a new tuning for small K gemv
  • Add a new tuning for small output, long K gemv
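
For context, the two shape regimes named above can be illustrated with a plain-Python reference gemv. This is only a sketch of the operation and its shapes, not the Metal kernel or its tuning; `gemv` and `make_matrix` are hypothetical helper names, and the sizes are shrunk so the example runs quickly.

```python
def gemv(mat, vec):
    """Reference y = mat @ vec for an N x K matrix stored as a list of rows."""
    k = len(vec)
    assert all(len(row) == k for row in mat), "every row must have K entries"
    # Each output element is a dot product of length K.
    return [sum(a * x for a, x in zip(row, vec)) for row in mat]


def make_matrix(n, k):
    """Hypothetical helper: deterministic N x K test matrix with entries in {-1, 0, 1}."""
    return [[(i + j) % 3 - 1 for j in range(k)] for i in range(n)]


# Regime 1: small K, many outputs (many short dot products).
small_k = gemv(make_matrix(256, 8), [1] * 8)

# Regime 2: small output, long K (a few long dot products).
long_k = gemv(make_matrix(8, 256), [1] * 256)

print(len(small_k), len(long_k))
```

Both calls touch the same number of matrix elements, but the first produces many short reductions and the second a few long ones, which is presumably why each case benefits from its own tuning.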

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

@jagrit06
Member Author

jagrit06 commented Sep 23, 2025

Before:

  B,     M,     N,     K,   dtype,  t,   gpbs_mx, gflops_mx
  1,     1,  4096,    64, float16, nt,    25.342,    24.946
  1,     1, 12288,    64, float16, nt,    43.023,    42.358
  1,     1,    64,  4096, float16, nt,    15.712,    15.466
  1,     1,    64, 12288, float16, nt,    23.527,    23.163

After:

  B,     M,     N,     K,   dtype,  t,   gpbs_mx, gflops_mx
  1,     1,  4096,    64, float16, nt,    36.587,    36.016
  1,     1, 12288,    64, float16, nt,    70.071,    68.988
  1,     1,    64,  4096, float16, nt,    29.781,    29.316
  1,     1,    64, 12288, float16, nt,    55.257,    54.403
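
The two tables can be compared with a few lines of arithmetic. The sketch below computes the per-shape speedup from the last column, and also checks the ratio of the two throughput columns against a simple cost model. The cost model is an assumption, not something stated in this thread: 2·N·K flops per gemv, and one read or write of every float16 element of the matrix, input vector, and output vector. The names `BEFORE`, `AFTER`, `speedups`, and `bytes_per_flop` are illustrative.

```python
# (N, K) -> (bandwidth column, gflops column), copied from the tables above.
BEFORE = {
    (4096, 64): (25.342, 24.946),
    (12288, 64): (43.023, 42.358),
    (64, 4096): (15.712, 15.466),
    (64, 12288): (23.527, 23.163),
}
AFTER = {
    (4096, 64): (36.587, 36.016),
    (12288, 64): (70.071, 68.988),
    (64, 4096): (29.781, 29.316),
    (64, 12288): (55.257, 54.403),
}


def speedups(before, after):
    """Ratio of the gflops column after vs. before, per (N, K) shape."""
    return {shape: after[shape][1] / before[shape][1] for shape in before}


def bytes_per_flop(n, k, itemsize=2):
    """Assumed cost model: touch matrix, input, and output once; 2*N*K flops."""
    bytes_moved = itemsize * (n * k + n + k)
    flops = 2 * n * k
    return bytes_moved / flops


for shape, ratio in speedups(BEFORE, AFTER).items():
    n, k = shape
    measured = AFTER[shape][0] / AFTER[shape][1]
    print(f"N={n:5d} K={k:5d} speedup={ratio:.2f}x "
          f"bytes/flop measured={measured:.4f} model={bytes_per_flop(n, k):.4f}")
```

On these numbers the speedup is roughly 1.4x and 1.6x for the K=64 shapes and roughly 1.9x and 2.3x for the N=64 shapes. Under the assumed cost model each flop moves about one byte, which is consistent with the two columns being nearly equal.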

Member

@awni awni left a comment

Looks great!

@jagrit06 jagrit06 merged commit 7c7e48d into main Sep 23, 2025
6 checks passed
@jagrit06 jagrit06 deleted the gemv-small-k branch September 23, 2025 19:28
@ivanfioravanti
Contributor

A small preview of the effect on mlx-lm 🚀
cat 4k.txt | python -m mlx_lm generate --model mlx-community/Qwen3-30B-A3B-Instruct-2507-4bit -m 200 --temp 0.7 --top-k 20 --top-p 0.8 --prompt -

before
Generation: 200 tokens, 84.179 tokens-per-sec

after
Generation: 200 tokens, 90.274 tokens-per-sec

@awni
Member

awni commented Sep 24, 2025

Huh - that's surprising! Are you sure it's from this PR? I don't think it should affect generation for that model unless I am missing something

@ivanfioravanti
Contributor

I installed the latest mlx while running tests and noticed faster performance. I thought it was due to this commit, but it is probably from another change. 🤔

@ivanfioravanti
Contributor

You are right, @awni: #2608 is the game changer here!

faisalmemon pushed a commit to faisalmemon/mlx that referenced this pull request Oct 30, 2025