[CUDA] Tune ops per buffer based on device #2761

Merged

awni merged 3 commits into ml-explore:main from awni:tune_ops_per_buffer on Nov 16, 2025

Conversation

@awni (Member) commented Nov 14, 2025

We need a more sophisticated policy to set ops per buffer based on the device. This is a start to that.

For inference on a B200, increasing it helps a lot at very little memory cost.

```
mlx_lm.benchmark --model meta-llama/Meta-Llama-3.1-8B --p 128 -g 128 -b 1 -n 4
```

Pre: generation_tps=244.456, peak_memory=16.166
Post: generation_tps=283.073, peak_memory=16.224

For training a 0.6B model it's a double win: faster and less RAM 💪

|      | Toks/sec | Mem (GB) |
|------|----------|----------|
| Pre  | 60631    | 54.51    |
| Post | 64078    | 51.63    |
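The kind of device-based policy described above could be sketched roughly as follows. This is a hypothetical illustration, not MLX's actual implementation: the device names, the `ops_per_buffer` function, and all numeric values are assumptions made up for the sketch.

```python
# Hypothetical sketch of a per-device ops-per-buffer policy.
# Device names and all values here are illustrative only; they are
# not the numbers MLX actually uses.
DEFAULT_OPS_PER_BUFFER = 10

TUNED_OPS_PER_BUFFER = {
    # Newer datacenter GPUs can tolerate more ops per buffer, trading
    # a small amount of memory for throughput (as in the B200 numbers
    # above).
    "NVIDIA B200": 100,
    "NVIDIA H100": 20,
}

def ops_per_buffer(device_name: str) -> int:
    """Return the ops-per-buffer setting for a device, falling back
    to a conservative default for devices that have not been tuned."""
    return TUNED_OPS_PER_BUFFER.get(device_name, DEFAULT_OPS_PER_BUFFER)
```

A lookup-with-fallback like this matches the comment later in the thread: devices that haven't been tuned yet simply get the conservative default until someone benchmarks them.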

@awni changed the title from "Tune ops per buffer based on device" to "[CUDA] Tune ops per buffer based on device" on Nov 14, 2025
@awni force-pushed the tune_ops_per_buffer branch from 011a737 to 43d2f55 on November 15, 2025 00:25
@angeloskath (Member) left a comment


Beautiful 🚀

@awni force-pushed the tune_ops_per_buffer branch from 43d2f55 to e2694be on November 15, 2025 04:21

@awni (Member, Author) commented Nov 16, 2025

This will probably need more tuning in the future, especially for devices I didn't add yet. But for now I think it's good to merge.

@awni merged commit aad49f9 into ml-explore:main on Nov 16, 2025
9 checks passed
@awni deleted the tune_ops_per_buffer branch on November 22, 2025 05:17