
[CUDA] Wait for tasks in cuda #2636

Merged
awni merged 1 commit into ml-explore:main from awni:cuda_wait_tasks on Sep 30, 2025


awni (Member) commented Sep 30, 2025

This helps keep work submission from running too far ahead of completed work in the CUDA back-end. It's virtually identical to what we do for Metal.
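The general idea of stopping submission from running too far ahead of completion can be sketched with a counter of in-flight tasks guarded by a condition variable: the submitter blocks once the cap is reached and resumes when a task finishes. This is a minimal, generic Python sketch that assumes a thread pool stands in for the GPU stream; `BoundedSubmitter`, `max_in_flight`, and `work` are hypothetical names, and this is not the CUDA or Metal code from this PR.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class BoundedSubmitter:
    """Hypothetical helper that caps the number of submitted-but-unfinished tasks.

    Illustrates throttling a producer so it cannot run arbitrarily far ahead
    of completed work. Not MLX's implementation.
    """

    def __init__(self, max_in_flight: int):
        self.max_in_flight = max_in_flight
        self._in_flight = 0
        self.peak_in_flight = 0  # highest number of unfinished tasks observed
        self._cv = threading.Condition()

    def submit(self, executor, fn, *args):
        with self._cv:
            # Block the submitting thread until at least one outstanding
            # task has finished, so the backlog never exceeds the cap.
            while self._in_flight >= self.max_in_flight:
                self._cv.wait()
            self._in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
        future = executor.submit(fn, *args)
        # Runs when the task completes (immediately if it already has).
        future.add_done_callback(self._on_done)
        return future

    def _on_done(self, _future):
        with self._cv:
            self._in_flight -= 1
            self._cv.notify()  # wake one blocked submitter


def work(i):
    time.sleep(0.01)  # hypothetical stand-in for a GPU kernel
    return i * i


with ThreadPoolExecutor(max_workers=4) as pool:
    submitter = BoundedSubmitter(max_in_flight=2)
    futures = [submitter.submit(pool, work, i) for i in range(20)]
    results = [f.result() for f in futures]

print(submitter.peak_in_flight)  # at most 2
```

Since each unfinished task typically keeps its inputs and outputs alive, capping the backlog is one plausible reason a change like this lowers peak memory without hurting throughput, as long as the cap still leaves the device enough queued work.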

Some benchmarks:

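| Model | Pre (it/sec) | Post (it/sec) | Pre peak mem (GB) | Post peak mem (GB) |
|-------|-------------:|--------------:|------------------:|-------------------:|
| 0.6 B | 6.45 | 6.44 | 54.3 | 28.9 |
| 0.86 B | 3.59 | 5.82 | 64.79 | 36.47 |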

Generation speed is unaffected.
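The relative changes implied by the benchmark table can be re-derived with a few lines of arithmetic. This only recomputes percentages from the pre/post numbers reported in the table; `benchmarks` and `pct_change` are illustrative names.

```python
# Pre/post numbers copied from the benchmark table.
benchmarks = {
    "0.6 B": {"its": (6.45, 6.44), "peak_gb": (54.3, 28.9)},
    "0.86 B": {"its": (3.59, 5.82), "peak_gb": (64.79, 36.47)},
}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


for model, row in benchmarks.items():
    speed = pct_change(*row["its"])
    mem = pct_change(*row["peak_gb"])
    print(f"{model}: throughput {speed:+.1f}%, peak memory {mem:+.1f}%")

# 0.6 B: throughput -0.2%, peak memory -46.8%
# 0.86 B: throughput +62.1%, peak memory -43.7%
```

Peak memory drops by roughly 44 to 47% for both models, throughput is flat for 0.6 B, and it rises by about 62% for 0.86 B.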

awni (Member, Author) commented Sep 30, 2025

Thanks @nastya236 for diagnosing the issue here.

angeloskath left a comment


🙏

awni merged commit bbf1423 into ml-explore:main Sep 30, 2025
7 checks passed
awni deleted the cuda_wait_tasks branch October 13, 2025 14:55
faisalmemon pushed a commit to faisalmemon/mlx that referenced this pull request Oct 30, 2025
2 participants