Tinygrad MPS #65
base: main
^ Here's my current error, if anyone wants to take a look.
I threw in a Tensor recast in Metal. Here's where I'm at:
Ayyy, it's running (kinda)! I still have to confirm the output, but it's past 0%. It looks like attention needed to be segmented, but MPS doesn't have a built-in segmented attention, so I built my own.
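The custom kernel itself isn't shown in this thread, so what follows is only a minimal sketch of query-chunked ("segmented") attention. It is written in NumPy for portability. `full_attention`, `chunked_attention` and `chunk_size` are hypothetical names, not the PR's code. Because softmax runs over the key axis, each query row is independent. Slicing the queries therefore gives the same result as one big call, while only a `(chunk_size x kv_len)` score matrix exists at a time. The same slicing applies to PyTorch tensors on the `mps` device.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(q, k, v):
    # Reference version: materializes the whole (seq_q x kv_len) score matrix.
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = (q @ np.swapaxes(k, -1, -2)) * scale
    return softmax(scores, axis=-1) @ v

def chunked_attention(q, k, v, chunk_size):
    # Hypothetical helper: process queries in slices of `chunk_size` rows so
    # peak memory for the score matrix scales with chunk_size, not seq_q.
    outputs = []
    for start in range(0, q.shape[-2], chunk_size):
        q_chunk = q[..., start:start + chunk_size, :]
        outputs.append(full_attention(q_chunk, k, v))
    return np.concatenate(outputs, axis=-2)

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 10, 8))  # (heads, seq_q, head_dim)
k = rng.standard_normal((2, 12, 8))  # (heads, kv_len, head_dim)
v = rng.standard_normal((2, 12, 8))
out = chunked_attention(q, k, v, chunk_size=3)
print(out.shape)  # → (2, 10, 8)
print(np.allclose(out, full_attention(q, k, v)))  # → True
```

Smaller `chunk_size` values lower peak memory at the cost of more, smaller matmuls. That is the trade-off a chunk-size setting would expose.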
@lllyasviel, I'd love your take on this. I've added the MPS kernels and split the attention up so it runs in batches. I'm not sure whether we should expose a Gradio slider for chunk sizing on M-series machines, but it's currently diffusing (slowly). One operation only runs on the CPU on Mac. You can still run it, but it takes a while (1h30m for a 25-frame video).
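If a chunk-size setting were exposed, one way to pick a default is to bound the size of the attention-score buffer. This sketch is an assumption, not the PR's logic. `score_bytes` and `pick_chunk_size` are hypothetical helpers, and the 2-byte element size assumes fp16/bf16. The head count, sequence length and 1 GiB budget are illustrative numbers, not FramePack's real shapes.

```python
def score_bytes(heads, chunk_size, kv_len, bytes_per_elem=2):
    # Size of one (heads x chunk_size x kv_len) attention-score buffer.
    # bytes_per_elem=2 assumes fp16/bf16 activations.
    return heads * chunk_size * kv_len * bytes_per_elem

def pick_chunk_size(seq_len, heads, kv_len, budget_bytes, bytes_per_elem=2):
    # Start with the full query length and halve it until the score buffer
    # fits the budget (never going below one query row per chunk).
    chunk = seq_len
    while chunk > 1 and score_bytes(heads, chunk, kv_len, bytes_per_elem) > budget_bytes:
        chunk = (chunk + 1) // 2
    return chunk

# Illustrative shapes only: 24 heads, 16384 tokens, 1 GiB budget for scores.
print(score_bytes(24, 16384, 16384) / 2**30)  # → 12.0
print(pick_chunk_size(16384, 24, 16384, budget_bytes=2**30))  # → 1024
```

A user-facing slider could simply override this default. The estimate ignores model weights and all other activations, so treat it as a rough starting point rather than a memory guarantee.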
Okay, I've finally started updating portions of this for more performance gains.
Just a note: there's another fork for Mac that seems to be faster: https://www.reddit.com/r/StableDiffusion/comments/1k2neim/framepack_on_macos/ and https://github.com/brandon929/FramePack
@e1732a364fed, appreciate it! Looking into it now.
I have made PyTorch supporting:
250418_230711_181_7405_37.mp4
250420_173459_831_8394_28.mp4
I just saw the new PR; I hope this will be helpful for that.
@donghao1393 Gotcha. I chunked these buffers down fairly aggressively to generate the output. Is the output not good? I tried it with the guy jumping, and it worked well! Let me know what's going wrong. Maybe it's because my machine is really constrained: I've got an M3 Pro with 18GB of RAM.
@mdaiter It's in the first output: the video doesn't seem to play correctly.
250418_230711_181_7405_37.reoutput.mp4
Sorry, the AI auto-edit stuff bonked a lot of my code. I'm rolling some of it back. This is just a train of thought, so basically, just disregard this.
I'm gonna start digging into grafting tinygrad into the transformer part. Metal blocks (and compacting that thing down in general) should help a lot with speed and memory usage on these machines.