Closed
Labels
bug: Something isn't working
Description
Describe the bug
I'm getting very low throughput (< 5 t/s for Qwen/Qwen3-30B-A3B) on a Mac Studio with an M3 Ultra and 512 GB of unified memory. With llama.cpp, I get over 50 t/s. Also, running the model in GGUF format doesn't work; it fails with: "called Result::unwrap() on an Err value: Unknown GGUF architecture qwen3moe"
Latest commit or version