Low t/s for Qwen3 on M3 ultra #1348

@otarkhan

Describe the bug

I'm getting very low throughput (< 5 t/s for Qwen/Qwen3-30B-A3B) on a Mac Studio with an M3 Ultra and 512 GB of unified memory. With llama.cpp, I get over 50 t/s on the same machine. Also, running the model in GGUF format doesn't work; it fails with: "called Result::unwrap() on an Err value: Unknown GGUF architecture qwen3moe"

Latest commit or version

380da23

Labels: bug
