fix(studio): build llama.cpp from master for Gemma 4 support #4790

danielhanchen merged 1 commit into main
Conversation
The latest ggml-org/llama.cpp release (b8635) does not include Gemma 4 support (ggml-org/llama.cpp#21309 was merged after the release was cut). This causes `llama-server` to fail with `unknown model architecture: gemma4` when loading Gemma 4 GGUFs.

Temporarily default `_DEFAULT_LLAMA_TAG` to `"master"` so all new installs build from the llama.cpp master branch, which includes Gemma 4 support. Once a new upstream release is cut with Gemma 4, this can be reverted back to `"latest"`.

Changes:
- `setup.sh`: add `_DEFAULT_LLAMA_TAG="master"` maintainer default
- `setup.ps1`: add `$DefaultLlamaTag="master"` maintainer default
- `install_llama_prebuilt.py`: change `DEFAULT_LLAMA_TAG` fallback to `"master"`

Users can still override via the `UNSLOTH_LLAMA_TAG` env var.
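The override described above can be sketched with the standard POSIX default-expansion pattern (an assumed pattern, not verbatim from `setup.sh`):

```shell
# Sketch only: resolve the llama.cpp tag from the UNSLOTH_LLAMA_TAG
# override, falling back to the maintainer default when the variable
# is unset or empty (":-" covers both cases).
_DEFAULT_LLAMA_TAG="master"
LLAMA_TAG="${UNSLOTH_LLAMA_TAG:-$_DEFAULT_LLAMA_TAG}"
echo "resolved llama.cpp tag: $LLAMA_TAG"
```

With `UNSLOTH_LLAMA_TAG` unset this resolves to `master`; exporting `UNSLOTH_LLAMA_TAG=b8635` before running the script pins the build to that tag instead.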
Code Review
This pull request updates the default llama.cpp tag from "latest" to "master" across the Python, PowerShell, and Shell setup scripts. Feedback highlights that switching to "master" while using the ggml-org repository forces source builds for all users, leading to significant performance regressions and reduced installation stability. Additionally, there is a suggestion to improve environment variable handling in the Python script to robustly handle empty strings and a concern regarding the maintenance overhead of duplicating the default tag across multiple files.
```sh
# ──────────────────────────────────────────────────────────────────────────
_DEFAULT_LLAMA_PR_FORCE=""
_DEFAULT_LLAMA_SOURCE="https://github.com/ggml-org/llama.cpp"
_DEFAULT_LLAMA_TAG="master"
```
Setting the default tag to "master" while using ggml-org/llama.cpp as the helper repository effectively forces a source build for all users, as ggml-org does not host the Unsloth-specific manifest and checksum files required for prebuilt resolution. This results in a significant performance regression (5-10 minutes per setup run) because the scripts lack logic to skip the build if the existing installation is already at the correct commit. Furthermore, "master" is a moving target, making installations less reproducible and potentially unstable.
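One way to address the rebuild-on-every-run concern the reviewer raises would be to cache the built commit and skip the build when it already matches. This is a hypothetical sketch, not code from the PR; the marker-file name and helper functions are invented for illustration:

```python
# Hypothetical sketch: skip rebuilding llama.cpp when the installed build
# already matches the requested commit, by caching the built commit hash
# in a marker file inside the install directory.
from pathlib import Path

def needs_rebuild(install_dir: str, requested_commit: str) -> bool:
    """Return True unless a marker file records the same commit."""
    marker = Path(install_dir) / ".llama_commit"
    if not marker.exists():
        return True
    return marker.read_text().strip() != requested_commit

def record_build(install_dir: str, built_commit: str) -> None:
    """Write the marker after a successful build."""
    Path(install_dir).mkdir(parents=True, exist_ok=True)
    (Path(install_dir) / ".llama_commit").write_text(built_commit + "\n")
```

With something like this in place, repeated setup runs against an unchanged `master` HEAD would become a no-op instead of a 5-10 minute build.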
```powershell
# --------------------------------------------------------------------------
$DefaultLlamaPrForce = ""
$DefaultLlamaSource = "https://github.com/ggml-org/llama.cpp"
$DefaultLlamaTag = "master"
```
Setting the default tag to "master" while using ggml-org/llama.cpp as the helper repository effectively forces a source build for all users, as ggml-org does not host the Unsloth-specific dummy manifest and checksum files required for prebuilt resolution. This results in a significant performance regression (5-10 minutes per setup run) because the scripts lack logic to skip the build if the existing installation is already at the correct commit. Furthermore, "master" is a moving target, making installations less reproducible and potentially unstable.
```diff
-DEFAULT_LLAMA_TAG = os.environ.get("UNSLOTH_LLAMA_TAG", "latest")
+DEFAULT_LLAMA_TAG = os.environ.get("UNSLOTH_LLAMA_TAG", "master")
```
The default tag is now duplicated across three files (`install_llama_prebuilt.py`, `setup.sh`, and `setup.ps1`), which increases maintenance overhead. Additionally, `os.environ.get(..., "master")` will return an empty string if the environment variable is set but empty, which might lead to unexpected behavior in `normalized_requested_llama_tag` (which defaults to `'latest'` for empty strings). Using `or` ensures that empty strings also fall back to the intended default value.
Suggested change:

```diff
-DEFAULT_LLAMA_TAG = os.environ.get("UNSLOTH_LLAMA_TAG", "master")
+DEFAULT_LLAMA_TAG = os.environ.get("UNSLOTH_LLAMA_TAG") or "master"
```
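The difference the reviewer is pointing at can be shown in a few lines: the two-argument `os.environ.get` applies its default only when the variable is unset, while `or` also covers the set-but-empty case.

```python
import os

# Set the variable, but to an empty string.
os.environ["UNSLOTH_LLAMA_TAG"] = ""

via_default = os.environ.get("UNSLOTH_LLAMA_TAG", "master")  # empty string
via_or = os.environ.get("UNSLOTH_LLAMA_TAG") or "master"     # "master"
print(repr(via_default), repr(via_or))
```

Running this prints `'' 'master'`, which is why the `or` form is the safer fallback here.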
Summary

Switch the default llama.cpp tag from the `latest` release tag to the `master` branch. The latest release lacks Gemma 4 support, causing `llama-server` to fail with `unknown model architecture: 'gemma4'` when loading any Gemma 4 GGUF.

Changes

- `setup.sh`: add `_DEFAULT_LLAMA_TAG="master"` maintainer-editable default
- `setup.ps1`: add `$DefaultLlamaTag="master"` maintainer-editable default
- `install_llama_prebuilt.py`: change `DEFAULT_LLAMA_TAG` fallback from `"latest"` to `"master"`

Users can still override via the `UNSLOTH_LLAMA_TAG` env var. Once a new upstream release is cut with Gemma 4 support, revert these back to `"latest"`.

Test plan

- `install.sh --local` builds llama.cpp from master HEAD
- `UNSLOTH_LLAMA_TAG=b8635` override still works (builds from that tag)