Releases: oobabooga/textgen

v4.4 - MCP server support!

07 Apr 00:56
9dcf574

Changes

  • MCP server support: Use remote MCP servers from the UI. Just add one server URL per line in the new "MCP servers" field in the Chat tab and send a message. Tools will be discovered automatically and used alongside local tools. [Tutorial]
  • Several UI improvements, further modernizing the theme:
    • Improve hover menu appearance in the Chat tab.
    • Improve scrollbar styling (thinner, more rounded).
    • Improve message text contrast and heading colors.
    • Improve message action icon visibility in light mode.
    • Make blockquote, table, and hr borders more subtle and consistent.
    • Improve accordion outline styling.
    • Reduce empty space between chat input and message contents.
    • Hide spin buttons on all sliders (these looked ugly on Windows).
    • Show filename tooltip on file attachments in the chat input.
  • Add Windows + ROCm portable builds.
  • Image generation: Embed metadata in API responses. PNG images returned by the API now include generation settings (model, seed, dimensions, steps, CFG scale, sampler) in the file metadata.
  • API: Add instruction_template and instruction_template_str parameters in the model load endpoint.
  • API: Remove the deprecated settings parameter from the model load endpoint.
  • Move the cpu-moe checkbox to extra flags (no longer needed now that --fit exists).
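The embedded image metadata above can be read back with a plain PNG chunk parser. The sketch below uses only the standard library and the PNG spec's tEXt chunks; the "seed" key is illustrative, not a documented field name, and the demo PNG is synthetic (real files start with an IHDR chunk).

```python
# Sketch: reading generation settings back out of a PNG returned by the
# image generation API. Only relies on the PNG spec (tEXt chunks); the
# "seed" key below is an illustrative assumption, not a documented name.
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def read_png_text_chunks(data: bytes) -> dict:
    """Return {keyword: value} for every tEXt chunk in a PNG byte string."""
    assert data.startswith(PNG_SIGNATURE), "not a PNG file"
    out = {}
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            out[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # 4 (length) + 4 (type) + body + 4 (CRC)
    return out

def _chunk(ctype: bytes, body: bytes) -> bytes:
    """Assemble one PNG chunk: length, type, body, CRC over type+body."""
    crc = zlib.crc32(ctype + body)
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)

# Build a minimal PNG fragment with one metadata chunk to demonstrate:
demo = PNG_SIGNATURE + _chunk(b"tEXt", b"seed\x00123456") + _chunk(b"IEND", b"")
print(read_png_text_chunks(demo))  # {'seed': '123456'}
```

Tools like Pillow expose the same chunks via `Image.open(path).text` if you prefer a library.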

Bug fixes

  • Fix inline LaTeX rendering: $...$ expressions are now protected from being parsed as markdown (#7423).
  • Fix crash when truncating prompts with tool call messages.
  • Fix "address already in use" on server restart (Linux/macOS).
  • Fix GPT-OSS reasoning tags briefly leaking into streamed output between thinking and tool calls.
  • Fix tool call check sometimes truncating visible text at end of generation.
  • Fix image generation failing with Flash Attention 2 errors by defaulting attention to SDPA.
  • Fix loader args leaking between sequential API model loads.
  • Fix IPv6 address formatting in the API.

Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.

Note

NVIDIA GPU: If nvidia-smi reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.

ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.

Windows

GPU/Platform llama.cpp ik_llama.cpp
NVIDIA (CUDA 12.4) Download (777 MB) Download (1.09 GB)
NVIDIA (CUDA 13.1) Download (698 MB) Download (1.19 GB)
AMD/Intel (Vulkan) Download (207 MB)
AMD (ROCm 7.2) Download (516 MB)
CPU only Download (191 MB) Download (192 MB)

Linux

GPU/Platform llama.cpp ik_llama.cpp
NVIDIA (CUDA 12.4) Download (761 MB) Download (1.09 GB)
NVIDIA (CUDA 13.1) Download (712 MB) Download (1.21 GB)
AMD/Intel (Vulkan) Download (223 MB)
AMD (ROCm 7.2) Download (329 MB)
CPU only Download (207 MB) Download (217 MB)

macOS

Architecture llama.cpp
Apple Silicon (arm64) Download (181 MB)
Intel (x86_64) Download (187 MB)

Updating a portable install:

  1. Download and extract the latest version.
  2. Replace the user_data folder with the one in your existing install. All your settings and models will be moved.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

text-generation-webui-4.0/
text-generation-webui-4.1/
user_data/                    <-- shared by both installs

v4.3.3 - Gemma 4 support!

04 Apr 00:05
62e67ad

Changes

  • Gemma 4 support with tool-calling in the API and UI. 🆕 - v4.3.1.
  • ik_llama.cpp support: Add ik_llama.cpp as a new backend through new textgen-portable-ik portable builds and a new --ik flag for full installs. ik_llama.cpp is a fork by the author of the imatrix quants, including support for new quant types, significantly more accurate KV cache quantization (via Hadamard KV cache rotation, enabled by default), and optimizations for MoE models and CPU inference.
  • API: Add echo + logprobs for /v1/completions. The completions endpoint now supports the echo and logprobs parameters, returning token-level log probabilities for both prompt and generated tokens. Token IDs are also included in the output via a new top_logprobs_ids field.
  • Further optimize my custom gradio fork, saving up to 50 ms per UI event (button click, etc).
  • Transformers: Autodetect torch_dtype from model config instead of always forcing bfloat16/float16. The --bf16 flag still works as an override.
  • Remove the obsolete models/config.yaml file. Instruction templates are now detected from model metadata instead of filename patterns.
  • Rename "truncation length" to "context length" in the terminal log message.
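The new echo/logprobs support can be exercised like this — a hedged sketch, assuming the default local API address (127.0.0.1:5000); the `echo`, `logprobs`, and `top_logprobs_ids` field names come from this release, while the prompt and token limit are arbitrary:

```python
# Sketch: requesting prompt + generation logprobs from /v1/completions.
# The local URL is the default API address; payload values are examples.
import json
import urllib.request

payload = {
    "prompt": "The capital of France is",
    "max_tokens": 8,
    "echo": True,   # include the prompt tokens in the response
    "logprobs": 5,  # return log probabilities for the top 5 tokens
}

def query_completions(base_url="http://127.0.0.1:5000/v1"):
    req = urllib.request.Request(
        f"{base_url}/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    choice = body["choices"][0]
    # Logprobs cover both prompt and generated tokens; the new
    # top_logprobs_ids field carries the matching token IDs.
    return choice["logprobs"]
```

Call `query_completions()` with the server running to inspect the token-level output.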

Security

  • Gradio fork: Fix ACL bypass via case-insensitive path matching on Windows/macOS.
  • Gradio fork: Add server-side validation for Dropdown, Radio, and CheckboxGroup.
  • Sanitize filenames in all prompt file operations (CWE-22). Thanks, @ffulbtech. 🆕 - v4.3.3.
  • Fix SSRF in superbooga extensions: URLs fetched by superbooga/superboogav2 are now validated to block requests to private/internal networks.

Bug fixes

  • Fix --idle-timeout failing on encode/decode requests and not tracking parallel generation properly.
  • Fix stopping string detection for chromadb/context-1 (<|return|> vs <|result|>).
  • Fix Qwen3.5 MoE failing to load via ExLlamav3_HF.
  • Fix ban_eos_token not working for ExLlamav3. EOS is now suppressed at the logit level.
  • Fix "Value: None is not in the list of choices: []" Gradio error introduced in v4.3. 🆕 - v4.3.2.
  • Fix Dropdown/Radio/CheckboxGroup crash when choices list is empty. 🆕 - v4.3.3.
  • Fix API crash when parsing tool calls from non-dict JSON model output. 🆕 - v4.3.3.
  • Fix llama.cpp crashing due to failing to parse the Gemma 4 template (even though we don't use llama.cpp's jinja parser). 🆕 - v4.3.2.

Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.

Note

NVIDIA GPU: If nvidia-smi reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.

ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.

Windows

GPU/Platform llama.cpp ik_llama.cpp
NVIDIA (CUDA 12.4) Download (758 MB) Download (1.12 GB)
NVIDIA (CUDA 13.1) Download (681 MB) Download (1.17 GB)
AMD/Intel (Vulkan) Download (191 MB)
AMD (ROCm 7.2) Download (499 MB)
CPU only Download (175 MB) Download (175 MB)

Linux

GPU/Platform llama.cpp ik_llama.cpp
NVIDIA (CUDA 12.4) Download (753 MB) Download (1.12 GB)
NVIDIA (CUDA 13.1) Download (706 MB) Download (1.2 GB)
AMD/Intel (Vulkan) Download (217 MB)
AMD (ROCm 7.2) Download (323 MB)
CPU only Download (201 MB) Download (211 MB)

macOS

Architecture llama.cpp
Apple Silicon (arm64) Download (173 MB)
Intel (x86_64) Download (179 MB)

Updating a portable install:

  1. Download and extract the latest version.
  2. Replace the user_data folder with the one in your existing install. All your settings and models will be moved.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

text-generation-webui-4.0/
text-generation-webui-4.1/
user_data/                    <-- shared by both installs

v4.3.2

03 Apr 17:08
0050a33

v4.3.1

03 Apr 03:54
b11379f

v4.3

03 Apr 01:22
9374a4e

v4.2

24 Mar 19:39
dd9d254

[Before/after screenshots of the updated UI theme]

Changes

  • Anthropic-compatible API: A new /v1/messages endpoint lets you connect Claude Code, Cursor, and other Anthropic API clients. Supports system messages, content blocks, tool use, tool results, image inputs, and thinking blocks. To use with Claude Code: ANTHROPIC_BASE_URL=http://127.0.0.1:5000 claude.
  • Updated UI theme: New colors, borders, and button styles across light and dark modes.
  • --extra-flags now supports literal flags: You can now pass flags directly, e.g. --extra-flags "--rpc 192.168.1.100:50052 --jinja". The old key=value format is still accepted for backwards compatibility.
  • Training
    • Enable gradient_checkpointing by default for lower VRAM usage during training.
    • Remove the arbitrary higher_rank_limit parameter.
    • Reorganize the training UI.
  • Strip thinking blocks before tool-call parsing to prevent false-positive tool call detection from <think> content.
  • Move the OpenAI-compatible API from extensions/openai to modules/api. The old --extensions openai flag is still accepted as an alias for --api.
  • Set top_p=0.95 as the default sampling parameter for API requests.
  • Remove 52 obsolete instruction templates from 2023 (Airoboros, Baichuan, Guanaco, Koala, Vicuna v0, MOSS, etc.).
  • Reduce portable build sizes by using a stripped Python distribution.
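A minimal request to the new Anthropic-compatible endpoint can be sketched as follows. The body shape follows the Anthropic Messages API (system prompt, content blocks); the local URL matches the `ANTHROPIC_BASE_URL` shown above, while the model name and prompt are placeholders:

```python
# Sketch: sending an Anthropic-style request to the new /v1/messages
# endpoint. The model name and message text are placeholder assumptions.
import json
import urllib.request

body = {
    "model": "local-model",
    "max_tokens": 256,
    "system": "You are a terse assistant.",
    "messages": [
        # Content blocks, per the Anthropic Messages format:
        {"role": "user", "content": [{"type": "text", "text": "Say hi."}]},
    ],
}

def send_message(base_url="http://127.0.0.1:5000"):
    req = urllib.request.Request(
        f"{base_url}/v1/messages",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the server running, `send_message()` returns the Anthropic-format response, including any tool-use or thinking blocks.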

Bug fixes

  • Fix prompt corruption when continuing a chat with context truncation (#7439). Thanks, @Phrosty1.
  • Fix multi-turn thinking block corruption for Kimi models.
  • Fix AMD installer failing to resolve ROCm triton dependency.
  • Fix the --share feature in the Gradio fork.
  • Fix --extra-flags breaking short long-form-only flags like --rpc.
  • Fix the instruction template delete dialog not appearing.
  • Fix file handle leaks and redundant re-reads in model metadata loading (#7422). Thanks, @alvinttang.
  • Fix superboogav2 broken delete endpoint (#6010). Thanks, @Raunak-Kumar7.
  • Fix leading spaces in post-reasoning content in API responses.
  • Fix Cloudflare tunnel retry logic raising after the first failed attempt instead of exhausting retries.
  • Fix OPENEDAI_DEBUG=0 being treated as truthy.
  • Fix mutable default argument in LogitsBiasProcessor (#7426). Thanks, @Jah-yee.

Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda13.1, or cuda12.4 if you have older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • AMD GPU (ROCm): Use rocm builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel: Use macos-x86_64.

Updating a portable install:

  1. Download and extract the latest version.
  2. Replace the user_data folder with the one in your existing install. All your settings and models will be moved.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

text-generation-webui-4.0/
text-generation-webui-4.1/
user_data/                    <-- shared by both installs

v4.1.1

18 Mar 05:33

Changes

  • Tool-calling in the UI!: Models can now call custom functions during chat. Each tool is a single .py file in user_data/tools, and five examples are provided: web_search, fetch_webpage, calculate, get_datetime, and roll_dice. During streaming, each tool call appears as a collapsible accordion similar to the existing thinking blocks, showing the called function, the arguments chosen by the LLM, and the output. [Tutorial]
  • Replace html2text with trafilatura for extracting text from web pages, significantly reducing boilerplate like navigation bars and saving tokens in agentic tool-calling loops.
  • OpenAI API improvements:
    • Rewrite logprobs support for full spec compliance across llama.cpp, ExLlamaV3, and Transformers backends. Both streaming and non-streaming responses now return token-by-token logprobs.
    • Add a reasoning_content field for thinking blocks in both streaming and non-streaming chat completions. Now thinking blocks go exclusively in this field, and content only shows the post-thinking reply, even when tool calls are present.
    • Add tool_choice support and fix the tool_calls response format for strict spec compliance.
    • Put mid-conversation system messages in the correct positions in the prompt instead of collapsing all system messages at the top.
    • Add support for the developer role, which is mapped to system.
    • Add max_completion_tokens as an alias for max_tokens.
    • Include /v1 in the API URL printed to the terminal since that's what most clients expect.
    • Make the /v1/models endpoint show only the currently loaded model.
    • Add stream_options support with include_usage for streaming responses.
    • Return finish_reason: tool_calls when tool calls are detected.
    • Several other spec compliance improvements after careful auditing.
  • llama.cpp
    • Set ctx-size to 0 (auto) by default. Note: this only works when --gpu-layers is also set to -1, which is the default value. When using other loaders, 0 maps to 8192.
    • Reduce the --fit-target default from 1024 MiB to 512 MiB.
    • Use --fit-ctx 8192 to set 8192 as the minimum acceptable ctx size for --fit on (llama.cpp uses 4096 by default).
    • Make logit_bias and logprobs functional in API calls.
    • Add missing custom_token_bans parameter in the UI.
  • ExLlamaV3
    • Add native logit_bias and logprobs support.
    • Load the vision model and the draft model before the main model so memory auto-splitting accounts for them.
  • New default preset: "Top-P" (top_p: 0.95), following recommendations for several SOTA open-weights models. The old "Qwen3 - Thinking", "Qwen3 - No Thinking", "min_p", and "Instruct" presets have been removed.
  • Refactor reasoning/thinking extraction into a standalone module supporting multiple model formats (Qwen, GPT-OSS, Solar, seed:think, and others). Also detect when a chat template appends <think> to the prompt and prepend it to the reply, so the thinking block appears immediately during streaming.
  • Incognito chat: A new option next to the existing "New chat" button. Incognito chats are temporary: they live in RAM and are never saved to disk.
  • Optimize chat streaming performance by updating the DOM only once per animation frame.
  • Increase the ctx-size slider maximum to 1M tokens in the UI, with a step of 1024.
  • Add a new drag-and-drop UI component for reordering "Sampler priority" items.
  • Make all chat styles consistent with instruct style in spacings, line heights, etc., improving the quality and consistency of those styles.
  • Remove the gradio import in --nowebui mode, saving some 0.5-0.8 seconds on startup.
  • Force-exit the webui on repeated Ctrl+C.
  • Improve the --multi-user warning to make the known limitations transparent.
  • Remove the rope scaling parameters (alpha_value, rope_freq_base, compress_pos_emb). Models now have 128k+ context, and those parameters are from the 4096-context era; they can still be passed to llama.cpp through --extra-flags if needed.
  • Optimize wheel downloads in the one-click installer to only download wheels that actually changed between updates. Previously, all wheels would be downloaded if at least one of them had changed.
  • Update the Intel Arc PyTorch installation command in the one-click installer, removing the dependency on Intel oneAPI conda packages.
  • Security: add server-side file save roots, SSRF protection for image URLs, and an extension allowlist (new in v4.1.1).
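For the tool-calling feature above, a tool is a single .py file in user_data/tools. The registration interface is documented in the tutorial; the sketch below is a hypothetical version of the bundled roll_dice example, assuming a tool is a plain function whose signature and docstring tell the model when and how to call it:

```python
# Hypothetical sketch of a single-file tool for user_data/tools/ — the
# actual registration convention is described in the tool-calling tutorial.
import random

def roll_dice(sides: int = 6, count: int = 1) -> str:
    """Roll `count` dice with `sides` faces each and return the results."""
    rolls = [random.randint(1, sides) for _ in range(count)]
    return f"Rolled {rolls} (total {sum(rolls)})"

print(roll_dice(6, 2))
```

During streaming, the call, its LLM-chosen arguments, and this return string appear in a collapsible accordion in the chat.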

Bug fixes

  • Fix pip accidentally installing to the system Miniconda on Windows instead of the project environment.
  • Fix crash on non-UTF-8 Windows locales (e.g. Chinese GBK).
  • Fix passing adaptive-p to llama-server.
  • Fix truncation_length not propagating correctly when ctx_size is set to auto (0).
  • Fix dark theme using light theme syntax highlighting.
  • Fix word breaks in tables. Tables now scroll horizontally instead of breaking words.
  • Fix the OpenAI API server not respecting --listen-host.
  • Fix a crash loading the MiniMax-M2.5 jinja template.
  • Fix reasoning_effort not appearing in the UI for ExLlamaV3.
  • Fix ExLlamaV3 draft cache size to match main cache.
  • Fix ExLlamaV3 EOS handling for models with multiple end-of-sequence tokens.
  • Fix ExLlamaV3 perplexity evaluation giving incorrect values for sequences longer than 2048 tokens.

Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda13.1, or cuda12.4 if you have older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • AMD GPU (ROCm): Use rocm builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel: Use macos-x86_64.

Updating a portable install:

  1. Download and extract the latest version.
  2. Replace the user_data folder with the one in your existing install. All your settings and models will be moved.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

text-generation-webui-4.0/
text-generation-webui-4.1/
user_data/                    <-- shared by both installs

v4.1

16 Mar 15:53
88a3188

v4.0

07 Mar 14:34
3b7cf44

Changes

  • Custom Gradio fork: Gradio has been replaced with a custom fork at oobabooga/gradio where major performance optimizations were made. The UI now does far less redundant work on every update, startup is faster, SSE message delivery is instant instead of polling every 50 ms, and a new zero-rendering gr.Headless component reduces overhead during chat streaming. Analytics, unused dependencies, and unused assets have also been removed from the wheel.
  • Tool-calling overhaul: Now tool-calling actually works for Qwen 3.5, Devstral 2, GPT-OSS, DeepSeek V3.2, GLM 5, MiniMax M2.5, Kimi K2/K2.5, and Llama 4 models. Several improvements have been made for strict OpenAI format compliance. Extensive testing has been done to make sure tool-calling works flawlessly for the supported models. [Documentation]
  • Parallel API requests: For llama.cpp, ExLlamaV3, and TensorRT-LLM loaders, it is now possible to make concurrent API requests for maximum throughput. For llama.cpp, it is necessary to use the --parallel N option and multiply the context length by N. [Documentation]
  • Training overhaul (documentation): The training code has been completely rewritten. It is now fully in line with axolotl for both raw text training and chat training.
    • For chat training, datasets in OpenAI messages format or ShareGPT conversations format are now used. Multi-turn chats are supported, with correct masking of user inputs so that training only happens on assistant messages. See user_data/training/example_messages.json and user_data/training/example_sharegpt.json for examples.
    • For raw text training, JSONL files are used, with correct BOS and EOS addition for each sub-document. See user_data/training/example_text.json for an example input.
    • Chat training now uses jinja2 templates for formatting prompts. You can use either the model's built-in template (if it has one) or a custom user-provided template.
    • New "Target all linear layers" checkbox that applies LoRA to every nn.Linear layer except lm_head. It works for any model architecture.
    • Checkpoint resumption: HF Trainer checkpoint directories are detected automatically and training resumes with full optimizer/scheduler state.
    • All training input parameters now have good, reviewed default values.
    • Conversations exceeding the cutoff length are now dropped instead of silently truncated (configurable).
    • Dynamic padding (chat datasets): batches are now padded to the longest sequence in each batch instead of always padding to cutoff_len, reducing wasted computation.
  • llama.cpp
    • --fit support: GPU layers now default to -1 (auto), letting llama.cpp determine the optimal number of layers and GPU split automatically. The new --fit-target parameter controls how much VRAM headroom to leave per GPU (default: 1024 MiB). Context size can also be set to 0 to let llama.cpp determine that automatically as well.
    • Integrate N-gram speculative decoding support for faster generation without the need for a draft model, through the --spec-type, --spec-ngram-size-n, --spec-ngram-size-m, and --spec-ngram-min-hits parameters. Good defaults are provided, just change --spec-type to ngram-mod to activate.
    • Binaries now work for any CPU instruction set (AVX, AVX2, AVX-512) by autodetecting at runtime, replacing the old separate AVX/AVX2 builds.
    • Add ROCm portable builds for Windows.
    • Add CUDA 13.1 portable builds.
    • Add back macOS x86_64 (Intel) portable builds.
    • Smaller CUDA binaries after improving compilation flags.
    • Compilation workflows at oobabooga/llama-cpp-binaries have been fully audited and aligned with upstream.
    • Handle SIGTERM to properly stop llama-server on pkill.
    • llama-server is now spawned on port 5005 by default instead of a random port.
  • Adaptive-p sampler for llama.cpp, Transformers, ExLlamaV3, and ExLlamaV3_HF loaders. This sampler reshapes the logit distribution to favor tokens near a target probability.
  • New CLI flags to set default API generation parameters: --temperature, --min-p, --top-k, --repetition-penalty, etc., and also --enable-thinking, --reasoning-effort, and --chat-template-file. The last parameter accepts .jinja or .yaml files.
  • Chat completion requests are now ~85 ms faster after optimizations.
  • SSE separator for streaming over the API changed from \r\n to \n to match OpenAI.
  • Migrate TensorRT-LLM from the old ModelRunner API to the new LLM API, which can take any Transformers model as input and has more sampling parameters.
  • Security
    • Prevent path traversal on file save/delete operations for characters, users, and uploaded files.
    • Restrict model loading over API to block extra_flags and trust_remote_code parameters.
    • Restrict file writes to the user_data_dir.
  • New --user-data-dir flag to customize the user data directory location. Now the program also auto-detects a ../user_data folder in portable mode if present, making updates easier.
  • User persona support: A new dropdown in the Character settings tab lets you save and load user profiles (name, bio, profile picture), so you can switch between different personas without re-entering your details (#7367). Thanks, @q5sys.
  • Replace PyPDF2 with pymupdf for much more accurate conversion of PDF inputs to text.
  • Markdown rendering improvements. All by @mamei16:
    • Re-introduce inline LaTeX rendering with more robust exception handling (#7402).
    • Disable uncommonly used indented codeblocks (#7401).
    • Improve process_markdown_content (#7403).
  • Add Qwen 3.5 thinking block support to the UI.
  • Add Solar Open thinking block support to the UI.
  • Update the entire documentation to match the current code.
  • Update all dockerfiles. [Documentation]
  • Update the Google Colab notebook.
  • Remove the ExLlamaV2 loader, which has been archived. EXL2 users should migrate to EXL3, which has much better quantization accuracy.
  • Remove the Training_PRO extension, which has become obsolete after the Training tab updates.
  • Remove obsolete DeepSpeed inference code from 2023.
  • Remove unused colorama and psutil dependencies.
  • Update outdated GitHub Actions versions (#7384). Thanks, @pgoslatara.
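For the chat-training rewrite above, a dataset entry in the OpenAI messages format looks like the sketch below (see user_data/training/example_messages.json in the repo for the canonical example; the conversation content here is made up). Training masks the user turns so loss is computed only on assistant messages:

```python
# Sketch: one multi-turn entry in the OpenAI messages format used by the
# rewritten chat training. The conversation text is an invented example.
import json

example = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4."},
        {"role": "user", "content": "And doubled?"},
        {"role": "assistant", "content": "8."},
    ]
}
print(json.dumps(example, indent=2))
```

A JSONL training file is simply one such object per line; only the assistant turns contribute to the loss.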

Bug fixes

  • Fix temperature_last having no effect in llama.cpp server sampler order.
  • Fix code block copy button not working over HTTP (Clipboard API fallback) (#7358). Thanks, @jakubartur.
  • Fix message copy buttons not working over HTTP (extend Clipboard API fallback).
  • Fix ExLlamaV3 CFG cache initialization and speculative decoding parameter handling.
  • Fix blank prompt dropdown in Notebook/Default tabs on first startup.
  • Use absolute Python path in Windows batch scripts to fix some rare edge cases.
  • Bump sentence-transformers from 2.2.2 to 3.3.1 in superbooga (#7406). Thanks, @OiPunk.
  • Fix installer state being saved before requirements were fully installed.
  • Fix ExLlamav3 race condition that could cause AssertionError or hangs during generation.
  • Fix API server continuing to generate tokens after client disconnects for non-streaming requests.

Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda13.1, or cuda12.4 if you have older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • AMD GPU (ROCm): Use rocm builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel: Use macos-x86_64.

Updating a portable install:

  1. Download and extract the latest version.
  2. Replace the user_data folder with the one in your existing install. All your settings and models will be moved.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

text-generation-webui-4.0/
text-generation-webui-4.1/
user_data/                    <-- shared by both installs

v3.23

08 Jan 20:54
910456b

Changes

  • Improve the style of tables and horizontal separators in chat messages

Bug fixes

  • Fix loading models which have their eos token disabled (#7363). Thanks, @jin-eld.
  • Fix a symbolic link issue in llama-cpp-binaries while updating non-portable installs

Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder with the one in your existing install. All your settings and models will be moved.