Releases: mudler/LocalAI
v3.7.0
🚀 LocalAI 3.7.0
Welcome to LocalAI 3.7.0 👋
This release introduces Agentic MCP support with full WebUI integration, a brand-new neutts TTS backend, fuzzy model search, long-form TTS chunking for chatterbox, and a complete WebUI overhaul.
We’ve also fixed critical bugs, improved stability, and enhanced compatibility with OpenAI’s APIs.
📌 TL;DR – What’s New in LocalAI 3.7.0
| Feature | Summary |
|---|---|
| 🤖 Agentic MCP Support (WebUI-enabled) | Build AI agents that use real tools (web search, code exec). Fully OpenAI-compatible and integrated into the WebUI. |
| 🎙️ neutts TTS Backend (Neuphonic-powered) | Generate natural, high-quality speech with low-latency audio — ideal for voice assistants. |
| 🖼️ WebUI enhancements | Faster, cleaner UI with real-time updates and full YAML model control. |
| 💬 Long-Text TTS Chunking (Chatterbox) | Generate natural-sounding long-form audio by intelligently splitting text and preserving context. |
| 🧩 Advanced Agent Controls | Fine-tune agent behavior with new options for retries, reasoning, and re-evaluation. |
| 📸 New Video Creation Endpoint | We now support the OpenAI-compatible /v1/videos endpoint for text-to-video generation. |
| 🐍 Enhanced Whisper compatibility | Whisper.cpp is now supported on various CPU variants (AVX, AVX2, etc.) to prevent illegal instruction crashes. |
| 🔍 Fuzzy Gallery Search | Find models in the gallery even with typos (e.g., gema finds gemma). |
| 📦 Easier Model & Backend Management | Import, edit, and delete models directly via clean YAML in the WebUI. |
| 🎤 Realtime Voice Assistant Example | Check out the new realtime voice assistant example (multilingual). |
| 🛠️ Stability Fixes | Fixed critical crashes, deadlocks, session events, OpenAI compliance issues, and JSON schema panics. |
| 🧠 Qwen 3 VL | Support for Qwen 3 VL with llama.cpp/gguf models |
🔥 What’s New in Detail
🤖 Agentic MCP Support – Build Intelligent, Tool-Using AI Agents
We're proud to announce full Agentic MCP support: a feature for building AI agents that can reason, plan, and execute actions using external tools such as web search, code execution, and data retrieval. You can keep using the standard chat/completions endpoint, now powered by an agent in the background.
Full documentation is available here
✅ Now in WebUI: A dedicated toggle appears in the chat interface when a model supports MCP. Just click to enable agent mode.
✨ Key Features:
- New Endpoint: `POST /mcp/v1/chat/completions` (OpenAI-compatible).
- Flexible Tool Configuration:

```yaml
mcp:
  stdio: |
    {
      "mcpServers": {
        "searxng": {
          "command": "docker",
          "args": ["run", "-i", "--rm", "ghcr.io/mudler/mcps/duckduckgo:master"]
        }
      }
    }
```

- Advanced Agent Control via the `agent` config:

```yaml
agent:
  max_attempts: 3
  max_iterations: 5
  enable_reasoning: true
  enable_re_evaluation: true
```

- `max_attempts`: Retry failed tool calls up to N times.
- `max_iterations`: Limit how many times the agent can loop through reasoning.
- `enable_reasoning`: Allow step-by-step thought processes (e.g., chain-of-thought).
- `enable_re_evaluation`: Re-analyze decisions when tool results are ambiguous.
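Since the endpoint is OpenAI-compatible, calling it looks like any other chat completion. A minimal sketch (the model name is illustrative and must refer to a model with `mcp` settings configured):

```bash
curl http://localhost:8080/mcp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [{"role": "user", "content": "Find the latest LocalAI release and summarize it"}]
  }'
```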
You can find some plug-n-play MCPs here: https://github.com/mudler/MCPs
Under the hood, MCP functionality is powered by https://github.com/mudler/cogito
🖼️ WebUI enhancements
The WebUI had a major overhaul:
- The chat view now shows an MCP toggle for models that have `mcp` settings enabled in the model config file.
- The model editor has been simplified to show and edit the model's YAML settings directly.
- More reactive UI: HTMX was dropped in favor of Alpine.js and vanilla JavaScript.
- Various fixes, including deletion of models.
🎙️ Introducing neutts TTS Backend – Natural Speech, Low Latency
Say hello to neutts, a new lightweight TTS backend powered by Neuphonic that delivers high-quality, natural-sounding speech with minimal overhead.
🎛️ Setup Example
```yaml
name: neutts-english
backend: neutts
parameters:
  model: neuphonic/neutts-air
tts:
  audio_path: "./output.wav"
  streaming: true
options:
  # text transcription of the provided audio file
  - ref_text: "So I'm live on radio..."
known_usecases:
  - tts
```
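Once the model is configured, speech can be generated through LocalAI's TTS endpoint. A minimal sketch (the input text is illustrative):

```bash
# Generate speech with the neutts-english model defined above
curl http://localhost:8080/tts \
  -H "Content-Type: application/json" \
  -d '{"model": "neutts-english", "input": "Hello from LocalAI!"}' \
  --output output.wav
```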
🐍 Whisper.cpp enhancements
whisper.cpp CPU variants are now available for:
- `avx`
- `avx2`
- `avx512`
- `fallback` (no optimized instructions available)
These variants are optimized for specific instruction sets and reduce crashes on older or non-AVX CPUs.
🔍 Smarter Gallery Search: Fuzzy & Case-Insensitive Matching
Searching for gemma now finds gemma-3, gemma2, etc. — even with typos like gemaa or gema.
🧩 Improved Tool & Schema Handling – No More Crashes
We’ve fixed multiple edge cases that caused crashes or silent failures in tool usage.
✅ Fixes:
- Nullable JSON Schemas: `"type": ["string", "null"]` now works without panics.
- Empty Parameters: Tools with missing or empty `parameters` are now handled gracefully.
- Strict Mode Enforcement: With `strict_mode: true`, the model must pick a tool instead of skipping.
- Multi-Type Arrays: Safe handling of `["string", "null"]` in function definitions.
🔄 Interaction with Grammar Triggers: `strict_mode` and grammar rules work together; if a tool is required and the function definition is invalid, the server returns a clear JSON error instead of crashing.
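As an illustration, a tool definition with a nullable parameter can now be passed without issues. A sketch (the function name and fields are hypothetical):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Look up the user bob"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_user",
        "parameters": {
          "type": "object",
          "properties": {
            "nickname": {"type": ["string", "null"]}
          }
        }
      }
    }]
  }'
```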
📸 New Video Creation Endpoint: OpenAI-Compatible
LocalAI now supports OpenAI’s /v1/videos endpoint for generating videos from text prompts.
📌 Usage Example:
```bash
curl http://localhost:8080/v1/videos \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-..." \
  -d '{
    "model": "sora",
    "prompt": "A cat walking through a forest at sunset",
    "size": "1024x576"
  }'
```

🧠 Qwen 3 VL in llama.cpp
Support has been added for Qwen 3 VL in llama.cpp, and we have updated llama.cpp to the latest version! As a reminder, Qwen 3 VL and other multimodal models are also compatible with our vLLM and MLX backends. Qwen 3 VL models are already available in the model gallery:
- `qwen3-vl-30b-a3b-instruct`
- `qwen3-vl-30b-a3b-thinking`
- `qwen3-vl-4b-instruct`
- `qwen3-vl-32b-instruct`
- `qwen3-vl-4b-thinking`
- `qwen3-vl-2b-thinking`
- `qwen3-vl-2b-instruct`
Note: upgrading the llama.cpp backend is necessary if you already have a LocalAI installation.
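Once one of these models is installed, image understanding works through the standard OpenAI-compatible chat endpoint. A minimal sketch (the image URL is illustrative):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-vl-4b-instruct",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'
```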
🚀 (CI) Gallery Updater Agent: Auto-Detect & Suggest New Models
We’ve added an autonomous CI agent that scans Hugging Face daily for new models and opens PRs to update the gallery.
✨ How It Works:
- Scans HF for new, trending models
- Extracts base model, quantization, and metadata.
- Uses cogito (our agentic framework) to assign the model to the correct family and to obtain the model information.
- Opens a PR with:
  - Suggested `name`, `family`, and `usecases`
  - Link to the HF model
  - YAML snippet for import
🔧 Critical Bug Fixes & Stability Improvements
| Issue | Fix | Impact |
|---|---|---|
| 📌 WebUI Crash on Model Load | Fixed `can't evaluate field Name in type string` error | Models now render even without config files |
| 🔁 Deadlock in Model Load/Idle Checks | Guarded against race conditions during model loading | Improved performance under load |
| 📞 Realtime API Compliance | Added `session.created` event; removed redundant `conversation.created` | Works with VoxInput, OpenAI clients, and more |
| 📥 MCP Response Formatting | Output wrapped in `message` field | Matches the OpenAI spec for better client compatibility |
| 🛑 JSON Error Responses | Now return clean JSON instead of HTML | Scripts and libraries no longer break on auth failures |
| 🔄 Session Registration | Fixed initial MCP calls failing due to cache issues | Reliable first-time use |
| 🎧 kokoro TTS | Returns full audio, not partial | Better for long-form TTS |
🚀 The Complete Local Stack for Privacy-First AI
- LocalAI: The free, Open Source OpenAI alternative. Acts as a drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required.
- LocalAGI: A powerful Local AI agent management platform. Serves as a drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.
v3.6.0
What's Changed
Bug fixes 🐛
Exciting New Features 🎉
- feat(kokoro): add support for l4t devices by @mudler in #6322
- feat(chatterbox): support multilingual by @mudler in #6240
🧠 Models
- chore(model gallery): add qwen-image-edit-2509 by @mudler in #6336
- chore(models): add whisper-turbo via whisper.cpp by @mudler in #6340
- chore(model gallery): add ibm-granite_granite-4.0-h-small by @mudler in #6373
- chore(model gallery): add ibm-granite_granite-4.0-h-tiny by @mudler in #6374
- chore(model gallery): add ibm-granite_granite-4.0-h-micro by @mudler in #6375
- chore(model gallery): add ibm-granite_granite-4.0-micro by @mudler in #6376
👒 Dependencies
- chore(deps): bump grpcio from 1.74.0 to 1.75.0 in /backend/python/transformers by @dependabot[bot] in #6332
- chore(deps): bump securego/gosec from 2.22.8 to 2.22.9 by @dependabot[bot] in #6324
- chore(deps): bump llama.cpp to '72b24d96c6888c609d562779a23787304ae4609c' by @mudler in #6349
- chore(deps): bump grpcio from 1.74.0 to 1.75.1 in /backend/python/coqui by @dependabot[bot] in #6353
- chore(deps): bump transformers from 4.48.3 to 4.56.2 in /backend/python/coqui by @dependabot[bot] in #6330
- chore(deps): bump grpcio from 1.74.0 to 1.75.1 in /backend/python/diffusers by @dependabot[bot] in #6361
- chore(deps): bump grpcio from 1.74.0 to 1.75.1 in /backend/python/rerankers by @dependabot[bot] in #6360
- chore(deps): bump grpcio from 1.74.0 to 1.75.1 in /backend/python/common/template by @dependabot[bot] in #6358
- chore(deps): bump grpcio from 1.74.0 to 1.75.1 in /backend/python/vllm by @dependabot[bot] in #6357
- chore(deps): bump grpcio from 1.74.0 to 1.75.1 in /backend/python/bark by @dependabot[bot] in #6359
- chore(deps): bump grpcio from 1.75.0 to 1.75.1 in /backend/python/transformers by @dependabot[bot] in #6362
- chore(deps): bump grpcio from 1.74.0 to 1.75.1 in /backend/python/exllama2 by @dependabot[bot] in #6356
Other Changes
- chore: ⬆️ Update ggml-org/llama.cpp to `7f766929ca8e8e01dcceb1c526ee584f7e5e1408` by @localai-bot in #6319
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #6318
- chore: ⬆️ Update ggml-org/llama.cpp to `da30ab5f8696cabb2d4620cdc0aa41a298c54fd6` by @localai-bot in #6321
- chore: ⬆️ Update ggml-org/llama.cpp to `1d0125bcf1cbd7195ad0faf826a20bc7cec7d3f4` by @localai-bot in #6335
- chore(cudss): add cudds to l4t images by @mudler in #6338
- chore: ⬆️ Update ggml-org/llama.cpp to `4ae88d07d026e66b41e85afece74e88af54f4e66` by @localai-bot in #6339
- CI: disable build-testing on PRs against arm64 by @mudler in #6341
- chore(deps): bump llama.cpp to '835b2b915c52bcabcd688d025eacff9a07b65f52' by @mudler in #6347
- chore: ⬆️ Update ggml-org/llama.cpp to `4807e8f96a61b2adccebd5e57444c94d18de7264` by @localai-bot in #6350
- chore: ⬆️ Update ggml-org/llama.cpp to `bd0af02fc96c2057726f33c0f0daf7bb8f3e462a` by @localai-bot in #6352
- Revert "chore(deps): bump transformers from 4.48.3 to 4.56.2 in /backend/python/coqui" by @mudler in #6363
- chore: ⬆️ Update ggml-org/whisper.cpp to `32be14f8ebfc0498c2c619182f0d7f4c822d52c4` by @localai-bot in #6354
- chore: ⬆️ Update ggml-org/llama.cpp to `5f7e166cbf7b9ca928c7fad990098ef32358ac75` by @localai-bot in #6355
- chore: ⬆️ Update ggml-org/llama.cpp to `b2ba81dbe07b6dbea9c96b13346c66973dede32c` by @localai-bot in #6366
- chore: ⬆️ Update ggml-org/whisper.cpp to `8c0855fd6bb115e113c0dca6255ea05f774d35f7` by @localai-bot in #6365
- chore: ⬆️ Update ggml-org/whisper.cpp to `7849aff7a2e1f4234aa31b01a1870906d5431959` by @localai-bot in #6367
- chore: ⬆️ Update ggml-org/llama.cpp to `1fe4e38cc20af058ed320bd46cac934991190056` by @localai-bot in #6368
- chore: ⬆️ Update ggml-org/llama.cpp to `d64c8104f090b27b1f99e8da5995ffcfa6b726e2` by @localai-bot in #6371
New Contributors
Full Changelog: v3.5.4...v3.6.0
v3.5.4
What's Changed
Bug fixes 🐛
Other Changes
- chore: ⬆️ Update ggml-org/whisper.cpp to `44fa2f647cf2a6953493b21ab83b50d5f5dbc483` by @localai-bot in #6317
- chore: ⬆️ Update ggml-org/llama.cpp to `f432d8d83e7407073634c5e4fd81a3d23a10827f` by @localai-bot in #6316
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #6315
Full Changelog: v3.5.3...v3.5.4
v3.5.3
What's Changed
Bug fixes 🐛
🧠 Models
- chore(model gallery): add mistralai_magistral-small-2509 by @mudler in #6309
- chore(model gallery): add impish_qwen_14b-1m by @mudler in #6310
- chore(model gallery): add aquif-3.5-a4b-think by @mudler in #6311
👒 Dependencies
- chore: ⬆️ Update ggml-org/llama.cpp to `3edd87cd055a45d885fa914d879d36d33ecfc3e1` by @localai-bot in #6308
Other Changes
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #6307
Full Changelog: v3.5.2...v3.5.3
v3.5.2
What's Changed
👒 Dependencies
Other Changes
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #6305
- chore: ⬆️ Update ggml-org/llama.cpp to `0320ac5264279d74f8ee91bafa6c90e9ab9bbb91` by @localai-bot in #6306
Full Changelog: v3.5.1...v3.5.2
v3.5.1
What's Changed
Bug fixes 🐛
- fix: make sure to turn down all processes on exit by @mudler in #6200
- fix(p2p): automatically install llama-cpp for p2p workers by @mudler in #6199
- Point to LocalAI-examples repo for llava by @mauromorales in #6241
- fix: runtime capability detection for backends by @sozercan in #6149
- fix(chat): use proper finish_reason for tool/function calling by @imkira in #6243
- fix(rocm): Rename tag suffix for hipblas whisper build to match backend config by @KingJ in #6247
- fix(llama-cpp): correctly calculate embeddings by @mudler in #6259
Exciting New Features 🎉
- feat(launcher): show welcome page by @mudler in #6234
- feat: support HF_ENDPOINT env for the HuggingFace endpoint by @qxo in #6220
🧠 Models
- chore(model gallery): add nousresearch_hermes-4-14b by @mudler in #6197
- chore(model gallery): add MiniCPM-V-4.5-8b-q4_K_M by @M0Rf30 in #6205
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #6211
- feat(whisper): Add diarization (tinydiarize) by @richiejp in #6184
- chore(model gallery): add baidu_ernie-4.5-21b-a3b-thinking by @mudler in #6267
- chore(model gallery): add aquif-ai_aquif-3.5-8b-think by @mudler in #6269
- chore(model gallery): add qwen3-stargate-sg1-uncensored-abliterated-8b-i1 by @mudler in #6270
- chore(model gallery): add k2-think-i1 by @mudler in #6288
- chore(model gallery): add holo1.5-72b by @mudler in #6289
- chore(model gallery): add holo1.5-7b by @mudler in #6290
- chore(model gallery): add holo1.5-3b by @mudler in #6291
- chore(model gallery): add alibaba-nlp_tongyi-deepresearch-30b-a3b by @mudler in #6295
- chore(model gallery): add webwatcher-7b by @mudler in #6297
- chore(model gallery): add webwatcher-32b by @mudler in #6298
- chore(model gallery): add websailor-32b by @mudler in #6299
- chore(model gallery): add websailor-7b by @mudler in #6300
📖 Documentation and examples
👒 Dependencies
- chore(deps): bump github.com/opencontainers/image-spec from 1.1.0 to 1.1.1 by @dependabot[bot] in #6223
- chore(deps): bump actions/stale from 9.1.0 to 10.0.0 by @dependabot[bot] in #6227
- chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.50.0 to 0.60.0 by @dependabot[bot] in #6226
- chore(deps): bump oras.land/oras-go/v2 from 2.5.0 to 2.6.0 by @dependabot[bot] in #6225
- chore(deps): bump github.com/swaggo/swag from 1.16.3 to 1.16.6 by @dependabot[bot] in #6222
- chore(deps): bump actions/labeler from 5 to 6 by @dependabot[bot] in #6229
- feat(nvidia-gpu): bump images to cuda 12.8 by @mudler in #6239
- feat(chatterbox): add MPS, and CPU, pin version by @mudler in #6242
Other Changes
- chore: ⬆️ Update ggml-org/llama.cpp to `0fce7a1248b74148c1eb0d368b7e18e8bcb96809` by @localai-bot in #6193
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `2eb3845df5675a71565d5a9e13b7bad0881fafcd` by @localai-bot in #6192
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #6201
- chore: ⬆️ Update ggml-org/llama.cpp to `fb15d649ed14ab447eeab911e0c9d21e35fb243e` by @localai-bot in #6202
- Fix Typos in Docs by @alizfara112 in #6204
- chore: ⬆️ Update ggml-org/whisper.cpp to `bb0e1fc60f26a707cabf724edcf7cfcab2a269b6` by @localai-bot in #6203
- chore: ⬆️ Update ggml-org/llama.cpp to `408ff524b40baf4f51a81d42a9828200dd4fcb6b` by @localai-bot in #6207
- chore: ⬆️ Update ggml-org/llama.cpp to `c4df49a42d396bdf7344501813e7de53bc9e7bb3` by @localai-bot in #6209
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `d7f430cd693f2e12ecbaa0ce881746cf305c3b1f` by @richiejp in #6213
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `c648001030d4c2cc7c851fdaf509ee36d642dc99` by @localai-bot in #6215
- chore: ⬆️ Update ggml-org/llama.cpp to `3976dfbe00f02a62c0deca32c46138e4f0ca81d8` by @localai-bot in #6214
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `abb115cd021fc2beed826604ed1a479b6a77671c` by @localai-bot in #6236
- chore: ⬆️ Update ggml-org/whisper.cpp to `edea8a9c3cf0eb7676dcdb604991eb2f95c3d984` by @localai-bot in #6237
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `b0179181069254389ccad604e44f17a2c25b4094` by @localai-bot in #6246
- chore: ⬆️ Update ggml-org/llama.cpp to `0e6ff0046f4a2983b2c77950aa75960fe4b4f0e2` by @localai-bot in #6235
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `fce6afcc6a3250a8e17923608922d2a99b339b47` by @richiejp in #6256
- chore: ⬆️ Update ggml-org/llama.cpp to `40be51152d4dc2d47444a4ed378285139859895b` by @localai-bot in #6260
- chore: ⬆️ Update ggml-org/llama.cpp to `aa0c461efe3603639af1a1defed2438d9c16ca0f` by @localai-bot in #6261
- chore(aio): upgrade minicpm-v model to latest 4.5 by @M0Rf30 in #6262
- chore: ⬆️ Update ggml-org/llama.cpp to `0fa154e3502e940df914f03b41475a2b80b985b0` by @localai-bot in #6263
- chore: ⬆️ Update ggml-org/llama.cpp to `6c019cb04e86e2dacfe62ce7666c64e9717dde1f` by @localai-bot in #6265
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `0ebe6fe118f125665939b27c89f34ed38716bff8` by @richiejp in #6271
- chore: ⬆️ Update ggml-org/llama.cpp to `b907255f4bd169b0dc7dca9553b4c54af5170865` by @localai-bot in #6287
- chore: ⬆️ Update ggml-org/llama.cpp to `8ff206097c2bf3ca1c7aa95f9d6db779fc7bdd68` by @localai-bot in #6292
New Contributors
- @alizfara112 made their first contribution in #6204
- @qxo made their first contribution in #6220
- @imkira made their first contribution in #6243
- @KingJ made their first contribution in #6247
Full Changelog: v3.5.0...v3.5.1
v3.5.0
🚀 LocalAI 3.5.0
Welcome to LocalAI 3.5.0! This release focuses on expanding backend support, improving usability, refining the overall experience, and further reducing LocalAI's footprint, making it a truly portable, privacy-focused AI stack. We've added several new backends, enhanced the WebUI with new features, made significant performance improvements under the hood, and simplified LocalAI management with a new Launcher app (Alpha) available for Linux and macOS.
TL;DR – What’s New in LocalAI 3.5.0 🎉
- 🖼️ Expanded Backend Support: Welcome to MLX! mlx, mlx-audio, and mlx-vlm are now all available in LocalAI. We also added support for WAN video generation, plus CPU and MPS variants of the diffusers backend! You can now generate and edit images on macOS, or without any GPU (albeit slowly).
- ✨ WebUI Enhancements: Download model configurations, a manual model refresh button, streamlined error streaming during SSE events, and a stop button for running backends. Models now can also be imported and edited via the WebUI.
- 🚀 Performance & Architecture: Whisper backend has been rewritten in Purego with integrated Voice Activity Detection (VAD) for improved efficiency and stability. Stablediffusion also benefits from the Purego conversion.
- 🛠️ Simplified Management: New LocalAI Launcher App (Alpha) for easy installation, startup, updates, and access to the WebUI.
- ✅ Bug Fixes & Stability: Resolved AMD RX 9060XT ROCm errors, libomp linking issues, model loading problems on macOS, CUDA device detection issues, and more.
- Enhanced support for macOS: whisper, diffusers, llama.cpp, MLX (VLM, Audio, LLM), and stable-diffusion.cpp now all work on macOS!
What’s New in Detail
🚀 New Backends and Model Support
We've significantly expanded the range of models you can run with LocalAI!
- mlx-audio: Bring text to life with Kokoro's voice models on macOS, with the power of MLX! Install with the `mlx-audio` backend. Example configuration:

```yaml
backend: mlx-audio
name: kokoro-mlx
parameters:
  model: prince-canuma/Kokoro-82M
  voice: "af_heart"
known_usecases:
  - tts
```

- mlx-vlm: Experiment with the latest VLM models. While we don't have any models in the gallery, it's really easy to configure; see #6119 for more details.

```yaml
name: mlx-gemma
backend: mlx-vlm
parameters:
  model: "mlx-community/gemma-3n-E2B-it-4bit"
template:
  use_tokenizer_template: true
known_usecases:
  - chat
```

- WAN: Generate videos with Wan 2.1 or Wan 2.2 models using the `diffusers` backend, supporting both I2V and T2V. Example configuration:

```yaml
name: wan21
f16: true
backend: diffusers
known_usecases:
  - video
parameters:
  model: Wan-AI/Wan2.1-T2V-1.3B-Diffusers
diffusers:
  cuda: true
  pipeline_type: WanPipeline
step: 40
options:
  - guidance_scale:5.0
  - num_frames:81
  - torch_dtype:bf16
```
- Diffusers CPU and macOS Support: Run diffusers models directly on your CPU (no GPU needed) or on a Mac! This opens up LocalAI to a wider range of hardware configurations.
✨ WebUI Improvements
We've added several new features to make using LocalAI even easier:
- Download Model Config: A "Get Config" button in the model gallery lets you download a model’s configuration file without installing the full model. This is perfect for custom setups and easier integration.
- Manual Model Refresh: A new button allows you to manually refresh the on-disk YAML configuration, ensuring the WebUI always has the latest model information.
- Streamlined Error Handling: Errors during SSE streaming events are now displayed directly to the user, providing better visibility and debugging information.
- Backend Stop Button: Quickly stop running backends directly from the WebUI.
- Model import and edit: Now models can be edited and imported directly from the WebUI.
- Installed Backend List: Now displays installed backends in the WebUI for easier access and management.
🚀 Performance & Architecture Improvements
- Purego Whisper Backend: The Whisper backend has been rewritten in Purego for increased performance and stability. This also includes integrated Voice Activity Detection (VAD) for detecting speech.
- Purego Stablediffusion: Similar to Whisper, Stablediffusion has been converted to Purego, improving its overall architecture and enabling better compatibility.
🛠️ Simplified Management – Introducing the LocalAI Launcher (Alpha)
We're excited to introduce the first version of the LocalAI Launcher! This application simplifies:
- Installation
- Startup/Shutdown
- Updates
- Access to the WebUI and Application Folder
Please note: The launcher is in Alpha and may have bugs. The macOS build requires workarounds to run because the binaries are not yet signed; the specific steps for running it are described here: https://discussions.apple.com/thread/253714860?answerId=257037956022#257037956022.
✅ Bug Fixes & Stability Improvements
- AMD RX 9060XT ROCm Error: Fixed an issue causing errors with AMD RX 9060XT GPUs when using ROCm. This error, "ROCm error: invalid device function", occurred because of device function incompatibility. The fix involves updating the ROCm image and ensuring the correct GPU targets are specified during compilation. Recommended kernel versions and verification steps for GPU detection are available [here](link to troubleshooting doc if created).
- libomp Linking: Resolved a missing `libomp.so` issue in macOS Docker containers.
- macOS Model Loading: Addressed a problem where models could not be loaded on macOS, resolved by bundling the necessary `libutf8` libraries.
- CUDA Device Detection: Improved detection of available GPU resources.
- Flash Attention: `flash_attention` in llama.cpp is now set to auto, allowing the system to optimize performance.
Additional Improvements
- System Backend: Added a new "system" backend path (`LOCALAI_BACKENDS_SYSTEM_PATH` or via command-line arguments), defaulting to `/usr/share/localai/backends`. This allows specifying a read-only directory for backends, useful for package management and system-wide installations (see the sketch after this list).
- P2P Model Sync: Implemented automatic synchronization of installed models between LocalAI instances within a federation. Currently limited to models installed through the gallery, and configuration changes are not synced. Future improvements will address these limitations.
- Diffusers Image Source Handling: Enhanced image source selection in the diffusers backend, prioritizing `ref_images` over `src` for more robust loading behavior.
- Darwin CI Builds: Added support for building some Go-based backends (Stablediffusion and Whisper) on Darwin (macOS) in the CI pipeline.
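For example, a packaged installation could point LocalAI at a read-only, system-wide backends directory like this (a sketch; the path shown is the documented default):

```bash
# Use a read-only system directory for backends (default: /usr/share/localai/backends)
LOCALAI_BACKENDS_SYSTEM_PATH=/usr/share/localai/backends local-ai run
```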
🚨 Important Notes
- Launcher (Alpha): The LocalAI Launcher is in its early stages of development. Please report any issues you encounter. The macOS build requires additional steps due to code signing.
- Model Configuration Updates: Changes to model configuration files are not currently synchronized when using P2P model sync.
The Complete Local Stack for Privacy-First AI
- LocalAI: The free, Open Source OpenAI alternative. Acts as a drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required.
- LocalAGI: A powerful Local AI agent management platform. Serves as a drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI. Link: https://github.com/mudler/LocalAGI
v3.4.0
🚀 LocalAI 3.4.0
What’s New in LocalAI 3.4.0 🎉
- WebUI improvements: image size can now be set during image generation
- New backends: KittenTTS, kokoro, and dia are now available as backends, and models can be installed directly from the gallery. Note: these backends need to warm up during the first call, to download the model files.
- Support for reasoning effort in the OpenAI chat completion API
- The diffusers backend is now available for l4t images and devices
- During backend installation from the CLI, an alias and a name can be supplied (`--alias` and `--name`) to override configurations (see the sketch after this list)
- Backends can now be sideloaded from the system: drag-and-drop backends into the backends folder and they will just work!
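As a sketch of the CLI flow (the `backends install` subcommand and backend name are illustrative assumptions; `--name` and `--alias` are the flags from this release):

```bash
# Install a backend and override the name/alias it is registered under
local-ai backends install llama-cpp --name my-llama --alias llama
```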
The Complete Local Stack for Privacy-First AI
- LocalAI: The free, Open Source OpenAI alternative. Acts as a drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required.
- LocalAGI: A powerful Local AI agent management platform. Serves as a drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.
- LocalRecall: A RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Designed to work alongside LocalAI and LocalAGI. Link: https://github.com/mudler/LocalRecall
Thank you! ❤️
A massive THANK YOU to our incredible community and our sponsors! LocalAI has over 34,500 stars, and LocalAGI has already rocketed past 1k+ stars!
As a reminder, LocalAI is real FOSS (Free and Open Source Software) and its sibling projects are community-driven and not backed by VCs or a company. We rely on contributors donating their spare time and our sponsors to provide us the hardware! If you love open-source, privacy-first AI, please consider starring the repos, contributing code, reporting bugs, or spreading the word!
👉 Check out the reborn LocalAGI v2 today: https://github.com/mudler/LocalAGI
Full changelog 👇
What's Changed
Bug fixes 🐛
Exciting New Features 🎉
- feat(webui): allow to specify image size by @mudler in #5976
- feat(backends): add KittenTTS by @mudler in #5977
- feat(kokoro): complete kokoro integration by @mudler in #5978
- feat: add reasoning effort and metadata to template by @mudler in #5981
- feat(transformers): add support to Dia by @mudler in #5991
- feat(diffusers): add builds for nvidia-l4t by @mudler in #6004
- feat(backends install): allow to specify name and alias during manual installation by @mudler in #5971
🧠 Models
- chore(models): add gpt-oss-20b by @mudler in #5973
- chore(models): add gpt-oss-120b by @mudler in #5974
- feat(models): add support to qwen-image by @mudler in #5975
- chore(model gallery): add openai_gpt-oss-20b-neo by @mudler in #5986
- fix(harmony): improve template by adding reasoning effort and system_prompt by @mudler in #5985
- chore(model gallery): add qwen_qwen3-4b-instruct-2507 by @mudler in #5987
- chore(model gallery): add qwen_qwen3-4b-thinking-2507 by @mudler in #5988
- chore(model gallery): add huihui-ai_huihui-gpt-oss-20b-bf16-abliterated by @mudler in #5995
- chore(model gallery): add openai-gpt-oss-20b-abliterated-uncensored-neo-imatrix by @mudler in #5996
- chore(model gallery): add tarek07_nomad-llama-70b by @mudler in #5997
- chore: add Dia to the model gallery, fix backend by @mudler in #5998
- chore(model gallery): add chatterbox by @mudler in #5999
- chore(model gallery): add outetts by @mudler in #6000
- chore(model gallery): add impish_nemo_12b by @mudler in #6007
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #6010
👒 Dependencies
Other Changes
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #5967
- chore: ⬆️ Update ggml-org/llama.cpp to `41613437ffee0dbccad684fc744788bc504ec213` by @localai-bot in #5968
- chore(deps): bump torch and diffusers by @mudler in #5970
- chore(deps): bump torch and sentence-transformers by @mudler in #5969
- chore: ⬆️ Update ggml-org/llama.cpp to `fd1234cb468935ea087d6929b2487926c3afff4b` by @localai-bot in #5972
- chore: ⬆️ Update ggml-org/llama.cpp to `e725a1a982ca870404a9c4935df52466327bbd02` by @localai-bot in #5984
- feat(swagger): update swagger by @localai-bot in #5983
- chore: ⬆️ Update ggml-org/llama.cpp to `a0552c8beef74e843bb085c8ef0c63f9ed7a2b27` by @localai-bot in #5992
- chore: ⬆️ Update ggml-org/whisper.cpp to `4245c77b654cd384ad9f53a4a302be716b3e5861` by @localai-bot in #5993
- docs: update links in documentation by @lnnt in #5994
- chore: ⬆️ Update ggml-org/llama.cpp to `cd6983d56d2cce94ecb86bb114ae8379a609073c` by @localai-bot in #6003
- fix(l4t-diffusers): add sentencepiece by @mudler in #6005
- chore: ⬆️ Update ggml-org/llama.cpp to `79c1160b073b8148a404f3dd2584be1606dccc66` by @localai-bot in #6006
- chore: ⬆️ Update ggml-org/whisper.cpp to `b02242d0adb5c6c4896d59ac86d9ec9fe0d0fe33` by @localai-bot in #6009
- chore: ⬆️ Update ggml-org/llama.cpp to `be48528b068111304e4a0bb82c028558b5705f05` by @localai-bot in #6012
New Contributors
Full Changelog: v3.3.2...v3.4.0
v3.3.2
What's Changed
Exciting New Features 🎉
- feat(backends): install from local path by @mudler in #5962
- feat(backends): allow backends to not have a metadata file by @mudler in #5963
📖 Documentation and examples
👒 Dependencies
- chore(stable-diffusion): bump, set GGML_MAX_NAME by @mudler in #5961
- chore(build): Rename sycl to intel by @richiejp in #5964
Other Changes
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #5956
- chore: ⬆️ Update ggml-org/whisper.cpp to `0becabc8d68d9ffa6ddfba5240e38cd7a2642046` by @localai-bot in #5958
- chore: ⬆️ Update ggml-org/llama.cpp to `5c0eb5ef544aeefd81c303e03208f768e158d93c` by @localai-bot in #5959
- chore: ⬆️ Update ggml-org/llama.cpp to `d31192b4ee1441bbbecd3cbf9e02633368bdc4f5` by @localai-bot in #5965
Full Changelog: v3.3.1...v3.3.2
v3.3.1
This is a minor release; however, we have addressed some important bugs regarding Intel GPU images, and we have changed the naming of the container images.
This release also adds support for Flux Kontext and Flux Krea!
⚠️ Breaking change
Intel GPU images have been renamed from latest-gpu-intel-f32 and latest-gpu-intel-f16 to a single image, latest-gpu-intel. For example:

```bash
docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel
```

and for AIO (All-In-One) images:

```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel
```

🖼️ Flux Kontext
Starting with this release, LocalAI supports Flux Kontext, which can be used to edit images via the API.
Install with:

```bash
local-ai run flux.1-kontext-dev
```

To test:
```bash
curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
  "model": "flux.1-kontext-dev",
  "prompt": "change \"flux.cpp\" to \"LocalAI\"",
  "size": "256x256",
  "ref_images": [
    "https://raw.githubusercontent.com/leejet/stable-diffusion.cpp/master/assets/flux/flux1-dev-q8_0.png"
  ]
}'
```
What's Changed
Breaking Changes 🛠
Exciting New Features 🎉
🧠 Models
- chore(model gallery): add qwen_qwen3-30b-a3b-instruct-2507 by @mudler in #5936
- chore(model gallery): add arcee-ai_afm-4.5b by @mudler in #5938
- chore(model gallery): add qwen_qwen3-30b-a3b-thinking-2507 by @mudler in #5939
- chore(model gallery): add flux.1-dev-ggml-q8_0 by @mudler in #5947
- chore(model gallery): add flux.1-dev-ggml-abliterated-v2-q8_0 by @mudler in #5948
- chore(model gallery): add flux.1-krea-dev-ggml by @mudler in #5949
Other Changes
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #5929
- chore: ⬆️ Update ggml-org/llama.cpp to `8ad7b3e65b5834e5574c2f5640056c9047b5d93b` by @localai-bot in #5931
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `f6b9aa1a4373e322ff12c15b8a0749e6dd6f0253` by @localai-bot in #5930
- chore: ⬆️ Update ggml-org/whisper.cpp to `d0a9d8c7f8f7b91c51d77bbaa394b915f79cde6b` by @localai-bot in #5932
- chore: ⬆️ Update ggml-org/llama.cpp to `aa79524c51fb014f8df17069d31d7c44b9ea6cb8` by @localai-bot in #5934
- chore: ⬆️ Update ggml-org/llama.cpp to `e9192bec564780bd4313ad6524d20a0ab92797db` by @localai-bot in #5940
- chore: ⬆️ Update ggml-org/whisper.cpp to `f7502dca872866a310fe69d30b163fa87d256319` by @localai-bot in #5941
- chore: update swagger by @mudler in #5946
- feat(stablediffusion-ggml): allow to load loras by @mudler in #5943
- chore(capability): improve messages by @mudler in #5944
- feat(swagger): update swagger by @localai-bot in #5950
- chore: ⬆️ Update ggml-org/llama.cpp to `daf2dd788066b8b239cb7f68210e090c2124c199` by @localai-bot in #5951
Full Changelog: v3.3.0...v3.3.1