
Conversation

@qnixsynapse (Contributor)

Describe Your Changes

This commit introduces significant improvements to the llama.cpp extension, focusing on the 'Flash Attention' setting and refactoring Tauri plugin interactions for better code clarity and maintainability.

The backend interaction is streamlined by removing the unnecessary `libraryPath` argument from the Tauri plugin commands for loading models and listing devices (see the sketches below).

  • Simplified API Calls: The `loadLlamaModel`, `unloadLlamaModel`, and `get_devices` functions in both the extension and the Tauri plugin now manage the library path internally, based on the backend executable's location.

  • Decoupled Logic: The extension (`src/index.ts`) now uses the new, simplified Tauri plugin functions, which improves modularity and reduces boilerplate in the extension.

  • Type Consistency: Added an `UnloadResult` interface to `guest-js/index.ts` for consistency.

  • Updated UI Control: The 'Flash Attention' setting in `settings.json` is changed from a boolean checkbox to a string-based dropdown, offering 'auto', 'on', and 'off' options.

  • Improved Logic: The extension logic in `src/index.ts` now handles the string-based `flash_attn` configuration, passing the value ('auto', 'on', or 'off') directly as a command-line argument to the llama.cpp backend. The old, complex version-checking logic required for older llama.cpp versions is removed.

This refactoring cleans up the extension's codebase and moves environment and path setup concerns into the Tauri plugin, where they are most relevant.
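As a rough illustration of the simplified plugin surface, here is a minimal sketch of what the `guest-js/index.ts` wrappers might look like once the `libraryPath` argument is gone. The plugin name, command names, parameter names, and result shapes below are assumptions for illustration, not the exact code from this PR:

```typescript
import { invoke } from '@tauri-apps/api/core'

// Placeholder shapes; the real fields live in the plugin's type definitions.
export interface SessionInfo {
  pid: number
  port: number
}

export interface DeviceInfo {
  id: string
  name: string
}

// Added for type consistency; the fields shown here are assumed.
export interface UnloadResult {
  success: boolean
  error?: string
}

// No libraryPath parameter: the plugin derives the library path internally
// from the backend executable's location.
export async function loadLlamaModel(
  backendPath: string,
  args: string[]
): Promise<SessionInfo> {
  return await invoke<SessionInfo>('plugin:llamacpp|load_llama_model', {
    backendPath,
    args,
  })
}

export async function unloadLlamaModel(pid: number): Promise<UnloadResult> {
  return await invoke<UnloadResult>('plugin:llamacpp|unload_llama_model', { pid })
}

export async function getDevices(backendPath: string): Promise<DeviceInfo[]> {
  return await invoke<DeviceInfo[]>('plugin:llamacpp|get_devices', { backendPath })
}
```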
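On the extension side, `src/index.ts` can then call these wrappers directly. The import path below is a guess at the plugin package name, and the surrounding function is only a usage sketch:

```typescript
// Hypothetical package name for the plugin's guest-js bindings.
import {
  loadLlamaModel,
  unloadLlamaModel,
} from '@janhq/tauri-plugin-llamacpp-api'

async function runModel(backendExePath: string, args: string[]): Promise<void> {
  // No libraryPath plumbing in the extension anymore; only the backend
  // executable and the generated server arguments are passed through.
  const session = await loadLlamaModel(backendExePath, args)

  // ...use the running server...

  const result = await unloadLlamaModel(session.pid)
  if (!result.success) {
    console.error('Failed to unload model:', result.error)
  }
}
```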

Fixes Issues

  • Closes #
  • Closes #

Self Checklist

  • Added relevant comments, esp in complex areas
  • Updated docs (for bug fixes / features)
  • Created issues for follow-up changes or refactoring needed

github-actions bot commented Oct 13, 2025

Barecheck - Code coverage report

Total: 30.06%

Your code coverage diff: -0.01% ▾

✅ All code changes are covered

This commit introduces a functional flag for embedding models and refactors the backend detection logic for a cleaner implementation.

Key changes:

 - Embedding Support: The `loadLlamaModel` API and `SessionInfo` now include an `isEmbedding: boolean` flag, which lets the core process differentiate and correctly initialize models intended for embedding tasks (see the sketch after this list).

 - Backend Naming Simplification (Refactor): Consolidated the CPU-specific backend tags (e.g., `win-noavx-x64`, `win-avx2-x64`) into generic `*-common_cpus-x64` variants (e.g., `win-common_cpus-x64`). This streamlines supported-backend detection.

 - File Structure Update: Changed the download path for CUDA runtime libraries (cudart) so they are placed inside the specific backend's directory (`/build/bin/`) rather than a shared `lib` folder, improving asset isolation.
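A sketch of how the embedding flag might surface in the API, assuming `SessionInfo` simply carries the flag through and `loadLlamaModel` forwards it to the plugin; field and parameter names other than `isEmbedding` are illustrative:

```typescript
import { invoke } from '@tauri-apps/api/core'

export interface SessionInfo {
  pid: number
  port: number
  modelPath: string
  isEmbedding: boolean // distinguishes embedding sessions from chat sessions
}

export async function loadLlamaModel(
  backendPath: string,
  args: string[],
  isEmbedding: boolean = false
): Promise<SessionInfo> {
  // The core process can use the flag to initialize the model for embedding
  // tasks, e.g. by adding embedding-specific server arguments.
  return await invoke<SessionInfo>('plugin:llamacpp|load_llama_model', {
    backendPath,
    args,
    isEmbedding,
  })
}
```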
Previously the condition for `flash_attn` was always truthy, causing unnecessary or incorrect `--flash-attn` arguments to be added. The `main_gpu` check also used a loose inequality which could match values that were not intended. The updated logic uses strict comparison and correctly handles the empty string case, ensuring the command-line arguments are generated only when appropriate.
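A minimal sketch of the corrected argument generation, assuming a config object with `flash_attn` as a string and `main_gpu` as an optional number; the exact config shape and surrounding code in the extension differ, so treat this as illustrative:

```typescript
interface LlamacppConfig {
  flash_attn: string // 'auto', 'on', 'off', or possibly '' from older configs
  main_gpu?: number
}

function buildServerArgs(cfg: LlamacppConfig): string[] {
  const args: string[] = []

  // Only emit --flash-attn for a real value; the old condition was always
  // truthy, so an empty string would have produced a broken argument.
  if (cfg.flash_attn === 'auto' || cfg.flash_attn === 'on' || cfg.flash_attn === 'off') {
    args.push('--flash-attn', cfg.flash_attn)
  }

  // Strict comparison so only an explicitly set, non-default GPU index
  // produces a --main-gpu argument.
  if (cfg.main_gpu !== undefined && cfg.main_gpu !== 0) {
    args.push('--main-gpu', String(cfg.main_gpu))
  }

  return args
}
```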
@louis-jan (Contributor) left a comment

LGTM

@qnixsynapse merged commit b2a8efd into dev Nov 1, 2025
17 checks passed
@qnixsynapse deleted the refactor/backend branch November 1, 2025 18:00
@github-project-automation bot moved this to QA in Jan Nov 1, 2025
@github-actions bot added this to the v0.7.4 milestone Nov 1, 2025
@louis-jan mentioned this pull request Nov 3, 2025


Projects

Status: QA

Development

Successfully merging this pull request may close these issues.

bug: looks like llamacpp-extension is always setting --no-mmap
