Conversation

@dinhlongviolin1 (Contributor)

Describe Your Changes

Fixes Issues

  • Closes #
  • Closes #

Self Checklist

  • Added relevant comments, especially in complex areas
  • Updated docs (for bug fixes / features)
  • Created issues for follow-up changes or refactoring needed

urmauur and others added 30 commits September 12, 2025 10:58
…-dom

fix: avoid error when validating nested DOM (#6431)

* fix: correct context shift flag handling in LlamaCPP extension

The previous implementation added the `--no-context-shift` flag when `cfg.ctx_shift` was disabled, which conflicted with the llama.cpp CLI where the presence of `--context-shift` enables the feature.
The logic is updated to push `--context-shift` only when `cfg.ctx_shift` is true, ensuring the extension passes the correct argument and behaves as expected.
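A minimal sketch of the corrected argument construction, assuming a `cfg.ctx_shift` boolean and an `args` array as the commit describes:

```typescript
// Assumed config shape; only the ctx_shift handling is shown.
interface LlamaCppConfig {
  ctx_shift: boolean
}

function buildArgs(cfg: LlamaCppConfig): string[] {
  const args: string[] = []
  // Previously: args.push('--no-context-shift') when ctx_shift was false.
  // The llama.cpp CLI enables the feature by the *presence* of
  // --context-shift, so the flag is pushed only when ctx_shift is true.
  if (cfg.ctx_shift) {
    args.push('--context-shift')
  }
  return args
}
```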

* feat: detect model out of context during generation

---------

Co-authored-by: Dinh Long Nguyen <[email protected]>
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
- Removed the unused `getKVCachePerToken` helper and replaced it with a unified `estimateKVCache` that returns both the total size and the per-token size (see the sketch after this list).
- Fixed the KV cache size calculation to account for all layers, correcting a previous under-estimation.
- Added proper clamping of user-requested context lengths to the model's maximum.
- Refactored VRAM budgeting: introduced explicit reserves, a fixed engine overhead, and separate multipliers for VRAM and system RAM based on memory mode.
- Implemented a more robust planning flow with clear GPU, Hybrid, and CPU pathways, including fallback configurations when resources are insufficient.
- Updated default context length handling and safety buffers to prevent OOM situations.
- Adjusted the usable memory percentage to 90% and refined logging for easier debugging.
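A sketch of the unified estimator; the field names and the f16-cache assumption are illustrative, not the extension's actual code:

```typescript
// Illustrative model metadata; real values come from GGUF headers.
interface ModelInfo {
  nLayers: number    // all layers count toward the cache
  nKvHeads: number   // KV attention heads
  headDim: number    // dimension per head
  maxContext: number // model's trained maximum context length
}

interface KVCacheEstimate {
  totalBytes: number
  bytesPerToken: number
}

function estimateKVCache(model: ModelInfo, requestedCtx: number): KVCacheEstimate {
  // Clamp the user-requested context length to the model's maximum.
  const ctx = Math.min(requestedCtx, model.maxContext)
  // K and V each store nKvHeads * headDim values per layer per token;
  // 2 bytes per value assumes an f16 cache.
  const bytesPerToken = 2 * model.nLayers * model.nKvHeads * model.headDim * 2
  return { totalBytes: bytesPerToken * ctx, bytesPerToken }
}
```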
The llama.cpp backend can emit the phrase "failed to allocate" when it runs out of memory. Adding this check ensures such messages are correctly classified as out-of-memory errors, providing more accurate error handling for CPU backends.
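A minimal sketch of the classification; the 'out of memory' pattern is an illustrative placeholder for checks assumed to already exist, while 'failed to allocate' is the one this change adds:

```typescript
// Classify a llama.cpp backend error message as out-of-memory.
function isOutOfMemoryError(message: string): boolean {
  const oomPatterns = ['out of memory', 'failed to allocate']
  const lower = message.toLowerCase()
  return oomPatterns.some((pattern) => lower.includes(pattern))
}
```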
Use fallback value 'high' for memory_util config and remove unused GgufMetadata import.
fix: immediately update value on model selection
fix: validate mmproj from general basename
github-roushan and others added 20 commits September 19, 2025 11:36
…nd-videos

docs: update URL for GIFs and videos
fix(number-input): preserve '0.0x' format when typing (#6520)
feat: fix remote provider vision capability
* fix: prevent consecutive messages with same role

* fix: tests

* fix: first message should not be assistant

* fix: tests
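Read together, these commits suggest a normalization pass over the outgoing message list. A hedged sketch of the two rules named above, with the message shape assumed:

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

function normalizeMessages(messages: ChatMessage[]): ChatMessage[] {
  const result: ChatMessage[] = []
  for (const msg of messages) {
    const prev = result[result.length - 1]
    if (prev && prev.role === msg.role) {
      // Merge consecutive same-role messages instead of sending both.
      prev.content += '\n' + msg.content
    } else {
      result.push({ ...msg })
    }
  }
  // The first non-system message should not be an assistant turn; dropping
  // it is one possible fix (the PR does not show the exact handling).
  const firstIdx = result.findIndex((m) => m.role !== 'system')
  if (firstIdx !== -1 && result[firstIdx].role === 'assistant') {
    result.splice(firstIdx, 1)
  }
  return result
}
```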
…e-mcp

enhancement: toaster when deleting an MCP server
* feat: Prompt progress when streaming

- BE changes:
    - Add a `return_progress` flag to `chatCompletionRequest` and a corresponding `prompt_progress` payload in `chatCompletionChunk`. Introduce a `chatCompletionPromptProgress` interface to capture cache, processed, time, and total token counts (see the sketch below).
    - Update the Llamacpp extension to always request progress data when streaming, enabling UI components to display real-time generation progress and leverage llama.cpp's built-in progress reporting.
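A sketch of the shapes the commit describes; exact definitions in the extension may differ:

```typescript
interface chatCompletionPromptProgress {
  cache: number      // prompt tokens reused from the cache
  processed: number  // prompt tokens processed so far
  time: number       // elapsed processing time
  total: number      // total prompt tokens
}

interface chatCompletionRequest {
  stream?: boolean
  return_progress?: boolean // made optional in a follow-up commit below
  // ...other completion fields
}

interface chatCompletionChunk {
  prompt_progress?: chatCompletionPromptProgress
  // ...delta/content fields
}
```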

* Make return_progress optional

* chore: update ui prompt progress before streaming content

* chore: remove log

* chore: remove progress when percentage >= 100

* chore: set timeout prompt progress

* chore: move prompt progress outside streaming content

* fix: tests

---------

Co-authored-by: Faisal Amir <[email protected]>
Co-authored-by: Louis <[email protected]>
* feat: add getTokensCount method to compute token usage

Implemented a new async `getTokensCount` function in the LLaMA.cpp extension.
The method validates the model session, checks process health, applies the request template, and tokenizes the resulting prompt to return the token count. It includes detailed error handling for crashed models and API failures, enabling callers to assess token usage before sending completions.
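A condensed sketch of that flow; `findSession`, `isProcessAlive`, and `applyChatTemplate` are hypothetical stand-ins for the extension's internals, while `/tokenize` is the llama.cpp server's tokenizer endpoint:

```typescript
type Message = { role: string; content: string }

declare function findSession(id: string): { pid: number; port: number } | undefined
declare function isProcessAlive(pid: number): Promise<boolean>
declare function applyChatTemplate(session: { port: number }, messages: Message[]): Promise<string>

async function getTokensCount(sessionId: string, messages: Message[]): Promise<number> {
  // Validate the model session and check process health before calling the API.
  const session = findSession(sessionId)
  if (!session) throw new Error('Model is not loaded')
  if (!(await isProcessAlive(session.pid))) {
    throw new Error('Model process has crashed')
  }

  // Render the prompt with the model's chat template, then tokenize it.
  const prompt = await applyChatTemplate(session, messages)
  const res = await fetch(`http://localhost:${session.port}/tokenize`, {
    method: 'POST',
    body: JSON.stringify({ content: prompt }),
  })
  if (!res.ok) throw new Error(`Tokenize request failed: ${res.status}`)
  const { tokens } = (await res.json()) as { tokens: number[] }
  return tokens.length
}
```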

* Fix: typos

* chore: update ui token usage

* chore: remove unused code

* feat: add image token handling for multimodal LlamaCPP models

Implemented support for counting image tokens when using vision-enabled models (see the sketch after this list):
- Extended `SessionInfo` with an optional `mmprojPath` to store the multimodal projector file.
- Propagated `mmproj_path` from the Tauri plugin into the session info.
- Added an import of `chatCompletionRequestMessage` and enhanced the token calculation logic in the LlamaCPP extension:
    - Detects image content in messages.
    - Reads GGUF metadata from `mmprojPath` to compute accurate image token counts.
    - Provides a fallback estimation if metadata reading fails.
    - Returns the sum of text and image tokens.
- Introduced helper methods `calculateImageTokens` and `estimateImageTokensFallback`.
- Minor clean-ups such as comment capitalization and debug logging.
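A sketch of the combined flow under assumed message and session shapes; the declared helpers stand in for the extension's internals:

```typescript
interface SessionInfo { mmprojPath?: string }
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } }
interface ChatMessage { role: string; content: string | ContentPart[] }

declare function tokenizeTextParts(s: SessionInfo, m: ChatMessage[]): Promise<number>
declare function readGgufMetadata(path: string): Promise<Record<string, unknown>>
declare function calculateImageTokens(meta: Record<string, unknown>): number
declare function estimateImageTokensFallback(): number

async function countPromptTokens(session: SessionInfo, messages: ChatMessage[]): Promise<number> {
  const textTokens = await tokenizeTextParts(session, messages)

  // Detect image content in the messages.
  const imageCount = messages
    .flatMap((m) => (Array.isArray(m.content) ? m.content : []))
    .filter((p) => p.type === 'image_url').length
  if (imageCount === 0 || !session.mmprojPath) return textTokens

  let perImage: number
  try {
    // Read GGUF metadata from the multimodal projector file.
    const meta = await readGgufMetadata(session.mmprojPath)
    perImage = calculateImageTokens(meta)
  } catch {
    perImage = estimateImageTokensFallback() // fallback estimate
  }
  return textTokens + imageCount * perImage
}
```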

* chore: update FE to send message params including content type `image_url`

* fix mmproj path from session info and num tokens calculation

* fix: Correct image token estimation calculation in llamacpp extension

This commit addresses an inaccurate token count for images in the llama.cpp extension.

The previous logic incorrectly calculated the token count based on image patch size and dimensions. It has been replaced with a more precise method that uses the `clip.vision.projection_dim` value from the model metadata.

Additionally, unnecessary debug logging was removed, and a new log was added to show the mmproj metadata for improved visibility.
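The PR does not show the exact derivation, so this sketch simply reads the metadata key and treats its value as the per-image token count; that mapping is an assumption:

```typescript
// Per the commit, the per-image count comes from clip.vision.projection_dim
// in the mmproj metadata instead of patch-size geometry. Using the value
// directly is an assumption for illustration only.
function calculateImageTokens(meta: Record<string, unknown>): number {
  const dim = meta['clip.vision.projection_dim']
  if (typeof dim !== 'number') {
    throw new Error('mmproj metadata is missing clip.vision.projection_dim')
  }
  return dim
}
```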

* fix per image calc

* fix: crash due to force unwrap

---------

Co-authored-by: Faisal Amir <[email protected]>
Co-authored-by: Louis <[email protected]>
* fix: custom fetch for all providers

* fix: running in development should use the built-in fetch
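A minimal sketch of that selection; `customFetch` and the environment check are illustrative assumptions (the real check might be a build-time flag):

```typescript
declare const customFetch: typeof fetch

const isDev = process.env.NODE_ENV === 'development'

// All providers share one implementation: the custom fetch in production
// builds, the runtime's built-in fetch during development.
const providerFetch: typeof fetch = isDev
  ? globalThis.fetch.bind(globalThis)
  : customFetch
```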
* fix: prevent relocation to root directories

* Update web-app/src/locales/zh-TW/settings.json

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* feat: implement conversation endpoint

* use conversation aware endpoint

* fetch message correctly

* preserve first message

* fix logout

* fix broadcast issue locally + auth not refreshing profile on other tabs + clean up and sync messages

* add is dev tag

ellipsis-dev bot commented Sep 23, 2025

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at [email protected]


Generated with ❤️ by ellipsis.dev

@louis-jan (Contributor) left a comment

LGTM

@dinhlongviolin1 merged commit 7413f13 into dev-web on Sep 23, 2025 (10 of 11 checks passed)
github-project-automation bot moved this to QA in Jan on Sep 23, 2025
github-actions bot added this to the v0.7.0 milestone on Sep 23, 2025
@github-actions bot commented

Preview URL: https://6dce4553.docs-9ba.pages.dev
