
UPSTREAM PR #18059: webui: Client-side implementation of tool calling (with two tools)#1191

Open
loci-dev wants to merge 1 commit into main from loci/pr-18059-master

Conversation

@loci-dev

Note

Source pull request: ggml-org/llama.cpp#18059

This PR allows the webui to give models access to two tools: a calculator and a code interpreter. The calculator is a simple expression evaluator, used to enhance math abilities. The code interpreter runs arbitrary JavaScript in a (relatively isolated) Web Worker and returns the output to the model, enabling more advanced analysis.

This PR also lays the groundwork for a modular tool system, such that one could easily imagine adding a Canvas tool or a Web Search tool.

AI Disclosure: I spent about 8 hours yesterday developing this with significant assistance from AI. I'm perfectly capable of writing this kind of frontend code given the time; this was just a fun project. I'm sharing this PR because the result generally works well, but I have not had time to ensure that all of the code meets my quality standards. I chose to share it anyway because tool calling is an essential feature that has been missing so far, and this implementation results in an elegant, effective user experience. I may have more time to carefully review the code changes in the near future, in which case I will update this description and the PR as needed, but I figured there was no harm in making this available in case others are interested in having tool calling in their llama-server webui.

When an assistant message emits tool calls, the web UI...

  • Executes any enabled tools locally in the browser
  • Persists the results as role: tool messages linked via tool_call_id (including execution duration)
  • Automatically continues generation with a follow-up completion request that includes the tool outputs
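
The loop above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code; the `ToolCall`/`ChatMessage` shapes follow the common OpenAI-style tool-calling message format the description implies (`role: "tool"` messages linked via `tool_call_id`, with execution duration recorded).

```typescript
// Hypothetical sketch of the client-side tool-call loop.
type ChatMessage =
  | { role: "user" | "assistant"; content: string; tool_calls?: ToolCall[] }
  | { role: "tool"; content: string; tool_call_id: string };

interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON-encoded arguments from the model
}

type ToolRunner = (args: unknown) => Promise<string>;

async function runToolCalls(
  calls: ToolCall[],
  tools: Map<string, ToolRunner>
): Promise<ChatMessage[]> {
  const results: ChatMessage[] = [];
  for (const call of calls) {
    const runner = tools.get(call.name);
    const started = performance.now();
    let output: string;
    try {
      output = runner
        ? await runner(JSON.parse(call.arguments))
        : `Error: unknown tool "${call.name}"`;
    } catch (err) {
      output = `Error: ${String(err)}`;
    }
    const elapsedMs = Math.round(performance.now() - started);
    // Persist the result linked to the call, including execution duration,
    // so the follow-up completion request can include the tool outputs.
    results.push({
      role: "tool",
      tool_call_id: call.id,
      content: JSON.stringify({ output, elapsedMs }),
    });
  }
  return results;
}
```

The returned `role: "tool"` messages are appended to the conversation and a follow-up completion request is issued automatically.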

Included tools

  • Calculator: evaluates a constrained math expression syntax (operators + selected Math.* functions/constants).
  • Code Interpreter (JavaScript): runs arbitrary JS in a Web Worker with a configurable timeout, capturing console
    output + the final evaluated value, with improved error reporting (line/column/snippet).
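
A constrained calculator of the kind described can be sketched as below. This is my own illustrative sketch, not the PR's implementation: it rejects any identifier outside a `Math.*` whitelist and any character outside the arithmetic set before evaluating.

```typescript
// Hypothetical sketch: allow only numbers, arithmetic operators,
// parentheses, and a whitelist of Math.* functions/constants.
const ARITHMETIC_ONLY = /^[\d\s+\-*/%().,]*$/;
const IDENTIFIERS = /[A-Za-z_][A-Za-z0-9_]*/g;
const WHITELIST = new Set([
  "abs", "sqrt", "pow", "min", "max", "floor", "ceil", "round",
  "sin", "cos", "tan", "log", "log2", "log10", "exp", "PI", "E",
]);

function calculate(expr: string): number {
  // Reject any identifier outside the whitelist (blocks globalThis, etc.).
  for (const name of expr.match(IDENTIFIERS) ?? []) {
    if (!WHITELIST.has(name)) throw new Error(`disallowed name: ${name}`);
  }
  // With identifiers stripped, only arithmetic characters may remain.
  if (!ARITHMETIC_ONLY.test(expr.replace(IDENTIFIERS, ""))) {
    throw new Error("disallowed character in expression");
  }
  // Evaluate with the whitelisted Math members bound as parameters.
  const fn = new Function(
    ...WHITELIST,
    `"use strict"; return (${expr});`
  ) as (...args: unknown[]) => number;
  return fn(...[...WHITELIST].map((k) => (Math as Record<string, any>)[k]));
}

// calculate("sqrt(16) + 2 * 3") → 10
```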

UX changes

  • Collapses assistant→tool→assistant chains into a single assistant “reasoning” thread and renders tool calls inline
    (arguments + result + timing) to avoid extra message bubbles.
    • This is probably where most of the complexity in this PR lives, but it is essential to a good UX. The simplest possible implementation created a message bubble as the model started reasoning, a separate bubble for each tool call, another bubble as the model continued reasoning, and so on; it was essentially unusable. Having the UI layer collapse all of these related messages into one continuous message mirrors the experience users expect.
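
The collapsing step can be illustrated with a small grouping pass over the flat message list. This is a hypothetical sketch of the idea, not the PR's code: consecutive assistant and tool messages fold into one display block, so tool calls render inline instead of as separate bubbles.

```typescript
// Hypothetical sketch: collapse assistant→tool→assistant chains into one
// display block for rendering.
interface Msg { role: "user" | "assistant" | "tool"; content: string }
interface DisplayBlock { role: "user" | "assistant"; parts: Msg[] }

function collapseForDisplay(messages: Msg[]): DisplayBlock[] {
  const blocks: DisplayBlock[] = [];
  for (const msg of messages) {
    const last = blocks[blocks.length - 1];
    if (msg.role !== "user" && last?.role === "assistant") {
      // Fold assistant continuations and tool results into the open block.
      last.parts.push(msg);
    } else {
      blocks.push({
        role: msg.role === "user" ? "user" : "assistant",
        parts: [msg],
      });
    }
  }
  return blocks;
}
```

A grouping like this also explains why regenerate and delete naturally operate on the whole chain: the UI treats the block, not the individual segment, as the unit.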

Configuration & extensibility

  • Introduces a small tool registry so tools self-register with their schema + settings; the Settings UI auto-populates
    a Tools section (toggles + per-tool fields like timeout), and defaults are derived from tool registrations.
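
A registry of that shape might look like the following. All names here are hypothetical illustrations of the pattern described (self-registration with schema + settings, defaults derived from registrations), not the PR's actual API.

```typescript
// Hypothetical sketch of a self-registering tool registry.
interface ToolSetting { key: string; label: string; default: unknown }

interface ToolDefinition {
  name: string;
  description: string;
  parameters: object;      // JSON schema advertised to the model
  settings: ToolSetting[]; // drives the auto-generated Settings UI
  run(args: unknown, settings: Record<string, unknown>): Promise<string>;
}

const registry = new Map<string, ToolDefinition>();

function registerTool(tool: ToolDefinition): void {
  registry.set(tool.name, tool);
}

// Defaults for the settings store come straight from the registrations,
// so adding a tool never requires touching the Settings UI by hand.
function defaultSettings(): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const tool of registry.values()) {
    out[`${tool.name}.enabled`] = true;
    for (const s of tool.settings) out[`${tool.name}.${s.key}`] = s.default;
  }
  return out;
}

registerTool({
  name: "code_interpreter",
  description: "Run JavaScript in a sandboxed Web Worker",
  parameters: { type: "object", properties: { code: { type: "string" } } },
  settings: [{ key: "timeoutMs", label: "Timeout (ms)", default: 5000 }],
  run: async () => "(worker result)",
});
```

Under this pattern, a Canvas or Web Search tool would be a single new `registerTool` call.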

Tests

  • Adds unit + browser/e2e coverage for interpreter behavior, inline tool rendering, timeout settings UI, streaming
    reactivity/regressions, etc. These tests were created as bugs were encountered; I would be perfectly fine with throwing most of them away, but I figured there was no harm in including them.

Videos

Calculator tool

Screen.Recording.2025-12-15.at.8.10.04.AM.mov

Code Interpreter tool

Screen.Recording.2025-12-15.at.8.11.08.AM.mov

Code interpreter and calculator, including the model recovering from a syntax error in its first code interpreter attempt

Screen.Recording.2025-12-15.at.8.14.48.AM.mov

Demonstrating how tool calling works for an Instruct model

Screen.Recording.2025-12-15.at.8.12.32.AM.mov

Demonstrating how the regenerate button will correctly treat the entire response as one message, instead of regenerating just the last segment after the last tool call.

Screen.Recording.2025-12-15.at.8.39.48.AM.mov

Deleting an entire response

Screen.Recording.2025-12-15.at.8.53.48.AM.mov

Screenshots

New Settings Screen for Tools

image

Known Bugs

  1. The delete confirmation dialog reports the number of underlying messages that will be deleted, but since the chain renders as a single bubble, the user expects to be deleting only "one" message.
  2. Sometimes the server reports an error in the input stream after a tool call; I have not been able to reproduce this reliably.

@loci-review

loci-review bot commented Feb 20, 2026

No meaningful performance changes were detected across 111666 analyzed functions in the following binaries: build.bin.libllama.so, build.bin.llama-tts, build.bin.llama-cvector-generator, build.bin.libmtmd.so, build.bin.llama-tokenize, build.bin.llama-bench, build.bin.libggml.so, build.bin.libggml-base.so, build.bin.libggml-cpu.so, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli, build.bin.llama-quantize, build.bin.llama-qwen2vl-cli.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

@loci-dev loci-dev force-pushed the main branch 9 times, most recently from 6495042 to 61b4303 Compare February 28, 2026 02:16
@loci-dev loci-dev force-pushed the main branch 7 times, most recently from 9f4f332 to 4298c74 Compare March 6, 2026 02:17
@loci-review

loci-review bot commented Mar 7, 2026

No meaningful performance changes were detected across 112638 analyzed functions in the following binaries: build.bin.llama-cvector-generator, build.bin.libllama.so, build.bin.llama-tts, build.bin.libmtmd.so, build.bin.llama-tokenize, build.bin.llama-bench, build.bin.libggml-base.so, build.bin.libggml-cpu.so, build.bin.libggml.so, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli, build.bin.llama-quantize, build.bin.llama-qwen2vl-cli.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

@loci-dev loci-dev force-pushed the main branch 7 times, most recently from 61601b2 to 56aaa36 Compare March 13, 2026 02:16
@loci-dev loci-dev force-pushed the main branch 11 times, most recently from 945fa3a to 0e8e1d6 Compare March 20, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 10 times, most recently from d997939 to 8527fd7 Compare March 27, 2026 02:17