UPSTREAM PR #18059: webui: Client-side implementation of tool calling (with two tools) by loci-dev · Pull Request #585 · auroralabs-loci/llama.cpp

loci-dev · 2025-12-16T03:04:09Z

This PR allows webui to give models access to two tools: a calculator and a code interpreter. The calculator is a simple expression calculator, used to enhance math abilities. The code interpreter runs arbitrary JavaScript in a (relatively isolated) Web Worker, and returns the output to the model, which can be used for more advanced analysis.

This PR also lays the groundwork for a modular tool system, such that one could easily imagine adding a Canvas tool or a Web Search tool.

AI Disclosure: I spent about 8 hours yesterday developing this with significant assistance from AI. I'm perfectly capable of writing this kind of frontend code if I had the time, but this was just a fun project. I'm sharing this PR because the result generally worked well, but I have not had time to ensure that all of the code meets my quality standards. I chose to share it anyways because I feel like tool calling is an essential feature that has been missing so far, and this implementation results in an elegant, effective user experience. I may have more time to carefully review the code changes in the near future, in which case I will update this description and the PR as needed, but I figured there was no harm in making this available in case other people were interested in having tool calling in their llama-server webui.

When an assistant message emits tool calls, the web UI...

Executes any enabled tools locally in the browser
Persists the results as role: tool messages linked via tool_call_id (including execution duration)
Automatically continues generation with a follow-up completion request that includes the tool outputs

Included tools

Calculator: evaluates a constrained math expression syntax (operators + selected Math.* functions/constants).
Code Interpreter (JavaScript): runs arbitrary JS in a Web Worker with a configurable timeout, capturing console
output + the final evaluated value, with improved error reporting (line/column/snippet).

UX changes

Collapses assistant→tool→assistant chains into a single assistant “reasoning” thread and renders tool calls inline
(arguments + result + timing) to avoid extra message bubbles.
- This is probably where most of the complexity in this PR is, but it is essential to getting a good UX here. The simplest possible implementation involved creating a message bubble as the model started reasoning, then creating a separate message bubble for a tool call, then another message bubble as the model continued reasoning, and so on. It was essentially unusable. Having the UI layer collapse all of these related messages into one continuous message mirrors the experience that users expect.

Configuration & extensibility

Introduces a small tool registry so tools self-register with their schema + settings; the Settings UI auto-populates
a Tools section (toggles + per-tool fields like timeout), and defaults are derived from tool registrations.

Tests

Adds unit + browser/e2e coverage for interpreter behavior, inline tool rendering, timeout settings UI, streaming
reactivity/regressions, etc. These tests were created when bugs were encountered. I would be perfectly fine with throwing most of them away, but I figured there was no harm in including them.

Videos

Calculator tool

Screen.Recording.2025-12-15.at.8.10.04.AM.mov

Code Interpreter tool

Screen.Recording.2025-12-15.at.8.11.08.AM.mov

Code interpreter and calculator, including the model recovering from a syntax error in its first code interpreter attempt

Screen.Recording.2025-12-15.at.8.14.48.AM.mov

Demonstrating how tool calling works for an Instruct model

Screen.Recording.2025-12-15.at.8.12.32.AM.mov

Demonstrating how the regenerate button will correctly treat the entire response as one message, instead of regenerating just the last segment after the last tool call.

Screen.Recording.2025-12-15.at.8.39.48.AM.mov

Deleting an entire response

Screen.Recording.2025-12-15.at.8.53.48.AM.mov

Screenshots

New Settings Screen for Tools

Known Bugs

The delete button dialog pops up a count of messages that will be deleted, but the user would only expect that they are deleting "one" message.
Sometimes the server returns that there was an error in the input stream after a tool call, and I haven't been able to reliably reproduce that.

loci-review · 2025-12-16T03:20:32Z

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #585

Scope: Web UI tool calling implementation (client-side only)
Analysis Result: No performance impact on core inference engine

Analysis Findings

This PR modifies exclusively frontend TypeScript/Svelte code in tools/server/webui/. No changes were made to the core llama.cpp inference binaries (libllama.so, libggml.so, libggml-cpu.so) or C/C++ source files in src/ or ggml/ directories.

Performance Metrics Status:

Function-level comparison data unavailable for specified versions
Power consumption analysis shows 0% change for all GGML core libraries
Six binaries show 100% power consumption reduction, indicating data collection issues rather than actual code removal

Code Changes:

Added client-side tool execution framework (Calculator, JavaScript Code Interpreter)
Implemented message chain collapsing for assistant-tool-assistant sequences
Modified Svelte reactivity patterns for streaming updates
Enhanced UI rendering for inline tool call display

Inference Performance Impact:

Tokens per second: No impact. Functions responsible for tokenization and inference (llama_decode, llama_encode, llama_tokenize) are unchanged
Response Time: No impact on server-side processing
Throughput: No impact on model execution

Browser Performance:

Chain merging algorithm adds O(n × m) complexity during message rendering
Array immutability pattern increases memory allocations during streaming
Tool execution adds 50-500ms latency per code interpreter call (client-side only)

Power Consumption:

All core binaries (libggml-base.so, libggml-cpu.so, libggml.so) show 0% change
Data unavailability for libllama.so prevents complete analysis
No changes to computational workload in inference pipeline

This PR enhances user experience without affecting the underlying LLM inference performance.

loci-review · 2025-12-16T18:41:19Z

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #585 - WebUI Tool Calling

Condition 1 Applied: No Performance Impact

PR #585 implements client-side tool calling (calculator and JavaScript code interpreter) entirely within the WebUI frontend. All 3,605 additions and 413 deletions are confined to TypeScript/Svelte files under tools/server/webui/. Zero modifications to C++ inference engine, GGML tensor operations, or model execution paths. Power consumption analysis confirms all binaries unchanged: libllama.so, libggml.so, libggml-cpu.so, and inference tools show 0.0% variation. No functions within Performance-Critical Areas (matrix operations, attention mechanisms, KV cache, quantization kernels) are modified. Token generation throughput remains unaffected as llama_decode, llama_encode, and llama_tokenize are untouched. This is a pure UI feature addition with no backend performance implications.

…l and (javascript) code interpreter tool

loci-review · 2026-01-05T07:33:11Z

Explore the complete analysis inside the Version Insights

I've generated a summary report for your project. The analysis shows that Pull Request #585 in the llama.cpp repository (owned by auroralabs-loci) has no significant performance impact.

Key findings:

✅ No modified functions show performance changes greater than 2%
✅ Both response time and throughput time remain stable
✅ The PR is performance-neutral and safe to merge from a performance perspective

The comparison between the base version (a7cb4ab1-e9f6-11f0-81f2-dbb430499cb5) and target version (bec12ad1-ea02-11f0-81f2-dbb430499cb5) indicates that the changes maintain performance stability without introducing any concerning regressions.

loci-dev temporarily deployed to PROD__AL_DEMO December 16, 2025 03:04 — with GitHub Actions Inactive

loci-dev force-pushed the main branch 6 times, most recently from ed74e7e to 765e416 Compare December 16, 2025 14:10

loci-dev force-pushed the upstream-PR18059-branch_coder543-master branch from 3651c2e to 90ec9d1 Compare December 16, 2025 17:39

loci-dev temporarily deployed to PROD__AL_DEMO December 16, 2025 17:39 — with GitHub Actions Inactive

loci-dev force-pushed the main branch from 765e416 to 3c6cece Compare December 16, 2025 18:12

loci-dev force-pushed the main branch 18 times, most recently from a014a6b to eda9f43 Compare December 18, 2025 10:09

loci-dev force-pushed the main branch 20 times, most recently from a05d3f0 to c9b2647 Compare December 23, 2025 02:45

loci-dev force-pushed the upstream-PR18059-branch_coder543-master branch from 90ec9d1 to 5eb268e Compare December 23, 2025 03:52

loci-dev force-pushed the main branch 7 times, most recently from 8a19571 to 6b83243 Compare December 23, 2025 15:09

webui: Client-side implementation of tool calling with calculator too…

54f71ef

…l and (javascript) code interpreter tool

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UPSTREAM PR #18059: webui: Client-side implementation of tool calling (with two tools)#585

UPSTREAM PR #18059: webui: Client-side implementation of tool calling (with two tools)#585
loci-dev wants to merge 1 commit intomainfrom
upstream-PR18059-branch_coder543-master

loci-dev commented Dec 16, 2025

Uh oh!

loci-review bot commented Dec 16, 2025

Uh oh!

loci-review bot commented Dec 16, 2025

Uh oh!

loci-review bot commented Jan 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

loci-dev commented Dec 16, 2025

When an assistant message emits tool calls, the web UI...

Included tools

UX changes

Configuration & extensibility

Tests

Videos

Calculator tool

Code Interpreter tool

Code interpreter and calculator, including the model recovering from a syntax error in its first code interpreter attempt

Demonstrating how tool calling works for an Instruct model

Demonstrating how the regenerate button will correctly treat the entire response as one message, instead of regenerating just the last segment after the last tool call.

Deleting an entire response

Screenshots

New Settings Screen for Tools

Known Bugs

Uh oh!

loci-review bot commented Dec 16, 2025

Performance Analysis Summary: PR #585

Analysis Findings

Uh oh!

loci-review bot commented Dec 16, 2025

Performance Analysis Summary: PR #585 - WebUI Tool Calling

Uh oh!

loci-review bot commented Jan 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants