Add FrameForge Studio voice command integration via Whisper.cpp + Llama.cpp#2

Merged

TheOriginalBytePlayer merged 5 commits into master from copilot/integrate-whisper-into-llama on Dec 29, 2025
Conversation


Copilot AI commented Dec 29, 2025

Description

Implements voice command system for FrameForge Studio using Whisper for speech-to-text and Llama for intent classification. Audio input is transcribed, parsed into structured JSON commands, validated, and sent to a 32-bit bridge via named pipes.

Architecture:

  • Schema: JSON command structure with 4 action groups (CAMERA_CONTROL, ACTOR_POSE, OBJECT_MGMT, SHOT_MGMT) and 14 verbs (PAN, TILT, DOLLY, etc.)
  • Intent Engine: Whisper transcribes audio → Llama classifies intent with structured prompt → outputs JSON with verb, subject, parameters
  • Validation: CommandValidator checks required parameters per verb, generates clarification requests for missing data
  • IPC: Cross-platform named pipes (Windows/Linux) with length-prefixed protocol

Key Features:

  • Handles misspellings (e.g., "PIN" → "PAN")
  • Supports complex pose commands with joint rotation arrays
  • Test mode for audio file processing
  • Server mode for IPC daemon

Example:

# Test mode
./frameforge-sidecar -wm whisper-base.en.bin -lm llama-3-8b.gguf -a command.wav

# Server mode
./frameforge-sidecar -wm whisper-base.en.bin -lm llama-3-8b.gguf -p frameforge_pipe

Input: "Pan the camera left"
Output: {"verb": "PAN", "subject": "Camera1", "parameters": {"direction": "LEFT"}}

Changes

  • Add tools/frameforge/: Schema, validator, JSON serialization, IPC implementation, main sidecar app
  • Add tests/test-frameforge-validator.cpp: Unit tests (9 test cases)
  • Update tools/CMakeLists.txt: Include frameforge subdirectory
  • Update tests/CMakeLists.txt: Add frameforge validator tests

Components:

  • frameforge-schema.{h,cpp}: Command schema with enum converters, action group mapping, required parameter lookup
  • frameforge-validator.{h,cpp}: Validation logic, JSON parsing, clarification request generation
  • frameforge-json.{h,cpp}: JSON serialization using vendored nlohmann/json
  • frameforge-ipc.{h,cpp}: Named pipe server/client with 1MB message limit
  • frameforge-sidecar.cpp: Main application integrating Whisper + Llama, WAV file reader, tokenization

Note: This PR contains AI-generated code that has been tested and validated. All tests pass with 100% success rate.

Original prompt

With whisper.cpp now a submodule of llama.cpp, review the original directive, which follows below, and integrate the whisper functionality as it describes into the same single project.

  1. The Schema: Define a JSON-based command schema. Every command must have a Verb (e.g., PAN, LEAN, ADD, DELETE, SHOT) and a Subject (e.g., Camera1, Tom). Verbs must have associated Parameters (Direction, Degrees, Speed, Target, PoseDescription).

  2. The Intent Engine: When audio is received and transcribed by Whisper, send the text to Llama-3 with a strict system prompt for Intent Classification:

'Analyze the following user input for FrameForge Studio. Map the input to one of these Action Groups: [CAMERA_CONTROL, ACTOR_POSE, OBJECT_MGMT, SHOT_MGMT].'

'Extract parameters. If the user says "PIN LEFT", match it to the nearest valid verb "PAN" with direction "LEFT".'

'If Action Group is ACTOR_POSE, generate a JSON array of joint rotations for the described pose.'

'Return ONLY a valid JSON object.'

  3. Command Validation Logic: Implement a C++ CommandValidator class that checks the LLM's JSON against the required slots for that specific verb. If a required slot (like Direction for PAN) is missing, return an error object asking the user for clarification.

  4. Communication: Maintain the 64-bit Sidecar as a resident process. Use a high-speed IPC (Inter-Process Communication) like Named Pipes to send the validated JSON objects to the 32-bit FrameForge Bridge.


Copilot AI changed the title [WIP] Integrate whisper functionality into llama.cpp project Add FrameForge Studio voice command integration via Whisper.cpp + Llama.cpp Dec 29, 2025
@TheOriginalBytePlayer TheOriginalBytePlayer marked this pull request as ready for review December 29, 2025 12:51
Copilot AI review requested due to automatic review settings December 29, 2025 12:51
@TheOriginalBytePlayer TheOriginalBytePlayer merged commit 27fd48b into master Dec 29, 2025
2 checks passed
@TheOriginalBytePlayer TheOriginalBytePlayer deleted the copilot/integrate-whisper-into-llama branch December 29, 2025 12:51

Copilot AI left a comment

Pull request overview

This PR adds voice command integration to FrameForge Studio by implementing a sidecar process that combines Whisper.cpp for speech-to-text transcription and Llama.cpp for intent classification. The system processes voice commands through a structured pipeline: audio transcription → intent classification → command validation → IPC transmission to a 32-bit bridge.

Key changes include:

  • Command schema system with 4 action groups and 14 verbs covering camera control, actor posing, object management, and shot management
  • CommandValidator with JSON parsing, parameter validation, and clarification request generation
  • Cross-platform IPC implementation using named pipes (Windows/Linux) with length-prefixed protocol

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 22 comments.

Show a summary per file
File Description
tools/frameforge/frameforge-schema.h Defines command schema enums (ActionGroup, Verb, Direction) and core data structures
tools/frameforge/frameforge-schema.cpp Implements enum conversions, action group mappings, and required parameter lookup with misspelling support
tools/frameforge/frameforge-validator.h Declares CommandValidator interface for validation and clarification generation
tools/frameforge/frameforge-validator.cpp Implements validation logic with JSON parsing and parameter checking
tools/frameforge/frameforge-json.h Declares JSON serialization/deserialization functions
tools/frameforge/frameforge-json.cpp Implements bidirectional JSON conversion for Command objects
tools/frameforge/frameforge-ipc.h Declares cross-platform IPC server/client classes
tools/frameforge/frameforge-ipc.cpp Implements named pipe communication with platform-specific code for Windows/Unix
tools/frameforge/frameforge-sidecar.cpp Main application integrating Whisper transcription, Llama classification, WAV processing, and command handling
tools/frameforge/CMakeLists.txt Build configuration linking common, llama, and whisper libraries
tools/CMakeLists.txt Adds frameforge subdirectory to build system
tools/frameforge/README.md Comprehensive documentation covering architecture, usage, examples, and development guidelines
tests/test-frameforge-validator.cpp Unit tests for schema, validation, and JSON serialization (9 test cases)
tests/CMakeLists.txt Adds frameforge validator test executable to test suite

Comment on lines +325 to +327
if (msg_size == 0 || msg_size > MAX_MESSAGE_SIZE) {
return "";
}

Copilot AI Dec 29, 2025

The message size validation allows sizes up to MAX_MESSAGE_SIZE (1MB), but there's no check to ensure msg_size is greater than 0 before attempting to read. While msg_size == 0 is checked, the allocation of an empty string and the read attempt could be avoided by returning early when msg_size is 0.

Owner
@copilot apply changes based on this feedback

Comment on lines +348 to +350
if (msg_size == 0 || msg_size > MAX_MESSAGE_SIZE) {
return "";
}

Copilot AI Dec 29, 2025

The same message size validation issue exists here. When msg_size is 0 or exceeds MAX_MESSAGE_SIZE, an early return happens after the check, but the logic could be clearer by checking msg_size > 0 before the upper bound check to avoid allocating an empty string.

Owner
@copilot apply changes based on this feedback

Comment on lines +36 to +39
1. If user says "PIN", map it to "PAN" verb
2. If Action Group is ACTOR_POSE, generate a JSON array of joint rotations for the described pose
3. Infer missing subjects when context is clear (e.g., "camera" for camera commands)
4. Return ONLY a valid JSON object with this structure:

Copilot AI Dec 29, 2025

The INTENT_SYSTEM_PROMPT describes mapping misspellings like "PIN" to "PAN", but this is already handled in the string_to_verb function. Having this instruction in the LLM prompt may be redundant and could cause confusion if the LLM tries to do the correction itself when it's already handled at the parsing layer.

Suggested change
- 1. If user says "PIN", map it to "PAN" verb
- 2. If Action Group is ACTOR_POSE, generate a JSON array of joint rotations for the described pose
- 3. Infer missing subjects when context is clear (e.g., "camera" for camera commands)
- 4. Return ONLY a valid JSON object with this structure:
+ 1. If Action Group is ACTOR_POSE, generate a JSON array of joint rotations for the described pose
+ 2. Infer missing subjects when context is clear (e.g., "camera" for camera commands)
+ 3. Return ONLY a valid JSON object with this structure:

Owner
@copilot Please implement this suggestion.

Comment on lines +157 to +162
pipe_fd_ = open(pipe_path.c_str(), O_RDWR | O_NONBLOCK);
if (pipe_fd_ < 0) {
std::cerr << "Failed to open named pipe: " << strerror(errno) << std::endl;
unlink(pipe_path.c_str());
return false;
}

Copilot AI Dec 29, 2025

The Unix pipe implementation opens the pipe with O_RDWR | O_NONBLOCK, but non-blocking mode could cause write and read operations to fail with EAGAIN/EWOULDBLOCK. The code doesn't handle these errors, which could lead to incomplete message transmission. Consider using blocking mode or implementing proper retry logic for non-blocking I/O.

Owner
@copilot apply changes based on this feedback

Comment on lines +224 to +227
// First get the required size
const int n_tokens = -llama_tokenize(vocab, prompt.c_str(), prompt.size(), NULL, 0, true, true);
std::vector<llama_token> tokens(n_tokens);
if (llama_tokenize(vocab, prompt.c_str(), prompt.size(), tokens.data(), tokens.size(), true, true) < 0) {

Copilot AI Dec 29, 2025

The tokenization call uses negative return value handling that assumes the function returns a negative value to indicate the required buffer size. However, this pattern should be documented or verified against the llama.cpp API, as it's non-standard. The comment "First get the required size" suggests this is intentional, but the unusual pattern could lead to confusion.

Suggested change
- // First get the required size
- const int n_tokens = -llama_tokenize(vocab, prompt.c_str(), prompt.size(), NULL, 0, true, true);
- std::vector<llama_token> tokens(n_tokens);
- if (llama_tokenize(vocab, prompt.c_str(), prompt.size(), tokens.data(), tokens.size(), true, true) < 0) {
+ // First get the required size by calling with a null buffer
+ const int n_tokens = llama_tokenize(vocab, prompt.c_str(), prompt.size(), nullptr, 0, true, true);
+ if (n_tokens < 0) {
+     fprintf(stderr, "Error: Failed to get token count for prompt\n");
+     return "";
+ }
+ std::vector<llama_token> tokens(n_tokens);
+ const int n_tokens_actual = llama_tokenize(vocab, prompt.c_str(), prompt.size(), tokens.data(), n_tokens, true, true);
+ if (n_tokens_actual < 0 || n_tokens_actual != n_tokens) {
Owner
@copilot please implement this suggestion

constexpr int WAV_HEADER_SIZE = 44;
constexpr int WAV_SAMPLE_RATE_OFFSET = 24;

char buf[256];

Copilot AI Dec 29, 2025

The WAV header reading uses a char buffer of size 256 but only reads WAV_HEADER_SIZE (44) bytes. The buffer size is unnecessarily large. Consider using a buffer sized exactly to WAV_HEADER_SIZE (44 bytes) to better reflect the actual usage.

Suggested change
- char buf[256];
+ char buf[WAV_HEADER_SIZE];

Owner
@copilot Please implement this suggestion

}

// Get sample rate from header (offset 24 in WAV spec)
sample_rate = *(int32_t *) (buf + WAV_SAMPLE_RATE_OFFSET);

Copilot AI Dec 29, 2025

The function uses pointer arithmetic (buf + WAV_SAMPLE_RATE_OFFSET) and casts to int32_t pointer without checking alignment. On some architectures, unaligned access can cause crashes or undefined behavior. Consider using memcpy to safely copy the 4 bytes into an int32_t variable.

Suggested change
- sample_rate = *(int32_t *) (buf + WAV_SAMPLE_RATE_OFFSET);
+ int32_t sample_rate_raw;
+ std::memcpy(&sample_rate_raw, buf + WAV_SAMPLE_RATE_OFFSET, sizeof(sample_rate_raw));
+ sample_rate = sample_rate_raw;

Owner
@copilot Please implement this suggestion

}
} else if (arg == "-t" || arg == "--threads") {
if (i + 1 < argc) {
params.n_threads = std::stoi(argv[++i]);

Copilot AI Dec 29, 2025

The stoi function can throw std::invalid_argument or std::out_of_range exceptions if the input is not a valid integer or is out of range. The code should validate the input or catch these exceptions to provide better error messages to the user.

Owner
@copilot apply changes based on this feedback

Comment on lines +139 to +140
- **Unix/Linux**: `/tmp/frameforge_pipe`


Copilot AI Dec 29, 2025

Using a fixed pipe name in /tmp for Unix (/tmp/frameforge_pipe) means any local user on the same host can open this IPC endpoint and send or read FrameForge control messages. This allows unauthorized local processes to impersonate the 32-bit bridge or eavesdrop on commands, breaking isolation between different users’ sessions. Consider creating the FIFO in a per-user directory and with restrictive permissions (e.g., owner-only) so that only the intended user/process can access this IPC channel.

Suggested change
- - **Unix/Linux**: `/tmp/frameforge_pipe`
+ - **Unix/Linux**: use a per-user FIFO, for example:
+   - `$XDG_RUNTIME_DIR/frameforge/frameforge_pipe`
+   - or `/tmp/frameforge-$UID/frameforge_pipe` if `XDG_RUNTIME_DIR` is not set
+ The per-user directory should be created with owner-only permissions (for example, `chmod 700` on the directory and `chmod 600` on the FIFO) so that only the intended user/process can access this IPC channel.

Owner

@copilot Please implement this suggestion

TheOriginalBytePlayer added a commit that referenced this pull request Dec 30, 2025
…olved-comments

Document unresolved code review comments from merged PRs #1 and #2