
UPSTREAM PR #18110: server: (router) allow child process to report status via stdout#595

Open
loci-dev wants to merge 1 commit into main from upstream-PR18110-branch_ngxson-xsn/router_cmd_stdout

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#18110

If the router listens on an address other than 127.0.0.1, the child process will fail to report its status back to the router.

This change replaces the HTTP-based reporting mechanism entirely, using a pipe (stdout) instead.

@loci-review

loci-review bot commented Dec 16, 2025

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #595

Overview

This PR refactors the router-child IPC mechanism from HTTP-based to stdout pipe-based communication. Analysis shows no performance impact on inference paths. The changes affect only the model loading initialization sequence, which occurs once per model instance startup.

Key Findings

Inference Performance Impact: None

No changes detected in tokenization or inference functions. The following critical functions remain unmodified:

  • llama_decode
  • llama_encode
  • llama_tokenize
  • ggml_mul_mat
  • llama_build_graph

Tokens per second: No impact expected. The refactored code executes only during model initialization, not during token generation. Request proxying and inference hot paths are unchanged.

Startup Performance Improvement

The modified server_models::load() and setup_child_server() functions show reduced latency:

  • Eliminated HTTP client instantiation overhead (approximately 1,000,000 ns, i.e. ~1 ms, per model load)
  • Removed JSON serialization and HTTP POST request (approximately 2,000,000 ns, ~2 ms, per operation)
  • Replaced with a stdout write operation (approximately 100,000 ns, ~0.1 ms)

Net improvement: approximately 2,900,000 ns (~2.9 ms) per child process startup.

Power Consumption Analysis

All analyzed binaries show negligible change:

  • build.bin.libllama.so: 0 nJ change (186068 nJ baseline)
  • build.bin.llama-run: 0 nJ change (222960 nJ baseline)
  • build.bin.llama-cvector-generator: -1 nJ change (255554 nJ baseline)
  • build.bin.llama-tts: 0 nJ change (259957 nJ baseline)

Remaining 12 binaries show 0 nJ change. Total power consumption difference across all binaries: -1 nJ (negligible).

Modified Functions

The changes affect non-inference code paths:

  • server_models::load(): Adds stdout parsing logic with strstr() overhead (approximately 2000 ns per log line, one-time during startup)
  • server_models::setup_child_server(): Replaces HTTP POST with stdout write
  • Removed post_router_models_status HTTP endpoint handler

Code Changes

The PR implements a protocol change for status reporting between router and child processes. The modification eliminates network stack usage for local IPC, replacing it with direct pipe communication. This addresses a configuration bug where routers listening on non-localhost addresses prevented child status reporting.

@loci-dev loci-dev force-pushed the main branch 27 times, most recently from e02e9be to 9f1f66d on December 19, 2025 at 11:08
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 006b713 to 51e2c27 on December 25, 2025 at 04:21