
UPSTREAM PR #17878: server : run child server on localhost (#496)

Open
loci-dev wants to merge 1 commit into main from upstream-PR17878-branch_aldehir-server/fix-router-inaddr-any

Conversation

loci-dev commented Dec 9, 2025

Mirrored from ggml-org/llama.cpp#17878

When passing in --host 0.0.0.0, the child server also binds to 0.0.0.0, and the router then tries to reach it at 0.0.0.0. I can't think of a reason why the child should not always run on 127.0.0.1.

get_free_port() binds to INADDR_ANY, which should select a port that is available across all interfaces. This can be changed to INADDR_LOOPBACK if we ensure the child will only ever bind to 127.0.0.1. If not, then INADDR_ANY is a safe choice.

fixes #17862

loci-review bot commented Dec 9, 2025

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #496

Overview

This PR implements a networking configuration fix for the llama.cpp server router, forcing child server instances to bind exclusively to localhost (127.0.0.1) rather than inheriting the router's host configuration. The changes span 2 files with 7 line additions and 2 deletions, modifying only server infrastructure code without touching inference or tokenization logic.

Performance Impact

No measurable performance impact detected. Power consumption analysis across all binaries shows changes below 0.001%:

  • build.bin.libllama.so: 0.22 nJ reduction (194,204 nJ baseline)
  • build.bin.llama-run: 1.48 nJ reduction (219,166 nJ baseline)
  • build.bin.llama-cvector-generator: 0.95 nJ increase (249,477 nJ baseline)
  • build.bin.llama-tts: 0.73 nJ increase (253,600 nJ baseline)
  • All other binaries: 0.0% change

Inference Performance: No impact on tokens per second. The modified code paths (server-models.cpp, server-models.h) handle only HTTP routing, child process management, and network configuration. Core inference functions (llama_decode, llama_encode, llama_tokenize) remain unchanged. No modifications to model loading, tokenization, sampling, KV cache, or computational graph execution.

Code Changes

The PR adds a hostname field to server_model_meta structure and implements three key modifications:

  1. Sets inst.meta.hostname = "127.0.0.1" during model instance initialization
  2. Passes --host 127.0.0.1 explicitly to child server processes via command-line arguments
  3. Updates proxy connection logic to use meta->hostname instead of base_params.hostname

These changes resolve a routing failure where child servers bound to 0.0.0.0 were unreachable by the router. The fix is purely correctness-focused with no computational overhead—localhost connections have identical latency characteristics to the previous configuration, and the additional command-line argument adds negligible parsing overhead.

Security improvement: Child servers are no longer exposed on external network interfaces, reducing attack surface while maintaining functional equivalence for local routing scenarios.

loci-dev force-pushed the main branch 27 times, most recently from 4f731df to 8e6f6e8 on December 12, 2025 at 15:09
loci-dev force-pushed the main branch 30 times, most recently from b9ba67d to 320a1fc on December 17, 2025 at 09:12