UPSTREAM PR #18110: server: (router) allow child process to report status via stdout #595
Performance Analysis Summary: PR #595

Overview
This PR refactors the router-child IPC mechanism from HTTP-based to stdout pipe-based communication. Analysis shows no performance impact on inference paths. The changes affect only the model loading initialization sequence, which occurs once per model instance startup.

Key Findings

Inference Performance Impact: None
No changes detected in tokenization or inference functions. The following critical functions remain unmodified:
Tokens per second: No impact expected. The refactored code executes only during model initialization, not during token generation. Request proxying and inference hot paths are unchanged.

Startup Performance Improvement
The modified
Net improvement: approximately 2900000 ns (about 2.9 ms) per child process startup.

Power Consumption Analysis
All analyzed binaries show negligible change:
Remaining 12 binaries show 0 nJ change. Total power consumption difference across all binaries: -1 nJ (negligible).

Modified Functions
The changes affect non-inference code paths:

Code Changes
The PR implements a protocol change for status reporting between router and child processes. The modification eliminates network stack usage for local IPC, replacing it with direct pipe communication. This addresses a configuration bug where routers listening on non-localhost addresses prevented child status reporting.
e02e9be to 9f1f66d
006b713 to 51e2c27
Mirrored from ggml-org/llama.cpp#18110
If the router listens on a specific address other than 127.0.0.1, the child process fails to report its status back to the router.
This change completely replaces that reporting mechanism with a pipe (stdout).
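
To illustrate the idea, here is a minimal Python sketch of pipe-based status reporting. It is not the code from this PR: the marker string `READY_MARKER`, the stand-in child script `CHILD_SOURCE`, and the helper `wait_for_child_ready` are all hypothetical names chosen for the example. The parent spawns a child, reads the child's stdout line by line, and treats one specific line as the "ready" report, so no listen address is involved at all.

```python
import subprocess
import sys

# Hypothetical marker; the actual status string used by the PR is not shown here.
READY_MARKER = "status: ready"

# Stand-in child process: prints ordinary log output, then the status line.
CHILD_SOURCE = (
    "import sys\n"
    "print('loading model...')\n"
    f"print({READY_MARKER!r})\n"
    "sys.stdout.flush()\n"
)


def wait_for_child_ready(cmd):
    """Spawn cmd and read its stdout pipe until the ready marker appears.

    Returns (ready, lines_seen). The parent never opens a socket, so its
    own bind address cannot interfere with the status report.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    seen = []
    ready = False
    for line in proc.stdout:
        line = line.rstrip("\n")
        seen.append(line)
        if line == READY_MARKER:
            ready = True
            break
    # Cleanup for this short-lived demo child; a real router would keep
    # the child running and continue draining its output.
    proc.stdout.close()
    proc.wait()
    return ready, seen


if __name__ == "__main__":
    ready, lines = wait_for_child_ready([sys.executable, "-c", CHILD_SOURCE])
    print(ready, lines)
```

If the child exits without ever printing the marker, the loop ends at end-of-file and the helper returns `ready == False`, which gives the parent a natural way to detect a failed startup.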