
UPSTREAM PR #19109: llama : disable Direct IO by default #1040

Open
loci-dev wants to merge 1 commit into main from upstream-PR19109-branch_ggml-org-gg/llama-dio-off

Conversation


loci-review bot commented Jan 26, 2026

Performance Review Report: llama.cpp Base → Target Version

Impact Classification: Minor Impact

Commit: 5ef960f - "llama : disable Direct IO by default" by Georgi Gerganov
Files Changed: 5 modified, 37 added, 3 deleted
Functions Analyzed: 13 functions across 3 binaries (libllama.so, llama-cvector-generator, llama-tts)

Executive Summary

Performance changes stem from build configuration differences (Debug vs Release) and the I/O strategy change (Direct I/O → buffered I/O), not from algorithmic modifications. All analyzed functions lie on non-critical paths (initialization, argument parsing, utilities); core inference operations are unchanged.

Key Findings

Genuine Optimization (2 functions):

  • operator() mmap flag handler: -66.71ns (-80%) in both cvector-generator and llama-tts. Removing the coupling between the mmap and Direct I/O options simplified the handler, yielding a roughly 5x speedup.

Build Configuration Effects (2 functions):

  • std::_Rb_tree::begin(): +182ns (+220%) in both binaries. Debug builds with _GLIBCXX_ASSERTIONS add STL validation overhead. Zero impact on Release builds.

I/O Strategy Trade-offs (3 functions in libllama.so):

  • Token sorting comparator: +176ns (+138%) per comparison. Trades individual operation latency for bulk throughput, an appropriate trade-off for batch inference.
  • KV cache hashtable begin(): -186ns (-64%). Genuine improvement in moderately sensitive KV cache operations.
  • Bigram iterator: +57ns (+48%). Minimal overhead in tokenization preprocessing.

Compiler Variations (6 functions):

  • Mixed results from compiler optimization differences. Notable: vector::begin() improved -180ns (-68%), json::get<bool>() improved -181ns (-75%). Others show negligible changes in initialization code.

Performance-Critical Assessment

Zero impact on critical paths: Matrix multiplication, attention computation, quantization, and sampling algorithms unchanged. Analyzed functions contribute <0.1% to total inference time.

Power consumption: Negligible change (within measurement noise). Startup phase shows slight improvement from buffered I/O; inference overhead is <0.01%.

GPU/ML operations: Zero changes to CUDA, Metal, HIP, Vulkan backends or ML kernels.

Conclusion

Changes represent intentional I/O flexibility improvements with acceptable trade-offs. The 80% speedup in argument parsing demonstrates genuine optimization through architectural improvement. Other changes reflect build configuration differences (Debug assertions) or I/O strategy optimization for typical LLM workloads. No performance concerns for production deployments.

See the complete breakdown in Version Insights
Have questions? Tag @loci-dev to ask about this PR.

loci-dev force-pushed the main branch 27 times, most recently from 62bf34b to 10471d1 (January 29, 2026 13:31)
loci-dev force-pushed the main branch 30 times, most recently from 9216bda to 6b41339 (February 1, 2026 00:52)
