
llama-bench: add direct_io parameter #18778

Merged: 0cc4m merged 1 commit into master from 0cc4m/llama-bench-direct-io on Jan 13, 2026
Conversation

0cc4m (Contributor) commented Jan 12, 2026

This adds the direct-io parameter added in #18166 to llama-bench. The reasoning is that we currently have model loading issues in Vulkan that happen when using direct-io (see for example #18741), and it was not possible to work around this in llama-bench without a code change.

ggerganov (Member) left a comment


Approving to facilitate debugging the related issues.

Though my understanding is that this parameter should not affect the performance numbers of the benches - it should just affect the loading times. Is this still a valid assumption? In the future, if we spot a perf discrepancy between dio 0/1, should we treat it as a bug, or do we have actual use cases where a difference can be expected? cc @JTischbein @jeffbolznv

0cc4m (Contributor, Author) commented Jan 13, 2026

It should not affect performance, but it is possible that it does. I think these cases should be treated as a bug.

@0cc4m 0cc4m merged commit db79dc0 into master Jan 13, 2026
75 of 76 checks passed
@0cc4m 0cc4m deleted the 0cc4m/llama-bench-direct-io branch January 13, 2026 07:49
JTischbein (Contributor) commented

@0cc4m When only --mmap 1 is set, the direct I/O path will still be used if available. In the argument parsing of llama-cli, llama-server, etc., we set use_direct_io to false if --mmap is explicitly enabled.

diff --git a/tools/llama-bench/llama-bench.cpp b/tools/llama-bench/llama-bench.cpp
index aed97e77e..1d616daf3 100644
--- a/tools/llama-bench/llama-bench.cpp
+++ b/tools/llama-bench/llama-bench.cpp
@@ -372,7 +372,7 @@ static const cmd_params cmd_params_defaults = {
     /* devices              */ { {} },
     /* tensor_split         */ { std::vector<float>(llama_max_devices(), 0.0f) },
     /* tensor_buft_overrides*/ { std::vector<llama_model_tensor_buft_override>{ { nullptr, nullptr } } },
-    /* use_mmap             */ { true },
+    /* use_mmap             */ { false },
     /* use_direct_io        */ { true },
     /* embeddings           */ { false },
     /* no_op_offload        */ { false },
@@ -1185,6 +1185,8 @@ static std::vector<cmd_params_instance> get_cmd_params_instances(const cmd_param
     for (const auto & cs : params.cpu_strict)
     for (const auto & nd : params.n_depth)
     for (const auto & pl : params.poll) {
+        bool use_dio = dio;
+        if (mmp) use_dio = false;
         for (const auto & n_prompt : params.n_prompt) {
             if (n_prompt == 0) {
                 continue;
@@ -1212,7 +1214,7 @@ static std::vector<cmd_params_instance> get_cmd_params_instances(const cmd_param
                 /* .tensor_split = */ ts,
                 /* .tensor_buft_overrides = */ ot,
                 /* .use_mmap     = */ mmp,
-                /* .use_direct_io= */ dio,
+                /* .use_direct_io= */ use_dio,
                 /* .embeddings   = */ embd,
                 /* .no_op_offload= */ nopo,
                 /* .no_host      = */ noh,
@@ -1247,7 +1249,7 @@ static std::vector<cmd_params_instance> get_cmd_params_instances(const cmd_param
                 /* .tensor_split = */ ts,
                 /* .tensor_buft_overrides = */ ot,
                 /* .use_mmap     = */ mmp,
-                /* .use_direct_io= */ dio,
+                /* .use_direct_io= */ use_dio,
                 /* .embeddings   = */ embd,
                 /* .no_op_offload= */ nopo,
                 /* .no_host      = */ noh,
@@ -1282,7 +1284,7 @@ static std::vector<cmd_params_instance> get_cmd_params_instances(const cmd_param
                 /* .tensor_split = */ ts,
                 /* .tensor_buft_overrides = */ ot,
                 /* .use_mmap     = */ mmp,
-                /* .use_direct_io= */ dio,
+                /* .use_direct_io= */ use_dio,
                 /* .embeddings   = */ embd,
                 /* .no_op_offload= */ nopo,
                 /* .no_host      = */ noh,

This would enable -dio by default, disable --mmap by default, and disable -dio when --mmap 1 is specified. I don't think that is a pretty solution, but I don't have a prettier idea right now.

The prettiest solution would be disabling both mmap and dio by default. Then specifying --mmap 1 and -dio 1 would both work.

0cc4m (Contributor, Author) commented Jan 14, 2026

You're right, I was mostly focused on making it work again at all. It makes sense to keep both off by default for llama-bench.

