
llama-bench: add direct_io parameter #18778

Merged: 0cc4m merged 1 commit into master from 0cc4m/llama-bench-direct-io on Jan 13, 2026
Conversation

0cc4m (Contributor) commented Jan 12, 2026

This adds the direct-io parameter added in #18166 to llama-bench. The reasoning is that we currently have model loading issues in Vulkan that happen when using direct-io (see for example #18741), and it was not possible to work around this in llama-bench without a code change.

ggerganov (Member) left a comment


Approving to facilitate debugging the related issues.

Though my understanding is that this parameter should not affect the performance numbers of the benches - it should just affect the loading times. Is this still a valid assumption? In the future, if we spot a perf discrepancy between dio 0/1, should we treat it as a bug, or do we have actual use cases where a difference can be expected? cc @JTischbein @jeffbolznv

0cc4m (Contributor, Author) commented Jan 13, 2026

It should not affect performance, but it is possible that it does. I think these cases should be treated as a bug.

@0cc4m 0cc4m merged commit db79dc0 into master Jan 13, 2026
75 of 76 checks passed
@0cc4m 0cc4m deleted the 0cc4m/llama-bench-direct-io branch January 13, 2026 07:49
JTischbein (Contributor) commented

@0cc4m When only --mmap 1 is set, the direct I/O path will still be used if available. In the argument parsing of llama-cli, llama-server, etc., we set use_direct_io to false if --mmap is explicitly enabled.

diff --git a/tools/llama-bench/llama-bench.cpp b/tools/llama-bench/llama-bench.cpp
index aed97e77e..1d616daf3 100644
--- a/tools/llama-bench/llama-bench.cpp
+++ b/tools/llama-bench/llama-bench.cpp
@@ -372,7 +372,7 @@ static const cmd_params cmd_params_defaults = {
     /* devices              */ { {} },
     /* tensor_split         */ { std::vector<float>(llama_max_devices(), 0.0f) },
     /* tensor_buft_overrides*/ { std::vector<llama_model_tensor_buft_override>{ { nullptr, nullptr } } },
-    /* use_mmap             */ { true },
+    /* use_mmap             */ { false },
     /* use_direct_io        */ { true },
     /* embeddings           */ { false },
     /* no_op_offload        */ { false },
@@ -1185,6 +1185,8 @@ static std::vector<cmd_params_instance> get_cmd_params_instances(const cmd_param
     for (const auto & cs : params.cpu_strict)
     for (const auto & nd : params.n_depth)
     for (const auto & pl : params.poll) {
+        bool use_dio = dio;
+        if (mmp) use_dio = false;
         for (const auto & n_prompt : params.n_prompt) {
             if (n_prompt == 0) {
                 continue;
@@ -1212,7 +1214,7 @@ static std::vector<cmd_params_instance> get_cmd_params_instances(const cmd_param
                 /* .tensor_split = */ ts,
                 /* .tensor_buft_overrides = */ ot,
                 /* .use_mmap     = */ mmp,
-                /* .use_direct_io= */ dio,
+                /* .use_direct_io= */ use_dio,
                 /* .embeddings   = */ embd,
                 /* .no_op_offload= */ nopo,
                 /* .no_host      = */ noh,
@@ -1247,7 +1249,7 @@ static std::vector<cmd_params_instance> get_cmd_params_instances(const cmd_param
                 /* .tensor_split = */ ts,
                 /* .tensor_buft_overrides = */ ot,
                 /* .use_mmap     = */ mmp,
-                /* .use_direct_io= */ dio,
+                /* .use_direct_io= */ use_dio,
                 /* .embeddings   = */ embd,
                 /* .no_op_offload= */ nopo,
                 /* .no_host      = */ noh,
@@ -1282,7 +1284,7 @@ static std::vector<cmd_params_instance> get_cmd_params_instances(const cmd_param
                 /* .tensor_split = */ ts,
                 /* .tensor_buft_overrides = */ ot,
                 /* .use_mmap     = */ mmp,
-                /* .use_direct_io= */ dio,
+                /* .use_direct_io= */ use_dio,
                 /* .embeddings   = */ embd,
                 /* .no_op_offload= */ nopo,
                 /* .no_host      = */ noh,

This would enable -dio by default, disable --mmap by default, and disable -dio when --mmap 1 is specified. I don't think that is a pretty solution, but I don't have a prettier idea right now.

The prettiest solution would be disabling both mmap and dio by default. Then specifying --mmap 1 and -dio 1 would both work.

0cc4m (Contributor, Author) commented Jan 14, 2026

You're right, I was mostly focused on making it work again at all. It makes sense to keep both off by default for llama-bench.

