ggerganov
left a comment
Approving to facilitate debugging the related issues.
Though my understanding is that this parameter should not affect the performance numbers of the benches; it should only affect loading times. Is this still a valid assumption? In the future, if we spot a perf discrepancy between dio 0/1, should we treat it as a bug, or do we have actual use cases where a difference can be expected? cc @JTischbein @jeffbolznv
It should not affect performance, but it is possible that it does. I think these cases should be treated as a bug.
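To make the "treat it as a bug" rule concrete, here is a minimal Python sketch of a parity check between two benchmark runs that differ only in the direct-io setting. The function name `dio_perf_discrepancy`, the list-of-tokens/s input shape, and the 5% tolerance are hypothetical choices for illustration; they are not part of llama-bench or its output format.

```python
from statistics import mean


def dio_perf_discrepancy(tps_dio0, tps_dio1, rel_tol=0.05):
    """Compare mean tokens/s for dio=0 vs dio=1 (hypothetical helper).

    Returns (is_suspect, rel_diff). rel_tol is an assumed noise threshold,
    not a llama-bench setting.
    """
    if not tps_dio0 or not tps_dio1:
        raise ValueError("need at least one sample per setting")
    m0, m1 = mean(tps_dio0), mean(tps_dio1)
    base = max(m0, m1)
    if base == 0:
        # Both runs report zero throughput: nothing to compare.
        return False, 0.0
    # Relative difference against the faster of the two runs.
    rel_diff = abs(m0 - m1) / base
    return rel_diff > rel_tol, rel_diff


suspect, diff = dio_perf_discrepancy([100.0, 102.0, 98.0], [80.0, 81.0, 79.0])
print(suspect, round(diff, 3))
```

Comparing means against a relative tolerance keeps ordinary run-to-run noise from being flagged; a sensible threshold depends on how noisy a given backend's measurements are.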
@0cc4m When only setting This would enable The prettiest solution would be to disable mmap and dio by default. Then specifying
You're right, I was mostly focused on making it work again at all. It makes sense to keep both off by default for llama-bench.
This adds the direct-io parameter added in #18166 to llama-bench. The reasoning is that we currently have model loading issues in Vulkan that happen when using direct-io (see for example #18741), and it was not possible to work around this in llama-bench without a code change.