tests : update for LLAMA_SET_ROWS=1#14961
Conversation
e1ebdea to
d6233d6
Compare
tests/test-thread-safety.cpp
Outdated
| // each context has a single sequence | ||
| cparams.n_seq_max = 1; | ||
|
|
||
| // prevent from launching too many threads | ||
| cparams.n_threads = std::min<int>(std::max(2u, std::thread::hardware_concurrency()/params.n_parallel), cparams.n_threads); | ||
|
|
There was a problem hiding this comment.
@slaren Small change to the test to make it compatible with split KV cache. Reduced the number of CPU threads because on the MacBook the process takes a long time (several minutes) to terminate (think it's some resource congestion when there are many threads started by the process, not sure).
There was a problem hiding this comment.
This is a known issue with the thread pool implementation, using more threads than available will result in the threads spending more time spinning than doing work.
There was a problem hiding this comment.
I am not convinced that it is good to ignore the parameters of the user to workaround what essentially is a bug. Can this be solved by running the test with -t 1?
There was a problem hiding this comment.
Yes, -t 1 works. I was thinking to use -t 2 so we have context-level concurrency too. With -t 2 the test also runs cleanly on my devices.
* test-thread-safety : each context uses a single sequence * embedding : handle --parallel argument ggml-ci * save-load : handle -np 1 ggml-ci * thread-safety : avoid overriding threads, reduce test case arg ggml-ci
target #14960
Extract the test updates from #14959 in a separate PR to be merged before enabling
LLAMA_SET_ROWS=1by default.Test updates:
test-thread-safetycparams.n_seq_max = 1embedding-npis not specifiedsave-load-state-npis not specified