Misc. bug: Qwen3 coder next server crash #19329
Closed
Description
Name and Version
With the latest llama.cpp from master (compiled a few minutes before writing), llama-server can crash with an uncaught exception when running Qwen3 Coder Next.
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama.cpp/build/bin/llama-server --ssl-key-file key.pem --ssl-cert-file cert.pem --temp 1.0 --top-p 0.95 --top-k 40 --model Qwen3-Coder-Next-Q4_K_M-00001-of-00004.gguf --metrics --host 0.0.0.0 --port 8000 -np 4 --jinja
Problem description & steps to reproduce
I haven't managed to narrow down how to reproduce it; the crash shows up during general use.
First Bad Commit
No response
Relevant log output
Matched tool start: "<tool_call>\n<"
Partial parse: incomplete tool_call
Grammar still awaiting trigger after token 1688 (function)
res send: sending result for task id = 144
res send: task id = 144 pushed to result queue
slot process_toke: id 3 | task 144 | n_decoded = 4, n_remaining = -1, next token: 1688 'function'
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 148
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 149, front = 0
slot update_slots: id 3 | task 144 | slot decode token, n_ctx = 65536, n_tokens = 7708, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
srv update_chat_: Parsing chat message: <tool_call>
<function
Parsing input with format Qwen3 Coder: <tool_call>
<function
Matched tool start: "<tool_call>\n<function"
Partial parse: incomplete tool_call
[New LWP 21469]
[New LWP 21474]
[New LWP 21475]
[New LWP 21476]
[New LWP 21477]
[New LWP 21478]
[New LWP 21479]
[New LWP 21480]
[New LWP 21481]
[New LWP 21482]
[New LWP 21483]
[New LWP 21484]
[New LWP 21485]
[New LWP 21486]
[New LWP 21487]
[New LWP 21488]
[New LWP 21489]
[New LWP 21490]
[New LWP 21491]
[New LWP 21492]
[New LWP 21493]
[New LWP 21494]
[New LWP 21495]
[New LWP 21496]
[New LWP 21497]
[New LWP 21498]
[New LWP 21499]
[New LWP 21500]
[New LWP 21501]
[New LWP 21502]
[New LWP 21503]
[New LWP 21504]
[New LWP 21505]
[New LWP 21506]
[New LWP 21507]
[New LWP 21508]
[New LWP 21509]
[New LWP 21510]
[New LWP 21511]
[New LWP 21512]
[New LWP 21513]
[New LWP 21514]
[New LWP 21515]
[New LWP 21516]
[New LWP 21517]
[New LWP 21518]
[New LWP 21519]
[New LWP 21520]
[New LWP 21521]
[New LWP 21522]
[New LWP 21523]
[New LWP 21524]
[New LWP 21525]
[New LWP 21526]
[New LWP 21527]
[New LWP 21528]
[New LWP 21529]
[New LWP 21530]
[New LWP 21531]
[New LWP 21532]
[New LWP 21533]
[New LWP 21534]
[New LWP 21535]
[New LWP 21536]
[New LWP 21537]
[New LWP 21538]
[New LWP 21539]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000e32b6ff69940 in GI_wait4 (pid=<optimized out>, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0 0x0000e32b6ff69940 in GI_wait4 (pid=<optimized out>, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 in ../sysdeps/unix/sysv/linux/wait4.c
#1 0x0000e32b70375774 in ggml_print_backtrace () from /home/ubuntu/llama.cpp/build/bin/libggml-base.so.0
#2 0x0000e32b7038795c in ggml_uncaught_exception() () from /home/ubuntu/llama.cpp/build/bin/libggml-base.so.0
#3 0x0000e32b701d2b4c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
#4 0x0000e32b701d2bb0 in std::terminate() () from /lib/aarch64-linux-gnu/libstdc++.so.6
#5 0x0000e32b701d2e94 in __cxa_throw () from /lib/aarch64-linux-gnu/libstdc++.so.6
#6 0x0000e32b704f8010 in llama_grammar_accept_token(llama_grammar&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /home/ubuntu/llama.cpp/build/bin/libllama.so.0
#7 0x0000e32b704f9b58 in llama_grammar_accept_impl(llama_grammar&, int) () from /home/ubuntu/llama.cpp/build/bin/libllama.so.0
#8 0x0000af028e2c1a28 in common_sampler_accept(common_sampler*, int, bool) ()
#9 0x0000af028e135c90 in server_context_impl::update_slots() ()
#10 0x0000af028e169ab8 in server_queue::start_loop(long) ()
#11 0x0000af028e0978f4 in main ()
[Inferior 1 (process 21468) detached]
terminate called after throwing an instance of 'std::runtime_error'
what(): Unexpected empty grammar stack after accepting piece: =search (96598)
Aborted
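To make the failure mode in the backtrace concrete, here is a minimal sketch of why an exception thrown from a grammar accept step reaches std::terminate when nothing catches it, and how a guard at the call site would turn it into a per-request error. Everything in it is hypothetical: `toy_grammar`, `toy_accept` and `toy_accept_guarded` are illustrative stand-ins, not llama.cpp functions, and the "stack" is simplified to a list of literal pieces. It is not a proposed fix for the underlying grammar/parser mismatch.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model: each "stack" is one candidate sequence of literal
// pieces the grammar still expects. This is NOT how llama.cpp stores grammars.
struct toy_grammar {
    std::vector<std::vector<std::string>> stacks;
};

// Advance every stack whose next expected piece matches; drop the others.
// If no stack survives, throw, mirroring the message seen in the log.
void toy_accept(toy_grammar & g, const std::string & piece) {
    std::vector<std::vector<std::string>> next;
    for (const auto & s : g.stacks) {
        if (!s.empty() && s.front() == piece) {
            next.emplace_back(s.begin() + 1, s.end());
        }
    }
    g.stacks = std::move(next);
    if (g.stacks.empty()) {
        throw std::runtime_error(
            "Unexpected empty grammar stack after accepting piece: " + piece);
    }
}

// Assumed call-site guard: report the failure to the caller instead of
// letting the exception propagate out of the processing loop and abort.
bool toy_accept_guarded(toy_grammar & g, const std::string & piece, std::string & err) {
    try {
        toy_accept(g, piece);
        return true;
    } catch (const std::runtime_error & e) {
        err = e.what();
        return false;
    }
}
```

In this sketch, a caller that checks the return value of `toy_accept_guarded` could fail only the affected request, whereas calling `toy_accept` directly from a loop with no `try`/`catch` ends the whole process, which matches the `std::terminate` frames above.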