[WIP] Qwen 3 Next experiment #1251
Closed
Commits
52 commits
a7df116
qwen3next: add architecture support and recurrent-state fixes
9fbb504
qwen3next: optimize broadcast sub and single-seq ssm conv
89e9ecf
cuda: build MoE row mapping on device in mul_mat_id
236633a
cuda: add guarded multi-seq fast path for ssm_conv
c767cfa
docs: update qwen3next perf report for cuda MoE/SSM tuning
e64b433
cuda: reduce qwen3next moe/ssm sync overhead and refresh eval
6db8dc8
qwen3next: split cpu/cuda eval builds and tune PP scheduling
fffd27e
qwen3next: harden seq-state flow and support optional dense FFN layers
a1163d0
qwen3next: trim delta-net graph overhead in chunking path
0e3891b
qwen3next: remove redundant v_conv cont in delta path
43edfa2
qwen3next: avoid extra cont on linear attention output
de5bf44
qwen3next: drop redundant cont before recurrent state flatten
5a6c4e8
qwen3next: keep recurrent state in 4d layout through delta path
6dd990d
qwen3next: add fused delta-net op and wire model path
ed0565f
tests: add backend-op coverage for ggml_delta_net
b33cef6
qwen3next: add runtime switch for fused delta-net path
81e788e
docs: refresh qwen3next perf review and benchmark matrix
9930f4d
qwen3next: default fused delta-net off and document quality checks
143e88a
qwen3next: add decode-only fused delta mode
64099e7
qwen3next: make fused delta safe by default and fix fused tensor layout
343e335
qwen3next: warn when forcing fused decode mode
44db394
qwen3next: add fused-delta regression runner script
55270b0
qwen3next: integrate fused regression into eval harness
670434e
qwen3next: clean up chunked delta-net shape handling
691df60
qwen3next: add absolute sanity guards to fused regression
a822db6
qwen3next: add unified regression runner script
627d469
qwen3next: disable flash-attn for cpu-only contexts
bd0dd78
docs: reconcile qwen3next status and remaining upstream gaps
b5c9554
common: add qwen3next fused-delta runtime flag
eef360a
cuda: add qwen3next delta-net kernel dispatch override
69529d3
docs: update qwen3next quality and serving baseline findings
48e0e35
qwen3next: keep fused delta on safe path and remove PR artifacts
9241164
qwen3next: align autoregressive delta-net decode layout
6009557
Revert "qwen3next: align autoregressive delta-net decode layout"
113ad6c
cuda: port solve-tri fast-paths for qwen3next delta-net
6f21f24
qwen3next: add fused-delta runtime flag and drop env toggle
f1f6da7
qwen3next: make fused delta single-flag and default on
4ab02c9
Account for GPU arch differences
117ff5d
Revert "cuda: build MoE row mapping on device in mul_mat_id"
6d8fb70
qwen3next: drop non-essential MoE scheduling and split heuristics
ed10c94
qwen3next: avoid generic ggml_sub broadcast changes
4e55ac7
llama: restore only_active_experts log message
71035bf
Merge branch 'ikawrakow:main' into main
YurkoHoshko 012377b
Remove unnecessary hacks, disable fusion for now.
b7781f2
qwen3next: port hybrid recurrent state memory semantics
d7b6358
qwen3next: clean up recurrent state slot plumbing
aaa1b12
qwen3next: fix hybrid V-cache layout plumbing
cac3c5f
qwen3next: guard recurrent state slots against kv capacity
c771416
qwen3next: persist recurrent state in session data
dd690cb
qwen3next: drop unused fused-delta builder path
3470e8a
qwen3next: remove unused fused-delta CLI/context plumbing
cb99ab7
ggml: remove unused DELTA_NET operator stack
This doc was used solely by AI during development, and I included it just to keep track of what was going on. Please ignore.