Mamba2 SSD #16982
Draft: gabe-l-hart wants to merge 69 commits into ggml-org:master from gabe-l-hart:Mamba2SSD
+2,194 −79
Commits (69)
245f391  ggerganov    graph : reuse hybrid graphs
638e2c2  ggerganov    graph : reuse recurrent graphs
0b9c1ae  ggerganov    metal : fix mul-mm condition + fix mul-mv permuted kernels
1f02d93  ggerganov    graph : fix reuse check for recurrent inputs
00f115f  ggerganov    memory : move the recurrent state into the memory context
2744d61  ggerganov    Revert "memory : move the recurrent state into the memory context"
ab3f3fe  gabe-l-hart  Merge branch 'gg/metal-mul-mat-fixes' into gg/graph-mamba-reuse
8c23c43  gabe-l-hart  Added: tri, cumsum. Still a mess.
2a2e79c  gabe-l-hart  feat(tests): Add --verbose | -v flag to test-backend-ops to print ten…
092f740  gabe-l-hart  test: Add cumsum tests to test-backend-ops
6949ce7  gabe-l-hart  feat(ggml-cpu): Add cumsum support for f16 and bf16
f8fba60  gabe-l-hart  feat(ggml-cpu): Add F16 and BF16 support for tri
058160a  gabe-l-hart  test: Add test cases for tri
86ce3da  gabe-l-hart  chore: TODOs to loosen assertions in tri for ggml_is_contiguous
3a8958f  gabe-l-hart  feat(ggml-metal): Initial (slow) implementation of cumsum for metal
cbaed86  gabe-l-hart  feat(ggml-metal): Add stubs for metal tri
e596469  gabe-l-hart  test: Use looser nmse for lower-precision types for cumsum
3011a6e  gabe-l-hart  Merge remote-tracking branch 'origin/master' into Mamba2SSD
112d339  gabe-l-hart  test: Allow multiple verbose flags to fully print tensors
78e137f  gabe-l-hart  feat(llama-gguf): Print out the tensor type in llama-gguf r
e5587cb  gabe-l-hart  feat(ggml-metal): Efficient implementation of cumsum for metal
0468b99  gabe-l-hart  test: More verbose printing and better cumsum tests
c71e35e  gabe-l-hart  fix(ggml-metal): better granularity for support bool for CUMSUM and TRI
5f0d2a1  gabe-l-hart  feat(ggml-metal): Metal impl of tri
426580d  gabe-l-hart  Merge remote-tracking branch 'origin/master' into Mamba2SSD
ba3b8db  gabe-l-hart  fix(ggml-cpu): Fix warnings from build with gcc
dfae909  gabe-l-hart  feat(ggml-cuda): common implementation of prefix sum
d1f8658  gabe-l-hart  feat(ggml-cuda): CUDA implementation of CUMSUM
5071fbd  gabe-l-hart  feat(ggml-cuda): CUDA implementation of TRI
be23a29  gabe-l-hart  test: Add test-backend-ops perf tests for ssm conv and scan
71e2289  gabe-l-hart  feat(ggml-cpu): Rename ggml_softplus to ggml_op_softplus to make room…
f6d60e3  gabe-l-hart  feat(ggml-cpu): Add ggml_softplus tensor op for CPU
778e835  gabe-l-hart  test: Better verbosity output for inputs in test-backend-ops
4228002  gabe-l-hart  feat(ggml-metal): Add ggml_softplus support for metal
97bd17d  gabe-l-hart  feat(ggml-cuda): Add support for ggml_softplus
ffd88ff  gabe-l-hart  style: comments on ggml tri types
7409d9e  gabe-l-hart  WIP(llama-model): Partial work on graph-based SSD implementation
ba74006  gabe-l-hart  TEMP: Increase the max graph nodes to handle all the nodes for SSD
29b30c6  gabe-l-hart  WIP: Shape-correct impl of SSD w/out multi-chunk support
fb68967  gabe-l-hart  fix: Add names to tensors for better debugging and fix several wiring…
cd73f4d  gabe-l-hart  fix(wip): Fix matmul order for CB and y
52be1ab  gabe-l-hart  fix: Working output!!
f57dafe  gabe-l-hart  feat(eval-callback): Use -vb to set tensor print width and number of …
8a87063  gabe-l-hart  feat(ggml-cpu): Add ggml_tri_dims to support non-standard dims (with …
79bce3e  gabe-l-hart  feat(ggml-metal): Extend metal tri imple for arbitrary dims and non-c…
1ceb15e  gabe-l-hart  feat(ggml-cuda): Extend CUDA impl of tri to support arbitrary dims an…
ef12069  gabe-l-hart  fix: Fix INT_MAX to use numeric_limits for better compiler compat
3da5c97  gabe-l-hart  fix(temp): Fix CBdecay to make decay contiguous for metal
3336f3c  gabe-l-hart  fix: Use ggml_tri_dims to avoid perm/cont for initial decay step
d1e15c0  gabe-l-hart  feat(ggml-cpu): Add dim arg to ggml_cumsum
ee13af1  gabe-l-hart  feat(ggml-metal): Support arbitrary dim and non-cont in cumsum
3b4055e  gabe-l-hart  feat(ggml-cuda): Support arbitrary dims and non-cont in cumsum
3963a72  gabe-l-hart  feat(wip): Partially working implementation with update from previous…
188ae84  gabe-l-hart  refact: Avoid permute and cont for first cumsum
0441ccb  gabe-l-hart  fix: Subset input states to match ids
aba30d6  gabe-l-hart  fix: Fix the chunk size computation
62ac897  gabe-l-hart  fix: Fix handling of batch size > 1 in chunk updates
36244fe  gabe-l-hart  fix: Fix permutation for nemotron-h shape
5ff37fa  gabe-l-hart  Merge remote-tracking branch 'origin/master' into Mamba2SSD
8b6f38a  gabe-l-hart  feat(off-topic): print the number of elements in tensors with llama-gguf
82bba1d  gabe-l-hart  feat(ggml-cpu): Add f16 and bf16 support for ssm_conv
7ad0f37  gabe-l-hart  feat(llama-quant): Allow F16 and BF16 quants of ssm_conv1d.weight
6256f9a  gabe-l-hart  feat(ggml-cpu): Add partial implementation of scale for f16
204cd80  gabe-l-hart  feat(wip): Use type_k/type_v for hybrid cache types
86788a2  gabe-l-hart  temp: Cast ssm to F32
de43d0b  gabe-l-hart  feat(ggml-metal): Add support for F16 and BF16 ssm_conv weights
426a97c  gabe-l-hart  feat: Keep ssm in f16 until output on SSD code path
6733bda  gabe-l-hart  feat: Remove sub-ubatch batching
4435600  gabe-l-hart  Merge remote-tracking branch 'origin/master' into Mamba2SSD
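
Several of the commits above introduce new ggml operators (CUMSUM, TRI, and a ggml_softplus tensor op) across the CPU, Metal, and CUDA backends. As a reading aid only, here is a minimal scalar sketch of the semantics those names suggest; it is not code from this PR, and the actual kernels additionally handle F16/BF16, an arbitrary dimension argument, and non-contiguous tensors, as the commit messages note.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>

// Cumulative sum along a contiguous row of length n (the row-wise semantics
// suggested by the CUMSUM op): out[i] = x[0] + x[1] + ... + x[i].
static void cumsum_row(const float * x, float * out, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += x[i];
        out[i] = acc;
    }
}

// Keep the lower triangle of an n x n row-major matrix and overwrite entries
// above the main diagonal with `fill` (one plausible "tri" variant).
static void tri_lower(float * a, size_t n, float fill) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            a[i*n + j] = fill;
        }
    }
}

// Elementwise softplus: log(1 + exp(x)), guarded against overflow for large x.
static float softplus(float x) {
    return x > 20.0f ? x : std::log1p(std::exp(x));
}

int main() {
    float x[4]   = {1.0f, 2.0f, 3.0f, 4.0f};
    float acc[4] = {0};
    cumsum_row(x, acc, 4);
    std::printf("cumsum: %.1f %.1f %.1f %.1f\n", acc[0], acc[1], acc[2], acc[3]); // 1 3 6 10

    float m[9] = {1, 1, 1, 1, 1, 1, 1, 1, 1};
    tri_lower(m, 3, 0.0f);
    std::printf("tri row 0: %.0f %.0f %.0f\n", m[0], m[1], m[2]); // 1 0 0

    std::printf("softplus(0) = %.4f\n", softplus(0.0f)); // ~0.6931
    return 0;
}
```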
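For context on why CUMSUM and TRI show up together on the SSD code path: in the published Mamba2 SSD formulation, the per-chunk decay matrix is built from a cumulative sum of log-decay values followed by a lower-triangular mask. The sketch below is a hypothetical scalar illustration of that relationship only; the names and structure are assumptions based on the Mamba2 paper, not the batched, chunked graph construction referenced by commits such as "fix(temp): Fix CBdecay to make decay contiguous for metal" and "fix: Fix the chunk size computation".

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical illustration of the per-chunk decay matrix from the Mamba2 SSD
// formulation: for log-decays a[0..n-1], L[i][j] = exp(a[j+1] + ... + a[i]) when
// j <= i and 0 otherwise. With c = cumsum(a) the exponent is c[i] - c[j], so the
// matrix is a cumulative sum followed by a lower-triangular mask, i.e. the pair
// of ops this PR adds. This is not the PR's actual graph code.
static std::vector<float> ssd_decay_matrix(const std::vector<float> & a) {
    const size_t n = a.size();

    std::vector<float> c(n);  // c[i] = a[0] + ... + a[i]
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += a[i];
        c[i] = acc;
    }

    std::vector<float> L(n * n, 0.0f);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j <= i; ++j) {
            L[i*n + j] = std::exp(c[i] - c[j]);  // exp of the segment sum a[j+1..i]
        }
    }
    return L;
}

int main() {
    const std::vector<float> a = {-0.1f, -0.2f, -0.3f};
    const std::vector<float> L = ssd_decay_matrix(a);
    for (size_t i = 0; i < 3; ++i) {
        std::printf("%.4f %.4f %.4f\n", L[i*3 + 0], L[i*3 + 1], L[i*3 + 2]);
    }
    return 0;
}
```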