Commit 471958b
Add GraniteMoeHybrid support for 4.0 (#37658)
* initial config and MLA layer
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* first pass at decoder
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* completion of layers
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* modeling class
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* adding hybrid class to imports
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* fix imports granitemoehybrid
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* fix granitehybrid imports
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* fix granitehybrid import
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* fix generated modeling file
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* add some comments
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* minor fixes in layers
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* add sharedMLP layer
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* correct layer names
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* fixes in mamba config
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* fix mamba config
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* change name of MLP layer
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* fix seq mixer layers
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* correct mamba config
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* fixes in param names
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* enable hybrid model
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* update config
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* fix config granite hybrid
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* fix attention layer
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* cleanup to re-use mamba code
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* keep layer types
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* attention bias cleanup
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* update mamba layer name
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* first pass at tests
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* first pass at tests
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* use granite attention
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* fix: self attn weights
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* pass at making pos_emb optional
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* initialize self_attn only as needed
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* overwrite forward to create HybridMambaCache (sketched below)
Signed-off-by: Sukriti-Sharma4 <[email protected]>
* Log invalid layer types
* Add attention outputs test
* Only emit attentions/logits if not None
* Fix config test hidden size divisibility
* mark granitemoehybrid as stateful
* Initialize mamba convolutional layers
* Formatting fixes
* config docstring, removed some unused attrs
* Fix missing arg in models test
* Fix create and check decoder model test
* support logits_to_keep in granitemoe (sketched below)
* regen to pass logits_to_keep
* Allow None or rope
* Fix gradient checkpointing
* Add granitemoehybrid as special cache for generate check
* Remove unused MLA refs
* Fix mamba layer mask
* Remove logits to keep from config
* Minor docstring nits
* Update licenses
* Enable cache by default
* map layer types to layer block type
* First pass at granite moe hybrid docs
* Ignore granite moe hybrid in valid checkpoint check
* Align attention interfaces
* regenerate modular granitemoeshared attention interface
* Align granite moe hybrid attn interface
* run formatting
* Handle mamba initialization
* avoid conditional attr defs
* Move hybrid layer validation to config (sketched below)
* Add placeholder integration tests
* Docs nits / Update model names
* Clean up forward conditions
* Use gradient checkpointing layer
* Remove some copied bamba tests + inherit
  align test init
  delete more tests
  Use common layer init with bamba tests
  finish test consolidation
* avoid redundant intermediate std var
* use @can_return_tuple
* Remove unused moe state
* make skipped test names consistent
* Fix docstring order
* Add missing toc
* Always create the shared mlp
* Fix name in docstring
* link preview model in docs (usage sketched below)
---------
Signed-off-by: Sukriti-Sharma4 <[email protected]>
Co-authored-by: Alex-Brooks <[email protected]>

1 parent fe29b8c · commit 471958b
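The "overwrite forward to create HybridMambaCache" and "Add granitemoehybrid as special cache for generate check" bullets concern a cache that holds mamba state for mamba layers and key/value tensors for attention layers. The container below is only a simplified illustration of that split, not the cache class shipped in transformers; all names and shapes are assumptions.

```python
# Simplified illustration of a hybrid cache: mamba layers keep convolution and
# SSM state, attention layers keep growing key/value tensors. This is NOT the
# cache class used by transformers; names and shapes are assumptions.
import torch


class HybridCacheSketch:
    def __init__(self, layer_types, batch_size, conv_dim, conv_kernel, state_dim):
        self.layer_types = layer_types
        self.conv_states = {}   # per mamba layer: (batch, conv_dim, conv_kernel)
        self.ssm_states = {}    # per mamba layer: (batch, conv_dim, state_dim)
        self.key_cache = {}     # per attention layer: (batch, num_heads, seq, head_dim)
        self.value_cache = {}
        for idx, kind in enumerate(layer_types):
            if kind == "mamba":
                self.conv_states[idx] = torch.zeros(batch_size, conv_dim, conv_kernel)
                self.ssm_states[idx] = torch.zeros(batch_size, conv_dim, state_dim)

    def update_attention(self, layer_idx, key_states, value_states):
        # Append the new key/value states along the sequence dimension.
        if layer_idx not in self.key_cache:
            self.key_cache[layer_idx] = key_states
            self.value_cache[layer_idx] = value_states
        else:
            self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
            self.value_cache[layer_idx] = torch.cat([self.value_cache[layer_idx], value_states], dim=-2)
        return self.key_cache[layer_idx], self.value_cache[layer_idx]
```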
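The "support logits_to_keep in granitemoe" / "regen to pass logits_to_keep" bullets refer to restricting the LM head projection to the trailing positions whose logits are actually needed during generation. A minimal sketch of the idea, not the actual modeling code; the helper name and shapes are illustrative.

```python
# Illustrative sketch of the logits_to_keep idea (not the actual modeling code).
# During generation only the last position's logits are usually needed, so the
# LM head projection can be restricted to a trailing slice of the sequence.
import torch


def project_logits(hidden_states: torch.Tensor, lm_head: torch.nn.Module, logits_to_keep: int = 0) -> torch.Tensor:
    # hidden_states: (batch, seq_len, hidden_size); logits_to_keep == 0 keeps every position
    if logits_to_keep > 0:
        hidden_states = hidden_states[:, -logits_to_keep:, :]
    return lm_head(hidden_states)


lm_head = torch.nn.Linear(64, 1000, bias=False)
hidden = torch.randn(2, 10, 64)
logits = project_logits(hidden, lm_head, logits_to_keep=1)  # -> shape (2, 1, 1000)
```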
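The "map layer types to layer block type" and "Move hybrid layer validation to config" bullets are about the config declaring, per decoder layer, whether it is a mamba or an attention block, and rejecting anything else. Below is a hedged sketch under assumed names (`HybridConfigSketch`, `layer_types`); it is not the real GraniteMoeHybridConfig.

```python
# Hedged sketch of per-layer block-type validation in a hybrid config.
# The class and attribute names are illustrative, not the real
# GraniteMoeHybridConfig implementation.
ALLOWED_LAYER_TYPES = ("mamba", "attention")


class HybridConfigSketch:
    def __init__(self, num_hidden_layers=4, layer_types=None):
        self.num_hidden_layers = num_hidden_layers
        # Default to an all-mamba stack if no per-layer plan is given.
        self.layer_types = layer_types or ["mamba"] * num_hidden_layers
        self._validate_layer_types()

    def _validate_layer_types(self):
        if len(self.layer_types) != self.num_hidden_layers:
            raise ValueError("layer_types must list one block type per hidden layer")
        invalid = [t for t in self.layer_types if t not in ALLOWED_LAYER_TYPES]
        if invalid:
            # Surface the offending entries instead of silently accepting them.
            raise ValueError(f"Invalid layer types {invalid}; allowed: {ALLOWED_LAYER_TYPES}")

    @property
    def layers_block_type(self):
        # Expose the same plan under the attribute name that mamba/bamba-style
        # modeling code reads, i.e. "map layer types to layer block type".
        return self.layer_types
```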
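For the "link preview model in docs" bullet, here is a minimal generation example with the new model, loaded through the Auto classes this commit registers. The checkpoint id is an assumption; the commit message only says that a preview model is linked in the docs.

```python
# Minimal usage sketch for the new GraniteMoeHybrid model class.
# NOTE: the checkpoint id below is an assumption, not taken from this commit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-tiny-preview"  # assumed preview checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since "Enable cache by default" is part of this commit, generate() should build the hybrid mamba/attention cache internally without use_cache being passed explicitly.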
File tree
21 files changed: +3150, -544 lines

- docs/source/en
  - model_doc
- src/transformers/models
  - auto
  - bamba
  - granitemoehybrid
  - granitemoe
- tests
  - generation
  - models
    - bamba
    - granitemoehybrid
  - utils
0 commit comments