Commit fe2baf5
Squashed commit of the following:
commit 912ed2cd9339d1b2875d98744ca5b51fa62e581e
Author: samuel <[email protected]>
Date: Sun Dec 7 23:00:29 2025 -0300
speculative (feat): implement recursive MTP drafting for GLM-4.5
commit bdf72d9552e3da64ffc85f175664713388752914
Author: samuel <[email protected]>
Date: Sat Dec 6 16:10:16 2025 -0300
sampling (feat): optimize speculative drafting with fast-path selection
commit a91980a8f3475a6bbac0a64d8be06dd4b613020e
Author: samuel <[email protected]>
Date: Sat Dec 6 15:18:19 2025 -0300
mtp (chore): clean old code
commit 6de0ecf55db8567db4faa99b0152b72c9e854548
Author: samuel <[email protected]>
Date: Sat Dec 6 14:40:13 2025 -0300
mtp (feat): add mtp arg
commit ea77394183b8e6c368af969b8274039a54b11486
Author: samuel <[email protected]>
Date: Sat Dec 6 13:47:54 2025 -0300
mtp-graph (fix): move llama_get_logits_ith outside the loop
commit 15dff208958fb66802f20ec53ce5fcaff133edb7
Merge: 171346c74 cae85fe53
Author: samuel <[email protected]>
Date: Thu Oct 16 13:44:41 2025 -0300
Merge branch 'glm4-mtp-batch' of https://github.com/SamuelOliveirads/llama.cpp into glm4-mtp-graph-cache
commit cae85fe531876762ee02524fc4c3f6c5e7824c63
Author: samuel <[email protected]>
Date: Thu Oct 16 13:42:31 2025 -0300
mtp-batch(fix): avoid logits for mtp kv cache operations
commit 171346c742c310bbcfbd786b61250638ccf8b44d
Author: samuel <[email protected]>
Date: Sun Oct 12 16:33:01 2025 -0300
mtp-graph(feat): Reactivate graph reuse only for main model path
commit 0127c6beeb384ec3abbc18b22dbe830f22fcf4b4
Author: samuel <[email protected]>
Date: Sat Oct 11 22:20:54 2025 -0300
mtp-batch(chore): Remove final MTP debug logs and dead code
commit 4bcc9e261ef57ee4cfaa65d06bcd0fcdeacf7797
Author: samuel <[email protected]>
Date: Sat Oct 11 18:51:22 2025 -0300
mtp-batch(fix): Correctly advance cache head and add MTP documentation
commit b4cbe030ac25056717763b812d1dd89681c08522
Author: samuel <[email protected]>
Date: Sat Oct 11 18:37:40 2025 -0300
mtp-batch(chore): Fix logit flags for speculative sampling and remove debug logs
commit a99709d0c1401d0b447dce1bd0101fb56390f50e
Author: samuel <[email protected]>
Date: Fri Oct 10 17:24:34 2025 -0300
mtp-batch(refactor): Extract decode context and MTP input logic into helper methods
commit 913af8f48d2dab1d9e907cf6c48c921a229a295c
Author: samuel <[email protected]>
Date: Fri Oct 10 16:44:28 2025 -0300
mtp-batch(refactor): Replace MTP boolean flags with an explicit operation enum
commit 6f74ba38070d62d37bc0fb71ce9871e1a4ffabcc
Author: samuel <[email protected]>
Date: Thu Oct 9 22:27:18 2025 -0300
mtp-batch (fix): prevent mtp draft from polluting the cache
commit 5e1d719beffccf8c22784c24b52ff6f5ab56b9ff
Author: samuel <[email protected]>
Date: Thu Oct 9 15:21:23 2025 -0300
mtp-batch (feat): Create and manage sinfo for MTP
commit febd8235d27fe9174ee4b54ea7a10e630939fee0
Author: samuel <[email protected]>
Date: Sun Oct 5 14:43:40 2025 -0300
mtp-batch (wip): fix how to warmup kv cache for MTP
commit 67c6c069e0a5496adfd7d8aa6ca7514db5a6f437
Author: samuel <[email protected]>
Date: Sat Sep 27 19:42:32 2025 -0300
mtp-batch (wip): Isolate MTP graph to prevent host embedding buffer corruption
commit 75dc25e6fe781c1b65038d69390fb778d760e3a1
Author: samuel <[email protected]>
Date: Sat Sep 27 17:17:00 2025 -0300
mtp-batch (wip): organize batch for mtp cache
commit 3da7e7f3309dbb576538850c92c1cbf8fdc6d6ee
Author: samuel <[email protected]>
Date: Tue Sep 23 22:45:11 2025 -0300
mtp-batch (fix): warm mtp cache for small batch size
commit df64508b937784112168aa099644b60fef015f05
Author: samuel <[email protected]>
Date: Sun Sep 21 21:55:41 2025 -0300
mtp-batch (wip): merge glm graphs
commit 042eb8a829876ed175320df9c8133bcea0c40460
Author: samuel <[email protected]>
Date: Sun Sep 21 21:29:00 2025 -0300
mtp-batch (wip): merge mtp and model graph
commit 1318b2de82716710b9853e07bd640443a5a025bb
Author: samuel <[email protected]>
Date: Sun Sep 14 10:22:59 2025 -0300
mtp-batch (wip): move mtp execution to batch format
commit c6237c71ffd4485df1c35829c380b63e472fc5dd
Merge: 9fab53e43 8742ce0e3
Author: Aaron Lee <[email protected]>
Date: Sat Sep 13 02:57:01 2025 -0400
Merge pull request #1 from SamuelOliveirads/glm4-moe-mtp
feat: implemented sampling for MTP
commit 8742ce0e39823eeb101bb5b6099ff4ca7be10c6e
Author: samuel <[email protected]>
Date: Sat Sep 6 00:21:18 2025 -0300
feat: apply logits + greedy sampler
commit 5a5bce85777041d841393b4396e28f8e3065bb10
Author: samuel <[email protected]>
Date: Wed Sep 3 17:56:14 2025 -0300
fix: add sample acceptance
commit 07670a22c63b1fa335d6ec1c4a1e4255a920848c
Author: samuel <[email protected]>
Date: Wed Sep 3 13:25:21 2025 -0300
feat: implemented sampling for MTP
commit 9fab53e4388c20aef497efd82e86dcb99ca58064
Author: Aaron Lee <[email protected]>
Date: Tue Sep 2 17:14:09 2025 -0400
fixed mtp kv cache update step in cases where prompt size > n_batch and n_ubatch
commit 98bc0c6bf223f425f4ecea14f13fc46101f1b44a
Author: Aaron Lee <[email protected]>
Date: Tue Aug 26 01:26:51 2025 -0400
replace standard sampler with greedy sampler for mtp draft
commit 471e026327cca9f6f58aeefe32129a6cb9390f4f
Author: Aaron Lee <[email protected]>
Date: Tue Aug 19 23:10:56 2025 -0400
fixed vram leak
commit d72f9d5691054958cd1b139f228e5e588d3974cf
Author: Aaron Lee <[email protected]>
Date: Tue Aug 19 01:50:34 2025 -0400
kludge-y kv cache management of mtp layer
commit 382135aa3619294ab8bf87b0de4b1255ab7942f0
Author: Aaron Lee <[email protected]>
Date: Sun Aug 17 21:54:45 2025 -0400
fixed mtp kv cache update sequencing after prompt processing
commit 6870f9790c1bb1d0254241267b1a6c8a7fc82830
Author: Aaron Lee <[email protected]>
Date: Sun Aug 17 04:59:36 2025 -0400
added proper KV cache management for MTP layers and slightly refactored
commit 6e9bafc7a738b4c99f9440c0ec461e08cf6ce702
Author: Aaron Lee <[email protected]>
Date: Fri Aug 15 23:13:56 2025 -0400
failed attempt to implement MTP; outputs tokens but KV cache management is unreasonable
commit cf0f7c0448c2c1736588673114558e5829db7879
Author: Aaron Lee <[email protected]>
Date: Wed Aug 13 02:21:17 2025 -0400
broad thrust of the mtp implementation
commit 03231da69eec20677e25e2307d4fe31ac2ede034
Author: Aaron Lee <[email protected]>
Date: Tue Aug 12 01:03:59 2025 -0400
add model member function to build mtp graph, to be called from speculative.cpp
commit 1f477b375504aa557ed21066aa6783b11781a179
Author: Aaron Lee <[email protected]>
Date: Mon Aug 11 20:54:45 2025 -0400
make nextn weights loadable without a crash
commit e434f87cc739a1901931d88e33f777170a4e18e7
Author: Aaron Lee <[email protected]>
Date: Mon Aug 11 01:21:47 2025 -0400
some work towards building mtp layer graph
commit db60623e7926fb151b3cc63f029929122cac342a
Author: Aaron Lee <[email protected]>
Date: Sun Aug 10 23:52:54 2025 -0400
added getter for nextn layer count and server slot has_mtp property
1 parent e1f15b4 commit fe2baf5
File tree
18 files changed
+1037
-280
lines changed
- common
- include
- src
- models
- tools/server