Commit 2decab1
Fix kernel_1 backward on the CPU interpreter path (#499)
The single-pass launch queried the CUDA SM count unconditionally to bound the
program count, which broke `triton_normalization_backward` under
`TRITON_INTERPRET=1` on CPU, where no CUDA device exists. The bound is a
GPU-occupancy heuristic, so skip it off-GPU and launch one program per tile.
Also cover the two-pass path in the kernel test (it previously exercised only
the single-pass path) and drop the now-constant `num_stages` from the launch
config.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 parent 0c5d45c
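A minimal sketch of the guard the message describes. It is written in plain Python rather than Triton so it runs without a GPU. `backward_program_count`, `tiles_for_program`, and the `device_type` / `query_sm_count` parameters are hypothetical names, not this repository's code. On a CUDA device the callable could wrap `torch.cuda.get_device_properties(device).multi_processor_count`. The sketch also assumes each program walks its tiles in a grid-stride loop. Under that assumption the SM bound is only a heuristic: any program count of at least one still visits every tile.

```python
from typing import Callable, List, Optional


def backward_program_count(
    num_tiles: int,
    device_type: str,
    query_sm_count: Optional[Callable[[], int]] = None,
) -> int:
    """Hypothetical helper: number of programs for the single-pass launch.

    On CUDA, cap the count at the SM count (a GPU-occupancy heuristic).
    Off-GPU (e.g. CPU tensors under TRITON_INTERPRET=1) there is no SM
    count, so skip the query entirely and give each tile its own program.
    """
    if num_tiles < 1:
        raise ValueError("num_tiles must be at least 1")
    if device_type != "cuda" or query_sm_count is None:
        # The device query is never reached off-GPU.
        return num_tiles
    return max(1, min(num_tiles, query_sm_count()))


def tiles_for_program(pid: int, num_programs: int, num_tiles: int) -> List[int]:
    """Assumed grid-stride assignment: program pid handles pid, pid + P, ..."""
    return list(range(pid, num_tiles, num_programs))


if __name__ == "__main__":
    def cpu_sm_count() -> int:
        # Stand-in for an unconditional CUDA query on a CPU-only machine.
        raise RuntimeError("CUDA is not available")

    print(backward_program_count(10, "cpu", cpu_sm_count))  # 10: query skipped
    print(backward_program_count(10, "cuda", lambda: 4))  # 4: capped by SM count
```

With 4 programs and 10 tiles, program 0 handles tiles 0, 4 and 8. Programs 1 to 3 pick up the rest, so lowering the program count changes only scheduling, not which tiles are processed.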
2 files changed
Lines changed: 16 additions & 9 deletions
First changed file (11 additions, 6 deletions):

| Original line(s) | New line(s) | Change |
|---|---|---|
| 379 | 379 | 1 line replaced |
| 401 | 401 | 1 line replaced |
| 427 | 427 | 1 line replaced |
| 433-434 | 433-439 | 2 lines replaced by 7 |
| 475 | 480 | 1 line replaced |
Second changed file (5 additions, 3 deletions):

| Original line(s) | New line(s) | Change |
|---|---|---|
| (after 108) | 109-110 | 2 lines added |
| 111-112 | 113-114 | 2 lines replaced |
| 115 | 117 | 1 line replaced |