Support Gumbel-Max Sampling Kernel #574
Conversation
Please consider the latest version flashinfer-ai/flashinfer#2119, which significantly improves performance.
```cpp
template <uint32_t BLOCK_THREADS,
          uint32_t VEC_SIZE,
          int BATCH_SIZE,
```
Do we still need this template arg if we are passing in batch size as a function arg?
Removed
```cpp
constexpr int CONSUMER_NUM_THREADS = 128;  // Grace Hopper setting
#else
// Default settings for other architectures
constexpr int WORKER_NUM_THREADS = 128;
```
We also have `WORKER_NUM_THREADS` defined in `persistent_kernel.cuh`; not sure if it's a duplicated definition here.
Removed
JackFram left a comment:
The sampling task implementation in this PR is already in pretty good shape. I left some comments; feel free to address them. Waiting for other reviewers' input.
```diff
@@ -0,0 +1,228 @@
+/* Copyright 2023-2024 CMU
+ *
```
Per Apache 2.0, you have to keep the original copyright as well, e.g. `Copyright 2023-2025 FlashInfer contributors`, in addition to your own copyright.
Sorry, I forgot to mention this. @STWMichae, please make sure to add the copyright before merging as well.
Done
Description of changes:
This PR adds support for the Gumbel-Max sampling kernel on all architectures. The original kernel code is from flashinfer.
In order to support the sampling in MPK, we:
- Reduced `BLOCK_THREADS` from 1024 to 128/256.
- Made the kernel take `batch_size` as a function argument and iterate through all batches sequentially.

This PR also adds unit tests to verify the correctness of the kernel. It also provides a `demo_sampling.py` that shows the sampling effect during inference.
Related Issues:
#519