
[WIP] 128k8k config update #156

Open
YAMY1234 wants to merge 3 commits into ishandhanani:main from YAMY1234:update_gb300_128k8k_config

Conversation


@YAMY1234 YAMY1234 commented Feb 6, 2026

Summary by CodeRabbit

  • Chores
    • Removed multiple outdated GB300‑FP4 128k/8k experimental deployment configurations (top‑of‑curve, mid‑curve variants, low‑latency).
  • New Features
    • Added several new optimized deployment profiles (high‑throughput, max‑throughput, middle‑curve, MTP variants, chunked prefill).
    • Added a setup script to automate environment preparation and FlashInfer/sgl‑kernel installation.


coderabbitai bot commented Feb 6, 2026

📝 Walkthrough

Removed five legacy gb300-fp4 128k8k YAML recipes and added multiple new 128k8k and 128k8k_mtp configuration YAMLs, a chunked-prefill recipe, and a Bash setup script; changes are configuration-only (no code/API changes).

Changes

  • Deleted Legacy Recipes
    Files: recipes/gb300-fp4/128k8k/1-top-of-curve.yaml, recipes/gb300-fp4/128k8k/2-mid-curve-pt1.yaml, recipes/gb300-fp4/128k8k/3-mid-curve-pt2.yaml, recipes/gb300-fp4/128k8k/4-mid-curve-pt3.yaml, recipes/gb300-fp4/128k8k/5-low-latency.yaml
    Summary: Removed five numbered legacy benchmark recipes, including model metadata, resource allocations, backend env blocks, sglang_config entries, and benchmark settings.

  • New 128k8k Recipes
    Files: recipes/gb300-fp4/128k8k/highthroughput-ctx5_ctx_pp4_gen1_dep16_batch16_eplb0_mtp0.yaml, recipes/gb300-fp4/128k8k/maxthroughput-ctx3_ctx_pp4_gen1_dep8_batch32_eplb0_mtp0.yaml, recipes/gb300-fp4/128k8k/middlecurve-ctx8_ctx_pp4_gen1_dep32_batch8_eplb0_mtp0.yaml
    Summary: Added three new 128k8k configs with model, resource, decode/prefill envs, a comprehensive sglang_config, and benchmark sections.

  • New 128k8k_mtp Recipes
    Files: recipes/gb300-fp4/128k8k_mtp/highthroughput-ctx3_ctx_pp4_gen1_dep8_batch16_eplb0_mtp1.yaml, recipes/gb300-fp4/128k8k_mtp/midcurve-ctx5_ctx_pp4_gen1_dep16_batch8_eplb0_mtp2.yaml
    Summary: Added two MTP-specific configs introducing speculative/MTP parameters, MoE backends, and per-stage tuning.

  • New Chunked-Prefill Recipe
    Files: recipes/gb300-fp4/128k8k/minttft-ctx1_ctx_pp4_gen1_tp4_eplb0_mtp0-chunked-32k-dynchunk.yaml
    Summary: Added a chunked-prefill configuration (dynamic chunking) with decode/prefill envs and sglang prefill/decode settings.

  • Setup Script
    Files: configs/gb300-fp4-128k8k-setup.sh
    Summary: Added a Bash setup script that prepares the branch and pip cache, uninstalls and reinstalls FlashInfer packages, and installs the sgl-kernel wheel.

Net effect: removed five legacy recipe files and added multiple new YAML configs plus one setup script; all edits are configuration files or scripts.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • ishandhanani
  • kyleliang-nv
  • gracehonv

Poem

🐰 I hopped through YAML fields, pruning old leaves away,
Sprinkled new configs and a script for a brighter day.
FlashInfer bells chimed, pipelines set to run,
Benchmarks await — carrots earned, work well done! 🥕

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 inconclusive)
  • Title check: ❓ Inconclusive. The title '[WIP] 128k8k config update' is vague and generic, using non-descriptive terms that don't clearly convey the specific changes being made. Resolution: replace it with a more specific title describing the main change, such as 'Replace legacy 128k8k recipes with performance-tuned configurations' or 'Remove experimental configs and add optimized 128k8k recipes'.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the check.



No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
recipes/gb300-fp4/128k8k_mtp/highthroughput-ctx3_ctx_pp4_gen1_dep8_batch16_eplb0_mtp1.yaml (2)

21-70: Consider using YAML anchors to reduce decode/prefill environment duplication.

The decode and prefill environment blocks share ~15 identical key-value pairs. Anchoring the shared subset (&common_env) and merging it in with <<: *common_env would reduce the risk of drift when a shared variable needs updating.
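
A minimal sketch of the pattern, assuming the recipe loader supports YAML 1.1 merge keys (the prefill block is moved ahead of decode here, since an anchor must be defined before its alias; the decode-only extras shown are illustrative):

  prefill_environment: &common_env
    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "7200"
    PYTHONUNBUFFERED: "1"
    # ...remaining shared variables...

  decode_environment:
    <<: *common_env
    # decode-only additions, e.g.:
    SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "1024"
    SGLANG_MOE_NVFP4_DISPATCH: "1"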


47-48: Unquoted path values are inconsistent with the rest of the file.

Most environment variable values are quoted strings, but FLASHINFER_WORKSPACE_BASE and SGLANG_DG_CACHE_DIR are unquoted on Lines 47-48 and 69-70. While YAML handles this correctly, quoting them would be consistent with the surrounding style.

Also applies to: 69-70
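
A suggested fix in the same style as the sibling files (the same two keys appear in both environment blocks):

-    FLASHINFER_WORKSPACE_BASE: /tmp/flashinfer-cache
-    SGLANG_DG_CACHE_DIR: /configs/deepgemm-cache
+    FLASHINFER_WORKSPACE_BASE: "/tmp/flashinfer-cache"
+    SGLANG_DG_CACHE_DIR: "/configs/deepgemm-cache"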

recipes/gb300-fp4/128k8k/minttft-ctx1_ctx_pp4_gen1_tp4_eplb0_mtp0-chunked-32k-dynchunk.yaml (2)

40-41: Unquoted path values — inconsistent with the rest of the environment block.

All other environment variable values in this file are quoted strings, but FLASHINFER_WORKSPACE_BASE and SGLANG_DG_CACHE_DIR are bare. While YAML will still parse them as strings, quoting them keeps the file consistent and avoids surprises with paths that may later contain special characters.

Same issue on lines 62–63 in prefill_environment.

Suggested fix
-    FLASHINFER_WORKSPACE_BASE: /tmp/flashinfer-cache
-    SGLANG_DG_CACHE_DIR: /configs/deepgemm-cache
+    FLASHINFER_WORKSPACE_BASE: "/tmp/flashinfer-cache"
+    SGLANG_DG_CACHE_DIR: "/configs/deepgemm-cache"

21-63: decode_environment and prefill_environment are identical — consider using a YAML anchor.

The two environment blocks are copy-pasted verbatim. If they intentionally diverge later, this is fine. Otherwise, a YAML anchor (& / *) would eliminate the duplication and make future edits less error-prone:

Example using YAML anchors
-  decode_environment:
+  decode_environment: &common_env
     TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "7200"
     ...
 
-  prefill_environment:
-    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "7200"
-    ...
+  prefill_environment:
+    <<: *common_env
recipes/gb300-fp4/128k8k/middlecurve-ctx8_ctx_pp4_gen1_dep32_batch8_eplb0_mtp0.yaml (1)

43-44: Inconsistent quoting of environment variable values.

Most env var values are quoted strings (e.g., "7200", "1"), but FLASHINFER_WORKSPACE_BASE and SGLANG_DG_CACHE_DIR are unquoted paths. While YAML will treat them as strings either way, the inconsistency can cause confusion or subtle issues if a value ever contains special YAML characters.

Suggested fix
-    FLASHINFER_WORKSPACE_BASE: /tmp/flashinfer-cache
-    SGLANG_DG_CACHE_DIR: /configs/deepgemm-cache
+    FLASHINFER_WORKSPACE_BASE: "/tmp/flashinfer-cache"
+    SGLANG_DG_CACHE_DIR: "/configs/deepgemm-cache"

Apply the same in the prefill_environment block.

Also applies to: 65-66

recipes/gb300-fp4/128k8k/maxthroughput-ctx3_ctx_pp4_gen1_dep8_batch32_eplb0_mtp0.yaml (1)

43-44: Same inconsistent quoting of path-valued env vars as the other config file.

FLASHINFER_WORKSPACE_BASE and SGLANG_DG_CACHE_DIR are unquoted while all other values are quoted strings. Consider quoting for consistency.

Also applies to: 65-66


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

🤖 Fix all issues with AI agents
In `@recipes/gb300-fp4/128k8k_mtp/highthroughput-ctx3_ctx_pp4_gen1_dep8_batch16_eplb0_mtp1.yaml`:
- Around line 1-4: The recipe filename indicates EPLB is off (eplb0) but the
sglang_config enables EPLB via the keys eplb-algorithm and
ep-num-redundant-experts; either rename the recipe to reflect EPLB on (update
name to include eplb1) or disable EPLB by removing or commenting out the
sglang_config keys eplb-algorithm and ep-num-redundant-experts (and any other
EPLB-related params) so the file name and the actual config are consistent;
locate these keys (eplb-algorithm, ep-num-redundant-experts) in the recipe and
apply one of the two options.
- Around line 145-146: The YAML contains a user-specific mount entry under the
extra_mount key (the string "/home/yangminl/sglang:/sgl-workspace/sglang");
remove this personal dev path or replace it with a generic/template value (e.g.,
an env var or placeholder) so CI and other developers won't reference a
non-existent local path; update the extra_mount list in the same file to either
omit the entry or use a parameter like "${LOCAL_SGLANG}:/sgl-workspace/sglang"
and document the expected env var.

In `@recipes/gb300-fp4/128k8k_mtp/midcurve-ctx5_ctx_pp4_gen1_dep16_batch8_eplb0_mtp2.yaml`:
- Around line 97-99: The recipe filename indicates "eplb0" but the config
enables EPLB (eplb-algorithm: deepseek) and sets ep-num-redundant-experts: 32;
update one of them to be consistent: either rename the recipe file to include
the EPLB variant (e.g., remove "eplb0" or add "eplb-deepseek") or change the
config to disable EPLB (set eplb-algorithm to "none"/"" or remove
ep-num-redundant-experts and set ep-num-redundant-experts: 0) so that
ep-num-redundant-experts and eplb-algorithm in the YAML match the filename.
- Around line 1-4: The config name contains a duplicated token "ctx5_ctx_pp4"
which looks like a copy-paste typo; update the YAML name value (the "name:"
string in this file) to the corrected identifier (e.g.,
"gb300-fp4-128k8k-ctx5_pp4_gen1_dep16_batch8_eplb0_mtp2") and ensure the recipe
filename and any other references that mirror the name use the same corrected
token ("ctx5_pp4") so the identifier is consistent across the file and
references.
- Around line 146-147: The extra_mount entry contains a personal home path
"/home/yangminl/sglang:/sgl-workspace/sglang" which must not be committed;
remove this mapping from the extra_mount list or replace it with a
non-user-specific alternative (e.g., a project-relative path, CI-standard path,
or environment variable placeholder) so that the extra_mount key remains generic
and portable; update the extra_mount value that currently references
"/home/yangminl/sglang:/sgl-workspace/sglang" accordingly.

In `@recipes/gb300-fp4/128k8k/highthroughput-ctx5_ctx_pp4_gen1_dep16_batch16_eplb0_mtp0.yaml`:
- Around line 21-66: The environment YAML has inconsistent boolean string
casing: MC_TE_METRIC is "true" while SGLANG_MOONCAKE_CUSTOM_MEM_POOL is "True";
update the values for SGLANG_MOONCAKE_CUSTOM_MEM_POOL in both decode_environment
and prefill_environment to the same lowercase "true" (or normalize both to
"True" if your code expects title-case) so string comparisons like MC_TE_METRIC
== "true" behave consistently; ensure you change the
SGLANG_MOONCAKE_CUSTOM_MEM_POOL entries that appear alongside MC_TE_METRIC in
both blocks.
- Around line 132-133: Remove the hardcoded developer path in the extra_mount
entry ("/home/yangminl/sglang:/sgl-workspace/sglang") — either delete that mount
line from the extra_mount list or replace the left-hand side with a configurable
placeholder/env var (e.g., ${SGLANG_LOCAL_PATH}) so users can override it;
ensure the YAML key extra_mount remains valid and document the new variable in
README or defaults so CI and other developers won’t rely on a personal home
directory.

In `@recipes/gb300-fp4/128k8k/maxthroughput-ctx3_ctx_pp4_gen1_dep8_batch32_eplb0_mtp0.yaml`:
- Around line 132-133: The extra_mount entry currently contains a hardcoded,
user-specific path "/home/yangminl/sglang:/sgl-workspace/sglang"; update the
extra_mount configuration to remove or replace this hardcoded path by either
deleting the mount, substituting a project-relative/shared path (e.g.,
"./sglang:/sgl-workspace/sglang"), or parameterizing it with an environment
variable (e.g., use a placeholder like
"${SGLANG_HOST_PATH}:/sgl-workspace/sglang" and document SGLANG_HOST_PATH). Edit
the extra_mount key in the YAML to implement one of these options and ensure no
username or absolute home path remains.

In `@recipes/gb300-fp4/128k8k/middlecurve-ctx8_ctx_pp4_gen1_dep32_batch8_eplb0_mtp0.yaml`:
- Around line 132-133: The extra_mount entry contains a hardcoded personal path
"/home/yangminl/sglang" which will break for other users; edit the extra_mount
list (key: extra_mount) to remove this user-specific mount or replace it with a
parameterized/portable value (e.g., use an environment variable placeholder like
${SG_LANG_PATH} or ${HOME}/sglang, or point to a shared well-known path) and
update any README or config docs to describe how to set that variable so the
recipe is portable.
- Around line 90-94: The filename indicates EPLB is disabled (suffix "eplb0")
but the config enables it via ep-num-redundant-experts and eplb-algorithm;
reconcile by either updating the filename to reflect EPLB enabled or disabling
EPLB in the config: change eplb-related fields (eplb-algorithm,
ep-num-redundant-experts, and any related flags like
disable-shared-experts-fusion if needed) so they reflect "EPLB off" (e.g., set
eplb-algorithm to a disabled/none value and ep-num-redundant-experts to 0) or
rename the recipe to remove the "eplb0" suffix so it accurately denotes EPLB is
active.
🧹 Nitpick comments (8)
recipes/gb300-fp4/128k8k_mtp/midcurve-ctx5_ctx_pp4_gen1_dep16_batch8_eplb0_mtp2.yaml (2)

46-49: Unresolved TODO comments — confirm whether these env vars are needed for prefill.

Lines 48–49 flag open questions (TODO: check if needed for prefill) for SGLANG_NCCL_ALL_GATHER_IN_OVERLAP_SCHEDULER_SYNC_BATCH and SGLANG_BLACKWELL_OVERLAP_SHARED_EXPERTS_OUTSIDE_SBO. Since these are currently absent from prefill_environment, please verify and resolve the TODOs before merging, or leave a tracking issue.

Would you like me to open an issue to track resolving these TODOs?


126-136: Commented-out config lines — clean up or document intent.

Lines 126, 130, and 136 contain commented-out settings (enable-symm-mem, max-total-tokens, alternative moe-runner-backend). If these are intentionally kept as reference for future tuning, a brief comment explaining why would help; otherwise, remove them to reduce noise.

recipes/gb300-fp4/128k8k/maxthroughput-ctx3_ctx_pp4_gen1_dep8_batch32_eplb0_mtp0.yaml (2)

21-66: Consider using YAML anchors to reduce environment duplication.

The decode_environment and prefill_environment blocks share ~15 identical key-value pairs, with decode adding only 3 extra vars (SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK, SGLANG_MOE_NVFP4_DISPATCH, SGLANG_NVFP4_CKPT_FP8_NEXTN_MOE). YAML anchors (& / <<: *) would eliminate the drift risk when a shared variable needs updating.


113-117: Commented-out config keys left in prefill section.

Lines 113 (enable-symm-mem) and 117 (max-total-tokens) are commented out. If these are intentionally documented as tuning knobs, a brief inline comment explaining why they're disabled would help future readers; otherwise consider removing them to keep the config clean.

recipes/gb300-fp4/128k8k/middlecurve-ctx8_ctx_pp4_gen1_dep32_batch8_eplb0_mtp0.yaml (2)

113-113: Clean up commented-out config lines before merging.

Lines 113 (# enable-symm-mem: true) and 117 (# max-total-tokens: 544000) are development remnants. If they're intentionally disabled, they can simply be removed to keep the config clean. If they may be re-enabled, consider adding a brief comment explaining why they're preserved.

Also applies to: 117-117


43-44: Minor style inconsistency: unquoted string values.

FLASHINFER_WORKSPACE_BASE and SGLANG_DG_CACHE_DIR values are bare paths while most other env vars use quoted strings. YAML handles this fine, but quoting them would be consistent with the rest of the file.

Also applies to: 65-66

recipes/gb300-fp4/128k8k_mtp/highthroughput-ctx3_ctx_pp4_gen1_dep8_batch16_eplb0_mtp1.yaml (2)

46-49: Resolve TODO comments before merging.

Lines 48–49 flag open questions about whether SGLANG_NCCL_ALL_GATHER_IN_OVERLAP_SCHEDULER_SYNC_BATCH and SGLANG_BLACKWELL_OVERLAP_SHARED_EXPERTS_OUTSIDE_SBO are needed for prefill. These are only set in the decode environment; if they are indeed needed for prefill as well, they should be added to prefill_environment. Please resolve these TODOs (or confirm they're decode-only) before this ships.

Would you like me to open an issue to track confirming and removing these TODOs?


43-44: Unquoted path values — minor inconsistency.

FLASHINFER_WORKSPACE_BASE and SGLANG_DG_CACHE_DIR are unquoted on Lines 43–44 and 70–71, while most other values in the environment blocks are quoted strings. YAML will still parse these as strings, but quoting them would be consistent with the rest of the file.

Also applies to: 70-71

Comment on lines +1 to +4
# Config: ctx3_ctx_pp4_gen1_dep8_batch16_eplb0_mtp1
# max global bs: 128, concurrency: 256

name: "gb300-fp4-128k8k-ctx3_ctx_pp4_gen1_dep8_batch16_eplb0_mtp1"

⚠️ Potential issue | 🟠 Major

Filename says eplb0 but config enables EPLB.

The filename encodes eplb0 (EPLB off), yet the decode sglang_config sets eplb-algorithm: deepseek (Line 99) and ep-num-redundant-experts: 32 (Line 97). Either the filename should be updated to reflect EPLB being on, or those config keys should be removed if EPLB is truly intended to be disabled.
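
If EPLB is genuinely meant to be off, a minimal sketch of the config-side option (assuming these are the only EPLB-related keys in the file) is simply dropping both keys from the decode sglang_config:

-    ep-num-redundant-experts: 32
-    eplb-algorithm: deepseek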


Comment on lines +1 to +4
# Config: ctx5_ctx_pp4_gen1_dep16_batch8_eplb0_mtp2
# max global bs: 128, concurrency: 256

name: "gb300-fp4-128k8k-ctx5_ctx_pp4_gen1_dep16_batch8_eplb0_mtp2"

⚠️ Potential issue | 🟡 Minor

Possible typo in the config identifier: ctx5_ctx.

The name and filename both contain the substring ctx5_ctx_pp4, where the repeated ctx fragment looks like a copy-paste artifact. Should this be ctx5_pp4 (or a different token)?


Comment on lines +97 to +99
ep-num-redundant-experts: 32
disable-shared-experts-fusion: true
eplb-algorithm: deepseek

⚠️ Potential issue | 🟡 Minor

Filename says eplb0 but config sets eplb-algorithm: deepseek.

The filename encodes eplb0, which implies EPLB is disabled or zeroed, but the decode config explicitly enables it with eplb-algorithm: deepseek and ep-num-redundant-experts: 32. Either the filename should reflect the actual EPLB setting or the config should disable EPLB to match the name.


Comment on lines 21 to 66
  decode_environment:
    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "7200"
    PYTHONUNBUFFERED: "1"
    DYN_SKIP_SGLANG_LOG_FORMATTING: "1"
    SGLANG_NVFP4_CKPT_FP8_GEMM_IN_ATTN: "1"
    SGLANG_PER_TOKEN_GROUP_QUANT_8BIT_V2: "1"
    SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000"
    SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000"
    SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000"
    SGLANG_HACK_SEQ_BOOTSTRAP_ROOM: "1"
    MC_TE_METRIC: "true"
    MC_FORCE_MNNVL: "1"
    NCCL_MNNVL_ENABLE: "1"
    NCCL_CUMEM_ENABLE: "1"
    SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
    SGLANG_USE_MESSAGE_QUEUE_BROADCASTER: "0"
    SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1"
    SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "1024"
    SGLANG_MOE_NVFP4_DISPATCH: "1"
    SGLANG_NVFP4_CKPT_FP8_NEXTN_MOE: "1"
    SGLANG_FLASHINFER_FP4_GEMM_BACKEND: "cutlass"
    FLASHINFER_DISABLE_VERSION_CHECK: "1"
    FLASHINFER_WORKSPACE_BASE: /tmp/flashinfer-cache
    SGLANG_DG_CACHE_DIR: /configs/deepgemm-cache

  prefill_environment:
    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "7200"
    PYTHONUNBUFFERED: "1"
    DYN_SKIP_SGLANG_LOG_FORMATTING: "1"
    SGLANG_NVFP4_CKPT_FP8_GEMM_IN_ATTN: "1"
    SGLANG_PER_TOKEN_GROUP_QUANT_8BIT_V2: "1"
    SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000"
    SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000"
    SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000"
    SGLANG_HACK_SEQ_BOOTSTRAP_ROOM: "1"
    MC_TE_METRIC: "true"
    MC_FORCE_MNNVL: "1"
    NCCL_MNNVL_ENABLE: "1"
    NCCL_CUMEM_ENABLE: "1"
    SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
    SGLANG_USE_MESSAGE_QUEUE_BROADCASTER: "0"
    SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1"
    SGLANG_FLASHINFER_FP4_GEMM_BACKEND: "cutlass"
    FLASHINFER_DISABLE_VERSION_CHECK: "1"
    FLASHINFER_WORKSPACE_BASE: /tmp/flashinfer-cache
    SGLANG_DG_CACHE_DIR: /configs/deepgemm-cache

⚠️ Potential issue | 🟡 Minor

Minor inconsistency: "true" vs "True" casing in environment values.

MC_TE_METRIC uses lowercase "true" (lines 31, 56) while SGLANG_MOONCAKE_CUSTOM_MEM_POOL uses title-case "True" (lines 35, 60). If the consuming code does strict string comparison (e.g., == "true"), the title-case variant may not be recognized. Worth normalizing to a consistent casing.
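
A suggested normalization (shown once; apply in both decode_environment and prefill_environment, assuming the consuming code expects lowercase):

-    SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
+    SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "true"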


Comment on lines +90 to +94
deepep-mode: low_latency
ep-dispatch-algorithm: static
ep-num-redundant-experts: 32
disable-shared-experts-fusion: true
eplb-algorithm: deepseek

⚠️ Potential issue | 🟡 Minor

Filename says eplb0 but config enables EPLB.

The filename suffix eplb0 implies EPLB is disabled, yet the decode config sets eplb-algorithm: deepseek with ep-num-redundant-experts: 32. Either the filename is stale or these config values are unintended. Please reconcile.


Collaborator

These two flags are decode-only.

Contributor Author

Got it, removing in next patchset

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@configs/gb300-fp4-128k8k-setup.sh`:
- Around line 39-42: The direct wheel URL for sgl-kernel
(sgl_kernel-0.3.21+cu130-cp310-abi3_manylinux2014_aarch64.whl) is pinned to
aarch64 and CPython 3.10 and may install an incompatible binary; add a pre-check
that validates the host architecture and Python major/minor (e.g., check uname
-m and python -c "import sys; print(sys.version_info[:2])") before running the
pip install, and if they don’t match aarch64/3.10 either abort with a clear
error message or choose an alternative wheel/source, or at minimum add an
explicit comment documenting the assumed architecture/Python and reference
PIP_ARGS and the exact wheel filename so future changes will update this guard.
- Around line 4-10: The setup script currently hardcodes BRANCH="gb300_blog" and
replaces the remote with a personal fork via the git remote add origin
https://github.com/YAMY1234/sglang.git and then checks out origin/${BRANCH},
which risks non-reproducible and broken deployments; change the script to accept
the remote URL and branch/commit as parameters (or default to the official
upstream repo sgl-project/sglang), stop force-replacing origin blindly (use git
remote set-url or conditionally add if missing), and pin the checkout to a
specific commit SHA instead of checking out origin/${BRANCH} so the code is
reproducible (update the BRANCH variable usage and the git fetch / git checkout
steps to fetch the chosen remote and checkout the provided commit SHA).

In `@recipes/gb300-fp4/128k8k/maxthroughput-ctx3_ctx_pp4_gen1_dep8_batch32_eplb0_mtp0.yaml`:
- Around line 1-4: The filename indicates EPLB disabled (eplb0) but the config
enables it — fix the mismatch by either renaming the recipe to use eplb1 in the
name or disabling the EPLB keys; locate and update the entries
ep-num-redundant-experts, disable-shared-experts-fusion, and eplb-algorithm
(lines shown in the diff) to be consistent with the filename: if you want EPLB
off remove or zero out ep-num-redundant-experts and reset/disable
disable-shared-experts-fusion and eplb-algorithm, or else rename the file and
top-level name string from eplb0 to eplb1 to reflect that EPLB is enabled.

In `@recipes/gb300-fp4/128k8k/minttft-ctx1_ctx_pp4_gen1_tp4_eplb0_mtp0-chunked-32k-dynchunk.yaml`:
- Around line 1-4: The config filename contains "tp4" while the YAML name field
and top comment contain "tep4", causing a mismatch; pick the canonical shorthand
and make them consistent by either renaming the file to use "tep4" or updating
the name field string and comment to use "tp4" (edit the value of the name key
"gb300-fp4-128k8k-ctx1_ctx_pp4_gen1_tep4_batch1_eplb0_mtp0" and the top comment
accordingly), ensuring any tooling or references that parse the filename or the
name key will match.
🧹 Nitpick comments (1)
recipes/gb300-fp4/128k8k_mtp/midcurve-ctx5_ctx_pp4_gen1_dep16_batch8_eplb0_mtp2.yaml (1)

19-66: Decode and prefill environment blocks share ~15 identical variables.

Consider extracting common environment variables into a shared/base section (if supported by the deployment framework) to reduce duplication and drift risk. Not blocking, just a maintainability suggestion.

Comment on lines +4 to +10
BRANCH="gb300_blog"

cd /sgl-workspace/sglang
git remote remove origin
git remote add origin https://github.com/YAMY1234/sglang.git
git fetch origin
git checkout origin/${BRANCH}

⚠️ Potential issue | 🔴 Critical

Setup script points origin to a personal fork — this should not be merged as-is.

Replacing the container's origin remote with https://github.com/YAMY1234/sglang.git ties every deployment to a personal fork. If this fork diverges, is force-pushed, or is deleted, all environments running this setup will break or pull unexpected code.

Additionally, checking out origin/${BRANCH} (a branch tip) without pinning a specific commit SHA makes this non-reproducible.

Consider:

  1. Using the upstream sgl-project/sglang repository, or at minimum parameterizing the remote URL.
  2. Pinning a specific commit hash instead of a branch name.
Suggested improvement
-BRANCH="gb300_blog"
+REPO_URL="${SGLANG_REPO_URL:-https://github.com/sgl-project/sglang.git}"
+COMMIT="${SGLANG_COMMIT:-<pin-a-specific-sha>}"

 cd /sgl-workspace/sglang
 git remote remove origin
-git remote add origin https://github.com/YAMY1234/sglang.git
+git remote add origin "$REPO_URL"
 git fetch origin
-git checkout origin/${BRANCH}
+git checkout "$COMMIT"

Comment on lines +39 to +42
# Install sgl-kernel 0.3.21 for CUDA 13.0 (aarch64)
echo "Installing sgl-kernel 0.3.21..."
pip install https://github.com/sgl-project/whl/releases/download/v0.3.21/sgl_kernel-0.3.21+cu130-cp310-abi3-manylinux2014_aarch64.whl --force-reinstall $PIP_ARGS
echo "sgl-kernel 0.3.21 installed"

⚠️ Potential issue | 🟡 Minor

Direct wheel URL for sgl-kernel is architecture- and Python-version-specific.

The wheel sgl_kernel-0.3.21+cu130-cp310-abi3-manylinux2014_aarch64.whl is pinned to aarch64 and cp310 ABI. If the container ever changes Python version or architecture, this will silently install an incompatible binary. Consider adding a guard or comment documenting these assumptions.
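
A guard sketch along those lines (illustrative, assuming bash and that `python` resolves to the container's interpreter; since the wheel is tagged abi3, any CPython >= 3.10 is technically compatible, so the version check could be relaxed accordingly):

ARCH="$(uname -m)"
PYVER="$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
# The pinned wheel is built for aarch64 and the CPython 3.10 abi3 ABI.
if [ "$ARCH" != "aarch64" ] || [ "$PYVER" != "3.10" ]; then
  echo "ERROR: sgl-kernel wheel targets aarch64 / CPython 3.10, got ${ARCH} / ${PYVER}" >&2
  exit 1
fi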


Comment on lines +1 to +4
# Config: ctx3_ctx_pp4_gen1_dep8_batch32_eplb0_mtp0
# max global bs: 256, concurrency: 512

name: "gb300-fp4-128k8k-ctx3_ctx_pp4_gen1_dep8_batch32_eplb0_mtp0"

⚠️ Potential issue | 🟠 Major

Filename encodes eplb0 but decode config enables EPLB.

Lines 87-89 set ep-num-redundant-experts: 32, disable-shared-experts-fusion: true, and eplb-algorithm: deepseek, which all indicate EPLB is active. The filename eplb0 suggests it's off. This mismatch is present across multiple YAML files in this PR (maxthroughput, highthroughput, middlecurve). Please reconcile — either rename the files to eplb1 or remove the EPLB-related keys.

Also applies to: 85-89


Comment on lines +1 to +4
# Config: ctx1_ctx_pp4_gen1_tep4_batch1_eplb0_mtp0
# chunked prefill size: 32000 (dynamic chunking enabled)

name: "gb300-fp4-128k8k-ctx1_ctx_pp4_gen1_tep4_batch1_eplb0_mtp0"

⚠️ Potential issue | 🟡 Minor

Filename says tp4 but the name field and comment say tep4 — which is correct?

The filename uses tp4 while Line 1 and Line 4 both use tep4. If tep4 is a shorthand for tensor+expert parallel, this could confuse tooling or humans grepping by filename vs. the name key. Please reconcile so the filename and config name match.

