
[WIP] 128k8k config update #156

Open
YAMY1234 wants to merge 3 commits into ishandhanani:main from YAMY1234:update_gb300_128k8k_config

Conversation


@YAMY1234 YAMY1234 commented Feb 6, 2026

Summary by CodeRabbit

  • Chores
    • Removed multiple outdated GB300‑FP4 128k/8k experimental deployment configurations (top‑of‑curve, mid‑curve variants, low‑latency).
  • New Features
    • Added several new optimized deployment profiles (high‑throughput, max‑throughput, middle‑curve, MTP variants, chunked prefill).
    • Added a setup script to automate environment preparation and FlashInfer/sgl‑kernel installation.


coderabbitai bot commented Feb 6, 2026

📝 Walkthrough

Removed five legacy gb300-fp4 128k8k YAML recipes and added multiple new 128k8k and 128k8k_mtp configuration YAMLs, a chunked-prefill recipe, and a Bash setup script; changes are configuration-only (no code/API changes).

Changes

  • Deleted Legacy Recipes
    Files: recipes/gb300-fp4/128k8k/1-top-of-curve.yaml, recipes/gb300-fp4/128k8k/2-mid-curve-pt1.yaml, recipes/gb300-fp4/128k8k/3-mid-curve-pt2.yaml, recipes/gb300-fp4/128k8k/4-mid-curve-pt3.yaml, recipes/gb300-fp4/128k8k/5-low-latency.yaml
    Summary: Removed five numbered legacy benchmark recipes, including model metadata, resource allocations, backend env blocks, sglang_config entries, and benchmark settings.

  • New 128k8k Recipes
    Files: recipes/gb300-fp4/128k8k/highthroughput-ctx5_ctx_pp4_gen1_dep16_batch16_eplb0_mtp0.yaml, recipes/gb300-fp4/128k8k/maxthroughput-ctx3_ctx_pp4_gen1_dep8_batch32_eplb0_mtp0.yaml, recipes/gb300-fp4/128k8k/middlecurve-ctx8_ctx_pp4_gen1_dep32_batch8_eplb0_mtp0.yaml
    Summary: Added three new 128k8k configs with model, resource, decode/prefill envs, a comprehensive sglang_config, and benchmark sections.

  • New 128k8k_mtp Recipes
    Files: recipes/gb300-fp4/128k8k_mtp/highthroughput-ctx3_ctx_pp4_gen1_dep8_batch16_eplb0_mtp1.yaml, recipes/gb300-fp4/128k8k_mtp/midcurve-ctx5_ctx_pp4_gen1_dep16_batch8_eplb0_mtp2.yaml
    Summary: Added two MTP-specific configs introducing speculative/MTP parameters, MoE backends, and per-stage tuning.

  • New Chunked-Prefill Recipe
    Files: recipes/gb300-fp4/128k8k/minttft-ctx1_ctx_pp4_gen1_tp4_eplb0_mtp0-chunked-32k-dynchunk.yaml
    Summary: Added a chunked-prefill configuration (dynamic chunking) with decode/prefill envs and sglang prefill/decode settings.

  • Setup Script
    Files: configs/gb300-fp4-128k8k-setup.sh
    Summary: Added a Bash setup script that prepares the branch and pip cache, uninstalls and reinstalls FlashInfer packages, and installs the sgl-kernel wheel.

Net effect: removed five legacy recipe files and added multiple new YAML configs plus one setup script; all edits are configuration files or scripts.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • ishandhanani
  • kyleliang-nv
  • gracehonv

Poem

🐰 I hopped through YAML fields, pruning old leaves away,
Sprinkled new configs and a script for a brighter day.
FlashInfer bells chimed, pipelines set to run,
Benchmarks await — carrots earned, work well done! 🥕

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 inconclusive)
  • Title check: ❓ Inconclusive. The title '[WIP] 128k8k config update' is vague and generic, using non-descriptive terms that don't clearly convey the specific changes being made. Resolution: replace it with a more specific title describing the main change, such as 'Replace legacy 128k8k recipes with performance-tuned configurations' or 'Remove experimental configs and add optimized 128k8k recipes'.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the check.



No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
recipes/gb300-fp4/128k8k_mtp/highthroughput-ctx3_ctx_pp4_gen1_dep8_batch16_eplb0_mtp1.yaml (2)

21-70: Consider using YAML anchors to reduce decode/prefill environment duplication.

The decode and prefill environment blocks share ~15 identical key-value pairs. Anchoring the shared subset (&common_env) and merging it in with <<: *common_env would reduce the risk of drift when a shared variable needs updating.
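
A minimal sketch of the pattern, assuming the recipe loader supports YAML 1.1 merge keys (the prefill block is moved ahead of decode here, since an anchor must be defined before its alias; the decode-only extras shown are illustrative):

  prefill_environment: &common_env
    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "7200"
    PYTHONUNBUFFERED: "1"
    # ...remaining shared variables...

  decode_environment:
    <<: *common_env
    # decode-only additions, e.g.:
    SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "1024"
    SGLANG_MOE_NVFP4_DISPATCH: "1"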


47-48: Unquoted path values are inconsistent with the rest of the file.

Most environment variable values are quoted strings, but FLASHINFER_WORKSPACE_BASE and SGLANG_DG_CACHE_DIR are unquoted on Lines 47-48 and 69-70. While YAML handles this correctly, quoting them would be consistent with the surrounding style.

Also applies to: 69-70
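
A suggested fix in the same style as the sibling files (the same two keys appear in both environment blocks):

-    FLASHINFER_WORKSPACE_BASE: /tmp/flashinfer-cache
-    SGLANG_DG_CACHE_DIR: /configs/deepgemm-cache
+    FLASHINFER_WORKSPACE_BASE: "/tmp/flashinfer-cache"
+    SGLANG_DG_CACHE_DIR: "/configs/deepgemm-cache"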

recipes/gb300-fp4/128k8k/minttft-ctx1_ctx_pp4_gen1_tp4_eplb0_mtp0-chunked-32k-dynchunk.yaml (2)

40-41: Unquoted path values — inconsistent with the rest of the environment block.

All other environment variable values in this file are quoted strings, but FLASHINFER_WORKSPACE_BASE and SGLANG_DG_CACHE_DIR are bare. While YAML will still parse them as strings, quoting them keeps the file consistent and avoids surprises with paths that may later contain special characters.

Same issue on lines 62–63 in prefill_environment.

Suggested fix
-    FLASHINFER_WORKSPACE_BASE: /tmp/flashinfer-cache
-    SGLANG_DG_CACHE_DIR: /configs/deepgemm-cache
+    FLASHINFER_WORKSPACE_BASE: "/tmp/flashinfer-cache"
+    SGLANG_DG_CACHE_DIR: "/configs/deepgemm-cache"

21-63: decode_environment and prefill_environment are identical — consider using a YAML anchor.

The two environment blocks are copy-pasted verbatim. If they intentionally diverge later, this is fine. Otherwise, a YAML anchor (& / *) would eliminate the duplication and make future edits less error-prone:

Example using YAML anchors
-  decode_environment:
+  decode_environment: &common_env
     TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "7200"
     ...
 
-  prefill_environment:
-    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "7200"
-    ...
+  prefill_environment:
+    <<: *common_env
recipes/gb300-fp4/128k8k/middlecurve-ctx8_ctx_pp4_gen1_dep32_batch8_eplb0_mtp0.yaml (1)

43-44: Inconsistent quoting of environment variable values.

Most env var values are quoted strings (e.g., "7200", "1"), but FLASHINFER_WORKSPACE_BASE and SGLANG_DG_CACHE_DIR are unquoted paths. While YAML will treat them as strings either way, the inconsistency can cause confusion or subtle issues if a value ever contains special YAML characters.

Suggested fix
-    FLASHINFER_WORKSPACE_BASE: /tmp/flashinfer-cache
-    SGLANG_DG_CACHE_DIR: /configs/deepgemm-cache
+    FLASHINFER_WORKSPACE_BASE: "/tmp/flashinfer-cache"
+    SGLANG_DG_CACHE_DIR: "/configs/deepgemm-cache"

Apply the same in the prefill_environment block.

Also applies to: 65-66

recipes/gb300-fp4/128k8k/maxthroughput-ctx3_ctx_pp4_gen1_dep8_batch32_eplb0_mtp0.yaml (1)

43-44: Same inconsistent quoting of path-valued env vars as the other config file.

FLASHINFER_WORKSPACE_BASE and SGLANG_DG_CACHE_DIR are unquoted while all other values are quoted strings. Consider quoting for consistency.

Also applies to: 65-66


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

🤖 Fix all issues with AI agents
In `@recipes/gb300-fp4/128k8k_mtp/highthroughput-ctx3_ctx_pp4_gen1_dep8_batch16_eplb0_mtp1.yaml`:
- Around line 1-4: The recipe filename indicates EPLB is off (eplb0) but the
sglang_config enables EPLB via the keys eplb-algorithm and
ep-num-redundant-experts; either rename the recipe to reflect EPLB on (update
name to include eplb1) or disable EPLB by removing or commenting out the
sglang_config keys eplb-algorithm and ep-num-redundant-experts (and any other
EPLB-related params) so the file name and the actual config are consistent;
locate these keys (eplb-algorithm, ep-num-redundant-experts) in the recipe and
apply one of the two options.
- Around line 145-146: The YAML contains a user-specific mount entry under the
extra_mount key (the string "/home/yangminl/sglang:/sgl-workspace/sglang");
remove this personal dev path or replace it with a generic/template value (e.g.,
an env var or placeholder) so CI and other developers won't reference a
non-existent local path; update the extra_mount list in the same file to either
omit the entry or use a parameter like "${LOCAL_SGLANG}:/sgl-workspace/sglang"
and document the expected env var.

In `@recipes/gb300-fp4/128k8k_mtp/midcurve-ctx5_ctx_pp4_gen1_dep16_batch8_eplb0_mtp2.yaml`:
- Around line 97-99: The recipe filename indicates "eplb0" but the config
enables EPLB (eplb-algorithm: deepseek) and sets ep-num-redundant-experts: 32;
update one of them to be consistent: either rename the recipe file to include
the EPLB variant (e.g., remove "eplb0" or add "eplb-deepseek") or change the
config to disable EPLB (set eplb-algorithm to "none"/"" or remove
ep-num-redundant-experts and set ep-num-redundant-experts: 0) so that
ep-num-redundant-experts and eplb-algorithm in the YAML match the filename.
- Around line 1-4: The config name contains a duplicated token "ctx5_ctx_pp4"
which looks like a copy-paste typo; update the YAML name value (the "name:"
string in this file) to the corrected identifier (e.g.,
"gb300-fp4-128k8k-ctx5_pp4_gen1_dep16_batch8_eplb0_mtp2") and ensure the recipe
filename and any other references that mirror the name use the same corrected
token ("ctx5_pp4") so the identifier is consistent across the file and
references.
- Around line 146-147: The extra_mount entry contains a personal home path
"/home/yangminl/sglang:/sgl-workspace/sglang" which must not be committed;
remove this mapping from the extra_mount list or replace it with a
non-user-specific alternative (e.g., a project-relative path, CI-standard path,
or environment variable placeholder) so that the extra_mount key remains generic
and portable; update the extra_mount value that currently references
"/home/yangminl/sglang:/sgl-workspace/sglang" accordingly.

In `@recipes/gb300-fp4/128k8k/highthroughput-ctx5_ctx_pp4_gen1_dep16_batch16_eplb0_mtp0.yaml`:
- Around line 21-66: The environment YAML has inconsistent boolean string
casing: MC_TE_METRIC is "true" while SGLANG_MOONCAKE_CUSTOM_MEM_POOL is "True";
update the values for SGLANG_MOONCAKE_CUSTOM_MEM_POOL in both decode_environment
and prefill_environment to the same lowercase "true" (or normalize both to
"True" if your code expects title-case) so string comparisons like MC_TE_METRIC
== "true" behave consistently; ensure you change the
SGLANG_MOONCAKE_CUSTOM_MEM_POOL entries that appear alongside MC_TE_METRIC in
both blocks.
- Around line 132-133: Remove the hardcoded developer path in the extra_mount
entry ("/home/yangminl/sglang:/sgl-workspace/sglang") — either delete that mount
line from the extra_mount list or replace the left-hand side with a configurable
placeholder/env var (e.g., ${SGLANG_LOCAL_PATH}) so users can override it;
ensure the YAML key extra_mount remains valid and document the new variable in
README or defaults so CI and other developers won’t rely on a personal home
directory.

In `@recipes/gb300-fp4/128k8k/maxthroughput-ctx3_ctx_pp4_gen1_dep8_batch32_eplb0_mtp0.yaml`:
- Around line 132-133: The extra_mount entry currently contains a hardcoded,
user-specific path "/home/yangminl/sglang:/sgl-workspace/sglang"; update the
extra_mount configuration to remove or replace this hardcoded path by either
deleting the mount, substituting a project-relative/shared path (e.g.,
"./sglang:/sgl-workspace/sglang"), or parameterizing it with an environment
variable (e.g., use a placeholder like
"${SGLANG_HOST_PATH}:/sgl-workspace/sglang" and document SGLANG_HOST_PATH). Edit
the extra_mount key in the YAML to implement one of these options and ensure no
username or absolute home path remains.

In `@recipes/gb300-fp4/128k8k/middlecurve-ctx8_ctx_pp4_gen1_dep32_batch8_eplb0_mtp0.yaml`:
- Around line 132-133: The extra_mount entry contains a hardcoded personal path
"/home/yangminl/sglang" which will break for other users; edit the extra_mount
list (key: extra_mount) to remove this user-specific mount or replace it with a
parameterized/portable value (e.g., use an environment variable placeholder like
${SG_LANG_PATH} or ${HOME}/sglang, or point to a shared well-known path) and
update any README or config docs to describe how to set that variable so the
recipe is portable.
- Around line 90-94: The filename indicates EPLB is disabled (suffix "eplb0")
but the config enables it via ep-num-redundant-experts and eplb-algorithm;
reconcile by either updating the filename to reflect EPLB enabled or disabling
EPLB in the config: change eplb-related fields (eplb-algorithm,
ep-num-redundant-experts, and any related flags like
disable-shared-experts-fusion if needed) so they reflect "EPLB off" (e.g., set
eplb-algorithm to a disabled/none value and ep-num-redundant-experts to 0) or
rename the recipe to remove the "eplb0" suffix so it accurately denotes EPLB is
active.
🧹 Nitpick comments (8)
recipes/gb300-fp4/128k8k_mtp/midcurve-ctx5_ctx_pp4_gen1_dep16_batch8_eplb0_mtp2.yaml (2)

46-49: Unresolved TODO comments — confirm whether these env vars are needed for prefill.

Lines 48–49 flag open questions (TODO: check if needed for prefill) for SGLANG_NCCL_ALL_GATHER_IN_OVERLAP_SCHEDULER_SYNC_BATCH and SGLANG_BLACKWELL_OVERLAP_SHARED_EXPERTS_OUTSIDE_SBO. Since these are currently absent from prefill_environment, please verify and resolve the TODOs before merging, or leave a tracking issue.

Would you like me to open an issue to track resolving these TODOs?


126-136: Commented-out config lines — clean up or document intent.

Lines 126, 130, and 136 contain commented-out settings (enable-symm-mem, max-total-tokens, alternative moe-runner-backend). If these are intentionally kept as reference for future tuning, a brief comment explaining why would help; otherwise, remove them to reduce noise.

recipes/gb300-fp4/128k8k/maxthroughput-ctx3_ctx_pp4_gen1_dep8_batch32_eplb0_mtp0.yaml (2)

21-66: Consider using YAML anchors to reduce environment duplication.

The decode_environment and prefill_environment blocks share ~15 identical key-value pairs, with decode adding only 3 extra vars (SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK, SGLANG_MOE_NVFP4_DISPATCH, SGLANG_NVFP4_CKPT_FP8_NEXTN_MOE). YAML anchors (& / <<: *) would eliminate the drift risk when a shared variable needs updating.


113-117: Commented-out config keys left in prefill section.

Lines 113 (enable-symm-mem) and 117 (max-total-tokens) are commented out. If these are intentionally documented as tuning knobs, a brief inline comment explaining why they're disabled would help future readers; otherwise consider removing them to keep the config clean.

recipes/gb300-fp4/128k8k/middlecurve-ctx8_ctx_pp4_gen1_dep32_batch8_eplb0_mtp0.yaml (2)

113-113: Clean up commented-out config lines before merging.

Lines 113 (# enable-symm-mem: true) and 117 (# max-total-tokens: 544000) are development remnants. If they're intentionally disabled, they can simply be removed to keep the config clean. If they may be re-enabled, consider adding a brief comment explaining why they're preserved.

Also applies to: 117-117


43-44: Minor style inconsistency: unquoted string values.

FLASHINFER_WORKSPACE_BASE and SGLANG_DG_CACHE_DIR values are bare paths while most other env vars use quoted strings. YAML handles this fine, but quoting them would be consistent with the rest of the file.

Also applies to: 65-66

recipes/gb300-fp4/128k8k_mtp/highthroughput-ctx3_ctx_pp4_gen1_dep8_batch16_eplb0_mtp1.yaml (2)

46-49: Resolve TODO comments before merging.

Lines 48–49 flag open questions about whether SGLANG_NCCL_ALL_GATHER_IN_OVERLAP_SCHEDULER_SYNC_BATCH and SGLANG_BLACKWELL_OVERLAP_SHARED_EXPERTS_OUTSIDE_SBO are needed for prefill. These are only set in the decode environment; if they are indeed needed for prefill as well, they should be added to prefill_environment. Please resolve these TODOs (or confirm they're decode-only) before this ships.

Would you like me to open an issue to track confirming and removing these TODOs?


43-44: Unquoted path values — minor inconsistency.

FLASHINFER_WORKSPACE_BASE and SGLANG_DG_CACHE_DIR are unquoted on Lines 43–44 and 70–71, while most other values in the environment blocks are quoted strings. YAML will still parse these as strings, but quoting them would be consistent with the rest of the file.

Also applies to: 70-71

Comment on lines +1 to +4
# Config: ctx3_ctx_pp4_gen1_dep8_batch16_eplb0_mtp1
# max global bs: 128, concurrency: 256

name: "gb300-fp4-128k8k-ctx3_ctx_pp4_gen1_dep8_batch16_eplb0_mtp1"

⚠️ Potential issue | 🟠 Major

Filename says eplb0 but config enables EPLB.

The filename encodes eplb0 (EPLB off), yet the decode sglang_config sets eplb-algorithm: deepseek (Line 99) and ep-num-redundant-experts: 32 (Line 97). Either the filename should be updated to reflect EPLB being on, or those config keys should be removed if EPLB is truly intended to be disabled.
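
If EPLB is genuinely meant to be off, a minimal sketch of the config-side option (assuming these are the only EPLB-related keys in the file) is simply dropping both keys from the decode sglang_config:

-    ep-num-redundant-experts: 32
-    eplb-algorithm: deepseek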


Comment on lines +1 to +4
# Config: ctx5_ctx_pp4_gen1_dep16_batch8_eplb0_mtp2
# max global bs: 128, concurrency: 256

name: "gb300-fp4-128k8k-ctx5_ctx_pp4_gen1_dep16_batch8_eplb0_mtp2"

⚠️ Potential issue | 🟡 Minor

Possible typo in the config identifier: ctx5_ctx.

The name and filename both contain the substring ctx5_ctx_pp4, where the repeated ctx fragment looks like a copy-paste artifact. Should this be ctx5_pp4 (or a different token)?


Comment on lines +97 to +99
ep-num-redundant-experts: 32
disable-shared-experts-fusion: true
eplb-algorithm: deepseek

⚠️ Potential issue | 🟡 Minor

Filename says eplb0 but config sets eplb-algorithm: deepseek.

The filename encodes eplb0, which implies EPLB is disabled or zeroed, but the decode config explicitly enables it with eplb-algorithm: deepseek and ep-num-redundant-experts: 32. Either the filename should reflect the actual EPLB setting or the config should disable EPLB to match the name.


Comment on lines 21 to 66
  decode_environment:
    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "7200"
    PYTHONUNBUFFERED: "1"
    DYN_SKIP_SGLANG_LOG_FORMATTING: "1"
    SGLANG_NVFP4_CKPT_FP8_GEMM_IN_ATTN: "1"
    SGLANG_PER_TOKEN_GROUP_QUANT_8BIT_V2: "1"
    SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000"
    SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000"
    SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000"
    SGLANG_HACK_SEQ_BOOTSTRAP_ROOM: "1"
    MC_TE_METRIC: "true"
    MC_FORCE_MNNVL: "1"
    NCCL_MNNVL_ENABLE: "1"
    NCCL_CUMEM_ENABLE: "1"
    SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
    SGLANG_USE_MESSAGE_QUEUE_BROADCASTER: "0"
    SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1"
    SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "1024"
    SGLANG_MOE_NVFP4_DISPATCH: "1"
    SGLANG_NVFP4_CKPT_FP8_NEXTN_MOE: "1"
    SGLANG_FLASHINFER_FP4_GEMM_BACKEND: "cutlass"
    FLASHINFER_DISABLE_VERSION_CHECK: "1"
    FLASHINFER_WORKSPACE_BASE: /tmp/flashinfer-cache
    SGLANG_DG_CACHE_DIR: /configs/deepgemm-cache

  prefill_environment:
    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "7200"
    PYTHONUNBUFFERED: "1"
    DYN_SKIP_SGLANG_LOG_FORMATTING: "1"
    SGLANG_NVFP4_CKPT_FP8_GEMM_IN_ATTN: "1"
    SGLANG_PER_TOKEN_GROUP_QUANT_8BIT_V2: "1"
    SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000"
    SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000"
    SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000"
    SGLANG_HACK_SEQ_BOOTSTRAP_ROOM: "1"
    MC_TE_METRIC: "true"
    MC_FORCE_MNNVL: "1"
    NCCL_MNNVL_ENABLE: "1"
    NCCL_CUMEM_ENABLE: "1"
    SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
    SGLANG_USE_MESSAGE_QUEUE_BROADCASTER: "0"
    SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1"
    SGLANG_FLASHINFER_FP4_GEMM_BACKEND: "cutlass"
    FLASHINFER_DISABLE_VERSION_CHECK: "1"
    FLASHINFER_WORKSPACE_BASE: /tmp/flashinfer-cache
    SGLANG_DG_CACHE_DIR: /configs/deepgemm-cache

⚠️ Potential issue | 🟡 Minor

Minor inconsistency: "true" vs "True" casing in environment values.

MC_TE_METRIC uses lowercase "true" (lines 31, 56) while SGLANG_MOONCAKE_CUSTOM_MEM_POOL uses title-case "True" (lines 35, 60). If the consuming code does strict string comparison (e.g., == "true"), the title-case variant may not be recognized. Worth normalizing to a consistent casing.
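
A suggested normalization (shown once; apply in both decode_environment and prefill_environment, assuming the consuming code expects lowercase):

-    SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
+    SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "true"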


Comment on lines +90 to +94
deepep-mode: low_latency
ep-dispatch-algorithm: static
ep-num-redundant-experts: 32
disable-shared-experts-fusion: true
eplb-algorithm: deepseek

⚠️ Potential issue | 🟡 Minor

Filename says eplb0 but config enables EPLB.

The filename suffix eplb0 implies EPLB is disabled, yet the decode config sets eplb-algorithm: deepseek with ep-num-redundant-experts: 32. Either the filename is stale or these config values are unintended. Please reconcile.


Collaborator

These two flags are decode-only.

Contributor Author

Got it, removing in next patchset

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@configs/gb300-fp4-128k8k-setup.sh`:
- Around line 39-42: The direct wheel URL for sgl-kernel
(sgl_kernel-0.3.21+cu130-cp310-abi3_manylinux2014_aarch64.whl) is pinned to
aarch64 and CPython 3.10 and may install an incompatible binary; add a pre-check
that validates the host architecture and Python major/minor (e.g., check uname
-m and python -c "import sys; print(sys.version_info[:2])") before running the
pip install, and if they don’t match aarch64/3.10 either abort with a clear
error message or choose an alternative wheel/source, or at minimum add an
explicit comment documenting the assumed architecture/Python and reference
PIP_ARGS and the exact wheel filename so future changes will update this guard.
- Around line 4-10: The setup script currently hardcodes BRANCH="gb300_blog" and
replaces the remote with a personal fork via the git remote add origin
https://github.com/YAMY1234/sglang.git and then checks out origin/${BRANCH},
which risks non-reproducible and broken deployments; change the script to accept
the remote URL and branch/commit as parameters (or default to the official
upstream repo sgl-project/sglang), stop force-replacing origin blindly (use git
remote set-url or conditionally add if missing), and pin the checkout to a
specific commit SHA instead of checking out origin/${BRANCH} so the code is
reproducible (update the BRANCH variable usage and the git fetch / git checkout
steps to fetch the chosen remote and checkout the provided commit SHA).

In `@recipes/gb300-fp4/128k8k/maxthroughput-ctx3_ctx_pp4_gen1_dep8_batch32_eplb0_mtp0.yaml`:
- Around line 1-4: The filename indicates EPLB disabled (eplb0) but the config
enables it — fix the mismatch by either renaming the recipe to use eplb1 in the
name or disabling the EPLB keys; locate and update the entries
ep-num-redundant-experts, disable-shared-experts-fusion, and eplb-algorithm
(lines shown in the diff) to be consistent with the filename: if you want EPLB
off remove or zero out ep-num-redundant-experts and reset/disable
disable-shared-experts-fusion and eplb-algorithm, or else rename the file and
top-level name string from eplb0 to eplb1 to reflect that EPLB is enabled.

In `@recipes/gb300-fp4/128k8k/minttft-ctx1_ctx_pp4_gen1_tp4_eplb0_mtp0-chunked-32k-dynchunk.yaml`:
- Around line 1-4: The config filename contains "tp4" while the YAML name field
and top comment contain "tep4", causing a mismatch; pick the canonical shorthand
and make them consistent by either renaming the file to use "tep4" or updating
the name field string and comment to use "tp4" (edit the value of the name key
"gb300-fp4-128k8k-ctx1_ctx_pp4_gen1_tep4_batch1_eplb0_mtp0" and the top comment
accordingly), ensuring any tooling or references that parse the filename or the
name key will match.
🧹 Nitpick comments (1)
recipes/gb300-fp4/128k8k_mtp/midcurve-ctx5_ctx_pp4_gen1_dep16_batch8_eplb0_mtp2.yaml (1)

19-66: Decode and prefill environment blocks share ~15 identical variables.

Consider extracting common environment variables into a shared/base section (if supported by the deployment framework) to reduce duplication and drift risk. Not blocking, just a maintainability suggestion.

Comment on lines +4 to +10
BRANCH="gb300_blog"

cd /sgl-workspace/sglang
git remote remove origin
git remote add origin https://github.com/YAMY1234/sglang.git
git fetch origin
git checkout origin/${BRANCH}

⚠️ Potential issue | 🔴 Critical

Setup script points origin to a personal fork — this should not be merged as-is.

Replacing the container's origin remote with https://github.com/YAMY1234/sglang.git ties every deployment to a personal fork. If this fork diverges, is force-pushed, or is deleted, all environments running this setup will break or pull unexpected code.

Additionally, checking out origin/${BRANCH} (a branch tip) without pinning a specific commit SHA makes this non-reproducible.

Consider:

  1. Using the upstream sgl-project/sglang repository, or at minimum parameterizing the remote URL.
  2. Pinning a specific commit hash instead of a branch name.
Suggested improvement
-BRANCH="gb300_blog"
+REPO_URL="${SGLANG_REPO_URL:-https://github.com/sgl-project/sglang.git}"
+COMMIT="${SGLANG_COMMIT:-<pin-a-specific-sha>}"

 cd /sgl-workspace/sglang
 git remote remove origin
-git remote add origin https://github.com/YAMY1234/sglang.git
+git remote add origin "$REPO_URL"
 git fetch origin
-git checkout origin/${BRANCH}
+git checkout "$COMMIT"

Comment on lines +39 to +42
# Install sgl-kernel 0.3.21 for CUDA 13.0 (aarch64)
echo "Installing sgl-kernel 0.3.21..."
pip install https://github.com/sgl-project/whl/releases/download/v0.3.21/sgl_kernel-0.3.21+cu130-cp310-abi3-manylinux2014_aarch64.whl --force-reinstall $PIP_ARGS
echo "sgl-kernel 0.3.21 installed"

⚠️ Potential issue | 🟡 Minor

Direct wheel URL for sgl-kernel is architecture- and Python-version-specific.

The wheel sgl_kernel-0.3.21+cu130-cp310-abi3-manylinux2014_aarch64.whl is pinned to aarch64 and cp310 ABI. If the container ever changes Python version or architecture, this will silently install an incompatible binary. Consider adding a guard or comment documenting these assumptions.
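
A guard sketch along those lines (illustrative, assuming bash and that `python` resolves to the container's interpreter; since the wheel is tagged abi3, any CPython >= 3.10 is technically compatible, so the version check could be relaxed accordingly):

ARCH="$(uname -m)"
PYVER="$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
# The pinned wheel is built for aarch64 and the CPython 3.10 abi3 ABI.
if [ "$ARCH" != "aarch64" ] || [ "$PYVER" != "3.10" ]; then
  echo "ERROR: sgl-kernel wheel targets aarch64 / CPython 3.10, got ${ARCH} / ${PYVER}" >&2
  exit 1
fi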


Comment on lines +1 to +4
# Config: ctx3_ctx_pp4_gen1_dep8_batch32_eplb0_mtp0
# max global bs: 256, concurrency: 512

name: "gb300-fp4-128k8k-ctx3_ctx_pp4_gen1_dep8_batch32_eplb0_mtp0"

⚠️ Potential issue | 🟠 Major

Filename encodes eplb0 but decode config enables EPLB.

Lines 87-89 set ep-num-redundant-experts: 32, disable-shared-experts-fusion: true, and eplb-algorithm: deepseek, which all indicate EPLB is active. The filename eplb0 suggests it's off. This mismatch is present across multiple YAML files in this PR (maxthroughput, highthroughput, middlecurve). Please reconcile — either rename the files to eplb1 or remove the EPLB-related keys.

Also applies to: 85-89


Comment on lines +1 to +4
# Config: ctx1_ctx_pp4_gen1_tep4_batch1_eplb0_mtp0
# chunked prefill size: 32000 (dynamic chunking enabled)

name: "gb300-fp4-128k8k-ctx1_ctx_pp4_gen1_tep4_batch1_eplb0_mtp0"

⚠️ Potential issue | 🟡 Minor

Filename says tp4 but the name field and comment say tep4 — which is correct?

The filename uses tp4 while Line 1 and Line 4 both use tep4. If tep4 is a shorthand for tensor+expert parallel, this could confuse tooling or humans grepping by filename vs. the name key. Please reconcile so the filename and config name match.

