[NVIDIA] Qwen3.5 h200 sglang MTP by hshrivastava-droid · Pull Request #1001 · SemiAnalysisAI/InferenceX

hshrivastava-droid · 2026-04-03T19:50:45Z

Summary

Add benchmark configuration and script for Qwen3.5-397B-A17B-FP8 on H200 with SGLang MTP (Multi-Token Prediction) using EAGLE speculative decoding.

Changes

New benchmark script: `benchmarks/single_node/qwen3.5_fp8_h200_mtp.sh`

- SGLang launch server with EAGLE speculative decoding:
  - `--speculative-algorithm EAGLE`
  - `--speculative-num-steps 3`
  - `--speculative-num-draft-tokens 4`
  - `--speculative-eagle-topk 1`
- FP8 quantization with an FP8 E4M3 KV cache
- FlashInfer attention backend with allreduce fusion enabled
- Chunked prefill with a chunk size of 16384 tokens; radix cache disabled
- TP8/EP8 support, with expert parallelism set via `--expert-parallel-size`
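To make the server settings listed above concrete, here is a minimal Python sketch that assembles an equivalent `sglang.launch_server` command. The four speculative-decoding flags and `--expert-parallel-size` come from the PR. The other flags (`--model-path`, `--tp-size`, `--kv-cache-dtype`, `--attention-backend`, `--enable-flashinfer-allreduce-fusion`, `--chunked-prefill-size`, `--disable-radix-cache`, `--port`) are our assumption of how the remaining bullets map onto SGLang server arguments, and `build_launch_cmd` is a hypothetical helper. The real `qwen3.5_fp8_h200_mtp.sh` script may use different flags or values. The sketch also assumes the FP8 weights come from the pre-quantized checkpoint, so no `--quantization` flag is passed.

```python
import shlex


def build_launch_cmd(
    model: str,
    tp: int = 8,
    ep: int = 8,
    num_steps: int = 3,
    topk: int = 1,
    num_draft_tokens: int = 4,
    port: int = 30000,
) -> list[str]:
    """Hypothetical helper: build an SGLang launch command resembling the PR's settings."""
    # Assumption: with topk=1 the draft is a single chain (one drafted token per
    # step plus the current position), so num_draft_tokens should be num_steps + 1.
    # Here 3 steps + 1 = 4 draft tokens, matching the PR.
    if topk == 1 and num_draft_tokens != num_steps + 1:
        raise ValueError("with topk=1, num_draft_tokens should equal num_steps + 1")
    return [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model,
        "--tp-size", str(tp),
        "--expert-parallel-size", str(ep),
        "--kv-cache-dtype", "fp8_e4m3",          # FP8 E4M3 KV cache
        "--attention-backend", "flashinfer",      # FlashInfer attention
        "--enable-flashinfer-allreduce-fusion",   # allreduce fusion enabled
        "--chunked-prefill-size", "16384",        # prefill chunk size in tokens
        "--disable-radix-cache",
        "--speculative-algorithm", "EAGLE",
        "--speculative-num-steps", str(num_steps),
        "--speculative-num-draft-tokens", str(num_draft_tokens),
        "--speculative-eagle-topk", str(topk),
        "--port", str(port),
    ]


if __name__ == "__main__":
    cmd = build_launch_cmd("Qwen/Qwen3.5-397B-A17B-FP8")
    # Print as one shell-quoted line that could be pasted into a script.
    print(shlex.join(cmd))
```

Validating the draft-token count in the helper surfaces an inconsistent EAGLE configuration before a long model load, rather than at server start-up.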

New config: `qwen3.5-fp8-h200-sglang-mtp` in `nvidia-master.yaml`

- Image: `lmsysorg/sglang:v0.5.9-cu129-amd64`
- Model: `Qwen/Qwen3.5-397B-A17B-FP8`
- Two sequence-length configs (input/output tokens): 1024/1024 and 8192/1024
- Search space: TP8, EP8, concurrency 4–128, with MTP speculative decoding
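The search space above can be pictured as a small grid of benchmark runs. The following Python sketch enumerates it under stated assumptions: the PR only gives the concurrency range 4–128, so the doubling steps (4, 8, 16, 32, 64, 128) are hypothetical, as are the field names (`isl`, `osl`, `conc`, `spec_decoding`) and the `sweep_points` helper. The actual schema in `nvidia-master.yaml` is not shown here.

```python
from itertools import product

# (input length, output length) in tokens, as listed in the PR.
SEQ_LEN_CONFIGS = [(1024, 1024), (8192, 1024)]

# Assumption: doubling steps across the PR's stated 4-128 range.
CONCURRENCIES = [4 * 2**i for i in range(6)]  # 4, 8, 16, 32, 64, 128


def sweep_points(tp: int = 8, ep: int = 8) -> list[dict]:
    """Hypothetical helper: one dict per (sequence-length config, concurrency) run."""
    return [
        {"isl": isl, "osl": osl, "tp": tp, "ep": ep, "conc": conc, "spec_decoding": "mtp"}
        # Outer loop over sequence-length configs, inner loop over concurrency.
        for (isl, osl), conc in product(SEQ_LEN_CONFIGS, CONCURRENCIES)
    ]


if __name__ == "__main__":
    for point in sweep_points():
        print(point)
```

Under these assumptions the grid has 2 × 6 = 12 runs, all at TP8/EP8.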

`perf-changelog.yaml`

- Added a changelog entry for the new `qwen3.5-fp8-h200-sglang-mtp` config
- Updated the PR link for the Qwen3.5-397B-A17B-FP8 H200 SGLang MTP entry

kedarpotdar-nv · 2026-04-06T16:54:48Z

@claude add a PR description based on changes

Klaud-Cold · 2026-04-06T16:55:12Z

Claude finished @kedarpotdar-nv's task in 46s

Adding PR Description

- Gather context from PR changes
- Write and update PR description

PR description has been added with a summary of all changes:

- New benchmark script for Qwen3.5-397B-A17B-FP8 on H200 with SGLang MTP (EAGLE speculative decoding)
- New `qwen3.5-fp8-h200-sglang-mtp` config in `nvidia-master.yaml` (TP8/EP8, concurrency 4–128, two sequence-length configs)
- Perf-changelog entry for the new config

Oseltamivir

LGTM, Recipe in https://cookbook.sglang.io/autoregressive/Qwen/Qwen3.5

hshrivastava-droid added 2 commits April 2, 2026 21:24

add: qwen3.5 MTP

0fe7931

Change PR link in perf-changelog.yaml

545ece0

Updated PR link for Qwen3.5-397B-A17B-FP8 H200 SGLang MTP entry.

hshrivastava-droid requested a review from a team April 3, 2026 19:50

hshrivastava-droid requested review from jgangani and kedarpotdar-nv as code owners April 3, 2026 19:50

github-project-automation bot added this to InferenceMAX Board Apr 3, 2026

Merge branch 'main' into nv/qwen35_h200_v2

041dbe4

hshrivastava-droid added NVIDIA sweep-enabled labels Apr 3, 2026

update PR number

6a2d113

kedarpotdar-nv changed the title from "[WIP] Qwen3.5 h200 sglang MTP" to "Qwen3.5 h200 sglang MTP" Apr 6, 2026

hshrivastava-droid added 2 commits April 6, 2026 13:20

Merge branch 'main' into nv/qwen35_h200_v2

58af92d

Add Qwen3.5-397B-A17B-FP8 H200 SGLang MTP entry

269eca6

kedarpotdar-nv approved these changes Apr 6, 2026


Oseltamivir approved these changes Apr 6, 2026


hshrivastava-droid merged commit 51e31df into main Apr 6, 2026
21 of 22 checks passed

hshrivastava-droid deleted the nv/qwen35_h200_v2 branch April 6, 2026 22:56

github-project-automation bot moved this to Done in InferenceMAX Board Apr 6, 2026

JordanNanos mentioned this pull request Apr 8, 2026

feat: MI300X disaggregated inference with Broadcom IBGDA (#982) #998 (Open)

cquil11 changed the title from "Qwen3.5 h200 sglang MTP" to "[NVIDIA] Qwen3.5 h200 sglang MTP" Apr 8, 2026

hshrivastava-droid merged 6 commits into main from nv/qwen35_h200_v2


4 participants
