
claude: add workload collection skill #182

Draft

yyihuang wants to merge 3 commits into flashinfer-ai:main from yyihuang:workload_collect_skill

Conversation

@yyihuang
Contributor

@yyihuang yyihuang commented Feb 3, 2026

No description provided.

@coderabbitai
Contributor

coderabbitai bot commented Feb 3, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Contributor

Summary of Changes

Hello @yyihuang, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new capability by adding an automated skill for collecting real-world inference workloads. This skill streamlines the process of generating diverse and representative datasets for FlashInfer kernels by integrating SGLang inference with FlashInfer's advanced logging, processing the raw tensor data, and automatically proposing these workloads to a public dataset repository. This enhancement is crucial for improving benchmarking, validation, and optimization efforts for FlashInfer.

Highlights

  • New collect-workloads Skill Introduced: An automated skill has been added to collect real-world workloads from SGLang inference runs, leveraging FlashInfer Level 10 logging to dump tensors.
  • Automated Workload Processing and Submission: The new skill handles the entire pipeline from tensor dumping and sanitization (according to kernel definitions) to the automatic submission of a pull request to the flashinfer-ai/flashinfer-trace HuggingFace dataset repository.
  • Updated Guidance for Model Constant Sourcing: Documentation in add-reference-tests/SKILL.md and extract-kernel-definitions/SKILL.md has been revised to prioritize HuggingFace model pages as the authoritative source for model constants, with SGLang used for runtime-specific values.
  • Integration into Main Workflow and Documentation: The CLAUDE.md document has been updated to include the new collect-workloads skill in the recommended workflow and provides a detailed description of its functionality and parameters.


Changelog
  • .claude/skills/add-reference-tests/SKILL.md
    • Updated the 'For Model Constants' section to refer to HuggingFace + SGLang, linking to extract-kernel-definitions for detailed guidance.
  • .claude/skills/collect-workloads/SKILL.md
    • Added a new skill definition file for collect-workloads, detailing its description, usage, parameters, prerequisites, workflow phases (Environment Setup, Logging Configuration, SGLang Inference, Tensor Dump Processing, Workload Sanitization, PR Submission), output format, implementation steps, dataset format, kernel mapping, error handling, integration with other skills, advanced usage, notes, troubleshooting, and references.
  • .claude/skills/extract-kernel-definitions/SKILL.md
    • Modified the 'For Model Constants' section to emphasize HuggingFace model pages as the primary source for authoritative model constants and SGLang for runtime-specific constants.
  • CLAUDE.md
    • Added a new step '4. Collect real-world workloads from inference runs' to the quickstart guide.
    • Inserted an important note regarding sourcing model constants from HuggingFace model pages in the 'Map Modules to Definitions' section.
    • Added a new section describing the collect-workloads skill, including its parameters, output, and workflow.
    • Updated the 'References' section to include FlashInfer Logging API and the flashinfer-ai/flashinfer-trace dataset.
    • Updated the 'Example Workflow' to include the collect-workloads step and its expected output.
Activity
  • The author yyihuang created this pull request to introduce a new skill for workload collection.
  • The pull request adds a comprehensive new skill definition and updates existing documentation to reflect this new capability and refine guidance on model constant sourcing.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new collect-workloads skill, which is a significant addition for automating the collection of real-world workloads for benchmarking. The skill is well-documented in the new SKILL.md file. The changes also include updates to other documentation files to incorporate this new skill and to refine the process for sourcing model constants, which improves clarity and robustness. My review focuses on the new skill's implementation details as described in the documentation, suggesting improvements for robustness and correctness in the shell commands.

Comment on lines +320 to +321
```bash
cp ../../flashinfer-bench/flashinfer_trace/workloads/$op_type/$def_name.jsonl \
    workloads/$op_type/$def_name.jsonl
```

high

The source path for the cp command appears to be incorrect. Given that the script's working directory is tmp/flashinfer-trace, the path ../../flashinfer-bench/flashinfer_trace/... seems to contain a redundant flashinfer-bench/ segment. The path should likely be relative from the repository root.

Suggested change
```diff
-cp ../../flashinfer-bench/flashinfer_trace/workloads/$op_type/$def_name.jsonl \
+cp ../../flashinfer_trace/workloads/$op_type/$def_name.jsonl \
     workloads/$op_type/$def_name.jsonl
```

Comment on lines +326 to +327
```bash
cp -r ../../flashinfer-bench/flashinfer_trace/workloads/tensors/* \
    workloads/tensors/
```

high

Similar to the previous comment, the source path for copying tensor files seems incorrect due to the redundant flashinfer-bench/ segment in the path.

Suggested change
```diff
-cp -r ../../flashinfer-bench/flashinfer_trace/workloads/tensors/* \
+cp -r ../../flashinfer_trace/workloads/tensors/* \
     workloads/tensors/
```

Comment on lines +123 to +124
```bash
# Wait for server to be ready
sleep 30
```

medium

Using a fixed sleep 30 is unreliable. The server might take longer to start on a slow machine or under load, or it might start faster, making the script wait unnecessarily. A more robust approach is to poll an endpoint on the server in a loop until it's ready.

Suggested change
```diff
-# Wait for server to be ready
-sleep 30
+# Wait for server to be ready
+echo "Waiting for SGLang server to be ready..."
+for i in {1..60}; do
+  if curl -s --fail "http://localhost:30000/v1/models" > /dev/null; then
+    echo "Server is ready."
+    break
+  fi
+  if [ $i -eq 60 ]; then
+    echo "Server did not start within 60 seconds." >&2
+    exit 1
+  fi
+  sleep 1
+done
```

3. **Shutdown Server**:
   ```bash
   # Gracefully shutdown SGLang server
   pkill -f "sglang.launch_server"
   ```

medium

pkill -f can be risky as it might terminate other unrelated processes if their command line matches the pattern. A safer way is to store the process ID (PID) of the background server process and use kill with the specific PID.

When launching the server (around line 113), you can capture the PID:

```bash
python -m sglang.launch_server ... &
SERVER_PID=$!
```

Then, you can use it here to shut down the server.

Suggested change
```diff
-pkill -f "sglang.launch_server"
+kill $SERVER_PID
```
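As an end-to-end sketch of this PID-capture pattern (a background `sleep` stands in for the real `sglang.launch_server` command, which is elided here):

```shell
# Illustrative only: a background sleep stands in for the real server launch.
sleep 300 &
SERVER_PID=$!

# ... run inference requests against the server here ...

# Terminate only the process we launched, then reap it.
kill "$SERVER_PID"
wait "$SERVER_PID" 2>/dev/null || true
echo "server stopped"
```

Unlike `pkill -f`, this cannot match an unrelated process whose command line merely contains the same pattern.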


1. **Locate Dump Directory**:
   ```bash
   DUMP_DIR=$(ls -td workload_dumps_* | head -1)
   ```

medium

Instead of using ls -td to find the dump directory, you can directly use the FLASHINFER_DUMP_DIR environment variable that was set earlier (around line 86). This is more robust and avoids potential race conditions or errors if other directories with similar names exist.

Suggested change
```diff
-DUMP_DIR=$(ls -td workload_dumps_* | head -1)
+DUMP_DIR="$FLASHINFER_DUMP_DIR"
```

```python
tensor = tensors[input_name]

# Decision: random vs saved tensor
if tensor.numel() < 262144:  # < 1MB for fp16
```

medium

The value 262144 is a magic number. While the comment explains it, in a real script it would be better practice to define it as a named constant for clarity and easier maintenance, e.g., MAX_ELEMENTS_FOR_RANDOM = 1 * 1024 * 1024 // 2. This would make the code more readable and easier to modify if the threshold needs to be changed.
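A minimal Python sketch of the named-constant refactor described above (the constant and helper names are illustrative, not taken from the skill itself):

```python
# Hypothetical named constant replacing the magic number 262144.
# 262144 fp16 elements * 2 bytes/element = 512 KiB, which keeps such
# tensors under the ~1 MB threshold the original comment refers to.
MAX_ELEMENTS_FOR_RANDOM = 256 * 1024  # 262144 elements

def use_random_tensor(num_elements: int) -> bool:
    """Regenerate small inputs randomly; keep larger tensors as saved dumps."""
    return num_elements < MAX_ELEMENTS_FOR_RANDOM
```

With the threshold named once at module scope, changing the cutoff later is a one-line edit.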

```bash
/clone-repos

# 2. Extract kernel definitions from model
/extract-kernel-definitions --model-name deepseek_v3
```

medium

There appears to be a typo in the model name. To maintain consistency with other examples in this file and the likely model identifier, it should be deepseek-v3 instead of deepseek_v3.

Suggested change
```diff
-/extract-kernel-definitions --model-name deepseek_v3
+/extract-kernel-definitions --model-name deepseek-v3
```

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new and well-documented collect-workloads skill, which automates the process of collecting real-world workloads from SGLang inference runs. This is a valuable addition for streamlining the benchmarking and testing pipeline. The changes also include updates to other documentation to centralize and clarify the process of sourcing model constants. My review focuses on ensuring the correctness and robustness of the shell commands within the new skill's documentation, and I've identified a critical path issue and a minor inconsistency that should be addressed.

Comment on lines +317 to +329
```bash
# Copy new/updated workload JSONL files
for def_name in $COLLECTED_DEFINITIONS; do
    op_type=$(get_op_type $def_name)
    cp ../../flashinfer-bench/flashinfer_trace/workloads/$op_type/$def_name.jsonl \
        workloads/$op_type/$def_name.jsonl
done

# Copy any new tensor safetensors files
if [ -d "../../flashinfer-bench/flashinfer_trace/workloads/tensors/" ]; then
    cp -r ../../flashinfer-bench/flashinfer_trace/workloads/tensors/* \
        workloads/tensors/
fi
```

critical

The source paths in these cp commands appear to be incorrect. Assuming the script is run from the repository root, after cd tmp/flashinfer-trace, the relative path to the flashinfer_trace directory at the root would be ../../flashinfer_trace, not ../../flashinfer-bench/flashinfer_trace. The extra flashinfer-bench/ directory in the path will likely cause the command to fail.

Suggested change
```diff
 # Copy new/updated workload JSONL files
 for def_name in $COLLECTED_DEFINITIONS; do
     op_type=$(get_op_type $def_name)
-    cp ../../flashinfer-bench/flashinfer_trace/workloads/$op_type/$def_name.jsonl \
+    cp ../../flashinfer_trace/workloads/$op_type/$def_name.jsonl \
         workloads/$op_type/$def_name.jsonl
 done

 # Copy any new tensor safetensors files
-if [ -d "../../flashinfer-bench/flashinfer_trace/workloads/tensors/" ]; then
-    cp -r ../../flashinfer-bench/flashinfer_trace/workloads/tensors/* \
+if [ -d "../../flashinfer_trace/workloads/tensors/" ]; then
+    cp -r ../../flashinfer_trace/workloads/tensors/* \
         workloads/tensors/
 fi
```


1. **Locate Dump Directory**:
   ```bash
   DUMP_DIR=$(ls -td workload_dumps_* | head -1)
   ```

medium

This command to find the latest dump directory is fragile. If other directories matching `workload_dumps_*` are created by other processes concurrently, this could pick the wrong one. A more robust approach would be to store the directory name in a variable when it's created in Phase 2 and reuse it here. For example, in Phase 2, you could do `DUMP_DIR_NAME="workload_dumps_$(date +%Y%m%d_%H%M%S)"` and `export FLASHINFER_DUMP_DIR=./${DUMP_DIR_NAME}`, then use `$DUMP_DIR_NAME` here directly.
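A short sketch of this variable-reuse pattern (the `mkdir -p` and the processing-phase lines are assumptions for illustration):

```shell
# Create the dump directory once, record its name, and reuse the variable
# later instead of globbing for the newest workload_dumps_* directory.
DUMP_DIR_NAME="workload_dumps_$(date +%Y%m%d_%H%M%S)"
export FLASHINFER_DUMP_DIR="./${DUMP_DIR_NAME}"
mkdir -p "$FLASHINFER_DUMP_DIR"

# ... inference phase writes tensor dumps into $FLASHINFER_DUMP_DIR ...

# Processing phase: no ls -td needed, the name is already known.
DUMP_DIR="$FLASHINFER_DUMP_DIR"
echo "Processing dumps in $DUMP_DIR"
```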

```bash
/clone-repos

# 2. Extract kernel definitions from model
/extract-kernel-definitions --model-name deepseek_v3
```

medium

There's an inconsistency in the model name used for deepseek. Here it's deepseek_v3, but elsewhere in this document (e.g., lines 26, 43, 413) and in CLAUDE.md, it's deepseek-v3. Using a consistent naming convention (with a hyphen) will prevent confusion and potential errors.

Suggested change
```diff
-/extract-kernel-definitions --model-name deepseek_v3
+/extract-kernel-definitions --model-name deepseek-v3
```
