
claude: add workload collection skill #182

Draft

yyihuang wants to merge 3 commits into flashinfer-ai:main from yyihuang:workload_collect_skill

Conversation

@yyihuang
Contributor

@yyihuang yyihuang commented Feb 3, 2026

No description provided.

@coderabbitai
Contributor

coderabbitai bot commented Feb 3, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Contributor

Summary of Changes

Hello @yyihuang, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new capability by adding an automated skill for collecting real-world inference workloads. This skill streamlines the process of generating diverse and representative datasets for FlashInfer kernels by integrating SGLang inference with FlashInfer's advanced logging, processing the raw tensor data, and automatically proposing these workloads to a public dataset repository. This enhancement is crucial for improving benchmarking, validation, and optimization efforts for FlashInfer.

Highlights

  • New collect-workloads Skill Introduced: An automated skill has been added to collect real-world workloads from SGLang inference runs, leveraging FlashInfer Level 10 logging to dump tensors.
  • Automated Workload Processing and Submission: The new skill handles the entire pipeline from tensor dumping and sanitization (according to kernel definitions) to the automatic submission of a pull request to the flashinfer-ai/flashinfer-trace HuggingFace dataset repository.
  • Updated Guidance for Model Constant Sourcing: Documentation in add-reference-tests/SKILL.md and extract-kernel-definitions/SKILL.md has been revised to prioritize HuggingFace model pages as the authoritative source for model constants, with SGLang used for runtime-specific values.
  • Integration into Main Workflow and Documentation: The CLAUDE.md document has been updated to include the new collect-workloads skill in the recommended workflow and provides a detailed description of its functionality and parameters.


Changelog
  • .claude/skills/add-reference-tests/SKILL.md
    • Updated the 'For Model Constants' section to refer to HuggingFace + SGLang, linking to extract-kernel-definitions for detailed guidance.
  • .claude/skills/collect-workloads/SKILL.md
    • Added a new skill definition file for collect-workloads, detailing its description, usage, parameters, prerequisites, workflow phases (Environment Setup, Logging Configuration, SGLang Inference, Tensor Dump Processing, Workload Sanitization, PR Submission), output format, implementation steps, dataset format, kernel mapping, error handling, integration with other skills, advanced usage, notes, troubleshooting, and references.
  • .claude/skills/extract-kernel-definitions/SKILL.md
    • Modified the 'For Model Constants' section to emphasize HuggingFace model pages as the primary source for authoritative model constants and SGLang for runtime-specific constants.
  • CLAUDE.md
    • Added a new step '4. Collect real-world workloads from inference runs' to the quickstart guide.
    • Inserted an important note regarding sourcing model constants from HuggingFace model pages in the 'Map Modules to Definitions' section.
    • Added a new section describing the collect-workloads skill, including its parameters, output, and workflow.
    • Updated the 'References' section to include FlashInfer Logging API and the flashinfer-ai/flashinfer-trace dataset.
    • Updated the 'Example Workflow' to include the collect-workloads step and its expected output.
Activity
  • The author yyihuang created this pull request to introduce a new skill for workload collection.
  • The pull request adds a comprehensive new skill definition and updates existing documentation to reflect this new capability and refine guidance on model constant sourcing.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new collect-workloads skill, which is a significant addition for automating the collection of real-world workloads for benchmarking. The skill is well-documented in the new SKILL.md file. The changes also include updates to other documentation files to incorporate this new skill and to refine the process for sourcing model constants, which improves clarity and robustness. My review focuses on the new skill's implementation details as described in the documentation, suggesting improvements for robustness and correctness in the shell commands.

Comment on lines +320 to +321
```bash
cp ../../flashinfer-bench/flashinfer_trace/workloads/$op_type/$def_name.jsonl \
    workloads/$op_type/$def_name.jsonl
```

high

The source path for the cp command appears to be incorrect. Given that the script's working directory is tmp/flashinfer-trace, the path ../../flashinfer-bench/flashinfer_trace/... seems to contain a redundant flashinfer-bench/ segment. The path should likely be relative from the repository root.

Suggested change
```diff
-cp ../../flashinfer-bench/flashinfer_trace/workloads/$op_type/$def_name.jsonl \
+cp ../../flashinfer_trace/workloads/$op_type/$def_name.jsonl \
     workloads/$op_type/$def_name.jsonl
```

Comment on lines +326 to +327
```bash
cp -r ../../flashinfer-bench/flashinfer_trace/workloads/tensors/* \
    workloads/tensors/
```

high

Similar to the previous comment, the source path for copying tensor files seems incorrect due to the redundant flashinfer-bench/ segment in the path.

Suggested change
```diff
-cp -r ../../flashinfer-bench/flashinfer_trace/workloads/tensors/* \
+cp -r ../../flashinfer_trace/workloads/tensors/* \
     workloads/tensors/
```

Comment on lines +123 to +124
```bash
# Wait for server to be ready
sleep 30
```

medium

Using a fixed sleep 30 is unreliable. The server might take longer to start on a slow machine or under load, or it might start faster, making the script wait unnecessarily. A more robust approach is to poll an endpoint on the server in a loop until it's ready.

Suggested change
```diff
-# Wait for server to be ready
-sleep 30
+# Wait for server to be ready
+echo "Waiting for SGLang server to be ready..."
+for i in {1..60}; do
+  if curl -s --fail "http://localhost:30000/v1/models" > /dev/null; then
+    echo "Server is ready."
+    break
+  fi
+  if [ $i -eq 60 ]; then
+    echo "Server did not start within 60 seconds." >&2
+    exit 1
+  fi
+  sleep 1
+done
```

3. **Shutdown Server**:
   ```bash
   # Gracefully shutdown SGLang server
   pkill -f "sglang.launch_server"
   ```

medium

pkill -f can be risky as it might terminate other unrelated processes if their command line matches the pattern. A safer way is to store the process ID (PID) of the background server process and use kill with the specific PID.

When launching the server (around line 113), you can capture the PID:

```bash
python -m sglang.launch_server ... &
SERVER_PID=$!
```

Then, you can use it here to shut down the server.

Suggested change
```diff
-pkill -f "sglang.launch_server"
+kill $SERVER_PID
```
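As an end-to-end sketch of this PID-capture pattern (a background `sleep` stands in for the real `sglang.launch_server` command, which is elided here):

```shell
# Illustrative only: a background sleep stands in for the real server launch.
sleep 300 &
SERVER_PID=$!

# ... run inference requests against the server here ...

# Terminate only the process we launched, then reap it.
kill "$SERVER_PID"
wait "$SERVER_PID" 2>/dev/null || true
echo "server stopped"
```

Unlike `pkill -f`, this cannot match an unrelated process whose command line merely contains the same pattern.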


1. **Locate Dump Directory**:
   ```bash
   DUMP_DIR=$(ls -td workload_dumps_* | head -1)
   ```

medium

Instead of using ls -td to find the dump directory, you can directly use the FLASHINFER_DUMP_DIR environment variable that was set earlier (around line 86). This is more robust and avoids potential race conditions or errors if other directories with similar names exist.

Suggested change
```diff
-DUMP_DIR=$(ls -td workload_dumps_* | head -1)
+DUMP_DIR="$FLASHINFER_DUMP_DIR"
```

```python
tensor = tensors[input_name]

# Decision: random vs saved tensor
if tensor.numel() < 262144:  # < 1MB for fp16
```

medium

The value 262144 is a magic number. While the comment explains it, in a real script it would be better practice to define it as a named constant for clarity and easier maintenance, e.g., MAX_ELEMENTS_FOR_RANDOM = 1 * 1024 * 1024 // 2. This would make the code more readable and easier to modify if the threshold needs to be changed.
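A minimal Python sketch of the named-constant refactor described above (the constant and helper names are illustrative, not taken from the skill itself):

```python
# Hypothetical named constant replacing the magic number 262144.
# 262144 fp16 elements * 2 bytes/element = 512 KiB, which keeps such
# tensors under the ~1 MB threshold the original comment refers to.
MAX_ELEMENTS_FOR_RANDOM = 256 * 1024  # 262144 elements

def use_random_tensor(num_elements: int) -> bool:
    """Regenerate small inputs randomly; keep larger tensors as saved dumps."""
    return num_elements < MAX_ELEMENTS_FOR_RANDOM
```

With the threshold named once at module scope, changing the cutoff later is a one-line edit.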

```bash
/clone-repos

# 2. Extract kernel definitions from model
/extract-kernel-definitions --model-name deepseek_v3
```

medium

There appears to be a typo in the model name. To maintain consistency with other examples in this file and the likely model identifier, it should be deepseek-v3 instead of deepseek_v3.

Suggested change
```diff
-/extract-kernel-definitions --model-name deepseek_v3
+/extract-kernel-definitions --model-name deepseek-v3
```

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new and well-documented collect-workloads skill, which automates the process of collecting real-world workloads from SGLang inference runs. This is a valuable addition for streamlining the benchmarking and testing pipeline. The changes also include updates to other documentation to centralize and clarify the process of sourcing model constants. My review focuses on ensuring the correctness and robustness of the shell commands within the new skill's documentation, and I've identified a critical path issue and a minor inconsistency that should be addressed.

Comment on lines +317 to +329
```bash
# Copy new/updated workload JSONL files
for def_name in $COLLECTED_DEFINITIONS; do
    op_type=$(get_op_type $def_name)
    cp ../../flashinfer-bench/flashinfer_trace/workloads/$op_type/$def_name.jsonl \
        workloads/$op_type/$def_name.jsonl
done

# Copy any new tensor safetensors files
if [ -d "../../flashinfer-bench/flashinfer_trace/workloads/tensors/" ]; then
    cp -r ../../flashinfer-bench/flashinfer_trace/workloads/tensors/* \
        workloads/tensors/
fi
```

critical

The source paths in these cp commands appear to be incorrect. Assuming the script is run from the repository root, after cd tmp/flashinfer-trace, the relative path to the flashinfer_trace directory at the root would be ../../flashinfer_trace, not ../../flashinfer-bench/flashinfer_trace. The extra flashinfer-bench/ directory in the path will likely cause the command to fail.

Suggested change
```diff
 # Copy new/updated workload JSONL files
 for def_name in $COLLECTED_DEFINITIONS; do
     op_type=$(get_op_type $def_name)
-    cp ../../flashinfer-bench/flashinfer_trace/workloads/$op_type/$def_name.jsonl \
+    cp ../../flashinfer_trace/workloads/$op_type/$def_name.jsonl \
         workloads/$op_type/$def_name.jsonl
 done

 # Copy any new tensor safetensors files
-if [ -d "../../flashinfer-bench/flashinfer_trace/workloads/tensors/" ]; then
-    cp -r ../../flashinfer-bench/flashinfer_trace/workloads/tensors/* \
+if [ -d "../../flashinfer_trace/workloads/tensors/" ]; then
+    cp -r ../../flashinfer_trace/workloads/tensors/* \
         workloads/tensors/
 fi
```


1. **Locate Dump Directory**:
   ```bash
   DUMP_DIR=$(ls -td workload_dumps_* | head -1)
   ```

medium

This command to find the latest dump directory is fragile. If other directories matching `workload_dumps_*` are created by other processes concurrently, this could pick the wrong one. A more robust approach would be to store the directory name in a variable when it's created in Phase 2 and reuse it here. For example, in Phase 2, you could do `DUMP_DIR_NAME="workload_dumps_$(date +%Y%m%d_%H%M%S)"` and `export FLASHINFER_DUMP_DIR=./${DUMP_DIR_NAME}`, then use `$DUMP_DIR_NAME` here directly.
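A short sketch of this variable-reuse pattern (the `mkdir -p` and the processing-phase lines are assumptions for illustration):

```shell
# Create the dump directory once, record its name, and reuse the variable
# later instead of globbing for the newest workload_dumps_* directory.
DUMP_DIR_NAME="workload_dumps_$(date +%Y%m%d_%H%M%S)"
export FLASHINFER_DUMP_DIR="./${DUMP_DIR_NAME}"
mkdir -p "$FLASHINFER_DUMP_DIR"

# ... inference phase writes tensor dumps into $FLASHINFER_DUMP_DIR ...

# Processing phase: no ls -td needed, the name is already known.
DUMP_DIR="$FLASHINFER_DUMP_DIR"
echo "Processing dumps in $DUMP_DIR"
```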

```bash
/clone-repos

# 2. Extract kernel definitions from model
/extract-kernel-definitions --model-name deepseek_v3
```

medium

There's an inconsistency in the model name used for deepseek. Here it's deepseek_v3, but elsewhere in this document (e.g., lines 26, 43, 413) and in CLAUDE.md, it's deepseek-v3. Using a consistent naming convention (with a hyphen) will prevent confusion and potential errors.

Suggested change
```diff
-/extract-kernel-definitions --model-name deepseek_v3
+/extract-kernel-definitions --model-name deepseek-v3
```
