claude: add workload collection skill #182

yyihuang wants to merge 3 commits into flashinfer-ai:main
Conversation
**Summary of Changes** (Gemini Code Assist): This pull request introduces a significant new capability by adding an automated skill for collecting real-world inference workloads. The skill streamlines the process of generating diverse, representative datasets for FlashInfer kernels by integrating SGLang inference with FlashInfer's advanced logging, processing the raw tensor data, and automatically proposing these workloads to a public dataset repository. This enhancement is crucial for improving benchmarking, validation, and optimization efforts for FlashInfer.
Activity
Code Review
This pull request introduces a new `collect-workloads` skill, which is a significant addition for automating the collection of real-world workloads for benchmarking. The skill is well-documented in the new `SKILL.md` file. The changes also include updates to other documentation files to incorporate this new skill and to refine the process for sourcing model constants, which improves clarity and robustness. My review focuses on the new skill's implementation details as described in the documentation, suggesting improvements for robustness and correctness in the shell commands.
```bash
cp ../../flashinfer-bench/flashinfer_trace/workloads/$op_type/$def_name.jsonl \
  workloads/$op_type/$def_name.jsonl
```
The source path for the `cp` command appears to be incorrect. Given that the script's working directory is `tmp/flashinfer-trace`, the path `../../flashinfer-bench/flashinfer_trace/...` seems to contain a redundant `flashinfer-bench/` segment. The path should likely be relative from the repository root.
**Suggested change:**
```bash
cp ../../flashinfer_trace/workloads/$op_type/$def_name.jsonl \
  workloads/$op_type/$def_name.jsonl
```
```bash
cp -r ../../flashinfer-bench/flashinfer_trace/workloads/tensors/* \
  workloads/tensors/
```
Similar to the previous comment, the source path for copying tensor files seems incorrect due to the redundant `flashinfer-bench/` segment in the path.
**Suggested change:**
```bash
cp -r ../../flashinfer_trace/workloads/tensors/* \
  workloads/tensors/
```
```bash
# Wait for server to be ready
sleep 30
```
Using a fixed `sleep 30` is unreliable. The server might take longer to start on a slow machine or under load, or it might start faster, making the script wait unnecessarily. A more robust approach is to poll an endpoint on the server in a loop until it's ready.
**Suggested change:**
```bash
# Wait for server to be ready
echo "Waiting for SGLang server to be ready..."
for i in {1..60}; do
  if curl -s --fail "http://localhost:30000/v1/models" > /dev/null; then
    echo "Server is ready."
    break
  fi
  if [ $i -eq 60 ]; then
    echo "Server did not start within 60 seconds." >&2
    exit 1
  fi
  sleep 1
done
```
3. **Shutdown Server**:
```bash
# Gracefully shutdown SGLang server
pkill -f "sglang.launch_server"
```
`pkill -f` can be risky as it might terminate other unrelated processes if their command line matches the pattern. A safer way is to store the process ID (PID) of the background server process and use `kill` with the specific PID.

When launching the server (around line 113), you can capture the PID:

```bash
python -m sglang.launch_server ... &
SERVER_PID=$!
```

Then, you can use it here to shut down the server.

**Suggested change:**
```bash
kill $SERVER_PID
```
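To make the PID-based shutdown concrete, here is a minimal sketch of the launch-and-cleanup pattern the reviewer describes. `sleep 300` stands in for the actual `python -m sglang.launch_server ...` invocation (which is elided above), and the `trap` is an extra safeguard beyond what the review suggests:

```shell
# Launch a long-running process in the background and record its PID.
# "sleep 300" is a stand-in for the real server command.
sleep 300 &
SERVER_PID=$!

# Safety net: kill the server even if the script exits early.
trap 'kill "$SERVER_PID" 2>/dev/null || true' EXIT

# ... run inference traffic against the server here ...

# Explicit graceful shutdown of exactly this process.
kill "$SERVER_PID"
wait "$SERVER_PID" 2>/dev/null || true
```

Unlike `pkill -f`, this can never touch an unrelated process whose command line happens to match, and the `wait` reaps the child so no zombie is left behind.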
1. **Locate Dump Directory**:
```bash
DUMP_DIR=$(ls -td workload_dumps_* | head -1)
```
Instead of using `ls -td` to find the dump directory, you can directly use the `FLASHINFER_DUMP_DIR` environment variable that was set earlier (around line 86). This is more robust and avoids potential race conditions or errors if other directories with similar names exist.
**Suggested change:**
```bash
DUMP_DIR="$FLASHINFER_DUMP_DIR"
```
```python
tensor = tensors[input_name]

# Decision: random vs saved tensor
if tensor.numel() < 262144:  # < 1MB for fp16
```
The value `262144` is a magic number. While the comment explains it, in a real script it would be better practice to define it as a named constant for clarity and easier maintenance, e.g., `MAX_ELEMENTS_FOR_RANDOM = 1 * 1024 * 1024 // 2`. This would make the code more readable and easier to modify if the threshold needs to be changed.
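As a sketch of that refactor (the helper name below is illustrative, not from the PR): note that the literal `262144` actually corresponds to 512 KiB of fp16 data (262144 elements × 2 bytes), so the "< 1MB" comment and the constant derivation should be reconciled one way or the other. This sketch derives the constant from the literal:

```python
# Named constants make the threshold self-documenting and easy to change.
# NOTE: 262144 fp16 elements = 512 KiB, not 1 MB as the original comment says;
# the derivation below matches the literal, not the comment.
FP16_BYTES_PER_ELEMENT = 2
MAX_TENSOR_BYTES_FOR_RANDOM = 512 * 1024  # matches the original literal 262144
MAX_ELEMENTS_FOR_RANDOM = MAX_TENSOR_BYTES_FOR_RANDOM // FP16_BYTES_PER_ELEMENT

def use_random_tensor(numel: int) -> bool:
    """Decide whether a tensor is small enough to regenerate randomly."""
    return numel < MAX_ELEMENTS_FOR_RANDOM
```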
```bash
/clone-repos

# 2. Extract kernel definitions from model
/extract-kernel-definitions --model-name deepseek_v3
```
There appears to be a typo in the model name. To maintain consistency with other examples in this file and the likely model identifier, it should be `deepseek-v3` instead of `deepseek_v3`.
**Suggested change:**
```bash
/extract-kernel-definitions --model-name deepseek-v3
```
Code Review
This pull request introduces a new and well-documented `collect-workloads` skill, which automates the process of collecting real-world workloads from SGLang inference runs. This is a valuable addition for streamlining the benchmarking and testing pipeline. The changes also include updates to other documentation to centralize and clarify the process of sourcing model constants. My review focuses on ensuring the correctness and robustness of the shell commands within the new skill's documentation, and I've identified a critical path issue and a minor inconsistency that should be addressed.
```bash
# Copy new/updated workload JSONL files
for def_name in $COLLECTED_DEFINITIONS; do
  op_type=$(get_op_type $def_name)
  cp ../../flashinfer-bench/flashinfer_trace/workloads/$op_type/$def_name.jsonl \
    workloads/$op_type/$def_name.jsonl
done

# Copy any new tensor safetensors files
if [ -d "../../flashinfer-bench/flashinfer_trace/workloads/tensors/" ]; then
  cp -r ../../flashinfer-bench/flashinfer_trace/workloads/tensors/* \
    workloads/tensors/
fi
```
The source paths in these `cp` commands appear to be incorrect. Assuming the script is run from the repository root, after `cd tmp/flashinfer-trace` the relative path to the `flashinfer_trace` directory at the root would be `../../flashinfer_trace`, not `../../flashinfer-bench/flashinfer_trace`. The extra `flashinfer-bench/` directory in the path will likely cause the command to fail.
**Suggested change:**
```bash
# Copy new/updated workload JSONL files
for def_name in $COLLECTED_DEFINITIONS; do
  op_type=$(get_op_type $def_name)
  cp ../../flashinfer_trace/workloads/$op_type/$def_name.jsonl \
    workloads/$op_type/$def_name.jsonl
done

# Copy any new tensor safetensors files
if [ -d "../../flashinfer_trace/workloads/tensors/" ]; then
  cp -r ../../flashinfer_trace/workloads/tensors/* \
    workloads/tensors/
fi
```
1. **Locate Dump Directory**:
```bash
DUMP_DIR=$(ls -td workload_dumps_* | head -1)
```
This command to find the latest dump directory is fragile. If other directories matching `workload_dumps_*` are created concurrently by other processes, this could pick the wrong one. A more robust approach would be to store the directory name in a variable when it's created in Phase 2 and reuse it here. For example, in Phase 2 you could do `DUMP_DIR_NAME="workload_dumps_$(date +%Y%m%d_%H%M%S)"` and `export FLASHINFER_DUMP_DIR=./${DUMP_DIR_NAME}`, then use `$DUMP_DIR_NAME` here directly.
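A minimal sketch of that variable-based approach, with the directory naming taken from the comment above (the `mkdir` and the later reuse are illustrative):

```shell
# Phase 2: name the dump directory once and export it for later phases.
DUMP_DIR_NAME="workload_dumps_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$DUMP_DIR_NAME"
export FLASHINFER_DUMP_DIR="./${DUMP_DIR_NAME}"

# Later phase: reuse the exported path; no fragile "ls -td" globbing needed.
DUMP_DIR="$FLASHINFER_DUMP_DIR"
```

Because the name is fixed once at creation time, concurrent processes creating their own `workload_dumps_*` directories can no longer be picked up by mistake.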
```bash
/clone-repos

# 2. Extract kernel definitions from model
/extract-kernel-definitions --model-name deepseek_v3
```
There's an inconsistency in the model name used for DeepSeek. Here it's `deepseek_v3`, but elsewhere in this document (e.g., lines 26, 43, 413) and in CLAUDE.md it's `deepseek-v3`. Using a consistent naming convention (with a hyphen) will prevent confusion and potential errors.
**Suggested change:**
```bash
/extract-kernel-definitions --model-name deepseek-v3
```