A script to replay Mooncake traces (https://github.com/kvcache-ai/Mooncake/blob/main/mooncake_trace.jsonl) against vLLM servers for performance testing and benchmarking.
- vLLM (installed via pip)
- Required Python packages:
pip install vllm transformers aiohttp
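A quick way to confirm the dependencies are importable before running anything (a minimal sketch; the package names are taken from the pip command above):

```python
import importlib.util

# Check that each required package can be located without fully importing it.
required = ("vllm", "transformers", "aiohttp")
missing = [pkg for pkg in required if importlib.util.find_spec(pkg) is None]

if missing:
    print("missing packages:", ", ".join(missing))
else:
    print("all dependencies found")
```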
# Run from outside the vLLM source directory to avoid import conflicts
cd /home/ie-user
source kobe/vllm/.venv/bin/activate
vllm serve NousResearch/Llama-3.2-1B

cd /path/to/vllm/source
bash run_mooncake_replay.sh

Modify these environment variables in run_mooncake_replay.sh, or set them in your shell before running:
MODEL="NousResearch/Llama-3.2-1B" # Model to test
HOST="localhost" # Server host
PORT="8000" # Server port
BACKEND="vllm" # Backend type
DURATION="60" # Test duration (seconds)
TIME_SCALE="1.0" # Speed up/slow down replay
PRESERVE_TIMING="true" # Keep original request timing

Example output:

Successful requests: 313
Failed requests: 0
Total duration: 60.48s
Mean TTFT: 2187.98ms
Mean TPOT: 26.59ms
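For reference, TTFT is time to first token and TPOT is time per output token; TPOT is commonly derived from end-to-end latency as decode time divided by the number of tokens after the first (a sketch of that common formula; the replay script's exact computation may differ):

```python
def mean_tpot_ms(total_latency_ms: float, ttft_ms: float, output_tokens: int) -> float:
    """Time per output token: decode time spread over tokens after the first."""
    if output_tokens <= 1:
        return 0.0  # no decode steps to average over
    return (total_latency_ms - ttft_ms) / (output_tokens - 1)

# Hypothetical request: 5000 ms end to end, 2187.98 ms TTFT, 100 output tokens.
print(round(mean_tpot_ms(5000.0, 2187.98, 100), 2))
```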
Results are saved to mooncake_replay_results.json with detailed metrics.
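A minimal sketch for inspecting the results file afterwards (the field names inside the JSON are not documented above and are assumptions here, so this simply prints whatever scalar metrics appear at the top level):

```python
import json

def summarize(path: str) -> dict:
    """Load a replay results file and print its top-level scalar metrics."""
    with open(path) as f:
        results = json.load(f)
    for key, value in results.items():
        # Skip nested per-request data; show only scalar summary fields.
        if isinstance(value, (int, float, str)):
            print(f"{key}: {value}")
    return results
```

Call `summarize("mooncake_replay_results.json")` after a run to get a quick summary.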
- The script preserves original request timing for realistic load testing
- Multiple requests run concurrently to simulate real traffic patterns
- Ensure the vLLM server is running and accessible before starting the replay
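The last point can be automated: vLLM's OpenAI-compatible server exposes a `/health` endpoint. A minimal sketch, assuming the default host and port from the configuration above:

```python
import urllib.error
import urllib.request

def server_ready(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if the vLLM server answers its /health endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(f"http://{host}:{port}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Run `server_ready()` before launching the replay, and abort (or retry with backoff) if it returns False.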