The simulator offers flexible tokenization to balance accuracy vs. performance. The mode is automatically selected based on the model name provided in the --model argument.
HuggingFace tokenization is activated when the --model parameter specifies a valid HuggingFace model ID (e.g., meta-llama/Llama-3.1-8B-Instruct or Qwen/Qwen2.5-1.5B-Instruct).
- Behavior: The simulator downloads the actual tokenizer from HuggingFace and uses it for incoming requests.
- Accuracy: Ensures exact token counts and boundaries.
- Requirements:
  - Network Access: Required on the first run to download the tokenizer.
  - Authentication: Requires the HF_TOKEN environment variable if the model is gated or private.
  - Storage: Tokenizers are stored in the directory specified by --tokenizers-cache-dir (default: hf_cache).
- Performance: Adds startup overhead (downloading/loading) and runtime overhead (tokenization logic).
Simulated tokenization is activated when the --model parameter specifies a name that does not exist on HuggingFace (e.g., my-fake-model, test-cluster-1, gpt-4-turbo-sim).
- Behavior: Uses a simple regex-based tokenizer to split text and generates token hashes using the FNV-32a algorithm.
- Accuracy: Approximate. Token boundaries will not match real models.
- Pros:
  - Zero Startup Overhead: No downloads or heavy model loading.
  - High Throughput: Ideal for infrastructure testing where exact token boundaries are irrelevant.
  - No Network/Auth: Works completely offline without API tokens.
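
To make the simulated mode more concrete, here is a minimal Python sketch of the general idea: split text with a regex, then hash each piece with 32-bit FNV-1a. It is an illustration only, not the simulator's actual code. The regex pattern and the function names (`split_text`, `fnv1a_32`, `simulated_tokenize`) are assumptions chosen for this example; the simulator's real pattern is not documented here.

```python
import re
from typing import List

# Standard 32-bit FNV-1a constants.
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193

# Hypothetical pattern: runs of word characters, or single punctuation marks.
# The simulator's actual regex may differ.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def split_text(text: str) -> List[str]:
    """Split text into approximate tokens using the regex above."""
    return TOKEN_PATTERN.findall(text)


def fnv1a_32(data: bytes) -> int:
    """Compute the 32-bit FNV-1a hash of a byte string."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte                           # XOR the byte in first (the "1a" variant)
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # multiply, then wrap to 32 bits
    return h


def simulated_tokenize(text: str) -> List[int]:
    """Map each regex token to a hash that stands in for a token ID."""
    return [fnv1a_32(tok.encode("utf-8")) for tok in split_text(text)]
```

Because each hash depends only on the token text, the same input always yields the same IDs, with no vocabulary file or download involved. The resulting boundaries and IDs will not match those of any real model's tokenizer.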
Important: If you want to avoid the time and network overhead of HuggingFace tokenization:
- Use a "fake" or non-existent model name (e.g., --model fake-model)
- This is recommended for testing scenarios where exact tokenization accuracy is not required
- HuggingFace tokenization is only necessary when you need accurate token counts matching actual HuggingFace models
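
The selection rule described above can be sketched in a few lines of Python. This is a hedged illustration rather than the simulator's implementation: how the simulator actually checks whether a model exists on HuggingFace is not stated here, so the check is passed in as a function. The names `TokenizationMode`, `select_mode`, and `fake_lookup` are hypothetical.

```python
from enum import Enum
from typing import Callable


class TokenizationMode(Enum):
    HUGGINGFACE = "huggingface"
    SIMULATED = "simulated"


def select_mode(model: str, exists_on_hub: Callable[[str], bool]) -> TokenizationMode:
    """Pick the tokenization mode from the model name alone.

    exists_on_hub is an injected check (an assumption of this sketch) that
    reports whether the name is a valid HuggingFace model ID.
    """
    if exists_on_hub(model):
        return TokenizationMode.HUGGINGFACE
    return TokenizationMode.SIMULATED


# Stand-in lookup for the example; a real check would query HuggingFace.
KNOWN_MODELS = {
    "meta-llama/Llama-3.1-8B-Instruct",
    "Qwen/Qwen2.5-1.5B-Instruct",
}


def fake_lookup(model: str) -> bool:
    return model in KNOWN_MODELS
```

With this stand-in, `select_mode("fake-model", fake_lookup)` yields the simulated mode, while a name in `KNOWN_MODELS` yields the HuggingFace mode.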
| Parameter | Description | Default |
|---|---|---|
| --model | The model name. Determines the tokenization mode. (Mandatory) | |
| --tokenizers-cache-dir | Directory for caching HuggingFace tokenizers | hf_cache |
| HF_TOKEN | Environment variable for authenticating with HuggingFace | |
Running with HuggingFace Tokenization:

    export HF_TOKEN=hf_...
    ./bin/llm-d-inference-sim --model meta-llama/Llama-3.1-8B-Instruct --tokenizers-cache-dir /tmp/hf_cache

Running with Simulated Tokenization:

    ./bin/llm-d-inference-sim --model test-sim-model