GitSniff finding from PR #49 (Warning level - non-blocking)
Current implementation uses a rough approximation: Math.ceil(text.length / 4)
Different embedding models use different tokenization methods, so this approximation may lead to:
- Inaccurate cost tracking for non-English texts
- Misleading budget alerts
- Discrepancies with actual API billing
Suggested improvements:
- Integrate proper tokenizer (tiktoken for OpenAI models)
- Add configurable charsPerToken ratio per model
- Document limitation clearly in config schema
Priority: Medium - impacts cost tracking accuracy
See: https://github.com/openai/tiktoken
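The configurable-ratio suggestion could be sketched roughly as follows. Note this is an illustrative sketch only: the names `charsPerTokenByModel` and `estimateTokens`, and the per-model ratio values, are hypothetical and not taken from the codebase; a real tokenizer such as tiktoken would still be more accurate.

```javascript
// Hypothetical per-model chars-per-token ratios; values here are placeholders,
// not measured constants. Non-English text often has a lower ratio.
const charsPerTokenByModel = {
  "text-embedding-3-small": 4,
  "text-embedding-3-large": 4,
};

const DEFAULT_CHARS_PER_TOKEN = 4;

// Still an approximation -- it only makes the ratio configurable per model
// instead of hard-coding 4. Integrating a proper tokenizer remains preferable.
function estimateTokens(text, model) {
  const ratio = charsPerTokenByModel[model] ?? DEFAULT_CHARS_PER_TOKEN;
  return Math.ceil(text.length / ratio);
}
```

This keeps the current behavior as the default while letting the ratio be tuned per model, and the limitation can then be documented next to the `charsPerToken` setting in the config schema.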