Improve: Implement accurate token estimation for cost calculations #52

@evansantos

Description

GitSniff finding from PR #49 (Warning level - non-blocking)

The current implementation uses a rough approximation: Math.ceil(text.length / 4)

Different embedding models use different tokenization schemes, so this character-based approximation may lead to:

  • Inaccurate cost tracking for non-English texts
  • Misleading budget alerts
  • Discrepancies with actual API billing

Suggested improvements:

  1. Integrate proper tokenizer (tiktoken for OpenAI models)
  2. Add configurable charsPerToken ratio per model
  3. Document limitation clearly in config schema

Priority: Medium - impacts cost tracking accuracy

See: https://github.com/openai/tiktoken
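Suggestion (2) could be sketched roughly as below. The model names and ratios here are illustrative assumptions, not measured values; for exact counts, a real tokenizer (tiktoken, linked above) would replace the ratio lookup.

```typescript
// Illustrative per-model chars-per-token table; ratios are assumptions,
// not calibrated values, and would need tuning per model and language.
const CHARS_PER_TOKEN: Record<string, number> = {
  "text-embedding-3-small": 4,
  "text-embedding-3-large": 4,
};

// Fallback matches the current behavior: ~4 characters per token.
const DEFAULT_CHARS_PER_TOKEN = 4;

function estimateTokens(text: string, model?: string): number {
  const ratio =
    (model !== undefined ? CHARS_PER_TOKEN[model] : undefined) ??
    DEFAULT_CHARS_PER_TOKEN;
  return Math.ceil(text.length / ratio);
}
```

This keeps the cheap heuristic as the default while letting the config schema document the limitation and override the ratio per model (suggestion 3).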

Metadata


Labels: enhancement (New feature or request)
