Required prerequisites
Motivation
The current token counting implementation using BaseTokenCounter and its subclasses (OpenAITokenCounter,
AnthropicTokenCounter, LiteLLMTokenCounter, MistralTokenCounter) presents several significant challenges:
- Accuracy Issues: Manual token counting via tiktoken and other tokenizers is prone to inaccuracies, especially
with:
- Different model-specific tokenization rules (GPT-3.5, GPT-4, O1 models each have different tokens_per_message
and tokens_per_name values)
- Image token calculations for vision models requiring complex logic
- Model-specific edge cases and special tokens
- Streaming Mode Limitations: Token counting in streaming mode is particularly problematic as:
- The full response isn't available until streaming completes
- Manual accumulation of streamed chunks is error-prone
- OpenAI now supports stream_options: {"include_usage": true} to get accurate usage in the final chunk
- Maintenance Burden: Supporting all models requires:
- Model-specific token counter implementations for each provider
- Keeping up with changes in tokenization rules
- Complex logic for different content types (text, images, structured outputs)
Proposed Solution
Deprecate BaseTokenCounter and its implementations in favor of using the native usage data from LLM responses:
- OpenAI/Compatible APIs: Use response.usage which provides accurate prompt_tokens, completion_tokens, and
total_tokens
- Streaming: Leverage stream_options: {"include_usage": true} to get usage data in the final streamed chunk
- Other providers: Each provider's SDK returns usage information in their response objects
Benefits
- Accuracy: Usage data comes directly from the model provider, ensuring 100% accuracy
- Simplicity: Eliminates ~500+ lines of complex token counting code
- Maintainability: No need to update tokenization logic when providers change their models
- Streaming support: Native support for token usage in streaming responses
- Universal compatibility: All major LLM providers include usage data in their responses
Migration Path
- Update model implementations to extract and return usage data from native responses
- Provide a deprecation warning for BaseTokenCounter usage
- Update documentation and examples to use the new approach
- Remove BaseTokenCounter and related code in a future major version
Code References
- Token counting implementation: camel/utils/token_counting.py:77-544
- Usage data already captured in some models: camel/models/litellm_model.py:217
- Streaming with usage example: examples/agents/chatagent_stream.py:44
Solution
No response
Alternatives
No response
Additional context
No response
Required prerequisites
Motivation
The current token counting implementation using BaseTokenCounter and its subclasses (OpenAITokenCounter,
AnthropicTokenCounter, LiteLLMTokenCounter, MistralTokenCounter) presents several significant challenges:
with:
- Different model-specific tokenization rules (GPT-3.5, GPT-4, O1 models each have different tokens_per_message
and tokens_per_name values)
- Image token calculations for vision models requiring complex logic
- Model-specific edge cases and special tokens
- The full response isn't available until streaming completes
- Manual accumulation of streamed chunks is error-prone
- OpenAI now supports stream_options: {"include_usage": true} to get accurate usage in the final chunk
- Model-specific token counter implementations for each provider
- Keeping up with changes in tokenization rules
- Complex logic for different content types (text, images, structured outputs)
Proposed Solution
Deprecate BaseTokenCounter and its implementations in favor of using the native usage data from LLM responses:
total_tokens
Benefits
Migration Path
Code References
Solution
No response
Alternatives
No response
Additional context
No response