[Feature Request] Deprecate current token usage calculation

### Required prerequisites

- [x] I have searched the [Issue Tracker](https://github.com/camel-ai/camel/issues) and [Discussions](https://github.com/camel-ai/camel/discussions) that this hasn't already been reported. (+1 or comment there if it has.)
- [ ] Consider asking first in a [Discussion](https://github.com/camel-ai/camel/discussions/new).

### Motivation

The current token counting implementation, `BaseTokenCounter` and its subclasses (`OpenAITokenCounter`, `AnthropicTokenCounter`, `LiteLLMTokenCounter`, `MistralTokenCounter`), presents several significant challenges:

1. Accuracy issues: Manual token counting via tiktoken and other tokenizers is prone to inaccuracies, especially with:
   - Different model-specific tokenization rules (GPT-3.5, GPT-4, and o1 models each have different `tokens_per_message` and `tokens_per_name` values)
   - Image token calculations for vision models, which require complex logic
   - Model-specific edge cases and special tokens
2. Streaming mode limitations: Token counting in streaming mode is particularly problematic because:
   - The full response isn't available until streaming completes
   - Manual accumulation of streamed chunks is error-prone
   - OpenAI now supports `stream_options: {"include_usage": true}`, which returns accurate usage in the final chunk, so manual counting duplicates what the API already reports
3. Maintenance burden: Supporting all models requires:
   - Model-specific token counter implementations for each provider
   - Keeping up with changes in tokenization rules
   - Complex logic for different content types (text, images, structured outputs)

#### Proposed Solution

Deprecate `BaseTokenCounter` and its implementations in favor of the native usage data returned in LLM responses:

- OpenAI and compatible APIs: Use `response.usage`, which provides accurate `prompt_tokens`, `completion_tokens`, and `total_tokens`
- Streaming: Use `stream_options: {"include_usage": true}` to get usage data in the final streamed chunk
- Other providers: Each provider's SDK returns usage information in its response objects
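
The streaming case can be sketched as follows. This is a minimal illustration, not CAMEL's implementation: `collect_stream` is a hypothetical helper, and it assumes OpenAI-style chunks where text arrives in `chunk.choices[0].delta.content` and, with `include_usage` enabled, the final chunk carries an empty `choices` list and a populated `usage`. Fake chunks stand in for a live stream so the example runs offline.

```python
from types import SimpleNamespace as NS
from typing import Any, Iterable, Optional, Tuple


def collect_stream(chunks: Iterable[Any]) -> Tuple[str, Optional[Any]]:
    """Join streamed text deltas and return the provider-reported usage.

    Hypothetical helper. Assumes OpenAI-style chunk objects; `usage` is
    None on every chunk except the last one when include_usage is set.
    """
    parts = []
    usage = None
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        # The usage-only final chunk has an empty `choices` list.
        for choice in chunk.choices:
            content = getattr(choice.delta, "content", None)
            if content:
                parts.append(content)
    return "".join(parts), usage


# Fake chunks shaped like an OpenAI stream (illustrative values only).
fake_chunks = [
    NS(choices=[NS(delta=NS(content="Hel"))], usage=None),
    NS(choices=[NS(delta=NS(content="lo"))], usage=None),
    NS(choices=[], usage=NS(prompt_tokens=5, completion_tokens=2, total_tokens=7)),
]
text, usage = collect_stream(fake_chunks)

# With the real SDK (not executed here), the stream would come from:
#   stream = client.chat.completions.create(
#       model=..., messages=..., stream=True,
#       stream_options={"include_usage": True},
#   )
#   text, usage = collect_stream(stream)
```

If the stream ends without a usage chunk (for example, when `include_usage` is not set), the helper returns `None` for usage, so callers can decide whether to fall back to an estimate.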

#### Benefits

1. Accuracy: Usage data comes directly from the model provider, so counts match what the provider itself reports rather than a local estimate
2. Simplicity: Eliminates roughly 500 lines of complex token counting code
3. Maintainability: No need to update tokenization logic when providers change their models
4. Streaming support: Native support for token usage in streaming responses
5. Broad compatibility: All major LLM providers include usage data in their responses
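
Providers report usage under different field names, so relying on native data likely needs a thin adapter. The sketch below is an assumption about how that could look, not existing CAMEL code: `Usage` and `normalize_usage` are hypothetical names, and only two payload shapes are handled (OpenAI-style `prompt_tokens`/`completion_tokens` and Anthropic-style `input_tokens`/`output_tokens`).

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Usage:
    """Hypothetical provider-neutral usage record (not a CAMEL class)."""

    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


def normalize_usage(raw: Mapping[str, Any]) -> Usage:
    """Map a provider's usage payload onto the neutral record."""
    if "prompt_tokens" in raw:  # OpenAI-style field names
        return Usage(raw["prompt_tokens"], raw["completion_tokens"])
    if "input_tokens" in raw:  # Anthropic-style field names
        return Usage(raw["input_tokens"], raw["output_tokens"])
    raise ValueError(f"unrecognized usage payload: {sorted(raw)}")


# Illustrative payloads; the numbers are made up.
openai_style = normalize_usage(
    {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20}
)
anthropic_style = normalize_usage({"input_tokens": 12, "output_tokens": 8})
```

Raising on an unknown payload, rather than silently returning zeros, keeps a missing provider mapping visible during migration.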

#### Migration Path

1. Update model implementations to extract and return usage data from native responses
2. Provide a deprecation warning for `BaseTokenCounter` usage
3. Update documentation and examples to use the new approach
4. Remove `BaseTokenCounter` and related code in a future major version
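
Step 2 could be done with Python's standard `warnings` module. The sketch below uses a hypothetical stand-in class, `LegacyTokenCounter`, rather than the real `BaseTokenCounter`, and its `count_tokens` body is a placeholder; only the warning pattern is the point.

```python
import warnings


class LegacyTokenCounter:
    """Hypothetical stand-in for a counter class slated for removal."""

    def __init__(self) -> None:
        # stacklevel=2 attributes the warning to the caller's line,
        # which is where the migration needs to happen.
        warnings.warn(
            "LegacyTokenCounter is deprecated; read token usage from the "
            "model response instead.",
            DeprecationWarning,
            stacklevel=2,
        )

    def count_tokens(self, text: str) -> int:
        # Placeholder heuristic for illustration only, not a real tokenizer.
        return len(text.split())


# Capture the warning to show it is emitted on construction.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    counter = LegacyTokenCounter()
```

Python hides `DeprecationWarning` by default outside of `__main__` and test runners, so the documentation update in step 3 may be what most users actually see first.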

#### Code References

- Token counting implementation: `camel/utils/token_counting.py:77-544`
- Usage data already captured in some models: `camel/models/litellm_model.py:217`
- Streaming with usage example: `examples/agents/chatagent_stream.py:44`

### Solution

_No response_

### Alternatives

_No response_

### Additional context

_No response_

[Feature Request] Deprecate current token usage calculation #3026
