fix: Migrate token usage extraction from native responses from legacy manual counting#3121
fix: Migrate token usage extraction from native responses from legacy manual counting#3121MaxonZhao wants to merge 4 commits into
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the ✨ Finishing touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
thanks @MaxonZhao 's contribution, please double check the function we implemented
| # 3. Check if slicing is necessary | ||
| try: | ||
| current_tokens = token_counter.count_tokens_from_messages( | ||
| current_tokens = token_counter.extract_usage_from_response( |
There was a problem hiding this comment.
i think extract_usage_from_response would return dict instead of int?
fengju0213
left a comment
There was a problem hiding this comment.
thansk for contribution!
There was a problem hiding this comment.
i think we can use existing method directly,the methods below already has been recorded the original usage,we can just use it ,then synchronize this usage to token_counter.
cc @Wendong-Fan @MaxonZhao @waleedalzarooni
| int: Number of tokens in the messages. | ||
| """ | ||
| return self.token_counter.count_tokens_from_messages(messages) | ||
| return self.token_counter.extract_usage_from_response(messages) |
There was a problem hiding this comment.
this should return an int here as well
waleedalzarooni
left a comment
There was a problem hiding this comment.
Great work @MaxonZhao, very comprehensive updates!
Left a few comments, may seem like a lot but they only require slight tweaks to the implementation design, on the whole good job!
| # 3. Check if slicing is necessary | ||
| try: | ||
| current_tokens = token_counter.count_tokens_from_messages( | ||
| current_tokens = token_counter.extract_usage_from_response( |
There was a problem hiding this comment.
extract_usage_from_response expects a usage parameter to find token count, however, I believe the output of [message.to_openai_message(role)] will not include a usage field. Try passing a response object instead as this will include the usage field the counting mechanism requires.
| """ | ||
| counter = self.model_backend.token_counter | ||
| content_token_count = counter.count_tokens_from_messages( | ||
| content_token_count = counter.extract_usage_from_response( |
There was a problem hiding this comment.
I believe this will have the same issue as previous comment
|
|
||
| token_count = self.token_counter.count_tokens_from_messages( | ||
| token_count = self.token_counter.extract_usage_from_response( | ||
| [record.memory_record.to_openai_message()] |
There was a problem hiding this comment.
same as above, also returns a list of messages with no usage field
|
|
||
| message = first_record.memory_record.to_openai_message() | ||
| tokens = self.token_counter.count_tokens_from_messages([message]) | ||
| tokens = self.token_counter.extract_usage_from_response([message]) |
|
|
||
| original_count_tokens = ( | ||
| AnthropicTokenCounter.count_tokens_from_messages | ||
| AnthropicTokenCounter.extract_usage_from_response |
There was a problem hiding this comment.
Since extract_usage_from_response processes responses directly from the API it doesn't require the whitespace patching that count_tokens_from_messages required. If we are still keeping the old method I would revert this back to the original implementation as it is not necessary for the new response based method.
| return self._completion_cost | ||
|
|
||
| def extract_usage_from_response(self, response: Any) -> Optional[Dict[str, int]]: | ||
| r"""Extract native usage data from LiteLLM response. |
There was a problem hiding this comment.
Since these fields are identical to the OpenAI method simplify by creating an inheritance version
class LiteLLMTokenCounter(OpenAITokenCounter):
| return stream_options.get("include_usage", False) | ||
| elif provider.lower() in ["anthropic", "mistral"]: | ||
| return True | ||
| elif provider.lower() == "litellm": |
There was a problem hiding this comment.
May need to change this depending on whether LiteLLM approach is changed to inheritance based design
| print(f"Model: {model.model_type}") | ||
| print(f"Messages: {messages}") | ||
|
|
||
| traditional_count = model.count_tokens_from_messages(messages) |
There was a problem hiding this comment.
I'm getting a problem here, I think it's from the faulty count_tokens_from_messages(self, messages: List[OpenAIMessage]) -> int: in base_model.py. Should be fixed once that comment is implemented!
| pass | ||
|
|
||
| try: | ||
| from camel.utils import MistralTokenCounter |
There was a problem hiding this comment.
I don't think mistral_common is currently present in camel's dependencies, please add to pyproject.toml and follow the instructions on adding dependencies in CONTRIBUTING.md!
| ANTHROPIC_AVAILABLE = False | ||
|
|
||
| try: | ||
| from camel.utils import MistralTokenCounter |
There was a problem hiding this comment.
Same mistral_common dependencies update as above
Fixes #3026
This PR migrates token counting system from manual tokenization to native usage data provided by LLM providers. This fundamental change addresses accuracy and reliability issues with manual token counting while also enabling proper streaming support. The implementation adds native usage data extraction capabilities to the
BaseTokenCountersystem, providing accurate token tracking across all major LLM providers.Key Changes Made:
Enhanced BaseTokenCounter with Streaming Methods:
extract_usage_from_streaming_response()for synchronous streamsextract_usage_from_async_streaming_response()for asynchronous streamsextract_usage_from_response()abstract method for all providerscount_tokens_from_messages()methodProvider-Specific Native Usage Extraction:
response.usagewhenstream_options: {"include_usage": true}is enabledresponse.usage(input_tokens, output_tokens)response.usagewith provider-agnostic handlingresponse.usage(prompt_tokens, completion_tokens)Comprehensive Documentation:
STREAMING_TOKEN_COUNTING.md: Complete guide for streaming token countingSTREAMING_API_REFERENCE.md: API reference documentationOFFICIAL_STREAMING_API_REFERENCES.md: Links to official provider documentationarchitectural_analysis.md: Technical analysis and design rationaleExample Implementation:
streaming_token_counting_example.py: Comprehensive examples showing:streaming_token_counting_utils.py: Utility functions for configuration and validationRobust Error Handling:
Benefits Delivered:
Technical Implementation Details:
Streaming Usage Extraction Pattern:
Provider-Specific Configurations:
stream_options: {"include_usage": true}Migration Path:
The implementation provides a clear deprecation path while maintaining full backward compatibility:
count_tokens_from_messages()continues to work with deprecation warningsextract_usage_from_response()methods provide accurate native dataFuture Work
This migration to native token counting opens up several opportunities for further enhancements:
Short-term Improvements
BaseTokenCountermanual counting methods in a future major versionextract_usage_from_response()for consistent Langfuse integration, following the pattern already established inlitellm_model.pyMedium-term Enhancements
Long-term Vision
Migration Roadmap
deprecation>=2.1.0,<3- Added to main dependencies for deprecation warnings on legacy token counting methodspython-dotenv>=1.0.0- Already included in optional dependency groups (owl,eigent) for example environment loadingThis migration from manual to native token counting significantly improves the accuracy and reliability of token usage tracking in CAMEL. The implementation provides a clear deprecation path for existing manual counting methods while enabling proper streaming support and comprehensive documentation for adopting the new native approach.
Checklist
Go over all the following points, and put an
xin all the boxes that apply.Fixes #issue-numberin the PR description (required)pyproject.tomlanduv lock