
add context_log stat#789

Merged
rainsonGain merged 2 commits into main from feat/aworld-cli_commands
Feb 28, 2026

Conversation

@ahgpt (Collaborator) commented Feb 28, 2026

No description provided.

@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request significantly enhances the observability and analysis capabilities for context usage within the aworld system. It establishes a new, detailed logging mechanism for context length metrics and introduces a suite of Python tools for parsing, extracting, and visualizing these logs. These additions give developers and users deeper insight into how context windows are being utilized, making it easier to understand and optimize token consumption across different agents and models.

Highlights

  • Enhanced Context Logging: A new digest_logger.info call has been added to prompt_log.py to record detailed context length statistics, including agent ID, task ID, user, session, model name, total context length, token breakdown, and message count. This log entry is conditional, only firing if there are messages and the last message is not from an assistant.
  • New Log Parsing Tool (context_log.py): A new Python script aworld/logs/tools/context_log.py is introduced. This script is a log parser designed to extract and analyze ContextRecord objects from amnicontext_prompt.log files. It supports concurrent processing and batch optimization, allowing users to find specific context records, retrieve the latest task record based on a search string, and save records to a file.
  • New Context Statistics Analysis Tool (context_stat.py): A new Python script aworld/logs/tools/context_stat.py is added. This tool analyzes context usage from digest logs, providing tree structure visualizations of context usage by agent and subtasks, and time-series charts showing context usage trends. It defines data structures for various statistics and can parse log files, calculate statistics, and generate different types of plots (trend, agent comparison, model comparison, session comparison).
  • Command-Line Interface for Statistics (context_stat_tool.py): A new command-line interface aworld/logs/tools/context_stat_tool.py is created to wrap the ContextAnalyzer from context_stat.py. This tool allows users to easily list sessions, print tree structures, and generate trend, agent, and model comparison charts from digest logs. It also includes a stat_log function for programmatic use and handles default log file paths.
  • Example Usage and Documentation: New example files (__init__.py, README.md, run.py) are added under examples/aworld_quick_start/context_stat/ to demonstrate how to use the new context statistics tools. The README.md provides clear instructions for both the example run.py script and the full command-line interface.
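
Based on the field order described in the first highlight above, a digest line could be parsed as sketched below. The exact line layout is inferred from the PR description, so treat the field positions and the record class name as assumptions rather than the PR's actual API:

```python
import json
from dataclasses import dataclass


@dataclass
class ContextLengthStat:
    agent_id: str
    task_id: str
    user: str
    session_id: str
    model_name: str
    total_context_length: int
    token_breakdown: dict
    message_count: int


def parse_context_length_line(line: str) -> ContextLengthStat:
    """Parse one 'context_length|...' digest entry into a structured record."""
    # Nine pipe-delimited fields, matching the digest_logger.info call in prompt_log.py
    tag, agent_id, task_id, user, session_id, model, total, breakdown, count = line.split("|")
    assert tag == "context_length"
    return ContextLengthStat(agent_id, task_id, user, session_id, model,
                             int(total), json.loads(breakdown), int(count))
```

Note this naive split assumes no field contains a literal `|`; the review below flags exactly that weakness in the logging side.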


Changelog
  • aworld/logs/prompt_log.py
    • Added a new digest_logger.info call to log detailed context length statistics, including agent ID, task ID, user, session, model, total context, token breakdown, and message count, under specific conditions.
  • aworld/logs/tools/context_log.py
    • Added a new Python script for parsing amnicontext_prompt.log files.
    • Defined ContextRecord for data structure.
    • Implemented LogParser for reading and parsing log chunks concurrently, extracting context details, and saving records.
  • aworld/logs/tools/context_stat.py
    • Added a new Python script for analyzing context usage from digest logs.
    • Included data classes for ContextRecord, AgentStats, SubtaskStats, ModelStats, SessionStats.
    • Implemented a ContextAnalyzer class to parse logs, calculate statistics, and generate various comparison and trend plots.
  • aworld/logs/tools/context_stat_tool.py
    • Added a new command-line interface tool that leverages context_stat.py to provide user-friendly access to context usage analysis.
    • Enabled functionality for listing sessions, printing tree structures, and generating charts.
  • examples/aworld_quick_start/context_stat/README.md
    • Added documentation explaining the usage of the new context statistics analysis tools.
  • examples/aworld_quick_start/context_stat/__init__.py
    • Added an empty __init__.py file to define the context_stat directory as a Python package.
  • examples/aworld_quick_start/context_stat/run.py
    • Added an example script demonstrating how to use the context_stat_tool.py for digest log analysis.
Activity
  • The pull request introduces new functionality and tools. No specific comments or reviews have been recorded yet.

@gemini-code-assist bot (Contributor) left a comment:

Code Review

This pull request introduces a suite of tools for analyzing context usage from logs: a new log line for capturing context length statistics, plus three new scripts (context_log.py for parsing prompt logs, context_stat.py for statistical analysis and visualization, and context_stat_tool.py as a command-line interface). While these tools are useful, they introduce significant security vulnerabilities. There is a Log Injection vulnerability in aworld/logs/prompt_log.py, where untrusted user input is logged with the | field separator and no sanitization, potentially allowing attackers to spoof log entries. In addition, the new CLI tools in aworld/logs/tools/ are vulnerable to Path Traversal: they accept unvalidated file paths and output directories from command-line arguments, which could lead to arbitrary file writes. Beyond security, a critical issue was identified in the log parsing logic of context_log.py that can lead to incorrect data processing when handling large files. Several medium-severity issues also exist, related to code cleanliness and maintainability (e.g., unused imports and inconsistent comments) and to the usability of the example script. Addressing the security and functional concerns is crucial before merging.

Comment on lines 88 to 121
    def read_file_chunks(self) -> List[List[str]]:
        """
        Read the file and split it into chunks.

        Returns:
            A list of file-content chunks.
        """
        logger.info(f"🚀 Reading log file: {self.log_file_path}")

        try:
            with open(self.log_file_path, 'r', encoding='utf-8') as file:
                lines = []
                chunks = []

                for line in file:
                    lines.append(line.strip())

                    # When the chunk size is reached, flush the current chunk
                    if len(lines) >= self.chunk_size:
                        chunks.append(lines.copy())
                        lines.clear()
                        logger.debug(f"📦 Chunk created, current chunk count: {len(chunks)}")

                # Handle the final partial chunk
                if lines:
                    chunks.append(lines)
                    logger.debug(f"📦 Final chunk created, total chunks: {len(chunks)}")

                logger.info(f"✅ File read complete, split into {len(chunks)} chunks")
                return chunks

        except Exception as e:
            logger.error(f"❌ Failed to read file: {e}")
            raise

critical

The current implementation of read_file_chunks splits the log file into fixed-size chunks of lines. This can cause a single log record (which can span multiple lines) to be split across two different chunks. When this happens, parse_chunk and parse_context_record will fail to parse the record correctly, potentially leading to missed records or incomplete data. This is a critical issue for a log parser.

A more robust approach would be to make the chunking logic aware of record boundaries. For example, you could read a block of lines, find the last record start marker (🚀 AGENT EXECUTION START), and process the chunk up to that point. The remaining lines would then be prepended to the next block of lines read from the file. This would ensure that parse_context_record always receives complete records. However, it would make parallel processing with ThreadPoolExecutor more complex, as it would introduce a sequential dependency.

An alternative for parallel processing is to have each worker seek to a calculated offset in the file, then scan backwards to the first record separator to find a clean starting point. This avoids splitting records.
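
A minimal sketch of the boundary-aware chunking described above. The record-start marker comes from the review comment; the function name and default chunk size are assumptions, not code from the PR:

```python
from typing import Iterator, List

# Marker that begins each multi-line record, per the review comment above
RECORD_START = "🚀 AGENT EXECUTION START"


def chunk_by_record(lines: List[str], chunk_size: int = 1000) -> Iterator[List[str]]:
    """Yield chunks of lines that never split a record.

    A chunk is flushed only when it has reached chunk_size lines AND the
    next line starts a new record, so every record stays whole.
    """
    chunk: List[str] = []
    for line in lines:
        # Flush at the first record boundary after the size threshold
        if line.startswith(RECORD_START) and len(chunk) >= chunk_size:
            yield chunk
            chunk = []
        chunk.append(line)
    if chunk:
        yield chunk
```

Chunks after the first always begin on a record boundary, so each chunk can still be handed to a separate worker for parsing.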

Comment on lines +271 to +272
digest_logger.info(
    f"context_length|{agent.id()}|{context.task_id}|{getattr(context, 'user', 'default')}|{context.session_id}|{model_name}|{total_context_length}|{json.dumps(token_breakdown)}|{len(messages)}")

security (medium)

The digest_logger writes structured data using the | separator. However, fields like agent.id(), context.task_id, and model_name are not sanitized for this separator. If any of these fields contain a | character, it will corrupt the log entry and cause the context_stat.py parser to misalign fields or fail. Additionally, if these fields contain newlines, an attacker could inject entirely new log entries.
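
One way to mitigate this is to strip the separator and line breaks from every field before logging. A sketch; the helper name is hypothetical and not part of the PR:

```python
def sanitize_field(value: object) -> str:
    """Strip characters that would corrupt a pipe-delimited, line-oriented log entry."""
    text = str(value)
    # Replace the field separator and any newline characters so one log
    # record cannot spill into another or shift fields during parsing.
    return text.replace("|", "_").replace("\r", " ").replace("\n", " ")
```

The logging call could then build the entry with something like `"|".join(sanitize_field(f) for f in fields)`, keeping the parser's field positions stable.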

if not output_path.suffix.lower() == '.txt':
    output_path = output_path.with_suffix('.txt')

with open(output_path, 'w', encoding='utf-8') as f:

security (medium)

The save_record_to_file function uses the output_file path provided via command-line arguments without validation. This allows for arbitrary file write via path traversal (e.g., using ../../ in the -o or --output-file argument). An attacker running this tool could overwrite sensitive files the user has access to.
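
A sketch of how such paths could be confined to a known base directory before writing (the helper name is hypothetical; requires Python 3.9+ for Path.is_relative_to):

```python
from pathlib import Path


def resolve_within(base_dir: str, user_path: str) -> Path:
    """Resolve a user-supplied path and refuse anything escaping base_dir."""
    base = Path(base_dir).resolve()
    # Resolving collapses any ../ components before the containment check
    target = (base / user_path).resolve()
    if not target.is_relative_to(base):
        raise ValueError(f"refusing to write outside {base}: {user_path}")
    return target
```

The same check would apply to the chart-saving and output-directory arguments flagged below, since all of them funnel user input into file writes.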


# Save or show plot
if save_path:
    plt.savefig(save_path, dpi=300, bbox_inches='tight')

security (medium)

The tool uses user-provided paths from command-line arguments (e.g., --save-trend, --save-compare) for saving charts via plt.savefig without validation. This allows for arbitrary file write via path traversal.

Comment on lines +181 to +183
out_dir = os.path.abspath(args.output_dir)
if out_dir != "." and not os.path.isdir(out_dir):
    os.makedirs(out_dir, exist_ok=True)

security (medium)

The output_dir argument is used to create directories and construct file paths for saving charts without validation. This allows an attacker to specify an arbitrary directory on the system for file writes via path traversal.

prompt_logger.info(f"│ {line}")

prompt_logger.info(_generate_separator())
if messages and len(messages) > 0 and messages[-1].get('role') != 'assistant':

medium

The condition messages and len(messages) > 0 is redundant. In Python, a non-empty list evaluates to True in a boolean context, so you can simplify this to just if messages and ... for better readability.

Suggested change
if messages and len(messages) > 0 and messages[-1].get('role') != 'assistant':
if messages and messages[-1].get('role') != 'assistant':

Comment on lines 3 to 6
"""
日志解析器 - 从amnicontext_prompt.log中根据Context ID获取最后一条task_id记录
支持并发处理和批次优化
"""

medium

The docstrings and many comments in this file are in Chinese, while the rest of the codebase and this PR seem to use English. For consistency and maintainability, it's recommended to write all comments and documentation in English.

Comment on lines +24 to +25
import pandas as pd
import numpy as np

medium

The pandas and numpy libraries are imported but do not appear to be used in this file. It's best to remove unused imports to keep the code clean and reduce dependencies.

Comment on lines +3 to +4
import glob
import os

medium

The glob and os modules are imported but not used in this script. It's best to remove unused imports.

from aworld.logs.tools.context_stat_tool import stat_log


log_file = "yourpath/logs/digest_logger.log"

medium

The log_file is hardcoded to a placeholder path "yourpath/logs/digest_logger.log", which will cause the script to fail if not edited by the user. For a better user experience, consider using a more robust way to determine the log file path. For example, you could use a relative path to an example log file within the repository, or accept a command-line argument.
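
A sketch of what the suggested command-line-argument approach could look like for the example script. The function names and the default path are assumptions, not the PR's actual code:

```python
import argparse
import os
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Analyze a digest log")
    # Optional positional argument with a sensible relative default,
    # so the script no longer requires editing a hardcoded placeholder
    parser.add_argument(
        "log_file",
        nargs="?",
        default=os.path.join("logs", "digest_logger.log"),
        help="path to the digest log (default: logs/digest_logger.log)",
    )
    return parser


def main() -> None:
    args = build_parser().parse_args()
    if not os.path.isfile(args.log_file):
        sys.exit(f"log file not found: {args.log_file} (pass a path as the first argument)")
    # stat_log(args.log_file)  # hand off to the real analyzer here
```

Failing fast with a clear message when the file is missing avoids the confusing traceback a placeholder path would otherwise produce.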

rainsonGain merged commit 43e5344 into main on Feb 28, 2026
1 check passed