@ZiTao-Li (Collaborator) commented Sep 12, 2025

AgentScope Version

1.0.2

Description

Change the metric's __call__ method to an async function to support more general usage (e.g., LLM-as-a-judge metrics).

  • update related modules
  • update tutorial
  • add a test for the evaluators in evaluate module
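
To illustrate why an async __call__ helps for LLM-as-a-judge metrics, here is a minimal sketch. It does not use AgentScope's real classes: SketchMetricBase, JudgeMetric, and fake_llm_judge are hypothetical names introduced only for this example, and the "judge" is a stub rather than a real model call.

```python
import asyncio
from abc import ABC, abstractmethod


class SketchMetricBase(ABC):
    """Hypothetical stand-in for a metric base class (not AgentScope's API)."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    async def __call__(self, solution: str) -> float:
        """Score a solution; async so implementations may await I/O."""


async def fake_llm_judge(prompt: str) -> str:
    """Stub for an awaitable LLM request; a real judge would call a model."""
    await asyncio.sleep(0)  # yield to the event loop, as network I/O would
    return "yes" if "4" in prompt else "no"


class JudgeMetric(SketchMetricBase):
    """Hypothetical metric that asks the stub judge for a verdict."""

    async def __call__(self, solution: str) -> float:
        verdict = await fake_llm_judge(f"Is this answer correct? {solution}")
        return 1.0 if verdict == "yes" else 0.0


async def main() -> float:
    metric = JudgeMetric(name="judge")
    # Calling the instance returns a coroutine, so it must be awaited.
    return await metric(solution="2 + 2 = 4")


score = asyncio.run(main())
```

With a synchronous __call__, the metric could not await the judge without blocking or nesting event loops. Making the method async lets callers that already run inside an event loop simply await it.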

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has been formatted with pre-commit run --all-files command
  • All tests are passing
  • Docstrings are in Google style
  • Related documentation has been updated (e.g. links, examples, etc.)
  • Code is ready for review

@ZiTao-Li requested a review from DavdGao September 12, 2025 04:22
@ZiTao-Li added the Ready for Review and Evaluation (Evaluation related PR) labels Sep 12, 2025
@DavdGao (Member) left a comment

Please see inline comments, others look good to me.

@DavdGao requested a review from Copilot September 15, 2025 01:47
Copilot AI left a comment

Pull Request Overview

This PR changes the metric __call__ function from synchronous to asynchronous to support more general usage patterns, particularly LLM-as-a-judge metrics that require async operations.

Key changes:

  • Made the MetricBase.__call__ method abstract and async
  • Updated all evaluator classes to handle async metric calls
  • Added comprehensive test coverage for the evaluation module
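
As a rough sketch of what "handling async metric calls" can look like on the evaluator side, the snippet below awaits several async metrics for one solution with asyncio.gather. This is an assumption about one reasonable approach, not the code merged in this PR: evaluate_solution, exact_match, and length_check are hypothetical names, and the PR does not state whether the evaluators run metrics concurrently.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Hypothetical type alias: an async metric maps a solution string to a score.
AsyncMetric = Callable[[str], Awaitable[float]]


async def exact_match(solution: str) -> float:
    """Hypothetical metric: 1.0 if the stripped solution equals "42"."""
    return 1.0 if solution.strip() == "42" else 0.0


async def length_check(solution: str) -> float:
    """Hypothetical metric: full score for solutions of at most 10 chars."""
    await asyncio.sleep(0)  # stands in for awaited I/O such as an LLM call
    return 1.0 if len(solution) <= 10 else 0.5


async def evaluate_solution(
    solution: str,
    metrics: Dict[str, AsyncMetric],
) -> Dict[str, float]:
    """Await every metric concurrently and collect the scores by name."""
    names = list(metrics)
    scores = await asyncio.gather(*(metrics[n](solution) for n in names))
    return dict(zip(names, scores))


results = asyncio.run(
    evaluate_solution(
        " 42 ",
        {"exact_match": exact_match, "length_check": length_check},
    )
)
```

Any caller of an async metric has to be async itself (or start an event loop), which is consistent with the task evaluation method and the evaluators being updated in the same PR.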

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/evaluation_test.py | Added comprehensive test suite for evaluators with async metric support |
| src/agentscope/evaluate/_task.py | Made task evaluation method async to support async metric calls |
| src/agentscope/evaluate/_metric_base.py | Changed abstract metric __call__ method to async |
| src/agentscope/evaluate/_evaluator/_ray_evaluator.py | Refactored Ray evaluator to use async actors and proper async handling |
| src/agentscope/evaluate/_evaluator/_general_evaluator.py | Updated general evaluator to handle async metric evaluation |
| src/agentscope/evaluate/_ace_benchmark/_ace_metric.py | Made ACE benchmark metrics async-compatible |
| docs/tutorial/zh_CN/src/task_eval.py | Updated Chinese tutorial example with async metric and corrected name |
| docs/tutorial/en/src/task_eval.py | Updated English tutorial example with async metric and corrected name |

Comments suppressed due to low confidence (1)

src/agentscope/evaluate/_evaluator/_ray_evaluator.py:1

  • Using __file__ in py_modules for Ray runtime_env is incorrect. __file__ refers to the current Python file being executed, but py_modules expects module names or paths to modules that should be made available to Ray workers. This will likely cause import errors in Ray workers.
# -*- coding: utf-8 -*-

@DavdGao (Member) left a comment

LGTM

@DavdGao merged commit 811fb28 into agentscope-ai:main Sep 16, 2025
10 checks passed