feat: add ImageReward metric for text-to-image evaluation #270
Conversation
* Introduced ImageRewardMetric to evaluate text-to-image generation quality; the underlying ImageReward model is reported to outperform existing scoring methods at predicting human preferences (see the usage sketch below).
* Updated pyproject.toml to include new dependencies: image-reward and clip.
* Integrated ImageRewardMetric into the evaluation task processing.
* Added unit tests for ImageRewardMetric, covering registration, prompt extraction, scoring, and error handling.
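For context, here is a minimal sketch of how the underlying ImageReward package is typically used to score a generated image against its prompt, based on the upstream THUDM/ImageReward README; the metric added in this PR may wrap the model differently.

```python
# Minimal usage sketch of the ImageReward package (per the upstream README);
# the ImageRewardMetric in this PR may wrap the model differently.
import ImageReward as RM

# Load the pretrained reward model (weights are downloaded on first use).
model = RM.load("ImageReward-v1.0")

# score() returns a reward per image; higher means better alignment with
# the prompt and higher perceived image quality.
prompt = "a red bicycle leaning against a brick wall"
reward = model.score(prompt, ["generated_image.png"])
print(reward)
```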
@begumcig I found that all prompts are truncated to 35 tokens... I'm not sure that makes sense; otherwise we could override the score function, since most of the prompts in their dataset are actually longer than 35 tokens.
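For illustration, one lighter-weight workaround is to surface the truncation rather than let it happen silently. This is only a sketch: it assumes the metric holds an ImageReward model whose BLIP tokenizer is reachable as `model.blip.tokenizer` (true in the upstream implementation, but treat the attribute path as an assumption), and `warn_if_truncated` is a hypothetical helper, not part of this PR.

```python
# Hedged sketch: warn when a prompt exceeds ImageReward's internal 35-token
# budget before scoring. The attribute path model.blip.tokenizer is an
# assumption based on the upstream implementation, not this PR's code.
import warnings

MAX_PROMPT_TOKENS = 35  # truncation limit used inside ImageReward's score()


def warn_if_truncated(model, prompt: str) -> None:
    n_tokens = len(model.blip.tokenizer(prompt)["input_ids"])
    if n_tokens > MAX_PROMPT_TOKENS:
        warnings.warn(
            f"Prompt has {n_tokens} tokens; ImageReward only sees the first "
            f"{MAX_PROMPT_TOKENS}, so part of the prompt will be ignored."
        )
```

Overriding the score function itself to raise the tokenizer's max length would be the heavier alternative hinted at above.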
* Added ImageReward as a dependency in pyproject.toml for enhanced text-to-image evaluation.
* Included timm>=1.0.0 in dependencies for improved model performance.
* Updated uv.lock to reflect changes in package versions and ensure consistency across environments.
….lock
* Changed the GitHub repository source for the image-reward dependency from THUDM to PrunaAI in both pyproject.toml and uv.lock.
* Removed the timm>=1.0.0 dependency from pyproject.toml to streamline the dependency list.
begumcig left a comment
Really great work David! Thank you so much for being so attentive to the details of the evaluation module. It's already almost there; I requested some small changes :)
assert all(prompt.startswith("prompt_") for prompt in extracted)

def test_score_image():
These tests are really comprehensive, thanks a lot! Shall we also add a case or cases using either PrunaModel.run_inference() or the EvaluationAgent, similar to our tests for the cmmd metric? That way we would also ensure the metric is compatible with our engine and agent!
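A rough shape such a test could take, mirroring the cmmd metric tests; the import paths, the Task signature, and the fixtures below are assumptions to be aligned with the existing tests rather than the actual API.

```python
# Hedged sketch of an engine/agent integration test for the ImageReward metric.
# Import paths, the Task signature, and the fixtures are assumptions; mirror
# the existing cmmd metric tests for the real wiring.
import pytest

from pruna.evaluation.evaluation_agent import EvaluationAgent  # assumed path
from pruna.evaluation.task import Task                          # assumed path


def test_image_reward_with_evaluation_agent(model, datamodule):
    # Request only the ImageReward metric and evaluate a wrapped PrunaModel
    # end-to-end, so the metric is exercised through the engine and agent.
    task = Task(["image_reward"], datamodule=datamodule)
    agent = EvaluationAgent(task)

    results = agent.evaluate(model)

    # Placeholder check; adapt to whatever result structure the agent returns.
    assert results is not None
```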
This PR has been inactive for 10 days and is now marked as stale.
# Compute the result
result = metric.compute()
import pdb; pdb.set_trace()
Description
Related Issue
Fixes #268
Type of Change
How Has This Been Tested?
Checklist
Additional Notes