Conversation

@davidberenstein1957
Member

Description

  • Introduced ImageRewardMetric to evaluate text-to-image generation quality; its authors report it matches human preferences better than prior scoring methods such as CLIP score.
  • Updated pyproject.toml to include new dependencies: image-reward and clip.
  • Integrated ImageRewardMetric into the evaluation task processing.
  • Added unit tests for ImageRewardMetric, covering registration, prompt extraction, scoring, and error handling.

Related Issue

Fixes #268

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@davidberenstein1957 davidberenstein1957 linked an issue Jul 22, 2025 that may be closed by this pull request
@davidberenstein1957
Member Author

@begumcig I found that all prompts are truncated to 35 tokens... not sure if that makes sense, but otherwise we could override the score function, since most of the prompts in their dataset actually extend beyond 35 tokens.

https://github.com/THUDM/ImageReward/blob/c1392c6dd0fd6ecd6d416c96959ab744a6d0a8fb/ImageReward/ImageReward.py#L110
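A toy illustration of what that truncation means in practice. This uses whitespace "tokens" as a stand-in, NOT the real BLIP tokenizer (which ImageReward calls with `truncation=True, max_length=35` at the line linked above), but the effect is the same: everything past the limit is silently dropped before scoring.

```python
def truncate_prompt(prompt: str, max_tokens: int = 35) -> tuple[str, int]:
    """Whitespace stand-in for tokenizer truncation.

    Returns the kept prefix and the number of dropped tokens.
    """
    tokens = prompt.split()
    kept = tokens[:max_tokens]
    return " ".join(kept), len(tokens) - len(kept)

# A 50-token prompt loses its last 15 tokens before the reward model sees it.
long_prompt = " ".join(f"word{i}" for i in range(50))
kept, dropped = truncate_prompt(long_prompt)
print(len(kept.split()), dropped)  # 35 15
```

So for prompts longer than 35 tokens, the score is computed against a prefix only, which is the motivation for overriding `score`.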

* Added ImageReward as a dependency in pyproject.toml for enhanced text-to-image evaluation.
* Included timm>=1.0.0 in dependencies for improved model performance.
* Updated uv.lock to reflect changes in package versions and ensure consistency across environments.
* Changed the GitHub repository source for the image-reward dependency from THUDM to PrunaAI in both pyproject.toml and uv.lock files.
* Removed the timm>=1.0.0 dependency from pyproject.toml to streamline the dependency list.
@davidberenstein1957
Member Author

@sharpenb @sdiazlor @begumcig I've implemented a more generalisable reward metric here: #272


@begumcig begumcig left a comment

Really great work David! Thank you so much for being so attentive to the details of the evaluation module. It's already almost there, requested some small changes :)

assert all(prompt.startswith("prompt_") for prompt in extracted)


def test_score_image():

These tests are really comprehensive, thanks a lot! Shall we also add a case (or cases) using either PrunaModel.run_inference() or the EvaluationAgent, similar to our tests for the CMMD metric? That way we would also ensure the metric is compatible with our engine and agent!
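For readers outside the project, the shape of such an integration-style test might look like the sketch below. Every name here (`run_inference`, the metric interface) is an illustrative stand-in, not the real pruna API; the actual test should use PrunaModel.run_inference() or the EvaluationAgent as the existing CMMD tests do.

```python
def run_inference(prompt: str) -> str:
    """Pretend inference step (stand-in for PrunaModel.run_inference()).

    Returns the path where a generated image would be saved.
    """
    return f"/tmp/{prompt.replace(' ', '_')}.png"


class FakeImageRewardMetric:
    """Minimal metric interface: accumulate (prompt, image) pairs, then compute."""

    def __init__(self) -> None:
        self.pairs: list[tuple[str, str]] = []

    def update(self, prompt: str, image_path: str) -> None:
        self.pairs.append((prompt, image_path))

    def compute(self) -> float:
        # The real ImageReward returns a learned preference score; constant here.
        return 1.0 if self.pairs else 0.0


# Drive the metric through the inference path, as the engine/agent would,
# instead of scoring pre-made (prompt, image) pairs directly.
metric = FakeImageRewardMetric()
for prompt in ["a red fox in snow", "a blue bird on a branch"]:
    metric.update(prompt, run_inference(prompt))
print(len(metric.pairs), metric.compute())  # 2 1.0
```

The point of the pattern is that the test exercises the same call chain the agent uses, so interface mismatches surface in CI rather than at evaluation time.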


@sharpenb sharpenb left a comment


The PR description suggests a big overlap between #272 and #270. It would be nice to merge these two PRs together (and address all of @begum's comments ;))

@github-actions

This PR has been inactive for 10 days and is now marked as stale.


# Compute the result
result = metric.compute()
import pdb; pdb.set_trace()

Bug: Debugging Code Left in Test

A pdb.set_trace() call was left in test_update_and_compute. This debugging statement pauses test execution, which breaks automated test runs and CI/CD pipelines.
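The fix is simply deleting the `pdb.set_trace()` line and asserting on the computed result instead. A minimal sketch of the corrected test tail, using a stub in place of the real ImageRewardMetric (whose implementation isn't shown in this thread):

```python
class StubMetric:
    """Stand-in for ImageRewardMetric; just averages the updated scores."""

    def __init__(self) -> None:
        self._scores: list[float] = []

    def update(self, score: float) -> None:
        self._scores.append(score)

    def compute(self) -> float:
        return sum(self._scores) / len(self._scores)


metric = StubMetric()
metric.update(0.5)
metric.update(1.0)

# Compute the result -- no pdb.set_trace(); assert on the value instead
result = metric.compute()
assert result == 0.75
print(result)  # 0.75
```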



Development

Successfully merging this pull request may close these issues.

[FEATURE] Add ImageReward metric

4 participants