feat: allow users to define task groups in LMEvalJob #471

yhwang · 2025-05-14T05:52:19Z

Add new field: TaskGroups under the TaskList to
support custom task group. User can define a
custom task group and specify a list of aggregate
metrics. In the result JSON, the task groups have
a dedicated section of their results.

related: #445

Summary by Sourcery

Enable defining named task groups with custom aggregate metrics in LMEvalJob and propagate them through the CLI, controller, and driver

New Features:

Add TaskGroup API type to CRD with fields for name, task names or recipes, and aggregate metrics
Extend TaskList spec to include TaskGroups for grouping tasks into named sets

Enhancements:

Update controller and CLI generation logic to include task groups in command-line flags
Implement driver support to write task-group definitions alongside task recipes
Add validation to detect duplicate task names across recipes and groups

Tests:

Add unit tests for task group argument generation, driver file creation, and duplicate-name validation in controller and driver components

sourcery-ai · 2025-05-14T05:52:23Z

Reviewer's Guide

This PR adds support for user-defined TaskGroups within LMEvalJob by extending the Kubernetes API schema, controller validation, CLI flags, command-generation logic, and driver functionality to accept, validate, and produce grouped task definitions alongside existing tasks.

File-Level Changes

Change	Details	Files
Define TaskGroups and AggregateMetrics in the API and CRD schema	Add AggregateMetric and TaskGroup types to the TaskList API Expose TaskGroups field on TaskList with JSON tags Extend CRD YAML to document TaskGroups, TaskRecipes, and AggregateMetrics Generate DeepCopy functions for TaskGroup and AggregateMetric	`api/lmes/v1alpha1/lmevaljob_types.go` `config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml` `api/lmes/v1alpha1/zz_generated.deepcopy.go`
Enhance validation logic for TaskGroups and custom recipes	Merge duplicate-name checks across TaskRecipes and TaskGroups Require at least TaskNames or TaskRecipes in each TaskGroup Reuse custom-card unmarshal validation for group recipes	`controllers/lmes/lmevaljob_controller.go`
Extend controller to include TaskGroups in command construction	Update concatTasks to append group names Modify generateArgs and generateCmd to emit --task-group flags Implement taskGroupString helper to serialize group definitions	`controllers/lmes/lmevaljob_controller.go`
Add CLI flag and driver support for TaskGroups	Expose --task-group flag in lmes_driver CLI Include TaskGroups in driver option struct Implement createTaskGroups to write group YAML files	`cmd/lmes_driver/main.go` `controllers/lmes/driver/driver.go`
Add tests covering TaskGroups and duplicate validations	Controller tests for generateArgs and generateCmd with TaskGroups Validation tests for missing group definitions and duplicate names Driver tests for writing group and recipe YAML files	`controllers/lmes/lmevaljob_controller_test.go` `controllers/lmes/driver/driver_test.go`

Possibly linked issues

Provide the capability to support benchmarks in LMEvalJob #445: The PR adds TaskGroups, TaskRecipes, and AggregateMetrics to enable custom tasks and metrics grouping and aggregation as requested in the issue.

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

openshift-ci · 2025-05-14T05:52:26Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

yhwang · 2025-05-14T05:55:12Z

provide a simple LMEvalJob which uses the new TaskGroups:

apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: evaljob-sample
spec:
  allowOnline: true
  allowCodeExecution: true
  model: hf
  modelArgs:
  - name: pretrained
    value: google/flan-t5-base
  taskList:
    taskGroups:
    - name: mygroup
      taskNames:
      - leaderboard_gpqa_diamond
      - leaderboard_gpqa_extended
      - leaderboard_gpqa_main
      aggregateMetrics:
      - metricName: acc_norm
  logSamples: true
  limit: "100"
  pod:
    container:
      env:
      - name: HF_TOKEN
        value: HF_TOKENxxxxxxxxxxxxxxxxxx

sourcery-ai

Hey @yhwang - I've reviewed your changes and they look great!

Here's what I looked at during the review

🟡 General issues: 3 issues found
🟢 Security: all looks good
🟢 Testing: all looks good
🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml

github-actions · 2025-07-01T21:37:54Z

PR image build and manifest generation completed successfully!

📦 PR image: quay.io/trustyai/trustyai-service-operator-ci:56b672e3fd23d81c0d13d6b3961d282e582fb9a6

📦 LMES driver image: quay.io/trustyai/ta-lmes-driver-ci:56b672e3fd23d81c0d13d6b3961d282e582fb9a6

📦 LMES job image: quay.io/trustyai/ta-lmes-job-ci:56b672e3fd23d81c0d13d6b3961d282e582fb9a6

📦 Guardrails orchestrator image: quay.io/trustyai/ta-guardrails-orchestrator-ci:56b672e3fd23d81c0d13d6b3961d282e582fb9a6

🗂️ CI manifests

      devFlags:
        manifests:
          - contextDir: config
            sourcePath: ''
            uri: https://api.github.com/repos/trustyai-explainability/trustyai-service-operator-ci/tarball/operator-56b672e3fd23d81c0d13d6b3961d282e582fb9a6

ruivieira

@yhwang LGTM, thanks for the PR!

I've left some comments, but just related to field validation.

api/lmes/v1alpha1/lmevaljob_types.go

Add new field: TaskGroups under the TaskList to support custom task group. User can define a custom task group and specify a list of aggregate metrics. In the result JSON, the task groups have a dedicated section of their results. Signed-off-by: Yihong Wang <[email protected]>

yhwang · 2025-07-09T04:15:32Z

Hi @ruivieira, thanks for the review and comments. I applied this pattern ^[a-zA-Z0-9._]+$ to the task/recipe/group/metric name. I removed the -. Recently, I just learned that in some JSON-enabled datastores, using - in the field name in a JSON object may cause some issues, although it's totally legit. And I don't think limiting the usage of - in the name field would impact the usage of the LMEvalJob CR.

openshift-ci · 2025-07-09T04:58:48Z

@yhwang: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/trustyai-service-operator-e2e	`56b672e`	link	true	`/test trustyai-service-operator-e2e`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

yhwang requested a review from ruivieira May 14, 2025 05:52

yhwang added the lm-eval Issues related to LM-Eval label May 14, 2025

sourcery-ai bot reviewed May 14, 2025

View reviewed changes

config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml Show resolved Hide resolved

config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml Show resolved Hide resolved

config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml Show resolved Hide resolved

ruivieira added this to TrustyAI planning May 14, 2025

ruivieira added this to the ODH 2.29 milestone May 14, 2025

yhwang force-pushed the lmes-task-group branch from 23d8f5b to 5061334 Compare May 14, 2025 23:52

ruivieira moved this to In Review in TrustyAI planning May 19, 2025

ruivieira modified the milestones: ODH 2.29, ODH 2.30 May 19, 2025

openshift-merge-robot added the needs-rebase label May 26, 2025

yhwang force-pushed the lmes-task-group branch from 5061334 to b6d716e Compare May 27, 2025 16:58

openshift-merge-robot removed the needs-rebase label May 27, 2025

yhwang force-pushed the lmes-task-group branch from b6d716e to a813c85 Compare June 24, 2025 23:43

ruivieira assigned yhwang Jul 1, 2025

ruivieira added kind/enhancement New feature or request ok-to-test labels Jul 1, 2025

ruivieira requested changes Jul 8, 2025

View reviewed changes

api/lmes/v1alpha1/lmevaljob_types.go Show resolved Hide resolved

api/lmes/v1alpha1/lmevaljob_types.go Show resolved Hide resolved

api/lmes/v1alpha1/lmevaljob_types.go Show resolved Hide resolved

api/lmes/v1alpha1/lmevaljob_types.go Show resolved Hide resolved

openshift-ci bot assigned ruivieira Jul 8, 2025

yhwang force-pushed the lmes-task-group branch from a813c85 to 56b672e Compare July 9, 2025 04:04

feat: allow users to define task groups in LMEvalJob #471

Are you sure you want to change the base?

feat: allow users to define task groups in LMEvalJob #471

Uh oh!

Conversation

yhwang commented May 14, 2025 • edited by sourcery-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by Sourcery

Uh oh!

sourcery-ai bot commented May 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

File-Level Changes

Possibly linked issues

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

openshift-ci bot commented May 14, 2025

Uh oh!

yhwang commented May 14, 2025

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Jul 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ruivieira left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

yhwang commented Jul 9, 2025

Uh oh!

openshift-ci bot commented Jul 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yhwang commented May 14, 2025 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented May 14, 2025 •

edited

Loading

github-actions bot commented Jul 1, 2025 •

edited

Loading