Skip to content

Conversation

@yhwang
Copy link
Collaborator

@yhwang yhwang commented May 14, 2025

Add new field: TaskGroups under the TaskList to
support custom task group. User can define a
custom task group and specify a list of aggregate
metrics. In the result JSON, the task groups have
a dedicated section of their results.

related: #445

Summary by Sourcery

Enable defining named task groups with custom aggregate metrics in LMEvalJob and propagate them through the CLI, controller, and driver

New Features:

  • Add TaskGroup API type to CRD with fields for name, task names or recipes, and aggregate metrics
  • Extend TaskList spec to include TaskGroups for grouping tasks into named sets

Enhancements:

  • Update controller and CLI generation logic to include task groups in command-line flags
  • Implement driver support to write task-group definitions alongside task recipes
  • Add validation to detect duplicate task names across recipes and groups

Tests:

  • Add unit tests for task group argument generation, driver file creation, and duplicate-name validation in controller and driver components

@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented May 14, 2025

Reviewer's Guide

This PR adds support for user-defined TaskGroups within LMEvalJob by extending the Kubernetes API schema, controller validation, CLI flags, command-generation logic, and driver functionality to accept, validate, and produce grouped task definitions alongside existing tasks.

File-Level Changes

Change Details Files
Define TaskGroups and AggregateMetrics in the API and CRD schema
  • Add AggregateMetric and TaskGroup types to the TaskList API
  • Expose TaskGroups field on TaskList with JSON tags
  • Extend CRD YAML to document TaskGroups, TaskRecipes, and AggregateMetrics
  • Generate DeepCopy functions for TaskGroup and AggregateMetric
api/lmes/v1alpha1/lmevaljob_types.go
config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml
api/lmes/v1alpha1/zz_generated.deepcopy.go
Enhance validation logic for TaskGroups and custom recipes
  • Merge duplicate-name checks across TaskRecipes and TaskGroups
  • Require at least TaskNames or TaskRecipes in each TaskGroup
  • Reuse custom-card unmarshal validation for group recipes
controllers/lmes/lmevaljob_controller.go
Extend controller to include TaskGroups in command construction
  • Update concatTasks to append group names
  • Modify generateArgs and generateCmd to emit --task-group flags
  • Implement taskGroupString helper to serialize group definitions
controllers/lmes/lmevaljob_controller.go
Add CLI flag and driver support for TaskGroups
  • Expose --task-group flag in lmes_driver CLI
  • Include TaskGroups in driver option struct
  • Implement createTaskGroups to write group YAML files
cmd/lmes_driver/main.go
controllers/lmes/driver/driver.go
Add tests covering TaskGroups and duplicate validations
  • Controller tests for generateArgs and generateCmd with TaskGroups
  • Validation tests for missing group definitions and duplicate names
  • Driver tests for writing group and recipe YAML files
controllers/lmes/lmevaljob_controller_test.go
controllers/lmes/driver/driver_test.go

Possibly linked issues


Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@yhwang yhwang requested a review from ruivieira May 14, 2025 05:52
@openshift-ci
Copy link

openshift-ci bot commented May 14, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yhwang yhwang added the lm-eval Issues related to LM-Eval label May 14, 2025
@yhwang
Copy link
Collaborator Author

yhwang commented May 14, 2025

provide a simple LMEvalJob which uses the new TaskGroups:

apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: evaljob-sample
spec:
  allowOnline: true
  allowCodeExecution: true
  model: hf
  modelArgs:
  - name: pretrained
    value: google/flan-t5-base
  taskList:
    taskGroups:
    - name: mygroup
      taskNames:
      - leaderboard_gpqa_diamond
      - leaderboard_gpqa_extended
      - leaderboard_gpqa_main
      aggregateMetrics:
      - metricName: acc_norm
  logSamples: true
  limit: "100"
  pod:
    container:
      env:
      - name: HF_TOKEN
        value: HF_TOKENxxxxxxxxxxxxxxxxxx

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @yhwang - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 3 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@github-actions
Copy link

github-actions bot commented Jul 1, 2025

PR image build and manifest generation completed successfully!

📦 PR image: quay.io/trustyai/trustyai-service-operator-ci:56b672e3fd23d81c0d13d6b3961d282e582fb9a6

📦 LMES driver image: quay.io/trustyai/ta-lmes-driver-ci:56b672e3fd23d81c0d13d6b3961d282e582fb9a6

📦 LMES job image: quay.io/trustyai/ta-lmes-job-ci:56b672e3fd23d81c0d13d6b3961d282e582fb9a6

📦 Guardrails orchestrator image: quay.io/trustyai/ta-guardrails-orchestrator-ci:56b672e3fd23d81c0d13d6b3961d282e582fb9a6

🗂️ CI manifests

      devFlags:
        manifests:
          - contextDir: config
            sourcePath: ''
            uri: https://api.github.com/repos/trustyai-explainability/trustyai-service-operator-ci/tarball/operator-56b672e3fd23d81c0d13d6b3961d282e582fb9a6

Copy link
Member

@ruivieira ruivieira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yhwang LGTM, thanks for the PR!

I've left some comments, but just related to field validation.

Add new field: TaskGroups under the TaskList to
support custom task group. User can define a
custom task group and specify a list of aggregate
metrics. In the result JSON, the task groups have
a dedicated section of their results.

Signed-off-by: Yihong Wang <[email protected]>
@yhwang yhwang force-pushed the lmes-task-group branch from a813c85 to 56b672e Compare July 9, 2025 04:04
@yhwang
Copy link
Collaborator Author

yhwang commented Jul 9, 2025

Hi @ruivieira, thanks for the review and comments. I applied this pattern ^[a-zA-Z0-9._]+$ to the task/recipe/group/metric name. I removed the -. Recently, I just learned that in some JSON-enabled datastores, using - in the field name in a JSON object may cause some issues, although it's totally legit. And I don't think limiting the usage of - in the name field would impact the usage of the LMEvalJob CR.

@openshift-ci
Copy link

openshift-ci bot commented Jul 9, 2025

@yhwang: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/trustyai-service-operator-e2e 56b672e link true /test trustyai-service-operator-e2e

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/enhancement New feature or request lm-eval Issues related to LM-Eval ok-to-test

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

3 participants