Skip to content

Refactor EPP initialization for unified integration testing#2351

Merged
k8s-ci-robot merged 7 commits intokubernetes-sigs:mainfrom
zetxqx:integrationtest-crdwatcher
Feb 20, 2026
Merged

Refactor EPP initialization for unified integration testing#2351
k8s-ci-robot merged 7 commits intokubernetes-sigs:mainfrom
zetxqx:integrationtest-crdwatcher

Conversation

@zetxqx
Copy link
Copy Markdown
Contributor

@zetxqx zetxqx commented Feb 17, 2026

What type of PR is this?

/kind test

What this PR does / why we need it:

  • Runner Refactoring (cmd/epp/runner):
    • Extracted Setup() from Run(). This method initializes the Manager, Datastore, and ExtProcServerRunner but yields them before starting, allowing tests to inspect state or inject dependencies.
    • Added IsIntegrationTest to Options. When enabled,
      * this injects a FakePodMetricsClient
      * bypasses k8s controller name validation.
    • Fix: Corrected the argument order for setupDatastore, swapping PoolName and PoolNamespace to match the function signature in EPP startup setupDatastore is passing in wrong parameter in standalone mode #2367
  • Integration Test Harness (test/integration):
    • Rewrote NewTestHarness to use eppRunner.Setup() instead of manually constructing the server stack.
    • Updated hermetic_test.go to configure tests via server.Options, ensuring tests run with a configuration path identical to the main binary.
  • Utilities:
    • Refactored StartExtProcServer and extracted ExtProcServerClient to separate server execution from client connection management.

Which issue(s) this PR fixes:

Fixes partially #2332 , further refactor is needed, current implementation still rely on a IsIntegrationTest flag in opts.

This flag is now used in limited two places:

  1. replace the metrics client with a fake one.
  2. for k8s controller skipNameValidation to true.

Fixes #2367

Does this PR introduce a user-facing change?:


@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@zetxqx: The label(s) kind/test cannot be applied, because the repository doesn't have them.

Details

In response to this:

What type of PR is this?

/kind test

What this PR does / why we need it:

Integration test for the fix in #2300

  • small refactor: Split Runner.Run in cmd/epp/runner/runner.go into Run and Setup. The new Setup method returns the ctrl.Manager and datastore.Datastore instances, enabling testability in integration testing.
  • Added TestCRDWatchers. it verifies that the EPP correctly watches InferenceModelRewrite and InferenceObjective resources and update the dataStore.

Which issue(s) this PR fixes:

Fixes partially #2332

Does this PR introduce a user-facing change?:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@netlify
Copy link
Copy Markdown

netlify Bot commented Feb 17, 2026

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit b2ce70d
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/699816eda02cf300083f0875
😎 Deploy Preview https://deploy-preview-2351--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 17, 2026
Comment thread cmd/epp/runner/runner.go Outdated
Comment thread test/integration/epp/hermetic_test.go Outdated
@zetxqx zetxqx force-pushed the integrationtest-crdwatcher branch from b6d63bc to 68fce88 Compare February 18, 2026 06:48
@zetxqx zetxqx changed the title adds integration test validation for CRD watchers and small refactor on EPP runner Refactor EPP initialization for unified integration testing Feb 18, 2026
@zetxqx
Copy link
Copy Markdown
Contributor Author

zetxqx commented Feb 18, 2026

Note: the new commit introduces opts.IsIntegrationTest in the prod code path. But I think it's an acceptable state before more refactoring. Given opts.IsIntegrationTest is only used for two places.

  1. inject the fake metrics client.
  2. bypass the k8s name validation check.

Open to any comment for further refactoring.

@nirrozenbaum
Copy link
Copy Markdown
Contributor

Note: the new commit introduces opts.IsIntegrationTest in the prod code path. But I think it's an acceptable state before more refactoring. Given opts.IsIntegrationTest is only used for two places.

  1. inject the fake metrics client.
  2. bypass the k8s name validation check.

Open to any comment for further refactoring.

I'm not in favor of adding IsIntegrationTest flag in production. this is error prone and could lead to wrong deployments.

I propose to try an alternative approach:
inside cmd/epp, we can have additional test_runner.go file.
we need to expose setting some of the things in runner (add private fields in runner, add private functions to set them, e.g., function called withPodMetricsClient).
so these functions cannot be called outside of epp/runner package.
then in runner.go, we add conditionals, for example if podMetricsClient is already set, don't create it but use whatever was set.

then tests can use test_runner.go.
what do you think?

@zetxqx
Copy link
Copy Markdown
Contributor Author

zetxqx commented Feb 18, 2026

Note: the new commit introduces opts.IsIntegrationTest in the prod code path. But I think it's an acceptable state before more refactoring. Given opts.IsIntegrationTest is only used for two places.

  1. inject the fake metrics client.
  2. bypass the k8s name validation check.

Open to any comment for further refactoring.

I'm not in favor of adding IsIntegrationTest flag in production. this is error prone and could lead to wrong deployments.

I propose to try an alternative approach: inside cmd/epp, we can have additional test_runner.go file. we need to expose setting some of the things in runner (add private fields in runner, add private functions to set them, e.g., function called withPodMetricsClient). so these functions cannot be called outside of epp/runner package. then in runner.go, we add conditionals, for example if podMetricsClient is already set, don't create it but use whatever was set.

then tests can use test_runner.go. what do you think?

Great suggestion, I moved those custom method for integration test to separate test_runner.go under cmd/epp in the new commit 22808d1 . However since the integration test is under different folder, it's not possible to make them private. Should we consider move the current integration test test/integration/epp/hermetic_test.go under cmd/epp ?

@nirrozenbaum
Copy link
Copy Markdown
Contributor

Great suggestion, I moved those custom method for integration test to separate test_runner.go under cmd/epp in the new commit 22808d1 . However since the integration test is under different folder, it's not possible to make them private. Should we consider move the current integration test test/integration/epp/hermetic_test.go under cmd/epp ?

would adding a NewTestRunner function help?
that should return NewRunner().WithTestPodMetricsClient...

Comment thread cmd/epp/runner/runner.go Outdated
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 18, 2026
@zetxqx
Copy link
Copy Markdown
Contributor Author

zetxqx commented Feb 18, 2026

Great suggestion, I moved those custom method for integration test to separate test_runner.go under cmd/epp in the new commit 22808d1 . However since the integration test is under different folder, it's not possible to make them private. Should we consider move the current integration test test/integration/epp/hermetic_test.go under cmd/epp ?

would adding a NewTestRunner function help? that should return NewRunner().WithTestPodMetricsClient...

Good call, updated. Refactored more, now we only need to set withSkipNameValidation before call setup(). this can be further eliminated by not share testEnv for every test cases. But that may make the test run slower.

@zetxqx zetxqx force-pushed the integrationtest-crdwatcher branch from 9682cce to f4fd95c Compare February 18, 2026 17:51
Copy link
Copy Markdown
Contributor

@ahg-g ahg-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

Comment thread cmd/epp/runner/runner.go Outdated
Comment thread cmd/epp/runner/runner.go Outdated
Comment thread cmd/epp/runner/test_runner.go Outdated
Comment thread pkg/epp/server/controller_manager.go Outdated
Comment thread cmd/epp/runner/runner.go Outdated
Comment thread cmd/epp/runner/runner.go Outdated
…r.setup, remove serverRunner in testHarness and setUp return
testEnv *envtest.Environment
testScheme = runtime.NewScheme()
logger = zap.New(zap.UseDevMode(true), zap.Level(zapcore.Level(logutil.DEFAULT)))
logger = zap.New(zap.UseDevMode(true), zap.Level(-1*zapcore.Level(logutil.DEFAULT)))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does the -1 mean?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found there is no log while debugging and found this zapcore actually needs to multiply -1 to take effect

Image

Comment thread test/integration/epp/harness.go Outdated
// 7. Register Cleanup
t.Cleanup(func() {
serverCancel()
// serverCancel()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we cancel the server now? don't we need to create a context with cancel, and pass the context to NewTestRunnerSetup above and call cancel here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the comment. The server was ext_proc server started manually in the harness. Now since we are using setup we only needs to cancel the k8s manager, ext_proc is one of the runners managed by the manger.

Comment thread test/integration/epp/hermetic_test.go Outdated
Comment thread test/integration/epp/hermetic_test.go Outdated
@zetxqx
Copy link
Copy Markdown
Contributor Author

zetxqx commented Feb 19, 2026

/retest

@ahg-g
Copy link
Copy Markdown
Contributor

ahg-g commented Feb 19, 2026

You didn't push a commit, also the failed test is a lint error: https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_gateway-api-inference-extension/2351/pull-gateway-api-inference-extension-verify-main/2024575702841430016

@zetxqx zetxqx force-pushed the integrationtest-crdwatcher branch from 1034b2f to 4a56d94 Compare February 20, 2026 00:33
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 20, 2026
Comment thread test/integration/epp/hermetic_test.go Outdated
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 20, 2026
@ahg-g
Copy link
Copy Markdown
Contributor

ahg-g commented Feb 20, 2026

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 20, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, zetxqx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 20, 2026
@capri-xiyue
Copy link
Copy Markdown
Contributor

/retest

@k8s-ci-robot k8s-ci-robot merged commit 688c0cc into kubernetes-sigs:main Feb 20, 2026
11 checks passed
@zetxqx zetxqx deleted the integrationtest-crdwatcher branch February 20, 2026 18:09
kaushikmitr pushed a commit to tomatillo-and-multiverse/gateway-api-inference-extension-merged that referenced this pull request Feb 24, 2026
…es-sigs#2351)

* add integration test for checking the inferenceObjective and inferenceModelRewrites is being watched.

* continue refactor integration

* not use opts.IsIntegrationTest.

* inject PodMetricsClient via SetUp

* remove unneeded function, rename skipNameValidation, unexported runner.setup,  remove serverRunner in testHarness and setUp return

* encapsulate more to harness

* Update test/integration/epp/hermetic_test.go

pass in the context

---------

Co-authored-by: Abdullah Gharaibeh <[email protected]>
RyanRosario pushed a commit to RyanRosario/gateway-api-inference-extension that referenced this pull request Mar 9, 2026
…es-sigs#2351)

* add integration test for checking the inferenceObjective and inferenceModelRewrites is being watched.

* continue refactor integration

* not use opts.IsIntegrationTest.

* inject PodMetricsClient via SetUp

* remove unneeded function, rename skipNameValidation, unexported runner.setup,  remove serverRunner in testHarness and setUp return

* encapsulate more to harness

* Update test/integration/epp/hermetic_test.go

pass in the context

---------

Co-authored-by: Abdullah Gharaibeh <[email protected]>
elevran pushed a commit to llm-d/llm-d-inference-scheduler that referenced this pull request Apr 23, 2026
…es-sigs/gateway-api-inference-extension#2351)

* add integration test for checking the inferenceObjective and inferenceModelRewrites is being watched.

* continue refactor integration

* not use opts.IsIntegrationTest.

* inject PodMetricsClient via SetUp

* remove unneeded function, rename skipNameValidation, unexported runner.setup,  remove serverRunner in testHarness and setUp return

* encapsulate more to harness

* Update test/integration/epp/hermetic_test.go

pass in the context

---------

Co-authored-by: Abdullah Gharaibeh <[email protected]>
nirrozenbaum pushed a commit to llm-d/llm-d-inference-payload-processor that referenced this pull request Apr 28, 2026
…es-sigs/gateway-api-inference-extension#2351)

* add integration test for checking the inferenceObjective and inferenceModelRewrites is being watched.

* continue refactor integration

* not use opts.IsIntegrationTest.

* inject PodMetricsClient via SetUp

* remove unneeded function, rename skipNameValidation, unexported runner.setup,  remove serverRunner in testHarness and setUp return

* encapsulate more to harness

* Update test/integration/epp/hermetic_test.go

pass in the context

---------

Co-authored-by: Abdullah Gharaibeh <[email protected]>
elevran pushed a commit to llm-d/llm-d-inference-scheduler that referenced this pull request May 3, 2026
…es-sigs/gateway-api-inference-extension#2351)

* add integration test for checking the inferenceObjective and inferenceModelRewrites is being watched.

* continue refactor integration

* not use opts.IsIntegrationTest.

* inject PodMetricsClient via SetUp

* remove unneeded function, rename skipNameValidation, unexported runner.setup,  remove serverRunner in testHarness and setUp return

* encapsulate more to harness

* Update test/integration/epp/hermetic_test.go

pass in the context

---------

Co-authored-by: Abdullah Gharaibeh <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EPP startup setupDatastore is passing in wrong parameter in standalone mode

5 participants