KFLUXINRA-2965: Fix misbehaving kflux-fedora-01 cluster apps by enkeefe00 · Pull Request #10665 · redhat-appstudio/infra-deployments

enkeefe00 · 2026-02-25T21:09:00Z

Fix some kflux-fedora-01 applications

Add cluster name to ApplicationSets the new-cluster playbook missed
Create cluster etcd-defrag config
Remove logging applications until Konflux is up and running 100%

KFLUXINRA-2965

github-actions · 2026-02-25T21:09:11Z

🤖 Gemini AI Assistant Available

Hi @enkeefe00! I'm here to help with your pull request. You can interact with me using the following commands:

Available Commands

@gemini-cli /review - Request a comprehensive code review
- Example: @gemini-cli /review Please focus on security and performance
@gemini-cli <your question> - Ask me anything about the codebase
- Example: @gemini-cli How can I improve this function?
- Example: @gemini-cli What are the best practices for error handling here?

How to Use

Simply type one of the commands above in a comment on this PR
I'll analyze your code and provide detailed feedback
You can track my progress in the workflow logs

Permissions

Only OWNER, MEMBER, or COLLABORATOR users can trigger my responses. This ensures secure and appropriate usage.

This message was automatically added to help you get started with the Gemini AI assistant. Feel free to delete this comment if you don't need assistance.

github-actions · 2026-02-25T21:09:13Z

🤖 Hi @enkeefe00, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

konflux-ci-qe-bot · 2026-02-25T22:12:59Z

🤖 Pipeline Failure Analysis

Category: Infrastructure

The pipeline failed due to the end-to-end test timing out while an ArgoCD application refresh was pending, concurrently with critical infrastructure issues preventing connectivity to the OpenShift API server.

📋 Technical Details

Immediate Cause

The appstudio-e2e-tests/redhat-appstudio-e2e step was terminated by the Prow entrypoint after timing out. This was caused by an ArgoCD application "hard refresh" operation that remained pending for over seven minutes, preventing the completion of the ci:teste2e target.

Contributing Factors

Concurrent and subsequent diagnostic gather steps (gather-audit-logs, gather-extra, gather-must-gather, redhat-appstudio-gather) consistently failed to connect to the OpenShift API server. These failures were characterized by DNS resolution errors ("no such host") and TCP I/O timeouts against the API server's hostname and IP address, indicating that the target cluster became critically unhealthy or unreachable during or immediately after the primary test execution. This loss of connectivity likely contributed to the ArgoCD refresh hanging and prevented the collection of crucial diagnostic artifacts.
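
As a rough illustration of how the two connectivity symptoms described above can be told apart when triaging a build log, here is a minimal Python sketch. The regular expressions are modeled on the error strings quoted in this analysis; the function names and the sample host `api.example.test` are hypothetical, and this is not part of prow-failure-analysis.

```python
import re
from collections import Counter

# Patterns modeled on the error strings quoted in this analysis.
DNS_FAILURE = re.compile(r"lookup (?P<host>\S+) on \S+: no such host")
TCP_TIMEOUT = re.compile(r"dial tcp .*: i/o timeout")


def classify_line(line: str) -> str:
    """Return 'dns', 'timeout', or 'other' for a single log line."""
    if DNS_FAILURE.search(line):
        return "dns"
    if TCP_TIMEOUT.search(line):
        return "timeout"
    return "other"


def summarize(lines):
    """Count how many lines fall into each failure class."""
    return Counter(classify_line(line) for line in lines)


# Hypothetical sample lines in the same shape as the real log excerpts.
sample = [
    "Unable to connect to the server: dial tcp: lookup api.example.test "
    "on 172.30.0.10:53: no such host",
    'error getting cluster version: Get "https://api.example.test:6443/api?timeout=32s": '
    "dial tcp 192.0.2.10:6443: i/o timeout",
    "make: *** [Makefile:25: ci/test/e2e] Terminated",
]
print(summarize(sample))
```

A summary dominated by `dns` entries suggests the API hostname no longer resolves, while `timeout` entries suggest the address resolved but nothing answered.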

Impact

The end-to-end tests were unable to complete due to the application refresh stalling, and subsequent attempts to collect diagnostic information from the cluster failed entirely due to a fundamental lack of OpenShift API server connectivity. This combined failure prevents proper assessment of the cluster's state and root cause analysis of the initial application refresh issue.

🔍 Evidence

appstudio-e2e-tests/gather-audit-logs

Category: infrastructure
Root Cause: The DNS resolution for the OpenShift API server hostname failed, indicating a networking issue or an unavailable/misconfigured cluster. This prevented the must-gather utility from connecting to the cluster's API to collect audit logs.

Logs:

artifacts/appstudio-e2e-tests/gather-audit-logs/build-log.txt

Get "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/apis/image.openshift.io/v1/namespaces/openshift/imagestreams/must-gather": dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

artifacts/appstudio-e2e-tests/gather-audit-logs/build-log.txt

error getting cluster version: Get "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions/version": dial tcp [REDACTED: Public IP (ipv4)]: i/o timeout

artifacts/appstudio-e2e-tests/gather-audit-logs/build-log.txt

error getting cluster operators: Get "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

artifacts/appstudio-e2e-tests/gather-audit-logs/build-log.txt

Error running must-gather collection:
    creating temp namespace: Post "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/api/v1/namespaces": dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

appstudio-e2e-tests/gather-extra

Category: infrastructure
Root Cause: The root cause is a DNS resolution failure, specifically "no such host", which prevented the step from establishing a connection to the Kubernetes API server. This likely indicates an issue with the cluster's networking or DNS configuration.

Logs:

artifacts/appstudio-e2e-tests/gather-extra/build-log.txt

E0225 22:09:43.685294      28 memcache.go:265] couldn't get current server API group list: Get "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/api?timeout=5s": dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

artifacts/appstudio-e2e-tests/gather-extra/build-log.txt

Unable to connect to the server: dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

appstudio-e2e-tests/gather-must-gather

Category: infrastructure
Root Cause: The OpenShift API server was unreachable, leading to "i/o timeout" errors when attempting to connect via TCP and "no such host" errors during DNS resolution. This network connectivity issue prevented oc adm must-gather from collecting diagnostic information from the cluster.

Logs:

artifacts/appstudio-e2e-tests/gather-must-gather/build-log.txt

Error running must-gather collection:
    creating temp namespace: Post "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/api/v1/namespaces": dial tcp [REDACTED: Public IP (ipv4)]: i/o timeout

artifacts/appstudio-e2e-tests/gather-must-gather/build-log.txt

E0225 22:09:26.306998      54 memcache.go:265] couldn't get current server API group list: Get "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/api?timeout=32s": dial tcp [REDACTED: Public IP (ipv4)]: i/o timeout

artifacts/appstudio-e2e-tests/gather-must-gather/build-log.txt

E0225 22:09:26.319729      54 memcache.go:265] couldn't get current server API group list: Get "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/api?timeout=32s": dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

appstudio-e2e-tests/redhat-appstudio-e2e

Category: timeout
Root Cause: The CI job timed out because a "hard refresh" operation for one of the ArgoCD applications remained pending for an excessive duration (over 7 minutes), causing the entire ci:teste2e target to be terminated by the Prow entrypoint.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 398

[2026-02-25 21:50:28] [SUBSTEP] Waiting for refresh operations to complete

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 400

[2026-02-25 21:50:34] [PROGRESS] Refresh: 0/39 complete | 39 still refreshing (10s elapsed)

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 490

[2026-02-25 22:03:58] [PROGRESS] Refresh: 38/39 complete | 1 still refreshing (710s elapsed)

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 491

{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:173","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Entrypoint received interrupt: terminated","severity":"error","time":"2026-02-25T22:04:02Z"}

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 492

make: *** [Makefile:25: ci/test/e2e] Terminated

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 497

{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:267","func":"sigs.k8s.io/prow/pkg/entrypoint.gracefullyTerminate","level":"error","msg":"Process did not exit before 15s grace period","severity":"error","time":"2026-02-25T22:04:17Z"}
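
The `[PROGRESS]` lines above carry enough information to detect a stalled refresh mechanically. The following Python sketch parses them and flags the case where work is still pending after a time budget. The 600-second budget and the function names are assumptions made for illustration only; they do not come from the e2e framework.

```python
import re

# Matches the "[PROGRESS] Refresh: ..." lines shown in the build log.
PROGRESS = re.compile(
    r"\[PROGRESS\] Refresh: (?P<done>\d+)/(?P<total>\d+) complete \| "
    r"(?P<pending>\d+) still refreshing \((?P<elapsed>\d+)s elapsed\)"
)


def parse_progress(line):
    """Return the counters from one progress line, or None if it is not one."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


def stalled(lines, budget_seconds):
    """Return the last progress record if work was still pending past the budget."""
    records = [r for r in map(parse_progress, lines) if r is not None]
    if not records:
        return None
    last = records[-1]
    if last["pending"] > 0 and last["elapsed"] >= budget_seconds:
        return last
    return None


log = [
    "[2026-02-25 21:50:34] [PROGRESS] Refresh: 0/39 complete | 39 still refreshing (10s elapsed)",
    "[2026-02-25 22:03:58] [PROGRESS] Refresh: 38/39 complete | 1 still refreshing (710s elapsed)",
]
# budget_seconds=600 is a hypothetical threshold, not the job's real limit.
print(stalled(log, budget_seconds=600))
```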

appstudio-e2e-tests/redhat-appstudio-gather

Category: infrastructure
Root Cause: The CI agent could not resolve the hostname of the OpenShift API server due to a DNS lookup failure, indicating a problem with the network configuration or the DNS service in the environment where the tests were executed.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-gather/build-log.txt line 20

E0225 22:10:26.646752      31 memcache.go:265] couldn't get current server API group list: Get "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/api?timeout=5s": dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

artifacts/appstudio-e2e-tests/redhat-appstudio-gather/build-log.txt line 49

Unable to connect to the server: dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

artifacts/appstudio-e2e-tests/redhat-appstudio-gather/build-log.txt line 679

Error running must-gather collection:
    creating temp namespace: Post "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/api/v1/namespaces": dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

artifacts/appstudio-e2e-tests/redhat-appstudio-gather/build-log.txt line 689

error running backup collection: Get "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/api?timeout=32s": dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

Analysis powered by prow-failure-analysis | Build: 2026770453082673152

enkeefe00 · 2026-02-26T14:36:37Z

/re-test

.../all-clusters/infra-deployments/monitoring-workload-logging/monitoring-workload-logging.yaml

hugares

/lgtm

components/monitoring/logging/production/kflux-fedora-01/kustomization.yaml

hugares

/lgtm

konflux-ci-qe-bot · 2026-02-27T16:20:15Z

🤖 Pipeline Failure Analysis

Category: Build

The pipeline failed because a Git rebase operation encountered an unresolvable merge conflict in components/monitoring/prometheus/production/kflux-fedora-01/kustomization.yaml, preventing the successful bootstrapping of the cluster for e2e tests.

📋 Technical Details

Immediate Cause

The primary failure was a merge conflict during a Git rebase of commit 136600c74 in the infra-deployments repository. This conflict occurred in the components/monitoring/prometheus/production/kflux-fedora-01/kustomization.yaml file, rendering the rebase unsuccessful.

Contributing Factors

The failure to apply the necessary Git commit directly resulted in an inability to clone the infra-deployments repository, as indicated by the "failed to clone infra-deployments repository: exit status 1" error. This repository cloning issue was a direct consequence of the rebase conflict, not a separate contributing factor.

Impact

The unsuccessful Git operation and subsequent failure to clone the infra-deployments repository prevented the test environment from being properly bootstrapped. This directly blocked the execution of the appstudio-e2e-tests/redhat-appstudio-e2e step, causing the overall pipeline to fail before any e2e tests could run.

🔍 Evidence

appstudio-e2e-tests/redhat-appstudio-e2e

Category: build
Root Cause: A Git rebase operation on the infra-deployments repository failed due to an unresolvable merge conflict in components/monitoring/prometheus/production/kflux-fedora-01/kustomization.yaml when applying commit 136600c74. This prevented the successful bootstrapping of the cluster needed for the e2e tests.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt

CONFLICT (content): Merge conflict in components/monitoring/prometheus/production/kflux-fedora-01/kustomization.yaml
error: could not apply 136600c74... Fix some kflux-fedora-01 applications

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt

Error: error when bootstrapping cluster: reached maximum number of attempts (2). error: failed to clone infra-deployments repository: exit status 1
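
For anyone scripting around this kind of failure, the conflicted file and the commit that could not be applied can be pulled out of the rebase output directly. This is a minimal Python sketch that assumes the message shapes quoted in the log above; Git's wording can differ between versions, and the function name is hypothetical.

```python
import re

# Message shapes as quoted in the build log above.
CONFLICT = re.compile(r"^CONFLICT \((?P<kind>[^)]+)\): Merge conflict in (?P<path>.+)$")
FAILED_PICK = re.compile(r"^error: could not apply (?P<sha>[0-9a-f]+)\.\.\. (?P<subject>.+)$")


def parse_rebase_output(text):
    """Return (conflicted_paths, failed_commit) found in rebase output.

    failed_commit is a (sha, subject) tuple, or None if no commit failed.
    """
    conflicts = []
    failed_commit = None
    for raw in text.splitlines():
        line = raw.strip()
        conflict = CONFLICT.match(line)
        if conflict:
            conflicts.append(conflict["path"])
            continue
        failed = FAILED_PICK.match(line)
        if failed:
            failed_commit = (failed["sha"], failed["subject"])
    return conflicts, failed_commit


output = """\
CONFLICT (content): Merge conflict in components/monitoring/prometheus/production/kflux-fedora-01/kustomization.yaml
error: could not apply 136600c74... Fix some kflux-fedora-01 applications
"""
paths, commit = parse_rebase_output(output)
print(paths, commit)
```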

Analysis powered by prow-failure-analysis | Build: 2027411499886055424

openshift-ci · 2026-02-27T16:20:18Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enkeefe00, hugares

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [hugares]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

konflux-ci-qe-bot · 2026-02-27T16:47:04Z

🤖 Pipeline Failure Analysis

Category: Configuration

The pipeline failed to set up the e2e test environment due to a git rebase merge conflict in the infra-deployments repository, preventing the successful cloning of the necessary components.

📋 Technical Details

Immediate Cause

The appstudio-e2e-tests/redhat-appstudio-e2e step failed because a git rebase operation encountered an unresolvable content merge conflict in components/monitoring/prometheus/production/kflux-fedora-01/kustomization.yaml. This conflict occurred while attempting to apply a commit titled "Fix some kflux-fedora-01 applications" during the rebase of the fix-fedora-deployment branch.

Contributing Factors

The specific branch fix-fedora-deployment in the infra-deployments repository had a state that was incompatible with its base, leading to the merge conflict. The system retried the cloning operation multiple times, but the underlying git rebase failure persisted, indicating a fundamental issue with the repository's configuration for the build.
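
The point that retrying cannot fix a deterministic failure can be shown with a small bounded-retry sketch in Python. This is not the e2e framework's code: `PermanentError`, `retry`, and the default of two attempts with a 10-second delay are illustrative assumptions loosely modeled on the log messages.

```python
import time


class PermanentError(Exception):
    """Hypothetical marker for failures that a retry cannot fix."""


def retry(operation, max_attempts=2, delay_seconds=10.0, sleep=time.sleep):
    """Call operation until it succeeds or max_attempts is reached.

    PermanentError is re-raised immediately, since repeating a
    deterministic failure (such as a rebase conflict) only costs time.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except PermanentError:
            raise
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(delay_seconds)
    raise RuntimeError(
        f"reached maximum number of attempts ({max_attempts})"
    ) from last_error


calls = []


def rebase_with_conflict():
    # Stand-in for a clone-and-rebase step that always hits the same conflict.
    calls.append(1)
    raise PermanentError("merge conflict")


try:
    retry(rebase_with_conflict, sleep=lambda seconds: None)
except PermanentError:
    print(f"gave up after {len(calls)} call(s)")
```

Classifying the conflict as permanent lets the job fail on the first attempt instead of waiting out every retry.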

Impact

The failure to resolve the git rebase conflict directly prevented the successful cloning of the infra-deployments repository. This, in turn, blocked the bootstrapping of the cluster and the subsequent execution of the e2e tests, causing the entire appstudio-e2e-tests/redhat-appstudio-e2e step, and thus the overall job, to fail.

🔍 Evidence

appstudio-e2e-tests/redhat-appstudio-e2e

Category: configuration
Root Cause: The git rebase operation failed due to an unresolvable content merge conflict in components/monitoring/prometheus/production/kflux-fedora-01/kustomization.yaml within the infra-deployments repository's fix-fedora-deployment branch. This indicates an issue with the branch's state relative to its base, preventing a clean rebase required for the e2e test setup.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 202

CONFLICT (content): Merge conflict in components/monitoring/prometheus/production/kflux-fedora-01/kustomization.yaml

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 203

error: could not apply 136600c74... Fix some kflux-fedora-01 applications

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 210

I0227 16:24:16.397258   17476 utils.go:93] got an error: failed to clone infra-deployments repository: exit status 1 - will retry in 10s

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 295

Error: error when bootstrapping cluster: reached maximum number of attempts (2). error: failed to clone infra-deployments repository: exit status 1

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 296

make: *** [Makefile:25: ci/test/e2e] Error 1

Analysis powered by prow-failure-analysis | Build: 2027418153083998208

openshift-ci · 2026-02-27T17:14:38Z

New changes are detected. LGTM label has been removed.

konflux-ci-qe-bot · 2026-02-27T19:07:50Z

🤖 Pipeline Failure Analysis

Category: Test

The E2E test pipeline failed during its cleanup phase due to the inability to delete a temporary test namespace, which was blocked by a lingering Tekton PipelineRun resource that had not properly terminated.

📋 Technical Details

Immediate Cause

The appstudio-e2e-tests/redhat-appstudio-e2e step failed during its AfterAll cleanup hook. The test framework was unable to delete the temporary namespace konflux-wyuc within the expected timeframe due to a context deadline exceeded error.

Contributing Factors

The namespace deletion was blocked by a persistent Tekton PipelineRun resource identified as my-integration-test-sjbd-s8dvm. The analysis indicates this PipelineRun was not properly terminated or cleaned up, preventing the namespace from being removed and ultimately causing the cleanup operation to time out.

Impact

The failure to properly clean up test resources, specifically the temporary namespace and its contained PipelineRun, prevented the successful completion of the appstudio-e2e-tests/redhat-appstudio-e2e step, leading to the overall pipeline failure.

🔍 Evidence

appstudio-e2e-tests/redhat-appstudio-e2e

Category: test
Root Cause: The E2E test failed during its cleanup phase, unable to delete a temporary test namespace 'konflux-wyuc' because a Tekton PipelineRun resource 'my-integration-test-sjbd-s8dvm' was not properly terminated or cleaned up, causing a context deadline exceeded error.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 865-870

• [FAILED] [621.191 seconds]
[konflux-demo-suite] Maven project - Default build [AfterAll] when Release PipelineRun is completed should lead to Release CR being marked as succeeded [konflux, upstream-konflux]
  [AfterAll] /tmp/tmp.P8V4wz0r9t/tests/konflux-demo/konflux-demo.go:131
  [It] /tmp/tmp.P8V4wz0r9t/tests/konflux-demo/konflux-demo.go:416

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 876-882

  [FAILED] Expected success, but got an error:
      <*errors.errorString | 0xc001b92ce0>: 
      namespace was not deleted in expected timeframe: 'konflux-wyuc': context deadline exceeded. Remaining resources in namespace: ( pipelineruns: my-integration-test-sjbd-s8dvm )
      
      {
          s: "namespace was not deleted in expected timeframe: 'konflux-wyuc': context deadline exceeded. Remaining resources in namespace: ( pipelineruns: my-integration-test-sjbd-s8dvm )\n",
      }
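
The failure message above names both the stuck namespace and the resource blocking it. The following Python sketch extracts those fields, assuming the single `kind: name` form shown here. How the framework formats several remaining resources is not visible in this log, so that case is not handled, and the function name is hypothetical.

```python
import re

# Shape of the error string quoted in the build log above.
NOT_DELETED = re.compile(
    r"namespace was not deleted in expected timeframe: '(?P<namespace>[^']+)'"
    r".*?Remaining resources in namespace: \( (?P<kind>[^:]+): (?P<names>.+?) \)"
)


def parse_blocked_namespace(message):
    """Return namespace, resource kind, and resource names, or None if no match."""
    match = NOT_DELETED.search(message)
    if match is None:
        return None
    return match.groupdict()


message = (
    "namespace was not deleted in expected timeframe: 'konflux-wyuc': "
    "context deadline exceeded. Remaining resources in namespace: "
    "( pipelineruns: my-integration-test-sjbd-s8dvm )"
)
print(parse_blocked_namespace(message))
```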

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 888

Error: error when running e2e tests: running "ginkgo --seed=1772213316 --timeout=1h30m0s --grace-period=30s --output-interceptor-mode=none --label-filter=konflux --no-color --json-report=e2e-report.json --junit-report=e2e-report.xml --procs=20 --nodes=20 --p --output-dir=/logs/artifacts ./cmd --" failed with exit code 1

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 889

make: *** [Makefile:25: ci/test/e2e] Error 1

Analysis powered by prow-failure-analysis | Build: 2027432193000738816

enkeefe00 · 2026-02-27T20:28:41Z

/retest

enkeefe00 self-assigned this Feb 25, 2026

openshift-ci bot requested review from mftb and scoheb February 25, 2026 21:09

enkeefe00 force-pushed the fix-fedora-deployment branch from 97e78f7 to a25e8c1 Compare February 25, 2026 21:09

github-actions bot added environment/development environment/production environment/staging infra/hold-production labels Feb 25, 2026

enkeefe00 force-pushed the fix-fedora-deployment branch from a25e8c1 to 9df2fbe Compare February 25, 2026 22:04

enkeefe00 changed the title ~~KFLUXINRA-2965: Add Fedora cluster to ApplicationSets missing it~~ KFLUXINRA-2965: Fix misbehaving kflux-fedora-01 cluster apps Feb 25, 2026

hugares reviewed Feb 27, 2026

.../all-clusters/infra-deployments/monitoring-workload-logging/monitoring-workload-logging.yaml

enkeefe00 force-pushed the fix-fedora-deployment branch from 9df2fbe to 136600c Compare February 27, 2026 14:05

hugares approved these changes Feb 27, 2026

openshift-ci bot assigned hugares Feb 27, 2026

openshift-ci bot added lgtm approved labels Feb 27, 2026

hugares removed the infra/hold-production label Feb 27, 2026

openshift-ci bot removed the lgtm label Feb 27, 2026

github-actions bot added the infra/hold-production label Feb 27, 2026

hugares reviewed Feb 27, 2026

components/monitoring/logging/production/kflux-fedora-01/kustomization.yaml

enkeefe00 force-pushed the fix-fedora-deployment branch from beac2f9 to d04ca08 Compare February 27, 2026 16:18

hugares approved these changes Feb 27, 2026

openshift-ci bot added the lgtm label Feb 27, 2026

hugares removed the infra/hold-production label Feb 27, 2026

Fix some kflux-fedora-01 applications

52d19cb

* Add cluster name to ApplicationSets the new-cluster playbook missed
* Create cluster etcd-defrag config
* Remove logging applications until Konflux is up and running 100%

KFLUXINRA-2965

enkeefe00 force-pushed the fix-fedora-deployment branch from d04ca08 to 52d19cb Compare February 27, 2026 17:14

openshift-ci bot removed the lgtm label Feb 27, 2026

github-actions bot added the infra/hold-production label Feb 27, 2026
