KFLUXINRA-2965: Fix misbehaving kflux-fedora-01 cluster apps #10665

Open
enkeefe00 wants to merge 1 commit into redhat-appstudio:main from enkeefe00:fix-fedora-deployment

@enkeefe00 enkeefe00 commented Feb 25, 2026

Fix some kflux-fedora-01 applications

  • Add cluster name to ApplicationSets the new-cluster playbook missed
  • Create cluster etcd-defrag config
  • Remove logging applications until Konflux is up and running 100%

KFLUXINRA-2965

@enkeefe00 enkeefe00 self-assigned this Feb 25, 2026
@openshift-ci openshift-ci bot requested review from mftb and scoheb February 25, 2026 21:09
@enkeefe00 enkeefe00 force-pushed the fix-fedora-deployment branch from 97e78f7 to a25e8c1 Compare February 25, 2026 21:09
@github-actions
Contributor

🤖 Gemini AI Assistant Available

Hi @enkeefe00! I'm here to help with your pull request. You can interact with me using the following commands:

Available Commands

  • @gemini-cli /review - Request a comprehensive code review

    • Example: @gemini-cli /review Please focus on security and performance
  • @gemini-cli <your question> - Ask me anything about the codebase

    • Example: @gemini-cli How can I improve this function?
    • Example: @gemini-cli What are the best practices for error handling here?

How to Use

  1. Simply type one of the commands above in a comment on this PR
  2. I'll analyze your code and provide detailed feedback
  3. You can track my progress in the workflow logs

Permissions

Only OWNER, MEMBER, or COLLABORATOR users can trigger my responses. This ensures secure and appropriate usage.


This message was automatically added to help you get started with the Gemini AI assistant. Feel free to delete this comment if you don't need assistance.

@github-actions
Contributor

🤖 Hi @enkeefe00, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@enkeefe00 enkeefe00 force-pushed the fix-fedora-deployment branch from a25e8c1 to 9df2fbe Compare February 25, 2026 22:04
@enkeefe00 enkeefe00 changed the title KFLUXINRA-2965: Add Fedora cluster to ApplicationSets missing it KFLUXINRA-2965: Fix misbehaving kflux-fedora-01 cluster apps Feb 25, 2026
@konflux-ci-qe-bot

🤖 Pipeline Failure Analysis

Category: Infrastructure

The pipeline failed due to the end-to-end test timing out while an ArgoCD application refresh was pending, concurrently with critical infrastructure issues preventing connectivity to the OpenShift API server.

📋 Technical Details

Immediate Cause

The appstudio-e2e-tests/redhat-appstudio-e2e step was terminated by the Prow entrypoint after timing out. An ArgoCD application "hard refresh" was still pending when the last progress line was logged (710s elapsed, nearly 12 minutes), which prevented the ci:teste2e target from completing.

Contributing Factors

Concurrent and subsequent diagnostic gather steps (gather-audit-logs, gather-extra, gather-must-gather, redhat-appstudio-gather) consistently failed to connect to the OpenShift API server. These failures were characterized by DNS resolution errors ("no such host") and TCP I/O timeouts against the API server's hostname and IP address, indicating that the target cluster became critically unhealthy or unreachable during or immediately after the primary test execution. This loss of connectivity likely contributed to the ArgoCD refresh hanging and prevented the collection of crucial diagnostic artifacts.

Impact

The end-to-end tests were unable to complete due to the application refresh stalling, and subsequent attempts to collect diagnostic information from the cluster failed entirely due to a fundamental lack of OpenShift API server connectivity. This combined failure prevents proper assessment of the cluster's state and root cause analysis of the initial application refresh issue.

🔍 Evidence

appstudio-e2e-tests/gather-audit-logs

Category: infrastructure
Root Cause: The DNS resolution for the OpenShift API server hostname failed, indicating a networking issue or an unavailable/misconfigured cluster. This prevented the must-gather utility from connecting to the cluster's API to collect audit logs.

Logs:

artifacts/appstudio-e2e-tests/gather-audit-logs/build-log.txt
Get "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/apis/image.openshift.io/v1/namespaces/openshift/imagestreams/must-gather": dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
artifacts/appstudio-e2e-tests/gather-audit-logs/build-log.txt
error getting cluster version: Get "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions/version": dial tcp [REDACTED: Public IP (ipv4)]: i/o timeout
artifacts/appstudio-e2e-tests/gather-audit-logs/build-log.txt
error getting cluster operators: Get "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
artifacts/appstudio-e2e-tests/gather-audit-logs/build-log.txt
Error running must-gather collection:
    creating temp namespace: Post "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/api/v1/namespaces": dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

appstudio-e2e-tests/gather-extra

Category: infrastructure
Root Cause: The root cause is a DNS resolution failure, specifically "no such host", which prevented the step from establishing a connection to the Kubernetes API server. This likely indicates an issue with the cluster's networking or DNS configuration.

Logs:

artifacts/appstudio-e2e-tests/gather-extra/build-log.txt
E0225 22:09:43.685294      28 memcache.go:265] couldn't get current server API group list: Get "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/api?timeout=5s": dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
artifacts/appstudio-e2e-tests/gather-extra/build-log.txt
Unable to connect to the server: dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

appstudio-e2e-tests/gather-must-gather

Category: infrastructure
Root Cause: The OpenShift API server was unreachable, leading to "i/o timeout" errors when attempting to connect via TCP and "no such host" errors during DNS resolution. This network connectivity issue prevented oc adm must-gather from collecting diagnostic information from the cluster.

Logs:

artifacts/appstudio-e2e-tests/gather-must-gather/build-log.txt
Error running must-gather collection:
    creating temp namespace: Post "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/api/v1/namespaces": dial tcp [REDACTED: Public IP (ipv4)]: i/o timeout
artifacts/appstudio-e2e-tests/gather-must-gather/build-log.txt
E0225 22:09:26.306998      54 memcache.go:265] couldn't get current server API group list: Get "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/api?timeout=32s": dial tcp [REDACTED: Public IP (ipv4)]: i/o timeout
artifacts/appstudio-e2e-tests/gather-must-gather/build-log.txt
E0225 22:09:26.319729      54 memcache.go:265] couldn't get current server API group list: Get "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/api?timeout=32s": dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

appstudio-e2e-tests/redhat-appstudio-e2e

Category: timeout
Root Cause: The CI job timed out because a "hard refresh" of one of the 39 ArgoCD applications was still pending after 710s (nearly 12 minutes), so the Prow entrypoint terminated the entire ci:teste2e target.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 398
[2026-02-25 21:50:28] [SUBSTEP] Waiting for refresh operations to complete
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 400
[2026-02-25 21:50:34] [PROGRESS] Refresh: 0/39 complete | 39 still refreshing (10s elapsed)
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 490
[2026-02-25 22:03:58] [PROGRESS] Refresh: 38/39 complete | 1 still refreshing (710s elapsed)
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 491
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:173","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Entrypoint received interrupt: terminated","severity":"error","time":"2026-02-25T22:04:02Z"}
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 492
make: *** [Makefile:25: ci/test/e2e] Terminated
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 497
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:267","func":"sigs.k8s.io/prow/pkg/entrypoint.gracefullyTerminate","level":"error","msg":"Process did not exit before 15s grace period","severity":"error","time":"2026-02-25T22:04:17Z"}
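The progress lines quoted above carry both a timestamp and a reported elapsed counter, so the duration of the stalled refresh can be checked directly. A minimal sketch, assuming the line format is exactly what those two quoted lines show; the function name `summarize_refresh` and the regular expression are illustrative and not part of the e2e tooling:

```python
import re
from datetime import datetime

# Pattern inferred only from the two [PROGRESS] lines quoted in the evidence.
PROGRESS = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \[PROGRESS\] "
    r"Refresh: (?P<done>\d+)/(?P<total>\d+) complete \| "
    r"(?P<pending>\d+) still refreshing \((?P<elapsed>\d+)s elapsed\)"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def summarize_refresh(lines):
    """Compare wall-clock time between the first and last progress lines
    with the elapsed counter reported on the last line."""
    hits = [m for m in map(PROGRESS.search, lines) if m]
    if not hits:
        return None
    first, last = hits[0], hits[-1]
    wall = datetime.strptime(last["ts"], TS_FORMAT) - datetime.strptime(first["ts"], TS_FORMAT)
    return {
        "wall_clock_s": int(wall.total_seconds()),
        "reported_elapsed_s": int(last["elapsed"]),
        "pending": int(last["pending"]),
        "total": int(last["total"]),
    }


LOG = [
    "[2026-02-25 21:50:34] [PROGRESS] Refresh: 0/39 complete | 39 still refreshing (10s elapsed)",
    "[2026-02-25 22:03:58] [PROGRESS] Refresh: 38/39 complete | 1 still refreshing (710s elapsed)",
]
summary = summarize_refresh(LOG)
print(summary)
```

On these two lines the wall-clock gap is larger than the difference between the reported counters, so the counter should probably be read as a lower bound on how long the refresh was pending.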

appstudio-e2e-tests/redhat-appstudio-gather

Category: infrastructure
Root Cause: The CI agent could not resolve the hostname of the OpenShift API server due to a DNS lookup failure, indicating a problem with the network configuration or the DNS service in the environment where the tests were executed.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-gather/build-log.txt line 20
E0225 22:10:26.646752      31 memcache.go:265] couldn't get current server API group list: Get "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/api?timeout=5s": dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
artifacts/appstudio-e2e-tests/redhat-appstudio-gather/build-log.txt line 49
Unable to connect to the server: dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
artifacts/appstudio-e2e-tests/redhat-appstudio-gather/build-log.txt line 679
Error running must-gather collection:
    creating temp namespace: Post "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/api/v1/namespaces": dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
artifacts/appstudio-e2e-tests/redhat-appstudio-gather/build-log.txt line 689
error running backup collection: Get "https://api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com:6443/api?timeout=32s": dial tcp: lookup api.konflux-4-17-us-west-2-rgbqs.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
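The gather steps above fail in two distinct ways: DNS lookups that return "no such host" and TCP dials that end in "i/o timeout". A minimal sketch of how such lines could be bucketed when triaging a build log; the `classify` helper is hypothetical, and the sample lines use a placeholder hostname and a documentation IP address rather than the real cluster values:

```python
from collections import Counter


def classify(line):
    """Map one log line to a coarse failure kind, keyed on the error text
    that appears in the quoted logs."""
    if "no such host" in line:
        return "dns"
    if "i/o timeout" in line:
        return "tcp-timeout"
    return "other"


# Shortened stand-ins for the quoted log lines (placeholder host and IP).
SAMPLE = [
    "Unable to connect to the server: dial tcp: lookup api.example.invalid "
    "on 172.30.0.10:53: no such host",
    'creating temp namespace: Post "https://api.example.invalid:6443/api/v1/namespaces": '
    "dial tcp 192.0.2.1:6443: i/o timeout",
    "make: *** [Makefile:25: ci/test/e2e] Terminated",
]
counts = Counter(classify(line) for line in SAMPLE)
print(counts)
```

Seeing both kinds against the same hostname within about a minute is consistent with the analysis above that the cluster itself became unreachable, rather than a single resolver hiccup.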

Analysis powered by prow-failure-analysis | Build: 2026770453082673152

@enkeefe00
Contributor Author

/re-test

@enkeefe00 enkeefe00 force-pushed the fix-fedora-deployment branch from 9df2fbe to 136600c Compare February 27, 2026 14:05
Contributor

@hugares hugares left a comment

/lgtm

@enkeefe00 enkeefe00 force-pushed the fix-fedora-deployment branch from beac2f9 to d04ca08 Compare February 27, 2026 16:18
Contributor

@hugares hugares left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Feb 27, 2026
@konflux-ci-qe-bot

🤖 Pipeline Failure Analysis

Category: Build

The pipeline failed because a Git rebase operation encountered an unresolvable merge conflict in components/monitoring/prometheus/production/kflux-fedora-01/kustomization.yaml, preventing the successful bootstrapping of the cluster for e2e tests.

📋 Technical Details

Immediate Cause

The primary failure was a merge conflict during a Git rebase of commit 136600c74 in the infra-deployments repository. This conflict occurred in the components/monitoring/prometheus/production/kflux-fedora-01/kustomization.yaml file, rendering the rebase unsuccessful.

Contributing Factors

The rebase conflict left the infra-deployments checkout unusable, which surfaced as the "failed to clone infra-deployments repository: exit status 1" error. The clone error is a direct consequence of the rebase conflict, not a separate contributing factor.

Impact

The unsuccessful Git operation and subsequent failure to clone the infra-deployments repository prevented the test environment from being properly bootstrapped. This directly blocked the execution of the appstudio-e2e-tests/redhat-appstudio-e2e step, causing the overall pipeline to fail before any e2e tests could run.

🔍 Evidence

appstudio-e2e-tests/redhat-appstudio-e2e

Category: build
Root Cause: A Git rebase operation on the infra-deployments repository failed due to an unresolvable merge conflict in components/monitoring/prometheus/production/kflux-fedora-01/kustomization.yaml when applying commit 136600c74. This prevented the successful bootstrapping of the cluster needed for the e2e tests.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt
CONFLICT (content): Merge conflict in components/monitoring/prometheus/production/kflux-fedora-01/kustomization.yaml
error: could not apply 136600c74... Fix some kflux-fedora-01 applications
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt
Error: error when bootstrapping cluster: reached maximum number of attempts (2). error: failed to clone infra-deployments repository: exit status 1
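The two git lines in the evidence above name both the conflicting file and the commit that could not be applied. A minimal sketch of pulling those fields out of a build log, assuming git's standard `CONFLICT (...)` and `error: could not apply` message shapes as quoted; the function name `summarize_rebase_failure` is illustrative:

```python
import re

# Both patterns mirror the git output quoted in the evidence.
CONFLICT = re.compile(r"^CONFLICT \(([^)]+)\): Merge conflict in (.+)$")
COULD_NOT_APPLY = re.compile(r"^error: could not apply ([0-9a-f]+)\.\.\. (.+)$")


def summarize_rebase_failure(lines):
    """Collect conflicted paths and the commits git failed to apply."""
    conflicts, commits = [], []
    for raw in lines:
        line = raw.strip()
        if (m := CONFLICT.match(line)):
            conflicts.append({"kind": m.group(1), "path": m.group(2)})
        elif (m := COULD_NOT_APPLY.match(line)):
            commits.append({"sha": m.group(1), "subject": m.group(2)})
    return {"conflicts": conflicts, "commits": commits}


LOG = [
    "CONFLICT (content): Merge conflict in "
    "components/monitoring/prometheus/production/kflux-fedora-01/kustomization.yaml",
    "error: could not apply 136600c74... Fix some kflux-fedora-01 applications",
]
report = summarize_rebase_failure(LOG)
print(report)
```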

Analysis powered by prow-failure-analysis | Build: 2027411499886055424

@openshift-ci

openshift-ci bot commented Feb 27, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enkeefe00, hugares

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@konflux-ci-qe-bot

🤖 Pipeline Failure Analysis

Category: Configuration

The pipeline failed to set up the e2e test environment due to a git rebase merge conflict in the infra-deployments repository, preventing the successful cloning of the necessary components.

📋 Technical Details

Immediate Cause

The appstudio-e2e-tests/redhat-appstudio-e2e step failed because a git rebase operation encountered an unresolvable content merge conflict in components/monitoring/prometheus/production/kflux-fedora-01/kustomization.yaml. This conflict occurred while attempting to apply a commit titled "Fix some kflux-fedora-01 applications" during the rebase of the fix-fedora-deployment branch.

Contributing Factors

The fix-fedora-deployment branch conflicted with its base in the infra-deployments repository, which produced the merge conflict. The system retried the clone (the log reports a maximum of 2 attempts), but the git rebase failed the same way each time, which suggests the branch must be rebased onto its base with the conflict resolved before the e2e setup can succeed.

Impact

The failure to resolve the git rebase conflict directly prevented the successful cloning of the infra-deployments repository. This, in turn, blocked the bootstrapping of the cluster and the subsequent execution of the e2e tests, causing the entire appstudio-e2e-tests/redhat-appstudio-e2e step, and thus the overall job, to fail.

🔍 Evidence

appstudio-e2e-tests/redhat-appstudio-e2e

Category: configuration
Root Cause: The git rebase operation failed due to an unresolvable content merge conflict in components/monitoring/prometheus/production/kflux-fedora-01/kustomization.yaml within the infra-deployments repository's fix-fedora-deployment branch. This indicates an issue with the branch's state relative to its base, preventing a clean rebase required for the e2e test setup.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 202
CONFLICT (content): Merge conflict in components/monitoring/prometheus/production/kflux-fedora-01/kustomization.yaml
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 203
error: could not apply 136600c74... Fix some kflux-fedora-01 applications
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 210
I0227 16:24:16.397258   17476 utils.go:93] got an error: failed to clone infra-deployments repository: exit status 1 - will retry in 10s
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 295
Error: error when bootstrapping cluster: reached maximum number of attempts (2). error: failed to clone infra-deployments repository: exit status 1
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 296
make: *** [Makefile:25: ci/test/e2e] Error 1
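The log above shows a bounded retry ("will retry in 10s", "reached maximum number of attempts (2)") wrapped around a failure that is deterministic. A minimal sketch of that pattern, to illustrate why retrying cannot help here; this is not the e2e framework's actual implementation, and `retry`, `always_conflicts`, and the injectable `sleep` parameter are hypothetical names:

```python
import time


def retry(operation, attempts=2, delay_s=10.0, sleep=time.sleep):
    """Call operation up to `attempts` times, pausing `delay_s` between tries."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as err:  # broad on purpose: this is only a sketch
            last_error = err
            if attempt < attempts:
                sleep(delay_s)
    raise RuntimeError(
        f"reached maximum number of attempts ({attempts}). error: {last_error}"
    ) from last_error


calls = []


def always_conflicts():
    # Stand-in for the clone-and-rebase step: a merge conflict fails
    # identically on every attempt.
    calls.append(1)
    raise RuntimeError("failed to clone infra-deployments repository: exit status 1")


sleeps = []  # record requested delays instead of actually sleeping
message = None
try:
    retry(always_conflicts, sleep=sleeps.append)
except RuntimeError as err:
    message = str(err)
print(len(calls), sleeps, message)
```

Retries of this kind only pay off for transient faults such as the connectivity errors in the first run; a content conflict needs a change to the branch.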

Analysis powered by prow-failure-analysis | Build: 2027418153083998208

@enkeefe00 enkeefe00 force-pushed the fix-fedora-deployment branch from d04ca08 to 52d19cb Compare February 27, 2026 17:14
@openshift-ci openshift-ci bot removed the lgtm label Feb 27, 2026
@openshift-ci

openshift-ci bot commented Feb 27, 2026

New changes are detected. LGTM label has been removed.

@konflux-ci-qe-bot

🤖 Pipeline Failure Analysis

Category: Test

The E2E test pipeline failed during its cleanup phase due to the inability to delete a temporary test namespace, which was blocked by a lingering Tekton PipelineRun resource that had not properly terminated.

📋 Technical Details

Immediate Cause

The appstudio-e2e-tests/redhat-appstudio-e2e step failed during its AfterAll cleanup hook. The test framework was unable to delete the temporary namespace konflux-wyuc within the expected timeframe due to a context deadline exceeded error.

Contributing Factors

The namespace deletion was blocked by a persistent Tekton PipelineRun resource identified as my-integration-test-sjbd-s8dvm. The analysis indicates this PipelineRun was not properly terminated or cleaned up, preventing the namespace from being removed and ultimately causing the cleanup operation to time out.

Impact

The failure to properly clean up test resources, specifically the temporary namespace and its contained PipelineRun, prevented the successful completion of the appstudio-e2e-tests/redhat-appstudio-e2e step, leading to the overall pipeline failure.

🔍 Evidence

appstudio-e2e-tests/redhat-appstudio-e2e

Category: test
Root Cause: The E2E test failed during its cleanup phase, unable to delete a temporary test namespace 'konflux-wyuc' because a Tekton PipelineRun resource 'my-integration-test-sjbd-s8dvm' was not properly terminated or cleaned up, causing a context deadline exceeded error.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 865-870
• [FAILED] [621.191 seconds]
[konflux-demo-suite] Maven project - Default build [AfterAll] when Release PipelineRun is completed should lead to Release CR being marked as succeeded [konflux, upstream-konflux]
  [AfterAll] /tmp/tmp.P8V4wz0r9t/tests/konflux-demo/konflux-demo.go:131
  [It] /tmp/tmp.P8V4wz0r9t/tests/konflux-demo/konflux-demo.go:416

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 876-882
  [FAILED] Expected success, but got an error:
      <*errors.errorString | 0xc001b92ce0>: 
      namespace was not deleted in expected timeframe: 'konflux-wyuc': context deadline exceeded. Remaining resources in namespace: ( pipelineruns: my-integration-test-sjbd-s8dvm )
      
      {
          s: "namespace was not deleted in expected timeframe: 'konflux-wyuc': context deadline exceeded. Remaining resources in namespace: ( pipelineruns: my-integration-test-sjbd-s8dvm )\n",
      }
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 888
Error: error when running e2e tests: running "ginkgo --seed=1772213316 --timeout=1h30m0s --grace-period=30s --output-interceptor-mode=none --label-filter=konflux --no-color --json-report=e2e-report.json --junit-report=e2e-report.xml --procs=20 --nodes=20 --p --output-dir=/logs/artifacts ./cmd --" failed with exit code 1
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 889
make: *** [Makefile:25: ci/test/e2e] Error 1
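The failure message above already lists what was still blocking the namespace deletion. A minimal sketch of extracting that list, assuming the "( kind: name )" shape seen in this single message; how the framework formats several kinds or several names is not shown in the log, so the comma handling below is a guess and `remaining_resources` is a hypothetical helper:

```python
import re

# Matches the parenthesised tail of the quoted error message.
REMAINING = re.compile(r"Remaining resources in namespace: \(\s*(?P<body>.*?)\s*\)")


def remaining_resources(message):
    """Return {kind: [names]} parsed from the 'Remaining resources' tail."""
    m = REMAINING.search(message)
    if not m or not m["body"]:
        return {}
    kind, _, names = m["body"].partition(":")
    # Assumption: multiple names, if any, would be comma-separated.
    return {kind.strip(): [n.strip() for n in names.split(",") if n.strip()]}


MESSAGE = (
    "namespace was not deleted in expected timeframe: 'konflux-wyuc': "
    "context deadline exceeded. Remaining resources in namespace: "
    "( pipelineruns: my-integration-test-sjbd-s8dvm )"
)
leftovers = remaining_resources(MESSAGE)
print(leftovers)
```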

Analysis powered by prow-failure-analysis | Build: 2027432193000738816

@enkeefe00
Contributor Author

/retest
