Skip to content

Conversation

@gustavolira
Copy link
Member

Description

Extract common functions from utils.sh into focused modules to improve maintainability and reduce code duplication.

Which issue(s) does this PR fix

PR acceptance criteria

Please make sure that the following steps are complete:

  • GitHub Actions are completed and successful
  • Unit Tests are updated and passing
  • E2E Tests are updated and passing
  • Documentation is updated if necessary (requirement for new features)
  • Add a screenshot if the change is UX/UI related

How to test changes / Special notes to the reviewer

@gustavolira
Copy link
Member Author

/review

@rhdh-qodo-merge
Copy link

rhdh-qodo-merge bot commented Dec 8, 2025

PR Reviewer Guide 🔍

(Review updated until commit b3ffc23)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 86
🔒 Security concerns

Sensitive information handling:
common::oc_login echoes "OCP version: $(oc version)". If oc is configured to print server URLs, ensure no tokens are included. Also, several functions manipulate secrets (Postgres credentials) and patch files using sed; confirm no secrets are logged. Add explicit checks to avoid echoing sensitive env vars (like K8S_CLUSTER_TOKEN) on failures.

🔀 Multiple PR themes

Sub-PR theme: Extract common, wait, and operator modules and wire into utils.sh

Relevant files:

  • .ibm/pipelines/lib/common.sh
  • .ibm/pipelines/lib/k8s-wait.sh
  • .ibm/pipelines/lib/operators.sh
  • .ibm/pipelines/utils.sh

Sub-PR theme: Refactor jobs to use common::oc_login and new lib utilities

Relevant files:

  • .ibm/pipelines/jobs/auth-providers.sh
  • .ibm/pipelines/jobs/ocp-nightly.sh
  • .ibm/pipelines/jobs/ocp-operator.sh
  • .ibm/pipelines/jobs/ocp-pull.sh
  • .ibm/pipelines/jobs/upgrade.sh

Sub-PR theme: Documentation updates for modular pipeline libraries

Relevant files:

  • .ibm/pipelines/README.md
  • .ibm/pipelines/lib/README.md

⚡ Recommended focus areas for review

Matching Pods

k8s_wait::deployment picks the first pod whose name matches the resource string via grep. This can select unintended pods (partial matches, multiple replicas) and miss readiness of Deployments/StatefulSets. Consider scoping by label/owner or using rollout status for Deployment/StatefulSet to avoid flaky waits.

for ((i = 1; i <= max_attempts; i++)); do
  local pod_name
  pod_name=$(oc get pods -n "$namespace" | grep "$resource_name" | awk '{print $1}' | head -n 1)

  if [[ -n "$pod_name" ]]; then
    local pod_ready_status
    pod_ready_status=$(oc get pod "$pod_name" -n "$namespace" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')

    if [[ "$pod_ready_status" == "True" ]]; then
      log::success "Resource '$resource_name' is ready in namespace '$namespace'"
      return 0
    fi

    local container_statuses
    container_statuses=$(oc get pod "$pod_name" -n "$namespace" -o jsonpath='{.status.containerStatuses}')
    log::debug "Pod '$pod_name' status: Ready=$pod_ready_status, Containers=$container_statuses"
  else
    log::debug "No pods found matching '$resource_name' in namespace '$namespace'"
  fi
📚 Focus areas based on broader codebase context

Missing Dependency Check

The function common::oc_login calls log::error, but this module does not source the logging library. Ensure lib/log.sh is sourced or avoid using log:: calls here to prevent runtime errors when used standalone. (Ref 8)

common::oc_login() {
  if ! command -v oc &> /dev/null; then
    log::error "oc command not found. Please install OpenShift CLI."
    return 1
  fi

  oc login --token="${K8S_CLUSTER_TOKEN}" --server="${K8S_CLUSTER_URL}" --insecure-skip-tls-verify=true
  echo "OCP version: $(oc version)"
  return 0

Reference reasoning: Other bash utilities in the referenced scripts rely on local logging helpers before calling log functions, demonstrating the pattern of ensuring dependencies are available within the script context.

📄 References
  1. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [1-55]
  2. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [56-69]
  3. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [104-145]
  4. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [71-103]
  5. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [146-157]
  6. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [159-190]
  7. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [192-199]
  8. redhat-developer/rhdh-operator/config/profile/rhdh/plugin-infra/plugin-infra.sh [1-49]

@gustavolira
Copy link
Member Author

/review
-i
--pr_reviewer.require_score_review=true
--pr_reviewer.require_can_be_split_review=true
--pr_reviewer.require_soc2_ticket=false
--pr_reviewer.num_code_suggestions="2"

@rhdh-qodo-merge
Copy link

Persistent review updated to latest commit b3ffc23

@gustavolira
Copy link
Member Author

/improve
--pr_code_suggestions.num_code_suggestions_per_chunk="2"
--pr_code_suggestions.commitable_code_suggestions=true

@github-actions
Copy link
Contributor

github-actions bot commented Dec 8, 2025

@github-actions
Copy link
Contributor

github-actions bot commented Dec 8, 2025

@zdrapela
Copy link
Member

zdrapela commented Dec 9, 2025

Just testing my privileges /approve

@zdrapela
Copy link
Member

zdrapela commented Dec 9, 2025

/approve

@openshift-ci openshift-ci bot added the approved label Dec 9, 2025
@zdrapela
Copy link
Member

zdrapela commented Dec 9, 2025

/unapprove

@zdrapela
Copy link
Member

zdrapela commented Dec 9, 2025

/remove-approve

@openshift-ci openshift-ci bot removed the approved label Dec 9, 2025
@zdrapela
Copy link
Member

zdrapela commented Dec 9, 2025

/lgtm

Updated the scripts to use the k8s_wait::crd function for waiting on Backstage CRD availability after operator installation. This change enhances consistency across the operator.sh and auth-providers.sh scripts, ensuring that the necessary CRDs are ready before proceeding with subsequent operations.

Changes:
- Removed unnecessary blank lines for cleaner code.
- Added comments to clarify the purpose of CRD availability checks.
…yment checks

Updated the OpenShift authentication process to log errors if the login fails. Additionally, added return statements to the deployment and endpoint checks in the cluster setup functions to ensure proper error handling and prevent proceeding with operations if the checks fail.
Updated the deployment name in the cluster_setup_ocp_helm() and cluster_setup_ocp_operator() functions to ensure accurate waiting for the OpenShift Pipelines operator. Additionally, refined the pod name retrieval logic in the k8s_wait::deployment function for improved reliability in identifying pods. This change enhances the overall accuracy of the deployment checks.
Updated the timeout for the Tekton Pipelines webhook endpoint checks in the cluster_setup_ocp_helm() and cluster_setup_ocp_operator() functions from 30 seconds to 1800 seconds. This change ensures that the scripts have sufficient time to wait for the endpoint to become available, improving the reliability of the deployment process.
Replaced the hardcoded OPENSHIFT_OPERATORS_NAMESPACE with OPERATOR_NAMESPACE in the cluster_setup_ocp_helm() and cluster_setup_ocp_operator() functions. This change improves flexibility and consistency in the deployment checks for OpenShift Pipelines.
Enhanced the configmap retrieval process by implementing a wait mechanism that checks for the existence of the default dynamic plugins configmap created by the operator. This change allows the script to wait for up to 2.5 minutes before failing, improving reliability in scenarios where the configmap may take time to be created. Additionally, updated error logging to provide more context if the configmap is not found after the wait period.
Removed unnecessary blank lines and adjusted spacing in the configmap retrieval logic to enhance code readability. These changes contribute to a cleaner codebase without altering functionality.
Added detailed diagnostic information to the k8s_wait::deployment function. When a timeout occurs, the script now logs pod status, pod description, pod logs, and recent events in the specified namespace. This improvement aids in troubleshooting deployment issues by providing more context on the state of resources at the time of failure.
Refactored the plugin merging process to intelligently combine custom and default plugins, ensuring that custom plugins take precedence while avoiding conflicts. This change enhances the flexibility of plugin management and preserves the operator's default plugin states.
Updated the plugin merging process to extract default plugins into a separate array and ensure deduplication by package name. This change improves the clarity of the merging strategy and enhances the robustness of plugin management while maintaining custom plugin precedence.
@github-actions
Copy link
Contributor

github-actions bot commented Jan 6, 2026

The image is available at:

/test e2e-ocp-helm

@gustavolira
Copy link
Member Author

/test e2e-ocp-operator-nightly

Removed the complex merging process for orchestrator plugins and streamlined the function to focus on waiting for the Backstage resource to be ready. Updated logging to reflect the new approach, enhancing clarity and maintainability of the code.
@github-actions
Copy link
Contributor

github-actions bot commented Jan 6, 2026

🚫 Image Push Skipped.

The container image push was skipped because the build was skipped (either due to [skip-build] tag or no relevant changes with existing image)

/test e2e-ocp-helm

…nt function

Implemented a wait mechanism to ensure that the external PostgreSQL database is fully ready before proceeding with the RBAC instance deployment. This change enhances the reliability of the deployment process by allowing immediate connection for the database creation job, and includes error logging for deployment failures.
@github-actions
Copy link
Contributor

github-actions bot commented Jan 6, 2026

🚫 Image Push Skipped.

The container image push was skipped because the build was skipped (either due to [skip-build] tag or no relevant changes with existing image)

/test e2e-ocp-helm

@gustavolira
Copy link
Member Author

/test e2e-ocp-operator-nightly

… and deployment

Enhanced the `enable_orchestrator_plugins_op` function to include error handling for the Backstage resource check, logging an error if the resource is not found. Additionally, implemented a wait mechanism in the `deploy_rhdh_operator` function to ensure the Backstage deployment is created by the operator, with appropriate logging for success and warnings for potential asynchronous creation.
@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2026

🚫 Image Push Skipped.

The container image push was skipped because the build was skipped (either due to [skip-build] tag or no relevant changes with existing image)

/test e2e-ocp-helm

@sonarqubecloud
Copy link

sonarqubecloud bot commented Jan 7, 2026

@gustavolira
Copy link
Member Author

/test e2e-ocp-operator-nightly

@openshift-ci
Copy link

openshift-ci bot commented Jan 7, 2026

@gustavolira: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aks-helm-nightly 7fef4a8 link false /test e2e-aks-helm-nightly
ci/prow/e2e-ocp-operator-nightly 1244c92 link false /test e2e-ocp-operator-nightly

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants