Merged
33 commits
02bf12e  add workflow test (jnels124, Oct 1, 2025)
7e537a0  add trigger (jnels124, Oct 1, 2025)
ec3f529  add trigger (jnels124, Oct 1, 2025)
cc6847b  add audience (jnels124, Oct 1, 2025)
09be118  trigger workflow (jnels124, Oct 1, 2025)
c6446b2  test permission change (jnels124, Oct 1, 2025)
966530e  test permission change (jnels124, Oct 1, 2025)
71f1f2f  test permission change (jnels124, Oct 1, 2025)
0422737  fix config (jnels124, Oct 1, 2025)
5abfcd9  remove invalid flag (jnels124, Oct 1, 2025)
395b9a9  resolve 141 error (jnels124, Oct 1, 2025)
255e0cf  test cluster (jnels124, Oct 2, 2025)
ac2e0d3  test copy setup (jnels124, Oct 2, 2025)
ddcd77a  use staging (jnels124, Oct 2, 2025)
2defcf2  use common kube config (jnels124, Oct 2, 2025)
6737a8e  move env to job level (jnels124, Oct 2, 2025)
ff0e074  set kube config at runtime (jnels124, Oct 2, 2025)
4f4ee8e  pin kubeconfig (jnels124, Oct 2, 2025)
1b8d3bc  move pin source step (jnels124, Oct 2, 2025)
3fbbf3e  check syntax (jnels124, Oct 2, 2025)
1b74e14  run copy script (jnels124, Oct 2, 2025)
cd8aad9  cd before running (jnels124, Oct 2, 2025)
cec431c  Merge remote-tracking branch 'origin/main' into copy-env-action (jnels124, Oct 2, 2025)
f26a796  test perm change + cleanup (jnels124, Oct 2, 2025)
5fb9b82  mask sensitive values + install ksd (jnels124, Oct 2, 2025)
7837356  fix snapshot name (jnels124, Oct 2, 2025)
b5171fc  use auth provider for longer auth window (jnels124, Oct 3, 2025)
64d044b  move scripts and use gcloud auth plugin (jnels124, Oct 3, 2025)
027165b  fix path (jnels124, Oct 3, 2025)
2d8b3ce  update docs prepare action (jnels124, Oct 3, 2025)
ff5cc51  cleanup (jnels124, Oct 3, 2025)
63745b8  address pr feedback (jnels124, Oct 7, 2025)
1c9d555  add retries + workflow trigger (jnels124, Oct 10, 2025)
237 changes: 237 additions & 0 deletions .github/workflows/copy-stackgres-cluster.yml
@@ -0,0 +1,237 @@
# SPDX-License-Identifier: Apache-2.0

name: Copy StackGres Citus Cluster

concurrency:
  group: ${{ github.workflow }}-${{ inputs.target_cluster_name }}
  cancel-in-progress: true

on:
  workflow_call:
    inputs:
      source_cluster_location:
        type: string
        default: "us-central1"
        required: true
      source_cluster_name:
        type: string
        default: "mainnet-na"
        required: true
      source_project:
        type: string
        default: "prod"
        required: true
      target_cluster_location:
        type: string
        default: "us-central1"
        required: true
      target_cluster_name:
        type: string
        default: "mainnet-staging-na"
        required: true
      target_default_pool:
        type: string
        default: "mainnet-staging-na"
        required: true
      target_project:
        type: string
        default: "nonprod"
        required: true
      teardown_target:
        type: boolean
        default: true
        required: true
    secrets:
      GH_ACTIONS_KUBECTL_MAIN_PROJECT_ID:
        required: true
      GH_ACTIONS_KUBECTL_GCP_SERVICE_ACCOUNT:
        required: true
      GH_ACTIONS_KUBECTL_STAGING_PROJECT_ID:
        required: true
      GH_ACTIONS_KUBECTL_WORKLOAD_ID_PROVIDER:
        required: true
  workflow_dispatch:
    inputs:
      source_cluster_location:
        default: "us-central1"
        description: "Source: GKE location (zone or region)"
        required: true
      source_cluster_name:
        description: "Source: GKE cluster name"
        default: "mainnet-na"
        required: true
      source_project:
        default: prod
        description: "Source: GCP project"
        options: [prod, nonprod]
        required: true
        type: choice
      target_cluster_location:
        default: "us-central1"
        description: "Target: GKE location (zone or region)"
        required: true
      target_cluster_name:
        default: "mainnet-staging-na"
        description: "Target: GKE cluster name"
        required: true
      target_default_pool:
        default: "mainnet-staging-na"
        description: "Target: default pool name"
        required: true
      target_project:
        default: nonprod
        description: "Target: GCP project"
        options: [prod, nonprod]
        required: true
        type: choice
      teardown_target:
        default: true
        description: "Tear down target cluster after k6 tests run"
        required: true
        type: boolean

permissions:
  id-token: write
  contents: read

jobs:
  run-copy:
    name: Copy Citus from SOURCE ➜ TARGET
    runs-on: hiero-mirror-node-linux-medium
    env:
      FLUX_VERSION: "2.3.0"
      GCP_SNAPSHOT_PROJECT: ${{ inputs.source_project == 'prod' && secrets.GH_ACTIONS_KUBECTL_MAIN_PROJECT_ID || secrets.GH_ACTIONS_KUBECTL_STAGING_PROJECT_ID }}
      GCP_K8S_SOURCE_CLUSTER_NAME: ${{ inputs.source_cluster_name }}
      GCP_K8S_SOURCE_CLUSTER_REGION: ${{ inputs.source_cluster_location }}
      GCP_K8S_TARGET_CLUSTER_NAME: ${{ inputs.target_cluster_name }}
      GCP_K8S_TARGET_CLUSTER_REGION: ${{ inputs.target_cluster_location }}
      GCP_TARGET_PROJECT: ${{ inputs.target_project == 'prod' && secrets.GH_ACTIONS_KUBECTL_MAIN_PROJECT_ID || secrets.GH_ACTIONS_KUBECTL_STAGING_PROJECT_ID }}
      K8S_SOURCE_CLUSTER_CONTEXT: "source_gke_context"
      K8S_TARGET_CLUSTER_CONTEXT: "target_gke_context"
      PINNED_KUBECONFIG: ${{ github.workspace }}/.kube/config
      SA_EMAIL: ${{ secrets.GH_ACTIONS_KUBECTL_GCP_SERVICE_ACCOUNT }}
      WIF_PROVIDER: ${{ secrets.GH_ACTIONS_KUBECTL_WORKLOAD_ID_PROVIDER }}

    steps:
      - name: Checkout
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8

      - name: Ensure jq is available
        run: jq --version || (sudo apt-get update && sudo apt-get install -y jq)

      - name: Setup gcloud + kubectl + GKE auth plugin
        uses: google-github-actions/setup-gcloud@e427ad8a34f8676edf47cf7d7925499adf3eb74f
        with:
          install_components: gke-gcloud-auth-plugin, kubectl

      - name: Create application default credentials
        shell: bash
        run: |
          set -euo pipefail
          : "${ACTIONS_ID_TOKEN_REQUEST_URL:?missing OIDC URL}"
          : "${ACTIONS_ID_TOKEN_REQUEST_TOKEN:?missing OIDC token}"
          # Build an external_account (Workload Identity Federation) credential config
          # whose subject token is read from a local file that can be refreshed later.
          ADC_DIR="${RUNNER_TEMP}/wif-adc"
          mkdir -p "$ADC_DIR"
          SUBJECT_TOKEN_FILE="${ADC_DIR}/subject.jwt"
          ADC_JSON="${ADC_DIR}/adc.json"
          : > "$SUBJECT_TOKEN_FILE"
          AUD="//iam.googleapis.com/${WIF_PROVIDER}"
          cat >"$ADC_JSON" <<EOF
          {
            "type": "external_account",
            "audience": "${AUD}",
            "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
            "token_url": "https://sts.googleapis.com/v1/token",
            "service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/${SA_EMAIL}:generateAccessToken",
            "credential_source": {
              "file": "${SUBJECT_TOKEN_FILE}"
            }
          }
          EOF
          export GOOGLE_APPLICATION_CREDENTIALS="$ADC_JSON"
          export CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE="$ADC_JSON"
          # Persist the paths so later steps (and the refresh loop) reuse the same files.
          {
            echo "GOOGLE_APPLICATION_CREDENTIALS=$ADC_JSON"
            echo "CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE=$ADC_JSON"
            echo "SUBJECT_TOKEN_FILE=$SUBJECT_TOKEN_FILE"
            echo "AUD=$AUD"
          } >> "$GITHUB_ENV"
          # Fetch a GitHub OIDC token for the WIF audience, write it as the subject token,
          # then confirm gcloud can exchange it for an access token.
          ENC_AUD="$(jq -rn --arg s "$AUD" '$s|@uri')"
          TOKEN_JSON="$(curl -sSf -H "Authorization: Bearer ${ACTIONS_ID_TOKEN_REQUEST_TOKEN}" "${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=${ENC_AUD}")"
          OIDC="$(jq -r '.value // empty' <<<"$TOKEN_JSON")"
          test -n "$OIDC"
          printf '%s' "$OIDC" > "$SUBJECT_TOKEN_FILE"
          gcloud auth application-default print-access-token >/dev/null
          gcloud config set container/use_application_default_credentials true

      - name: Get GKE credentials (source)
        shell: bash
        run: |
          set -euo pipefail
          gcloud container clusters get-credentials "${GCP_K8S_SOURCE_CLUSTER_NAME}" \
            --region "${GCP_K8S_SOURCE_CLUSTER_REGION}" \
            --project "${GCP_SNAPSHOT_PROJECT}"
          SRC_CURR="$(kubectl config current-context)"
          kubectl config rename-context "${SRC_CURR}" "${K8S_SOURCE_CLUSTER_CONTEXT}"

      - name: Get GKE credentials (target)
        shell: bash
        run: |
          set -euo pipefail
          gcloud container clusters get-credentials "${GCP_K8S_TARGET_CLUSTER_NAME}" \
            --region "${GCP_K8S_TARGET_CLUSTER_REGION}" \
            --project "${GCP_TARGET_PROJECT}"
          TGT_CURR="$(kubectl config current-context)"
          kubectl config rename-context "${TGT_CURR}" "${K8S_TARGET_CLUSTER_CONTEXT}"

      - name: Cache Cloud SDK paths
        shell: bash
        run: |
          set -euo pipefail
          GCLOUD_BIN="$(command -v gcloud)"
          KUBECTL_BIN="$(command -v kubectl)"
          CLOUDSDK_BIN_DIR="$(dirname "$GCLOUD_BIN")"
          echo "GCLOUD_BIN=$GCLOUD_BIN" >> "$GITHUB_ENV"
          echo "KUBECTL_BIN=$KUBECTL_BIN" >> "$GITHUB_ENV"
          echo "CLOUDSDK_BIN_DIR=$CLOUDSDK_BIN_DIR" >> "$GITHUB_ENV"

      - name: Setup Testkube CLI
        uses: kubeshop/setup-testkube@970d643ec9ecbe5707049c1d65b851da72aab3d9
        with:
          version: v2.3.0

      - name: Setup Flux CLI
        uses: fluxcd/flux2/action@ca29bb1a41d662495cbf3a8ee6dba7f088ae7310
        with:
          version: v2.3.0

      - name: Execute copy script
        shell: bash
        env:
          AUTO_CONFIRM: "true"
          DEFAULT_POOL_NAME: ${{ inputs.target_default_pool }}
          WAIT_FOR_K6: "true"
        run: |
          set -Eeuo pipefail
          export PATH="${CLOUDSDK_BIN_DIR}:${PATH}"
          hash -r
          # Background loop: refresh the WIF subject token and touch both cluster APIs
          # every two minutes so credentials stay valid for the long-running copy.
          (
            set -euo pipefail
            while true; do
              ENC_AUD="$(jq -rn --arg s "$AUD" '$s|@uri')"
              if TOKEN_JSON="$(curl -sS -H "Authorization: Bearer ${ACTIONS_ID_TOKEN_REQUEST_TOKEN}" "${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=${ENC_AUD}")"; then
                OIDC="$(jq -r '.value // empty' <<<"$TOKEN_JSON")"
                if [[ -n "$OIDC" ]]; then
                  printf '%s' "$OIDC" > "$SUBJECT_TOKEN_FILE"
                  gcloud auth application-default print-access-token >/dev/null 2>&1 || true
                fi
              fi
              kubectl --context "${K8S_SOURCE_CLUSTER_CONTEXT}" --request-timeout=10s get --raw=/version >/dev/null 2>&1 || true
              kubectl --context "${K8S_TARGET_CLUSTER_CONTEXT}" --request-timeout=10s get --raw=/version >/dev/null 2>&1 || true
              sleep 120
            done
          ) &
          REFRESH_PID=$!
          trap 'kill "$REFRESH_PID" 2>/dev/null || true' EXIT INT TERM
          cd ./tools/cluster-management/
          ./copy-live-environment.sh
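
For reference, once this file is on the default branch the copy can also be started by hand through the `workflow_dispatch` trigger defined above. A minimal sketch using the GitHub CLI; the input names come from the workflow, while the ref and values shown are illustrative assumptions:

```bash
# Hypothetical manual dispatch of the copy workflow via the GitHub CLI.
# Input names match the workflow_dispatch inputs above; values are examples only.
gh workflow run copy-stackgres-cluster.yml \
  --ref main \
  -f source_project=prod \
  -f source_cluster_name=mainnet-na \
  -f source_cluster_location=us-central1 \
  -f target_project=nonprod \
  -f target_cluster_name=mainnet-staging-na \
  -f target_cluster_location=us-central1 \
  -f target_default_pool=mainnet-staging-na \
  -f teardown_target=true
```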
31 changes: 31 additions & 0 deletions .github/workflows/tirgger-staging-deploy.yml
@@ -0,0 +1,31 @@
# SPDX-License-Identifier: Apache-2.0

name: Trigger Staging on Deploy Change

on:
  push:
    branches: [deploy]
    paths:
      - clusters/mainnet-staging-na/mainnet-citus/helmrelease.yaml

permissions:
  id-token: write
  contents: read

jobs:
  run-reusable-copy:
    uses: ./.github/workflows/copy-stackgres-cluster.yml
    with:
      source_cluster_location: "us-central1"
      source_cluster_name: "mainnet-na"
      source_project: "prod"
      target_cluster_location: "us-central1"
      target_cluster_name: "mainnet-staging-na"
      target_default_pool: "mainnet-staging-na"
      target_project: "nonprod"
      teardown_target: true
    secrets:
      GH_ACTIONS_KUBECTL_MAIN_PROJECT_ID: ${{ secrets.GH_ACTIONS_KUBECTL_MAIN_PROJECT_ID }}
      GH_ACTIONS_KUBECTL_GCP_SERVICE_ACCOUNT: ${{ secrets.GH_ACTIONS_KUBECTL_GCP_SERVICE_ACCOUNT }}
      GH_ACTIONS_KUBECTL_STAGING_PROJECT_ID: ${{ secrets.GH_ACTIONS_KUBECTL_STAGING_PROJECT_ID }}
      GH_ACTIONS_KUBECTL_WORKLOAD_ID_PROVIDER: ${{ secrets.GH_ACTIONS_KUBECTL_WORKLOAD_ID_PROVIDER }}
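
A quick way to check, before pushing, whether the latest commit on `deploy` would match the path filter above. The branch and path come from the workflow; the commands are an illustrative local check, not part of this change:

```bash
# Illustrative check: does the most recent commit on origin/deploy touch the
# helmrelease path that triggers the staging copy workflow?
git fetch origin deploy
if git diff --name-only origin/deploy~1 origin/deploy \
    | grep -qx 'clusters/mainnet-staging-na/mainnet-citus/helmrelease.yaml'; then
  echo "push would trigger the staging copy workflow"
else
  echo "push would not trigger the workflow"
fi
```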
2 changes: 1 addition & 1 deletion docs/runbook/change-citus-node-pool-machine-type.md
@@ -8,7 +8,7 @@ Need to Change Machine Type for Citus Node Pool(s)

- Have `jq` and `yq` installed
- kubectl is pointing to the cluster you want to change the machine type for
- All bash commands assume your working directory is `docs/runbook/scripts`
- All bash commands assume your working directory is `tools/cluster-management`

## Solution

5 changes: 3 additions & 2 deletions docs/runbook/copy-live-environment.md
@@ -6,14 +6,15 @@ Need to copy live environment with zero downtime on source

## Prerequisites

- Have `jq`, `yq`, and `ksd`(kubernetes secret decrypter) installed
- Have `jq`, `yq`, and `base64` installed
- Have `testkube` kubectl plugin installed
- The source and target have compatible versions of postgres
- The `target cluster` has a running Citus cluster deployed with `hedera-mirror` chart
- The `target cluster` you are restoring to doesn't have any pvcs with a size larger than the size of the pvc in the
snapshot. You can't decrease the size of a pvc. If needed, you can delete the existing cluster in the `target cluster`
and redeploy the `hedera-mirror` chart with the default disk sizes.
- If you have multiple Citus clusters in the `target cluster`, you will need to restore all of them
- All bash commands assume your working directory is `docs/runbook/scripts`
- All bash commands assume your working directory is `tools/cluster-management`
- Only a single citus cluster is installed per namespace

## Steps
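
The prerequisite change above swaps `ksd` for plain `base64`. A minimal sketch of reading a secret value that way; the secret, namespace, and key names are made-up placeholders, not names defined by the chart:

```bash
# Decode one key from a Kubernetes secret without ksd; names are illustrative only.
kubectl get secret my-citus-secret -n my-namespace \
  -o jsonpath='{.data.password}' | base64 -d; echo
```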
2 changes: 1 addition & 1 deletion docs/runbook/create-disk-snapshot-for-citus-cluster.md
@@ -8,7 +8,7 @@ Need to create disk snapshots for Citus cluster(s)

- Have access to a running Citus cluster deployed by the `hedera-mirror` chart
- Have `jq` and `yq` installed
- All bash commands assume your working directory is `docs/runbook/scripts`
- All bash commands assume your working directory is `tools/cluster-management`
- The kubectl context is set to the cluster you want to create snapshots from

## Solution
4 changes: 2 additions & 2 deletions docs/runbook/restore-citus-from-disk-snapshot.md
@@ -7,14 +7,14 @@ Need to restore Citus cluster from disk snapshots
## Prerequisites

- Snapshots of disks were created by following the [create snapshot](create-disk-snapshot-for-citus-cluster.md) runbook
- Have `jq`, `yq`, and `ksd`(kubernetes secret decrypter) installed
- Have `jq`, `yq`, and `base64` installed
- The snapshots are from a compatible version of `postgres`
- The `target cluster` has a running Citus cluster deployed with `hedera-mirror` chart
- The `target cluster` you are restoring to doesn't have any pvcs with a size larger than the size of the pvc in the
snapshot. You can't decrease the size of a pvc. If needed, you can delete the existing cluster in the `target cluster`
and redeploy the `hedera-mirror` chart with the default disk sizes.
- If you have multiple Citus clusters in the `target cluster`, you will need to restore all of them
- All bash commands assume your working directory is `docs/runbook/scripts`
- All bash commands assume your working directory is `tools/cluster-management`
- Only a single citus cluster is installed per namespace
- The kubectl context is set to the cluster you want to restore snapshots to

4 changes: 2 additions & 2 deletions docs/runbook/restore-citus-from-stackgres-backup.md
@@ -6,10 +6,10 @@ Need to restore Citus cluster from a StackGres sharded backup

## Prerequisites

- Have `jq`, `yq`, and `ksd`(kubernetes secret decrypter) installed
- Have `jq`, `yq`, and `base64` installed
- The cluster has a running Citus cluster deployed with `hedera-mirror` chart
- StackGresShardedCluster backup is enabled
- All bash commands assume your working directory is `docs/runbook/scripts`
- All bash commands assume your working directory is `tools/cluster-management`
- Only a single citus cluster is installed per namespace
- The kubectl context is set to the cluster you want to restore backup to and the namespace is set to the one
`hedera-mirror` chart is installed in
2 changes: 1 addition & 1 deletion docs/runbook/upgrade-k8s-version-citus-nodepool.md
@@ -8,7 +8,7 @@ Need to update k8s version for clusters with citus installed

- Have `jq` and `yq` installed
- The kubectl context is set to the cluster you want to upgrade
- All bash commands assume your working directory is `docs/runbook/scripts`
- All bash commands assume your working directory is `tools/cluster-management`

## Solution

tools/cluster-management/change-machine-type.sh
@@ -4,7 +4,7 @@

set -euo pipefail

source ./utils.sh
source ./utils/utils.sh

Check warning on line 7 in tools/cluster-management/change-machine-type.sh (Codacy Production / Codacy Static Code Analysis):
Not following: ./utils/utils.sh: openBinaryFile: does not exist (No such file or directory)

GCP_TARGET_PROJECT="$(readUserInput "Enter GCP Project for target: ")"
if [[ -z "${GCP_TARGET_PROJECT}" ]]; then
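
The Codacy warning above is ShellCheck failing to follow the relative `source` path at analysis time, not a runtime failure; the script is run from `tools/cluster-management`, so `./utils/utils.sh` resolves there. One common way to quiet it, sketched here as a suggestion rather than part of this diff, is a ShellCheck directive:

```bash
# Suggested (not part of this change): tell ShellCheck where the sourced file lives,
# or disable SC1091 if the analyzer still cannot resolve the relative path.
# shellcheck source=./utils/utils.sh
source ./utils/utils.sh
```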