-
Notifications
You must be signed in to change notification settings - Fork 637
Github Action Debugging #4338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Github Action Debugging #4338
Conversation
f0bcc54 to
bff9d68
Compare
Prevents race conditions where pod is found but Postgres isn't ready yet.
Reduce PVC sizes from 1Gi to 256Mi and use Foreground deletion with explicit PVC cleanup waits to prevent disk exhaustion on GitHub-hosted runners.
Use env var instead of JMESPath expression in shell script.
Prevent script timeout (5s default) from killing 2m kubectl wait.
Only prefetch images actually used by the test: pgbackrest and postgres. Removes ~500MB of unused images (pgbouncer, pgadmin, exporter, upgrade).
- Add pgbouncer, exporter, upgrade, pgadmin images to prefetch - Increase KUTTL timeout from 300s to 450s - Increase prefetch timeout to 5m
5a2aee5 to
aa19cad
Compare
bd01f63 to
63c4ecb
Compare
|
The two biggest improvements here were reducing the number of prefetch images and splitting chainsaw and kuttl tests into separate jobs. I think we were hitting issues with disk pressure and these two changes help with that. For the prefetch images change, we aren't testing GIS here so there is no point in fetching them. In addition to removing these extra images, this change tries to ensure we are using the images that we fetch. This does mean we might need to add images, specifically to chainsaw, as we test more functionality. For the split change, this feels like a stopgap fix. We are really just giving the tests more space to run by splitting them up. We should continue to consider ways reduce disk space usage. One option may be to refactor the chainsaw pgbackrest-restore test to only use one cluster. Other changes in this pr are mostly either timeout increases or adding extra logging or checks. The change to reduce volume request size likely doesn't have much impact on actual disk usage but feels like a safe change. |
.github/workflows/test.yaml
Outdated
| --env 'RELATED_IMAGE_PGEXPORTER=registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi9-0.17.1-2542' \ | ||
| --env 'RELATED_IMAGE_PGUPGRADE=registry.developers.crunchydata.com/crunchydata/crunchy-upgrade:ubi9-18.0-2542' \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess these (and PGBOUNCER) could be removed as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking leave related images. That way if an image isn't pre-fetched it could be pulled.
Or we could fail when we don't have all the images we need?
.github/workflows/test.yaml
Outdated
| --env 'RELATED_IMAGE_STANDALONE_PGADMIN=registry.developers.crunchydata.com/crunchydata/crunchy-pgadmin4:ubi9-9.8-2542' \ | ||
| --env 'RELATED_IMAGE_COLLECTOR=registry.developers.crunchydata.com/crunchydata/postgres-operator:ubi9-5.8.4-0' \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These can be removed too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple nitpicks, but LGTM
63c4ecb to
51f6ea0
Compare
Checklist:
Type of Changes:
What is the current behavior (link to any open issues here)?
What is the new behavior (if this is a feature change)?
Other Information: