Collaborator

@vishpillai123 vishpillai123 commented Oct 31, 2025

My strategy here was to combine our current git flow process (documented by @kaylawilding here in Confluence) with a CI/CD process that validates our pipeline before each release. After this, I plan to add more validation and assertions to each script where applicable.

We're running all of these integration tests on a new synthetic schema that we created (I got help from @emmaeturner here): synthetic_integration. Part of the reason we created a new schema just for integration tests is that it makes it easier to keep track of all artifacts (models, tables, volumes, experiments, etc.) for cleanup. We'll use our other synthetic datasets on a more ad hoc basis, and this institution will be dedicated to integration tests.

NOTE: Prod deployment is still not working: it won't let me deploy as the service account. I need to debug this more with Emma, but I think this is still OK to merge for now. We will likely need to deploy manually until we get a fix.

Anyways, there's a lot in here! I summarized each process below, but let me know if you have questions.

Release Process:

  • We pick a version and start the release by running: git flow release start <version_number>. This creates the release branch; the tag itself is created when the release is finished.
  • Our integration test (release-integration.yml) is triggered by the new release branch created by git flow. It runs training and then inference on dev-sst-02 with target=dev.
  • If it fails, the action fails and we need to manually correct develop, then restart our release.
  • If it passes, we can proceed with our manual metadata update. The metadata update consists of the CHANGELOG.md update, the pyproject version bump and uv.lock update, and updating our templates for custom schools. This step is manual for now, but I want to automate it so that it happens as soon as our integration test passes.
  • We finish the release with git flow release finish -m "Releasing our super cool version 0.2.0" '0.2.0'.
  • The tag is pushed with git push origin --tags, and we automatically deploy to staging and dev with our newest tag (deploy-main.yml). Unfortunately, the deployment is still not working here due to a privilege issue. It's still safe to merge, since we'll just deploy manually if it fails.
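
To make the two triggers in the release process above concrete, here is a minimal Python sketch of which workflow fires for which pushed ref. It is an illustration only: the function name workflow_for_ref and the ref prefixes are assumptions (release/ is git flow's default release-branch prefix), and the real trigger filters live in the workflow files themselves.

```python
from typing import Optional

# Assumed ref patterns; the actual filters are defined in the workflow files.
RELEASE_BRANCH_PREFIX = "refs/heads/release/"  # git flow's default release prefix
TAG_PREFIX = "refs/tags/"


def workflow_for_ref(ref: str) -> Optional[str]:
    """Return the workflow described in this PR for a pushed ref, or None."""
    if ref.startswith(RELEASE_BRANCH_PREFIX):
        # A new release branch gates the release with the integration test.
        return "release-integration.yml"
    if ref.startswith(TAG_PREFIX):
        # A pushed tag kicks off the deployment to staging and dev.
        return "deploy-main.yml"
    # Anything else (e.g. a push to develop) triggers neither release workflow.
    return None


if __name__ == "__main__":
    for ref in ("refs/heads/release/0.2.0", "refs/tags/0.2.0", "refs/heads/develop"):
        print(ref, "->", workflow_for_ref(ref))
```

The point of the split is that the integration test runs before the tag exists, so a failing pipeline never produces a tag to deploy.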

Weekly Health check:

  • I also wanted to add a health check so that we catch pipeline issues and bugs proactively.
  • Every week on Mondays around noon EST, we kick off an integration test (weekly-develop-integration.yml) on develop that runs training and then inference on dev-sst-02 with target=dev.
  • This isn't gating anything: if it fails, it's a flag that we need to fix something. If it passes, we're good to go.
  • Once a month, we will clean up using another action (weekly-cleanup.yml). It cleans up tables, volumes, models, and experiments. It also has hard validation so that it only deletes schemas that are synthetic and in dev-sst-02 (we can change this to include staging too, but I highly recommend we only delete synthetic schemas to be safe).
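
The hard validation in the cleanup step could look roughly like the sketch below. This is not the code in weekly-cleanup.yml: the function name assert_safe_to_clean, the location parameter, and the rule that a synthetic schema is recognized by a "synthetic" name prefix are all assumptions made for illustration.

```python
ALLOWED_LOCATION = "dev-sst-02"
SYNTHETIC_PREFIX = "synthetic"  # assumption: synthetic schemas share this name prefix


def assert_safe_to_clean(location: str, schema: str) -> None:
    """Raise ValueError unless the schema is synthetic and lives in dev-sst-02."""
    if location != ALLOWED_LOCATION:
        raise ValueError(
            f"refusing to clean {schema!r}: {location!r} is not {ALLOWED_LOCATION!r}"
        )
    if not schema.startswith(SYNTHETIC_PREFIX):
        raise ValueError(f"refusing to clean {schema!r}: not a synthetic schema")


if __name__ == "__main__":
    # Passes silently for the dedicated integration-test schema.
    assert_safe_to_clean("dev-sst-02", "synthetic_integration")
    print("synthetic_integration is eligible for cleanup")
```

Failing closed with an exception, rather than skipping quietly, means a misconfigured cleanup run shows up as a failed action.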

kaylawilding and others added 3 commits October 11, 2025 15:25
… and another that will when we create a release and will gate creating a tag for production, then once a tag is pushed, then our deployment action will do CD
@vishpillai123 vishpillai123 changed the title feat/setting up integration CICD action (kayla's old PR) feat/setting up integration CICD action Nov 3, 2025
@vishpillai123 vishpillai123 removed the request for review from kaylawilding November 6, 2025 20:37


3 participants