Add workflow to detect stuck submodule update PRs#590
Add workflow to detect stuck submodule update PRs#590hdwhdw wants to merge 3 commits intosonic-net:masterfrom
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
This PR adds automation to detect and file issues for “stuck” sonic-buildimage submodule-update PRs, and also changes Redis client initialization behavior in sonic_data_client to skip databases whose address/socket cannot be resolved (instead of failing fast).
Changes:
- Add a scheduled GitHub Actions workflow that searches for long-open submodule update PRs in
sonic-buildimageand files an issue insonic-gnmi. - Update Redis client initialization (TCP test path and default unix-socket path) to log warnings and continue when DB address/socket lookup fails.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
sonic_data_client/db_client.go |
Changes init/TCP-client setup to “skip on error” when DB socket/address lookup fails. |
.github/workflows/check-submodule-update.yml |
Introduces scheduled workflow to detect and report stuck submodule-update PRs. |
sonic_data_client/db_client.go
Outdated
| log.Warningf("Skipping %s in namespace %s: %v", dbName, dbNamespace, err) | ||
| continue |
There was a problem hiding this comment.
Changing GetDbTcpAddr failure handling from return err to continue can leave Target2RedisDb[namespace][dbName] unset. Much of db_client.go later assumes the redis client is non-nil (e.g., TableData2Msi reads it without an ok check) and will panic if a skipped DB is later used (including via V2R mappings that bypass the presence check in populateDbtablePath). Consider either keeping fail-fast behavior for required DBs, or consistently guarding all Target2RedisDb[...] lookups (returning a clear error) so skipping cannot lead to a nil dereference at runtime.
| log.Warningf("Skipping %s in namespace %s: %v", dbName, dbNamespace, err) | |
| continue | |
| log.Warningf("Failed to get TCP address for %s in namespace %s: %v", dbName, dbNamespace, err) | |
| return err |
sonic_data_client/db_client.go
Outdated
| addr, err := sdcfg.GetDbSock(dbName, dbNamespace) | ||
| if err != nil { | ||
| log.Errorf("init error: %v", err) | ||
| return | ||
| log.Warningf("Skipping %s in namespace %s: %v", dbName, dbNamespace, err) | ||
| continue |
There was a problem hiding this comment.
With the new continue behavior on GetDbSock errors, init() may partially populate Target2RedisDb and leave some DB clients missing. Since several call sites in this file index into Target2RedisDb without checking presence, this can turn a config/init problem into a later nil-pointer panic. Recommend either failing initialization for mandatory DBs or adding presence checks (and error returns) at all Target2RedisDb access points to make skipped DBs safe.
| issues: write | ||
|
|
||
| jobs: | ||
| check: |
There was a problem hiding this comment.
This workflow hard-codes actions that only make sense in the upstream repo (searching sonic-net/sonic-buildimage and creating issues in sonic-net/sonic-gnmi). Other workflows in this repo guard against running on forks (e.g., .github/workflows/semgrep.yml uses if: github.repository_owner == 'sonic-net'). Add a similar if: guard to this job to prevent failures/noisy runs when triggered from forks or non-sonic-net repos.
| check: | |
| check: | |
| if: github.repository_owner == 'sonic-net' |
Add a scheduled GitHub Actions workflow that checks sonic-buildimage for sonic-gnmi submodule update PRs that have been open longer than 96 hours. When found, it files an issue in sonic-gnmi with details about the stuck PR to ensure visibility and prompt investigation. Signed-off-by: Dawei Huang <daweihuang@microsoft.com>
e1d151d to
e6f1c7c
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Dawei Huang <daweihuang@microsoft.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Load assignees and threshold_hours from .github/submodule-watchers.json so the policy can be updated without modifying the workflow itself. Signed-off-by: Dawei Huang <daweihuang@microsoft.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Why I did it
The mssonicbld bot creates submodule update PRs in sonic-buildimage automatically, but these PRs can get stuck for days or weeks without anyone noticing (e.g. CI failures, merge conflicts). This delays rollout of merged fixes to the build.
For example, sonic-net/sonic-buildimage#25285 has been open since Jan 31 — over a month — blocking multiple merged commits from reaching the build.
How I did it
Added a GitHub Actions workflow (
.github/workflows/check-submodule-update.yml) that:submodule-stucklabelAssignees and threshold are read from
.github/submodule-watchers.jsonso the policy can be updated without modifying the workflow.How to verify it
workflow_dispatchDescription for the changelog
Add scheduled workflow to detect and alert on stuck sonic-gnmi submodule update PRs in sonic-buildimage.