feat: add waitsecs with buildstatus feedback#36057
Conversation
There was a problem hiding this comment.
Pull request overview
Adds support for waiting on Vespa Cloud production build completion (via the build-status API), including optional per-job log streaming, and wires this into vespa prod deploy through the --wait flag.
Changes:
- Add
AwaitBuildpolling of build-status plus per-job run log streaming utilities. - Expose
System.BuildStatusURLand add unit tests for build waiting/log formatting. - Extend
prod deployto optionally wait for production build completion and report outcome.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| client/go/internal/vespa/target_cloud.go | Implements build-status polling (AwaitBuild) and job log streaming/printing helpers. |
| client/go/internal/vespa/system.go | Adds URL builder for the build-status endpoint. |
| client/go/internal/vespa/deploy.go | Changes Submit signature to accept waitSecs. |
| client/go/internal/vespa/deploy_test.go | Updates submit tests to match new Submit signature. |
| client/go/internal/vespa/target_test.go | Adds tests for AwaitBuild, log streaming, and log formatting/filtering. |
| client/go/internal/cli/cmd/prod.go | Adds --wait handling for prod deploy to wait for build completion and print result. |
| client/go/internal/cli/cmd/prod_test.go | Adds CLI test coverage for prod deploy --wait success/skip/failure scenarios. |
Comments suppressed due to low confidence (1)
client/go/internal/vespa/deploy.go:399
- Submit gained a waitSecs parameter but it is not used in the function body, which makes the API change confusing and easy to misuse. Either remove the parameter and keep waiting logic at the call site, or implement the intended behavior (e.g., pass waitSecs to the submit endpoint via a query parameter/body field if the API supports it).
func Submit(opts DeploymentOptions, submission Submission, waitSecs int) (int64, error) {
if !opts.Target.IsCloud() {
return 0, fmt.Errorf("%s: deploy is unsupported by %s target", opts, opts.Target.Type())
}
if err := checkDeploymentOpts(opts); err != nil {
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| wg.Wait() | ||
| if mainErr != nil { | ||
| return false, mainErr | ||
| } |
There was a problem hiding this comment.
AwaitBuild waits for all log-streaming goroutines (wg.Wait) before checking mainErr. If the build-status poll exits early with an error (e.g. job failure/cancelled/timeout), this can block returning the error until each streamBuildJobLogs call finishes (potentially up to its full timeout). Consider checking/returning mainErr before waiting, and/or using a cancellation mechanism (context) so spawned log streams stop promptly on main failure.
| wg.Wait() | |
| if mainErr != nil { | |
| return false, mainErr | |
| } | |
| if mainErr != nil { | |
| return false, mainErr | |
| } | |
| wg.Wait() |
| wg.Add(1) | ||
| go func() { | ||
| defer wg.Done() | ||
| if err := streamBuildJobLogs(target, job, timeout, logWriter, retryInterval); err != nil && !errors.Is(err, ErrWaitTimeout) { | ||
| mu.Lock() | ||
| jobErrors = append(jobErrors, err) | ||
| mu.Unlock() | ||
| } | ||
| }() |
There was a problem hiding this comment.
AwaitBuild passes the full timeout duration to each per-job streamBuildJobLogs goroutine, which can make the total wall-clock time exceed the requested timeout (jobs started late can still run for a full timeout). Consider computing a single deadline at the start and passing the remaining time budget to each stream, or share a context with a single deadline across the polling + log streaming.
| go func() { | ||
| defer wg.Done() | ||
| if err := streamBuildJobLogs(target, job, timeout, logWriter, retryInterval); err != nil && !errors.Is(err, ErrWaitTimeout) { | ||
| mu.Lock() | ||
| jobErrors = append(jobErrors, err) | ||
| mu.Unlock() | ||
| } | ||
| }() |
There was a problem hiding this comment.
Multiple streamBuildJobLogs goroutines can write to the same logWriter concurrently. This can interleave/garble output because io.Writer implementations (including *os.File) don't guarantee line-level atomic writes. Consider synchronizing writes (e.g., wrap logWriter with a mutex-protected writer, or do the formatting in goroutines but funnel writes through a single goroutine/channel).
| tm := time.Unix(msg.At/1000, (msg.At%1000)*1000) | ||
| fmt.Fprintf(writer, "%s[%s] %-7s %s\n", prefix, tm.Format("15:04:05"), msg.Type, msg.Message) | ||
| } |
There was a problem hiding this comment.
printBuildJobLog treats logMessage.At as seconds+milliseconds, but the conversion uses (msg.At%1000)*1000 (microseconds) as the nanosecond part. If At is milliseconds since epoch (as in the tests), this should be multiplied by 1e6, or use time.UnixMilli(msg.At), to avoid incorrect timestamps (especially if the format later includes sub-second precision).
client/go/internal/cli/cmd/prod.go
Outdated
| cmd.Flags().StringVarP(&options.description, "description", "", "", "Description of the source code being deployed. For example a git commit message") | ||
| cmd.Flags().StringVarP(&options.authorEmail, "author-email", "", "", "Email of the author of the commit being deployed") | ||
| cmd.Flags().StringVarP(&options.sourceURL, "source-url", "", "", "URL which points to the source code being deployed. For example the build job running the submission") | ||
| cli.bindWaitFlag(cmd, 0, &options.waitSecs) |
There was a problem hiding this comment.
prod deploy reuses the global --wait flag, but bindWaitFlag's description is specifically about waiting for services to become ready. Here it's used to wait for build completion, so the help text becomes misleading. Consider defining a command-specific flag/description for build waiting (or extending bindWaitFlag to accept a custom description).
| cli.bindWaitFlag(cmd, 0, &options.waitSecs) | |
| cmd.Flags().IntVarP(&options.waitSecs, "wait", "", 0, "Seconds to wait for the build to complete before returning (0 to return immediately)") |
sebastiannberg
left a comment
There was a problem hiding this comment.
Good work, the implementation looks correct and tests pass
A few things worth considering:
- The
waitSecsparameter inSubmit()is never used inside the function, it should be removed from the signature - Multiple jobs might stream logs at the same time and can end up writing over each other, worth adding a mutex around the writer to keep the output clean
As an alternative if fixing the concurrency feels like a lot of work, there is a (probably easier) approach where AwaitBuild is a method on cloudTarget and added to the Target interface (similar to how AwaitDeployment works), with log polling done synchronously inline on each poll tick instead of in separate goroutines. That avoids all the concurrency issues entirely and also looks more like the existing Waiter pattern the dev deploy uses. You can see how Waiter.Deployment() in waiter.go calls target.AwaitDeployment() for reference, the idea would be to follow the same pattern with a new Waiter.Build() calling target.AwaitBuild()
feat: AwaitBuild only works on cloud Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…m:vespa-engine/vespa into bragehk/CLI-check_production_build_status
I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.