Skip to content

feat: add waitsecs with buildstatus feedback#36057

Open
BrageHK wants to merge 6 commits intomasterfrom
bragehk/CLI-check_production_build_status
Open

feat: add waitsecs with buildstatus feedback#36057
BrageHK wants to merge 6 commits intomasterfrom
bragehk/CLI-check_production_build_status

Conversation

@BrageHK
Copy link
Contributor

@BrageHK BrageHK commented Feb 27, 2026

I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for waiting on Vespa Cloud production build completion (via the build-status API), including optional per-job log streaming, and wires this into vespa prod deploy through the --wait flag.

Changes:

  • Add AwaitBuild polling of build-status plus per-job run log streaming utilities.
  • Expose System.BuildStatusURL and add unit tests for build waiting/log formatting.
  • Extend prod deploy to optionally wait for production build completion and report outcome.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
client/go/internal/vespa/target_cloud.go Implements build-status polling (AwaitBuild) and job log streaming/printing helpers.
client/go/internal/vespa/system.go Adds URL builder for the build-status endpoint.
client/go/internal/vespa/deploy.go Changes Submit signature to accept waitSecs.
client/go/internal/vespa/deploy_test.go Updates submit tests to match new Submit signature.
client/go/internal/vespa/target_test.go Adds tests for AwaitBuild, log streaming, and log formatting/filtering.
client/go/internal/cli/cmd/prod.go Adds --wait handling for prod deploy to wait for build completion and print result.
client/go/internal/cli/cmd/prod_test.go Adds CLI test coverage for prod deploy --wait success/skip/failure scenarios.
Comments suppressed due to low confidence (1)

client/go/internal/vespa/deploy.go:399

  • Submit gained a waitSecs parameter but it is not used in the function body, which makes the API change confusing and easy to misuse. Either remove the parameter and keep waiting logic at the call site, or implement the intended behavior (e.g., pass waitSecs to the submit endpoint via a query parameter/body field if the API supports it).
func Submit(opts DeploymentOptions, submission Submission, waitSecs int) (int64, error) {
	if !opts.Target.IsCloud() {
		return 0, fmt.Errorf("%s: deploy is unsupported by %s target", opts, opts.Target.Type())
	}
	if err := checkDeploymentOpts(opts); err != nil {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +498 to +501
wg.Wait()
if mainErr != nil {
return false, mainErr
}
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AwaitBuild waits for all log-streaming goroutines (wg.Wait) before checking mainErr. If the build-status poll exits early with an error (e.g. job failure/cancelled/timeout), this can block returning the error until each streamBuildJobLogs call finishes (potentially up to its full timeout). Consider checking/returning mainErr before waiting, and/or using a cancellation mechanism (context) so spawned log streams stop promptly on main failure.

Suggested change
wg.Wait()
if mainErr != nil {
return false, mainErr
}
if mainErr != nil {
return false, mainErr
}
wg.Wait()

Copilot uses AI. Check for mistakes.
Comment on lines +481 to +489
wg.Add(1)
go func() {
defer wg.Done()
if err := streamBuildJobLogs(target, job, timeout, logWriter, retryInterval); err != nil && !errors.Is(err, ErrWaitTimeout) {
mu.Lock()
jobErrors = append(jobErrors, err)
mu.Unlock()
}
}()
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AwaitBuild passes the full timeout duration to each per-job streamBuildJobLogs goroutine, which can make the total wall-clock time exceed the requested timeout (jobs started late can still run for a full timeout). Consider computing a single deadline at the start and passing the remaining time budget to each stream, or share a context with a single deadline across the polling + log streaming.

Copilot uses AI. Check for mistakes.
Comment on lines +482 to +489
go func() {
defer wg.Done()
if err := streamBuildJobLogs(target, job, timeout, logWriter, retryInterval); err != nil && !errors.Is(err, ErrWaitTimeout) {
mu.Lock()
jobErrors = append(jobErrors, err)
mu.Unlock()
}
}()
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple streamBuildJobLogs goroutines can write to the same logWriter concurrently. This can interleave/garble output because io.Writer implementations (including *os.File) don't guarantee line-level atomic writes. Consider synchronizing writes (e.g., wrap logWriter with a mutex-protected writer, or do the formatting in goroutines but funnel writes through a single goroutine/channel).

Copilot uses AI. Check for mistakes.
Comment on lines +394 to +396
tm := time.Unix(msg.At/1000, (msg.At%1000)*1000)
fmt.Fprintf(writer, "%s[%s] %-7s %s\n", prefix, tm.Format("15:04:05"), msg.Type, msg.Message)
}
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

printBuildJobLog treats logMessage.At as seconds+milliseconds, but the conversion uses (msg.At%1000)*1000 (microseconds) as the nanosecond part. If At is milliseconds since epoch (as in the tests), this should be multiplied by 1e6, or use time.UnixMilli(msg.At), to avoid incorrect timestamps (especially if the format later includes sub-second precision).

Copilot uses AI. Check for mistakes.
cmd.Flags().StringVarP(&options.description, "description", "", "", "Description of the source code being deployed. For example a git commit message")
cmd.Flags().StringVarP(&options.authorEmail, "author-email", "", "", "Email of the author of the commit being deployed")
cmd.Flags().StringVarP(&options.sourceURL, "source-url", "", "", "URL which points to the source code being deployed. For example the build job running the submission")
cli.bindWaitFlag(cmd, 0, &options.waitSecs)
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prod deploy reuses the global --wait flag, but bindWaitFlag's description is specifically about waiting for services to become ready. Here it's used to wait for build completion, so the help text becomes misleading. Consider defining a command-specific flag/description for build waiting (or extending bindWaitFlag to accept a custom description).

Suggested change
cli.bindWaitFlag(cmd, 0, &options.waitSecs)
cmd.Flags().IntVarP(&options.waitSecs, "wait", "", 0, "Seconds to wait for the build to complete before returning (0 to return immediately)")

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@sebastiannberg sebastiannberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, the implementation looks correct and tests pass

A few things worth considering:

  • The waitSecs parameter in Submit() is never used inside the function, it should be removed from the signature
  • Multiple jobs might stream logs at the same time and can end up writing over each other, worth adding a mutex around the writer to keep the output clean

As an alternative if fixing the concurrency feels like a lot of work, there is a (probably easier) approach where AwaitBuild is a method on cloudTarget and added to the Target interface (similar to how AwaitDeployment works), with log polling done synchronously inline on each poll tick instead of in separate goroutines. That avoids all the concurrency issues entirely and also looks more like the existing Waiter pattern the dev deploy uses. You can see how Waiter.Deployment() in waiter.go calls target.AwaitDeployment() for reference, the idea would be to follow the same pattern with a new Waiter.Build() calling target.AwaitBuild()

BrageHK and others added 4 commits March 6, 2026 09:58
feat: AwaitBuild only works on cloud

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…m:vespa-engine/vespa into bragehk/CLI-check_production_build_status
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants