Skip to content

Add Tolerations to Build and BuildRun objects#1711

Merged
openshift-merge-bot[bot] merged 3 commits into
shipwright-io:mainfrom
dorzel:MULTIARCH-5036
Feb 5, 2025
Merged

Add Tolerations to Build and BuildRun objects#1711
openshift-merge-bot[bot] merged 3 commits into
shipwright-io:mainfrom
dorzel:MULTIARCH-5036

Conversation

@dorzel
Copy link
Copy Markdown
Contributor

@dorzel dorzel commented Oct 30, 2024

Changes

Fixes #1636

Submitter Checklist

  • Includes tests if functionality changed/was added
  • Includes docs if changes are user-facing
  • Set a kind label on this PR
  • Release notes block has been filled in, or marked NONE

See the contributor guide
for details on coding conventions, github and prow interactions, and the code review process.

Release Notes

Add Tolerations to Build and BuildRun objects

@openshift-ci openshift-ci Bot added the release-note Label for when a PR has specified a release note label Oct 30, 2024
@pull-request-size pull-request-size Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 30, 2024
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 30, 2024
@dorzel dorzel force-pushed the MULTIARCH-5036 branch 8 times, most recently from 872db31 to 462d9bb Compare November 6, 2024 20:22
@dorzel dorzel force-pushed the MULTIARCH-5036 branch 3 times, most recently from 4ecfc21 to dfe25d5 Compare November 13, 2024 18:59
@dorzel dorzel force-pushed the MULTIARCH-5036 branch 3 times, most recently from e449fbd to 03b3b21 Compare November 19, 2024 18:13
@pull-request-size pull-request-size Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 19, 2024
@dorzel dorzel force-pushed the MULTIARCH-5036 branch 8 times, most recently from 3e66b55 to 43382a6 Compare November 20, 2024 22:06
@dorzel dorzel marked this pull request as ready for review November 21, 2024 17:28
@dorzel dorzel force-pushed the MULTIARCH-5036 branch 12 times, most recently from 090570c to b8e487a Compare January 29, 2025 23:05
Copy link
Copy Markdown
Member

@SaschaSchwarze0 SaschaSchwarze0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that tests now succeed. I still have one more homework for you. Sorry that I missed that earlier.

Comment on lines +43 to +46
// In this case, fields set only on the BuildRun object do not get validated as they are not copied to the transient Build resource.
// explicitly setting them here is required for validation to happen.
build.Spec.NodeSelector = buildRun.Spec.NodeSelector
build.Spec.Tolerations = buildRun.Spec.Tolerations
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct and reminds me of something that I missed earlier (also for nodeSelector). Users can define these things both in the inline build spec as well as the overrides on the buildrun spec.

Two things:

  1. We should validate the buildrun spec fields somewhere near here (https://github.com/shipwright-io/build/blob/v0.14.0/pkg/reconciler/buildrun/buildrun.go#L276-L292) and not copy it over to a build spec to perform a validation.
  2. When an inline build spec is used, we validate that certain fields are not set on the buildrun spec because it does not make sense to have them specified in two places. That would be suitable for both nodeSelector and tolerations as well. The code that validates this is here: https://github.com/shipwright-io/build/blob/v0.14.0/pkg/validate/validate.go#L106-L136. Basically, when somebody creates a standalone BuildRun (= with an inline Build spec), we expect that BuildRun to not also set those fields (params, volumes, timeout etc) on the BuildRun spec itself.

Can you adopt this please.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so if I understand it seems that specifically in the inline build spec case we expect that certain fields are only specified on that build spec instead of having, for example, the "BuildRun Tolerations take preference over Build Tolerations" behavior (overriding). I'll implement these.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SaschaSchwarze0 Ok, let me know how that looks. Tests are good except for an unrelated flaky e2e test.

@dorzel dorzel force-pushed the MULTIARCH-5036 branch 2 times, most recently from fa80d86 to b0fee7d Compare February 3, 2025 22:06
@dorzel
Copy link
Copy Markdown
Contributor Author

dorzel commented Feb 4, 2025

/retest

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Feb 4, 2025

@dorzel: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link
Copy Markdown
Member

@adambkaplan adambkaplan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

In general this looks really good! I found a few opportunities for improvement - either through refactoring or adding additional tests to improve code coverage. That said, I would not block merge/lgtm on these items, they can all be addressed in a follow-up pull request.

Comment on lines +32 to +58
for _, toleration := range b.Build.Spec.Tolerations {
// validate Key
if errs := validation.IsQualifiedName(toleration.Key); errs != nil {
b.Build.Status.Reason = ptr.To(build.TolerationNotValid)
b.Build.Status.Message = ptr.To(fmt.Sprintf("Toleration key not valid: %v", strings.Join(errs, ", ")))
}
// validate Operator
if !((toleration.Operator == v1.TolerationOpExists) || (toleration.Operator == v1.TolerationOpEqual)) {
b.Build.Status.Reason = ptr.To(build.TolerationNotValid)
b.Build.Status.Message = ptr.To(fmt.Sprintf("Toleration operator not valid. Must be one of: '%v', '%v'", v1.TolerationOpExists, v1.TolerationOpEqual))
}
// validate Value
if errs := validation.IsValidLabelValue(toleration.Value); errs != nil {
b.Build.Status.Reason = ptr.To(build.TolerationNotValid)
b.Build.Status.Message = ptr.To(fmt.Sprintf("Toleration value not valid: %v", strings.Join(errs, ", ")))
}
// validate Taint Effect, of which only "NoSchedule" is supported
if !((toleration.Effect) == "" || (toleration.Effect == v1.TaintEffectNoSchedule)) {
b.Build.Status.Reason = ptr.To(build.TolerationNotValid)
b.Build.Status.Message = ptr.To(fmt.Sprintf("Only the '%v' toleration effect is supported.", v1.TaintEffectNoSchedule))
}
// validate TolerationSeconds, which should not be specified
if toleration.TolerationSeconds != nil {
b.Build.Status.Reason = ptr.To(build.TolerationNotValid)
b.Build.Status.Message = ptr.To("Specifying TolerationSeconds is not supported.")
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we simplify this a bit and re-use the logic in BuildRunTolerations?

Suggested change
for _, toleration := range b.Build.Spec.Tolerations {
// validate Key
if errs := validation.IsQualifiedName(toleration.Key); errs != nil {
b.Build.Status.Reason = ptr.To(build.TolerationNotValid)
b.Build.Status.Message = ptr.To(fmt.Sprintf("Toleration key not valid: %v", strings.Join(errs, ", ")))
}
// validate Operator
if !((toleration.Operator == v1.TolerationOpExists) || (toleration.Operator == v1.TolerationOpEqual)) {
b.Build.Status.Reason = ptr.To(build.TolerationNotValid)
b.Build.Status.Message = ptr.To(fmt.Sprintf("Toleration operator not valid. Must be one of: '%v', '%v'", v1.TolerationOpExists, v1.TolerationOpEqual))
}
// validate Value
if errs := validation.IsValidLabelValue(toleration.Value); errs != nil {
b.Build.Status.Reason = ptr.To(build.TolerationNotValid)
b.Build.Status.Message = ptr.To(fmt.Sprintf("Toleration value not valid: %v", strings.Join(errs, ", ")))
}
// validate Taint Effect, of which only "NoSchedule" is supported
if !((toleration.Effect) == "" || (toleration.Effect == v1.TaintEffectNoSchedule)) {
b.Build.Status.Reason = ptr.To(build.TolerationNotValid)
b.Build.Status.Message = ptr.To(fmt.Sprintf("Only the '%v' toleration effect is supported.", v1.TaintEffectNoSchedule))
}
// validate TolerationSeconds, which should not be specified
if toleration.TolerationSeconds != nil {
b.Build.Status.Reason = ptr.To(build.TolerationNotValid)
b.Build.Status.Message = ptr.To("Specifying TolerationSeconds is not supported.")
}
}
ok, reason, msg := BuildRunTolerations(b.Build.Spec.Tolerations)
if !ok {
b.Build.Status.Reason = reason
b.Build.Status.Message = msg
}
return nil

shpgit "github.com/shipwright-io/build/pkg/git"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

corev1 "k8s.io/api/core/v1"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (not blocking): keep k8s.io imports grouped together above.

Comment thread docs/buildrun.md
- `spec.env` - Specifies additional environment variables that should be passed to the build container. Overrides any environment variables that are specified in the `Build` resource. The available variables depend on the tool used by the chosen build strategy.
- `spec.nodeSelector` - Specifies a selector which must match a node's labels for the build pod to be scheduled on that node.
- `spec.nodeSelector` - Specifies a selector which must match a node's labels for the build pod to be scheduled on that node. If nodeSelectors are specified in both a `Build` and `BuildRun`, `BuildRun` values take precedence.
- `spec.tolerations` - Specifies the tolerations for the build pod. Only `key`, `value`, and `operator` are supported. Only `NoSchedule` taint `effect` is supported. If tolerations are specified in both a `Build` and `BuildRun`, `BuildRun` values take precedence.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 on the reference doc!

}

// BuildRunNodeSelector is used to validate nodeSelectors in the BuildRun object
func BuildRunNodeSelector(nodeSelector map[string]string) (bool, string, string) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here - there is a refactoring opportunity to consolidate the logic into a single function.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking - if possible, I'd love to have unit-style Ginkgo tests that cover all the possible validation exceptions.

This can be done in a follow-up pull request, as we currently don't have similar levels of unit test coverage for the node selector validations. We have test coverage elsewhere in this PR.

Comment thread pkg/validate/validate.go
Comment on lines +141 to +149
if len(buildRun.Spec.NodeSelector) > 0 {
return resources.BuildRunBuildFieldOverrideForbidden,
"cannot use 'nodeSelector' override and 'buildSpec' simultaneously"
}

if len(buildRun.Spec.Tolerations) > 0 {
return resources.BuildRunBuildFieldOverrideForbidden,
"cannot use 'tolerations' override and 'buildSpec' simultaneously"
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not-blocking): would like to have unit test coverage here.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Feb 4, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adambkaplan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 4, 2025
@dorzel
Copy link
Copy Markdown
Contributor Author

dorzel commented Feb 5, 2025

@adambkaplan @SaschaSchwarze0 Is this ok for merge? I'll plan on addressing the comments above in #1770

Copy link
Copy Markdown
Member

@SaschaSchwarze0 SaschaSchwarze0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Feb 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Label for when a PR has specified a release note size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

SHIP-0039: Allow tolerations on Build and BuildRun to be set

3 participants