Skip to content

Conversation

@NickLarsenNZ
Copy link
Contributor

Fixes #1821

This updates the kube Schema hoisting since schemars Schema's have changed.

  • Add tests for a variety of enum use-cases:
    • Tagged vs Untagged
    • Unit vs Tuple vs Structural variants
    • With and without doc-comments (descriptions)
  • Rewrite the hoisting logic
    • This is annotated with dev-comments to help understand intend and to ease future schemars changes.

This also fixes other issues that appeared along the way:

  • Untagged enum variant doc-comments were being applied to field descriptions.
  • Additional null entry added to enums.

See also: stackabletech#1, where this work was originally done.


Motivation

Since kube 1.0.0, schemas became invalid when optional enums contained comments on the variants.

Solution

  • Break out the hoisting logic into separate modules with scoped tests.
    • Heavily document the steps to help developers with future schemars Schema generation changes.
  • Add tests to try and catch this in future.

Fixes kube Schema conversion since schemars Schema's have changed.

- Add tests for a variety of enum use-cases:
  - Tagged vs Untagged
  - Unit vs Tuple vs Structural variants
  - With and without doc-comments (descriptions)
- Rewrite the hoisting logic
  - This is annotated with dev-comments to help understand intend and to ease future schemars changes.

This also fixes other issues:
- Untagged enum variant doc-comments were being applied to field descriptions.
- Additional `null` entry added to enums.

---------

Co-authored-by: Nick Larsen <[email protected]>
Co-authored-by: Sebastian Bernauer <[email protected]>
Co-authored-by: Techassi <[email protected]>
Signed-off-by: Nick Larsen <[email protected]>
@codecov
Copy link

codecov bot commented Oct 29, 2025

Codecov Report

❌ Patch coverage is 97.31801% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.3%. Comparing base (a12559d) to head (77af8e5).
⚠️ Report is 38 commits behind head on main.

Files with missing lines Patch % Lines
kube-core/src/schema/transform_one_of.rs 92.2% 3 Missing ⚠️
kube-core/src/schema/transform_any_of.rs 95.2% 2 Missing ⚠️
kube-core/src/schema/transform_properties.rs 97.6% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1839     +/-   ##
=======================================
+ Coverage   74.7%   75.3%   +0.6%     
=======================================
  Files         84      90      +6     
  Lines       7938    8133    +195     
=======================================
+ Hits        5927    6120    +193     
- Misses      2011    2013      +2     
Files with missing lines Coverage Δ
kube-core/src/schema/mod.rs 100.0% <100.0%> (ø)
...re/src/schema/transform_optional_enum_with_null.rs 100.0% <100.0%> (ø)
kube-derive/src/custom_resource.rs 84.3% <100.0%> (-<0.1%) ⬇️
kube-derive/tests/crd_complex_enum_tests.rs 100.0% <100.0%> (ø)
kube-derive/tests/crd_mixed_enum_test.rs 100.0% <100.0%> (ø)
kube-derive/tests/crd_schema_test.rs 100.0% <100.0%> (ø)
kube-derive/tests/crd_top_level_enum_test.rs 100.0% <ø> (ø)
kube-core/src/schema/transform_any_of.rs 95.2% <95.2%> (ø)
kube-core/src/schema/transform_properties.rs 97.6% <97.6%> (ø)
kube-core/src/schema/transform_one_of.rs 92.2% <92.2%> (ø)

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@NickLarsenNZ
Copy link
Contributor Author

I'm unsure what to do about the CI error:

error: rustc 1.86.0 is not supported by the following package:
  [email protected] requires rustc 1.88

For me, I get [email protected]. Should I pin the dependency?

home = "0.5.4"

@sbernauer
Copy link
Contributor

Created #1840 to address the home CI failures

@Techassi
Copy link
Contributor

Techassi commented Nov 4, 2025

I've just synced this PR branch to include the CI fixes (done in #1840). This PR should now be ready for review.

@NickLarsenNZ
Copy link
Contributor Author

Hi @clux, are you able to take a look at this?

We saw a new PR come up that fixes part of the problem: #1853

@sbernauer
Copy link
Contributor

(Also a small draft PR with a fix popped up: #1841)

@clux
Copy link
Member

clux commented Nov 26, 2025

Heyyy, sorry not had much time to deeply look at things the last month. This is definitely on the larger size, so only done a quick pass now, but the testing rigor is definitely super helpful! Thanks a lot for all this.

One thing that stands out as different in the approaches here is the use of Transform impls; this PR calls multiple functions, in one giant Transform, and the new PR is adding a more isolated option enum transform. I do wonder if it's better to follow the multiples-transform approach to allow users some flexibility (without us having to expose a bunch more config parameters for "our kube rewriter"). I have a hunch that it can make testing a bit more isolatable as well perhaps, but maybe not.

As a quick check first. Would you be OK with rebasing (possibly also splitting to expedite) this if I were to merge #1853 first? That's a pretty digestible PR, although it does seem to clash with your remove_optional_enum_null_variant parts (for which the optional_enum_with_null tests would probably be useful to combine).

@NickLarsenNZ
Copy link
Contributor Author

Heyyy, sorry not had much time to deeply look at things the last month.

No problem. I know how it gets.

This is definitely on the larger size

Yeah, we are aware of that. We started off with very minimal changes (much like the other PR(s)) but after creating a set of tests we found at least three bugs which lead to a larger refactor after seeing a more general transformation pattern.

The other PRs appear to only be solving one at a time and I believe they would not pass the test suite in this PR.

One thing that stands out as different in the approaches here is the use of Transform impls; this PR calls multiple functions, in one giant Transform, and the new PR is adding a more isolated option enum transform. I do wonder if it's better to follow the multiples-transform approach to allow users some flexibility (without us having to expose a bunch more config parameters for "our kube rewriter"). I have a hunch that it can make testing a bit more isolatable as well perhaps, but maybe not.

I just didn't think of that, but I think we can just make multiple transformers and move the chained calls inside the existing one into each new transformer. I think I can spend a little bit of time to see how easy it is to convert.

As a quick check first. Would you be OK with rebasing (possibly also splitting to expedite) this if I were to merge #1853 first? That's a pretty digestible PR, although it does seem to clash with your remove_optional_enum_null_variant parts (for which the optional_enum_with_null tests would probably be useful to combine).

I guess I'd prefer not to, partly because of the possible merge conflicts, but also because I think they only solve part of the problem without seeing the broader issue.

Given that, if you are not convinced to hold off merging the other PR(s), I understand.

@NickLarsenNZ
Copy link
Contributor Author

I have updated the implementation to split the single transformer into multiple transformers.

The original StructuralSchemaRewriter should be renamed because it now does two random other things (should it just be nameds OtherSchemaRewriter, or should they too be split into separate Transform impls?

I did try moving the body of the transform functions into the Transform impl but then the unit tests would need rewriting (I can try that is you prefer).

Anyway, I will leave it as is and await your feedback.

@clux
Copy link
Member

clux commented Nov 27, 2025

I have updated the implementation to split the single transformer into multiple transformers.

I think this makes the intent between the various functionality a bit clearer at the very least. 👍

The other PRs appear to only be solving one at a time and I believe they would not pass the test suite in this PR.

yeah. that is a concern I have as well. i need to look a bit more tomorrow to understand the various test cases here, but afaikt the tests are fairly isolated as is, and each transformer now looks like something that can be reviewed independently, but maybe there's a subtletly i am missing.

i'll get back with more tomorrow.

@clux
Copy link
Member

clux commented Nov 28, 2025

I guess I'd prefer not to, partly because of the possible merge conflicts, but also because I think they only solve part of the problem without seeing the broader issue.

Yeah, that is fair. Will hold off.

I did try moving the body of the transform functions into the Transform impl but then the unit tests would need rewriting (I can try that is you prefer).

I like the idea of this if it doesn't make the testing too difficult. At least then we are testing the actual interface and not only all our internal functions/structs (which i at some point hoped we could reduce). Maybe it can make the tests a little easier to read as well if we try stuff like schemars::json_schema!. EDIT: looking at schemars own unit testing, i like their style

Copy link
Member

@clux clux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some initial comments on various bits. suggestions on some code-minimisation.
feels like i mostly want more use of slice-patterns (to remove some implicit panics), more let-else use, more comments on the invariants that cause us to panic, and possibly (possibly also testing of the interface direct as per previous comment).

otherwise, i think this approach is nice. it is very thorough, and you're dealing with a more complicated codebase at stackable than i have to deal with so happy to take changes like this from someone with a bit of skin in the game.

// Apply this (self) transform to all subschemas
schemars::transform::transform_subschemas(self, transform_schema);

let mut schema: SchemaObject = match serde_json::from_value(transform_schema.clone().to_value()).ok()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this match might be able to be a let-else;

let mut Some(schema) = serde_json::... { else return; }

if the compiler allows the mut in there with a type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 14457d1

Comment on lines +64 to +81
if any_of.len() != 2 {
return;
}

let entry_is_null: [bool; 2] = any_of
.iter()
.map(|schema| serde_json::to_value(schema).expect("schema should be able to convert to JSON"))
.map(|value| value == *NULL_SCHEMA)
.collect::<Vec<_>>()
.try_into()
.expect("there should be exactly 2 elements. We checked earlier");

// Get the `any_of` subschema that isn't the null entry
let subschema_to_hoist = match entry_is_null {
[true, false] => &any_of[1],
[false, true] => &any_of[0],
_ => return,
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically 3 panic-using apis here; array indexing + 2 expects. indexing is checked, but would be nice to use slice-patterns to avoid the implicit linkage between the length check code and the indexing code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first expect should not happen unless everything is broken (a SchemaObject should be able to convert to a serde_json::Value).

The second expect should not be reachable due to the early return.

I'm happy to change things around to make this more compact. Could you elaborate what you mean about the slice-pattern? I tried a few things, but that introduced more conversions and clones.

Comment on lines +86 to +87
// TODO (@NickLarsenNZ): Return errors instead of panicking, leave panicking up to the
// infallible schemars::Transform
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

been thinking about the panicing here a bit, and i think at least the panicing here would generally apply to "setup code". even if it's not, i would rather have a panic here than to accidentally do a runtime "error" log that people miss and as a result perhaps modify a schema incorrectly (making people potentially have to do migrations).

the tradeoff here is that we could panic a controller that does something with schemas live, but controllers should be idempotent and resilient against temporary failures like a controller crashing..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really unfortunate that the schemars::Transform is infallible.

As for the comment I left in the code, that was more about making the functions return errors (easier to tests too) and then panic on error inside the transform.

Would you like me to remove the comment?

Suggested change
// TODO (@NickLarsenNZ): Return errors instead of panicking, leave panicking up to the
// infallible schemars::Transform

///
/// This will return early without modifications unless:
/// - There are `oneOf` subschemas (not empty).
/// - Each subschema contains an `enum` or `const`.
Copy link
Member

@clux clux Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly this can be reduced in scope if we use the existing ReplaceConstValue? EDIT: probably not; it doesn't do a lot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah. Should I leave as-is then?

On that note, Kubernetes might eventually support const: kubernetes/kubernetes#134684

Comment on lines 32 to 33
// TODO (@NickLarsenNZ): At this point, we are seeing duplicate `title` when printing schema as json.
// This is because `title` is specified under both `extensions` and `other`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to concat the titles?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean remove it from other when it is already in extensions?

Comment on lines +90 to +92
if types.any(|r#type| r#type != variant_type) {
panic!("oneOf variants must all have the same type");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably a sane unwrap for now. might need another line about why this is true in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added extra explanation in 77af8e5

Comment on lines +63 to +69
if subschemas.is_empty() {
return;
}

// Ensure we aren't looking at the subschema with a null, as that is hoisted by another
// transformer.
if subschemas.len() == 2 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two checks could be a slice pattern match perhaps

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I quickly tried both:

match subschemas.len() { ...}
// and
match subschemas.as_slice() { .. }

But it didn't make things any simpler.

Would you mind giving a rough example as a hint?

assert-json-diff.workspace = true
runtime-macros.workspace = true
prettyplease.workspace = true
tokio = { workspace = true, features = ["rt", "macros"] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be a workspace dep

Copy link
Member

@Danil-Grigorev Danil-Grigorev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'd prefer not to, partly because of the possible merge conflicts, but also because I think they only solve part of the problem without seeing the broader issue.

I believe the #1821 is only mentioning the specific issue with optional enums, and we need to fix that one directly, and not "oh btw I also fixed that". Having separate issues for this PR addresses would help.

// TODO (@NickLarsenNZ): Return errors instead of panicking, leave panicking up to the
// infallible schemars::Transform
let Schema::Object(to_hoist) = subschema_to_hoist else {
panic!("the non-null anyOf subschema is a bool. That is not expected here");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not panic in this code. This produces no linter hints, and gives no trace on which field actually failed. The only correct approach with the chosen path by schemars is to split code into more domain specific parts, and log errors instead of panicking here.

use super::*;

#[test]
fn optional_tagged_enum_with_unit_variants() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this representable with a CRD schema? While pure json seems clear here, it probably does not make sense to test internals of the generated code, as this may change over time. It may also be unrepresentable in a CRD form, in which case we shouldn't try to test such things.

With a CRD based approach, the test is focused on functionality first, so in my opinion this should follow the pattern of crd_schema_test.rs or newly added crd_complex_enum_tests.rs

#[derive(Debug, Clone)]
pub struct AnyOfSchemaRewriter;

impl Transform for AnyOfSchemaRewriter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this method has no functional differences with implementation in #1853 and therefore it will pass the tests. Yet it is split in two transforms - OptionalEnumSchemaRewriter and AnyOfSchemaRewriter instead of addressing it in one place.

/// NOTE: The trailing null is removed because it's not needed by Kubernetes
/// and makes the CRD more compact by removing redundant information.
#[derive(Debug, Clone)]
pub struct OptionalEnumSchemaRewriter;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems strange to have AnyOfSchemaRewriter and OptionalEnumSchemaRewriter split like this. Also OptionalEnumSchemaRewriter already overrides null containing enums with top level enum. Why is this needed? Can this be produced/tested by some CRD schema?

return;
}

// Ensure we aren't looking at the subschema with a null, as that is hoisted by another
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be driven by the order of transformers, not the manual if->return.

.expect("there should be exactly 2 elements. We checked earlier");

// Get the `any_of` subschema that isn't the null entry
let subschema_to_hoist = match entry_is_null {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless there is a scenario where the null enum produced first in the generated array by CRD definition, it doesn't make sense to test permutations like this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CRD generation for optional enums broke with kube 2.0.0

5 participants