
Proposal: required fields and related issues #822

@cueckoo

Description


Originally opened by @mpvl in cuelang/cue#822

With extensive inputs from @myitcv

We propose a couple of small additions to the language, as well as a sharpening of the language specification, to solve a myriad of shortcomings related to specifying required and optional fields. We believe the result allows for a much more natural representation of required fields in CUE. This proposal also allows us to define tooling in a more consistent manner and lays the foundation for finalizing the query proposal.

The changes are mostly backwards compatible, but:

  • may allow configurations that before would fail (where the failure was likely unintended)
  • may allow configurations that worked in v0.2 and early v0.3, but fail as of v0.3.0-beta.2 and beyond.
  • may require the use of the -c flag for cue eval and cue export to get the same results with the old representation. This makes this flag consistent with the behavior of vet as well.

Automated tooling can be provided to rewrite CUE files.

Background

We now cover some core CUE concepts necessary to understand the proposal introduced in this document. In some cases, we point out general issues with these topics. People familiar with these concepts can skip this section.

Closedness

CUE closedness is intended to solve two problems:

  • be able to catch typos in field names
  • be able to represent mutual exclusivity of fields, like in Protos

A specific non-goal of defining closed structs is to define APIs that promise to never add an additional field. CUE does not offer a structured way to specify this, as it is believed that APIs should be open by definition.

A good analogy for the role of closedness in CUE is Protobufs. By definition, protobufs are extensible. Fields with unknown tags on incoming messages should typically be ignored. However, when compiling a proto definition to, say, Go, the set of fields defined on a proto message is closed. An attempt to access an unknown field in a struct will result in a compile error. In CUE, the fact that a definition closes a struct should be interpreted in the sense of a compiled proto message and not as an indication that an API is closed for extension.

In that sense, enforcing mutual exclusivity in protos is a bit of an abuse of this mechanism. Although it has served the representation of protos quite well, it may help to consider alternatives.

Error types

To understand the below proposal, it is important to realize that CUE distinguishes between several error modes:

  1. compile errors: any error that can never be resolved and can be spotted before evaluation. This may be otherwise valid CUE, such as string + "foo"
  2. permanent errors: an error that occurs during evaluation and cannot be solved by adding more values to the configuration.
  3. incomplete errors: an error that could be solved by adding more values, that is, by making a CUE value more specific.

In the end these are all errors, but CUE evaluation and manifestation will fail at different points when encountering these.
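To make these modes concrete, consider this illustrative snippet (the exact error wording varies between CUE versions):

x: string + "foo" // 1. compile error: can never succeed, detectable before evaluation
y: 1 & 2          // 2. permanent error: conflicting concrete values
a: int
b: a + 1          // 3. incomplete error: resolvable by giving a a concrete value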

Evaluation modes

CUE distinguishes two evaluation modes:

Unification

Unification combines two CUE values, preserving the entire space of possible values that are the intersection of the two input values. Unification is commutative, idempotent, and associative.

For instance, cue def presents the result of unification without any simplifications like picking defaults or resolving references if this would change the meaning of the CUE program.

There can be multiple ways to represent the same schema. For instance bool is equivalent to true | false. The goal of unification is not to find a normalized representation of the result. Finding an optimal representation, by some definition, is an option, but not a necessity.

Default selection

Default selection is used when a full specification of a value is required, but an intermediate concrete value is needed to proceed. Essentially, this happens for references of the form a.b and a["b"], where a needs to be a concrete list or struct to be able to select the appropriate value.

The steps are as follows:

  1. eliminate values with incomplete errors from disjunctions
  2. select default values

The differentiating property of default selection is that all optional fields, pattern constraints and closedness information needs to be preserved. This puts limits on the amount of disambiguation that can be done ahead of time. It is the user’s responsibility to disambiguate structs if applicable.
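As a small sketch of default selection (hypothetical example):

a: *{kind: "x"} | {kind: "y"}
b: a.kind // selection needs a concrete struct; the default {kind: "x"} is picked, so b is "x"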

Manifestation

Manifestation projects a result from unification to a concrete value. It is used in cue export, but also, for instance, for:

  • expressions: x and y in x + y
  • elements projected by a query: x.* (corresponding to the Query Proposal Proposal: CUE Querying extension #165)
  • arguments for builtins that must be concrete
  • when outputting concrete values, like turning into JSON

We will write manifest(x) to denote the manifested value of x. This is not a CUE builtin.

Right now this encompasses:

  1. default selection
  2. eliminate duplicate values from disjunctions.

Note that after this elimination different structs with different optional values are retained.

The Issues

We present some issues for the various use cases of CUE that motivated this proposal.

Encourage good API design

When evolving and augmenting an API, it is generally strongly recommended to not introduce backwards incompatible changes. This means that any added fields to an API should typically be optional.

In CUE, optional fields are indicated by a ?. This means that when augmenting an API, one should always remember to add the ?. Especially for larger APIs, required fields should be relatively rare. It would be better if only the required fields had to be marked, while all other fields are optional. This would avoid inadvertently making fields required, and makes it more likely that making a field required is a deliberate choice.

Specifying minimum and maximum number of required fields

The protobuf oneOf representation

OneOf fields in protobufs are currently represented as

#Foo: {
    {} | { a?: int } | { b?: int }
}

This relies on all three disjuncts ultimately collapsing into the same value, assuming that optional fields can be ignored and the closedness to enforce mutual exclusivity.

The following could be used as an alternative.

*{} | { a: int } | { b: int }

This approach is a bit of an abuse of the closedness mechanism, whose predominant goal is to catch typos. It mostly works, although it does have a few issues. For instance, embedding #Foo would allow users to redefine a oneOf field, allowing it to be used in configurations that were not allowed by #Foo. To some extent this is okay, considering what closedness is for, and the fact that embedding disables this check. But it may be nice to address if possible.

In the end, there is currently no explicit “official” way of specifying oneOf fields in CUE that can capture the most likely intent. CUE currently deploys a hacky workaround for this, but that is not a permanent solution.

Specifying a certain number of required fields

The above mechanism does not generalize to specifying that, for instance, at least one of a set of fields should be given by a user.

Policy definitions and querying

One of the core ideas of CUE is that, with the right model, constraints can function as templates, resulting in a powerful combination. The way optional and required fields are marked now works fairly well in this context. These same mechanisms don’t work as smoothly in the context of querying and policy definitions, that is, using CUE to specify filters or additional constraints.

There are currently some gaps in the query proposal that can only be addressed with some refinement in the optionality model in CUE.

Requiring fields

Requiring the user to specify a field is currently achieved by specifying a field with a non-concrete value and then requiring non-concrete values to be concrete when exporting. Note that to CUE, non-concrete values are just values. So, in effect, this mechanism is also an abuse of the current semantics. It would be better to mark fields as required explicitly.
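For example, with today's mechanism the intent "the user must fill this in" can only be approximated by leaving a value non-concrete (a sketch of the current situation, not new syntax):

#Config: {
    host: string // intended as "required", but to CUE this is just a value
    port: 8080
}

// cue export -c fails only because string is not concrete,
// not because CUE knows host was meant to be required.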

Requiring concrete values

A related problem is the ability to require the user to specify a concrete value. This is currently not possible. For instance

#Def: intList: [...int]

says that intList must be a list of integers. But since [...int] defaults to [], CUE will happily fill in this value if a user doesn’t specify it. Similarly, requiring that a user specify a concrete value is not possible: unifying a user-supplied value with

#X: {
    kind: "X"
    ...
}

would just happily fill in the missing kind.

Error messages

A side effect of having a mechanism to explicitly specify a field as required is that it will also lead to better error messages. Currently, when CUE discovers a non-concrete value it just complains about that: it has no way of knowing the author’s intent.

Having an error message explicitly state that the error is a missing field would be considerably clearer.

Evaluation

Disambiguating disjuncts

CUE currently only disambiguates identical disjuncts. This means that disjuncts with different optional fields will not disambiguate. The following is an amended example from #353:

e: #Example

e: a: "hello"

#Example: {
	a: string
	{
		value?: string
	} | {
		externalValue?: string
	}
}

As the two values have different sets of optional fields, they are not considered equal.
This can result in confusing error messages to the user, like incomplete value {a: 1} | {a: 1}, for instance when running a command that does not show optional fields, like cue eval or cue export.
CUE does not disambiguate these, as disambiguating optional fields is NP-complete in the general case. It is nonetheless confusing for the user.
A similar issue occurs with regular (non-optional) fields.
This can be solved by giving users better tools to disambiguate, as well as by having more permissive disambiguation.

cue commands

Avoid confusion for users

CUE currently allows non-concrete values in the output. Unifying

a: int
b?: int

and

b: 2

succeeds with vet, because for CUE int is a valid value. The -c option will verify that values are concrete. For cue export, however, values must be concrete to succeed.

Overall, there are some discrepancies in how optionality and non-concreteness are handled across commands. More consistent handling across commands would be useful.

What is cue eval for? How does it differ from cue export? More clarity for CUE commands

cue eval prints a form of CUE that is somewhat in between schema and concrete data. It may not even be correct CUE. The purpose of this was to get good feedback for debugging. But as CUE evolved this output has become increasingly confusing. A clearer model of output modes is in order.

A more consistent output with more explicit debugging options shared between commands seems less confusing.

Proposal

This section gives a brief overview of the proposal. Details are given in the next section.

Required fields

We introduce the concept of a required field, written as foo!: bar, which requires that the field be unified with a namesake regular field (not required and not optional) which has a concrete value. A field that violates this constraint results in an “incomplete” error (the failure could be solved by adding a concrete value of the field).

Consider this simple example:

#person: {
    name!: string
    age: int
}

jack: #person & {
    name: string // incomplete error; string is not concrete
    age:  int    // ok
}

Further examples can be seen below.

A required field can be referenced as if it were a regular field:

a!: string
b: a   // string

c!: "foo"
d: c   // “foo”

A required field may have a concrete value. The same limitation holds for such fields, meaning that it must be unified with a regular field with that exact same concrete value.
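For instance, following the unification rules detailed later in this proposal:

#S: {kind!: "Service"}

good: #S & {kind: "Service"} // ok: a regular field with the exact concrete value
bad:  #S & {kind: string}    // incomplete error: no concrete regular field for kind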

numexist builtin

We introduce a numexist builtin with the signature

numexist(<num>, <expr>+)

which takes a numeric constraint and a variadic list of expressions, and returns:

  • _ if the number of expressions that evaluate to a concrete value unifies with num
  • _|_ otherwise, where the error is an incomplete error if the number could still be satisfied by making more values concrete

When evaluating the expressions, incomplete errors are ignored and counted as 0.

For instance, a protobuf oneOf with fields named a and b could be written as:

#P1: {
    numexist(<=1, a, b)
    a: int
    b: int
}

There is nothing special about the arguments a and b here; they are just resolved as usual. In this case, numexist passes as both a and b evaluate to int, and thus the number of concrete values is 0, which matches <=1. It would fail with a fatal error if a and b were both, say, 1.

Requiring that at least one of a set of fields is concrete can be written as:

#P2: {
    numexist(>=1, a, b)
    a: int
    b: int
}

In this case, numexist fails with an “incomplete” error, as the condition can still be resolved by setting either a, b, or both to a concrete value.

Ignoring incomplete errors is needed to allow the same construct for fields that have struct values:

#P3: {
    numexist(<=1, a, b)
    a?: #Struct
    b?: #Struct
}
#Struct: { c: int }

Without making a and b optional fields, they would both always evaluate to a concrete value. For consistency, it is probably good style to always mark fields referred to by this builtin as optional.

When optional fields are used, as in #P3 above, we could consider another builtin numexists that checks for the existence of a reference, instead of concreteness.

More specific manifestation

We propose adjusting manifestation as follows (new or changed steps emphasized):

  1. default selection
  2. eliminate optional fields, pattern constraints
  3. [optional]: drop definitions, hidden fields and/or fields with non-concrete values.
  4. eliminate duplicate values from disjunctions.

Step 3 is optional and only affects the outcome insofar as it may lead to different disambiguation for structs. Which steps to choose may depend on the output mode requested by the user.

Unlike with unification, there is logically only one correct representation (disregarding field order for now).

Tooling alignment

The introduction of required fields and a more specific manifestation can be used to simplify the commands of the tooling layer.

The biggest change is that we will generally not interpret non-concrete fields as errors, making them behave somewhat similarly to optional fields today. The old behavior would still be available through the -c option. Some commands, like cue vet, have always taken this interpretation. So this change will bring more consistency between commands, while addressing many of the issues above.

Details of the command alignment are discussed below.

Detailed design and examples

Required fields

Semantics

A required field, denoted foo!: bar, requires that the field be unified with one or more regular fields (not required and not optional) giving a concrete result. A field that violates this constraint results in an “incomplete” error.

For instance, consider this example:

x!: {
    a!: string
    b: int
}

This would require that the user specify field x which must have a field a with a concrete value. Dropping the ! from x would mean that users don’t have to specify a field x but when they do, they also need to specify a concrete value for a.

The required constraint is a property of fields. Here are some examples of how required fields unify with regular fields and other required fields:

{foo!: int} & {foo: int}   	→  {foo!: int}
{foo!: int} & {foo: <=3}   	→  {foo!: <=3}
{foo!: int} & {foo: 3}  	→  {foo: 3}

{foo!: 3} & {foo: int}   	→  {foo!: 3}
{foo!: 3} & {foo: <=3}   	→  {foo!: 3} & {foo: <=3}
{foo!: 3} & {foo: 3}  	→  {foo: 3}

A total ordering of all types of fields can be expressed as

foo?: int > foo: int > foo!: int > foo!: 1 > foo: 1

Note that {foo!: 3} & {foo: <=3} cannot be simplified further. The definition requires that the values of the regular fields together be concrete. Logically, {foo: <=3} could unify with {foo: >=3} to become {foo: 3} (it represents the same set of numbers), so rewriting {foo!: 3} & {foo: <=3} as just {foo!: 3} would result in loss of information. Retaining this distinction is important to keep associativity and a valid lattice structure.

Limitations

The definition imposes some limitations on the use of the ! flag. For instance, if a definition were to have both a field foo!: 3 and foo: 3, the latter would remove the requirement for the user to specify a field, because foo: 3 satisfies the requirement of foo!: 3 to provide a concrete value. It doesn’t matter that this comes from the same definition. In general, when marking a field as required, it is advisable that all other constraints on the same field in a definition (or at least those with concrete values) should specify the ! as well. These cases could be caught with a cue vet rule, requiring that in a single definition all fields must be either required or not. This would also help readability.
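A sketch of the pitfall and the suggested style:

// pitfall: the regular foo: 3 already satisfies foo!: 3,
// so the user is no longer required to specify anything
#D1: {
    foo!: 3
    foo:  3
}

// suggested style: mark all constraints on the field as required
#D2: {
    foo!: 3
    foo!: int
}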

Implementation

We now present a possible implementation. Understanding this section is not key to the proposal.

The implementation of required fields could be done with an “internal builtin” __isconcrete defined as follows:

__isconcrete(x), if unified with a value y, returns:

  • a fatal error if unification fails
  • _ if it succeeds and y is concrete
  • an incomplete error otherwise

This would be implemented along the lines of current validators (like <10) and the proposed must builtin.

Example rewrites:

foo!: int   	→   foo: __isconcrete(int)
foo!: 3   	→   foo: __isconcrete(3)
foo!: {name: int} 	→   foo: __isconcrete({}) & {name: int}
foo!: [...int]	→   foo: __isconcrete([...]) & [...int]

Also, __isconcrete(int) & __isconcrete(3) → __isconcrete(3).

After this conversion, unification proceeds along the usual lines.

We opted for a syntactic addition to the language and not to have the user define this builtin directly, as its usage is rather confusing. For nested structures we only care about concreteness at the top level. So using __isconcrete({name: int}) would work, but would do unnecessary work and also give the impression that field name itself should be concrete.

Another reason is that people will usually think of this functionality as specifying a required field. Technically speaking, though, as CUE treats non-concrete values as “present”, it should be called “concrete” and not “required”. The use of ! avoids this issue.

Implications for ?

Note that with the addition of !, the use of ? would be eliminated in most cases. As we saw in the proposal for numexist, however, there are still some use cases for it. For this reason, as well as backwards compatibility, we expect that ? will stay.

Example: querying

Regardless of whether subsumption or unification is used for querying (see #165), it is currently not possible to specify that a field should be concrete but within a certain set of values.

However, with the ! constraint, we can write

a.[_:{name!: =~"^[a-z]"}]

to query all values of a that have a name field starting with a lowercase letter.

Note, the ! would not be necessary when using subsumption and querying using concrete values like name: "foo". It would also not be necessary if all values in a can be assumed to be concrete or if including non-concrete values is desirable.

Example: policy specification

It is currently awkward to write that a value should have either a value of one kind or another. For instance:

{a: >10} | {b: <10}

would match both variants if a user didn’t specify any of the fields (assuming a and b are valid fields).

Using the exclamation mark ({a!: >10} | {b!: <10}) would explicitly require that one of these fields be present.

Example: require user to specify values

This proposal allows for requiring a configuration to specify concrete values:

  • a!: [...]
  • a!: {...}
  • a!: 1

This can be useful, for instance, to require the user to explicitly specify a discriminator field:

Example: discriminator fields

The use of ! can signal to the evaluator the presence of discriminator fields. This could form the basis of a substantial performance boost for many configurations. Tooling could later help annotate configurations with such fields (detecting discriminator fields is akin to anti-unification).

Consider:

#Object: {
    kind!: string
}
#Service: #Object & {
    kind!: "Service" // Require the user to explicitly specify this.
}

In this case, the user is required to explicitly specify the discriminator field. But even if the ! is dropped from #Service.kind, as a convenience to the user, the ! from #Object would still signal that this is likely a discriminator field. The same holds if a #MyService: #Service & {kind: "Service"} instantiation would make the field concrete.

Example: at least one of

The required field can be used to specify that at least one of a collection of fields should be concrete:

#T: {
    {a!: _} | {b!: _}
    a: int
    b: int
}

If both are concrete, the resulting disjuncts will be identical and duplicate elimination will take care of the disambiguation.

numconcrete builtin

The numconcrete builtin allows mutual exclusivity of values where this would be hard or expensive to express using the required field construct introduced above. It has the additional benefit that it more directly conveys the intent of the user.

In many cases it is possible to express the required field annotation in terms of this builtin. For instance, a!: int could be written as

numconcrete(1, a)
a: int

However, it is not possible to express requiring the user to specify a specific concrete value, such as a!: 1. For this reason, as well as for convenience in specifying required fields in policy definitions, it makes sense to have both constructs.

Things are a bit trickier when requiring fields with struct or list values to be concrete. As these are always concrete, all values would pass.
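A sketch of the issue, using the optional-field remedy from #P3 above:

// a and b are struct values, which are always concrete at the top level,
// so this constraint passes even if the user sets neither field:
#Bad: {
    numconcrete(>=1, a, b)
    a: {c: int}
    b: {d: int}
}

// marking the fields optional restores the intended meaning:
#Good: {
    numconcrete(>=1, a, b)
    a?: {c: int}
    b?: {d: int}
}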

Need for ?

In the current proposal, we rely on ? to allow specifying a non-concrete struct for a field. Alternatively, this could be done with a builtin, like must:

#P: {
    numconcrete(<=1, a, b)
    a: must(a&#Struct1)
    b: must(b&#Struct2)
}

or some dedicated builtin. The must builtin (as proposed in #575) would essentially remain incomplete as long as it is not unified with a concrete value.

The use of ?, though, seems more elegant.

numvalid

Considering the need for ?, it may be useful to also have a builtin numvalid which counts the number of valid references (values that are not an error). This has the advantage that it will enforce the consistent use of ? for those fields that are part of a oneOf, making them stand out a bit more. In the case of Protobufs, this works well, and may more accurately reflect intent.

We considered using numvalid instead of numexist, but it does not cover all cases correctly. A separate builtin proposal should lay out all the builtins and their relevant consistency.

Naming

numexist seems to accurately describe what the builtin does. It may not be the most intuitive naming, though. numrequired seems on the surface to be a likely candidate, but it would be a misnomer, as this is specifically not what that means. (In fact, the term required field is also not quite correct, even though it conveys its predominant use case.)

A possible name, albeit less descriptive, could be numof. It may be confusing, however, as it alludes to the JSON Schema constructs anyOf and oneOf, which are different in nature. In that sense numof(count, ...expr) would seem more to indicate a disjunction (like the or builtin), where the user can indicate the number of disjuncts that should unify.

Implementation

numconcrete is a so-called non-monotonic construct: a failing condition may be negated by making a configuration more specific. CUE handles these constructs in a separate per-node post-evaluation validation phase. It must be careful to not return a permanent error if a resolution of the constraint is still possible. In this regard, it will be similar to the implementation of struct.MinFields and struct.MaxFields.

Other than that, the implementation is straightforward: evaluate the expressions, count the number of concrete values (non-recursively), counting incomplete errors as 0. The latter is necessary to allow non-required fields of structs and lists to fill this pattern.

vet rules

It may be surprising to the user that

numexist(>=1, a, b)
a: {}
b: {}

always passes ({} is always considered concrete). A vet rule should detect use cases where an argument is always concrete, and suggest the use of ? or possibly numexists.

Example: protobuf

To annotate protobuf oneOfs, under this proposal one could write

#X: {
    *{} | {a!: int} | {b!: int}
}

Note that this doesn’t change all that much from the current model, and the current model would still work. The main difference is that it gives the CUE evaluator a stronger hint for early elimination of alternatives.

This notation also doesn’t address the embedding issue: it is still possible to add a field that before was mutually exclusive, even overriding its original type. For instance:

#Y: {
    #X
    a: string
}

would evaluate to

#Y: *{a: string} | {a: string, b!: int}

Arguably, this comes with the territory of using embedding, which gives the power of extension, but disables checking: it is already the responsibility of the user to embed with care. Also, one can imagine having a vet check to guard against such likely mistaken use.

That said, the numexist builtin would allow writing the above proto definition as:

#X: {
    numexist(<=1, a, b)
    a?: int
    b?: int
}

The use of optional fields is not strictly necessary for these int fields, but it is needed to cover the general case, for instance if a and b were fields with a struct value. One advantage is that this sets these fields visually apart from fields that are not subject to concreteness counts.

Note how this closely resembles the “structural form” as used for OpenAPI for CRDs.

The resulting type cannot be extended to redefine a. Also, the definition gets rid of disjunctions, and defines the intent of the constraint more clearly. This, in turn, can help the CUE evaluator be more performant.

Evaluation modes

We now briefly discuss the phases of CUE evaluation, to show how they will interplay with optional and non-concrete fields.

Manifestation

We propose that in the manifestation phase disjuncts are disambiguated based on concrete values only. It will be important to not leak this final step into earlier phases of evaluation, such as default selection. Doing so may, for instance, cause a disjunct with arbitrary optional fields to be used for closedness checks.

Consider this example:

a: { b: {foo?: >=1} } | { b: { foo?: <1 } } 

Purely on the basis of concrete values, these two are identical. However, simply picking the first or second when resolving a.b would give different results for a.b & {foo: 1}.

Doing disambiguation early, however, has quite considerable performance benefits. An implementation can work around this by clearly marking a result as ambiguous. For instance, deduped elements can carry a count of the number of values that were collapsed into them.

Example: Disambiguating disjuncts.

With the new semantics of manifestation, the current example from issue #353 resolves as expected: consider

e: #Example

e: a: "hello"

#Example: {
	a: string
	{ value?: string } | { externalValue?: string }
}

In this case both disjuncts in the resulting disjunction will have the same concrete values. This will even be the case for non-optional fields if we allow non-concrete fields to be ignored.

Example: Schema simplification

A typical Kubernetes definition now looks like

#Object: {
    kind: string
    foo?: int
    bar?: int
} 

littered with question marks.

Under this proposal, the above can be written as

#Object: {
    kind!: string
    foo: int
    bar: int
} 

This eliminates the majority of uses for ? and marks required fields more explicitly.

Printing modes

  • cue
  • cue export
  • cue def

cue printing

The majority of use cases for cue seem to involve cue export or cue eval along similar lines.

We propose a “default command” cue <selection> which uses manifestation with the following specification:

  • Non-concrete values are dropped by default. Disjunctions are disambiguated based on their value after removal.
  • Lists and structs that were only defined in definitions and that have no elements or fields with concrete values, recursively, are dropped from the output.

Omitting non-concrete values from the output avoids flooding it with non-concrete fields merged in from definitions. For similar reasons CUE omits optional fields (those with a ?) from the output today. So the choice to omit non-concrete values is a logical consequence of allowing users to omit ? in the majority of cases in schemas.
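An illustrative sketch of the proposed behavior (hypothetical input and output):

// input
x: {
    kind:     "Deployment"
    replicas: int // non-concrete
}

// "cue x" would print, with the non-concrete field dropped:
//
//     x: {kind: "Deployment"}
//
// whereas "cue x -c" would fail because x.replicas is not concrete.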

Ignoring non-concrete values when exporting is a departure from how cue export works and how constraints are generally enforced in CUE. To provide backwards compatibility, users could use the -c flag to

  • fail when fields are not concrete for export
  • print non-concrete values unconditionally in CUE mode.

Printing modes:

  • -D: also print definitions. (TODO: evaluated in schema mode or manifestation mode)
  • -H: show hidden fields of the current package
    • TODO also show hidden fields of imported packages? Probably not. We could do so if we supported _foo#bar-style package classification. This may be a debug option to reflect it is not valid CUE as of this moment.
  • -A: show attributes
  • -O: show optional fields (will be rare because we anticipate that ? might become obsolete)
  • -C: print comments
  • -i: ignore errors and show in context of an evaluation.
  • -a: show non-concrete values (essentially showing what other values can be set).
  • --debug=: useful debugging information in the form of comments below each field and value
    • original expressions/ conjuncts
    • line information
    • default values
    • dependencies
    • hidden fields from other packages
    • etc.

The default printing mode is CUE. Output formats can be chosen with the flag --out.

cue export

Much of the current functionality of cue export is reproduced in the cue command. We propose to repurpose the export command, adding the functionality described below.

Backwards compatibility

cue export would differ in one major way: non-concrete fields would be omitted from export; only those explicitly marked as required would result in an error when not concrete.

The old behavior can be obtained by using the -c flag, just as one would have to use it today for cue eval and cue vet, making behavior between all of these more consistent.

File output

cue export would otherwise be repurposed to be the inverse of cue import. It would be like command-less cue, but would interpret @export attributes to generate specific files.

More specifically, any struct may contain one or more export attributes that evaluate to a CUE “filetypes” specifier to direct export to write a file of a certain type.

Consider the following CUE file.

a: 2 + 3
baz: {
    @export("baz.json")
    b: a
}
bar: {
    @export("jsonschema:/foo/bar/bar.json")
    string
}

This would instruct cue export to generate two files.

By default the files will be exported in the txtar format as follows:

// File
import baz ":/foo/bar/baz.json"
import bar "jsonschema:/foo/bar/bar.json"

a: 5
"baz": { baz, @export("baz.json") }
"bar": { bar, @export("jsonschema:bar.yaml") }
-- baz.json --
b: 5
-- bar.yaml --
type: string

Note that bar.yaml represents a JSON schema here.

The comment section of the txtar output (above all files) would describe how to reconstruct the original file from the generated files. Note that this example utilizes several currently unsupported features, like JSON imports and JSON Schema output.

Options:

  • -z: write the output to a ZIP archive
  • -u/--update: actually generate the files

cue def

The define command cue def will remain the main way to simplify CUE schemas without resolving them to concrete values.

It is allowed to simplify disjunctions (retaining semantics), but this may be done on a best-effort basis. Default values are retained.

Special features to be addressed in a different design doc could include:

  • Make self-contained unification versus retaining imports. (-S)
  • Do potentially expensive simplifications.

Query extension

One big motivation for this proposal is to close some of the gaps that remain for the query proposal (see #165). One such gap was how to define queries and whether such queries should only return concrete matches or everything.

It is not the goal of this proposal to fully resolve these remaining gaps, but at least to show in detail how the main proposal interacts with the query proposal and solves some remaining puzzles.

Selectors

CUE is different from other query languages in that a user can query concrete and non-concrete values. At the same time, we would like to avoid confusion between these two modes of operation. In particular, we need to define what to expect for selecting values in projections.

CUE currently has one form of selection: a.foo. To understand the various modes of selection in projections, we also propose a variant of this: a.foo?. For regular selection they are defined as follows:

  1. a.foo: as it is today:
     • the value for foo if it exists in a,
     • an incomplete error if it does not exist but is allowed and could be added later,
     • a fatal error if foo is never allowed in a.
  2. a.foo?: like a.foo, but instead of an incomplete error it would return the constraints for foo if foo were defined as _.

So a.foo?, where a is a struct, is equivalent to (a & {foo: _}).foo, and b.1?, where b is a list (allowing integer indexes for lists in selection), is equivalent to (b & [_, _])[1]. The foo? variant works around a common issue reported by users.
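To illustrate the difference between the two selectors, consider this sketch (assuming the proposed ? selector semantics):

```cue
a: {
    bar: int
}

x: a.foo   // incomplete error: foo does not exist in a, but could be added
y: a.foo?  // _ (top): the constraint foo would have if defined as foo: _

b: close({bar: int})
z: b.foo   // fatal error: foo can never exist in b
```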

Now pulling in the query proposal (#165), let’s consider the equivalent for projections, that is, how these selectors behave for a “stream” of values that is the result of a query. Firstly, we expect that typical query usage will either center on querying concrete data, or on API definitions that have not been manifested.

To facilitate this view and translating the semantics of selectors to that of projections, we assume that for the purpose of projections a non-concrete value (like <10) is treated as an “incomplete” error.

Given this definition, we can then define:

  1. a.*.foo: value is dropped for incomplete errors and if the value is non-concrete (does not apply recursively)
  2. a.*.foo?: also allow non-concrete or optional (match pattern constraints)

Forms 1 and 2 will fail if an illegal value is selected (fatal error).

The semantics of treating non-concrete values as an incomplete error when querying was partly chosen to be consistent with the proposed default mode for cue commands to silently ignore non-concrete values (unless the -c option is used), making behavior consistent and predictable across the spectrum.

Note that there is precedent within CUE for expecting values to be concrete. For instance, the operands of most binary expressions (all except & and |) will result in an incomplete error when not concrete. The proposed semantics for queries are identical.
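For comparison, this is how binary expressions already behave today:

```cue
x: int
y: x + 1   // incomplete error: + requires concrete operands
z: x & >0  // fine: & (and |) accept non-concrete operands
```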

Querying with subsumption (instance-of relation)

The ! notation solves an issue with allowing a value to be used as subsumption in query filters. Consider the following:

a: {
    foo: { name: string }
    bar: { name: "bar" }
}

Using a subsumption filter {name: string} would also match foo, as it is, strictly speaking, subsumed. Using !, we can work around this:

query: a.[:{name!: string}]

will select only bar.

We could require that a field specified in a query must be concrete. That is a bit of an odd rule, though. The ! notation seems a more natural solution.

Subsumption variants

Subsumption in the general case is an expensive operation. Using the definition of the different evaluation modes, however, we can distinguish two different kinds of subsumption:

  1. subsumption patterns without pattern constraints (such as in the above example)
  2. subsumption patterns with pattern constraints.

Note that closed structs are defined in terms of pattern constraints, so any closed struct classifies as 2.

Patterns of type 1 would be executed as actual subsumption.

For patterns of type 2, however, we would require that the queried values have been explicitly unified with this value. For instance, the query

query: a.[:v1.#Service]

would search for any value in a that was unified with v1.#Service. So a value in a that has all the same fields as a #Service would still not match unless it was unified with this explicit definition. In effect, this introduces a notion of explicit typing, rather than relying solely on isomorphic equivalence.

Such selection is easy to implement efficiently, and may be a good compromise.
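A sketch of this explicit-typing behavior (the import path here is hypothetical):

```cue
import "example.com/api/v1" // hypothetical package defining #Service

a: {
    s: v1.#Service & {kind: "Service"} // explicitly unified with the definition
    t: {kind: "Service"}               // same shape, but never unified with it
}

query: a.[:v1.#Service] // matches only a.s
```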

Transition

Although this is a big change to the language, we foresee a relatively smooth transition. The meaning of ? would largely remain unaltered.

Phase 0

Introduce the new disambiguation semantics. This should be done before v0.3. Although somewhat different, v0.2 has similar semantics, and introducing this before a final release will allow for a smoother transition.

Phase 1

In phase one we would introduce the ! annotation and the numconcrete builtin to work as proposed.

Phase 2

Add an experiment environment variable, like CUE030EXPREQUIRED, to enable the new semantics of eliding non-concrete fields. In this mode, the -c flag would mimic the old behavior.

A cue fix flag allows users to rewrite their CUE files to the new semantics.

If the flag does not allow for a fine-grained enough transition, we could consider defining a transitional field attribute to define the interpretation of such a field on a per-field level.

Phase 3

Decide on whether to proceed with step 4.

Phase 4

The biggest change will be relaxing the rules for non-concrete fields and moving away from excessive use of ?. This would be done as a minor pre-1.0.0 change (e.g. v0.4.0 or v0.5.0).

The biggest issue is for installations that rely on fields without ? being required. It will be good to ask users whether the use of cue fix and/or -c is sufficient or whether an API feature is needed as well.

Adding a feature to cue fix to “promote” fields on a one-off basis would be straightforward. Generated configurations could just be regenerated.

Note that default values and other concrete values specified in definitions would still be printed. It is only the non-concrete values that are omitted.

The removal of ? may also have performance implications, as CUE processes them differently internally. The implementation can be adjusted however to overcome performance issues. Experience with v0.2, however, which processed optional fields similarly to regular fields, showed that the performance impact of this is relatively small. Structure sharing can further mitigate this issue, and we should probably ensure this is implemented before the transition.

Alternatives considered

Alternative disjunction simplifications

We also considered eliminating non-concrete values from disjunctions. For instance, at manifestation time (only!):

a: int | 1

could in such a case be simplified to a: 1. This would obviate the need for the default marker in this case.

The overall intuition here is that this would be weird, though.

By extension, we also chose not to simplify

{ a: 1, foo: int } | { a: 1, bar: int }

To achieve this effect without using defaults, users would have to write

{ a: 1, foo?: int } | { a: 1, bar?: int }

A variant that would allow such simplification is open for consideration, though, especially if it can help fully deprecating the use of ?.

Other definitions of foo!: int

We have considered the following meanings of foo!: int

foo!: int as an optional field

foo!: bar is an optional field that must unify with any concrete field to be valid.

It would sort of still work if people diligently used ? for fields in definitions, but this would defeat one of the main purposes of introducing ! in the first place.

But logically the foo: int > foo!: int relation makes sense, as the ! constrains the regular field. This alone presented too much of a contradiction to work well.

require to be unified with a non-definition field with concrete value

The main advantage of this approach is that it would allow adding the required constraint to a schema that already defines that field as a concrete value.

The main drawback of this approach is that it is not possible to create a derivative definition of a schema that defines a required field that fills in the required field for the user.

For instance, suppose we have a definition

#Service: { kind!: string }

And we create the derivative:

#MyService: #Service & {
    kind: "Service" // still not set
}

then kind would still not be set.

It seems that this should be possible, though. Specifying a concrete field in this case is akin to setting a default value. In general it is considered good CUE style to define defaults separately from an API, so this would be consistent with that definition.

Also, although the CUE evaluator can track the origin of fields quite easily, there is no representation for a “field that has already been unified with a concrete field”.

require to be unified with a non-definition field and a concrete value

The distinction from the former is that the concrete value may originate from a definition. This definition, however, is not associative.

Required indication on the right-hand side

We considered writing foo: int! or foo: required(int) instead of foo!: int, making requiredness a property of the value instead of the field.

This didn't sit well with the requirement of needing to unify with a concrete value from an optional field: {foo?: 1} & {foo: int!} would be hard to represent correctly in CUE. A goal is to allow representing evaluation results in CUE itself, but we did not see a way to accomplish that here.

List type

One motivation for this proposal was the ability to define a non-concrete list. For this we considered introducing Go-style types:

a: []T
b: [<10]T
However, indicating the size can already be done with syntax introduced in the query notation:

a: [<10]: int

which would just be a generalization of CUE.

Also, this would still not solve the same problem for non-concrete structs or scalar values.

So overall, this would be a heavy-weight solution for little benefit.

Querying using unification

The ! operator would also be useful for querying values using unification. Normally, a query like a.[: {name: 2}] would produce quite unexpected results: it would unify with any element that either has name set to 2 or for which name is undefined, possibly setting that value to 2.

To avoid this, users could write a.[: {name!: 2}].

We considered this to be too cumbersome and surprising to be a viable solution, though.

Alternate definitions for querying by subsumption

There are really many variants possible. We mention a few. All of these could be considered.

Definitions as types

  1. If the subsumption is a definition, the subsumed instances must have unified with this value.
  2. For non-definitions, we do a full subsumption, but put restrictions on what values are allowed.

This would allow queries like

a.[:{a: [string]: name!: <"Q"}]

but perhaps not more funky queries using pattern constraints.

Definitions as types, subsume concrete values only

  1. If the subsumption is a definition, the subsumed instances must have unified with this value.
  2. For non-definitions, we only match the manifestation (concrete values) of the value.

This would allow full pattern matching.

Always subsume manifested values only

This would allow unrestricted unification. This seems limited, though, as people may want to query APIs with certain properties.

The syntax a.*.[:{}]? could be used to query the non-manifested value. Similar restrictions may still have to be applied to subsumption in this mode, though they would typically be irrelevant to the casual user.

Alternative semantics for projection selectors

We’ve considered a more direct correspondence between selectors for projections and regular selection. This results in the following definitions:

  1. a.*.foo: fatal error if not allowed, dropped for incomplete errors, otherwise the value is returned (including non-concrete values).
  2. a.*.foo?: also allow non-concrete or optional (match pattern constraints).
  3. a.*.foo!: like a.*.foo, but additionally filters for concreteness.

For the common use case of requiring concrete data, this would mean that users would have to almost always use the third form. This seems undesirable and will likely result in too many gotchas. In the end, we were able to get the desired behavior for selectors in projections by only considering a non-concrete value to be an “incomplete” error. This seems to be a reasonable solution. Consider also that interpreting a non-concrete value as incomplete already happens at various points in CUE evaluation.
