text/0000-disjoins.md

- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

This is a resurrection/expansion of [an older RFC](https://github.com/rust-lang/rust/issues/8277).

Rust supports both kinds of composite algebraic data type: product types
(structs) and sum types (enums). Rust has anonymous structs (tuples) but no
anonymous enums, leaving a gap in its type system.

|              | named     | anonymous |
|--------------|-----------|-----------|
| products     | structs   | tuples    |
| sums         | enums     | ??        |
| exponentials | functions | closures  |

This RFC proposes to add anonymous enums to Rust and suggests naming them
disjoins (as in disjoint unions).

```rust
let foo: (char | i32 | i32) = (!|!|123);
match foo {
(c|!|!) => println!("char in position zero: {:?}", c),
(!|i|!) => println!("i32 in position one: {}", i),
(!|!|i) => println!("i32 in position two: {}", i),
};

let foo: (char|) = ('a'|);

match foo {
    (c|) => println!("char in position zero: {:?}", c),
};

let foo: ! = panic!("no value");
match foo {
};
```

> **Review thread on `let foo: (char|) = ('a'|);`**
>
> **Contributor:** Is there any use for a single-element disjoin? Single-element tuples are already weird.
>
> **Author:** It would be weirder without them (both single-element tuples and single-element disjoins, that is).

> **Review thread on the empty `match foo { };`**
>
> **Contributor:** This'll cause the unreachable lint to warn. I'm not sure what this means for generics:
>
> ```rust
> fn test<T>(f: &dyn Fn() -> T) {
>     let x = f();
>     drop(x); // unreachable if T == !, but no useful operations are possible on T anyway?
> }
> ```
>
> **Contributor:** If this is an issue then it's also an issue with `enum Never { }`, which you can already write. (I don't believe there's any issue.)
>
> **Contributor:** The match is reachable for `enum Never { }`, it's just a no-op. But for a diverging function it makes no sense to match on its result, and if it did, you would never reach the match.
>
> **Author:**
>
> > the match is reachable for `enum Never { }`
>
> No it's not. `enum Never {}` can never exist, so a function that returns it is a diverging function.
>
> > But for a diverging function it makes no sense to match on its result
>
> Yes it does. A diverging function doesn't return a value, so its return type is the type of no values, which can be matched with an empty match statement. The following is currently valid Rust code:
>
> ```rust
> enum Never {}
>
> fn foo() -> Never {
>     panic!("oh no");
> }
>
> fn main() {
>     let x = foo();
>     let y: String = match x {};
> }
> ```

The syntax is chosen to look like tuples, but with pipes instead of commas to
signify OR instead of AND.
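For comparison, the summary example can be approximated in current Rust with a named enum. In this sketch the names `CharOrI32OrI32`, `P0`, `P1`, `P2` and `describe` are hypothetical; the point is that variants are positional, so the two `i32` variants stay distinct just as `(!|i|!)` and `(!|!|i)` do.

```rust
// Hypothetical named stand-in for the anonymous `(char | i32 | i32)`.
// Variants are positional, so the two `i32` variants stay distinct.
#[derive(Debug, PartialEq)]
enum CharOrI32OrI32 {
    P0(char),
    P1(i32),
    P2(i32),
}

// Mirrors the three-armed `match` from the summary example.
fn describe(foo: CharOrI32OrI32) -> String {
    match foo {
        CharOrI32OrI32::P0(c) => format!("char in position zero: {:?}", c),
        CharOrI32OrI32::P1(i) => format!("i32 in position one: {}", i),
        CharOrI32OrI32::P2(i) => format!("i32 in position two: {}", i),
    }
}
```

Here `describe(CharOrI32OrI32::P2(123))` plays the role of matching on `(!|!|123)`. The disjoin syntax would make the enum declaration unnecessary.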

# Motivation

Disjoins fill an analogous role to tuples. They're useful where the programmer needs a single-use type whose usage will be localised to a small area of code.

For example, consider this code:

```rust
fn some_function() -> Result<i32, io::Error> {
    ...

    fn inner_helper_function() -> Result<char, (io::Error | Utf8Error)> {
        ...
    }

    match inner_helper_function() {
        Ok(c) => ...,
        Err(e) => match e {
            (io_err|!) => return Err(io_err),
            (!|u8_err) => ...,
        },
    };
    ...
}
```

Here, defining a type `enum IoOrUtf8Error { ... }` would have been possible,
but would have been overkill because it would only have been used in one place.
The type would also have had to be defined somewhere outside of
`some_function`, which would have spread out the relevant code and made it less
readable.
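For comparison, a sketch of what the example needs in current Rust with a named `IoOrUtf8Error`. The `bytes` parameter, the empty-input I/O error and the "map invalid UTF-8 to -1" recovery are hypothetical fillers for the elided `...` parts; only the shape of the error handling follows the example.

```rust
use std::io;
use std::str::Utf8Error;

// Hypothetical named error type that `(io::Error | Utf8Error)` would replace.
#[derive(Debug)]
enum IoOrUtf8Error {
    Io(io::Error),
    Utf8(Utf8Error),
}

// Hypothetical helper: decodes the first char of `bytes`, or reports an
// I/O-style error when the input is empty.
fn inner_helper_function(bytes: &[u8]) -> Result<char, IoOrUtf8Error> {
    if bytes.is_empty() {
        return Err(IoOrUtf8Error::Io(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "no input",
        )));
    }
    let s = std::str::from_utf8(bytes).map_err(IoOrUtf8Error::Utf8)?;
    Ok(s.chars().next().expect("non-empty str has a first char"))
}

fn some_function(bytes: &[u8]) -> Result<i32, io::Error> {
    match inner_helper_function(bytes) {
        Ok(c) => Ok(c as i32),
        Err(e) => match e {
            // Forward I/O errors unchanged, as in the disjoin version.
            IoOrUtf8Error::Io(io_err) => Err(io_err),
            // Hypothetical recovery for invalid UTF-8.
            IoOrUtf8Error::Utf8(_u8_err) => Ok(-1),
        },
    }
}
```

The named enum and its variant names are the extra ceremony that the disjoin `(io::Error | Utf8Error)` would remove.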

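Disjoins are also useful in situations where unnamed variants are the
natural choice. For example, with disjoins it would be possible to define a
one-hole-context type operator, which describes a value with one `T`-typed
component removed (the hole is marked by `()`, and a position that cannot
hold a `T` contributes `!`):

    ohc!(T, (i32, T, T)) ==> ((!, T, T) | (i32, (), T) | (i32, T, ()))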
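A hedged sketch of `ohc!(T, (i32, T, T))` written out by hand as a named enum, with hypothetical names `Ohc`, `plug` and `holes`, and `enum Void {}` standing in for `!`. Position zero holds an `i32` rather than a `T`, so it has no `T`-shaped hole and its variant can never be constructed; `plug` fills a hole back in and `holes` lists every way to punch one out.

```rust
// Hypothetical stand-in for `!`.
enum Void {}

// Hand-written equivalent of `ohc!(T, (i32, T, T))`: one variant per
// position, with `()` marking where the hole is.
#[allow(dead_code)]
enum Ohc<T> {
    Hole0(Void, T, T), // uninhabited: an `i32` has no `T` to remove
    Hole1(i32, (), T),
    Hole2(i32, T, ()),
}

// Fill the hole with `t`, recovering the original tuple.
fn plug<T>(ctx: Ohc<T>, t: T) -> (i32, T, T) {
    match ctx {
        Ohc::Hole0(v, _, _) => match v {},
        Ohc::Hole1(a, (), c) => (a, t, c),
        Ohc::Hole2(a, b, ()) => (a, b, t),
    }
}

// Every way to remove one `T` from the tuple, paired with the removed value.
fn holes<T: Clone>((a, b, c): (i32, T, T)) -> Vec<(Ohc<T>, T)> {
    vec![
        (Ohc::Hole1(a, (), c.clone()), b.clone()),
        (Ohc::Hole2(a, b, ()), c),
    ]
}
```

Plugging each context with the value that was removed gives back the original tuple. With disjoins, `Ohc` could be the anonymous type on the right-hand side instead of a declared enum.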

Like with tuples, disjoins should only be used where the meaning of the
variants will be obvious. For things like argument/return types on public
methods, named enums should be used instead.

Another motivation for disjoins is that they require a syntax for the empty
disjoin type. This makes them the perfect vehicle for bringing `!` into the
type system.

## The `!` type

I've proselytised before that `!` should be treated as a type but have been met
with skepticism. So I'll start with what I think is an accurate analogy. In
C/C++, `void` is in essence a type like any other. It is the trivial type with
one value, equivalent to a struct type with no members and can be trivially
cast from any other type (by simply throwing away the value). However it can't
be used in all the normal positions where a type can be used. This breaks
generic code (eg. `T foo(); T val = foo()` where `T == void`) and forces one to
use workarounds such as defining `struct Void {}` and wrapping `void`-returning
functions:

```cpp
struct Void {};

void foo();

Void foo_wrap() {
    foo();
    return {};
}
```

If in the 1960s, when PL theory was young, someone had suggested to Dennis
Ritchie to allow `void` to be used as a function argument, as a struct member and
everywhere else a type could be used, he may well have said something like
"That doesn't make sense. `void` isn't a type, it's just a special syntax for
declaring functions with no return value, why would you want a `void` function
argument anyway?". So instead, the `void` type gets treated as a second class
citizen, adding extra hassle and complexity to the language for no benefit.

Fast-forward fifty years...

Rust, building on decades of experience, decides to fix C's shortsightedness
and bring `void` into the type system in the form of the empty tuple `()`.
Rust also introduces a new composite data kind, dual to the notion of structs,
in the form of enums. These are a somewhat innovative feature: most major
languages don't have them (eg. C), and those that do often don't permit empty
enums like Rust does (eg. Haskell 98). However, Rust also introduces a syntax for
declaring functions that never return: `fn() -> !`. Here, `!` is in essence a
type like any other. It is the trivial type with no values, equivalent to an
enum type with no variants, and can be trivially cast to any other type
(because it has no values). However it can't be used in all the normal
positions where a type can be used. This breaks generic code (eg. `fn foo() ->
T; let val: T = foo()` where `T == !`) and forces one to use workarounds such
as defining `enum Void {}` and wrapping `!`-returning functions.

```rust
enum Void {}
fn foo() -> ! { panic!("never returns") }

fn foo_wrap() -> Void {
    foo()
}
```

However when it's suggested to allow `!` to be used as a function argument, as
a struct member and everywhere else a type can be used people often respond
with something like "That doesn't make sense. `!` isn't a type, it's a special
syntax for declaring functions that don't return, why would you want a `!`
function argument anyway?". So instead, the `!` type gets treated as a second
class citizen, adding extra hassle and complexity to the language for no
benefit.

`!` has a meaning in any situation that any other type does. A `!` function
argument makes a function uncallable, a `Vec<!>` is a vector that can never
contain an element, a `!` enum variant makes the variant guaranteed never to
occur and so forth. It might seem pointless to use a `!` function argument or a
`Vec<!>` (just as it would be pointless to use a `()` function argument or a
`Vec<()>`), but that's no reason to disallow it. And generic code sometimes
requires it. There are at least two cases of this:

* Functions of the type `fn() -> !` cannot satisfy the `Fn() -> T` trait. As
  such, diverging functions can't be passed to generic higher-order functions.
* There is no standard way to express types like `Result<T, !>`, causing
  library authors to write their own, mutually incompatible versions of
  `enum Void {}`. (`Result<T, !>` is useful when implementing a trait method
  of type `Result<T, E>` and you know the implementation will never return
  `Err`.)
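A minimal sketch of the library-level workaround described above, assuming a hand-rolled `enum Void {}` in place of `!` (Rust later added `std::convert::Infallible`, an empty enum, to the standard library for this role). The trait `FromInput`, the type `Verbatim` and the helpers `absurd` and `from_input_infallibly` are hypothetical names chosen for illustration.

```rust
// Hypothetical library-defined stand-in for `!`.
enum Void {}

// A `Void` argument makes a function uncallable, so its body can discharge
// the argument with an empty `match`.
fn absurd<T>(v: Void) -> T {
    match v {}
}

// Hypothetical trait whose method is fallible in general.
trait FromInput: Sized {
    type Error;
    fn from_input(input: &str) -> Result<Self, Self::Error>;
}

// Hypothetical implementation that can never fail, so it picks `Void`
// as its error type: in effect `Result<Self, !>`.
struct Verbatim(String);

impl FromInput for Verbatim {
    type Error = Void;
    fn from_input(input: &str) -> Result<Self, Void> {
        Ok(Verbatim(input.to_string()))
    }
}

// Callers can extract the value with no panic path: the `Err` arm is
// uninhabited and `absurd` discharges it.
fn from_input_infallibly<P: FromInput<Error = Void>>(input: &str) -> P {
    match P::from_input(input) {
        Ok(p) => p,
        Err(v) => absurd(v),
    }
}
```

The same `absurd` function also shows why a `Vec<Void>` is harmless: mapping it over an always-empty vector type-checks for any output type, and the closure body can never run.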

Promoting `!` to a type and allowing it to unify with all other types would
give it the same behavior as the current `!` syntax, except generalized,
allowing more kinds of correct code to exist.

It's worth making clear the difference between `struct Void {}` (ie. `void`, ie.
`()`) and `enum Void {}` (ie. `!`). We can think of `struct` and `enum` as
being dual type operators where `struct {A, B, C}` is `A AND B AND C` and `enum
{A, B, C}` is `A OR B OR C`. `()` and `!` are then the identities of these
operators in the sense that adding a `()` member to a struct does not change
the overall structure of the type (because the member is always instantiated
with `()` and thus does nothing) and adding a `!` variant to an enum does not
change the overall structure of the type (because the variant can never be
instantiated and thus does nothing). The reason `()` and `!` have historically
been neglected as types is that they're so trivial that it doesn't occur to
people that they even *are* types. `()` only has one value and so it's not
interesting. Something that can only be in one state cannot carry information,
and a type that doesn't carry any information has no obvious use in a
language. `!` has no values and so it's not interesting. A type with no values
can never exist, and a type that can never exist has no obvious use in a
language.
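The identity argument can be checked in current Rust with round-trip conversions. In this sketch `WithUnit`, `WithVoid` and the conversion functions are hypothetical, and `enum Void {}` stands in for `!`: each wrapper converts to and from a plain `i32` without losing or gaining anything.

```rust
// Hypothetical stand-in for `!`.
enum Void {}

// Adding a `()` member to a struct adds no information...
struct WithUnit {
    value: i32,
    unit: (),
}

// ...and adding a `Void` variant to an enum adds no possibilities.
#[allow(dead_code)]
enum WithVoid {
    Value(i32),
    Never(Void),
}

fn to_with_unit(value: i32) -> WithUnit {
    WithUnit { value, unit: () }
}

fn from_with_unit(s: WithUnit) -> i32 {
    // The `()` field always holds the single value `()`.
    let WithUnit { value, unit: () } = s;
    value
}

fn to_with_void(value: i32) -> WithVoid {
    WithVoid::Value(value)
}

fn from_with_void(e: WithVoid) -> i32 {
    match e {
        WithVoid::Value(v) => v,
        // This arm can never run, so an empty match discharges it.
        WithVoid::Never(v) => match v {},
    }
}
```

Both round trips are the identity on `i32`, which is the sense in which `()` and `!` are the identities of `struct` and `enum`.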

However, they are types, they have their use cases, and an algebraic type system
is not complete without both of them.

# Detailed design

Disjoins have the exact same semantics as named enums and behave the same in
terms of code generation, representation etc. Code that handles the empty
disjoin type should be marked unreachable and eliminated where possible.
Functions that return it should be marked with LLVM's `noreturn` attribute.
The above should also apply to empty named enums.

The typechecker should allow `!` to unify with all other types.
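The typechecker change builds on behavior Rust already has for diverging expressions: `panic!`, `return` and calls to `-> !` functions already stand in wherever another type is expected. A small sketch of that existing coercion, with hypothetical helpers `parse_or_die`, `exit_with` and `first_positive`; the proposal would extend the same treatment to `!` used as an ordinary type.

```rust
// `panic!` has type `!`, which coerces to the `i32` this match must produce.
fn parse_or_die(s: &str) -> i32 {
    match s.parse::<i32>() {
        Ok(n) => n,
        Err(_) => panic!("not a number: {}", s),
    }
}

// A diverging function, written with today's `-> !` syntax.
fn exit_with(msg: &str) -> ! {
    panic!("{}", msg)
}

fn first_positive(xs: &[i32]) -> i32 {
    for &x in xs {
        if x > 0 {
            return x;
        }
    }
    // The call has type `!`, so it can be the tail expression of an `i32` function.
    exit_with("no positive element")
}
```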

The existing compiler support for diverging functions (eg. `FnDiverging`)
should be removed/replaced.

# Drawbacks

Adds its own complexity to the type system and compiler.

# Alternatives

* Do Nothing.
* Some have suggested non-positional disjoint union types (ie. where `(T | T)` is
  isomorphic to `T`). However, these aren't enums, aren't algebraic, and would add
  enormous complexity to Rust's type system compared to positional disjoint
  unions.
* Change the syntax? This might be necessary if the suggested syntax turns out
  to be ambiguous (see: Unresolved questions). The `!` in disjoin expressions
  and patterns could be changed to another character, although `!` already
  has connotations of "no value".
* Just promote `!` to a type. Although `!` fits naturally into a scheme
  of anonymous enums, this would still be a worthwhile change if done
  independently. It would be a smaller change in the sense that it would
  involve removing restrictions on an existing feature rather than adding a
  whole new feature.
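To illustrate the positional versus non-positional distinction, a sketch using a hypothetical named enum `TOrT<T>` for the positional `(T | T)`. Because it remembers which position a value came from, it is isomorphic to `(bool, T)` rather than to `T`; a non-positional union would have to discard that extra bit.

```rust
// Hypothetical named stand-in for the positional disjoin `(T | T)`.
#[derive(Debug, PartialEq)]
enum TOrT<T> {
    First(T),
    Second(T),
}

// The position is information in its own right: one extra bit.
fn to_pair<T>(x: TOrT<T>) -> (bool, T) {
    match x {
        TOrT::First(t) => (false, t),
        TOrT::Second(t) => (true, t),
    }
}

fn from_pair<T>((second, t): (bool, T)) -> TOrT<T> {
    if second {
        TOrT::Second(t)
    } else {
        TOrT::First(t)
    }
}
```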

# Unresolved questions

Is the suggested syntax unambiguous? The `|` character is already used for
closures, bitwise OR and disjunctive match patterns. The `!` character is used
for the not-operator and negative trait bounds. I can't see any of these being
a problem, but I'm not sure.
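For reference, the existing stable uses of `|` and `!` that a disjoin syntax would share tokens with, collected in a hypothetical helper `existing_uses`. This only shows the overlapping tokens; it does not settle whether the grammar would be ambiguous.

```rust
// Existing meanings of `|` and `!` that disjoin syntax would overlap with.
fn existing_uses() -> (i32, i32, bool, bool) {
    // `|...|` delimits closure parameters.
    let add_one = |x: i32| x + 1;
    // Binary `|` is bitwise OR on integers.
    let flags = 0b01 | 0b10;
    // In patterns, `|` separates alternatives.
    let small = match add_one(2) {
        1 | 2 | 3 => true,
        _ => false,
    };
    // Prefix `!` is the not-operator.
    let not_small = !small;
    (add_one(2), flags, small, not_small)
}
```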

Could `!` be treated as a subtype of all other types? Theoretically, yes, but
the implementation might be a headache. Allowing `!` to unify with all other
types would be simpler and good enough for most cases. And we don't treat `()`
as a supertype of all other types either.