- Feature Name:
eager_macro_expansion - Start Date: 2018-01-26
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)
Expose an API for procedural macros to opt in to eager expansion. This will:
- Allow procedural and declarative macros to handle unexpanded macro calls that are passed as inputs,
- Allow macros to access the results of macro calls that they construct themselves,
- Enable macros to be used where the grammar currently forbids it.
There are a few places where proc macros may encounter unexpanded macros in their input:
-
In declarative macros:
env!(concat!("PA", "TH")); // ^^^^^^^^^^^^^^^^^^^ // Currently, `std::env!` is a compiler-builtin macro because it often // needs to expand input like this, and 'normal' macros aren't able // to do so.
-
In procedural macros:
my_proc_macro!(concat!("hello", "world")); // ^^^^^^^^^^^^^^^^^^^^^^^^^ // This call isn't expanded before being passed to `my_proc_macro`, and // can't be since proc macros are passed opaque token streams by design.
-
In attribute macros:
#[my_attr_macro(x = a_macro_call!(...))] // ^^^^^^^^^^^^^^^^^^ // This call isn't expanded before being passed to `my_attr_macro`, and // can't be since attr macros are passed opaque token streams by design. struct X { my_field_definition_macro!(...) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Same with this one. }
In these situations, macros need to either re-emit the input macro invocation as part of their token output, or simply reject the input. If the proc macro needs to inspect the result of the macro call (for instance, to check or edit it, or to re-export a hygienic symbol defined in it), the author is currently unable to do so.
Giving proc macro authors the ability to handle these situations will allow proc macros to 'just work' in more contexts, and without surprising users who expect macro calls to interact well with more parts of the language.
As a side note, allowing macro calls in built-in attributes would solve a few outstanding issues (see rust-lang/rust#18849 for an example).
An older motivation to allow macro calls in attributes was to get
#[doc = include_str!("path/to/doc.txt")] working, in order to provide an
ergonomic way to keep documentation outside of Rust source files. This was
eventually emulated by the accepted RFC
1990, indicating that macros in
attributes could be used to solve problems at least important enough to go
through the RFC process.
Macros are currently not allowed in certain syntactic positions. Famously, they
aren't allowed in identifier position, which makes concat_idents! almost
useless. If macro authors have
access to eager expansion, they could eagerly expand concat_idents! and
interpolate the resulting token into their output.
We expose the following API in the proc_macro crate, to allow proc macro
authors to iteratively expand all macros within a token stream:
use proc_macro::{LexError, Diagnostic, TokenStream};
pub struct ExpansionBuilder { /* No public fields. */ }
pub enum ExpansionError {
LexError(LexError),
ParseError(Diagnostic),
MissingDefinition(Diagnostic),
/* What other errors do we want to report?
What failures can the user reasonably handle? */
}
impl ExpansionBuilder {
/// Creates a new macro expansion request to expand macro invocations that
/// occur in `tokens`.
///
/// Expansion results will be interpolated within the input stream before
/// being returned.
///
/// By default, macro invocations in `tokens` will have their expansion
/// results recursively expanded, until there are no invocations left to
/// expand. To change this, use the `set_max_depth` method on the returned
/// instance of `ExpansionBuilder`.
///
/// `tokens` should parse as valid Rust -- for instance, as an item or
/// expression.
pub fn from_tokens(tokens: TokenStream) -> Self;
/// Sends the expansion request to the compiler, then awaits the results of
/// expansion.
///
/// The returned future will be ready unless expansion requires expanding a
/// procedural macro or a macro that hasn't been defined yet but might be
/// after further expansion. In those cases, the returned future will be
/// woken once all the required expansions have completed.
pub async fn expand(self) -> Result<TokenStream, ExpansionError>;
/// Sets the maximum depth of expansion. For example, if the depth is 2,
/// then the result of expanding the following tokens:
/// ```rust
/// {
/// vec![vec![vec![17; 1]; 2]; 3];
/// concat!("a", "b");
/// }
/// ```
/// Will be the following tokens:
/// ```rust
/// {
/// std::vec::from_elem(std::vec::from_elem(vec![17; 1]. 2), 3);
/// "ab";
/// }
/// ```
///
/// Notice that, since the innermost invocation of `vec!` is "inside" two
/// other invocatoins of `vec!`, it is left unexpanded.
///
/// If `depth` is 0, all macro invocations in the input will be recursively
/// expanded to completion; this is the default behaviour for an instance
/// of `ExpansionBuilder` created with `ExpansionBuilder::from_tokens`.
pub fn set_max_depth(&mut self, depth: usize) -> &mut self;
}In addition, we allow proc macros to be async so that macro authors can more
easily make use of ExpansionBuilder::expand.
Here is an example showing how a proc macro can find out what the result of
concat!("hello ", "world!") is.
use proc_macro::quote;
let tokens = quote!{
concat!("hello ", "world!")
};
let expansion = ExpansionBuilder::from_tokens(tokens);
let result = expansion.expand().await.unwrap();
let expected_result = quote!("hello world!");
assert_eq!(result.into_string(), expected_result.into_string());Here is an example showing what we mean by "interpolating" expansion results into the input tokens.
let tokens = quote!{
let x = concat!("hello ", "world!");
};
let expansion = ExpansionBuilder::from_tokens(tokens);
let result = expansion.expand().await.unwrap();
// As we saw above, the invocation `concat!(...)` expands into the literal
// "hello world!". This literal gets interpolated into `tokens` at the same
// location as the expanded invocation.
let expected_result = quote!{
let x = "hello world!";
};
assert_eq!(result.into_string(), expected_result.into_string());Here is an example showing what we mean by "iteratively expanding" macros: if a macro expands into a token stream which in turn contains macro invocations, those invocations will also be expanded, and so on.
let tokens = quote!{
let x = vec![concat!("hello ", "world!"); 1]
};
let expansion = ExpansionBuilder::from_tokens(tokens);
let result = expansion.expand().await.unwrap();
// `vec![concat!(...); 1]` expands into `std::vec::from_elem(concat!(...), n)`.
// Instead of returning this result, the compiler continues expanding the input.
//
// As before, the results of these expansions are interpolated into the same
// location as their invocations.
let expected_result = quote!{
let x = std::vec::from_elem("hello world!", 1)
};
assert_eq!(result.into_string(), expected_result.into_string());We're going to show how we could write a procedural macro that could be used by declarative macros for eager expansion.
As an example, say we want to create eager_stringify!, an eager version of
stringify!. Remember that stringify! turns the tokens in its input into
strings and concatenates them:
assert_eq!(
stringify!(let x = concat!("hello ", "world!")),
r#"let x = concat ! ("hello ", "world!")")We want eager_stringify! to behave similarly, but to expand any macros it
sees in its input before concatenating the resulting tokens:
assert_eq!(
eager_stringify!(let x = concat!("hello ", "world!")),
r#"let x = "hello world!""#)As an aside, this means eager_stringify! needs to be able to parse its input.
We could write eager_stringify! as a fairly straightforward proc macro using
ExpansionBuilder. However, since decl macros are much quicker and easier to
write and use, it would be nice to have a reusable "utility" macro which we
could use to define eager_stringify!.
Let's call our utility macro expand!. The idea is that users of expand! will
invoke it with:
- The tokens they want to expand.
- An identifier, to refer to the expansion result.
- The token stream they want to insert the result into, which can use the identifier to determine where to insert the result.
For example, this invocation of expand! should reproduce the intended
behaviour of our earlier eager_stringify!(let x = ...) example:
expand! {
input = { let x = concat!("hello ", "world!"); },
name = foo,
output = { stringify!(#foo) }
};Let's assume we already have the following:
- A function
parse_input(TokenStream) -> (TokenStream, Ident, TokenStream)which parses the input toexpand!and extracts the right-hand sides ofinput,name, andoutput. - A function
interpolate(tokens: TokenStream, name: Ident, output: TokenStream) -> TokenStreamwhich looks for instances of the token sequence#$nameinsideoutputand replaces them withtokens, returning the result. For example, the token stream returned by:Should be the same token stream as:interpolate(quote!(a + b), foo, quote!([#foo, #bar]))
quote!([a + b, #bar])
Then we can implement expand! as a proc macro:
#[proc_macro]
pub async fn expand(input: TokenStream) -> TokenStream {
let (input, name, output) = parse_input(input);
let expansion = ExpansionBuilder::from_tokens(input);
let result = expansion.expand().await.unwrap();
return interpolate(result, name, output);
}Finally, we can implement eager_stringify! as a decl macro:
pub macro eager_stringify($($inputs:tt)*) {
expand! {
input = { $($inputs)* },
name = foo,
output = { stringify!(#foo) }
}
}The current implementation of procedural macros is as a form of inter-process communication: the compiler creates a new process that contains the proc macro logic, then sends a request (a remote procedure call, or RPC) to that process to return the result of expanding the proc macro with some input token stream.
This interaction works the other way as well: for example, if a proc macro wants
to access span information, it does so by sending a request to the compiler.
This RFC adds the ExpansionBuilder API as a way to construct a new kind of
request to the compiler -- a request to expand macro invocations in a
token stream.
We assume that for nested attribute macros the least surprising behaviour is
for them to be expanded "outside in". For example, if I have two eager
attribute macros my_eager_foo and my_eager_bar, then in this example
my_eager_foo would see the result of expanding my_eager_bar:
#[my_eager_foo]
#[my_eager_bar]
mod whatever { ... }The situation is less clear for outer attributes. Which macro should be expanded first in this case?
#[my_eager_foo]
mod whatever {
#![my_eager_bar]
...
}'Derive' attribute macros can define additional 'helper' attributes, used by the invoker to annotate derive macro input. When expanding invocations, the compiler must be careful not to try and expand these helper attributes as though they were actual invocations.
Here is an example to justify why this is best dealt with by the compiler. Say
I have an eager derive macro derive_foo for the trait Foo with the helper
attribute foo_helper, and consider this invocation:
#[derive(Foo)]
struct S {
#[some_other_eager_attr_macro]
#[foo_helper]
field: usize
}When derive_foo eagerly expands #[some_other_eager_attr_macro], that macro
in turn will try to expand the token stream #[foo_helper] field: usize. Two
things could go wrong here:
- If there is an attribute macro called
#[foo_helper]in scope, it might get expanded. This is probably not the behaviour expected by the invoker of#[derive(Foo)]or the author ofderive_foo. - If there isn't such a macro, the compiler might report a missing definition.
Both of these issues are handled by the compiler keeping track of the fact that
#[some_other_eager_attr_macro] is being expanded "inside" a macro derive
context, and leaving the helper attribute #[foo_helper] in-place.
Two eager macro expansions could depend on each other. For example, assume that
my_eager_identity! is a macro which expands its input, then returns the
result unchanged. Here are two invocations of my_eager_identity! which can't
proceed until the other finishes expanding:
my_eager_identity! {
macro foo() {}
bar!();
}
my_eager_identity! {
macro bar() {}
foo!();
}The compiler can detect this kind of deadlock, and should report it to the user. The error message should be similar to the standard "cannot find macro" error message, but with more context about having occurred during an eager expansion:
error: cannot find macro `bar` in this scope during eager expansion
|
1 | my_eager_identity! {
| ------------------- when eagerly expanding within this macro
...
3 | bar!();
| ^^^ could not find a definition for this macro
error: cannot find macro `baz` in this scope during eager expansion
|
6 | my_eager_identity! {
| ------------------- when eagerly expanding within this macro
...
8 | baz!();
| ^^^ could not find a definition for this macro
Eager expansion is orthogonal to macro hygiene. Hygiene information is associated with tokens in token streams, and fresh hygiene contexts will be automatically generated by the compiler for iteratively eagerly expanded macros.
The compiler makes no guarantees about the order in which procedural macros get expanded, except that eager expansions which refer to an undefined macro cannot be expanded until a definition appears. This adds to the long list of reasons why a macro author or user shouldn't rely on the order of expansion when reasoning about side-effects.
The method ExpansionBuilder::set_max_depth determines how many "layers" of
expansion will be performed by the compiler. The intent is that this will allow
proc macro authors to expand other proc macros for their side effects, as well
as "incrementally" expand decl macros to see their intermediate states.
The current "layer" of macro invocations are all the invocations that show up in the AST. For example, in this input:
concat!("a", "b");
do_twice!(vec![17; 1]);
macro do_twice($($input:tt)*) {
$($input)* ; $($input)*
}The invocations of concat! and do_twice! appear in the parsed AST,
whereas the invocation of vec! does not; the arguments to macros are always
opaque token streams.
After we expand all the macros in the current layer, we get this output:
"ab";
vec![17; 1]; vec![17; 1];
macro do_twice($($input:tt)*) {
$($input)* ; $($input)*
}Now there are two invocations of vec! in the current layer.
With this example in mind, we can more clearly describe set_max_depth as
specifying how many times to iteratively expand the current layer of
invocations.
Depending on the order in which macros get expanded by the compiler, a proc
macro using the ExpansionBuilder API might try to expand a token stream
containing a macro that isn't defined, but would be defined if some other
macro were expanded first. For example:
macro make_macro($name:ident) {
macro $name () { "hello!" }
}
make_macro!(foo);
my_eager_macro!{ let x = foo!(); }If my_eager_macro! tries to expand foo!() after make_macro!(foo) is
expanded, all is well: the compiler will see the new definition of macro foo,
so when my_eager_macro! uses ExpansionBuilder to expand foo!(), the
compiler knows what to return. However, what should we do if the compiler tries
to expand my_eager_macro! before expanding make_macro!(foo)? There are
several options:
- A: Only expand macros in a non-blocking order. This is hard, because the
knowledge that
my_eager_macro!depends onfoo!being defined is only available oncemy_eager_macro!is executing. Similarly, we only know thatmake_macro!definesfoo!after it has finished expanding. - B: The compiler could indicate to
my_eager_macro!that its expansion request can't be completed yet, due to a missing definition. This meansmy_eager_macro!needs to handle that outcome, preferably by indicating to the compiler that the compiler should retry the expansion ofmy_eager_macro!once a definition offoo!is available. - C: The compiler could delay returning a complete expansion result until it is
able to, while allowing
my_eager_macro!to make as much progress as it can without the result.
This RFC goes with option C by making expand an async fn, since this
provides a clear indication to proc macro authors that they should consider and
handle this scenario. Additionally, this behaviour of expand -- delaying the
return of expansion results until all the necessary definitions are available --
is probably the outcome that most authors would opt-in to if given the choice
via option B.
We could imagine an alternative ExpansionBuilder API which required the user
to construct a single macro invocation at a time, perhaps by exposing
constructors like this:
use syn::{Macro, Attribute};
impl ExpansionBuilder {
pub fn bang_macro(macro: Macro) -> Self;
pub fn attribute_macro(
macro: Attribute,
body: TokenStream
) -> Self;
}This would force proc macro authors to traverse their inputs, perform the
relevant expansion, and then interpolate the results. Presumably utilities would
show up in crates like syn to make this easier. However, this alternative API
doesn't handle cases where the macro invocation uses local definitions or
relative paths. For example, how would a user of bang_macro use it to expand
the invocation of bar! in the following token stream?
quote!{
mod foo {
pub macro bar () {}
}
foo::bar!();
}By contrast, the proposed from_tokens interface makes handling these cases the
responsibility of the compiler.
The builder pattern lets us start off with a fairly bare-bones API which then becomes progressively more sophisticated as we learn what proc macro authors need from an eager expansion API. For example:
- It isn't obvious how to treat requests to expand expressions from a proc
macro that has been invoked in item position; we might need to add a new
constructor
from_expr_tokens. - The proposed API only does complete, recursive expansion. Some proc macros might need to expand invocations "one level deep" in order to inspect intermediate results; the builder pattern lets us add that level of fine-grained control.
- The builder pattern also lets us deprecate methods which overreach or underperform. If it turns out that the reasons for accepting a token stream are offset by an unexpected increase in implementation complexity, we might backpedal and expose a more constrained API.
Rust's macro system is heavily influenced by the syntax metaprogramming systems of languages like Lisp, Scheme, and Racket (see discussion on the Rust subreddit).
In particular, Racket has very similar semantics in terms of hygiene, allowing 'use before define', and allowing macros to define macros. As an example of all of these, the rough equivalent of this Rust code:
foo!(hello);
foo!((hello, world!));
mk_macro!(foo);
macro mk_macro($name:ident) {
macro $name ($arg:tt) {
println!("mk_macro: {}: {}",
stringify!($name), stringify!($arg));
}
}Is this Racket code:
(let ()
(foo hello)
(foo (hello, world!))
(mk_macro foo))
(define-syntax-rule
(mk_macro name)
(define-syntax-rule
(name arg)
(printf "mk_macro: ~a: ~a\n" 'name 'arg)))And both of them print out (modulo some odd spacing from stringify!):
mk_macro: foo: hello
mk_macro: foo: (hello, world!)
Looking at the API that Racket exposes to offer eager expansion (alongside similar functions on that page), we see the following:
- Eager macros are essentially procedural macros that call one of the expansion methods.
- These expansion methods perform a 'best effort' expansion of their input (they don't produce an error if a macro isn't in scope, they just don't expand it).
- It's not clear how this system handles definitions introduced by eager expansion. Some parts of the API suggest that manual syntax context manipulation is involved.
Overall, it's not obvious that a straightforward translation of Racket's eager macros is desirable or achievable (although it could provide inspiration for a more fleshed-out procedural macro API). Future work should include identifying Racket equivalents of the examples in this RFC to confirm this.
- Requires authors to opt-in to expansion, rather than somehow providing an ecosystem-wide solution similar to the one proposed by petrochenkov for macros in inert attributes.
- Exposes more of the compiler internals as an eventually stable API, which may make alternative compiler implementations more complicated.
We could encourage the creation of a 'macros for macro authors' crate with implementations of common macros (for instance, those in the standard library) and make it clear that macro support isn't guaranteed for arbitrary macro calls passed in to proc macros. This feels unsatisfying, since it fractures the macro ecosystem and leads to very indirect unexpected behaviour (for instance, one proc macro may use a different macro expansion library than another, and they might return different results). This also doesn't help address macro calls in built-in attributes.
Opt-out eager expansion is backwards-incompatible with current macro behaviour:
- Consider
stringify!(concat!("a", "b")). If expanded eagerly, the result is"ab". If expanded normally, the result isconcat ! ( "a" , "b" ). - Consider
quote!(expects_a_struct!(struct #X)). If we eagerly expandexpects_a_struct!this will probably fail:expects_a_struct!expects a normal complete struct declaration, not aquote!interpolation marker (#X).
Detecting these macro calls would require the compiler to parse arbitrary token
trees within macro arguments, looking for a $path ! ( $($tt)*) pattern, and
then treating that pattern as a macro call. Doing this everywhere essentially
bans that pattern from being used in custom macro syntax, which seems
excessive.
RFC 1628 proposes adding an
alternative invocation syntax to explicitly make the invocation eager (the
proposal text suggests foo$!(...)). The lang team couldn't reach
consensus
on the design.
In addition to the issues discussed in RFC 1628, any proposal which marks macros as eager 'in-line' with the invocation runs into a similar issue to the global eager expansion suggestion, which is that it bans certain token patterns from macro inputs.
Additionally, special invocation syntax makes macro output sensitive to the
invocation grammar: a macro might need to somehow 'escape' $! in its output
to prevent the compiler from trying to treat the surrounding tokens as an
invocation. This adds an unexpected and unnecessary burden on macro authors.
These are design questions that would be best investigated while implementing the proposed interface, as well as afterwards with feedback from users:
- Are there any corner-cases concerning attribute macros that aren't covered by treating them as two-argument proc-macros?
- How do we handle outer attributes?
- What can we learn from the eager macro systems of other languages, e.g. Racket?