feat: initial implementation of project file format#2510
feat: initial implementation of project file format#2510
Conversation
| @@ -0,0 +1,11 @@ | |||
|
|
|||
| [package] | |||
| name = "my-account-note" | |||
There was a problem hiding this comment.
From #1919:
The Rust tooling uses/reproduces much of this information in Cargo.toml - we could move most of that to miden-project.toml instead, or alternatively, simply merge the relevant sections from Cargo.toml into an in-memory miden-project.toml manifest. Much of what I've laid out here is in fact the parts we already define/maintain in Cargo.toml for Rust-based projects today
I wonder what our plan is for Rust projects?
Currently we store all Miden-related fields in the [package.metadata.miden] section of the Cargo.toml. See a P2ID example.
Do we want to require miden-project.toml and move the Miden fields currently residing in the Cargo.toml there? Multiple fields end up duplicated (name, version, etc.). Plus, the developer would have package dependencies defined in the miden-project.toml and the crate dependencies in the Cargo.toml.
I'd suggest to keep storing all Miden fields in Cargo.toml. The cargo miden build can embed generated miden-project.toml in the compiled Miden package.
There was a problem hiding this comment.
I think the issue with storing everything in Cargo.toml has a few issues:
- We will have tooling, and there will be third-party tooling, built around Miden projects, and storing project information in
Cargo.tomlrequires those tools to understand not onlymiden-project.toml, but alsoCargo.tomland however we choose to encodemiden-projet.tomlinformation inside of it. - In the future, other languages that can compile to Miden may not have a TOML configuration file (or at least one that can be piggy-backed on like
Cargo.toml), in which case they would need to usemiden-project.tomland would face the same dilemma of duplication between language-specific project config and Miden project config. Being consistent about the boundary between language-specific config and Miden config up front will lead to less confusion IMO. - It complicates IDE/LSP integration, since
Cargo.tomlis already used as a language marker for Rust, it would require parsingCargo.tomlto determine if the directory is a Miden/Rust project, or just a Rust project. In some editors it may not be a problem, if extensions are provided an opportunity to do this when a worktree is opened in the editor (e.g. Zed), but I don't believe that is always the case. - Even if we were to use
Cargo.tomlfor Miden+Rust projects, we still have a situation where we are specifying dependencies in two places inCargo.toml- one for normal crates, and then one for Miden packages under ([package.metadata.miden.dependencies]). IMO, it's more likely to confuse users that both sets of dependencies are specified in different places in the same file. If Miden package dependencies are inmiden-project.tomland pure Rust dependencies are inCargo.toml, the delineation between the two is a lot more clear (and easier to explain). - Rust crates which are compiled to Miden will not be published to crates.io - so virtually all of the metadata that would typically go in
Cargo.tomlcan be specified inmiden-project.tomlinstead, aside fromnameandversion. GIven that, it isn't clear to me how much information would actually be duplicated, or how much of a pain that would be in practice. Changing thenameis going to be very rare. Changingversionis more frequent, so if it is considered painful enough to have to maintain it in two places, we could provide a solution there that allows specifying inmiden-project.tomlthat the version is to be inferred from external/language-specific tooling, something likeversion = { external = true, language = "rust" } | { external = true, path = "VERSION" }. TBH, I don't think it is actually a significant enough pain point for that, but 🤷. Aside fromversionthough - there just isn't overlap betweenCargo.tomlandmiden-project.tomlto combine them IMO.
I'm not saying I couldn't be persuaded on this point, but it does feel like preemptively addressing an issue that we don't actually know is going to be an issue in practice. I would want to see how much duplication there actually is (given the constraints I outlined above, e.g. defining description in Cargo.toml doesn't make any sense), and from there determine if it is a significant enough problem to warrant complicating project handling by allowing Rust projects to piggy-back on Cargo.toml.
There was a problem hiding this comment.
Fair points. I'm sold on the project-level miden-project.toml.
The workspace-level miden-project.toml still bothers me due to the members and profile duplication with Cargo.toml. Assuming they have to be maintained by the user.
|
I like the version-or-digest approach and that we'll store it in the same field. Apart from my question about the workspace level in Rust projects at #2510 (comment) everything sounds good. |
|
This is ready for review, though I expect to add a few more tests still, and write up a README doc that describes the usage and syntax of the project file. A few design details to note during your review:
If you have any questions about specific details, leave a comment here or ping me to discuss, I'm happy to walk through any of this, since there is a lot of context that isn't easy to glean from just the source code or its documentation. |
The duplication is hard to avoid at the workspace level. We could derive the Regardless, we need the workspace-level project file for workspace management of Miden Assembly projects anyway, so the issue of duplication with |
feea354 to
e252ca3
Compare
bobbinth
left a comment
There was a problem hiding this comment.
Looks good! Thank you! Not a full review - I mostly reviewed docs and examples for now - but I left some comments inline.
|
|
||
| ### Lint configuration | ||
|
|
||
| TODO |
There was a problem hiding this comment.
nit: let's add the "LICENSE" section as we do for other crates.
There was a problem hiding this comment.
It would be good to add some comments explaining the example. My understanding is: this project defines a kernel with two targets:
- The kernel executable, under
kernel/mod.masm. - The kernel library under the namespace
userspace.
If the above is correct, I believe this is similar to how we define the transaction kernel in miden-base, where:
kernel/mod.masmis meant to be similar tomain.masm.userspace/mod.masmis meant to be similar toapi.masm.
Currently, we assemble these programmatically using the build.rs script - but once we have the project file (and update the assembler to be able to work with projects), I'm assuming the build.rs could be greatly simplified - right? Basically, we'd be able to instantiate the assembler and then just feed the project file to it, and everything else will happen automatically.
A few questions/comments:
- I think
kind = "kernel"should be attacked to thelib/mod.masmtarget. I assume this is used to assemble the library usingAssembler::assemble_kernel_from_dir()and this function outputs kernel libraries rather than executables.- This would also mean that for
kernel/mod.masmwe'd havekind = "executable"- right?
- This would also mean that for
Assembler::assemble_kernel_from_dir()takes two parameters: (1) path to the module - which I think would belib/mod.masm, and (2) path to directory of the library that would get statically linked into the kernel. What would be the directory path here? Or would this be changed somehow?- Related to the above, when we assemble the kernel library, we don't need to specify the namespace as we automatically get the
$kernelnamespace. So, we probably don't need thenamespace = "userspace"attribute (and it should probably not be allowed forkerneltargets). - How would the module graph look here? Specifically:
- Since
kernel/mod.masmis an executable, I'm assuming we can't import anything from there. So, it is unclear why some procedures there are declared aspub(that should be an assembly error, right?). - I'm also assuming since
userspace/mod.masmis meant to be a kernel API file, we won't be able to refer to it fromkernel/mod.masm- right? - If the above is correct, what would be the best way to replicate the "shared library" concept which we manually orchestrate in
miden-base? Specifically, we have thelibdirectory that can be references from both the kernel executable and the kernel library.
- Since
There was a problem hiding this comment.
It would be good to add some comments explaining the example.
For sure, these were thrown together for simple testing, but I can certainly add some documentation explaining the setup in theory.
My understanding is: this project defines a kernel with two targets:
Not exactly - or rather, to be more precise, here we're defining a project that can produce two packages:
- A kernel component, whose root module is located at
kernel/mod.masm- we're specifying the path because how the project is organized differs from the default single-target layout where the path is assumed to bemod.masm. - A library component, whose root module is located at
lib/mod.masm. This library component is intended to be the userspace abstraction over the kernel target. We're specifying a different namespace from the default just to exercise that option here, but by default it would be derived from the package name.
A couple of things to note:
- A project may only define a single kernel target.
- A project may define any number of non-kernel targets
kernel/mod.masmis meant to be similar tomain.masm.userspace/mod.masmis meant to be similar toapi.masm.
The canonical expectation by the assembler is that the top-level module is always mod.masm. By top-level module, I mean the module which owns the namespace for the library. In practice, if this module is not present, we just treat it as if that module is empty, and parse other modules as children of the top-level namespace, but that's more a convenience than anything else. In practice, we should probably always have a mod.masm even if it's empty.
Currently, we assemble these programmatically using the build.rs script - but once we have the project file (and update the assembler to be able to work with projects), I'm assuming the
build.rscould be greatly simplified - right? Basically, we'd be able to instantiate the assembler and then just feed the project file to it, and everything else will happen automatically.
Yes, exactly.
- I think
kind = "kernel"should be attacked to thelib/mod.masmtarget. I assume this is used to assemble the library usingAssembler::assemble_kernel_from_dir()and this function outputs kernel libraries rather than executables.
As noted above, the userspace library isn't assembled as part of the kernel, because it exists separate from the kernel - that's why these are two separate targets. This keeps the kernel itself stable, while allowing the userspace to change more frequently.
- This would also mean that for
kernel/mod.masmwe'd havekind = "executable"- right?
The assembler currently distinguishes between kernels and executables, but maybe that distinction is meaningless, and instead we should just have kernels and libraries, where executables are always kernels - that's effectively the case already.
If we do remove executables as a distinct concept, then we'd have no need to have executable targets here.
Assembler::assemble_kernel_from_dir()takes two parameters: (1) path to the module - which I think would belib/mod.masm, and (2) path to directory of the library that would get statically linked into the kernel. What would be the directory path here? Or would this be changed somehow?
I think we'd just make assemble_kernel_from_dir take a single path, the path to the kernel module directory, where mod.masm is the kernel module itself, and all other modules are statically linked into the kernel artifact. There's no reason to have them be separate.
- Related to the above, when we assemble the kernel library, we don't need to specify the namespace as we automatically get the
$kernelnamespace. So, we probably don't need thenamespace = "userspace"attribute (and it should probably not be allowed forkerneltargets).
It's not allowed for kernel targets, but to be clear, the library target here is the userspace target, i.e. the library users of the kernel will use to interact with it, not library code consumed internally by the kernel. The latter is implicitly part of the kernel module hierarchy.
How would the module graph look here? Specifically:
- Since
kernel/mod.masmis an executable, I'm assuming we can't import anything
You can import from modules defined as part of the kernel, e.g. if there's a kernel/a.masm, you could import that as use $kernel::a from the $kernel module. Note that those modules are internal to the kernel artifact, and are not visible to users of the kernel.
So, it is unclear why some procedures there are declared as
pub(that should be an assembly error, right?).
The assembler does not treat kernels as executables, but as libraries - executables are a separate thing. The kernel interface is defined as the set of procedures exported from the kernel module.
- I'm also assuming since
userspace/mod.masmis meant to be a kernel API file, we won't be able to refer to it fromkernel/mod.masm- right?
Right, the userspace is a separate target (which presumably depends on the kernel) so its modules will not be available to the kernel itself.
- If the above is correct, what would be the best way to replicate the "shared library" concept which we manually orchestrate in
miden-base? Specifically, we have thelibdirectory that can be references from both the kernel executable and the kernel library.
You'd define a library target that is a dependency of both the kernel and the userspace.
There was a problem hiding this comment.
I think I understand the intent better now - though, I think it differs a bit from how we set things up in the protocol repo. We have two cases there that would be good to cover with the project file so that we could simplify the build.rs files there as much as possible.
The two cases are:
The standards library is pretty simple - it is a single library that has two dependencies: (1) Miden core lib and (2) transaction wrapper kernel lib. Both of these are regular dependencies (i.e., non-kernel dependency). I'm thinking that once we move this to a project format we could define this library via a single project file.
The transaction kernel is more complex. It consists of 4 parts:
- The kernel API (defined here) - i.e., a set of procedures that can be
syscall-ed by user programs. This library depends only on Miden core lib. - The protocol library (defined here - this is a set of procedures that wraps all kernel API procedures. The primary purpose of this library is twofold: (1) to let the user call kernel procedures via the
execinstruction (i.e., the wrapper procedures handle all necessary padding) and (2) to invoke kernel procedures by index rather than hash. This library depends on the Miden core lib as a regular dependency and on the kernel API library as a kernel dependency. - Two executable files (defined here and here) which is basically the "driver" of the transaction kernel. These executables depend on the Miden core lib and some internal modules of the kernel library (but these are used as a regular dependency).
- A set of shared utilities that are used by the kernel API and protocol library modules.
Expressing all these as a single project would probably be quite complicated and may not be worth it. So, would be good to figure out how we can split this up into 2 (or at most 3) projects and then make sure that the project file format covers these. One option is:
- The shared utilities project: this would contain the shared utilities defined as a simple library.
- The transaction kernel project: this would output 2 executables and a single kernel library.
- The protocol project: This would depend on shared utilities and the kernel library.
Also cc @PhilippGackstatter as he may have additional suggestions.
There was a problem hiding this comment.
There's a few different ways to structure things, but your general outline sounds perfectly good to me - a few notes:
- The whole thing would be part of a single workspace, so dependencies can be trivially shared between projects in the workspace, including packages within the workspace.
- A single project can cover most of the kernel target pieces (the kernel library, and the two executables)
- The code shared between the kernel, userspace, and smart contracts can be in a separate library, nothing surprising there.
- The userspace library as a separate project makes sense to me - it simplifies things a bit.
I'll get this PR updated tomorrow at some point, and we can do a final pass on what things look like.
There was a problem hiding this comment.
We are also building account components in miden-standards with build.rs. I just described a case (0xMiden/protocol#2527 (comment)) where it would be useful to have a separate library to be able to share code between two account components, but that code should not live in miden::standards directly. This is currently difficult to do (would require special code in build.rs), but I think if we can easily setup such library projects in the future with this project file, this would be a real help.
Afaict, this should be easily possible with the work done here, so I just wanted to mention that additional use case in miden-protocol as a reference.
One question I have in this context: Does the project file allow specifying if a dependency is linked statically or dynamically, or is this decided at a different level? At first glance, this would seem useful to say "link miden::standards dynamically" but "link multisig library statically".
There was a problem hiding this comment.
This file seems to be empty. It may be good to add some dummy procedure here - or a comment explaining why it is intentionally left empty.
There was a problem hiding this comment.
Sure, it's really just there to make the module organization clear.
| [[target]] | ||
| kind = "account-component" |
There was a problem hiding this comment.
We could specify namespace here as well, right? I'd probably do this as we encourage people to define unique namespaces for their components/notes - e.g., my_project::my_component
There was a problem hiding this comment.
Yes, you can always specify namespace for any component other than executables and kernels.
There was a problem hiding this comment.
I should note that by default the namespace is derived from the project name, and multi-target projects must have unique namespaces for each target or a validation error is raised.
There was a problem hiding this comment.
Once we add a dummy procedure to the my-account example, we could add an example here to illustrate how the account component would be imported and how a procedure from that component could be invoked.
There was a problem hiding this comment.
Similar to other comments: I'd add comments with a brief explanation of this example here.
| * `library`, `lib` - produce a package which exports one or more procedures that can be called from other artifacts, but cannot be executed by the Miden VM without additional setup. Libraries have no implicit dependency on any particular kernel. | ||
| * `executable`, `bin`, or `program` - produce a package which has a single entrypoint, and can be executed by the Miden VM directly. Executables have no dependency on any particular kernel. | ||
| * `account-component` - produce a package which is a valid account component in the Miden protocol, and contains all metadata needed to construct that component. This type is only valid in conjunction with the Miden transaction kernel. | ||
| * `note-script` - produce a package which is a valid note script in the Miden protocol, and exports the necessary metadata and procedures to construct and execute the note. This type is only valid in conjunction with the Miden transaction kernel. |
There was a problem hiding this comment.
Here we say note-script but in the examples we use note. I think note is the right approach.
There was a problem hiding this comment.
Good catch, thanks!
crates/project/README.md
Outdated
| As noted above, we've added a target definition that is equivalent to the default target that is inferred if no targets are explicitly declared: a library, whose root module is expected to be found in `mod.masm`, and whose modules will all belong to the `name` namespace (individual modules will have | ||
| their path derived from the directory structure). | ||
|
|
||
| There are other types of targets though, currently the available kinds are: |
There was a problem hiding this comment.
I think we are missing kernel from the list below.
1cbce81 to
4fe599b
Compare
4fe599b to
b1629af
Compare
|
Just an update - I'm modifying the project file syntax a bit to restrict it in some ways, as the existing flexibility in the current draft makes for some awkward complexity that can be avoided with some simplification. I should have my updated draft up later tonight. |
|
I've documented the new approach in |
huitseeker
left a comment
There was a problem hiding this comment.
Main blockers before merge are correctness issues in dependency resolution / project parsing:
- profile table deserialization only updates existing entries and never inserts new ones,
- digest+semver intersection and
contains()behavior can accept/reject versions incorrectly, producing wrong solutions or false “no solution” failures, - it looks like current canonicalization checks the workspace root against itself, so path deps can be over-classified as workspace members,
- package metadata is populated from lints, so metadata/lints can be misapplied.
should_trim_paths()returns the debug flag instead of trim-paths,- I think there are reachable
todo!()/panic!()in version requirement ordering can crash resolution on edge constraints, - member paths are joined without a boundary check, allowing
../-style escape outside the workspace root.
In general:
It kind of looks like the same concepts are modeled twice, once in AST and once in runtime (Profile, Package, Workspace, Target). That creates extra mapping code and makes drift likely across {ast/,}(profile|workspace). Is there a trait-based approach that can keep one canonical model for each concept, and keep parsing types as thin input wrappers only where needed?
There are a few rules spread across version_requirement.rs, resolver/version_set.rs, and resolver/provider.rs. Is there a way to have one source of truth re: those? That would also help with negative tests against those rules, which would help review quite a bit.
| } | ||
|
|
||
| fn contains(&self, v: &Self::V) -> bool { | ||
| if let Some(digest) = v.digest.as_ref() { |
There was a problem hiding this comment.
contains checks only digest when v has one. That skips self.range.contains(&v.version), so a version with a matching digest can pass even when semver is outside the allowed range. Should this require both range and digest checks?
There was a problem hiding this comment.
We don't care about the semantic version range when a digest is specified, as the latter is considered authoritative (i.e. if you request a specific digest of a dependency, then all instances of that dependency in the transitive closure of your dependency graph must be compatible with that exact package, or we will raise a conflict).
* This removes the multi-target example in favor of a new "protocol" example intended to model how `miden-protocol` might be organized. * Add optional target names to allow for multiple executable targets in the same project, disambiguated by name. * Add target `requires` field to allow depending on other targets in the same project. This is strictly for depending on kernel/library targets, others will raise an error. * Fix the resolution of workspace dependencies
This is necessary for non-MASM source projects. In cases where the `path` is missing, it is expected that the assembler will be given the modules for a target that it is being requested to assemble.
…hen digests are present in any of the input sets
8d93199 to
fe2146f
Compare
This split is intentional, so as to separate the concerns of the human-readable manifest representation (i.e. the AST) from how post-processed and validated manifests are represented.
I can take another look here and follow up, but certainly if we can share rules around versioning, that would be ideal. |
|
The failing lint check appears to be due to unrelated code in this PR. I do not know why we are running a lint check against nightly tbh, the errors we get are rarely actionable in the PR which triggers the failing check 🤷 |
I dug into this a bit to see if there were obvious cases of duplication between them, but I'm not finding anything particularly extractable as of yet. Each of those types have subtle semantics that aren't identical, but do have some commonality, just not anything that seems likely to be factorable (that's not to say throwing Claude/Codex at it to see what it comes up with might not find something). If you would like to propose a set of missing tests, I can write those up and at least ensure the existing implementation is sane, and in the process perhaps uncover where such a factorization might be useful or called for. |
Do you mean the rustfmt check? The second run of huitseeker@gringolet.local➜~/tmp/miden-vm-review(bitwalker/project-file✗)» git diff --stat [7:14:41] |
I cannot reproduce that locally - I run EDIT: I should clarify that I cannot reproduce locally using |
f23ac0e to
0f6e196
Compare
huitseeker
left a comment
There was a problem hiding this comment.
It kind of looks like the same concepts are modeled twice, once in AST and once in runtime (
Profile,Package,Workspace,Target). That creates extra mapping code and makes drift likely across{ast/,}(profile|workspace). Is there a trait-based approach that can keep one canonical model for each concept, and keep parsing types as thin input wrappers only where needed?
This split is intentional, so as to separate the concerns of the human-readable manifest representation (i.e. the AST) from how post-processed and validated manifests are represented.
That probably justifies different structs, but I think there's some use in having some shared behaviors between those. Here's one example that seemed to make sense to me:
https://gist.github.com/huitseeker/ae37b40fae6047e264ae3742e9172b25
It has the advantage of obviating the issue about merge not inserting new metadata keys, but YMMV.
crates/project/src/dependencies.rs
Outdated
| let Some(workspace_root) = workspace_path.and_then(|p| p.strip_suffix("miden-project.toml")) && | ||
| uri.path().strip_prefix(workspace_root).is_some() && | ||
| // Make sure the uri is relative to workspace root | ||
| (!workspace_root.is_empty() || !uri.path().starts_with('/') || !uri.path().starts_with("..")) |
There was a problem hiding this comment.
I think this guard may mark out-of-workspace paths as Workspace in no_std. The last check uses ||, so it passes for most inputs (!starts_with('/') is usually true on relative paths, and !starts_with("..") is true for absolute paths).
Would it be safer to require all three conditions with &&, or normalize the path before checking membership?
There was a problem hiding this comment.
This was meant to be:
(!workspace_root.is_empty() && !(uri.path().starts_with('/') || uri.path().starts_with("..")))In a no-std environment, we don't actually have files anyway, so this is really just attempting to do something reasonable to infer workspace-relative dependencies of a virtual project manifest with respect to a virtual workspace manifest, based on the virtual filenames those manifests carry (if they even have them at all).
I think ideally this bit of code would be cfg-d out, but it is retained since it is still possible to parse manifests from strings and not filesystem paths. The only reason to even allow compiling in that configuration is for Wasm targets, but we may be able to require std on Wasm targets (via WASI) if we know that satisfies all of our build requirements.
…ncyVersionScheme::try_from_in_workspace
…ading projects in workspace
I'll readily agree that if we can factor out shared behaviors and avoid unnecessary duplication, that's beneficial - but I'm not sure this is the best example of that to demonstrate those benefits - or at least it doesn't seem that way to me. IMO the diff there doesn't seem to meaningfully simplify things in a way that eases maintenance or clarifies readability. All of the original code is still there, but with more boilerplate, and with implementation details spread further afield. IMO, the place where traits are great for this sort of thing, is when you can reduce boilerplate associated with multiple similar implementations of some behavior, separating the parts that are different from the part(s) that are common. That doesn't really seem to be the case here - but like I said, perhaps it just isn't the best example. I'm certainly open to this sort of thing in general.
That's not really a result of the approach though - both versions of the code could exhibit this issue, you've just added the fix for it in your diff. |
This PR implements a new crate,
miden-project, which implements the data structures for representing Miden projects, along with the AST representation of themiden-project.tomlfile initially sketched out in #1919. Note that the actual syntax ofmiden-project.tomlas implemented in this PR differs from that initial sketch in a few ways. See the example project layouts incrates/project/examplesfor the syntax of workspace-based and standalone project files.I'm still making a few small adjustments, and implementing tests for the parsing/validation bits, but this should be more or less reviewable as-is, at least in terms of the high-level design.
Links
miden-project.tomlcan be found in RFC: Introducemiden-project.tomlmanifest file #1919TODO
pubgrubplumbing for package selection during dependency resolution