Skip to content

feat: initial implementation of project file format#2510

Open
bitwalker wants to merge 30 commits intonextfrom
bitwalker/project-file
Open

feat: initial implementation of project file format#2510
bitwalker wants to merge 30 commits intonextfrom
bitwalker/project-file

Conversation

@bitwalker
Copy link
Collaborator

@bitwalker bitwalker commented Dec 30, 2025

This PR implements a new crate, miden-project, which implements the data structures for representing Miden projects, along with the AST representation of the miden-project.toml file initially sketched out in #1919. Note that the actual syntax of miden-project.toml as implemented in this PR differs from that initial sketch in a few ways. See the example project layouts in crates/project/examples for the syntax of workspace-based and standalone project files.

I'm still making a few small adjustments, and implementing tests for the parsing/validation bits, but this should be more or less reviewable as-is, at least in terms of the high-level design.

Links

TODO

  • Finish implementing tests
  • Add the necessary pubgrub plumbing for package selection during dependency resolution
  • Clean up pass (remove unused dependencies, make sure everything is correctly documented, remove unused APIs)

@bitwalker bitwalker added this to the v0.21.0 milestone Dec 30, 2025
@bitwalker bitwalker requested a review from bobbinth December 30, 2025 20:21
@bitwalker bitwalker self-assigned this Dec 30, 2025
@bitwalker bitwalker added the assembly Related to Miden assembly label Dec 30, 2025
@@ -0,0 +1,11 @@

[package]
name = "my-account-note"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From #1919:

The Rust tooling uses/reproduces much of this information in Cargo.toml - we could move most of that to miden-project.toml instead, or alternatively, simply merge the relevant sections from Cargo.toml into an in-memory miden-project.toml manifest. Much of what I've laid out here is in fact the parts we already define/maintain in Cargo.toml for Rust-based projects today

I wonder what our plan is for Rust projects?
Currently we store all Miden-related fields in the [package.metadata.miden] section of the Cargo.toml. See a P2ID example.

Do we want to require miden-project.toml and move the Miden fields currently residing in the Cargo.toml there? Multiple fields end up duplicated (name, version, etc.). Plus, the developer would have package dependencies defined in the miden-project.toml and the crate dependencies in the Cargo.toml.

I'd suggest to keep storing all Miden fields in Cargo.toml. The cargo miden build can embed generated miden-project.toml in the compiled Miden package.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue with storing everything in Cargo.toml has a few issues:

  1. We will have tooling, and there will be third-party tooling, built around Miden projects, and storing project information in Cargo.toml requires those tools to understand not only miden-project.toml, but also Cargo.toml and however we choose to encode miden-projet.toml information inside of it.
  2. In the future, other languages that can compile to Miden may not have a TOML configuration file (or at least one that can be piggy-backed on like Cargo.toml), in which case they would need to use miden-project.toml and would face the same dilemma of duplication between language-specific project config and Miden project config. Being consistent about the boundary between language-specific config and Miden config up front will lead to less confusion IMO.
  3. It complicates IDE/LSP integration, since Cargo.toml is already used as a language marker for Rust, it would require parsing Cargo.toml to determine if the directory is a Miden/Rust project, or just a Rust project. In some editors it may not be a problem, if extensions are provided an opportunity to do this when a worktree is opened in the editor (e.g. Zed), but I don't believe that is always the case.
  4. Even if we were to use Cargo.toml for Miden+Rust projects, we still have a situation where we are specifying dependencies in two places in Cargo.toml - one for normal crates, and then one for Miden packages under ([package.metadata.miden.dependencies]). IMO, it's more likely to confuse users that both sets of dependencies are specified in different places in the same file. If Miden package dependencies are in miden-project.toml and pure Rust dependencies are in Cargo.toml, the delineation between the two is a lot more clear (and easier to explain).
  5. Rust crates which are compiled to Miden will not be published to crates.io - so virtually all of the metadata that would typically go in Cargo.toml can be specified in miden-project.toml instead, aside from name and version. GIven that, it isn't clear to me how much information would actually be duplicated, or how much of a pain that would be in practice. Changing the name is going to be very rare. Changing version is more frequent, so if it is considered painful enough to have to maintain it in two places, we could provide a solution there that allows specifying in miden-project.toml that the version is to be inferred from external/language-specific tooling, something like version = { external = true, language = "rust" } | { external = true, path = "VERSION" }. TBH, I don't think it is actually a significant enough pain point for that, but 🤷. Aside from version though - there just isn't overlap between Cargo.toml and miden-project.toml to combine them IMO.

I'm not saying I couldn't be persuaded on this point, but it does feel like preemptively addressing an issue that we don't actually know is going to be an issue in practice. I would want to see how much duplication there actually is (given the constraints I outlined above, e.g. defining description in Cargo.toml doesn't make any sense), and from there determine if it is a significant enough problem to warrant complicating project handling by allowing Rust projects to piggy-back on Cargo.toml.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair points. I'm sold on the project-level miden-project.toml.

The workspace-level miden-project.toml still bothers me due to the members and profile duplication with Cargo.toml. Assuming they have to be maintained by the user.

@greenhat
Copy link
Collaborator

greenhat commented Jan 5, 2026

I like the version-or-digest approach and that we'll store it in the same field. Apart from my question about the workspace level in Rust projects at #2510 (comment) everything sounds good.

@bitwalker bitwalker marked this pull request as ready for review January 7, 2026 20:29
@bitwalker
Copy link
Collaborator Author

This is ready for review, though I expect to add a few more tests still, and write up a README doc that describes the usage and syntax of the project file.

A few design details to note during your review:

  • No changes are made to existing crates in this repo for this PR - this is all greenfield code in a new crate
  • The majority of this PR's content can be broken up into three categories, which can be reviewed independently to make it easier to focus on the relevant details:
    1. The format/syntax/features of the project file, the data structures that represent its AST, and the data structures that represent the loaded/validated representation which is derived from the AST.
    2. The basic serialization boilerplate, the parsing functions that deserialize the AST from TOML source code, and the code to translate from the AST to the final user-facing data structures.
    3. The dependency resolver and associated implementation details. Note that the resolver does not actually fetch/load packages, it provides only the high-level resolution and version selection based on a simple index of packages, their versions, and the dependency requirements of those versions.
  • There are a handful of examples in the crate which are used in the tests, and provide a sense of the syntax and relationship between project file and directory structure/packages.
  • TargetType is essentially a duplicate of PackageKind, and I expect that PackageKind will be replaced with TargetType in subsequent PRs.
  • I expect to make miden_project::Package derivable from a miden_mast_package::Package, or even embed the manifest directly in the latter, in subsequent PRs. Either way, the goal would be that if you have a miden_mast_package::Package, you have access to the version of the manifest used to assemble that package. This can then be used in dependency resolution, etc.
  • The dependency resolution algorithm is provided by pubgrub, and is set up to largely mimic Cargo's dependency resolution behavior, with some simplifications. Additionally, our usage of it extends its behavior to support version requirements that use content digests in addition to, or in lieu of semantic version constraints. This aspect of the PR is experimental, and likely to change/evolve with further testing, but is an essential component of downstream tooling that will build on this, so some implementation of its functionality was needed.

If you have any questions about specific details, leave a comment here or ping me to discuss, I'm happy to walk through any of this, since there is a lot of context that isn't easy to glean from just the source code or its documentation.

@bitwalker
Copy link
Collaborator Author

I like the version-or-digest approach and that we'll store it in the same field. Apart from my question about the workspace level in Rust projects at #2510 (comment) everything sounds good.

The duplication is hard to avoid at the workspace level. We could derive the Workspace for Rust projects from Cargo.toml, but that leaves the issue that I mentioned before, which is that this will break expectations for tooling that only cares about Miden projects in general, and not the language-specific details.

Regardless, we need the workspace-level project file for workspace management of Miden Assembly projects anyway, so the issue of duplication with Cargo.toml is something specific to the Rust toolchain, rather than an issue to be addressed here, IMO.

@bitwalker bitwalker force-pushed the bitwalker/project-file branch 5 times, most recently from feea354 to e252ca3 Compare January 7, 2026 23:06
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! Not a full review - I mostly reviewed docs and examples for now - but I left some comments inline.


### Lint configuration

TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's add the "LICENSE" section as we do for other crates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add some comments explaining the example. My understanding is: this project defines a kernel with two targets:

  • The kernel executable, under kernel/mod.masm.
  • The kernel library under the namespace userspace.

If the above is correct, I believe this is similar to how we define the transaction kernel in miden-base, where:

  • kernel/mod.masm is meant to be similar to main.masm.
  • userspace/mod.masm is meant to be similar to api.masm.

Currently, we assemble these programmatically using the build.rs script - but once we have the project file (and update the assembler to be able to work with projects), I'm assuming the build.rs could be greatly simplified - right? Basically, we'd be able to instantiate the assembler and then just feed the project file to it, and everything else will happen automatically.

A few questions/comments:

  • I think kind = "kernel" should be attacked to the lib/mod.masm target. I assume this is used to assemble the library using Assembler::assemble_kernel_from_dir() and this function outputs kernel libraries rather than executables.
    • This would also mean that for kernel/mod.masm we'd have kind = "executable" - right?
  • Assembler::assemble_kernel_from_dir() takes two parameters: (1) path to the module - which I think would be lib/mod.masm, and (2) path to directory of the library that would get statically linked into the kernel. What would be the directory path here? Or would this be changed somehow?
  • Related to the above, when we assemble the kernel library, we don't need to specify the namespace as we automatically get the $kernel namespace. So, we probably don't need the namespace = "userspace" attribute (and it should probably not be allowed for kernel targets).
  • How would the module graph look here? Specifically:
    • Since kernel/mod.masm is an executable, I'm assuming we can't import anything from there. So, it is unclear why some procedures there are declared as pub (that should be an assembly error, right?).
    • I'm also assuming since userspace/mod.masm is meant to be a kernel API file, we won't be able to refer to it from kernel/mod.masm - right?
    • If the above is correct, what would be the best way to replicate the "shared library" concept which we manually orchestrate in miden-base? Specifically, we have the lib directory that can be references from both the kernel executable and the kernel library.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add some comments explaining the example.

For sure, these were thrown together for simple testing, but I can certainly add some documentation explaining the setup in theory.

My understanding is: this project defines a kernel with two targets:

Not exactly - or rather, to be more precise, here we're defining a project that can produce two packages:

  1. A kernel component, whose root module is located at kernel/mod.masm - we're specifying the path because how the project is organized differs from the default single-target layout where the path is assumed to be mod.masm.
  2. A library component, whose root module is located at lib/mod.masm. This library component is intended to be the userspace abstraction over the kernel target. We're specifying a different namespace from the default just to exercise that option here, but by default it would be derived from the package name.

A couple of things to note:

  • A project may only define a single kernel target.
  • A project may define any number of non-kernel targets
  • kernel/mod.masm is meant to be similar to main.masm.
  • userspace/mod.masm is meant to be similar to api.masm.

The canonical expectation by the assembler is that the top-level module is always mod.masm. By top-level module, I mean the module which owns the namespace for the library. In practice, if this module is not present, we just treat it as if that module is empty, and parse other modules as children of the top-level namespace, but that's more a convenience than anything else. In practice, we should probably always have a mod.masm even if it's empty.

Currently, we assemble these programmatically using the build.rs script - but once we have the project file (and update the assembler to be able to work with projects), I'm assuming the build.rs could be greatly simplified - right? Basically, we'd be able to instantiate the assembler and then just feed the project file to it, and everything else will happen automatically.

Yes, exactly.

  • I think kind = "kernel" should be attacked to the lib/mod.masm target. I assume this is used to assemble the library using Assembler::assemble_kernel_from_dir() and this function outputs kernel libraries rather than executables.

As noted above, the userspace library isn't assembled as part of the kernel, because it exists separate from the kernel - that's why these are two separate targets. This keeps the kernel itself stable, while allowing the userspace to change more frequently.

  • This would also mean that for kernel/mod.masm we'd have kind = "executable" - right?

The assembler currently distinguishes between kernels and executables, but maybe that distinction is meaningless, and instead we should just have kernels and libraries, where executables are always kernels - that's effectively the case already.

If we do remove executables as a distinct concept, then we'd have no need to have executable targets here.

  • Assembler::assemble_kernel_from_dir() takes two parameters: (1) path to the module - which I think would be lib/mod.masm, and (2) path to directory of the library that would get statically linked into the kernel. What would be the directory path here? Or would this be changed somehow?

I think we'd just make assemble_kernel_from_dir take a single path, the path to the kernel module directory, where mod.masm is the kernel module itself, and all other modules are statically linked into the kernel artifact. There's no reason to have them be separate.

  • Related to the above, when we assemble the kernel library, we don't need to specify the namespace as we automatically get the $kernel namespace. So, we probably don't need the namespace = "userspace" attribute (and it should probably not be allowed for kernel targets).

It's not allowed for kernel targets, but to be clear, the library target here is the userspace target, i.e. the library users of the kernel will use to interact with it, not library code consumed internally by the kernel. The latter is implicitly part of the kernel module hierarchy.

  • How would the module graph look here? Specifically:

    • Since kernel/mod.masm is an executable, I'm assuming we can't import anything

You can import from modules defined as part of the kernel, e.g. if there's a kernel/a.masm, you could import that as use $kernel::a from the $kernel module. Note that those modules are internal to the kernel artifact, and are not visible to users of the kernel.

So, it is unclear why some procedures there are declared as pub (that should be an assembly error, right?).

The assembler does not treat kernels as executables, but as libraries - executables are a separate thing. The kernel interface is defined as the set of procedures exported from the kernel module.

  • I'm also assuming since userspace/mod.masm is meant to be a kernel API file, we won't be able to refer to it from kernel/mod.masm - right?

Right, the userspace is a separate target (which presumably depends on the kernel) so its modules will not be available to the kernel itself.

  • If the above is correct, what would be the best way to replicate the "shared library" concept which we manually orchestrate in miden-base? Specifically, we have the lib directory that can be references from both the kernel executable and the kernel library.

You'd define a library target that is a dependency of both the kernel and the userspace.

Copy link
Contributor

@bobbinth bobbinth Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand the intent better now - though, I think it differs a bit from how we set things up in the protocol repo. We have two cases there that would be good to cover with the project file so that we could simplify the build.rs files there as much as possible.

The two cases are:

  • The transaction kernel (located here).
  • The standard smart contracts (located here).

The standards library is pretty simple - it is a single library that has two dependencies: (1) Miden core lib and (2) transaction wrapper kernel lib. Both of these are regular dependencies (i.e., non-kernel dependency). I'm thinking that once we move this to a project format we could define this library via a single project file.

The transaction kernel is more complex. It consists of 4 parts:

  • The kernel API (defined here) - i.e., a set of procedures that can be syscall-ed by user programs. This library depends only on Miden core lib.
  • The protocol library (defined here - this is a set of procedures that wraps all kernel API procedures. The primary purpose of this library is twofold: (1) to let the user call kernel procedures via the exec instruction (i.e., the wrapper procedures handle all necessary padding) and (2) to invoke kernel procedures by index rather than hash. This library depends on the Miden core lib as a regular dependency and on the kernel API library as a kernel dependency.
  • Two executable files (defined here and here) which is basically the "driver" of the transaction kernel. These executables depend on the Miden core lib and some internal modules of the kernel library (but these are used as a regular dependency).
  • A set of shared utilities that are used by the kernel API and protocol library modules.

Expressing all these as a single project would probably be quite complicated and may not be worth it. So, would be good to figure out how we can split this up into 2 (or at most 3) projects and then make sure that the project file format covers these. One option is:

  • The shared utilities project: this would contain the shared utilities defined as a simple library.
  • The transaction kernel project: this would output 2 executables and a single kernel library.
  • The protocol project: This would depend on shared utilities and the kernel library.

Also cc @PhilippGackstatter as he may have additional suggestions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few different ways to structure things, but your general outline sounds perfectly good to me - a few notes:

  • The whole thing would be part of a single workspace, so dependencies can be trivially shared between projects in the workspace, including packages within the workspace.
  • A single project can cover most of the kernel target pieces (the kernel library, and the two executables)
  • The code shared between the kernel, userspace, and smart contracts can be in a separate library, nothing surprising there.
  • The userspace library as a separate project makes sense to me - it simplifies things a bit.

I'll get this PR updated tomorrow at some point, and we can do a final pass on what things look like.

Copy link
Contributor

@PhilippGackstatter PhilippGackstatter Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are also building account components in miden-standards with build.rs. I just described a case (0xMiden/protocol#2527 (comment)) where it would be useful to have a separate library to be able to share code between two account components, but that code should not live in miden::standards directly. This is currently difficult to do (would require special code in build.rs), but I think if we can easily setup such library projects in the future with this project file, this would be a real help.

Afaict, this should be easily possible with the work done here, so I just wanted to mention that additional use case in miden-protocol as a reference.

One question I have in this context: Does the project file allow specifying if a dependency is linked statically or dynamically, or is this decided at a different level? At first glance, this would seem useful to say "link miden::standards dynamically" but "link multisig library statically".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file seems to be empty. It may be good to add some dummy procedure here - or a comment explaining why it is intentionally left empty.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's really just there to make the module organization clear.

Comment on lines +5 to +6
[[target]]
kind = "account-component"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could specify namespace here as well, right? I'd probably do this as we encourage people to define unique namespaces for their components/notes - e.g., my_project::my_component

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you can always specify namespace for any component other than executables and kernels.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should note that by default the namespace is derived from the project name, and multi-target projects must have unique namespaces for each target or a validation error is raised.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we add a dummy procedure to the my-account example, we could add an example here to illustrate how the account component would be imported and how a procedure from that component could be invoked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to other comments: I'd add comments with a brief explanation of this example here.

* `library`, `lib` - produce a package which exports one or more procedures that can be called from other artifacts, but cannot be executed by the Miden VM without additional setup. Libraries have no implicit dependency on any particular kernel.
* `executable`, `bin`, or `program` - produce a package which has a single entrypoint, and can be executed by the Miden VM directly. Executables have no dependency on any particular kernel.
* `account-component` - produce a package which is a valid account component in the Miden protocol, and contains all metadata needed to construct that component. This type is only valid in conjunction with the Miden transaction kernel.
* `note-script` - produce a package which is a valid note script in the Miden protocol, and exports the necessary metadata and procedures to construct and execute the note. This type is only valid in conjunction with the Miden transaction kernel.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we say note-script but in the examples we use note. I think note is the right approach.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks!

As noted above, we've added a target definition that is equivalent to the default target that is inferred if no targets are explicitly declared: a library, whose root module is expected to be found in `mod.masm`, and whose modules will all belong to the `name` namespace (individual modules will have
their path derived from the directory structure).

There are other types of targets though, currently the available kinds are:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are missing kernel from the list below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, thanks!

@bitwalker
Copy link
Collaborator Author

Just an update - I'm modifying the project file syntax a bit to restrict it in some ways, as the existing flexibility in the current draft makes for some awkward complexity that can be avoided with some simplification. I should have my updated draft up later tonight.

@bitwalker
Copy link
Collaborator Author

I've documented the new approach in crates/project/README.md, and will follow up with the changes in implementation tonight/tomorrow morning.

@bobbinth bobbinth requested a review from huitseeker March 10, 2026 15:35
Copy link
Contributor

@huitseeker huitseeker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main blockers before merge are correctness issues in dependency resolution / project parsing:

  • profile table deserialization only updates existing entries and never inserts new ones,
  • digest+semver intersection and contains() behavior can accept/reject versions incorrectly, producing wrong solutions or false “no solution” failures,
  • it looks like current canonicalization checks the workspace root against itself, so path deps can be over-classified as workspace members,
  • package metadata is populated from lints, so metadata/lints can be misapplied.
  • should_trim_paths() returns the debug flag instead of trim-paths,
  • I think there are reachable todo!()/panic!() in version requirement ordering can crash resolution on edge constraints,
  • member paths are joined without a boundary check, allowing ../-style escape outside the workspace root.

In general:

It kind of looks like the same concepts are modeled twice, once in AST and once in runtime (Profile, Package, Workspace, Target). That creates extra mapping code and makes drift likely across {ast/,}(profile|workspace). Is there a trait-based approach that can keep one canonical model for each concept, and keep parsing types as thin input wrappers only where needed?

There are a few rules spread across version_requirement.rs, resolver/version_set.rs, and resolver/provider.rs. Is there a way to have one source of truth re: those? That would also help with negative tests against those rules, which would help review quite a bit.

}

fn contains(&self, v: &Self::V) -> bool {
if let Some(digest) = v.digest.as_ref() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contains checks only digest when v has one. That skips self.range.contains(&v.version), so a version with a matching digest can pass even when semver is outside the allowed range. Should this require both range and digest checks?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't care about the semantic version range when a digest is specified, as the latter is considered authoritative (i.e. if you request a specific digest of a dependency, then all instances of that dependency in the transitive closure of your dependency graph must be compatible with that exact package, or we will raise a conflict).

* This removes the multi-target example in favor of a new "protocol"
  example intended to model how `miden-protocol` might be organized.
* Add optional target names to allow for multiple executable targets in
  the same project, disambiguated by name.
* Add target `requires` field to allow depending on other targets in the
  same project. This is strictly for depending on kernel/library
  targets, others will raise an error.
* Fix the resolution of workspace dependencies
This is necessary for non-MASM source projects. In cases where the
`path` is missing, it is expected that the assembler will be given the
modules for a target that it is being requested to assemble.
…hen digests are present in any of the input sets
@bitwalker bitwalker force-pushed the bitwalker/project-file branch from 8d93199 to fe2146f Compare March 10, 2026 22:29
@bitwalker
Copy link
Collaborator Author

It kind of looks like the same concepts are modeled twice, once in AST and once in runtime (Profile, Package, Workspace, Target). That creates extra mapping code and makes drift likely across {ast/,}(profile|workspace). Is there a trait-based approach that can keep one canonical model for each concept, and keep parsing types as thin input wrappers only where needed?

This split is intentional, so as to separate the concerns of the human-readable manifest representation (i.e. the AST) from how post-processed and validated manifests are represented.

There are a few rules spread across version_requirement.rs, resolver/version_set.rs, and resolver/provider.rs. Is there a way to have one source of truth re: those? That would also help with negative tests against those rules, which would help review quite a bit.

I can take another look here and follow up, but certainly if we can share rules around versioning, that would be ideal.

@bitwalker
Copy link
Collaborator Author

The failing lint check appears to be due to unrelated code in this PR. I do not know why we are running a lint check against nightly tbh, the errors we get are rarely actionable in the PR which triggers the failing check 🤷

@bitwalker
Copy link
Collaborator Author

There are a few rules spread across version_requirement.rs, resolver/version_set.rs, and resolver/provider.rs. Is there a way to have one source of truth re: those? That would also help with negative tests against those rules, which would help review quite a bit.

I dug into this a bit to see if there were obvious cases of duplication between them, but I'm not finding anything particularly extractable as of yet. Each of those types have subtle semantics that aren't identical, but do have some commonality, just not anything that seems likely to be factorable (that's not to say throwing Claude/Codex at it to see what it comes up with might not find something). If you would like to propose a set of missing tests, I can write those up and at least ensure the existing implementation is sane, and in the process perhaps uncover where such a factorization might be useful or called for.

@huitseeker
Copy link
Contributor

I do not know why we are running a lint check against nightly tbh,

rustfmt option Tracking issue
wrap_comments rust-lang/rustfmt#3347
imports_granularity rust-lang/rustfmt#4991
group_imports rust-lang/rustfmt#5083
condense_wildcard_suffixes rust-lang/rustfmt#3371
format_code_in_doc_comments rust-lang/rustfmt#3349
format_macro_matchers rust-lang/rustfmt#3354
hex_literal_case rust-lang/rustfmt#5082
match_block_trailing_comma rust-lang/rustfmt#3380
single_line_if_else_max_width rust-lang/rustfmt#3357
single_line_let_else_max_width rust-lang/rustfmt#3357

The failing lint check appears to be due to unrelated code in this PR
the errors we get are rarely actionable in the PR which triggers the failing check

Do you mean the rustfmt check? The second run of make lint finishes with an exit code of 0 (i.e. the fixes are idempotent) and the modified files are the following, all introduced in this PR AFAICT.

huitseeker@gringolet.local➜~/tmp/miden-vm-review(bitwalker/project-file✗)» git diff --stat [7:14:41]
crates/project/src/ast/package.rs | 6 ++++--
crates/project/src/dependencies.rs | 3 ++-
2 files changed, 6 insertions(+), 3 deletions(-)

@bitwalker
Copy link
Collaborator Author

bitwalker commented Mar 11, 2026

Do you mean the rustfmt check? The second run of make lint finishes with an exit code of 0 (i.e. the fixes are idempotent) and the modified files are the following, all introduced in this PR AFAICT.

huitseeker@gringolet.local➜~/tmp/miden-vm-review(bitwalker/project-file✗)» git diff --stat [7:14:41] crates/project/src/ast/package.rs | 6 ++++-- crates/project/src/dependencies.rs | 3 ++- 2 files changed, 6 insertions(+), 3 deletions(-)

I cannot reproduce that locally - I run make lint locally always before I push. This is the purpose of rust-toolchain.toml for anything that isn't an optional check.

EDIT: I should clarify that I cannot reproduce locally using +nightly, but since my nightly isn't literally yesterdays nightly, that's never guaranteed to succeed.

@bitwalker bitwalker force-pushed the bitwalker/project-file branch from f23ac0e to 0f6e196 Compare March 11, 2026 14:55
Copy link
Contributor

@huitseeker huitseeker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It kind of looks like the same concepts are modeled twice, once in AST and once in runtime (Profile, Package, Workspace, Target). That creates extra mapping code and makes drift likely across {ast/,}(profile|workspace). Is there a trait-based approach that can keep one canonical model for each concept, and keep parsing types as thin input wrappers only where needed?

This split is intentional, so as to separate the concerns of the human-readable manifest representation (i.e. the AST) from how post-processed and validated manifests are represented.

That probably justifies different structs, but I think there's some use in having some shared behaviors between those. Here's one example that seemed to make sense to me:

https://gist.github.com/huitseeker/ae37b40fae6047e264ae3742e9172b25

It has the advantage of obviating the issue about merge not inserting new metadata keys, but YMMV.

let Some(workspace_root) = workspace_path.and_then(|p| p.strip_suffix("miden-project.toml")) &&
uri.path().strip_prefix(workspace_root).is_some() &&
// Make sure the uri is relative to workspace root
(!workspace_root.is_empty() || !uri.path().starts_with('/') || !uri.path().starts_with(".."))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this guard may mark out-of-workspace paths as Workspace in no_std. The last check uses ||, so it passes for most inputs (!starts_with('/') is usually true on relative paths, and !starts_with("..") is true for absolute paths).

Would it be safer to require all three conditions with &&, or normalize the path before checking membership?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was meant to be:

(!workspace_root.is_empty() && !(uri.path().starts_with('/') || uri.path().starts_with("..")))

In a no-std environment, we don't actually have files anyway, so this is really just attempting to do something reasonable to infer workspace-relative dependencies of a virtual project manifest with respect to a virtual workspace manifest, based on the virtual filenames those manifests carry (if they even have them at all).

I think ideally this bit of code would be cfg-d out, but it is retained since it is still possible to parse manifests from strings and not filesystem paths. The only reason to even allow compiling in that configuration is for Wasm targets, but we may be able to require std on Wasm targets (via WASI) if we know that satisfies all of our build requirements.

@bitwalker
Copy link
Collaborator Author

That probably justifies different structs, but I think there's some use in having some shared behaviors between those. Here's one example that seemed to make sense to me:

https://gist.github.com/huitseeker/ae37b40fae6047e264ae3742e9172b25

I'll readily agree that if we can factor out shared behaviors and avoid unnecessary duplication, that's beneficial - but I'm not sure this is the best example of that to demonstrate those benefits - or at least it doesn't seem that way to me. IMO the diff there doesn't seem to meaningfully simplify things in a way that eases maintenance or clarifies readability. All of the original code is still there, but with more boilerplate, and with implementation details spread further afield.

IMO, the place where traits are great for this sort of thing, is when you can reduce boilerplate associated with multiple similar implementations of some behavior, separating the parts that are different from the part(s) that are common. That doesn't really seem to be the case here - but like I said, perhaps it just isn't the best example. I'm certainly open to this sort of thing in general.

It has the advantage of obviating the issue about merge not inserting new metadata keys, but YMMV.

That's not really a result of the approach though - both versions of the code could exhibit this issue, you've just added the fix for it in your diff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

assembly Related to Miden assembly

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants