-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Initial Crossgen2 changes for composite R2R support #31663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
davidwrighton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like us to have a discussion about how we expect arguments to be passed to the composite generator before we go down this route.
src/coreclr/src/tools/crossgen2/ILCompiler.ReadyToRun/IBC/IBCProfileData.cs
Outdated
Show resolved
Hide resolved
|
@trylek |
|
@davidwrighton - thanks a lot for your immediate feedback, here are my initial thoughts:
Thanks Tomas |
|
@trylek, I don't think you need to address the time to compile issue as part of this checkin, but that's a concern as we build out this feature. (Although if its a really obvious fix, feel free to stuff it in this changeset) I'm more interested to understand why you were merging all of the reference set into the input set in the general sense. That didn't make sense to me, and I'd like to understand your thinking there. |
|
@davidwrighton - for the 10 minutes compilation time, admittedly this is in debug mode I'm mostly using these days to debug the compiler; it's just 6 minutes when Crossgen2 itself is built in release mode and presumably even less with a release build of JIT so it's probably reasonable to leave more precise measurements for later. For adding references to inputs, I guess that was more of an experiment rather than having any deep design foundations on my part - I've mostly been thinking that for typical apps we'd probably want something like this (only rooting the app and only compiling those parts of the framework found by the dependency analysis) rather than always carrying around the dead weight of the complete framework so I just tried that as that also lets me minimize the counterpart SuperIlc changes. Ultimately, I tend to think that we should probably have two flavors of reference options in Crossgen2 to refer to "assemblies to be compiled into the composite file but without automatic rooting we're doing for input files" vs. "assemblies external to the composite file" (that will come into play once we start implementing layered composite files). At that moment we'll also need to figure out what to do about Crossgen2 options for actually compiling the framework - right now I'm not clear on whether in such case we should also root everything or for instance just the public API surface. |
src/coreclr/src/tools/Common/TypeSystem/Common/TypeSystemContext.cs
Outdated
Show resolved
Hide resolved
...c/tools/crossgen2/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/HeaderNode.cs
Outdated
Show resolved
Hide resolved
|
FYI, during my work on the counterpart R2RDump changes for the composite format I realized an interesting open question regarding import cells. I believe there are basically three ways to adjust import cells to fit the composite format and I'm interested in opinions on the preferable one:
Choice of the root module doesn't matter - manifest metadata covers the entire version bubble of the composite image so that we can encode signatures relative to any of the modules present, a smart choice might just yield a minor size optimization in making some of the signatures a byte or two shorter. |
|
3 - "Change the runtime signature behavior for composite R2R to avoid import cell duplication" looks like the best choice. |
|
Thanks Jan, I agree with you, I have started implementing option 3. In the meantime I have found another funny thing: in my current implementation I'm using the manifest metadata to represent the assemblies within the composite R2R file. After implementing export table directory support in R2RDump and becoming able to parse RTR_HEADER, I've immediately hit a new issue: as we're using AssemblyRef entries to represent the reference assemblies, this naturally doesn't work when the assembly name differs from the module name. I hit this just by lucky chance as I'm still using the regression test b426654 for the initial experiments and that's exactly an IL-written test where the module is named "b426654" but the assembly name is "cs1". I'm wondering whether we should use ModuleRefs instead in the composite case. |
|
We do not have multimodule assemblies in .NET Core. Why does this matter? |
|
Well, it matters at least for CoreCLR tests that involve IL assemblies with hand-written assembly vs. module names. We may decide to not care about them but I'm worried it will make our life harder w.r.t. testing and it seems to me like an overal debt in the implementation. |
|
Ok, but how does it matter for them? I am not following. |
|
The assembly name should be used for both assembly ref and assembly def. The filename should not need to be stored anywhere. |
|
OK, you're probably right it's merely an annoyance for R2RDump, not for the runtime as such, it just sucks that we cannot use the manifest metadata to easily identify the appropriate standalone MSIL files corresponding to the composite R2R components. For the "b426654" test the main app module entry in the manifest metadata is "cs1" so I'll need to try to open all managed modules in all reference paths looking for the one "r426654" that happens to contain the assembly "cs1". |
I do not think you need to do that. You can just declare this case unsupported. There are number of places in the .net tooling that assume assembly name == file name. |
|
OK, that sounds reasonable or at least reasonably simplifying for now. Thanks Jan for sharing your detailed feedback! |
|
OK, with the latest changes I seem to be getting consistent R2RDump info for composite images (except for the EHClauses that are currently commented out due to a design bit I have yet to clarify). I believe I'm ready to start working on the relevant runtime changes but I'd prefer to review and merge in the existing changes separately as the change is already big. I'll start out by sending out a separate review for the R2RDump parts of this change that are relatively straightforward. I'll follow up by preparing separate PR's for SuperIlc and Crossgen2 changes after applying @davidwrighton's feedback towards my "non-rooting input" suggestion. |
.../crossgen2/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRunCodegenNodeFactory.cs
Outdated
Show resolved
Hide resolved
1897393 to
1f02305
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Make this an assert and have the real check when we're setting up compilation and can write an error out to the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the LINQ expression to use Single() to throw when the number of elements is not 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this method! - it shows how flexible our system is to add a thing to a module and save it in ~20 lines of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me too :-).
nattress
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - I made a couple nits about comments
ad692ba to
cde6a80
Compare
|
Thanks Simon for your feedback. In the latest (15th) commit I have addressed "conservative" parts of your PR feedback like improved comments and minor code reorganizations. I haven't yet addressed PE header feedback including DavidWr's feedback. I'm not sure how to approach the deterministic ID thing, it looks like a new functionality we don't have any support for in the compiler just yet. My current plan is to just play with the runtime over the weekend and hopefully get to the point I'm able to actually call [non-generic] methods in the composite R2R executable. Ideally early next week I should be able to review and merge in the initial support so that we can fan out further work like shared framework support, diagnostic support, additional performance tuning knobs etc. |
So open a new issue on it and address it in separate PR. |
With initial local changes to enable composite R2R I have finally been able to compile a CoreCLR unit test in this mode. It doesn't yet work at runtime, these changes just fix initial compiler issues I hit during the build: 1) Delegate constructor signature was missing some comparison logic that ended up asserting in the node comparer in the composite build. 2) I have removed two seeminly superfluous parameters to CreateMethodEntrypointNodeHelper. 3) I have removed an assertion related to multi-file RVA fields that I believe to be no longer relevant after the refactorings to copy the RVA fields around properly. Thanks Tomas
I have verified locally that the explicit passing of isInstantiatingStub to DelegateCtorSignature is indeed superflous, it was a leftover from my previous attempts to fix the aliasing problem that later turned out to be caused by the incomplete comparison. I have rolled back this part of my change. Thanks Tomas
This is currently a big WIP. This change lets me compile a test together with the framework into a single composite R2R image. With this change the image does get produced but is still incorrect as the two-level header structure is not yet included so that the available types and method entrypoints cannot be properly resolved. As next step I plan to add this support together with matching R2RDump logic to let me statically prove that the produced image is consistent before I start working on the matching runtime support. Some general observations: 1) Currently this change includes my DelegateCtorSignature comparison fix I have sent out for a separate review. Based on complexity of the composite R2R format changes I�ll either rebase this PR once I merge the DelegateCtorSignature fix in or I�ll send out finalized parts of this change for separate PR�s and subsequently I�ll close this change. 2) With all framework assemblies rooted a simple CoreCLR test builds for about 10 minutes and the resulting executable has about 142 MB. For now I have modified Program.cs so that, in composite mode, reference assemblies (-r) are also added as input files but aren�t automatically rooted for R2R compilation i.o.w. we compile only their subset that is transitively reachable from the input assemblies. With this policy the compiled test executable has about 1 MB. This will definitely require more thought. 3) I had to temporarily delete a check in IBCProfileData that was crashing in the composite build mode. I�ll welcome any feedback as to what is the right fix here. 4) I keep constantly using my local instrumentation to downgrade SuperIlc to .NET Core 3.0 framework when building from the VS as even the newest VS still seems unable to build .NET Core 5-targeting apps. I�ll welcome any thoughts as to whether this merits merging in. 5) For now I plan to modify the R2R PE emitter to stop copying over the managed headers and parts of the input file in the composite mode and emit the header from scratch. One interesting question here is what to do about resource sections � I�m not sure if it�s legal for a PE executable to contain multiple .rsrc sections so we�d either need to copy around just the first one or somehow merge the resources. Alternatively we may be able to ignore the resources with the rationale that, at least in the standalone MSIL case, they remain in existence in the original MSIL files, and just perhaps support specifying an explicit resource file to attach to the executable (e.g. to set up its file manager icon). Thanks Tomas
This change seems to be the minimum letting me compile a small test with the (unrooted) framework without crashes. It mostly deals with proper propagation of the input module list to NodeFactory, creating separate ModuleTokenResolver / SignatureContext for each input module, adding the appropriate mapping logic and splitting the R2R header nodes to global and assembly-level. I'm not completely sure about the instance entrypoint table encoding, that will need to come next as part of R2RDump correctness validation. As next step I plan to make the required changes to the PE header writer in the R2R executable emitter and start working on R2RDump support. Another unfinished bit is the handling of Win32 resources. Thanks Tomas
1) Remove the remaining uses of PE reader from R2R PE writer; 2) Rename PEHeaderCopier to PEHeaderProvider as it can newly either copy around an MSIL header or create a new one from scratch; 3) I have simplified the PEHeaderBuilder constructor by porting over many of the hard-coded header constants from Crossgen1, avoiding the need to copy them over from the input MSIL PE file. 4) I have cleaned up the logic around export symbols and leveraged it for emitting the R2R_HEADER symbol. 5) Fixed a bunch of previously overlooked bugs around manifest metadata table and the new assembly table. Thanks Tomas
I have addressed Michal's initial feedback - I reverted some ECMA-specific changes from the common type system, I changed HeaderNode to use a virtual method for the varying header part and I removed the incorrect change to "non-READYTORUN" part of getFieldInfo. I have added provisions to SuperIlc to let me build tests in composite mode; this mostly boils down to extending CompilerProcess to support multiple input files and slightly restructuring the build construction so that in composite mode we only run a single compilation per folder. As a small fix I noticed that the DLL name in the export table is supposed to keep file extension. Thanks Tomas
With this change I have managed to get the first seemingly meaningful dump of a composite R2R executable. One remaining TODO I haven'�t yet managed to fully fix is EH clauses that contain class tokens for typed catch clauses. I�'m not yet completely clear on how easy it is to get back from the EH clause to the containing method / module so I�m thinking that perhaps in composite case we could change the interpretation of the class token dword to instead point into a special table mapping these dwords to the ordered pairs (module, token). I have also changed the code factory so that input module ordering is explicit which is needed in multiple places. I have fixed manifest metadata so that entries for all components assemblies are generated upfront in the proper order to match the component assembly table entry indices. In signature and instance entrypoint emitter I have made sure to always generate module prefixes in composite mode in order to make their interpretation unique across all component assemblies. I have yet to slightly mop up the context and resolver processing as it currently still causes duplication of import cells emitted from the different modules. In R2RDump I have written a new piece of code to parse the export table directory to get to the RTR_HEADER ReadyToRun header pointer and use it to parse the component assembly table and per-assembly type and method entrypoint tables. Thanks Tomas
Based on my exchange with JanK on the PR thread I realized I can substantially simplify signature context and module token resolver management in Crossgen2: as we don't want to produce module-specific import cells, we also don't need multiple signature context and multiple resolvers. I have basically reverted a part of my change implementing the new class InputModule - without the need for module-specific context and resolvers we can basically return to the old way of having just one resolver and context in the node factory. The new logic is that, in the composite R2R case, we create a special "null signature context" that subsequently forces the emission of explicit module overrides on all signatures produces on the import cells and thus makes them module-agnostic. Thanks Tomas
I have moved the 'Composite' flag into CompilationModuleGroup (under the name IsCompositeBuildMode). For CompilationModuleSet, I just changed it to an enumerable of EcmaModule, with my most recent resolver / context changes it doesn't really need to have such a complex structure. Based on David Wrighton's feedback I rolled back my experimental changes altering the behavior of reference assemblies in composite mode and instead I introduced a new Crossgen2 option '-u' to supply "unrooted input files"; I have modified SuperIlc to use this option when invoking Crossgen2 in composite build mode. Thanks Tomas
This change modifies Crossgen2 so that, during composite R2R compilation, it rewrites the individual input MSIL assemblies to the output folder, marking them with a special type of R2R header stating that this is a component assembly compiled into a composite R2R image (by means of a new section OwnerCompositeExecutable). I have also renamed Context in NodeFactory to SignatureContext to avoid ambiguity with TypeSystemContext. In the R2R compilation the rewriting logic merits additional cleanup - creating of the node graph for component assemblies should be probably moved into the node factory and consolidated with AttachToDependencyGraph to reduce duplication. (But the actual duplication isn't huge.) I have also added provisions to R2RDump to honor this new section type so that it can dump the contents of the rewritten standalone MSIL. Thanks Tomas
As based on my recent design discussions with JanK we want the import cells in composite R2R images to be unique across all modules, we don't need different signature contexts for emitting them. This means that we no longer need to back-propagate the signature context from CorInfoImpl to R2R codegen node factory. Thanks Tomas
By debugging the CoreCLR runtime running the composite R2R image I found found that available types and inlining info wasn't getting filtered by current module. Due to this fact both tables were basically getting duplicated 240 times for each of the component modules. Fixing this reduced the size of the composite R2R image for a small JIT test from 75 MB to 1.5 MB. Thanks Tomas
7559ec3 to
ad7635c
Compare
* Enable crossgen2smoke test for ARM64. * Re-enable NullableWithExplicitLayoutTest sub-test. * Restore setting the image base removed by #31663.
This change adds the new command-line option "--composite",
a new type of input files "--unrooted-input-file" for files that should
be made part of the composite image but only rooted as hit by the
dependency analysis, not automatically as the normal input files
(to reduce the size of framework built along with the app) and new
Crossgen2 logic for producing the additional R2R metadata needed
by composite images including rewriting of input MSIL by injecting
a "component R2R header" on them that forwards the native code
to the composite image. With a set of runtime changes out for
a separate PR I'm able to make first steps in running composite R2R
in the CoreCLR runtime.
Thanks
Tomas