Add multiple files compilation mode for crossgen2 #37411
|
Did you run crossgen on crossgen2 itself before doing the measurements? Without R2R code, a significant portion of time will be spent JITting the compiler. Compilers (clang, Roslyn, VC++, you name it) don't offer multiple input/multiple output modes because they're not useful when integrating into build systems. Multiple-input/multiple-output is a test hook. We place test hooks in the src\coreclr\src\tools\r2rtest runner so that we don't have test hooks in the shipping compiler. r2rtest already has modes to compile all files in a directory. We could potentially add a new launch option that doesn't create a new compilation process, but does an Note that the crossgen2 compiler has known "memory leaks" when run in such mode and will run out of memory eventually (there are static fields that cache state that is only relevant to a single compilation and is never released until the process dies). Cc @dotnet/crossgen-contrib |
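To illustrate the leak pattern described above, here is a minimal Python sketch (crossgen2 itself is C#; the names `_TYPE_CACHE`, `compile_with_static_cache`, and `Compilation` are hypothetical and not taken from the crossgen2 sources). It only shows the general idea: a process-wide cache holding per-compilation state keeps growing when many inputs are compiled in one process, while state scoped to a compilation object can be released after each input.

```python
# Hypothetical sketch: process-wide ("static") cache vs. per-compilation cache.
# None of these names come from crossgen2; they only illustrate the pattern.

_TYPE_CACHE = {}  # module-level state: lives until the process exits


def compile_with_static_cache(path, type_names):
    """Store per-input state in the module-level dict, which is never cleared."""
    for name in type_names:
        _TYPE_CACHE[(path, name)] = f"layout of {name}"
    return len(type_names)


class Compilation:
    """Keep per-input state on the instance so it can be freed with the instance."""

    def __init__(self, path):
        self.path = path
        self.type_cache = {}

    def run(self, type_names):
        for name in type_names:
            self.type_cache[name] = f"layout of {name}"
        return len(type_names)


def batch_static(inputs):
    """Compile every input in one process; return entries retained afterwards."""
    for path, names in inputs:
        compile_with_static_cache(path, names)
    return len(_TYPE_CACHE)


def batch_scoped(inputs):
    """Compile every input in one process; return the peak per-input cache size."""
    peak = 0
    for path, names in inputs:
        compilation = Compilation(path)
        compilation.run(names)
        peak = max(peak, len(compilation.type_cache))
        # 'compilation' is rebound on the next iteration, so its cache
        # becomes unreachable and can be reclaimed.
    return peak
```

With the static cache, retained entries grow with the total number of types across all inputs; with the scoped cache, the peak is bounded by the largest single input.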
|
Yes, I forgot to mention that: all system libs and crossgen2 libs are compiled with the first crossgen in r2r mode. However, the runtime still spends much time on jitting, even with r2r images. The change in this PR mitigates the startup overhead of jitting and loading crossgen2. Our concern is that on small dlls crossgen2 perf is much worse than the first crossgen (for example, by more than 1100% for crossgen2.dll on x64). For large dlls like SPC.dll crossgen2 is 25% faster on x64; however, this is achieved with 16 threads, and with 1 thread crossgen2 is 2x slower. On arm devices with 2 cpus we can't launch that many threads, so crossgen2 is slower than the first crossgen even on SPC.dll. Here's some data: x64
armel
|
|
@gbalykov, thank you for looking deeper into this. I agree that it is quite concerning how slow crossgen2 is, especially for the smaller binaries. There are several details to note.
|
Yes, I'm aware of that mode. That's why I keep pushing for people to stop using statics to store per-compilation state, but people keep adding those when I'm not looking. But there's a difference between a compiler server and a command line argument to batch compile. The former can be integrated into build systems (but it does bring its own challenges which is why I sometimes need to taskkill I know what sort of response I get for this, but I compiled crossgen2 with the CoreRT compiler and compared throughput with the CoreCLR/ReadyToRun based one:
We'll probably need to build a compilation server to get throughput anywhere near this as long as the compiler is hosted on top of CoreCLR. |
When we were discussing publishing options for crossgen2, we ruled out self-contained publishing because the size was prohibitively large - I don't think we'll be able to composite-compile crossgen2 itself and ship it that way to our customers. We can use it to speed up our inner loop, but I'm skeptical of our ability to pass the benefit to our customers. |
|
We've run into crossgen2 throughput problems when compiling tests. @davidwrighton, one way to solve this is cross compilation, and it should already work by using: https://github.com/dotnet/runtime/tree/master/src/coreclr/src/jit/armelnonjit But there is another case: when a user installs an application from the market to a device. In this scenario we cannot use cross compilation. This PR or a server mode will help. In the case of server mode, it should be easy to start and shut down. |
|
Ah, as I understand it, this switch is intended for use outside of build system driven scenarios, and only for use within a bespoke application installer pipeline built by your company. This isn't a scenario that has been considered actively as part of crossgen2 development. Could you share the amount of improvement that you are seeing as a result of this change to the end-to-end install time of a typical application? |
|
// Auto-generated message 69e114c which was merged 12/7 removed the intermediate src/coreclr/src/ folder. This PR needs to be updated as it touches files in that directory which causes conflicts. To update your commits you can use this bash script: https://gist.github.com/ViktorHofer/6d24f62abdcddb518b4966ead5ef3783. Feel free to use the comment section of the gist to improve the script for others. |
|
@davidwrighton this PR seems to have been waiting on an update for 7 months - would it make sense to close it if it's not actively being worked on? |
|
Yes, I think that makes sense. @alpencolt if you are still interested in this, please re-activate this PR/provide some of the performance numbers we were looking for. |
|
@davidwrighton we're working on this right now. I hope we'll share results for the crossgen vs crossgen2 performance and memory comparison on armel this week.
Sorry for the late response. I've measured the Calculator app as a typical Tizen Xamarin app. Its installation (without ni compilation) takes just 3.656 seconds. The target arm device has just two cpus, so crossgen2 results for >=2 threads are pretty much the same.
As you can see, pipeline mode saves 18.3 (38%) and 23.6 (39%) seconds for 2 and 1 threads respectively. Considering app installation time, pipeline mode saves 35-37% of end-to-end app install time. Without these changes crossgen2 is almost 3 times slower than crossgen1. Additionally, I've measured system libs recompilation from scratch using crossgen1 and crossgen2.
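As a rough cross-check of the percentages quoted above, the arithmetic can be sketched in Python. This assumes that end-to-end install time is simply the 3.656 s installation plus the crossgen2 compile time, and it back-computes the baseline compile time from the rounded percentages (the measurement table itself is not reproduced here), so the results are approximate.

```python
INSTALL_SECONDS = 3.656  # app installation without ni compilation (from the post)


def end_to_end_saving(saved_seconds, saved_fraction, install_seconds=INSTALL_SECONDS):
    """Return the approximate fraction of end-to-end install time saved.

    Assumption: end-to-end time = install time + baseline compile time, where
    the baseline compile time is back-computed from the rounded percentage.
    """
    baseline_compile = saved_seconds / saved_fraction
    return saved_seconds / (baseline_compile + install_seconds)


two_threads = end_to_end_saving(18.3, 0.38)  # 18.3 s saved, quoted as 38%
one_thread = end_to_end_saving(23.6, 0.39)   # 23.6 s saved, quoted as 39%
print(f"2 threads: {two_threads:.1%}, 1 thread: {one_thread:.1%}")
```

Under these assumptions both values fall inside the quoted 35-37% range.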
Unfortunately, pipeline mode leaks memory which results in process getting killed by oom killer. So, I wasn't able to measure all 261 system libs compilation in one command. Anyway, currently it takes ~5mins for crossgen1 to compile all system libs and ~30mins for crossgen2. cc @alpencolt |
|
I've re-opened the request, as there is active work happening here. Could you clarify if the crossgen2 binaries in this test were themselves crossgenned? Also, could you describe what version of crossgen2 is in use here? Is it from the 5.0 release branch, or a recent build from the master branch? @nattress, @mangod9 We need to come up with a solution here of some form. In my opinion a slowdown of 2.75X is really not acceptable. I dislike the approach taken here, but it is very expedient, and not particularly impacting to the scenarios we use here in the more general .NET community. @alpencolt @gbalykov you mention that pipeline mode leaks memory. Do you know what it is leaking, and by how much? |
|
This is measured on dotnet/runtime master (6.0), commit d266fdb. In the application-related measurements above, all system libs including crossgen2 are compiled in r2r. In the "system libs recompilation from scratch" scenario no libs are precompiled; all libs are compiled from scratch, in this order: System.Private.CoreLib.dll, all 5 crossgen2 dlls, then the others in some order. I'm not yet sure what is leaking in pipeline mode, but this resulted in ~800 MB of physical memory occupied for approximately 127 compiled dlls. Then the OOM killer killed the process. This is the patch that I've used: 1.patch.txt. |
|
Regarding memory consumption, here are the numbers for System.Private.CoreLib.dll compilation (when all system libs are compiled in r2r):
Crossgen2 RSS is more than 3 times higher. For Tizen Xamarin app, mentioned above, crossgen2 RSS is ~2 times higher.
|
|
@mangod9 could you please set an assignee on this PR? It helps for old PRs to have a "shepherd", and this is the oldest one without such a person. |
|
@mangod9 thoughts about owner? |
|
Sorry, must have missed the previous tag. Adding @trylek as well. We will discuss how to proceed this week. |
- Add --out-near-input option, which adds .ni. suffix to input filepath and stores resulting ni.dll near original dll. In this mode --out option can be skipped.
- Add --single-file-compilation mode, which allows to compile all input files separately.
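A small Python sketch of the output naming described in the commit message above. The exact rule used by the PR is not shown in this thread, so this assumes that ".ni" is inserted just before the file extension and that the result stays in the input's directory; `near_input_output_path` is a hypothetical helper, not a crossgen2 function.

```python
from pathlib import PurePosixPath


def near_input_output_path(input_path):
    """Return the assumed output path: same directory, '.ni' before the extension."""
    p = PurePosixPath(input_path)
    # Assumption: Foo.dll becomes Foo.ni.dll next to the original file.
    return str(p.with_name(f"{p.stem}.ni{p.suffix}"))


print(near_input_output_path("/usr/share/app/Foo.dll"))
print(near_input_output_path("/usr/share/dotnet/System.Private.CoreLib.dll"))
```

Under that assumption the two calls yield /usr/share/app/Foo.ni.dll and /usr/share/dotnet/System.Private.CoreLib.ni.dll.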
a601571 to
d266fdb
|
Hey @gbalykov, I assume this PR is still relevant? If you could resolve conflicts we can work on getting it merged. Just a note that as part of the regular workflow we wouldn't be validating the multiple file compilation mode. |
|
@mangod9 yes, this is still relevant, I'll rebase it |
|
Hi @gbalykov, closing this for now. Could you please reopen it when it's ready for review? Reminder that preview7 (early July) is when new feature work should be done for .NET 6. Thanks. |
By using the --out-near-input and --single-file-compilation options, multiple files can be compiled in one invocation of crossgen2. This removes the startup overhead if many files are to be compiled anyway.
x64, f95b2b2, release build, 100 measurements for each case (for single-file compilation mode all 100 copies of the file are passed in one command):
default number of threads
Basically, the result is a constant ~0.3s diff per dll:
1 thread
cc @alpencolt