Improve inference in update_limits!
#504
Merged
This is a small demonstration of a potential alternative path to reducing latency, compared to #502. This PR takes one tiny sliver of the problem, `update_limits!`, and improves its inferrability: not just its return type (which is `Scene`), but also its internals. The reasoning here is that if we improve the quality of inference, then (1) there is less abstract inference, which typically takes more time than concrete inference, and (2) it becomes possible to prevent runtime inference altogether, since we can typically precompile concrete signatures.

The main point of this PR is to tell you how I went about this and let you see the result. First, I profiled `display(plot(rand(5)))` and then used ProfileView to look at the flame graph.
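Concretely, the workflow was something like the sketch below. (This is a reconstruction, not the PR's literal session; it assumes the GLMakie backend alongside this package.)

```julia
# Profile a first plot, where nearly everything has to be compiled.
using AbstractPlotting, GLMakie
using Profile

Profile.clear()
@profile display(plot(rand(5)))

# Inspect the result as a flame graph.
using ProfileView
ProfileView.view()
```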
The flames that rise high into the sky are almost always inference. By hovering over the bars at the base you can sometimes figure out what is being inferred (obviously, `@snoopi` gives you more direct routes to figuring this out, but sometimes it's nice to know the call chain, and profiling is really good for that). `update_limits!` is an incredibly skinny (i.e., not very consequential) vertical bar fairly near the beginning; I didn't pick it for any particular reason other than that a glance or two showed me what was being inferred. Then, I just descended into the call.
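A minimal sketch of that step, assuming Cthulhu.jl (the 'o' and 'w' keys mentioned next are Cthulhu's interactive toggles); `scene` stands in for the `Scene` produced by the earlier `plot` call:

```julia
# Interactively walk the inferred call graph of the badly inferred call.
# `automatic` is AbstractPlotting's exported Automatic() singleton.
using Cthulhu
@descend update_limits!(scene, automatic)
```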
Hit 'o' to turn off optimization and 'w' to turn on coloration of non-concrete types. Then I just used the menu to follow the calls into `update_limits!(::Scene, ::Automatic)`.

The problems I found stemmed largely from JuliaLang/julia#15276, and given the heavy `do`-block usage in this package I expect them to be pretty common.
The underlying issues are well described in https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-captured. It may be worth considering the use of FastClosures.jl if this proves to be widespread. In this case, however, I just solved the problem manually: `let` statements prior to use in an anonymous function solve the problem (with the small addition of making sure that the `ifelse` statements inferred well).
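For readers unfamiliar with the captured-variable problem, here is the canonical example from the linked performance-tips section (not code from this PR):

```julia
# JuliaLang/julia#15276 in miniature: because `r` may be reassigned,
# the closure captures it in a Core.Box and `x * r` infers as Any.
function abmult(r::Int)
    if r < 0
        r = -r
    end
    return x -> x * r        # captures the boxed `r`; type-unstable
end

# The fix: `let` introduces a fresh binding that is never reassigned,
# so the closure captures a concretely typed Int.
function abmult_fixed(r::Int)
    if r < 0
        r = -r
    end
    return let r = r
        x -> x * r           # captures the `let`-bound `r`, an Int
    end
end
```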
There were also a couple of other more localized problems:

- `scene.padding[]` was not inferrable, so I added a type-assert on the argument (sketched just after this list). This guarantees that the method will never be inferred for anything non-concrete at that argument position, no matter where it's called from.
- `data_limits` was also not inferrable. One of the things to keep in mind is that if Julia has `n` applicable methods (given what it knows about the argument type(s)), all of which return the same type `T`, Julia will still infer the call as `Any` if `n > 3`. (It might be 4 on 1.5, but it's 3 on 1.6.) This doesn't come up with objects of known concrete argument type, because then there's only ever 1 applicable method, but when you're pulling things out of containers with abstract element types it's crucial to be aware of this issue. A small demonstration also follows this list.
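For the first item, this is the shape of the fix (`Vec3f0` is a hypothetical stand-in for the actual concrete type of the field):

```julia
padding = scene.padding[]          # inferred as Any: abstract container eltype
padding = scene.padding[]::Vec3f0  # typeassert: concrete from here on, and
                                   # errors loudly if the assumption is violated
```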
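For the second item, here is a self-contained demonstration of the method-count limit (hypothetical types, not code from this package): every method of `f` returns an `Int`, yet once a fourth method is applicable, inference on an abstract argument gives up.

```julia
struct A end; struct B end; struct C end; struct D end
f(::A) = 1; f(::B) = 2; f(::C) = 3; f(::D) = 4

g(x) = f(x) + 1   # `x` as if pulled from a container with abstract eltype

# On Julia 1.6 this reports Any: with 4 (> 3) methods applicable to
# f(::Any), inference does not union their return types. Delete one
# method of `f` and the same query reports Int.
Base.return_types(g, (Any,))
```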
Again, this particular PR is such an infinitesimal step that I expect no measurable improvement whatsoever; the main point is to demonstrate the idea and show you my workflow. For the record, this is quite similar to the work I did eliminating invalidations in Julia itself, which I think has had a pretty big impact on latency.

Making this viable
I'd really rather not feel like I have to tackle this alone, so I'm curious how interested the Makie developers really are in this as a mini-project. If you're not already familiar with these tools, I hope somebody reading this actually tries these same steps to follow along, so you can get a sense for what such an adventure would look like. As a word of warning, while the tools are really good for this now, it still can be slow, laborious work, but the cumulative effect really adds up.
As motivation, let me show you where I've gotten Revise. Here's the test I did:
```
julia -i -e 'using Profile; @profile (using Revise; using Example; cp("/tmp/Example.jl", "/home/tim/.julia/dev/Example/src/Example.jl"; force=true); revise())'
```

This test profiles the entire Revise pipeline, from tracking packages to detecting diffs to evaluating them and creating new methods. Revise is not a tiny package, and moreover all of this requires both JuliaInterpreter and LoweredCodeUtils, so there is quite a lot of functionality actively being compiled during this demo.
After the REPL came up, I did `using ProfileView; ProfileView.view()` and examined the flame graph. You can see lots of code-loading and active compilation (the red bars going across the top), but the amount that inference contributes to this is almost nothing: there are relatively few tall flames, and they are very skinny. The few stragglers come from the fact that calling methods owned by other packages or Base with novel types is not currently precompilable, and for Revise that's mostly `Dict` and `OrderedDict` methods. So yes, there is still latency, but in a very nontrivial example I've almost completely gotten rid of inference's contribution to that latency. This is a very different result from Makie's, but I predict one could probably get the Makie profile looking pretty close to Revise's with sufficient effort. One might predict a 2-fold decrease in latency, which would not be a bad accomplishment.