fix performance issue of @nospecialize-d keyword func call
#47059
base/compiler/tfuncs.jl
Outdated
    elseif isa(appl, DataType) && appl.name === _NAMEDTUPLE_NAME && appl.parameters[1] === ()
        # if the first parameter of `NamedTuple` is known to be empty tuple,
        # the second argument should also be empty tuple type,
        # so refine it here
        return Const(NamedTuple{(),Tuple{}})
This part is unnecessary, but I added this while working on this PR, and I think this strictly improves the inference accuracy. Test cases added.
bf0ce6a to
9754b6d
Compare
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
base/compiler/ssair/passes.jl
Outdated
    if is_known_call(argexpr, tuple, compact) && length(ns) == length(argexpr.args)-1
        # ok, we know this NamedTuple construction is nothrow,
        # let's mark this NamedTuple as DCE-eligible
        compact[leaf::AnySSAValue][:flag] |= IR_FLAG_EFFECT_FREE
Why don't we know this from interprocedural analysis?
Why doesn't this get inlined into
9754b6d to
deb87b9
Compare
This call is from Line 618 in deb87b9
and the redirected constructor call is union split to:
And the latter contains dynamic dispatch since @nanosoldier
Can we use the regular error check pattern to switch that around? I.e. And then throw an
Well, the problematic dynamic dispatch is not that error path but Lines 411 to 420 in deb87b9
and the latter split is confused for abstract tuple input.
Ah, ok, so the issue is that we have:
    NamedTuple{names}(args::Tuple) where {names} = NamedTuple{names,typeof(args)}(args)
but inference loses the type constraint that the second type parameter is typeequal to
diff --git a/base/boot.jl b/base/boot.jl
index 5f3b99df1c..4e02725fc3 100644
--- a/base/boot.jl
+++ b/base/boot.jl
@@ -615,7 +615,8 @@ end
NamedTuple() = NamedTuple{(),Tuple{}}(())
-NamedTuple{names}(args::Tuple) where {names} = NamedTuple{names,typeof(args)}(args)
+eval(Core, :(NamedTuple{names}(args::Tuple) =
+ $(Expr(:splatnew, :(NamedTuple{names,typeof(args)}), :args))))
 using .Intrinsics: sle_int, add_int
That should also save us some inference time by not having to infer through the useless unionsplit.
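For reference, a small sketch of what the `NamedTuple{names}(args::Tuple)` constructor discussed here produces at runtime: the second type parameter always ends up being `typeof(args)`, whichever way the constructor is implemented internally. The helper `build` and the field names `:a`/`:b` are made up for illustration and do not come from the PR.

```julia
# Only the names are given explicitly; the field types come from the tuple's runtime type.
nt = NamedTuple{(:a, :b)}((1, 2.0))
nt_type = typeof(nt)

# Hypothetical helper: even if the method is not specialized on the tuple type,
# the constructed value still has a concrete `NamedTuple{names,typeof(args)}` type.
build(@nospecialize(args::Tuple)) = NamedTuple{(:a, :b)}(args)
built = build((1, :x))
```

What the PR changes is how well inference can see this relationship when `args` is only known to be some abstract `Tuple`, not the runtime result itself.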
That sounds quite simple and better! I will work on implementing SROA for
Your benchmark job has completed - no performance regressions were detected. A full report can be found here.
deb87b9 to
44d92cd
Compare
Okay, this PR should be ready.
@nanosoldier
a86b36b to
d55f00c
Compare
Your benchmark job has completed - no performance regressions were detected. A full report can be found here.
d55f00c to
5dd7a0b
Compare
The benchmark results look promising. Going to merge once I confirm CI is successful.
@nanosoldier
This commit fixes and improves performance when calling keyword
functions whose argument types are not fully known but are `@nospecialize`-d.
The final result looks like this (the example is taken from
our Julia-level compiler implementation):
```julia
abstract type CallInfo end
struct NoCallInfo <: CallInfo end
struct NewInstruction
    stmt::Any
    type::Any
    info::CallInfo
    line::Union{Int32,Nothing} # if nothing, copy the line from previous statement in the insertion location
    flag::Union{UInt8,Nothing} # if nothing, IR flags will be recomputed on insertion
    function NewInstruction(@nospecialize(stmt), @nospecialize(type), @nospecialize(info::CallInfo),
                            line::Union{Int32,Nothing}, flag::Union{UInt8,Nothing})
        return new(stmt, type, info, line, flag)
    end
end
@nospecialize
function NewInstruction(newinst::NewInstruction;
    stmt=newinst.stmt,
    type=newinst.type,
    info::CallInfo=newinst.info,
    line::Union{Int32,Nothing}=newinst.line,
    flag::Union{UInt8,Nothing}=newinst.flag)
    return NewInstruction(stmt, type, info, line, flag)
end
@specialize
using BenchmarkTools
struct VirtualKwargs
    stmt::Any
    type::Any
    info::CallInfo
end
vkws = VirtualKwargs(nothing, Any, NoCallInfo())
newinst = NewInstruction(nothing, Any, NoCallInfo(), nothing, nothing)
runner(newinst, vkws) = NewInstruction(newinst; vkws.stmt, vkws.type, vkws.info)
@benchmark runner($newinst, $vkws)
```
> on master
```
BenchmarkTools.Trial: 10000 samples with 186 evaluations.
Range (min … max): 559.898 ns … 4.173 μs ┊ GC (min … max): 0.00% … 85.29%
Time (median): 605.608 ns ┊ GC (median): 0.00%
Time (mean ± σ): 638.170 ns ± 125.080 ns ┊ GC (mean ± σ): 0.06% ± 0.85%
█▇▂▆▄ ▁█▇▄▂ ▂
██████▅██████▇▇▇██████▇▇▇▆▆▅▄▅▄▂▄▄▅▇▆▆▆▆▆▅▆▆▄▄▅▅▄▃▄▄▄▅▃▅▅▆▅▆▆ █
560 ns Histogram: log(frequency) by time 1.23 μs <
Memory estimate: 32 bytes, allocs estimate: 2.
```
> on this commit
```
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 3.080 ns … 83.177 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 3.098 ns ┊ GC (median): 0.00%
Time (mean ± σ): 3.118 ns ± 0.885 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▅▇█▆▅▄▂
▂▄▆▆▇████████▆▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▁▁▂▂▂▁▂▂▂▂▂▂▁▁▂▁▂▂▂▂▂▂▂▂▂ ▃
3.08 ns Histogram: frequency by time 3.19 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
```
So for this particular case it achieves a roughly 200x speedup.
This is because this commit allows the call to the keyword sorter to be inlined
and the `NamedTuple` call to be removed.
Specifically, this commit consists of the following improvements:
- Add an early return case for `structdiff`:
This change improves return type inference for the case where the
compared `NamedTuple`s are type unstable but their names do not
differ, e.g. given two `NamedTuple{(:a,:b),T} where T<:Tuple{Any,Any}`s.
In such a case the optimizer will remove the `structdiff` call and the subsequent
`pairs` call, letting the keyword sorter be inlined.
- Tweak the core `NamedTuple{names}(args::Tuple)` constructor so that it
directly forms a `:splatnew` allocation rather than redirecting to the
general `NamedTuple` constructor, which can be confused by an abstract
input tuple type.
- Improve `nfields_tfunc` accuracy for abstract `NamedTuple` types.
This improvement lets `inline_splatnew` handle more abstract
`NamedTuple`s, especially those whose names are fully known but whose field
tuple type is abstract.
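To make the role of `structdiff` in keyword handling more concrete, here is a rough hand-written imitation of a keyword sorter. It is only a sketch: `area_body` and `area_kwsorter` are hypothetical names, the real sorter is generated by lowering and differs in detail, and `Base.structdiff` is an unexported internal function.

```julia
# The function body that receives the already-sorted keyword values.
area_body(width, height) = width * height

# Hypothetical, hand-written stand-in for a compiler-generated keyword sorter.
function area_kwsorter(kw::NamedTuple)
    width  = haskey(kw, :width)  ? kw.width  : 1
    height = haskey(kw, :height) ? kw.height : 1
    # Whatever remains after removing the known names is an unsupported keyword.
    rest = Base.structdiff(kw, NamedTuple{(:width, :height)})
    isempty(rest) || throw(ArgumentError("unsupported keywords: $(keys(rest))"))
    return area_body(width, height)
end

# When the names match, nothing is left over, whatever the field types are.
same_names = Base.structdiff((a = 1, b = 2), (a = "x", b = nothing))
# Otherwise only the fields missing from the second argument survive.
leftover = Base.structdiff((a = 1, b = 2, c = 3), (b = 0,))
```

If inference can prove that `rest` is always the empty `NamedTuple`, the error branch is dead code, which is roughly why the early return case helps the sorter get inlined.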
Together, these improvements allow our SROA pass to optimize away
the `NamedTuple` and `tuple` calls generated for keyword argument handling.
E.g. the IR for the example `NewInstruction` constructor is now fairly
well optimized:
```julia
julia> Base.code_ircode((NewInstruction,Any,Any,CallInfo)) do newinst, stmt, type, info
NewInstruction(newinst; stmt, type, info)
end |> only
2 1 ── %1 = Base.getfield(_2, :line)::Union{Nothing, Int32} │╻╷ Type##kw
│ %2 = Base.getfield(_2, :flag)::Union{Nothing, UInt8} ││┃ getproperty
│ %3 = (isa)(%1, Nothing)::Bool ││
│ %4 = (isa)(%2, Nothing)::Bool ││
│ %5 = (Core.Intrinsics.and_int)(%3, %4)::Bool ││
└─── goto #3 if not %5 ││
2 ── %7 = %new(Main.NewInstruction, _3, _4, _5, nothing, nothing)::NewInstruction NewInstruction
└─── goto #10 ││
3 ── %9 = (isa)(%1, Int32)::Bool ││
│ %10 = (isa)(%2, Nothing)::Bool ││
│ %11 = (Core.Intrinsics.and_int)(%9, %10)::Bool ││
└─── goto #5 if not %11 ││
4 ── %13 = π (%1, Int32) ││
│ %14 = %new(Main.NewInstruction, _3, _4, _5, %13, nothing)::NewInstruction│││╻ NewInstruction
└─── goto #10 ││
5 ── %16 = (isa)(%1, Nothing)::Bool ││
│ %17 = (isa)(%2, UInt8)::Bool ││
│ %18 = (Core.Intrinsics.and_int)(%16, %17)::Bool ││
└─── goto #7 if not %18 ││
6 ── %20 = π (%2, UInt8) ││
│ %21 = %new(Main.NewInstruction, _3, _4, _5, nothing, %20)::NewInstruction│││╻ NewInstruction
└─── goto #10 ││
7 ── %23 = (isa)(%1, Int32)::Bool ││
│ %24 = (isa)(%2, UInt8)::Bool ││
│ %25 = (Core.Intrinsics.and_int)(%23, %24)::Bool ││
└─── goto #9 if not %25 ││
8 ── %27 = π (%1, Int32) ││
│ %28 = π (%2, UInt8) ││
│ %29 = %new(Main.NewInstruction, _3, _4, _5, %27, %28)::NewInstruction │││╻ NewInstruction
└─── goto #10 ││
9 ── Core.throw(ErrorException("fatal error in type inference (type bound)"))::Union{}
└─── unreachable ││
10 ┄ %33 = φ (#2 => %7, #4 => %14, #6 => %21, #8 => %29)::NewInstruction ││
└─── goto #11 ││
11 ─ return %33 │
=> NewInstruction
```
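A similar inspection can be tried on any small keyword function with the long-standing `code_typed` reflection function, in case `Base.code_ircode` is not available in the Julia version at hand. The functions `describe` and `caller` below are invented for this sketch, and the printed IR will vary between Julia versions.

```julia
# Hypothetical keyword function whose arguments are hidden from specialization.
@nospecialize
describe(x; prefix = "value", suffix = "") = string(prefix, ": ", x, suffix)
@specialize

# Wrap the keyword call so that the whole call site goes through inference and optimization.
caller(x) = describe(x; prefix = "item")

# `code_typed` returns a vector of `CodeInfo => return type` pairs, one per matching method.
ci, rettype = only(code_typed(caller, (Any,)))
result = caller(3)
```

Printing `ci` shows whether any `NamedTuple` construction or dynamic dispatch remains at the keyword call site.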
5dd7a0b to
b8a6b10
Compare
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
        # if the first/second parameter of `NamedTuple` is known to be empty,
        # the second/first argument should also be empty tuple type,
        # so refine it here
        return Const(NamedTuple{(),Tuple{}})
This reasoning seems faulty, since couldn't the parameter also be a TypeVar of any sort?
You're quite right:
julia> (()->NamedTuple{(), <:Any})()
NamedTuple{(), Tuple{}}
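The distinction behind this objection can be sketched at the type level. The sketch assumes the point is that a `NamedTuple` type with empty names but a `TypeVar` as its second parameter is a different type from `NamedTuple{(),Tuple{}}`, so refining it to a constant is incorrect. It uses the bound `<:Tuple` rather than `<:Any`, and the variable names are illustrative only.

```julia
# Empty names, but the second parameter is a TypeVar: this is a `UnionAll`, not a concrete type.
abstract_nt = NamedTuple{(), <:Tuple}
concrete_nt = NamedTuple{(), Tuple{}}

is_unionall  = abstract_nt isa UnionAll
is_subtype   = concrete_nt <: abstract_nt   # the concrete type is one instance of the UnionAll
types_differ = abstract_nt != concrete_nt   # but the two are not the same type
```

Only values are guaranteed to have the concrete type: `typeof(NamedTuple())` is `NamedTuple{(), Tuple{}}`.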