Skip to content

Conversation

@aviatesk
Copy link
Owner

@aviatesk aviatesk commented Apr 5, 2022

No description provided.

aviatesk added a commit that referenced this pull request Jul 29, 2022
…Lang#45790)

Currently the `@nospecialize`-d `push!(::Vector{Any}, ...)` can only
take a single item and we will end up with runtime dispatch when we try
to call it with multiple items:
```julia
julia> code_typed(push!, (Vector{Any}, Any))
1-element Vector{Any}:
 CodeInfo(
1 ─      $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing
│   %2 = Base.arraylen(a)::Int64
│        Base.arrayset(true, a, item, %2)::Vector{Any}
└──      return a
) => Vector{Any}

julia> code_typed(push!, (Vector{Any}, Any, Any))
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = Base.append!(a, iter)::Vector{Any}
└──      return %1
) => Vector{Any}
```

This commit adds a new specialization that it can take arbitrary-length
items. Our compiler should still be able to optimize the single-input 
case as before via the dispatch mechanism.
```julia
julia> code_typed(push!, (Vector{Any}, Any))
1-element Vector{Any}:
 CodeInfo(
1 ─      $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing
│   %2 = Base.arraylen(a)::Int64
│        Base.arrayset(true, a, item, %2)::Vector{Any}
└──      return a
) => Vector{Any}

julia> code_typed(push!, (Vector{Any}, Any, Any))
1-element Vector{Any}:
 CodeInfo(
1 ─ %1  = Base.arraylen(a)::Int64
│         $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000002, 0x0000000000000002))::Nothing
└──       goto JuliaLang#7 if not true
2 ┄ %4  = φ (#1 => 1, JuliaLang#6 => %14)::Int64
│   %5  = φ (#1 => 1, JuliaLang#6 => %15)::Int64
│   %6  = Base.getfield(x, %4, true)::Any
│   %7  = Base.add_int(%1, %4)::Int64
│         Base.arrayset(true, a, %6, %7)::Vector{Any}
│   %9  = (%5 === 2)::Bool
└──       goto JuliaLang#4 if not %9
3 ─       goto JuliaLang#5
4 ─ %12 = Base.add_int(%5, 1)::Int64
└──       goto JuliaLang#5
5 ┄ %14 = φ (JuliaLang#4 => %12)::Int64
│   %15 = φ (JuliaLang#4 => %12)::Int64
│   %16 = φ (JuliaLang#3 => true, JuliaLang#4 => false)::Bool
│   %17 = Base.not_int(%16)::Bool
└──       goto JuliaLang#7 if not %17
6 ─       goto JuliaLang#2
7 ┄       return a
) => Vector{Any}
```

This commit also adds the equivalent implementations for `pushfirst!`.
aviatesk pushed a commit that referenced this pull request Jul 29, 2022
When calling `jl_error()` or `jl_errorf()`, we must check to see if we
are so early in the bringup process that it is dangerous to attempt to
construct a backtrace because the data structures used to provide line
information are not properly setup.

This can be easily triggered by running:

```
julia -C invalid
```

On an `i686-linux-gnu` build, this will hit the "Invalid CPU Name"
branch in `jitlayers.cpp`, which calls `jl_errorf()`.  This in turn
calls `jl_throw()`, which will eventually call `jl_DI_for_fptr` as part
of the backtrace printing process, which fails as the object maps are
not fully initialized.  See the below `gdb` stacktrace for details:

```
$ gdb -batch -ex 'r' -ex 'bt' --args ./julia -C invalid
...
fatal: error thrown and no exception handler available.
ErrorException("Invalid CPU name "invalid".")

Thread 1 "julia" received signal SIGSEGV, Segmentation fault.
0xf75bd665 in std::_Rb_tree<unsigned int, std::pair<unsigned int const, JITDebugInfoRegistry::ObjectInfo>, std::_Select1st<std::pair<unsigned int const, JITDebugInfoRegistry::ObjectInfo> >, std::greater<unsigned int>, std::allocator<std::pair<unsigned int const, JITDebugInfoRegistry::ObjectInfo> > >::lower_bound (__k=<optimized out>, this=0x248) at /usr/local/i686-linux-gnu/include/c++/9.1.0/bits/stl_tree.h:1277
1277    /usr/local/i686-linux-gnu/include/c++/9.1.0/bits/stl_tree.h: No such file or directory.
 #0  0xf75bd665 in std::_Rb_tree<unsigned int, std::pair<unsigned int const, JITDebugInfoRegistry::ObjectInfo>, std::_Select1st<std::pair<unsigned int const, JITDebugInfoRegistry::ObjectInfo> >, std::greater<unsigned int>, std::allocator<std::pair<unsigned int const, JITDebugInfoRegistry::ObjectInfo> > >::lower_bound (__k=<optimized out>, this=0x248) at /usr/local/i686-linux-gnu/include/c++/9.1.0/bits/stl_tree.h:1277
 #1  std::map<unsigned int, JITDebugInfoRegistry::ObjectInfo, std::greater<unsigned int>, std::allocator<std::pair<unsigned int const, JITDebugInfoRegistry::ObjectInfo> > >::lower_bound (__x=<optimized out>, this=0x248) at /usr/local/i686-linux-gnu/include/c++/9.1.0/bits/stl_map.h:1258
 JuliaLang#2  jl_DI_for_fptr (fptr=4155049385, symsize=symsize@entry=0xffffcfa8, slide=slide@entry=0xffffcfa0, Section=Section@entry=0xffffcfb8, context=context@entry=0xffffcf94) at /cache/build/default-amdci5-4/julialang/julia-master/src/debuginfo.cpp:1181
 JuliaLang#3  0xf75c056a in jl_getFunctionInfo_impl (frames_out=0xffffd03c, pointer=4155049385, skipC=0, noInline=0) at /cache/build/default-amdci5-4/julialang/julia-master/src/debuginfo.cpp:1210
 JuliaLang#4  0xf7a6ca98 in jl_print_native_codeloc (ip=4155049385) at /cache/build/default-amdci5-4/julialang/julia-master/src/stackwalk.c:636
 JuliaLang#5  0xf7a6cd54 in jl_print_bt_entry_codeloc (bt_entry=0xf0798018) at /cache/build/default-amdci5-4/julialang/julia-master/src/stackwalk.c:657
 JuliaLang#6  jlbacktrace () at /cache/build/default-amdci5-4/julialang/julia-master/src/stackwalk.c:1090
 JuliaLang#7  0xf7a3cd2b in ijl_no_exc_handler (e=0xf0794010) at /cache/build/default-amdci5-4/julialang/julia-master/src/task.c:605
 JuliaLang#8  0xf7a3d10a in throw_internal (ct=ct@entry=0xf070c010, exception=<optimized out>, exception@entry=0xf0794010) at /cache/build/default-amdci5-4/julialang/julia-master/src/task.c:638
 JuliaLang#9  0xf7a3d330 in ijl_throw (e=0xf0794010) at /cache/build/default-amdci5-4/julialang/julia-master/src/task.c:654
 JuliaLang#10 0xf7a905aa in ijl_errorf (fmt=fmt@entry=0xf7647cd4 "Invalid CPU name \"%s\".") at /cache/build/default-amdci5-4/julialang/julia-master/src/rtutils.c:77
 JuliaLang#11 0xf75a4b22 in (anonymous namespace)::createTargetMachine () at /cache/build/default-amdci5-4/julialang/julia-master/src/jitlayers.cpp:823
 JuliaLang#12 JuliaOJIT::JuliaOJIT (this=<optimized out>) at /cache/build/default-amdci5-4/julialang/julia-master/src/jitlayers.cpp:1044
 JuliaLang#13 0xf7531793 in jl_init_llvm () at /cache/build/default-amdci5-4/julialang/julia-master/src/codegen.cpp:8585
 JuliaLang#14 0xf75318a8 in jl_init_codegen_impl () at /cache/build/default-amdci5-4/julialang/julia-master/src/codegen.cpp:8648
 JuliaLang#15 0xf7a51a52 in jl_restore_system_image_from_stream (f=<optimized out>) at /cache/build/default-amdci5-4/julialang/julia-master/src/staticdata.c:2131
 JuliaLang#16 0xf7a55c03 in ijl_restore_system_image_data (buf=0xe859c1c0 <jl_system_image_data> "8'\031\003", len=125161105) at /cache/build/default-amdci5-4/julialang/julia-master/src/staticdata.c:2184
 JuliaLang#17 0xf7a55cf9 in jl_load_sysimg_so () at /cache/build/default-amdci5-4/julialang/julia-master/src/staticdata.c:424
 JuliaLang#18 ijl_restore_system_image (fname=0x80a0900 "/build/bk_download/julia-d78fdad601/lib/julia/sys.so") at /cache/build/default-amdci5-4/julialang/julia-master/src/staticdata.c:2157
 JuliaLang#19 0xf7a3bdfc in _finish_julia_init (rel=rel@entry=JL_IMAGE_JULIA_HOME, ct=<optimized out>, ptls=<optimized out>) at /cache/build/default-amdci5-4/julialang/julia-master/src/init.c:741
 JuliaLang#20 0xf7a3c8ac in julia_init (rel=<optimized out>) at /cache/build/default-amdci5-4/julialang/julia-master/src/init.c:728
 JuliaLang#21 0xf7a7f61d in jl_repl_entrypoint (argc=<optimized out>, argv=0xffffddf4) at /cache/build/default-amdci5-4/julialang/julia-master/src/jlapi.c:705
 JuliaLang#22 0x080490a7 in main (argc=3, argv=0xffffddf4) at /cache/build/default-amdci5-4/julialang/julia-master/cli/loader_exe.c:59
```

To prevent this, we simply avoid calling `jl_errorf` this early in the
process, punting the problem to a later PR that can update guard
conditions within `jl_error*`.
aviatesk pushed a commit that referenced this pull request Oct 18, 2023
This makes it easier to correlate LLVM IR with the originating source
code by including both argument name and argument type in the LLVM
argument variable.

<details>
<summary>Example 1</summary>

```julia
julia> function f(a, b, c, d, g...)
           e = a + b + c + d
           f = does_not_exist(e) + e
           f
       end
f (generic function with 1 method)

julia> @code_llvm f(0,0,0,0,0)
```
```llvm
;  @ REPL[1]:1 within `f`
define nonnull {}* @julia_f_141(i64 signext %"a::Int64", i64 signext %"b::Int64", i64 signext %"c::Int64", i64 signext %"d::Int64", i64 signext %"g[0]::Int64") #0 {
top:
  %0 = alloca [2 x {}*], align 8
  %gcframe3 = alloca [4 x {}*], align 16
  %gcframe3.sub = getelementptr inbounds [4 x {}*], [4 x {}*]* %gcframe3, i64 0, i64 0
  %1 = bitcast [4 x {}*]* %gcframe3 to i8*
  call void @llvm.memset.p0i8.i64(i8* align 16 %1, i8 0, i64 32, i1 true)
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"() JuliaLang#7
  %tls_ppgcstack = getelementptr i8, i8* %thread_ptr, i64 -8
  %2 = bitcast i8* %tls_ppgcstack to {}****
  %tls_pgcstack = load {}***, {}**** %2, align 8
;  @ REPL[1]:3 within `f`
  %3 = bitcast [4 x {}*]* %gcframe3 to i64*
  store i64 8, i64* %3, align 16
  %4 = getelementptr inbounds [4 x {}*], [4 x {}*]* %gcframe3, i64 0, i64 1
  %5 = bitcast {}** %4 to {}***
  %6 = load {}**, {}*** %tls_pgcstack, align 8
  store {}** %6, {}*** %5, align 8
  %7 = bitcast {}*** %tls_pgcstack to {}***
  store {}** %gcframe3.sub, {}*** %7, align 8
  %Main.does_not_exist.cached = load atomic {}*, {}** @0 unordered, align 8
  %iscached.not = icmp eq {}* %Main.does_not_exist.cached, null
  br i1 %iscached.not, label %notfound, label %found

notfound:                                         ; preds = %top
  %Main.does_not_exist.found = call {}* @ijl_get_binding_or_error({}* nonnull inttoptr (i64 139831437630272 to {}*), {}* nonnull inttoptr (i64 139831600565400 to {}*))
  store atomic {}* %Main.does_not_exist.found, {}** @0 release, align 8
  br label %found

found:                                            ; preds = %notfound, %top
  %Main.does_not_exist = phi {}* [ %Main.does_not_exist.cached, %top ], [ %Main.does_not_exist.found, %notfound ]
  %8 = bitcast {}* %Main.does_not_exist to {}**
  %does_not_exist.checked = load atomic {}*, {}** %8 unordered, align 8
  %.not = icmp eq {}* %does_not_exist.checked, null
  br i1 %.not, label %err, label %ok

err:                                              ; preds = %found
  call void @ijl_undefined_var_error({}* inttoptr (i64 139831600565400 to {}*))
  unreachable

ok:                                               ; preds = %found
  %.sub = getelementptr inbounds [2 x {}*], [2 x {}*]* %0, i64 0, i64 0
;  @ REPL[1]:2 within `f`
; ┌ @ operators.jl:587 within `+` @ int.jl:87
   %9 = add i64 %"b::Int64", %"a::Int64"
   %10 = add i64 %9, %"c::Int64"
; │ @ operators.jl:587 within `+`
; │┌ @ operators.jl:544 within `afoldl`
; ││┌ @ int.jl:87 within `+`
     %11 = add i64 %10, %"d::Int64"
     %12 = getelementptr inbounds [4 x {}*], [4 x {}*]* %gcframe3, i64 0, i64 3
     store {}* %does_not_exist.checked, {}** %12, align 8
; └└└
;  @ REPL[1]:3 within `f`
  %13 = call nonnull {}* @ijl_box_int64(i64 signext %11)
  %14 = getelementptr inbounds [4 x {}*], [4 x {}*]* %gcframe3, i64 0, i64 2
  store {}* %13, {}** %14, align 16
  store {}* %13, {}** %.sub, align 8
  %15 = call nonnull {}* @ijl_apply_generic({}* nonnull %does_not_exist.checked, {}** nonnull %.sub, i32 1)
  store {}* %15, {}** %12, align 8
  %16 = call nonnull {}* @ijl_box_int64(i64 signext %11)
  store {}* %16, {}** %14, align 16
  store {}* %15, {}** %.sub, align 8
  %17 = getelementptr inbounds [2 x {}*], [2 x {}*]* %0, i64 0, i64 1
  store {}* %16, {}** %17, align 8
  %18 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 139831370516384 to {}*), {}** nonnull %.sub, i32 2)
  %19 = load {}*, {}** %4, align 8
  %20 = bitcast {}*** %tls_pgcstack to {}**
  store {}* %19, {}** %20, align 8
;  @ REPL[1]:4 within `f`
  ret {}* %18
}
```
</details>

<details>
<summary>Example 2</summary>

```julia
julia> function g(a, b, c, d; kwarg=0)
           a + b + c + d + kwarg
       end
g (generic function with 1 method)

julia> @code_llvm g(0,0,0,0,kwarg=0)
```
```llvm
;  @ REPL[3]:1 within `g`
define i64 @julia_g_160([1 x i64]* nocapture noundef nonnull readonly align 8 dereferenceable(8) %"#1::NamedTuple", i64 signext %"a::Int64", i64 signext %"b::Int64", i64 signext %"c::Int64", i64 signext %"d::Int64") #0 {
top:
  %0 = getelementptr inbounds [1 x i64], [1 x i64]* %"#1::NamedTuple", i64 0, i64 0
; ┌ @ REPL[3]:2 within `#g#1`
; │┌ @ operators.jl:587 within `+` @ int.jl:87
    %1 = add i64 %"b::Int64", %"a::Int64"
    %2 = add i64 %1, %"c::Int64"
; ││ @ operators.jl:587 within `+`
; ││┌ @ operators.jl:544 within `afoldl`
; │││┌ @ int.jl:87 within `+`
      %3 = add i64 %2, %"d::Int64"
; │││└
; │││ @ operators.jl:545 within `afoldl`
; │││┌ @ int.jl:87 within `+`
      %unbox = load i64, i64* %0, align 8
      %4 = add i64 %3, %unbox
; └└└└
  ret i64 %4
}
```
</details>
aviatesk pushed a commit that referenced this pull request Oct 18, 2023
…#51489)

This exposes the GC "stop the world" API to the user, for causing a
thread to quickly stop executing Julia code. This adds two APIs (that
will need to be exported and documented later):
```
julia> @CCall jl_safepoint_suspend_thread(#=tid=#1::Cint, #=magicnumber=JuliaLang#2::Cint)::Cint # roughly tkill(1, SIGSTOP)

julia> @CCall jl_safepoint_resume_thread(#=tid=#1::Cint)::Cint # roughly tkill(1, SIGCONT)
```

You can even suspend yourself, if there is another task to resume you 10
seconds later:
```
julia> ccall(:jl_enter_threaded_region, Cvoid, ())

julia> t = @task let; Libc.systemsleep(10); print("\nhello from $(Threads.threadid())\n"); @CCall jl_safepoint_resume_thread(0::Cint)::Cint; end; ccall(:jl_set_task_tid, Cint, (Any, Cint), t, 1); schedule(t);

julia> @time @CCall jl_safepoint_suspend_thread(0::Cint, 2::Cint)::Cint

hello from 2
  10 seconds (6 allocations: 264 bytes)
1
```

The meaning of the magic number is actually the kind of stop that you
want:
```
// n.b. suspended threads may still run in the GC or GC safe regions
// but shouldn't be observable, depending on which enum the user picks (only 1 and 2 are typically recommended here)
// waitstate = 0 : do not wait for suspend to finish
// waitstate = 1 : wait for gc_state != 0 (JL_GC_STATE_WAITING or JL_GC_STATE_SAFE)
// waitstate = 2 : wait for gc_state != 0 (JL_GC_STATE_WAITING or JL_GC_STATE_SAFE) and that GC is not running on that thread
// waitstate = 3 : wait for full suspend (gc_state == JL_GC_STATE_WAITING) -- this may never happen if thread is sleeping currently
// if another thread comes along and calls jl_safepoint_resume, we also return early
// return new suspend count on success, 0 on failure
```
Only magic number 2 is currently meaningful to the user though. The
difference between waitstate 1 and 2 is only relevant in C code which is
calling this from JL_GC_STATE_SAFE, since otherwise it is a priori known
that GC isn't running, else we too would be running the GC. But the
distinction of those states might be useful if we have a concurrent
collector.

Very important warning: if the stopped thread is holding any locks
(e.g. for codegen or types) that you then attempt to acquire, your
thread will deadlock. This is very likely, unless you are very careful.
A future update to this API may try to change the waitstate to give the
option to wait for the thread to release internal or known locks.
aviatesk pushed a commit that referenced this pull request Jul 22, 2024
…ang#53631)

This PR validates the input parameters to the Julia LAPACK wrappers, so
that the error messages are more informative.
On nightly
```julia
julia> using LinearAlgebra

julia> LAPACK.geev!('X', 'X', rand(2,2))
 ** On entry to DGEEV  parameter number  1 had an illegal value
ERROR: ArgumentError: invalid argument #1 to LAPACK call
```
This PR
```julia
julia> using LinearAlgebra

julia> LAPACK.geev!('X', 'X', rand(2,2))
ERROR: ArgumentError: argument #1: jobvl must be one of ('N', 'V'), but 'X' was passed
```

Secondly, moved certain allocations (e.g. in `geevx`) below the
validation checks, so that these only happen for valid parameter values.

Thirdly, added `require_one_based_indexing` checks to functions where
these were missing.
aviatesk pushed a commit that referenced this pull request Jul 22, 2024
This is an alternative to JuliaLang#53642

The `dom_edges()` for an exit block in the CFG are empty when computing
the PostDomTree so the loop below this may not actually run. In that
case, the right semidominator is the ancestor from the DFSTree, which is
the "virtual" -1 block.

This resolves half of the issue in
JuliaLang#53613:
```julia
julia> let code = Any[
               # block 1
               GotoIfNot(Argument(2), 3),
               # block 2
               ReturnNode(Argument(3)),
               # block 3 (we should visit this block)
               Expr(:call, throw, "potential throw"),
               ReturnNode(), # unreachable
           ]
           ir = make_ircode(code; slottypes=Any[Any,Bool,Bool])
           visited = BitSet()
           @test !Core.Compiler.visit_conditional_successors(CC.LazyPostDomtree(ir), ir, #=bb=#1) do succ::Int
               push!(visited, succ)
               return false
           end
           @test 2 ∈ visited
           @test 3 ∈ visited
       end
Test Passed
```

This needs some tests (esp. since I don't think we have any DomTree
tests at all right now), but otherwise should be good to go.
aviatesk added a commit that referenced this pull request Jul 22, 2024
…iaLang#53642)

This commit fixes the first problem that was found while digging into
JuliaLang#53613. It turns out that the post-domtree constructed
from regular `IRCode` doesn't work for visiting conditional successors
for post-opt analysis in cases like:
```julia
julia> let code = Any[
               # block 1
               GotoIfNot(Argument(2), 3),
               # block 2
               ReturnNode(Argument(3)),
               # block 3 (we should visit this block)
               Expr(:call, throw, "potential throw"),
               ReturnNode(), # unreachable
           ]
           ir = make_ircode(code; slottypes=Any[Any,Bool,Bool])
           visited = BitSet()
           @test !Core.Compiler.visit_conditional_successors(CC.LazyPostDomtree(ir), ir, #=bb=#1) do succ::Int
               push!(visited, succ)
               return false
           end
           @test 2 ∉ visited
           @test 3 ∈ visited
       end
Test Failed at REPL[14]:16
  Expression: 2 ∉ visited
   Evaluated: 2 ∉ BitSet([2])
```

This might mean that we need to fix on the `postdominates` end, but for
now, this commit tries to get around it by using the augmented post
domtree in `visit_conditional_successors`. Since the augmented post
domtree is enforced to have a single return, we can keep using the
current `postdominates` to fix the issue.

However, this commit isn't enough to fix the NeuralNetworkReachability
segfault as reported in JuliaLang#53613, and we need to tackle the second issue
reported there too
(JuliaLang#53613 (comment)).
aviatesk added a commit that referenced this pull request Dec 9, 2024
…aLang#55600)

As an application of JuliaLang#55545, this commit avoids the
insertion of `:throw_undef_if_not` nodes when the defined-ness of a slot
is guaranteed by abstract interpretation.

```julia
julia> function isdefined_nothrow(c, x)
           local val
           if c
               val = x
           end
           if @isdefined val
               return val
           end
           return zero(Int)
       end;

julia> @code_typed isdefined_nothrow(true, 42)
```
```diff
diff --git a/old b/new
index c4980a5c9c..3d1d6d30f0 100644
--- a/old
+++ b/new
@@ -4,7 +4,6 @@ CodeInfo(
 3 ┄ %3 = φ (JuliaLang#2 => x, #1 => #undef)::Int64
 │   %4 = φ (JuliaLang#2 => true, #1 => false)::Bool
 └──      goto JuliaLang#5 if not %4
-4 ─      $(Expr(:throw_undef_if_not, :val, :(%4)))::Any
-└──      return %3
+4 ─      return %3
 5 ─      return 0
 ) => Int64
```
aviatesk pushed a commit that referenced this pull request Dec 9, 2024
Prior to this, especially on macOS, the gc-safepoint here would cause
the process to segfault as we had already freed the current_task state.
Rearrange this code so that the GC interactions (except for the atomic
store to current_task) are all handled before entering GC safe, and then
signaling the thread is deleted (via setting current_task = NULL,
published by jl_unlock_profile_wr to other threads) is last.

```
ERROR: Exception handler triggered on unmanaged thread.
Process 53827 stopped
* thread JuliaLang#5, stop reason = EXC_BAD_ACCESS (code=2, address=0x100018008)
    frame #0: 0x0000000100b74344 libjulia-internal.1.12.0.dylib`jl_delete_thread [inlined] jl_gc_state_set(ptls=0x000000011f8b3200, state='\x02', old_state=<unavailable>) at julia_threads.h:272:9 [opt]
   269 	    assert(old_state != JL_GC_CONCURRENT_COLLECTOR_THREAD);
   270 	    jl_atomic_store_release(&ptls->gc_state, state);
   271 	    if (state == JL_GC_STATE_UNSAFE || old_state == JL_GC_STATE_UNSAFE)
-> 272 	        jl_gc_safepoint_(ptls);
   273 	    return old_state;
   274 	}
   275 	STATIC_INLINE int8_t jl_gc_state_save_and_set(jl_ptls_t ptls,
Target 0: (julia) stopped.
(lldb) up
frame #1: 0x0000000100b74320 libjulia-internal.1.12.0.dylib`jl_delete_thread [inlined] jl_gc_state_save_and_set(ptls=0x000000011f8b3200, state='\x02') at julia_threads.h:278:12 [opt]
   275 	STATIC_INLINE int8_t jl_gc_state_save_and_set(jl_ptls_t ptls,
   276 	                                              int8_t state)
   277 	{
-> 278 	    return jl_gc_state_set(ptls, state, jl_atomic_load_relaxed(&ptls->gc_state));
   279 	}
   280 	#ifdef __clang_gcanalyzer__
   281 	// these might not be a safepoint (if they are no-op safe=>safe transitions), but we have to assume it could be (statically)
(lldb)
frame JuliaLang#2: 0x0000000100b7431c libjulia-internal.1.12.0.dylib`jl_delete_thread(value=0x000000011f8b3200) at threading.c:537:11 [opt]
   534 	    ptls->root_task = NULL;
   535 	    jl_free_thread_gc_state(ptls);
   536 	    // then park in safe-region
-> 537 	    (void)jl_gc_safe_enter(ptls);
   538 	}
```

(test incorporated into JuliaLang#55793)
aviatesk pushed a commit that referenced this pull request Dec 9, 2024
Rebase and extension of @alexfanqi's initial work on porting Julia to
RISC-V. Requires LLVM 19.

Tested on a VisionFive2, built with:

```make
MARCH := rv64gc_zba_zbb
MCPU := sifive-u74

USE_BINARYBUILDER:=0

DEPS_GIT = llvm
override LLVM_VER=19.1.1
override LLVM_BRANCH=julia-release/19.x
override LLVM_SHA1=julia-release/19.x
```

```julia-repl
❯ ./julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.12.0-DEV.1374 (2024-10-14)
 _/ |\__'_|_|_|\__'_|  |  riscv/25092a3982* (fork: 1 commits, 0 days)
|__/                   |

julia> versioninfo(; verbose=true)
Julia Version 1.12.0-DEV.1374
Commit 25092a3* (2024-10-14 09:57 UTC)
Platform Info:
  OS: Linux (riscv64-unknown-linux-gnu)
  uname: Linux 6.11.3-1-riscv64 #1 SMP Debian 6.11.3-1 (2024-10-10) riscv64 unknown
  CPU: unknown:
              speed         user         nice          sys         idle          irq
       #1  1500 MHz        922 s          0 s        265 s     160953 s          0 s
       JuliaLang#2  1500 MHz        457 s          0 s        280 s     161521 s          0 s
       JuliaLang#3  1500 MHz        452 s          0 s        270 s     160911 s          0 s
       JuliaLang#4  1500 MHz        638 s         15 s        301 s     161340 s          0 s
  Memory: 7.760246276855469 GB (7474.08203125 MB free)
  Uptime: 16260.13 sec
  Load Avg:  0.25  0.23  0.1
  WORD_SIZE: 64
  LLVM: libLLVM-19.1.1 (ORCJIT, sifive-u74)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
Environment:
  HOME = /home/tim
  PATH = /home/tim/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/games
  TERM = xterm-256color


julia> ccall(:jl_dump_host_cpu, Nothing, ())
CPU: sifive-u74
Features: +zbb,+d,+i,+f,+c,+a,+zba,+m,-zvbc,-zksed,-zvfhmin,-zbkc,-zkne,-zksh,-zfh,-zfhmin,-zknh,-v,-zihintpause,-zicboz,-zbs,-zvknha,-zvksed,-zfa,-ztso,-zbc,-zvknhb,-zihintntl,-zknd,-zvbb,-zbkx,-zkt,-zvkt,-zicond,-zvksh,-zvfh,-zvkg,-zvkb,-zbkb,-zvkned


julia> @code_native debuginfo=:none 1+2.
	.text
	.attribute	4, 16
	.attribute	5, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zmmul1p0_zba1p0_zbb1p0"
	.file	"+"
	.globl	"julia_+_3003"
	.p2align	1
	.type	"julia_+_3003",@function
"julia_+_3003":
	addi	sp, sp, -16
	sd	ra, 8(sp)
	sd	s0, 0(sp)
	addi	s0, sp, 16
	fcvt.d.l	fa5, a0
	ld	ra, 8(sp)
	ld	s0, 0(sp)
	fadd.d	fa0, fa5, fa0
	addi	sp, sp, 16
	ret
.Lfunc_end0:
	.size	"julia_+_3003", .Lfunc_end0-"julia_+_3003"

	.type	".L+Core.Float64#3005",@object
	.section	.data.rel.ro,"aw",@progbits
	.p2align	3, 0x0
".L+Core.Float64#3005":
	.quad	".L+Core.Float64#3005.jit"
	.size	".L+Core.Float64#3005", 8

.set ".L+Core.Float64#3005.jit", 272467692544
	.size	".L+Core.Float64#3005.jit", 8
	.section	".note.GNU-stack","",@progbits
```

Lots of bugs guaranteed, but with this we at least have a functional
build and REPL for further development by whoever is interested.

Also requires Linux 6.4+, since the fallback processor detection
used here relies on LLVM's `sys::getHostCPUFeatures`, which for
RISC-V is implemented using hwprobe introduced in 6.4. We could
probably add a fallback that parses `/proc/cpuinfo`, either by building
a CPU database much like how we've done for AArch64, or by parsing the
actual ISA string contained there. That would probably also be a good
place to add support for profiles, which are supposedly the way forward
to package RISC-V binaries. That can happen in follow-up PRs though.
For now, on older kernels, use the `-C` arg to Julia to specify an ISA.

Co-authored-by: Alex Fan <[email protected]>
aviatesk pushed a commit that referenced this pull request Dec 9, 2024
…uliaLang#56300)

The pipeline-prints test currently fails when running on an
aarch64-macos device:

```
/Users/tim/Julia/src/julia/test/llvmpasses/pipeline-prints.ll:309:23: error: AFTERVECTORIZATION: expected string not found in input
; AFTERVECTORIZATION: vector.body
                      ^
<stdin>:2:40: note: scanning from here
; *** IR Dump Before AfterVectorizationMarkerPass on julia_f_199 ***
                                       ^
<stdin>:47:27: note: possible intended match here
; *** IR Dump Before AfterVectorizationMarkerPass on jfptr_f_200 ***
                          ^

Input file: <stdin>
Check file: /Users/tim/Julia/src/julia/test/llvmpasses/pipeline-prints.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             1: opt: WARNING: failed to create target machine for 'x86_64-unknown-linux-gnu': unable to get target for 'x86_64-unknown-linux-gnu', see --version and --triple.
             2: ; *** IR Dump Before AfterVectorizationMarkerPass on julia_f_199 ***
check:309'0                                            X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
             3: define i64 @julia_f_199(ptr addrspace(10) noundef nonnull align 16 dereferenceable(40) %0) #0 !dbg !4 {
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             4: top:
check:309'0     ~~~~~
             5:  %1 = call ptr @julia.get_pgcstack()
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             6:  %ptls_field = getelementptr inbounds ptr, ptr %1, i64 2
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             7:  %ptls_load45 = load ptr, ptr %ptls_field, align 8, !tbaa !8
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             .
             .
             .
            42:
check:309'0     ~
            43: L41: ; preds = %L41.loopexit, %L17, %top
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            44:  %value_phi10 = phi i64 [ 0, %top ], [ %7, %L17 ], [ %.lcssa, %L41.loopexit ]
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            45:  ret i64 %value_phi10, !dbg !52
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            46: }
check:309'0     ~~
            47: ; *** IR Dump Before AfterVectorizationMarkerPass on jfptr_f_200 ***
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:309'1                               ?                                           possible intended match
            48: ; Function Attrs: noinline optnone
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            49: define nonnull ptr addrspace(10) @jfptr_f_200(ptr addrspace(10) %0, ptr noalias nocapture noundef readonly %1, i32 %2) #1 {
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            50: top:
check:309'0     ~~~~~
            51:  %3 = call ptr @julia.get_pgcstack()
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            52:  %4 = getelementptr inbounds ptr addrspace(10), ptr %1, i32 0
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             .
             .
             .
>>>>>>

--

********************
Failed Tests (1):
  Julia :: pipeline-prints.ll
```

The problem is that these tests assume x86_64, which fails because the
target isn't available, so it presumably uses the native target which
has different vectorization characteristics:

```
❯ ./usr/tools/opt --load-pass-plugin=libjulia-codegen.dylib -passes='julia' --print-before=AfterVectorization -o /dev/null ../../test/llvmpasses/pipeline-prints.ll
./usr/tools/opt: WARNING: failed to create target machine for 'x86_64-unknown-linux-gnu': unable to get target for 'x86_64-unknown-linux-gnu', see --version and --triple.
```

There's other tests that assume this (e.g. the `fma` cpufeatures one),
but they don't fail, so I've left them as is.
aviatesk pushed a commit that referenced this pull request Feb 28, 2025
At some point LLVM on MacOS started doing frame pointer optimization by
default. We should ask for a frame pointer on every function, on all
platforms.

Prior to this change, on `1.11.3+0.aarch64.apple.darwin14`:
```
julia> @code_native ((x,y) -> Core.Intrinsics.add_float(x,y))(1.0,2.0)
	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 15, 0
	.globl	"_julia_#1_678"                 ; -- Begin function julia_#1_678
	.p2align	2
"_julia_#1_678":                        ; @"julia_#1_678"
; Function Signature: var"#1"(Float64, Float64)
; ┌ @ REPL[1]:1 within `#1`
; %bb.0:                                ; %top
; │ @ REPL[1] within `#1`
	;DEBUG_VALUE: #1:x <- $d0
	;DEBUG_VALUE: #1:x <- $d0
	;DEBUG_VALUE: #1:y <- $d1
	;DEBUG_VALUE: #1:y <- $d1
; │ @ REPL[1]:1 within `#1`
	fadd	d0, d0, d1
	ret
; └
                                        ; -- End function
	.section	__DATA,__const
	.p2align	3, 0x0                          ; @"+Core.Float64#680"
"l_+Core.Float64#680":
	.quad	"l_+Core.Float64#680.jit"

.set "l_+Core.Float64#680.jit", 5490712608
.subsections_via_symbols
```

Prior to this change, on `1.11.3+0.aarch64.linux.gnu`:
```
julia> @code_native ((x,y) -> Core.Intrinsics.add_float(x,y))(1.0,2.0)
	.text
	.file	"#1"
	.globl	"julia_#1_656"                  // -- Begin function julia_#1_656
	.p2align	2
	.type	"julia_#1_656",@function
"julia_#1_656":                         // @"julia_#1_656"
; Function Signature: var"#1"(Float64, Float64)
; ┌ @ REPL[1]:1 within `#1`
// %bb.0:                               // %top
; │ @ REPL[1] within `#1`
	//DEBUG_VALUE: #1:x <- $d0
	//DEBUG_VALUE: #1:x <- $d0
	//DEBUG_VALUE: #1:y <- $d1
	//DEBUG_VALUE: #1:y <- $d1
	stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
	mov	x29, sp
; │ @ REPL[1]:1 within `#1`
	fadd	d0, d0, d1
	ldp	x29, x30, [sp], JuliaLang#16             // 16-byte Folded Reload
	ret
.Lfunc_end0:
	.size	"julia_#1_656", .Lfunc_end0-"julia_#1_656"
; └
                                        // -- End function
	.type	".L+Core.Float64#658",@object   // @"+Core.Float64#658"
	.section	.rodata,"a",@progbits
	.p2align	3, 0x0
".L+Core.Float64#658":
	.xword	".L+Core.Float64#658.jit"
	.size	".L+Core.Float64#658", 8

.set ".L+Core.Float64#658.jit", 278205186835760
	.size	".L+Core.Float64#658.jit", 8
	.section	".note.GNU-stack","",@progbits
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants