@vtjnash
Member

vtjnash commented Jan 22, 2024

Ensures that only one update can occur at a time, by adding a lock around it, and that the acquire-release on the world age is sequenced after all of the invalidations to the caches, by updating the world counter last.
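
Here is a minimal C11 sketch of the ordering described above. It rests on some assumptions: `world_counter`, `update_lock`, `cache_entry`, and `add_method_and_invalidate` are hypothetical names rather than Julia's internals, and every entry passed in is treated as affected. The writer holds a mutex for the whole update and performs all cache invalidations first. It then publishes the new world age last, with a release store. Any thread that acquire-loads the counter and sees the new value is therefore guaranteed to also see the invalidations.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the global world counter and a cache entry. */
static _Atomic size_t world_counter = 1;
static pthread_mutex_t update_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct {
    _Atomic size_t min_world;
    _Atomic size_t max_world; /* SIZE_MAX means "still valid" */
} cache_entry;

/* Invalidate the given entries, then bump the world age; returns the new world. */
size_t add_method_and_invalidate(cache_entry *entries, size_t n)
{
    pthread_mutex_lock(&update_lock); /* only one update at a time */
    /* Only writers modify the counter and they all hold the lock,
       so a relaxed load is sufficient here. */
    size_t world = atomic_load_explicit(&world_counter, memory_order_relaxed);
    for (size_t i = 0; i < n; i++) {
        /* Each entry stops being valid after the current world. */
        atomic_store_explicit(&entries[i].max_world, world, memory_order_relaxed);
    }
    /* Publish last: the release store orders every invalidation above
       before the new counter value becomes visible to acquire loads. */
    atomic_store_explicit(&world_counter, world + 1, memory_order_release);
    pthread_mutex_unlock(&update_lock);
    return world + 1;
}
```

Bumping the counter inside the critical section, after the invalidation loop, is what provides the "updating it last" property. Readers never need the lock, only an acquire load of the counter.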

It should now be safe to add methods in parallel, concurrently with running code. However, there are still no locks here to ensure that only one module is deserialized at a time, which means that parallel require calls are still unsafe. They are also unsafe because the loading.jl code is currently a thread-safety disaster zone.

As future work, all of the method insertion and (separately) the validation/insertion work could run in parallel after loading an image. Each individual part of that work has its own fine-grained lock, and a big lock ensures that only one module at a time can load methods.
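
The two-level locking mentioned above could take roughly the following shape. This is a hypothetical C sketch: `image_load_lock`, `method_table`, and `load_image_methods` are illustrative names, not Julia's code. An outer lock serializes image loads. Worker threads inside one load take only the fine-grained per-table locks. Because the outer lock is always acquired before any inner one, the lock order stays consistent and cannot deadlock.

```c
#include <pthread.h>
#include <stddef.h>

#define NTABLES 4

/* Hypothetical method table guarded by its own fine-grained lock. */
typedef struct {
    pthread_mutex_t lock;
    size_t nmethods;
} method_table;

static method_table tables[NTABLES] = {
    {PTHREAD_MUTEX_INITIALIZER, 0}, {PTHREAD_MUTEX_INITIALIZER, 0},
    {PTHREAD_MUTEX_INITIALIZER, 0}, {PTHREAD_MUTEX_INITIALIZER, 0},
};

/* Big lock: only one image (module) loads methods at a time. */
static pthread_mutex_t image_load_lock = PTHREAD_MUTEX_INITIALIZER;

static void insert_method(method_table *t)
{
    pthread_mutex_lock(&t->lock); /* fine-grained: protects one table */
    t->nmethods++;
    pthread_mutex_unlock(&t->lock);
}

typedef struct { size_t first, count; } work_range;

/* Worker: inserts methods for one slice of the image, round-robin over tables. */
static void *insert_worker(void *arg)
{
    work_range *r = arg;
    for (size_t i = r->first; i < r->first + r->count; i++)
        insert_method(&tables[i % NTABLES]);
    return NULL;
}

/* Load one image's n methods using two parallel workers (errors ignored). */
void load_image_methods(size_t n)
{
    pthread_mutex_lock(&image_load_lock); /* outer lock, always taken first */
    work_range halves[2] = {{0, n / 2}, {n / 2, n - n / 2}};
    pthread_t th[2];
    for (int k = 0; k < 2; k++)
        pthread_create(&th[k], NULL, insert_worker, &halves[k]);
    for (int k = 0; k < 2; k++)
        pthread_join(th[k], NULL);
    pthread_mutex_unlock(&image_load_lock);
}
```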

The min/max fields are always written behind a lock, but they are not usually read behind a lock, so add relaxed atomic markers to all uses of those fields to keep data-race analysis tools happy. These fields are never compared with a value greater than jl_atomic_load_acquire(&jl_world_counter), so the results are consistent.
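
The reader side of that argument can be sketched in C11 as follows. The names are hypothetical, and the sketch assumes that writers publish the counter with a release store after their invalidations. The reader snapshots the world with an acquire load. It then reads `min_world`/`max_world` with relaxed loads and compares them only against that snapshot (or an older world). The acquire load keeps the later relaxed loads ordered after it, so they cannot miss invalidations that were made before the observed counter value was published.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical global counter; writers update it with a release store. */
static _Atomic size_t world_counter = 1;

typedef struct {
    _Atomic size_t min_world; /* written only while holding the update lock */
    _Atomic size_t max_world;
} cache_entry;

/* Lock-free read: is e valid in world w? Callers must pass a w no greater
   than a value they obtained from an acquire load of world_counter. */
static bool entry_valid_in(cache_entry *e, size_t w)
{
    size_t lo = atomic_load_explicit(&e->min_world, memory_order_relaxed);
    size_t hi = atomic_load_explicit(&e->max_world, memory_order_relaxed);
    return lo <= w && w <= hi;
}

/* Typical use: snapshot the current world with acquire, then query. */
bool entry_valid_now(cache_entry *e)
{
    size_t w = atomic_load_explicit(&world_counter, memory_order_acquire);
    return entry_valid_in(e, w);
}
```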

vtjnash added the multithreading label (Base.Threads and related functionality) Jan 22, 2024
vtjnash force-pushed the jn/lock-method-add branch from f5e0c2e to b060560 January 22, 2024 02:31
vtjnash force-pushed the jn/lock-method-add branch from b060560 to 7cc03c5 January 24, 2024 16:22
vtjnash force-pushed the jn/lock-method-add branch from 7cc03c5 to bbe0c93 January 26, 2024 21:44
vtjnash added the merge me label (PR is reviewed. Merge when all tests are passing) Jan 26, 2024
Ensures that only one update can occur simultaneously, and that the
acquire-release on the world-age will be sequenced after all of the
invalidations to the caches.

It should now be safe to add methods in parallel, concurrently with
running code. However, there are still no locks here to ensure that only
one module is deserialized at a time, which means that parallel module
loading is still unsafe.

As future work, all of the method inserting and (separately) the
invalidation work could run in parallel after loading an image, since
there is a fine-grained lock on each individual part of that, as well as
the big lock so that only one module at a time can load methods.

The min/max are always written behind a lock, but they are not usually
read from behind a lock, so add relaxed atomic markers to make any
analyzers happy.
vtjnash force-pushed the jn/lock-method-add branch from bbe0c93 to 98a1269 January 27, 2024 00:23
vtjnash merged commit 6ce6d31 into master Jan 27, 2024
vtjnash deleted the jn/lock-method-add branch January 27, 2024 17:45
inkydragon removed the merge me label (PR is reviewed. Merge when all tests are passing) Jan 27, 2024
@maleadt
Member

maleadt commented Jan 29, 2024

Daily PkgEval started failing on GeoParquet.jl and SubpixelRegistration.jl; I haven't bisected or reduced, but this PR seems related:

ERROR: LoadError: The following 1 direct dependency failed to precompile:

GeoParquet [e99870d8-ce00-4fdd-aeee-e09192881159]

Failed to precompile GeoParquet [e99870d8-ce00-4fdd-aeee-e09192881159] to "/home/pkgeval/.julia/compiled/v1.11/GeoParquet/jl_XZQ5OP".
julia: /source/src/staticdata_utils.c:1215: jl_insert_backedges: Assertion `__extension__ ({ __auto_type __atomic_load_ptr = (&codeinst->max_world); __typeof__ (*__atomic_load_ptr) __atomic_load_tmp; __atomic_load (__atomic_load_ptr, &__atomic_load_tmp, (memory_order_relaxed)); __atomic_load_tmp; }) == WORLD_AGE_REVALIDATION_SENTINEL' failed.

[33] signal 6 (-6): Aborted
in expression starting at /home/pkgeval/.julia/packages/GeoParquet/vfcaG/src/GeoParquet.jl:3
unknown function (ip: 0x7fe388265d3c)
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7fe388201394)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
jl_insert_backedges at /source/src/staticdata_utils.c:1215 [inlined]
jl_restore_package_image_from_stream at /source/src/staticdata.c:3627
ijl_restore_incremental at /source/src/staticdata.c:3680
_include_from_serialized at ./loading.jl:1131
_include_from_serialized at ./loading.jl:1103 [inlined]
#_require_search_from_serialized#1037 at ./loading.jl:1690
_require_search_from_serialized at ./loading.jl:1645

https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_date/2024-01/27/GeoParquet.primary.log
https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_date/2024-01/27/SubpixelRegistration.primary.log

rr traces included

aviatesk added a commit to aviatesk/JET.jl that referenced this pull request Jan 29, 2024
aviatesk added a commit to aviatesk/JET.jl that referenced this pull request Jan 29, 2024
@vtjnash
Member Author

vtjnash commented Jan 29, 2024

adding this assert catches the mistake even earlier:

diff --git a/src/staticdata_utils.c b/src/staticdata_utils.c
index 199724e54a..a2c213e03f 100644
--- a/src/staticdata_utils.c
+++ b/src/staticdata_utils.c
@@ -1162,6 +1162,7 @@ static void jl_insert_backedges(jl_array_t *edges, jl_array_t *ext_targets, jl_a
         jl_code_instance_t *ci = (jl_code_instance_t*)jl_array_ptr_ref(ext_ci_list, i);
         if (jl_atomic_load_relaxed(&ci->max_world) == WORLD_AGE_REVALIDATION_SENTINEL) {
             assert(jl_atomic_load_relaxed(&ci->min_world) == minworld);
+            assert(ptrhash_get(&cis_pending_validation, (void*)ci->def) == HT_NOTFOUND);
             ptrhash_put(&cis_pending_validation, (void*)ci->def, (void*)ci);
         }
         else {

Looks like there are 2 nearly identical CodeInstance objects for this MethodInstance, which this part of the code does not handle.
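
The added assertion matters because a normal put replaces any existing value. Assuming `ptrhash_put` behaves that way, as hash-table puts usually do, a second CodeInstance for the same `def` would silently overwrite the first. The toy C map below is a hypothetical stand-in, not Julia's `ptrhash` implementation. It performs the same check as the new `assert`, but reports the duplicate instead of overwriting.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_CAP 64 /* toy capacity; must be a power of two */

/* Minimal pointer-keyed map with linear probing (illustrative only). */
typedef struct {
    const void *keys[MAP_CAP];
    void *vals[MAP_CAP];
} ptrmap;

static size_t slot_of(const void *key)
{
    uintptr_t h = (uintptr_t)key;
    h ^= h >> 17; /* mix the pointer bits a little */
    return (size_t)(h * 0x9E3779B97F4A7C15ull) & (MAP_CAP - 1);
}

/* Returns the value stored for key, or NULL if the key is absent. */
void *ptrmap_get(const ptrmap *m, const void *key)
{
    for (size_t i = slot_of(key), n = 0; n < MAP_CAP; i = (i + 1) & (MAP_CAP - 1), n++) {
        if (m->keys[i] == key) return m->vals[i];
        if (m->keys[i] == NULL) return NULL;
    }
    return NULL;
}

/* Insert only if absent; returns false (map unchanged) on a duplicate key
   or when the map is full. */
bool ptrmap_put_new(ptrmap *m, const void *key, void *val)
{
    for (size_t i = slot_of(key), n = 0; n < MAP_CAP; i = (i + 1) & (MAP_CAP - 1), n++) {
        if (m->keys[i] == key) return false; /* duplicate: a plain put would overwrite */
        if (m->keys[i] == NULL) {
            m->keys[i] = key;
            m->vals[i] = val;
            return true;
        }
    }
    return false; /* full */
}
```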

(rr) p jl_(ci)
0x7f8ec4834b70 = Core.CodeInstance(def=getindex(WeakRefStrings.StringArray{Union{Base.Missing, 𝒯}, 1} where 𝒯, Int64) from getindex(WeakRefStrings.StringArray{T, N} where N, Integer...) where {T}, next=#<null>, min_world=0x0000000000006639, max_world=0x0000000000000001, rettype=Any, exctype=Any, rettype_const=#<null>, inferred="\x13\x00\x00\x0f\x04\x03\x00\x00\x00\x00\x08\x08\x16+@\x00\xe1…", ipo_purity_bits=0x00000509, purity_bits=0x00000509, analysis_results=Core.Compiler.AnalysisResults(result=nothing, next=#<null>), isspecsig=false, precompile=false, relocatability=0x01, invoke=0x0000000000000000, specptr=0x0000000000000000)
$5 = void
(rr) p jl_(&$4)
0x7f8ec4836070 = Core.CodeInstance(def=getindex(WeakRefStrings.StringArray{Union{Base.Missing, 𝒯}, 1} where 𝒯, Int64) from getindex(WeakRefStrings.StringArray{T, N} where N, Integer...) where {T}, next=#<null>, min_world=0x0000000000006639, max_world=0x0000000000000001, rettype=Any, exctype=Any, rettype_const=#<null>, inferred="\x13\x00\x00\x0f\x04\x03\x00\x00\x00\x00\x08\x08\x16+@\x00\xe1…", ipo_purity_bits=0x00000509, purity_bits=0x00000509, analysis_results=Core.Compiler.AnalysisResults(result=nothing, next=#<null>), isspecsig=false, precompile=false, relocatability=0x01, invoke=0x0000000000000000, specptr=0x0000000000000000)

And if we look this up in edges, we see that it also appears twice, with identical edges:

  Array{Int32, 1}(dims=(13,), mem=Memory{Int32}(13, 0x7f8ec50d6870)[7, 11968, 5022, 536, 2423, 11969, 11970, 318, 8572, 19952, 18693, 22915, 18383]),
  Array{Int32, 1}(dims=(13,), mem=Memory{Int32}(13, 0x7f8ec4f08510)[7, 11968, 16477, 536, 2423, 11969, 11970, 318, 8572, 19952, 18693, 22915, 18383]),

@vtjnash
Member Author

vtjnash commented Jan 29, 2024

Looks like what happened is that, during serialization of Parquet2, there were indeed 2 distinct MethodInstance objects:

getindex(WeakRefStrings.StringArray{Union{Base.Missing, 𝒯}, 1} where 𝒯, Int64)
getindex(WeakRefStrings.StringArray{𝒯, 1} where Base.Missing<:𝒯<:Any, Int64)

But during deserialization of it, these got merged into a single object.

So basically, it is a version of this bug:

julia> A = Tuple{Int, Val{Union{Base.Missing, 𝒯}} where 𝒯, Int}
Tuple{Int64, Val{Union{Missing, 𝒯}} where 𝒯, Int64}

julia> B = Tuple{Int, Val{𝒯}, Int} where Missing<:𝒯<:Any
Tuple{Int64, Val{𝒯}, Int64} where 𝒯>:Missing

julia> hash(A)
0x77cfa1eef01bca90

julia> hash(B)
0x3e21480671c6d257

julia> A == B
true

Though this is also possibly a case where deserialization should just accept and ignore this bug.
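
The REPL output above violates the contract that hash tables rely on: `a == b` must imply `hash(a) == hash(b)`. The toy C model below is hypothetical. `ty`, `ty_equal`, and the two hash functions are illustrative and do not reflect Julia's type representation. In the model, two spellings of one type compare equal, but the buggy hash mixes in the spelling. A hash that uses only the canonical form that equality compares restores the contract.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one type written two ways (hypothetical, not Julia's types):
     form 0:  Val{Union{Missing, T}} where T
     form 1:  Val{T} where T >: Missing
   Both spellings denote the same type, so they must compare equal. */
typedef struct {
    int form;
    int param; /* stands in for everything else in the type */
} ty;

/* Semantic equality ignores the syntactic form. */
static bool ty_equal(ty a, ty b) { return a.param == b.param; }

/* Buggy hash: mixes in the syntactic form, so equal types can hash apart. */
static uint64_t ty_hash_buggy(ty t) { return (uint64_t)t.param * 31u + (uint64_t)t.form; }

/* Fixed hash: uses only what equality uses (the canonical part). */
static uint64_t ty_hash_fixed(ty t) { return (uint64_t)t.param * 31u; }

/* The contract every hash table relies on: a == b implies hash(a) == hash(b). */
bool contract_holds(uint64_t (*hash)(ty), ty a, ty b)
{
    return !ty_equal(a, b) || hash(a) == hash(b);
}
```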

@fonsp
Member

fonsp commented Feb 7, 2024

This causes a test failure in Pluto.jl; I wonder why this was not caught by PkgEval? Running pkg> test Pluto on nightly will fail. (Luckily it's an easy fix for us.)

vchuravy added a commit to JuliaGPU/GPUCompiler.jl that referenced this pull request Feb 8, 2024
@fonsp
Copy link
Member

fonsp commented Feb 20, 2024

Does someone know why the failing Pluto tests were not caught by PkgEval? It looks like other packages were checked (#52997 (comment)).

@vtjnash
Member Author

vtjnash commented Feb 20, 2024

It can be a somewhat stochastic failure, since it depends on the specific order in which hashes are inserted into the dictionary, which can depend on the memory layout.
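
To illustrate why such a failure can depend on insertion order, here is a toy C set that uses linear probing. Its keys compare case-insensitively but hash case-sensitively, which serves as an assumed stand-in for the `A == B` / `hash(A) != hash(B)` pair above. Whether the two equal keys "A" and "a" get merged depends only on what was inserted before them.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

#define CAP 31 /* toy capacity, no resizing */

/* Case-sensitive hash: a plain byte sum, reduced to a slot index. */
static size_t bad_hash(const char *s)
{
    size_t h = 0;
    for (; *s; s++) h += (unsigned char)*s;
    return h % CAP;
}

/* Case-insensitive equality: "A" and "a" are equal but hash differently. */
static bool eq_nocase(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return false;
    return *a == *b; /* both strings must end together */
}

typedef struct { const char *slots[CAP]; size_t count; } strset;

/* Linear-probing insert; returns true if s was added as a new element. */
bool strset_insert(strset *set, const char *s)
{
    for (size_t n = 0, i = bad_hash(s); n < CAP; n++, i = (i + 1) % CAP) {
        if (set->slots[i] == NULL) { set->slots[i] = s; set->count++; return true; }
        if (eq_nocase(set->slots[i], s)) return false; /* found an "equal" key */
    }
    return false; /* table full */
}
```

Suppose "A", "a", and then a backtick are inserted, in that order. The set ends up holding both "A" and "a". If the backtick is inserted first, it occupies the home slot of "A", which pushes "A" into the slot where "a" hashes. The later "a" then finds "A" there and is merged.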

@fonsp
Member

fonsp commented Feb 20, 2024

Our case was not stochastic, we did setfield!(method, primary_world, 1) which throws on nightly. I'm just wondering because we often seem to get issues on nightly that we didn't get notified about during the development.

@vtjnash
Member Author

vtjnash commented Feb 20, 2024

Mutating that field is undefined behavior because it changes internal state, corrupting the internal invariants regarding the program representation and inference, so it is not expected to be safe.

@fonsp
Member

fonsp commented Feb 20, 2024

After this PR was merged, the Pluto tests failed with:

ERROR: LoadError: ConcurrencyViolationError("setfield!: atomic field cannot be written non-atomically")

but this was not caught by PkgEval and I'm wondering why. We fixed it afterwards in fonsp/Pluto.jl#2807.


Labels

multithreading Base.Threads and related functionality


7 participants