Conversation

@christiangnrd
Member

@bvdmitri Can you check that this fixes your issue?

Closes #2903

@github-actions github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 30c5352 | Previous: bb88163 | Ratio |
|---|---|---|---|
| latency/precompile | 57177092254 ns | 57317397112 ns | 1.00 |
| latency/ttfp | 8163795201.5 ns | 8147713210 ns | 1.00 |
| latency/import | 4523409627 ns | 4526239162 ns | 1.00 |
| integration/volumerhs | 9616953.5 ns | 9611180 ns | 1.00 |
| integration/byval/slices=1 | 146915 ns | 147188 ns | 1.00 |
| integration/byval/slices=3 | 425906 ns | 426028 ns | 1.00 |
| integration/byval/reference | 145016 ns | 145313 ns | 1.00 |
| integration/byval/slices=2 | 286368 ns | 286701.5 ns | 1.00 |
| integration/cudadevrt | 103608 ns | 103791 ns | 1.00 |
| kernel/indexing | 14214 ns | 14403 ns | 0.99 |
| kernel/indexing_checked | 15045 ns | 15312 ns | 0.98 |
| kernel/occupancy | 672.6582278481013 ns | 669.5886075949367 ns | 1.00 |
| kernel/launch | 2220.8888888888887 ns | 2207.5555555555557 ns | 1.01 |
| kernel/rand | 14822 ns | 15091 ns | 0.98 |
| array/reverse/1d | 20317 ns | 20158 ns | 1.01 |
| array/reverse/2dL_inplace | 66973.5 ns | 67142.5 ns | 1.00 |
| array/reverse/1dL | 70557 ns | 70245 ns | 1.00 |
| array/reverse/2d | 21952 ns | 22186 ns | 0.99 |
| array/reverse/1d_inplace | 9734 ns | 9826 ns | 0.99 |
| array/reverse/2d_inplace | 11155 ns | 13581 ns | 0.82 |
| array/reverse/2dL | 73918.5 ns | 74246 ns | 1.00 |
| array/reverse/1dL_inplace | 66887 ns | 66920 ns | 1.00 |
| array/copy | 20876 ns | 20927 ns | 1.00 |
| array/iteration/findall/int | 157196 ns | 157946 ns | 1.00 |
| array/iteration/findall/bool | 139865 ns | 138951 ns | 1.01 |
| array/iteration/findfirst/int | 161472 ns | 160970.5 ns | 1.00 |
| array/iteration/findfirst/bool | 162192 ns | 161691.5 ns | 1.00 |
| array/iteration/scalar | 72824 ns | 74056.5 ns | 0.98 |
| array/iteration/logical | 215807 ns | 216360.5 ns | 1.00 |
| array/iteration/findmin/1d | 50744 ns | 50590 ns | 1.00 |
| array/iteration/findmin/2d | 96457 ns | 96901 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 43428 ns | 43650 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 55108 ns | 44628 ns | 1.23 |
| array/reductions/reduce/Int64/dims=2 | 61402 ns | 61474 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 89003 ns | 89082 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 87870 ns | 87873 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 36672 ns | 37502.5 ns | 0.98 |
| array/reductions/reduce/Float32/dims=1 | 46136.5 ns | 42229.5 ns | 1.09 |
| array/reductions/reduce/Float32/dims=2 | 59577 ns | 60144 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 52448 ns | 52690 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 71869 ns | 72227.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 43769 ns | 43386 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1 | 44642.5 ns | 49211.5 ns | 0.91 |
| array/reductions/mapreduce/Int64/dims=2 | 61685.5 ns | 61803 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 89029 ns | 89097 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 88149.5 ns | 88388 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 36674 ns | 38045 ns | 0.96 |
| array/reductions/mapreduce/Float32/dims=1 | 42398 ns | 42251.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 60172 ns | 60265 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52809 ns | 52870 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 72225.5 ns | 72214 ns | 1.00 |
| array/broadcast | 20138 ns | 20220 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 11333 ns | 13230 ns | 0.86 |
| array/copyto!/cpu_to_gpu | 216090 ns | 215637 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 285576 ns | 283097 ns | 1.01 |
| array/accumulate/Int64/1d | 125094 ns | 124870 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83637 ns | 83478 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 157733 ns | 157866 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1709172.5 ns | 1708781.5 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 966298.5 ns | 966771.5 ns | 1.00 |
| array/accumulate/Float32/1d | 109360 ns | 109240 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 80190.5 ns | 80433 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 147574 ns | 147663 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1618271.5 ns | 1617944.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 698216 ns | 698274 ns | 1.00 |
| array/construct | 1285.8 ns | 1301.7 ns | 0.99 |
| array/random/randn/Float32 | 45307.5 ns | 45481 ns | 1.00 |
| array/random/randn!/Float32 | 25085 ns | 25068 ns | 1.00 |
| array/random/rand!/Int64 | 27391 ns | 27506 ns | 1.00 |
| array/random/rand!/Float32 | 8819 ns | 8985.666666666666 ns | 0.98 |
| array/random/rand/Int64 | 29957 ns | 30362 ns | 0.99 |
| array/random/rand/Float32 | 13203 ns | 13368.5 ns | 0.99 |
| array/permutedims/4d | 60161 ns | 60223.5 ns | 1.00 |
| array/permutedims/2d | 53815 ns | 54018.5 ns | 1.00 |
| array/permutedims/3d | 54707 ns | 54770.5 ns | 1.00 |
| array/sorting/1d | 2757932 ns | 2758706 ns | 1.00 |
| array/sorting/by | 3344898.5 ns | 3345315.5 ns | 1.00 |
| array/sorting/2d | 1080629 ns | 1082259 ns | 1.00 |
| cuda/synchronization/stream/auto | 1061.2 ns | 1038.3 ns | 1.02 |
| cuda/synchronization/stream/nonblocking | 7540.4 ns | 8063.700000000001 ns | 0.94 |
| cuda/synchronization/stream/blocking | 827.8958333333334 ns | 818 ns | 1.01 |
| cuda/synchronization/context/auto | 1177.3 ns | 1188.9 ns | 0.99 |
| cuda/synchronization/context/nonblocking | 7189.6 ns | 8084.1 ns | 0.89 |
| cuda/synchronization/context/blocking | 909.7777777777778 ns | 908.8913043478261 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@maleadt
Member

maleadt commented Oct 2, 2025

Hmm, this is why I wanted to avoid duplicating the whole launch configuration determination. It wasn't clear to me what exactly we could re-use from the previous computations, and it's a lot of code to copy/paste.

Let's hope this fixes the issue.
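For context, the "launch configuration determination" here is CUDA.jl's documented occupancy-API idiom: compile with `@cuda launch=false`, ask `launch_configuration` for a good thread count, and derive the block count from it. A minimal sketch of that pattern follows; the `vadd!` kernel and array sizes are purely illustrative, not the mapreduce code this PR touches:

```julia
using CUDA

function vadd!(c, a, b)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return
end

a = CUDA.rand(Float32, 1_000_000)
b = CUDA.rand(Float32, 1_000_000)
c = similar(a)

# Compile without launching, then let the occupancy API pick the
# configuration instead of hard-coding threads/blocks.
kernel = @cuda launch=false vadd!(c, a, b)
config = launch_configuration(kernel.fun)
threads = min(length(c), config.threads)
blocks = cld(length(c), threads)
kernel(c, a, b; threads, blocks)
```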

@bvdmitri

bvdmitri commented Oct 2, 2025

I will test it on our cluster as soon as it's done with other computations; I don't want to interrupt it while it's using pretty much 100% of the GPU.

@christiangnrd
Member Author

> this is why I wanted to avoid duplicating the whole launch configuration determination

Yeah this was a nasty one.

I know your policy is to only do the work if someone asks, but I'm wondering if this should also be backported and released as 5.8.5, since it's a fix for a fix in 5.8.4. With your go-ahead, once this is merged, I'll do all the backport work I can since this is my bug (everything but merging the backport PR).

I've opened #2909 to hopefully resolve the 1.11 failures.

@vchuravy
Member

vchuravy commented Oct 2, 2025

Yes we should backport if possible.

@bvdmitri

bvdmitri commented Oct 3, 2025

As far as I can tell the issue has been resolved; I haven't been able to reproduce it for a while. Thanks! Do you mind adding a couple of tests to reduce the probability of it happening again in the future? I couldn't easily find a single test for sum(...; dims = 1), so even a basic test (not sure how to test this particular bug, to be honest) would be a nice improvement. Something like the sketch below is what I have in mind; the shapes are arbitrary, and I don't know which sizes actually trigger the corruption:
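
```julia
using CUDA, Test

# Basic CPU-vs-GPU agreement check for dims=1 reductions; the shapes
# here are arbitrary, not ones known to trigger the original bug.
for (m, n) in ((10, 10), (1000, 1000), (1, 4096))
    a = rand(Float32, m, n)
    @test Array(sum(CuArray(a); dims=1)) ≈ sum(a; dims=1)
end
```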

@maleadt
Member

maleadt commented Oct 3, 2025

> I couldn't easily find a single test for sum(...; dims = 1)

Plenty of tests here: https://github.com/JuliaGPU/GPUArrays.jl/blob/3be4a0978f643b2322c4574f1c7d48722ef43eed/test/testsuite/reductions.jl#L101-L114
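
Those tests follow a `compare`-style pattern: run the same reduction on a plain `Array` and on the GPU array type, then check the results agree. A self-contained approximation is shown below; the `compare` helper here only mirrors the idea, while the real helper lives in the GPUArrays test suite linked above:

```julia
using CUDA, Test

# Stand-in for the testsuite's compare helper: apply `f` on a plain
# Array and on the GPU array type `AT`, then compare the results.
compare(f, AT, x) = Array(f(AT(x))) ≈ f(x)

@test compare(A -> sum(A; dims=1), CuArray, rand(Float32, 128, 64))
@test compare(A -> sum(A; dims=2), CuArray, rand(Float32, 128, 64))
```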

@maleadt maleadt added the bugfix This gets something working again. label Oct 3, 2025
@maleadt maleadt merged commit 3d278b6 into JuliaGPU:master Oct 3, 2025
2 of 3 checks passed
@christiangnrd christiangnrd deleted the patch-2 branch October 3, 2025 10:33
maleadt pushed a commit to christiangnrd/CUDA.jl that referenced this pull request Oct 3, 2025
maleadt pushed a commit that referenced this pull request Oct 7, 2025

Labels

bugfix This gets something working again.

Development

Successfully merging this pull request may close these issues.

Memory corruption in sum(...; dims = 1)

4 participants