

maleadt (Member) commented Sep 1, 2025

Requiring multiple launches probably made this optimization not worth it anyway, and it introduces complexities with respect to the launch configuration: the kernel has to be recompiled and the size of the partial reduction re-computed.

Should fix #2863
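To make the trade-off above concrete, here is a generic sketch in plain Python of a two-launch reduction. It simulates thread blocks on the CPU and is not CUDA.jl's actual `mapreducedim!` code; the function names `block_reduce` and `two_pass_sum` are hypothetical. The first launch leaves one partial result per block, so the size of the partial buffer is tied to the grid size chosen for that launch, and a second launch is needed to combine the partials.

```python
import math


def block_reduce(data, start, stop):
    # Hypothetical stand-in for one thread block reducing its slice
    # (on a GPU this would be a shared-memory tree reduction).
    return sum(data[start:stop])


def two_pass_sum(data, threads_per_block):
    n = len(data)
    # The number of blocks, and therefore the size of the partial-result
    # buffer, depends on the launch configuration chosen for the first pass.
    num_blocks = math.ceil(n / threads_per_block)

    # First "launch": each block writes exactly one partial result.
    partials = [
        block_reduce(data, b * threads_per_block, min((b + 1) * threads_per_block, n))
        for b in range(num_blocks)
    ]

    # Second "launch": a single block combines all partial results.
    total = block_reduce(partials, 0, len(partials))
    return total, len(partials)


if __name__ == "__main__":
    # 10 elements with 4 threads per block -> 3 partials, total 45.
    print(two_pass_sum(list(range(10)), 4))
```

Changing `threads_per_block` changes how many partials the first pass produces, which is the coupling between launch configuration and partial-reduction size that the description refers to.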

ali-ramadhan commented:

Thank you again for the fix @maleadt! Would it be possible to get a new tagged release (v5.8.4?)? This would fix a lot of simulation scripts.

maleadt (Member, Author) commented Sep 3, 2025

I'll create a backport release.

@maleadt maleadt merged commit 5f1ef7d into master Sep 3, 2025
2 of 3 checks passed
@maleadt maleadt deleted the tb/mapreduce branch September 3, 2025 06:49
maleadt added a commit that referenced this pull request Sep 3, 2025
christiangnrd (Member) commented:

This seems to cause a pretty nasty performance regression for reductions that used the optimizations.


maleadt (Member, Author) commented Sep 9, 2025

Thanks for catching this. There was a report on Slack as well.

kshyatt pushed a commit to christiangnrd/CUDA.jl that referenced this pull request Sep 9, 2025
christiangnrd added a commit to christiangnrd/CUDA.jl that referenced this pull request Sep 10, 2025
maleadt added a commit that referenced this pull request Sep 22, 2025


Development

Successfully merging this pull request may close these issues.

Invalid kernel config generated by mapreducedim! with SubArray input and output

4 participants