KA currently uses very verbose and explicit dependency management:
```julia
event = kernel(CPU())(...)                          # launch returns an event
event = kernel(CPU())(..., dependencies=(event,))   # events must be threaded through by hand
```
This was added because, at the time, CUDA.jl used a single stream, which made exposing concurrency harder.
Now @maleadt has added a really nice design around task-local streams, which lets users express concurrency on the GPU with ordinary Julia tasks.
So, in the interest of reducing the complexity of using KA and aligning it better with CUDA.jl, I would like to remove the explicit dependency management and move to a stream-based model.
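To illustrate what the user-facing side of a stream-based model could look like, here is a minimal pure-Julia sketch. The GPU kernel launches are stood in for by plain computations; in the proposed model each task would enqueue its work onto its own task-local stream, so ordering within a task is implicit and `@sync` replaces explicit event dependencies.

```julia
# Sketch only: plain computations stand in for kernel launches.
# In the proposed model, each spawned task would get its own task-local
# stream, so the two "launches" below could overlap on the device.
results = Vector{Int}(undef, 2)
@sync begin
    Threads.@spawn results[1] = sum(1:100)     # would enqueue on task 1's stream
    Threads.@spawn results[2] = sum(101:200)   # would enqueue on task 2's stream
end
# @sync joins both tasks; no `dependencies=(event,)` plumbing is needed.
```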
One open question is how to deal with the CPU backend (though this could simply mean moving to synchronous execution there, which would reduce latency as well).
An alternative that I see is to explore a more implicit dependency model based on the arguments to the kernel; I think that would be similar to SYCL or to what AMDGPU currently does.
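As a toy model of what argument-based dependency tracking might look like (all names here, `launch!` and `last_writer` included, are hypothetical, and real backends would use GPU events rather than integers), the runtime could remember the last event that touched each array and wait on it automatically:

```julia
# Toy model of implicit, argument-based dependency tracking.
# "Events" are plain counters here; a real backend would record GPU events.
last_writer = IdDict{Any,Int}()   # array => event that last touched it
next_event  = Ref(0)

function launch!(f, args...)
    # Collect the events this launch implicitly depends on.
    deps = [last_writer[a] for a in args if haskey(last_writer, a)]
    # A real runtime would pass `dependencies = Tuple(deps)` to the launch;
    # here we just run the "kernel" directly.
    ev = (next_event[] += 1)
    f(args...)
    for a in args
        last_writer[a] = ev       # conservatively treat every argument as written
    end
    return ev, deps
end

A = zeros(4); B = zeros(4)
ev1, d1 = launch!(a -> (a .= 1), A)                 # no prior writers
ev2, d2 = launch!((a, b) -> (b .= a .+ 1), A, B)    # depends on ev1 through A
```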
This would be the first step towards KA 1.0
CC interested parties: @glwagner @lcw @jpsamaroo @simonbyrne @kpamnany @omlins