29 changes: 27 additions & 2 deletions sycl/doc/design/CommandGraph.md
@@ -133,8 +133,8 @@ data.
Implementation of [UR command-buffers](#UR-command-buffer-experimental-feature)
for each of the supported SYCL 2020 backends.

Currently, the Level Zero and CUDA backends are implemented.
More sub-sections will be added here as other backends are supported.

### Level Zero

@@ -215,3 +215,28 @@ Level Zero:

Future work will include exploring L0 API extensions to improve the mapping of
UR command-buffers to L0 command-lists.

### CUDA

The SYCL Graph CUDA backend relies on the
[CUDA Graphs feature](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graphs),
which is the CUDA public API for batching a series of operations,
such as kernel launches, connected by dependencies.

UR commands (e.g. kernels) are mapped to graph nodes using the
[CUDA Driver API](https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__GRAPH.html#group__CUDA__GRAPH).
The CUDA Driver API is preferred over the CUDA Runtime API to implement
the SYCL Graph backend to remain consistent with the other UR functions of
the CUDA adapter, which also use the Driver API.
Synchronization between commands (UR sync-points) is implemented
using graph dependencies.
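
To illustrate how sync-points can be expressed as graph dependencies, here is a
minimal, backend-agnostic C++ sketch. `CommandBuffer`, `appendCommand`, `Node`
and `SyncPoint` are hypothetical names, not the UR or CUDA API: each recorded
command becomes a node, the returned sync-point is simply that node's index,
and the sync-points a command waits on become its incoming edges.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a UR sync-point: the index of the graph node
// created for a recorded command.
using SyncPoint = std::size_t;

struct Node {
  std::string name;            // illustrative label, e.g. "kernel A"
  std::vector<SyncPoint> deps; // nodes that must complete before this one
};

// Hypothetical command-buffer that records commands as nodes of a DAG.
class CommandBuffer {
public:
  // Records a command depending on the given sync-points and returns the
  // sync-point identifying the newly created node.
  SyncPoint appendCommand(const std::string &name,
                          const std::vector<SyncPoint> &waitList) {
    for (SyncPoint dep : waitList) {
      // A command may only depend on previously recorded commands,
      // which keeps the recorded graph acyclic.
      assert(dep < nodes_.size() && "unknown sync-point");
    }
    nodes_.push_back({name, waitList});
    return nodes_.size() - 1;
  }

  const std::vector<Node> &nodes() const { return nodes_; }

private:
  std::vector<Node> nodes_;
};
```

Recording kernel A, then kernels B and C each waiting on A's sync-point, then
kernel D waiting on both, yields a diamond-shaped graph. In the CUDA adapter
the equivalent step would pass the dependency nodes to a Driver API call such
as `cuGraphAddKernelNode`, which takes an array of dependency nodes.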

Executable CUDA Graphs can be submitted to a CUDA stream
in the same way as regular kernels.
The CUDA backend allows a stream to wait on events before executing
subsequent work, and allows the completion of a submission to be signaled
with an event.
Therefore, submitting a UR command-buffer only consists of submitting to a
stream the executable CUDA Graph that represents its series of operations.
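
The enqueue sequence described above (wait on events, launch the graph as a
single operation, signal completion) can be sketched with a mock in-order
stream that only logs what it is asked to do. `MockStream` and
`enqueueCommandBuffer` are hypothetical and the sketch does not call CUDA; in
the real adapter the three steps would plausibly correspond to Driver API
calls such as `cuStreamWaitEvent`, `cuGraphLaunch` and `cuEventRecord`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-order stream: operations execute in enqueue order, so the
// sketch only records them in a log.
struct MockStream {
  std::vector<std::string> log;
  void waitEvent(const std::string &event) { log.push_back("wait " + event); }
  void launchGraph(const std::string &graph) { log.push_back("launch " + graph); }
  void recordEvent(const std::string &event) { log.push_back("record " + event); }
};

// Hypothetical enqueue of an already-instantiated graph: make the stream wait
// on the caller's events, launch the whole graph at once, then record an
// event that signals completion of this submission.
void enqueueCommandBuffer(MockStream &stream, const std::string &execGraph,
                          const std::vector<std::string> &waitEvents,
                          const std::string &signalEvent) {
  assert(!execGraph.empty() && "command-buffer must be finalized first");
  for (const auto &event : waitEvents) {
    stream.waitEvent(event);
  }
  stream.launchGraph(execGraph);
  stream.recordEvent(signalEvent);
}
```

Because the stream is in-order, recording the signal event right after the
launch is enough to mark the end of the whole command-buffer; no per-node
events are needed.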

An executable CUDA Graph, which contains all commands and synchronization
information, is saved in the UR command-buffer so that the graph can be
resubmitted efficiently without being rebuilt.
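
Caching the executable graph means the instantiation cost is paid once, at
finalization, and amortized over every later submission. The C++ sketch below
models this with hypothetical `FinalizedCommandBuffer` and `ExecGraph` types
that count instantiations and launches instead of calling CUDA (where the
corresponding calls would be along the lines of `cuGraphInstantiate` and
`cuGraphLaunch`).

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical handle standing in for an instantiated (executable) graph.
struct ExecGraph {
  std::string id;
};

// Hypothetical command-buffer that instantiates its graph once and reuses it.
class FinalizedCommandBuffer {
public:
  // Instantiates the executable graph on first call; later calls are no-ops.
  void finalize() {
    if (!exec_) {
      ++instantiations_;
      exec_ = ExecGraph{"exec#" + std::to_string(instantiations_)};
    }
  }

  // Every enqueue reuses the cached executable graph instead of rebuilding it.
  const ExecGraph &enqueue() {
    assert(exec_ && "finalize() must be called before enqueue()");
    ++launches_;
    return *exec_;
  }

  int instantiations() const { return instantiations_; }
  int launches() const { return launches_; }

private:
  std::optional<ExecGraph> exec_;
  int instantiations_ = 0;
  int launches_ = 0;
};
```

Finalizing once and enqueuing three times results in a single instantiation
and three launches of the same cached graph.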