-
Notifications
You must be signed in to change notification settings - Fork 808
[SYCL] [DOC] Prepare design-document for assert feature #3461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
2911ea7
b69a1cd
15ea88e
ca08fec
1f8d9a9
2ee590c
77699a2
001a573
32b6479
b8637c2
b0cd85f
8c03648
121c945
13b40fd
a4b4884
c06db5f
823124a
a99368b
78d7fcb
6882e95
32663e0
2b84a83
423107b
7611511
a31b808
257054a
3f50173
c1326aa
5095b1a
5078fcc
4dc7b1f
9bcac02
7ec3ac8
8cbfde7
cc085f5
8835bf8
8835756
ecb8659
07debdb
995e4d8
b57ac48
d2f13ff
6281bc5
a5461f3
32a32f4
641d071
dc058a9
16fd8f0
fbca768
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,138 @@ | ||||||||||
| # Assert feature | ||||||||||
|
|
||||||||||
| **IMPORTANT**: This document is a draft. | ||||||||||
|
|
||||||||||
| During debugging of kernel code user may put assertions here and there. | ||||||||||
| The expected behaviour of assertion failure at host is application abort. | ||||||||||
| Our choice for device-side assertions is asynchronous exception in order to | ||||||||||
| allow for extensibility. | ||||||||||
|
|
||||||||||
| The user is free to disable assertions by defining `NDEBUG` macro at | ||||||||||
| compile-time. | ||||||||||
|
|
||||||||||
|
|
||||||||||
| ## Use-case example | ||||||||||
|
|
||||||||||
| ``` | ||||||||||
| using namespace cl::sycl; | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
| auto ErrorHandler = [] (exception_list Exs) { | ||||||||||
| for (exception_ptr const& E : Exs) { | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
| try { | ||||||||||
| std::rethrow_exception(E); | ||||||||||
| } | ||||||||||
| catch (event_error const& Ex) { | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
| std::cout << “Exception - ” << Ex.what(); // assertion failed | ||||||||||
| std::abort(); | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
| } | ||||||||||
| } | ||||||||||
| }; | ||||||||||
|
|
||||||||||
| void user_func(item<2> Item) { | ||||||||||
| assert((Item[0] % 2) && “Nil”); | ||||||||||
| } | ||||||||||
|
|
||||||||||
| int main() { | ||||||||||
| queue Q(ErrorHandler); | ||||||||||
| q.submit([&] (handler& CGH) { | ||||||||||
| CGH.parallel_for<class TheKernel>(range<2>{N, M}, [=](item<2> It) { | ||||||||||
| do_smth(); | ||||||||||
| user_func(It); | ||||||||||
| do_smth_else(); | ||||||||||
| }); | ||||||||||
| }); | ||||||||||
| Q.wait_and_throw(); | ||||||||||
| std::cout << “One shouldn’t see this message.“; | ||||||||||
| return 0; | ||||||||||
| } | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| In this use-case every work-item with even X dimension will trigger assertion | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
| failure. Assertion failure should be reported via asynchronous exceptions. If | ||||||||||
| asynchronous exception handler is set the failure is reported with | ||||||||||
| `cl::sycl::event_error` exception. Otherwise, SYCL Runtime should trigger abort. | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
| At least one failed assertion should be reported. | ||||||||||
|
|
||||||||||
| When multiple kernels are enqueued and both fail at assertion at least single | ||||||||||
| assertion should be reported. | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
|
|
||||||||||
| ## User requirements | ||||||||||
|
|
||||||||||
| From user's point of view there are the following requirements: | ||||||||||
|
|
||||||||||
| | # | Title | Description | Importance | | ||||||||||
| | - | ----- | ----------- | ---------- | | ||||||||||
| | 1 | Handle assertion failure | Signal about assertion failure via SYCL asynchronous exception | Must have | | ||||||||||
| | 2 | Print assert message | Assert function should print message to stderr at host | Must have | | ||||||||||
| | 3 | Stop under debugger | When debugger is attached, break at assertion point | Highly desired | | ||||||||||
| | 4 | Reliability | Assert failure should be reported regardless of kernel deadlock | Highly desired | | ||||||||||
|
|
||||||||||
s-kanaev marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
| ## Contents of `cl::sycl::event_error` | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
|
|
||||||||||
| `cl::sycl::event_error::what()` should return the same assertion failure message | ||||||||||
| as is printed at the time being. | ||||||||||
|
|
||||||||||
| Other than that, interface of `cl::sycl::event_error` should look like: | ||||||||||
| ``` | ||||||||||
| class event_error : public runtime_error { | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
| public: | ||||||||||
| event_error() = default; | ||||||||||
|
|
||||||||||
| event_error(const char *Msg, cl_int Err) | ||||||||||
| : event_error(string_class(Msg), Err) {} | ||||||||||
|
|
||||||||||
| event_error(const string_class &Msg, cl_int Err) : runtime_error(Msg, Err) {} | ||||||||||
| }; | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Besides, it is not always possible to build a callgraph for a kernel, so, this should be implemented too:
Suggested change
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @kbobrovs: This is probably not how it will work. My understanding is that the extension to allow indirect calls in device code is not yet complete or approved. Last time we talked about this in the language evolution group, we identify ways to narrow down the set of functions that could possibly be called from any one call site. I expect that all parts of DPC++ that need to form a static call graph will use these techniques to limit the call graph when there are indirect function calls. Since the indirect function call extension isn't finished yet, it seems better to me that we don't speculate here on how we will handle that case.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @gmlueck, in future - maybe, but I'm looking at today's function pointers implementation which works and needs to be covered by this design. Once something changes with function pointers spec, we will update. Alternatively, the design can mention that the callgraph may be inaccurate in presence of function pointers, and compiler must mark a kernel as "using assert" always, unless it can prove (using future language hints or analysis) that the kernel can never call an indirectly callable function whose (possibly inaccurate) callgraph contains a call to
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added suggested description. |
||||||||||
| Regardless of whether asynchronous exception handler is set or not, there's an | ||||||||||
| action to be performed by SYCL Runtime. To achieve this, information about | ||||||||||
| assert failure should be propagated from device-side to SYCL Runtime. This | ||||||||||
| should be performed via calls to `clGetEventInfo` for OpenCL backend and | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
| `zeEventQueryStatus` for Level-Zero backend. | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
|
|
||||||||||
| ## Terms | ||||||||||
|
|
||||||||||
| - Device-side Runtime - part of device-code, which is supplied by Device-side | ||||||||||
| Compiler. | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
| - Low-level Runtime - the backend/runtime, behind DPCPP Runtime. | ||||||||||
| - Device-side Compiler - compiler which generates device-native bitcode based | ||||||||||
| on input SPIR-V image. | ||||||||||
| - Accessor metadata - parts of accessor representation at device-side: pointer, | ||||||||||
| ranges, offset. | ||||||||||
|
|
||||||||||
| ## How it works? | ||||||||||
|
|
||||||||||
| For the time being, `assert(expr)` macro ends up in call to | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
| `__devicelib_assert_fail` function. This function is part of [Device library extension](doc/extensions/C-CXX-StandardLibrary/DeviceLibExtensions.rst#cl_intel_devicelib_cassert). | ||||||||||
| Device code already contains call to the function. Currently, a device-binary | ||||||||||
| is always linked against fallback implementation. | ||||||||||
| Device-side compiler/linker provides their implementation of `__devicelib_assert_fail` | ||||||||||
| and prefer this implementation over fallback one. | ||||||||||
|
|
||||||||||
| If Device-side Runtime supports `__devicelib_assert_fail` then Low-Level Runtime | ||||||||||
| is responsible for: | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
| - detecting if assert failure took place; | ||||||||||
| - flushing assert message to `stderr` on host. | ||||||||||
| When detected, Low-level Runtime reports assert failure to DPCPP Runtime | ||||||||||
| at synchronization points. | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
|
|
||||||||||
| Refer to [OpenCL](doc/extensions/Assert/opencl.md) and [Level-Zero](doc/extensions/Assert/level-zero.md) | ||||||||||
| extensions. | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
|
|
||||||||||
| If Device-side Runtime doesn't support `__devicelib_assert_fail` then a buffer | ||||||||||
| based approach comes in place. The approach doesn't require any support from | ||||||||||
| Device-side Runtime. Neither it does from Low-level Runtime. | ||||||||||
|
|
||||||||||
| Within this approach, a dedicated assert buffer is allocated and implicit kernel | ||||||||||
| argument is introduced. The argument is an accessor with `discard_read_write` | ||||||||||
| or `discard_write` access mode. Accessor metadata is stored to program scope | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
| variable. This allows to refer to the accessor without modifying each and every | ||||||||||
| user's function. Fallback implementation of `__devicelib_assert_fail` restores | ||||||||||
| accessor metadata from program scope variable and writes assert information to | ||||||||||
| the assert buffer. Atomic operations are used in order to not overwrite existing | ||||||||||
| information. | ||||||||||
|
|
||||||||||
| Storing and restoring of accessor metadata to/from program scope variable is | ||||||||||
| performed with help of builtins. Implementations of these builtins are | ||||||||||
| substituted by frontend. | ||||||||||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
|
|
||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| # Overview | ||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| This extension enables detection of assert failure of kernel. | ||
|
|
||
| # New enum value | ||
|
|
||
| `ze_result_t` enumeration should be augmented with `ZE_RESULT_ABORTED` enum | ||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| element. This enum value indicated a detected assert failure at device-side. | ||
|
|
||
| # Changed API | ||
|
|
||
| ``` | ||
| ze_event_handle_t Event; // describes an event of kernel been submitted previously | ||
| ze_result Result = zeEventQueryStatus(Event); | ||
| ``` | ||
|
|
||
| If kernel failed an assertion `zeEventQueryStatus` should return | ||
|
||
| `ZE_RESULT_ABORTED`. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| # Overview | ||
|
|
||
| This extension enables detection of assert failure of kernel. | ||
s-kanaev marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| # New error code | ||
|
|
||
| `CL_ASSERT_FAILURE` is added to indicate a detected assert failure at | ||
| device-side. | ||
|
|
||
| # Changed API | ||
|
|
||
| ``` | ||
| cl_event Event; // describes an event of kernel been submitted previously | ||
| cl_int Result; | ||
| size_t ResultSize; | ||
|
|
||
| clGetEventInfo(Event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(Result), &Result, &ResultSize); | ||
| ``` | ||
|
|
||
| If kernel failed an assertion `clGetEventInfo` should put `CL_ASSERT_FAILURE` | ||
| in `Result`. | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.