-
Notifications
You must be signed in to change notification settings - Fork 109
Feature: Device messages #1832
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Feature: Device messages #1832
Conversation
- This moves message container to RAJA to avoid dependency
issues with required atomic operation. Currently, testing and
waiting for messages will block until the stream is synchronized.
|
I think the design would benefit from a view like object that can be used in device kernels. |
|
@bechols97 , can you add an example in the examples folder? I have a use case where this may be handy. In my application if a thread gets a negative value, I want to take note and output it at the end of the kernel by the root rank. Currently I am using printf and every thread that encounters the negative value spews information onto the screen. To double check, would this be a use case? |
|
Hi @artv3, yes that could be a use case for this. I will add an example of something similar to the |
- This allows the message queue to be passed to RAJA kernels
- This also allows the message queue to be allocated with pinned
memory when needed
- Currently, example requires XNACK with HIP. (message queue should be using
pinned memory so need to look into this)
|
@bechols97 so one of the libraries I maintain has a failure macro that on the host side throw an error with some useful error messages associated with it which can provide useful context for users / developers for why something failed. Do you think we could emulate something like with this framework if we could provide the absolute max size of the char array / string literal that we'd like and then at the error site have that passed into your message class? |
|
I think it would be possible to add a fixed capacity string-like object that could be passed through this interface. |
|
@rcarson3 Yes, the original intention behind this idea is to be used as a device side error handler (though left to be more generic in case there are other use cases). As @MrBurmark mentioned, it would be possible to use a fixed-string object with the message handler. For string literals, you would want some type of fixed-string object / char array to store the string that is later passed to a host side callback. In addition to the fixed-string object, any type that is trivially destructible should work as well. |
|
The current state looks good if you're trying to handle a single kind of error. |
|
There are use cases where we might want to handle multiple kinds of errors each with different data in the same loop. Does anyone else have such a use case? What do you think of a slightly more general interface that looks more like this? By moving the signature to the queues we don't know the message sizes upfront. So I'm not sure if it makes sense to do the sizing upfront or later when we know what kinds of messages are possible. |
|
Another use case that I'm considering is having a long lived logger with an allocation. Then I can enqueue multiple types of messages in that while keeping the gpu running and check for messages occasionally to avoid extra synchronizes. |
|
Being able to support multiple error/logging messages within the same loop is definitely a use case that we would want to support. This is something that the library I help maintain uses. There are a couple of concerns with moving the callback to be a parameter of the
Just to show another option with the current interface: (Please note this example is not entirely the same and requires some additional types to be created; however, one could create a type similar to |
| // This is a simplified example fixed string to show how | ||
| // custom types can be used with the message queue. | ||
| template <std::size_t N> | ||
| class my_string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding an example with a "string-like" type in it :) It's helps provide me a road map for how I could replace my current Warning / Error reporting on the GPUs with this framework.
|
@bechols97 , can we bring this up to date? |
| // SPDX-License-Identifier: (BSD-3-Clause) | ||
| //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~// | ||
|
|
||
| #ifndef RAJA_mpsc_queue_HPP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| #ifndef RAJA_mpsc_queue_HPP | |
| #ifndef RAJA_spsc_queue_HPP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for line 20 below and for line 6.
| * | ||
| * \file | ||
| * | ||
| * \brief Header file containing implementation for a MPSC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * \brief Header file containing implementation for a MPSC | |
| * \brief Header file containing implementation for a SPSC |
Would it be worth defining what these acronyms (MPSC/SPSC) mean somewhere?
| { | ||
| if (m_container != nullptr) | ||
| { | ||
| auto local_size = RAJA::atomicInc<auto_atomic>(&(m_container->m_size)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the only difference between MPSC and SPSC the atomic add? I wonder if there's a way to combine these somehow to reduce code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe if you switch to the seq_atomic policy it will just do an add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the seq_atomic policy will just do an add. My original thought process was that the SPSC policies could have some additional optimizations with storing messages due to knowing that their aren't multiple publishers (though the default policy doesn't show a good example of this). However, I don't know if there would be any use cases for the SPSC policies. If there aren't any use cases at the moment for the SPSC, then these can be removed.
The MPSC policy was looking at using auto_atomic to support this with builds that have parallel sections with both GPUs and OpenMP in the same build.
- This now supports multiple message types with the same
buffer. Currently, only supports basic queue.
- Fixes some typos and adds to documentation
Summary
This PR adds a feature for device messages (i.e. store function arguments on device to be handled by a host callback at a later time). The original idea for the feature is to have better error handling on the device with handling any output on with a host callback. This moves the code from Camp to RAJA due dependency on atomic operations.
Design review
For the design, there are some open questions. (regarding these open questions please see the
Design notesbelow)Design notes
Based on discussion from a meeting:
message_handlerclass that stores the callback should be move-only to prevent accidental copies to a lambda with supporting a view-like queue to copy to device kernels (as mentioned below by @MrBurmark).