Merged
Exploring the idea of exposing ray buffers and populating them directly, rather than relying on array-based versions of the existing API.
I think the array-based versions are useful for small test cases, but they inherently require host-to-device transfers, which severely limits their practical usability. Production code on a GPU will not want to use these code paths, since the device transfers make them very expensive.
I was already moving towards an API which exposes ray buffers directly, as that is what I had implemented inside of the `ray-benchmark` miniapp. But I think I have landed on something which is a lot nicer for doing so here.
I'm trying to make use of a callback method within XDG that exposes the ray buffer directly, so downstream code can provide that callback to populate the buffer. Right now I've only tested this with my miniapp writing its own GPRT callback, but it does work.
So the downstream app runs something like:
`XDG::ray_fire_prepared()` to trace the populated rays
I still need to write an equivalent `point_in_volume_prepared()`.
This avoids the unnecessary host-device transfers that the array versions of `ray_fire()` and `point_in_volume()` suffer from, by allowing users to write directly to XDG's device buffers without any host-side transfers. I don't actually know if the callback approach would play nicely with OpenMP though; we'll have to see...
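
For readers unfamiliar with the pattern, here is a minimal sketch of the callback flow described above. It is not XDG's actual API: `MockTracer`, `Ray`, `RayBufferView`, `PopulateCallback`, `populate_rays()` and `trace_along_x()` are hypothetical names made up for illustration, and only the name `ray_fire_prepared()` is borrowed from this PR. The "device" buffer is a plain host `std::vector` so the sketch compiles anywhere, and the trace step intersects a placeholder plane at x = 10 rather than real geometry.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical ray layout; the real device-side ray type may differ.
struct Ray {
  double origin[3];
  double direction[3];
  double distance;  // written by the trace step
};

// Hypothetical view handed to the callback: a pointer into the
// library-owned buffer plus the number of rays it holds.
struct RayBufferView {
  Ray* rays;
  std::size_t size;
};

using PopulateCallback = std::function<void(RayBufferView)>;

// Stand-in for the library. The buffer here lives on the host; in the
// real design it would be a device buffer filled by a device-side callback.
class MockTracer {
public:
  // Size the buffer, then let the caller write rays into it in place.
  // No caller-owned array is copied into the buffer.
  void populate_rays(std::size_t n, const PopulateCallback& fill) {
    buffer_.resize(n);
    fill(RayBufferView{buffer_.data(), buffer_.size()});
  }

  // Trace whatever is currently in the buffer.
  // Placeholder geometry: a plane at x = 10; a miss is reported as -1.
  void ray_fire_prepared() {
    for (Ray& r : buffer_) {
      r.distance = (r.direction[0] > 0.0)
                       ? (10.0 - r.origin[0]) / r.direction[0]
                       : -1.0;
    }
  }

  const std::vector<Ray>& rays() const { return buffer_; }

private:
  std::vector<Ray> buffer_;
};

// Downstream usage: supply a callback that fills the buffer, then trace.
std::vector<double> trace_along_x(std::size_t n) {
  MockTracer tracer;
  tracer.populate_rays(n, [](RayBufferView view) {
    for (std::size_t i = 0; i < view.size; ++i) {
      // Ray i starts at (i, 0, 0) and travels along +x.
      view.rays[i] = Ray{{static_cast<double>(i), 0.0, 0.0},
                         {1.0, 0.0, 0.0},
                         0.0};
    }
  });
  tracer.ray_fire_prepared();

  std::vector<double> distances;
  for (const Ray& r : tracer.rays()) distances.push_back(r.distance);
  return distances;
}
```

The point of the shape is that the library owns the buffer and the caller only ever sees a view of it, so there is no intermediate array to transfer. Whether a host-side `std::function` is the right callback type for a GPU backend is a separate question this sketch does not address.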