[WIP] fix inline procs #12
Draft
No actual fix yet; this PR just defines the problem.

The issue does not reproduce on the HIP-CPU or SIMPLE backends because they are not picky about device functions. This is expected.

It does reproduce on both AMD and NVIDIA (different error messages, of course, but the same underlying trickiness).

I have a few theoretical solutions:
Option A: somehow get these inline functions annotated as `__host__ __device__` so they will work anywhere.

Option B: convince the Nim compiler not to use fancy inline functions like these. I'm not sure whether there's an easy switch for that.
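For option A, here is a minimal sketch of what annotated output could look like, written as C++ that should build under nvcc, hipcc, or a plain C++ compiler. `HIPPO_GPU`, `HIPPO_HD`, `addInt`, and `addKernel` are hypothetical names, and `addInt` only stands in for the kind of inline helper Nim emits; it is not actual Nim output. The macro expands to `__host__ __device__` when a GPU compiler is detected and to nothing otherwise.

```cpp
#include <cassert>  // assert(), for host-side checks of addInt

// Detect a GPU compiler (assumption: __HIP__/__HIPCC__ for HIP, __CUDACC__ for nvcc).
#if defined(__HIP__) || defined(__HIPCC__)
#include <hip/hip_runtime.h>  // blockIdx, threadIdx, etc. for HIP
#define HIPPO_GPU 1
#elif defined(__CUDACC__)
#define HIPPO_GPU 1  // nvcc pulls in the CUDA runtime headers itself
#endif

#ifdef HIPPO_GPU
#define HIPPO_HD __host__ __device__
#else
#define HIPPO_HD  // plain C++ compiler: no execution-space annotations
#endif

// Stand-in for a generated inline helper. Without HIPPO_HD, it is host-only
// on CUDA/HIP and cannot be called from the kernel below.
HIPPO_HD static inline int addInt(int a, int b) { return a + b; }

#ifdef HIPPO_GPU
// Only compiled by a GPU compiler.
__global__ void addKernel(const int* a, const int* b, int* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = addInt(a[i], b[i]);  // legal because addInt is also __device__
}
#endif
```

Removing `HIPPO_HD` from `addInt` should reproduce the kind of error described above (nvcc reports calling a `__host__` function from a `__global__` function; hipcc reports its own variant), while the plain C++ build is unaffected either way.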
Option C: could we somehow traverse the function tree and annotate functions automatically? This would have to happen near the end of compilation, once the whole call tree is known. It could tie into the `hippoAutoGlobal` pragma that I've wanted: annotate a function as auto global and have it automatically annotate all of its sub functions (everything it calls, directly or indirectly) as `__host__ __device__`.

`inline` functions are totally great and work on the GPU; they just need to be annotated `__device__` (or `__host__ __device__`).
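The traversal in option C amounts to collecting everything reachable from the auto-global functions in the call graph. Below is a language-agnostic sketch of just that step in C++, assuming a hypothetical `CallGraph` that maps each function name to the names it calls directly; a real `hippoAutoGlobal` would need to do this on Nim's side (or on the generated C), which this does not attempt.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical call graph: function name -> names it calls directly.
using CallGraph = std::unordered_map<std::string, std::vector<std::string>>;

// Breadth-first search from the roots (e.g. auto-global kernels). Returns every
// function reachable from them, excluding the roots themselves: these are the
// candidates for a __host__ __device__ annotation.
std::unordered_set<std::string> functionsToAnnotate(
    const CallGraph& graph, const std::vector<std::string>& roots) {
  std::unordered_set<std::string> visited(roots.begin(), roots.end());
  std::unordered_set<std::string> result;
  std::deque<std::string> queue(roots.begin(), roots.end());
  while (!queue.empty()) {
    std::string fn = queue.front();
    queue.pop_front();
    auto it = graph.find(fn);
    if (it == graph.end()) continue;  // no known callees (leaf or external)
    for (const std::string& callee : it->second) {
      if (visited.insert(callee).second) {  // first time this callee is seen
        result.insert(callee);
        queue.push_back(callee);
      }
    }
  }
  return result;
}
```

The `visited` set means a helper shared by several callers is reported once, and recursive or mutually recursive procs do not loop forever. Callees with no entry in the graph (e.g. external C functions) are still reported, so a real pass would have to decide what to do with functions it cannot annotate.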