Skip to content

Conversation

@monofuel
Copy link
Owner

@monofuel monofuel commented Jul 26, 2025

  • no actual fix yet, just defining the problem

  • the issue does not reproduce on HIP-CPU or SIMPLE backends because they are not picky about device functions. this is expected.

  • this issue reproduces on both AMD and on NVIDIA (different error messages of course but same underlying trickyness)

/home/monofuel/.cache/nim/test_dot_product_d/@zgrfg_qbg_cebqhpg.nim.cpp:340:4: error: no matching function for call to 'pluseq___cherZfgehgvyf_u2605'
  340 |                         pluseq___cherZfgehgvyf_u2605((*(&temp_1)), ((NF)(aArray_1[tid_1]) * (NF)(bArray_1[tid_1])));
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/monofuel/.cache/nim/test_dot_product_d/@zgrfg_qbg_cebqhpg.nim.cpp:324:23: note: candidate function not viable: call to __host__ function from __global__ function
  324 | static N_INLINE(void, pluseq___cherZfgehgvyf_u2605)(NF& x_p0, NF y_p1) {
  • I have a few theoretical solutions?

  • option A: somehow get these inline functions annotated as __host__ __device__ so they will work anywhere.

  • option B: convince the nim compiler to not use fancy inline functions like this. not sure if there's an easy switch or not.

  • option C: could we somehow traverse the function tree and automatically annotate functions? this would have to happen near the end of the compiling process. This could tie into the hippoAutoGlobal pragma that I've wanted. annotate a function as auto global and have it automatically annotate all sub functions as __host__ __device__

  • inline functions are totally great and work on GPU, they just need to be annotated __device__ (or __host__ __device__)

@monofuel
Copy link
Owner Author

static N_INLINE(void, pluseq___cherZfgehgvyf_u2605)(NF& x_p0, NF y_p1) {
    nimfr_("+=", "/nix/store/lrxi3xc3sn1hwbv94l4iisa7v8k4cpnl-nim-unwrapped-2.2.4/nim/lib/system/arithmetics.nim");
    nimlf_(330, "/nix/store/lrxi3xc3sn1hwbv94l4iisa7v8k4cpnl-nim-unwrapped-2.2.4/nim/lib/system/arithmetics.nim");    x_p0 = ((NF)(x_p0) + (NF)(y_p1));
    popFrame();
}

https://github.com/nim-lang/Nim/blob/e855019f84e93e01022ad57aa3624c8e4d237486/lib/nimbase.h#L93

@monofuel
Copy link
Owner Author

Maybe one solution could be to modify nimbase.h to automatically annotate inline functions as __host__ __device__ always, as well as inilne? I don't really like that, that sounds super gross...

@monofuel
Copy link
Owner Author

monofuel commented Jul 26, 2025

I got one working solution! we can emit a redefinition of N_INLINE that also annotates as __device__ __host__. this is sloppy but does the job.

{.emit: """
#ifdef __HIPCC__
  // Redefine N_INLINE before nimbase.h uses it
  #ifdef N_INLINE
    #undef N_INLINE
  #endif
  #define N_INLINE(rettype, name) __device__ __host__ inline rettype name
  
  // Also handle N_NOINLINE for completeness
  #ifdef N_NOINLINE  
    #undef N_NOINLINE
  #endif
  #define N_NOINLINE(rettype, name) __device__ __host__ __attribute__((__noinline__)) rettype name
#endif
""".}

HOWEVER this also requires that --stackTrace:off is set when compiling. Some functions try to do some stack trace work, and this makes the compiler unhappy to see device functions accessing host variables (because that's a very bad thing to do)

I'm not totally happy with this as it annotates all inline functions, not limited to only the ones we care about. I also don't like turning off all stacktraces. wonder if we could stub out stacktraces on device functions or something, or have some way to override them, hmmm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants