Hi, I'm trying to run MyoLegs (80-muscle model from musclemimic-models) on an RTX 5080 and hitting a consistent crash in tendon_velocity kernel loading.
What happens
Calling mjw.step() fails with CUDA error 700: illegal memory access during the tendon_velocity module load. Simple models (humanoid, small tendon models) work fine — the issue only shows up with complex spatial tendon models like MyoLegs (80 tendons, nwrap=324).
Setup
- RTX 5080 (Blackwell, sm_120)
- Driver 590.48.01 (CUDA 13.1) — also tried 570.211.01 (CUDA 12.8), same result
- mujoco-warp 3.6.0, warp-lang 1.12.1, mujoco 3.6.0
- Python 3.11.9, Ubuntu 22.04
Minimal repro
import mujoco, mujoco_warp as mjw, warp as wp
import musclemimic_models
mjm = mujoco.MjModel.from_xml_path(str(musclemimic_models.get_xml_path("myofullbody")))
m = mjw.put_model(mjm)
d = mjw.make_data(mjm, nworld=1)
mjw.step(m, d) # crashes here
Module _tendon_velocity_..._8e72f5af load on device 'cuda:0' took 43.99 ms (error)
Exception: Failed to load CUDA module '_tendon_velocity__locals__tendon_velocity_8e72f5af'
What I've found so far
I spent some time digging into this and it seems to be a module loading order issue, not a problem with the kernel itself:
- Calling the sub-functions individually (kinematics(), tendon(), fwd_velocity(), etc.) works fine
- But when forward() runs them together, tendon_velocity fails to load after the larger modules (smooth, constraint, CCD) are already loaded
- If I pre-compile fwd_velocity() before anything else, tendon_velocity passes, but then a different module fails instead
- A simple chain model with the same nv=34 loads 19+ modules successfully, so it's not a hard module count limit
- All built-in test models (test_data/tendon/) pass, but they only have 3-4 tendons
My working guess is that the combination of many large compiled modules plus launch_tiled for tendon_velocity triggers an issue specific to sm_120.
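To make the per-stage check above easier to reproduce, here is a minimal sketch of a harness that runs stages one at a time and reports the first one that fails. `run_stages` is a hypothetical helper named only for illustration; it is not part of mujoco-warp or Warp. The commented usage assumes the sub-functions are called as `mjw.kinematics(m, d)` and so on, with `m` and `d` from the repro above. It stops at the first failure because an illegal memory access typically leaves the CUDA context unusable, so later results would not be meaningful.

```python
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[], None]]


def run_stages(
    stages: List[Stage],
    sync: Optional[Callable[[], None]] = None,
) -> List[Tuple[str, str]]:
    """Run each stage in order and record "ok" or the error text.

    `sync` is an optional callable (for example wp.synchronize) invoked
    after each stage so that asynchronous GPU errors are attributed to
    the stage that caused them rather than to a later one.
    """
    results: List[Tuple[str, str]] = []
    for name, fn in stages:
        try:
            fn()
            if sync is not None:
                sync()
            results.append((name, "ok"))
        except Exception as exc:
            results.append((name, f"{type(exc).__name__}: {exc}"))
            # Stop here: after a CUDA fault, later stages are unreliable.
            break
    return results


# Hypothetical usage with the repro's `m` and `d` (assumed call signatures):
#
# results = run_stages(
#     [
#         ("kinematics", lambda: mjw.kinematics(m, d)),
#         ("tendon", lambda: mjw.tendon(m, d)),
#         ("fwd_velocity", lambda: mjw.fwd_velocity(m, d)),
#         ("forward", lambda: mjw.forward(m, d)),
#     ],
#     sync=wp.synchronize,
# )
# for name, status in results:
#     print(f"{name}: {status}")
```

Reordering the entries in the list is a quick way to check whether the failure follows a specific module or depends on what was loaded before it.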
| Model | nv | ntendon | nwrap | Result |
|---|---|---|---|---|
| test_data/tendon/site.xml | 3 | 4 | 0 | ✅ |
| chain model (28 joints) | 34 | 27 | 0 | ✅ |
| chain (5 joints, 80 tendons) | 11 | 80 | 0 | ✅ |
| MyoLegs (80 muscles) | 34 | 80 | 324 | ❌ |
| MyoFullBody (416 muscles) | 128 | 424 | ~1600 | ❌ |
I saw that Blackwell is officially supported (RTX PRO 6000 benchmarks), so I'm guessing this might be an edge case that hasn't been tested with large spatial tendon models on consumer Blackwell GPUs.
Any pointers would be appreciated — happy to provide more info or test patches.