**✅ SUCCESS: oobabooga now works with ROCm 7.2 on Windows + AMD GPUs!** #7375
Replies: 1 comment
**Update: oobabooga now works with ROCm 7.2 on Windows!** I successfully got text-generation-webui running on AMD hardware with the latest ROCm release. Here's the current status:

Working Configuration

Hardware:

Software:

Installation Steps
```cmd
python -m venv rocm_env
rocm_env\Scripts\activate
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.2/rocm_sdk_core-7.2.0.dev0-py3-none-win_amd64.whl
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.2/rocm_sdk_devel-7.2.0.dev0-py3-none-win_amd64.whl
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.2/rocm_sdk_libraries_custom-7.2.0.dev0-py3-none-win_amd64.whl
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.2/rocm-7.2.0.dev0.tar.gz
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.2/torch-2.9.1+rocmsdk20260116-cp312-cp312-win_amd64.whl
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.2/torchaudio-2.9.1+rocmsdk20260116-cp312-cp312-win_amd64.whl
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.2/torchvision-0.24.1+rocmsdk20260116-cp312-cp312-win_amd64.whl
```
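One gotcha: the torch wheels above are tagged `cp312`, so the venv must be created with Python 3.12 exactly. A small sketch (the helper name is mine, not part of any tool) that checks a wheel filename's CPython tag against the running interpreter before you install:

```python
import sys

def wheel_matches_interpreter(wheel_name: str) -> bool:
    """Check whether a wheel's CPython tag (e.g. cp312) matches this interpreter.

    Pure-Python tags like 'py3-none-any' match any Python 3 interpreter.
    """
    parts = wheel_name.removesuffix(".whl").split("-")
    # Wheel filename format: name-version(-build)?-python-abi-platform
    python_tag = parts[-3]
    if python_tag.startswith("py3"):
        return True
    return python_tag == f"cp{sys.version_info.major}{sys.version_info.minor}"

# The torch wheel is tagged cp312, so it needs Python 3.12 exactly.
print(wheel_matches_interpreter(
    "torch-2.9.1+rocmsdk20260116-cp312-cp312-win_amd64.whl"))
```

If this prints `False`, pip will refuse the wheel with a "not a supported wheel on this platform" error, which is easy to misread as a ROCm problem.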
```cmd
cd text-generation-webui
pip install -r requirements/full/requirements_amd.txt
pip uninstall torchao -y
python server.py --listen --api
```

Key Settings for Optimal Performance
Performance

Achieving ~22-24 tokens/s on 7B models with proper GPU utilization using SDPA attention.

Known Issues
Bottom Line

ROCm 7.2 + oobabooga works on Windows with AMD GPUs! Use the Transformers loader with SDPA attention for best results.
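Once the server is up with `--listen --api`, it exposes an OpenAI-compatible endpoint. A minimal sketch of calling it from the standard library, assuming the default API port 5000 and the stock `/v1/chat/completions` route (adjust if your setup differs; `build_chat_request` and `ask` are illustrative helpers, not part of the project):

```python
import json
from urllib import request

# Default API port for text-generation-webui's --api flag; change if needed.
API_URL = "http://127.0.0.1:5000/v1/chat/completions"

def build_chat_request(prompt: str, max_tokens: int = 200) -> dict:
    """Build an OpenAI-style chat completion payload for the local server."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }

def ask(prompt: str) -> str:
    """POST the payload and return the model's reply text."""
    data = json.dumps(build_chat_request(prompt)).encode()
    req = request.Request(API_URL, data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Requires the server to be running with --api; will raise URLError otherwise.
    print(ask("Hello"))
```

This is handy for benchmarking tokens/s from a script instead of clicking through the Chat tab.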
Update: Reported to AMD team as ROCm Issue #5871
Sharing my experience getting oobabooga running on AMD 7900 XTX with Windows. Got very close but hit an AMD bug. Documenting for others.
AMD Radeon RX 7900 XTX + Windows 11 + ROCm 7.1.1: Model Loads Successfully, Crashes on Text Generation
TL;DR: Model loads to GPU perfectly, crashes immediately on first generation attempt. Root cause: HIP runtime bug in amdhip64_7.dll. Awaiting AMD fix.
Hardware/Software Configuration
GPU: AMD Radeon RX 7900 XTX (24GB VRAM)
OS: Windows 11
Driver: AMD Software PyTorch Edition 25.20.01.17 (driver store version 32.0.22001.17002)
Python: 3.12.10
PyTorch: 2.9.0+rocmsdk20251116 (ROCm 7.1.1)
oobabooga: Latest main branch (January 2026)
Installation Steps & Reproduction
1. Install Python 3.12

- Download from python.org
- During install, check "Add Python to PATH" and "Disable path length limit"

2. Clone the repository

```cmd
cd /d D:
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
```

3. Create and activate a virtual environment

```cmd
python -m venv venv
venv\Scripts\activate.bat
```

4. Install base requirements

```cmd
pip install -r requirements\full\requirements.txt
```
5. Install the ROCm 7.1.1 wheels

```cmd
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/rocm_sdk_core-0.1.dev0-py3-none-win_amd64.whl
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/rocm_sdk_devel-0.1.dev0-py3-none-win_amd64.whl
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/rocm_sdk_libraries_custom-0.1.dev0-py3-none-win_amd64.whl
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/rocm-0.1.dev0.tar.gz
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/torch-2.9.0+rocmsdk20251116-cp312-cp312-win_amd64.whl
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/torchaudio-2.9.0+rocmsdk20251116-cp312-cp312-win_amd64.whl
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/torchvision-0.24.0+rocmsdk20251116-cp312-cp312-win_amd64.whl
```
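The torch/torchaudio/torchvision wheels are published as a matched set, so it's worth confirming they all carry the same `rocmsdk` build tag before installing; that mixing builds causes trouble is my assumption, not an AMD-documented rule, but keeping them aligned is the safe default. A quick sketch:

```python
import re

def rocm_build_tag(wheel_url: str) -> str:
    """Extract the +rocmsdkYYYYMMDD local version tag from a wheel name or URL."""
    m = re.search(r"\+(rocmsdk\d+)", wheel_url)
    if not m:
        raise ValueError(f"no rocmsdk build tag in {wheel_url}")
    return m.group(1)

wheels = [
    "torch-2.9.0+rocmsdk20251116-cp312-cp312-win_amd64.whl",
    "torchaudio-2.9.0+rocmsdk20251116-cp312-cp312-win_amd64.whl",
    "torchvision-0.24.0+rocmsdk20251116-cp312-cp312-win_amd64.whl",
]
tags = {rocm_build_tag(w) for w in wheels}
assert len(tags) == 1, f"mismatched ROCm builds: {tags}"
print(tags.pop())  # → rocmsdk20251116
```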
6. Verify PyTorch sees the GPU

```cmd
python -c "import torch; print(torch.cuda.is_available())"
python -c "import torch; print(torch.cuda.get_device_name(0))"
```

Expected output: `True` and `AMD Radeon RX 7900 XTX`
7. Start oobabooga

```cmd
python server.py
```
8. Reproduce the crash

1. Open a browser to http://127.0.0.1:7860/
2. Go to the Model tab
3. In "Download model or LoRA", enter: mistralai/Mistral-7B-Instruct-v0.2
4. Click Download (wait for completion, ~15GB)
5. Select the model from the dropdown
6. Model loader: Transformers
7. attn-implementation: eager (or sdpa, both crash)
8. Click Load → ✅ Succeeds ("Successfully loaded" message appears)
9. Go to the Chat tab
10. Type any message (e.g., "Hello")
11. Click Generate → ❌ Python crashes silently
12. Check Windows Event Viewer for the amdhip64_7.dll crash
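To confirm you're hitting the same bug, look for an application-error event whose faulting module is amdhip64_7.dll. A small sketch that extracts the module name from an event message; the sample text below is illustrative (it follows the standard Windows application-error wording, but the version numbers are placeholders, not a capture from my machine):

```python
import re

# Illustrative excerpt in the standard Windows application-error format;
# the exact text and version numbers on your machine will differ.
SAMPLE_EVENT = """\
Faulting application name: python.exe, version: 3.12.10.0
Faulting module name: amdhip64_7.dll, version: 7.1.0.0
Exception code: 0xc0000005
"""

def faulting_module(event_text):
    """Pull the faulting module name out of an Event Viewer error message."""
    m = re.search(r"Faulting module name:\s*([^,\s]+)", event_text)
    return m.group(1) if m else None

print(faulting_module(SAMPLE_EVENT))  # → amdhip64_7.dll
```

Exception code 0xc0000005 is an access violation, which matches the behavior described under "What Fails" above.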
What Works ✅

- PyTorch correctly detects the 7900 XTX
- Model downloads successfully
- Model loads to GPU successfully (completes in ~22 seconds)
- Shows "Successfully loaded" message

What Fails ❌

- Text generation crashes immediately on first GPU compute
- Python process terminates with no console error
- Windows Event Viewer shows an amdhip64_7.dll access violation
Root Cause
Windows Event Viewer shows: