@zahiqbal zahiqbal commented Oct 2, 2025

Motivation

Port all fixes from rocm-jaxlib-v0.6.0 to rocm-jaxlib-v0.7.1.

Technical Details

Cherry-pick the following commits:
#git cherry-pick -x b03cd94 # Rocm jaxlib v0.5.0 warpsize global (#177)
#git cherry-pick -x d3f94e9 # Added support for waves_per_eu function attribute. (#181)
#git cherry-pick -x 109e138 # removed two line change (revert of half of the openxla#25959 commit)
#git cherry-pick -x 28f10a0 # Fixes for jax 0.6.0 (#207)
#git cherry-pick -x f814bff # Add hipBLASLt support for gfx11. (#301)
#git cherry-pick -x 510ea06 # Add bf16 starting from gfx11, bugfix & optimize RocmComputeCapability (#303)
#git cherry-pick -x fc9e3c3 # [ROCm] Use bundled bitcode files (#196)
#git cherry-pick -x fb6ddfb # Add MIOPEN_FIND_ENFORCE For ROCm 7 for convolution gemms (#312)
#git cherry-pick -x 13c3de1 # Restore RocmComputeCapability:: gfx11_rx7900() and gfx12_rx8900() methods (#333)
#git cherry-pick -x 8513f2d # Make device_count_ atomic (#343)
#git cherry-pick -x f3e170a # fix hardcoded max registers (#345)
#git cherry-pick -x 9cfa74a # fix hardcoded ecc enabled (#348)
#git cherry-pick -x 0015d0e # remove reserved memory (#349)
#git cherry-pick -x c815420 # Add rocm_dev config for remote caching (#353)
#git cherry-pick -x 85548a7 # added rocm7 support to EnablePeerAccess (#347)
#git cherry-pick -x edab8b2 # [ROCm] Disable Cudnn fusions (#358)
#git cherry-pick -x 910f107 # Fix invalid hash and fix missing dependency for rocm 6.4 (#370)

Test Plan

Test Result

Local test results
Executed 278 out of 281 tests: 278 tests pass and 3 fail to build.

Submission Checklist

@zahiqbal zahiqbal requested a review from hsharsha October 2, 2025 22:23
cj401-amd and others added 2 commits October 3, 2025 09:18
upstream PR: openxla/pull/29769

Squash the following commits:
Update rocprofiler-sdk (v3) along with roctracer (v1) for rocm-jaxlib-v0.6.0 (#302)

* update for integration of rocprofiler-sdk (along with roctracer as a backup based on bazel_options from CLI)

(cherry picked from commit 7775dd0)

use VLOG(2) to replace LOG(INFO), so PGLE has no verbose info (#357)

(cherry picked from commit 5950125)

update with kernel details for rocm-7.x (#364)

* update with kernel details for rocm-7.x

(cherry picked from commit 5597c0d)

update to remove previously hard-coded rocprofiler-sdk path (#369)

* update to remove previously hard-coded rocprofiler-sdk path and add skip_rocprofiler_sdk to avoid loading `rocprofiler-sdk`

(cherry picked from commit ff74b5f)
@zahiqbal zahiqbal force-pushed the misc_fixes_porting_from_0.6.0 branch from c311726 to 6743efd Compare October 3, 2025 15:44
Rocm jaxlib v0.5.0 warpsize global (#177)

---------

Co-authored-by: Pavel Emeliyanenko
(cherry picked from commit f013645)
(cherry picked from commit b03cd94)

Added support for waves_per_eu function attribute. (#181)

(cherry picked from commit bc1d816)
(cherry picked from commit d3f94e9)

removed two line change (revert of half of the openxla#25959 commit)

(cherry picked from commit 109e138)

Fixes for jax 0.6.0 (#207)

* Add fixes for jax plugin 0.6.0

Drop NEEDED linking to unnecessary libs.
These are loaded by amdhipruntime and not us.

Fix missing NEEDED on MIOpen shared object.

* Minor rocblas related changes for rocm 70

(cherry picked from commit 0de7d49)

---------

Co-authored-by: Zoran Jovanovic
(cherry picked from commit 28f10a0)

Add hipBLASLt support for gfx11. (#301)

(cherry picked from commit f814bff)

Add bf16 starting from gfx11, bugfix & optimize RocmComputeCapability (#303)

* Bugfix and improve device_description.h::RocmComputeCapability

* Enable ALG_DOT_BF16* on rocm with HW support

(cherry picked from commit 510ea06)

[ROCm] Use bundled bitcode files (#196)

Also trim bitcode file list to ockl.bc and ocml.bc only.

(cherry picked from commit fc9e3c3)

Add MIOPEN_FIND_ENFORCE For ROCm 7 for convolution gemms (#312)

* Add MIOPEN_FIND_ENFORCE For ROCm 7 for convolution gemms

* Exclude failing CollectiveOpsE2E tests

(cherry picked from commit fb6ddfb)

Restore RocmComputeCapability:: gfx11_rx7900() and gfx12_rx8900() methods (#333)

At least gfx11_rx7900() is still needed for TF build.

(cherry picked from commit 13c3de1)

Make device_count_ atomic (#343)

* Make device_count_ atomic

* Use relaxed memory order

* Fix build error

(cherry picked from commit 8513f2d)
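
For reviewers unfamiliar with the pattern named in the commit above, here is a minimal sketch of an atomic counter accessed with relaxed memory order. It is not the code from #343: `DeviceRegistry`, `AddDevice`, and `RegisterConcurrently` are hypothetical names made up for illustration, and only the member name `device_count_` comes from the commit title. Relaxed ordering is assumed to be enough here because the counter is a standalone value and does not publish other data to readers.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a class that caches a device count.
// Only the member name device_count_ is taken from the commit title.
class DeviceRegistry {
 public:
  // Atomically increments the counter. Relaxed ordering guarantees the
  // increment itself is race-free but imposes no ordering on other memory.
  void AddDevice() { device_count_.fetch_add(1, std::memory_order_relaxed); }

  // Atomically reads the current count.
  int device_count() const {
    return device_count_.load(std::memory_order_relaxed);
  }

 private:
  std::atomic<int> device_count_{0};
};

// Registers devices from several threads and returns the final count.
int RegisterConcurrently(int num_threads, int devices_per_thread) {
  assert(num_threads > 0 && devices_per_thread >= 0);
  DeviceRegistry registry;
  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&registry, devices_per_thread] {
      for (int i = 0; i < devices_per_thread; ++i) registry.AddDevice();
    });
  }
  // join() synchronizes with each thread's completion, so the read below
  // observes every increment even though the increments are relaxed.
  for (std::thread& worker : workers) worker.join();
  return registry.device_count();
}
```

With a plain `int` member the concurrent increments would be a data race. With `std::atomic<int>` every increment is counted, so `RegisterConcurrently(4, 1000)` returns 4000.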

fix hardcoded max registers (#345)

(cherry picked from commit f3e170a)

fix hardcoded ecc enabled (#348)

(cherry picked from commit 9cfa74a)

remove reserved memory (#349)

(cherry picked from commit 0015d0e)

Add rocm_dev config for remote caching (#353)

(cherry picked from commit c815420)

added rocm7 support to EnablePeerAccess (#347)

* added rocm7 support to EnablePeerAccess

* use wrap namespace, clang-format and add comments

(cherry picked from commit 85548a7)

[ROCm] Disable Cudnn fusions (#358)

(cherry picked from commit edab8b2)
@zahiqbal zahiqbal force-pushed the misc_fixes_porting_from_0.6.0 branch from 6743efd to 00b0fa0 Compare October 5, 2025 05:23
@hsharsha hsharsha merged commit 30c0943 into rocm-jaxlib-v0.7.1 Oct 6, 2025
6 of 8 checks passed

5 participants