@amd-songpiao

Arech8 commented Sep 15, 2025

Thanks, Songlin! I'll test it on my instance.

Did you test it on ROCm 6.4?

Arech8 commented Sep 15, 2025

Doesn't crash anymore (build 46), thanks!
Leaving the approval to XLA guys...

What about upstreaming this and the previous change, btw? cc @i-chaochen

@amd-songpiao
Author

> Thanks, Songlin! I'll test it on my instance.
>
> Did you test it on ROCm 6.4?

Yes, ROCm 6.4.1 works fine.

Collaborator

@i-chaochen i-chaochen left a comment

Please upstream this change to openxla, thanks!


if (result != hipSuccess && result != hipErrorPeerAccessAlreadyEnabled) {
if (result == hipErrorPeerAccessAlreadyEnabled) {
hipGetLastError();
Collaborator

@i-chaochen i-chaochen Sep 15, 2025

Is this hipGetLastError() call OK?

Author

There is a similar implementation in RCCL: https://github.com/ROCm/rccl/blob/6e45eaf75ed900b1e5409bfbc9722a5a7d898159/src/transport/p2p.cc#L351. hipGetLastError is used here to reset the per-thread error state, because since ROCm 7 hipGetLastError returns the most recent error code even if the last call succeeded.
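To make the point above concrete, here is a minimal, self-contained sketch of the pattern. It does not call the HIP runtime: `FakeError`, `fake_enable_peer_access`, `fake_get_last_error`, and `enable_peer_access` are hypothetical stand-ins that only model the behaviour described in this thread, namely a per-thread last-error slot that a failing call sets and that the getter returns and resets. It is an illustration under those assumptions, not the code from this PR.

```cpp
#include <cassert>

// Hypothetical stand-ins for the error codes discussed above (not HIP types).
enum class FakeError { kSuccess, kPeerAccessAlreadyEnabled, kInvalidDevice };

// Models the per-thread "last error" slot kept by the runtime.
thread_local FakeError g_last_error = FakeError::kSuccess;
thread_local bool g_peer_enabled = false;

// Stand-in for the peer-access call: a repeated call reports
// "already enabled" and records that code in the per-thread slot.
FakeError fake_enable_peer_access() {
  if (g_peer_enabled) {
    g_last_error = FakeError::kPeerAccessAlreadyEnabled;
    return g_last_error;
  }
  g_peer_enabled = true;
  return FakeError::kSuccess;
}

// Stand-in for the getter: returns the stored code and resets it to success.
FakeError fake_get_last_error() {
  FakeError e = g_last_error;
  g_last_error = FakeError::kSuccess;
  return e;
}

// Treats "already enabled" as success, and clears the stale code so that a
// later, unrelated last-error check on this thread does not pick it up.
bool enable_peer_access() {
  FakeError result = fake_enable_peer_access();
  if (result == FakeError::kPeerAccessAlreadyEnabled) {
    (void)fake_get_last_error();  // reset the per-thread error state
    return true;
  }
  return result == FakeError::kSuccess;
}

// Small demonstration: enabling twice succeeds and leaves no stale error.
void demo_peer_access() {
  assert(enable_peer_access());
  assert(enable_peer_access());
  assert(fake_get_last_error() == FakeError::kSuccess);
}
```

Without the reset in `enable_peer_access`, the final check in `demo_peer_access` would observe the "already enabled" code even though nothing failed afterwards, which is the situation the comment above describes for ROCm 7.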

omg... whoever made a getter function that modifies state, along with everyone who approved it, should be fired with a lifetime ban from any engineering job...

Collaborator

Yes, RCCL has this implementation, but I doubt OpenXLA upstream will be OK with merging this code...

@i-chaochen i-chaochen requested review from zoranjovanovic-ns and removed request for draganmladjenovic September 15, 2025 16:18
Collaborator

i-chaochen commented Sep 15, 2025

In their rocm70 container, could you please run all single- and multi-GPU UTs without your fix and see which XLA UTs detect this error? I think we need to add a UT for this API (unless it's already covered by some UTs).

@i-chaochen
Collaborator

@Arech8 I know this error is detected by a TE UT. Do you know whether any JAX UTs can reproduce it? Or should we add a UT on the JAX side?

@amd-songpiao
Author

> In their rocm70 container, could you please run all single- and multi-GPU UTs without your fix and see which XLA UTs detect this error? I think we need to add a UT for this API (unless it's already covered by some UTs).

registry-sc-harbor.amd.com/rocm-ci-images/compute-rocm-rel-7.0:46-ubuntu-22.04

I will test it.

Arech8 commented Sep 16, 2025

@i-chaochen, I'm not aware of any other test in JAX or XLA that could replicate it. Given that this fails on MI300 and that we run tests on it fairly regularly, I'd guess that this code path is not covered by the tests. So ideally, I think, someone should confirm that no XLA test covers it and, if so, write an explicit test.

@amd-songpiao amd-songpiao force-pushed the ci_add_rocm7_support_peer_access_v0.6.0 branch from 48c506b to c80fa69 Compare September 17, 2025 10:01
@amd-songpiao amd-songpiao requested review from draganmladjenovic and removed request for zoranjovanovic-ns September 17, 2025 10:01
static const char *kName = TO_STR(hipSymbolName); \
void *f; \
auto s = tsl::Env::Default()->GetSymbolFromLibrary( \
auto s = tsl::Env::Default() -> GetSymbolFromLibrary( \
Is tsl::Env::Default() guaranteed to always return a valid (non-nullptr) pointer? If so, why doesn't it return a reference instead?

Author

This is just a clang-format fix. tsl::Env::Default() is already used as a pointer in many places throughout XLA; maybe Google will refactor it later. I don't see any issue at the moment.
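For readers following the pointer-versus-reference question, the sketch below shows the two accessor styles side by side. `FakeEnv`, `Default`, `DefaultRef`, and `Name` are hypothetical names made up for this illustration; this is not the tsl::Env implementation, and the thread does not establish how tsl::Env::Default() is actually defined or whether it can return null. The sketch only assumes a singleton backed by a function-local static.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an environment singleton (not the real tsl::Env).
class FakeEnv {
 public:
  // Pointer-returning accessor, matching the call style `Default()->...`.
  // The object is a function-local static, so this pointer is never null.
  static FakeEnv* Default() {
    static FakeEnv env;
    return &env;
  }

  // Reference-returning variant: the signature itself tells callers that
  // there is always an object, so no null check is needed at call sites.
  static FakeEnv& DefaultRef() { return *Default(); }

  std::string Name() const { return "fake-env"; }

 private:
  FakeEnv() = default;
};

// Small demonstration: both accessors refer to the same object.
void demo_env_accessors() {
  assert(FakeEnv::Default() != nullptr);
  assert(&FakeEnv::DefaultRef() == FakeEnv::Default());
  assert(FakeEnv::Default()->Name() == FakeEnv::DefaultRef().Name());
}
```

In a design like this the two forms are interchangeable at run time; the difference is only in what the signature promises. Changing a widely used pointer accessor to a reference would touch every call site, which is consistent with the reply above that any such refactoring is out of scope for this PR.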

@amd-songpiao amd-songpiao force-pushed the ci_add_rocm7_support_peer_access_v0.6.0 branch from c80fa69 to e23576b Compare September 18, 2025 06:37
@amd-songpiao amd-songpiao merged commit 85548a7 into rocm-jaxlib-v0.6.0 Sep 18, 2025
8 of 9 checks passed
zahiqbal pushed a commit that referenced this pull request Oct 2, 2025
---------

Co-authored-by: Pavel Emeliyanenko <[email protected]>
(cherry picked from commit f013645)
(cherry picked from commit b03cd94)

Added support for waves_per_eu function attribute. (#181)

(cherry picked from commit bc1d816)
(cherry picked from commit d3f94e9)

removed two line change (revert of half of the openxla#25959 commit)

(cherry picked from commit 109e138)

Fixes for jax 0.6.0 (#207)

* Add fixes for jax plugin 0.6.0

Drop NEEDED linking to unnecessary libs.
These are loaded by amdhipruntime and not us.

Fix missing NEEDED on MIOpen shared object.

* Minor rocblas related changes for rocm 70

(cherry picked from commit 0de7d49)

---------

Co-authored-by: Zoran Jovanovic <[email protected]>
(cherry picked from commit 28f10a0)

Add hipBLASLt support for gfx11. (#301)

(cherry picked from commit f814bff)

Add bf16 starting from gfx11, bugfix & optimize RocmComputeCapability (#303)

* Bugfix and improve device_description.h::RocmComputeCompatibility

* Enable ALG_DOT_BF16* on rocm with HW support

(cherry picked from commit 510ea06)

[ROCm] Use bundled bitcode files (#196)

Also trim bitcode file list to ockl.bc and ocml.bc only.

(cherry picked from commit fc9e3c3)

Add MIOPEN_FIND_ENFORCE For ROCm 7 for convolution gemms (#312)

* Add MIOPEN_FIND_ENFORCE For ROCm 7 for convolution gemms

* Exclude failing CollectiveOpsE2E tests

(cherry picked from commit fb6ddfb)

Restore RocmComputeCapability:: gfx11_rx7900() and gfx12_rx8900() methods (#333)

At least gfx11_rx7900() is still needed for TF build.

(cherry picked from commit 13c3de1)

Make device_count_ atomic (#343)

* Make device_count_ atomic

* Use relaxed memory order

* Fix build error

(cherry picked from commit 8513f2d)

fix hardcoded max registers (#345)

(cherry picked from commit f3e170a)

fix hardcoded ecc enabled (#348)

(cherry picked from commit 9cfa74a)

remove reserved memory (#349)

(cherry picked from commit 0015d0e)

Add rocm_dev config for remote caching (#353)

(cherry picked from commit c815420)

added rocm7 support to EnablePeerAccess (#347)

* added rocm7 support to EnablePeerAccess

* use wrap namespace, clang-format and add comments

(cherry picked from commit 85548a7)

[ROCm] Disable Cudnn fusions (#358)

(cherry picked from commit edab8b2)
zahiqbal pushed a commit that referenced this pull request Oct 2, 2025
@zahiqbal zahiqbal mentioned this pull request Oct 2, 2025
1 task
zahiqbal pushed a commit that referenced this pull request Oct 3, 2025
zahiqbal pushed a commit that referenced this pull request Oct 5, 2025
hsharsha pushed a commit that referenced this pull request Oct 6, 2025
* rocprof-sdk addition,
upstream PR: openxla/pull/29769

Squash following commits..
Update rocprofiler-sdk (v3) along with roctracer (v1) for rocm-jaxlib-v0.6.0 (#302)

* update for integration of rocprofiler-sdk (along with roctracer as a backup based on bazel_options from CLI)

(cherry picked from commit 7775dd0)

use VLOG(2) to replace LOG(INFO), so PGLE has no verbose info (#357)

(cherry picked from commit 5950125)

update with kernel details for rocm-7.x (#364)

* update with kernel details for rocm-7.x

(cherry picked from commit 5597c0d)

update to remove previously hard-coded rocprofiler-sdk path (#369)

* update to remove previously hard-coded rocprofiler-sdk path and add skip_rocprofiler_sdk to avoid loading `rocprofiler-sdk`

(cherry picked from commit ff74b5f)

* fixed buffer comparator test

* misc fixes ported from rocm-jaxlib-v0.6.0

---------

Co-authored-by: Chunyu Jin <[email protected]>
Co-authored-by: zoranjovanovic-ns <[email protected]>
hsharsha pushed a commit that referenced this pull request Oct 6, 2025
* Ported all triton related changes from v0.6.0 to v0.7.1

(cherry picked from commit 1851bcc)

Disable softmax triton fusion if triton gemm is off (#281)

* Disable softmax rewriter triton if triton gemm is disabled

* Add specific flag to enable triton softmax fusion

* Address review comments

(cherry picked from commit 51a7f4b)

[ROCm][Triton] Disable transposed load in certain conditions

(cherry picked from commit 50860e9)

Enable unit tests that pass after fixing some Triton related issues. (#285)

* Enable unit tests that pass after fixing some Triton related issues.

* fusion_emitter_device_legacy_test still fails on MI200

(cherry picked from commit 97dd565)

Rocm jaxlib v0.6.0 triton support ut (#279)

* Fixed triton/support_test - no mfma.

* Fix issue with rounding mode in accelerate amd matmul.

* Fixed issues with usage of mfma in support_test.

(cherry picked from commit 44f7d87)

Restore gpu_triton_custom_call_test (#262)

(cherry picked from commit 32eafa4)

Skipped CanNotEmitTritonCustomCallOnPreAmpereGpu test for ROCM.

(cherry picked from commit 56ec7ec)
(cherry picked from commit b1f3e9f)

fixed createTritonAMDGPULowerInstructionSchedHintsPass (#179)

(cherry picked from commit 8517a3a)
(cherry picked from commit c62e47d)

fixed bazel build issue

---------

Co-authored-by: Chunyu Jin <[email protected]>
Co-authored-by: zoranjovanovic-ns <[email protected]>
Co-authored-by: Alex <[email protected]>
draganmladjenovic pushed a commit that referenced this pull request Nov 5, 2025
* added rocm7 support to EnablePeerAccess

* use wrap namespace, clang-format and add comments
draganmladjenovic pushed a commit that referenced this pull request Nov 19, 2025
* added rocm7 support to EnablePeerAccess

* use wrap namespace, clang-format and add comments
hsharsha pushed a commit that referenced this pull request Nov 20, 2025
* added rocm7 support to EnablePeerAccess

* use wrap namespace, clang-format and add comments
4 participants