[ROCm] Remove padding for gemms #33854

Closed
pemeliya wants to merge 1 commit into openxla:main from ROCm:pemeliya/remove-pad-for-gemms-12.11.25

Conversation

@pemeliya
Member

📝 Summary of Changes
Disabled padding for bf16/fp16 gemms on ROCm

🎯 Justification
Padding for gemms was added for parity with NVIDIA, but on closer inspection it turns out to offer no advantage on the ROCm platform. Furthermore, CublasPadForGemms wraps every padded dot op in pad / slice ops, which may prevent certain optimizations (e.g. gemm_rewriter epilogue fusion).
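
For readers unfamiliar with the pad / slice wrapping mentioned above, here is a minimal pure-Python sketch of the pattern. It is not XLA code: the helper names (`round_up`, `pad`, `matmul`, `padded_matmul`) and the multiple-of-8 alignment are assumptions chosen only for illustration. It shows that zero-padding the operands and slicing the result leaves the product unchanged, while the dot itself ends up sandwiched between extra ops.

```python
# Illustrative model of pad -> dot -> slice; not taken from XLA.
# The alignment value (8) and every helper name here are assumptions.

def round_up(n, multiple):
    """Round n up to the nearest multiple of `multiple`."""
    return (n + multiple - 1) // multiple * multiple

def pad(matrix, rows, cols):
    """Zero-pad a list-of-lists matrix to rows x cols."""
    padded = [row + [0.0] * (cols - len(row)) for row in matrix]
    padded += [[0.0] * cols for _ in range(rows - len(matrix))]
    return padded

def matmul(a, b):
    """Plain triple-loop matrix product."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def padded_matmul(a, b, multiple=8):
    """Pad both operands, multiply, then slice back to the original shape."""
    m, k, n = len(a), len(b), len(b[0])
    mp, kp, np_ = (round_up(d, multiple) for d in (m, k, n))
    full = matmul(pad(a, mp, kp), pad(b, kp, np_))
    return [row[:n] for row in full[:m]]

a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # 2 x 3
b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 x 2
assert padded_matmul(a, b) == matmul(a, b)
```

In this toy model the slice sits between the dot and whatever consumes its result, which is the kind of separation that could keep a following epilogue op from being fused into the gemm.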

🚀 Kind of Contribution
♻️ Cleanup

🧪 Unit Tests:
Adapted the existing gemm_rewriter_test to account for the removal of padding on ROCm

@xla-rotation could you have a look, please?

copybara-service Bot pushed a commit that referenced this pull request Nov 13, 2025
Imported from GitHub PR #33854

Copybara import of the project:

--
8d52de9 by Pavel Emeliyanenko <[email protected]>:

remove pad for gemms

Merging this change closes #33854

FUTURE_COPYBARA_INTEGRATE_REVIEW=#33854 from ROCm:pemeliya/remove-pad-for-gemms-12.11.25 8d52de9
PiperOrigin-RevId: 831724544
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 13, 2025
Imported from GitHub PR openxla/xla#33854

Copybara import of the project:

--
8d52de94e4fee78d999cdc0c28082b09ffc472cb by Pavel Emeliyanenko <[email protected]>:

remove pad for gemms

Merging this change closes #33854

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#33854 from ROCm:pemeliya/remove-pad-for-gemms-12.11.25 8d52de94e4fee78d999cdc0c28082b09ffc472cb
PiperOrigin-RevId: 831724544
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 13, 2025
Imported from GitHub PR openxla/xla#33854

PiperOrigin-RevId: 831750726
hsharsha added a commit to ROCm/xla that referenced this pull request Jan 8, 2026
copybara-service Bot pushed a commit that referenced this pull request Jan 13, 2026
Imported from GitHub PR #36046

📝 Summary of Changes

- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to match the removal of padding for ROCm introduced in #33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as cuda-only.
- Add arch-specific checks for barriers to sorting.hlo.

🎯 Justification
Fixes failing unit tests on ROCm platform

🚀 Kind of Contribution
 🐛 Bug Fix, 🧪 Tests

Copybara import of the project:

--
472cd54 by Harsha HS <[email protected]>:

[ROCm] Fix failing unit tests on ROCm platform

- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a
conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.

--
3bb9422 by Harsha HS <[email protected]>:

Fix tests which started to fail due to #33854

--
850d955 by Harsha HS <[email protected]>:

HIP now respects higher launch dimensions, similar to CUDA

--
b504a7e by Harsha HS <[email protected]>:

Make dynamic_shared_memory_test cuda only

--
1e4e57a by Harsha HS <[email protected]>:

Add arch specific checks to sorting.hlo

--
ce1241c by Harsha HS <[email protected]>:

Address review comments

Merging this change closes #36046

FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c
PiperOrigin-RevId: 855607651
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 13, 2026
Imported from GitHub PR openxla/xla#36046

Reverts bc8de59

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c38c4f753c27658cc3f237161ae420307b
PiperOrigin-RevId: 855607651
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 13, 2026
Imported from GitHub PR openxla/xla#36046

Reverts 7e8aa5e

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c38c4f753c27658cc3f237161ae420307b
PiperOrigin-RevId: 855607651
copybara-service Bot pushed a commit that referenced this pull request Jan 14, 2026
Imported from GitHub PR #36046

FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c
PiperOrigin-RevId: 856068530
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 14, 2026
Imported from GitHub PR openxla/xla#36046

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c38c4f753c27658cc3f237161ae420307b
PiperOrigin-RevId: 856068530
copybara-service Bot pushed a commit that referenced this pull request Jan 19, 2026
Imported from GitHub PR #36046

📝 Summary of Changes

- layout_assignment tests are marked cuda-only.
- sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix unit test to account for the removal of GEMM padding on ROCm introduced in #33854
- gpu_kernel_tiling_test_amdgpu_any is updated to respect higher launch dimensions now supported by the HIP runtime
- Mark dynamic_shared_memory_test as cuda-only
- Add arch specific checks for barriers to sorting.hlo

🎯 Justification
Fixes failing unit tests on ROCm platform

🚀 Kind of Contribution
 🐛 Bug Fix, 🧪 Tests

Copybara import of the project:

--
472cd54 by Harsha HS <[email protected]>:

[ROCm] Fix failing unit tests on ROCm platform

- layout_assignment tests are marked cuda-only.
- sample_file_test needs higher autotuner level for MIOpen to return
conv algorithm. Earlier this was coming from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.

--
3bb9422 by Harsha HS <[email protected]>:

Fix tests which started to fail due to #33854

--
850d955 by Harsha HS <[email protected]>:

HIP now respects higher launch dimensions, similar to CUDA

--
b504a7e by Harsha HS <[email protected]>:

Make dynamic_shared_memory_test cuda only

--
1e4e57a by Harsha HS <[email protected]>:

Add arch specific checks to sorting.hlo

--
ce1241c by Harsha HS <[email protected]>:

Address review comments

Merging this change closes #36046

FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c
PiperOrigin-RevId: 856068530
copybara-service Bot pushed a commit that referenced this pull request Jan 19, 2026
Imported from GitHub PR #36046

📝 Summary of Changes

- layout_assignment tests are marked cuda-only.
- sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix unit test to account for the removal of GEMM padding on ROCm introduced in #33854
- gpu_kernel_tiling_test_amdgpu_any is updated to respect higher launch dimensions now supported by the HIP runtime
- Mark dynamic_shared_memory_test as cuda-only
- Add arch specific checks for barriers to sorting.hlo

🎯 Justification
Fixes failing unit tests on ROCm platform

🚀 Kind of Contribution
 🐛 Bug Fix, 🧪 Tests

Copybara import of the project:

--
b6c6902 by Harsha HS <[email protected]>:

[ROCm] Fix failing unit tests on ROCm platform

- layout_assignment tests are marked cuda-only.
- sample_file_test needs higher autotuner level for MIOpen to return
conv algorithm. Earlier this was coming from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.

--
c05a4a4 by Harsha HS <[email protected]>:

Fix tests which started to fail due to #33854

--
0ce9520 by Harsha HS <[email protected]>:

HIP now respects higher launch dimensions, similar to CUDA

--
a0fe014 by Harsha HS <[email protected]>:

Make dynamic_shared_memory_test cuda only

--
28569eb by Harsha HS <[email protected]>:

Add arch specific checks to sorting.hlo

--
d782553 by Harsha HS <[email protected]>:

Address review comments

--
8c23b89 by Harsha HS <[email protected]>:

Revert tf_cuda_test_tags to its original form to resolve CI errors

Merging this change closes #36046

FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 8c23b89
PiperOrigin-RevId: 855607651
copybara-service Bot pushed a commit that referenced this pull request Jan 20, 2026
Imported from GitHub PR #36046

📝 Summary of Changes

- layout_assignment tests are marked cuda-only.
- sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix unit test to account for the removal of GEMM padding on ROCm introduced in #33854
- gpu_kernel_tiling_test_amdgpu_any is updated to respect higher launch dimensions now supported by the HIP runtime
- Mark dynamic_shared_memory_test as cuda-only
- Add arch specific checks for barriers to sorting.hlo

🎯 Justification
Fixes failing unit tests on ROCm platform

🚀 Kind of Contribution
 🐛 Bug Fix, 🧪 Tests

Copybara import of the project:

--
b6c6902 by Harsha HS <[email protected]>:

[ROCm] Fix failing unit tests on ROCm platform

- layout_assignment tests are marked cuda-only.
- sample_file_test needs higher autotuner level for MIOpen to return
conv algorithm. Earlier this was coming from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.

--
c05a4a4 by Harsha HS <[email protected]>:

Fix tests which started to fail due to #33854

--
0ce9520 by Harsha HS <[email protected]>:

HIP now respects higher launch dimensions, similar to CUDA

--
a0fe014 by Harsha HS <[email protected]>:

Make dynamic_shared_memory_test cuda only

--
28569eb by Harsha HS <[email protected]>:

Add arch specific checks to sorting.hlo

--
d782553 by Harsha HS <[email protected]>:

Address review comments

--
8c23b89 by Harsha HS <[email protected]>:

Revert tf_cuda_test_tags to its original form to resolve CI errors

Merging this change closes #36046

FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 8c23b89
PiperOrigin-RevId: 855607651
copybara-service Bot pushed a commit that referenced this pull request Jan 21, 2026
Imported from GitHub PR #36046

📝 Summary of Changes

- layout_assignment tests are marked cuda-only.
- sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix unit test to account for the removal of GEMM padding on ROCm introduced in #33854
- gpu_kernel_tiling_test_amdgpu_any is updated to respect higher launch dimensions now supported by the HIP runtime
- Mark dynamic_shared_memory_test as cuda-only
- Add arch specific checks for barriers to sorting.hlo

🎯 Justification
Fixes failing unit tests on ROCm platform

🚀 Kind of Contribution
 🐛 Bug Fix, 🧪 Tests

Copybara import of the project:

--
b6c6902 by Harsha HS <[email protected]>:

[ROCm] Fix failing unit tests on ROCm platform

- layout_assignment tests are marked cuda-only.
- sample_file_test needs higher autotuner level for MIOpen to return
conv algorithm. Earlier this was coming from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.

--
c05a4a4 by Harsha HS <[email protected]>:

Fix tests which started to fail due to #33854

--
0ce9520 by Harsha HS <[email protected]>:

HIP now respects higher launch dimensions, similar to CUDA

--
a0fe014 by Harsha HS <[email protected]>:

Make dynamic_shared_memory_test cuda only

--
28569eb by Harsha HS <[email protected]>:

Add arch specific checks to sorting.hlo

--
d782553 by Harsha HS <[email protected]>:

Address review comments

--
8c23b89 by Harsha HS <[email protected]>:

Revert tf_cuda_test_tags to its original form to resolve CI errors

Merging this change closes #36046

FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 8c23b89
PiperOrigin-RevId: 855607651
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 21, 2026
Imported from GitHub PR openxla/xla#36046

📝 Summary of Changes

- layout_assignment tests are marked cuda-only.
- sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix unit test to account for the removal of GEMM padding on ROCm introduced in openxla/xla#33854
- gpu_kernel_tiling_test_amdgpu_any is updated to respect higher launch dimensions now supported by the HIP runtime
- Mark dynamic_shared_memory_test as cuda-only
- Add arch specific checks for barriers to sorting.hlo

🎯 Justification
Fixes failing unit tests on ROCm platform

🚀 Kind of Contribution
 🐛 Bug Fix, 🧪 Tests

Copybara import of the project:

--
b6c69025388e3528c1ff7a3ca2bef2389816973c by Harsha HS <[email protected]>:

[ROCm] Fix failing unit tests on ROCm platform

- layout_assignment tests are marked cuda-only.
- sample_file_test needs higher autotuner level for MIOpen to return
conv algorithm. Earlier this was coming from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.

--
c05a4a4bacf1ecf813f056cc690b9dda1df6aafe by Harsha HS <[email protected]>:

Fix tests which started to fail due to openxla/xla#33854

--
0ce9520211fbaafadb42a0a41219351ca889354a by Harsha HS <[email protected]>:

HIP now respects higher launch dimensions, similar to CUDA

--
a0fe0145935d830a0265a7a3bbdac9a0bfd2f056 by Harsha HS <[email protected]>:

Make dynamic_shared_memory_test cuda only

--
28569eb82b09fef0767be4b6fb25c44cee613fb1 by Harsha HS <[email protected]>:

Add arch specific checks to sorting.hlo

--
d7825531f4d3be3c0cf44001103cf1ac35732b38 by Harsha HS <[email protected]>:

Address review comments

--
8c23b89f1a696fdf304d86c182ed70ffb2c4dc69 by Harsha HS <[email protected]>:

Revert tf_cuda_test_tags to its original form to resolve CI errors

Merging this change closes #36046

Reverts changelist 858710443

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 8c23b89f1a696fdf304d86c182ed70ffb2c4dc69
PiperOrigin-RevId: 855607651
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 21, 2026
Imported from GitHub PR openxla/xla#36046

📝 Summary of Changes

- layout_assignment tests are marked cuda-only.
- sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix unit test to account for the removal of GEMM padding on ROCm introduced in openxla/xla#33854
- gpu_kernel_tiling_test_amdgpu_any is updated to respect higher launch dimensions now supported by the HIP runtime
- Mark dynamic_shared_memory_test as cuda-only
- Add arch specific checks for barriers to sorting.hlo

🎯 Justification
Fixes failing unit tests on ROCm platform

🚀 Kind of Contribution
 🐛 Bug Fix, 🧪 Tests

Copybara import of the project:

--
b6c69025388e3528c1ff7a3ca2bef2389816973c by Harsha HS <[email protected]>:

[ROCm] Fix failing unit tests on ROCm platform

- layout_assignment tests are marked cuda-only.
- sample_file_test needs higher autotuner level for MIOpen to return
conv algorithm. Earlier this was coming from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.

--
c05a4a4bacf1ecf813f056cc690b9dda1df6aafe by Harsha HS <[email protected]>:

Fix tests which started to fail due to openxla/xla#33854

--
0ce9520211fbaafadb42a0a41219351ca889354a by Harsha HS <[email protected]>:

HIP now respects higher launch dimensions, similar to CUDA

--
a0fe0145935d830a0265a7a3bbdac9a0bfd2f056 by Harsha HS <[email protected]>:

Make dynamic_shared_memory_test cuda only

--
28569eb82b09fef0767be4b6fb25c44cee613fb1 by Harsha HS <[email protected]>:

Add arch specific checks to sorting.hlo

--
d7825531f4d3be3c0cf44001103cf1ac35732b38 by Harsha HS <[email protected]>:

Address review comments

--
8c23b89f1a696fdf304d86c182ed70ffb2c4dc69 by Harsha HS <[email protected]>:

Revert tf_cuda_test_tags to its original form to resolve CI errors

Merging this change closes #36046

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 8c23b89f1a696fdf304d86c182ed70ffb2c4dc69
PiperOrigin-RevId: 855607651
copybara-service Bot pushed a commit that referenced this pull request Jan 21, 2026
Imported from GitHub PR #36046

📝 Summary of Changes

- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to reflect the removal of padding for ROCm introduced in #33854
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime
- Mark dynamic_shared_memory_test as cuda-only
- Add arch specific checks for barriers to sorting.hlo

🎯 Justification
Fixes failing unit tests on ROCm platform

🚀 Kind of Contribution
 🐛 Bug Fix, 🧪 Tests

Copybara import of the project:

--
b6c6902 by Harsha HS <[email protected]>:

[ROCm] Fix failing unit tests on ROCm platform

- layout_assignment tests are marked cuda-only.
- sample_file_test needs higher autotuner level for MIOpen to return
conv algorithm. Earlier this was coming from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.

--
c05a4a4 by Harsha HS <[email protected]>:

Fix tests which started to fail due to #33854

--
0ce9520 by Harsha HS <[email protected]>:

HIP now respects higher launch dimensions, similar to CUDA

--
a0fe014 by Harsha HS <[email protected]>:

Make dynamic_shared_memory_test cuda only

--
28569eb by Harsha HS <[email protected]>:

Add arch specific checks to sorting.hlo

--
d782553 by Harsha HS <[email protected]>:

Address review comments

--
8c23b89 by Harsha HS <[email protected]>:

Revert tf_cuda_test_tags to its original form to resolve CI errors

--
95f983a by Harsha H S <[email protected]>:

Update BUILD with absl/strings dependency

Merging this change closes #36046

FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 95f983a
PiperOrigin-RevId: 855607651
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 21, 2026
Imported from GitHub PR openxla/xla#36046

📝 Summary of Changes

- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to reflect the removal of padding for ROCm introduced in openxla/xla#33854
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime
- Mark dynamic_shared_memory_test as cuda-only
- Add arch specific checks for barriers to sorting.hlo

🎯 Justification
Fixes failing unit tests on ROCm platform

🚀 Kind of Contribution
 🐛 Bug Fix, 🧪 Tests

Copybara import of the project:

--
b6c69025388e3528c1ff7a3ca2bef2389816973c by Harsha HS <[email protected]>:

[ROCm] Fix failing unit tests on ROCm platform

- layout_assignment tests are marked cuda-only.
- sample_file_test needs higher autotuner level for MIOpen to return
conv algorithm. Earlier this was coming from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.

--
c05a4a4bacf1ecf813f056cc690b9dda1df6aafe by Harsha HS <[email protected]>:

Fix tests which started to fail due to openxla/xla#33854

--
0ce9520211fbaafadb42a0a41219351ca889354a by Harsha HS <[email protected]>:

HIP now respects higher launch dimensions, similar to CUDA

--
a0fe0145935d830a0265a7a3bbdac9a0bfd2f056 by Harsha HS <[email protected]>:

Make dynamic_shared_memory_test cuda only

--
28569eb82b09fef0767be4b6fb25c44cee613fb1 by Harsha HS <[email protected]>:

Add arch specific checks to sorting.hlo

--
d7825531f4d3be3c0cf44001103cf1ac35732b38 by Harsha HS <[email protected]>:

Address review comments

--
8c23b89f1a696fdf304d86c182ed70ffb2c4dc69 by Harsha HS <[email protected]>:

Revert tf_cuda_test_tags to its original form to resolve CI errors

--
95f983a876d2c08e0f6cb71f7f74e92a6aa316c0 by Harsha H S <[email protected]>:

Update BUILD with absl/strings dependency

Merging this change closes #36046

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 95f983a876d2c08e0f6cb71f7f74e92a6aa316c0
PiperOrigin-RevId: 855607651
copybara-service Bot pushed a commit that referenced this pull request Jan 22, 2026
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 22, 2026
copybara-service Bot pushed a commit that referenced this pull request Jan 22, 2026
Imported from GitHub PR #36046

Merging this change closes #36046

COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 95f983a
PiperOrigin-RevId: 859532960
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 22, 2026
Imported from GitHub PR openxla/xla#36046

Merging this change closes #36046

PiperOrigin-RevId: 859532960
nurmukhametov pushed a commit to ROCm/xla that referenced this pull request Jan 22, 2026
Imported from GitHub PR openxla#36046

📝 Summary of Changes

- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to reflect the removal of padding for ROCm introduced in openxla#33854
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime
- Mark dynamic_shared_memory_test as cuda-only
- Add arch specific checks for barriers to sorting.hlo

🎯 Justification
Fixes failing unit tests on ROCm platform
nurmukhametov pushed a commit to ROCm/xla that referenced this pull request Jan 22, 2026
nurmukhametov pushed a commit to ROCm/xla that referenced this pull request Jan 23, 2026
@pemeliya deleted the pemeliya/remove-pad-for-gemms-12.11.25 branch January 23, 2026 13:09
nurmukhametov pushed a commit to ROCm/xla that referenced this pull request Jan 23, 2026
nurmukhametov pushed a commit to ROCm/xla that referenced this pull request Jan 23, 2026
nurmukhametov pushed a commit to ROCm/xla that referenced this pull request Jan 23, 2026
nurmukhametov pushed a commit to ROCm/xla that referenced this pull request Jan 23, 2026