[ROCM] Remove padding for gemms by pemeliya · Pull Request #33854 · openxla/xla

pemeliya · 2025-11-12T13:06:06Z

📝 Summary of Changes
Disabled padding for bf16/fp16 gemms on ROCM
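Below is a minimal sketch of the gating this change implies. It assumes the padding decision depends only on platform and element type, and it treats types other than fp16/bf16 as unpadded; the real XLA condition may also check the GPU architecture. `Platform`, `PADDED_DTYPES` and `should_pad_gemm` are hypothetical names, not XLA APIs.

```python
from enum import Enum


class Platform(Enum):
    CUDA = "cuda"
    ROCM = "rocm"


# Hypothetical: element types whose gemms get padded on CUDA.
PADDED_DTYPES = {"f16", "bf16"}


def should_pad_gemm(platform: Platform, dtype: str) -> bool:
    """Sketch only: would a dot of this dtype be padded on this platform?"""
    if platform is Platform.ROCM:
        # With this change, ROCm no longer pads bf16/fp16 gemms.
        return False
    return dtype in PADDED_DTYPES
```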

🎯 Justification
Padding for gemms was added for parity with NVIDIA, but on closer inspection it brings no advantage on the ROCm platform. Furthermore, CublasPadForGemms wraps every padded dot op in pad/slice instructions, which may prevent certain optimizations (e.g. gemm_rewriter epilogue fusion).
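The pure-Python sketch below shows what "wrapping a dot in pad/slice" means. It assumes the pass zero-pads M, K and N up to a multiple of 8. That multiple and all helper names are illustrative assumptions, not XLA's API. Zero padding leaves the numerical result unchanged, but the dot's consumer now sees a `slice` instead of the dot itself. `can_fuse_bias_epilogue` is a hypothetical toy matcher, not gemm_rewriter's real logic. It only illustrates how that intervening slice can defeat a pattern that expects a bias add to consume the dot directly.

```python
# Illustrative model only: assumes zero padding of M, K and N to a
# multiple of 8; helper names are hypothetical, not XLA APIs.

def round_up(x: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= x."""
    return (x + multiple - 1) // multiple * multiple


def pad(mat, rows, cols):
    """Zero-pad a row-major matrix (list of lists) to rows x cols."""
    out = [list(row) + [0.0] * (cols - len(row)) for row in mat]
    out += [[0.0] * cols for _ in range(rows - len(mat))]
    return out


def dot(a, b):
    """Plain [M, K] x [K, N] matrix product."""
    k, n = len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(len(a))]


def slice_to(mat, rows, cols):
    """Keep the top-left rows x cols block."""
    return [row[:cols] for row in mat[:rows]]


def padded_dot(a, b, multiple=8):
    """pad -> dot -> slice: the shape a padded gemm takes after the rewrite."""
    m, k, n = len(a), len(b), len(b[0])
    m_pad, k_pad, n_pad = (round_up(d, multiple) for d in (m, k, n))
    c = dot(pad(a, m_pad, k_pad), pad(b, k_pad, n_pad))
    return slice_to(c, m, n)


def can_fuse_bias_epilogue(ops):
    """Toy matcher: fuse only when a bias 'add' directly follows a 'dot'."""
    return any(prev == "dot" and nxt == "add" for prev, nxt in zip(ops, ops[1:]))


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]       # [2, 3]
b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # [3, 2]
assert padded_dot(a, b) == dot(a, b)         # zero padding keeps the result
print(can_fuse_bias_epilogue(["dot", "add"]))                  # True
print(can_fuse_bias_epilogue(["pad", "dot", "slice", "add"]))  # False
```

The padding itself is numerically harmless; the cost in this model is purely structural, which matches the PR's argument for dropping it where it brings no speedup.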

🚀 Kind of Contribution
♻️ Cleanup

🧪 Unit Tests:
Adapted the existing gemm_rewriter_test to account for the removal of padding on ROCm

@xla-rotation could you please have a look?

@xla-rotation

Imported from GitHub PR #33854 📝 Summary of Changes Disabled padding for bf16/fp16 gemms on ROCM 🎯 Justification Padding for gemms was added for parity with NVIDIA, but upon closer look, it turns out that there is no advantage to it on the ROCM platform. Furthermore, CublasPadForGemms wraps any padded dot op into pad / slice clauses which may prevent certain optimizations (e.g. gemm_rewriter epilogue fusion). 🚀 Kind of Contribution ♻️ Cleanup 🧪 Unit Tests: Adapted the existing gemm_rewriter_test to account for padding on ROCM @xla-rotation could you have a look please? Copybara import of the project: -- 8d52de9 by Pavel Emeliyanenko <[email protected]>: remove pad for gemms Merging this change closes #33854 FUTURE_COPYBARA_INTEGRATE_REVIEW=#33854 from ROCm:pemeliya/remove-pad-for-gemms-12.11.25 8d52de9 PiperOrigin-RevId: 831724544

@xla-rotation

Imported from GitHub PR openxla/xla#33854 📝 Summary of Changes Disabled padding for bf16/fp16 gemms on ROCM 🎯 Justification Padding for gemms was added for parity with NVIDIA, but upon closer look, it turns out that there is no advantage to it on the ROCM platform. Furthermore, CublasPadForGemms wraps any padded dot op into pad / slice clauses which may prevent certain optimizations (e.g. gemm_rewriter epilogue fusion). 🚀 Kind of Contribution ♻️ Cleanup 🧪 Unit Tests: Adapted the existing gemm_rewriter_test to account for padding on ROCM @xla-rotation could you have a look please? Copybara import of the project: -- 8d52de94e4fee78d999cdc0c28082b09ffc472cb by Pavel Emeliyanenko <[email protected]>: remove pad for gemms Merging this change closes #33854 PiperOrigin-RevId: 831750726

Imported from GitHub PR #36046 📝 Summary of Changes - layout_assignment tests are marked cuda-only. - sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made gpu agnostic by using canonical gpu name. - cublas_gemm_rewriter_test_amdgpu_any fix unit test to remove padding for ROCm as introduced in #33854 - gpu_kernel_tiling_test_amdgpu_any is updated to respect higher launch dimensions now supported by the HIP runtime - Mark dynamic_shared_memory_test as cuda-only - Add arch specific checks for barriers to sorting.hlo 🎯 Justification Fixes failing unit tests on ROCm platform 🚀 Kind of Contribution 🐛 Bug Fix, 🧪 Tests Copybara import of the project: -- 472cd54 by Harsha HS <[email protected]>: [ROCm] Fix failing unit tests on ROCm platform - layout_assignment tests are marked cuda-only. - sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made gpu agnostic by using canonical gpu name. -- 3bb9422 by Harsha HS <[email protected]>: Fix tests which started to fail due to #33854 -- 850d955 by Harsha HS <[email protected]>: HIP now respects higher launch dimension similar to CUDA -- b504a7e by Harsha HS <[email protected]>: Make dynamic_shared_memory_test cuda only -- 1e4e57a by Harsha HS <[email protected]>: Add arch specific checks to sorting.hlo -- ce1241c by Harsha HS <[email protected]>: Address review comments Merging this change closes #36046 FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c PiperOrigin-RevId: 855607651

Imported from GitHub PR openxla/xla#36046 📝 Summary of Changes - layout_assignment tests are marked cuda-only. - sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made gpu agnostic by using canonical gpu name. - cublas_gemm_rewriter_test_amdgpu_any fix unit test to remove padding for ROCm as introduced in openxla/xla#33854 - gpu_kernel_tiling_test_amdgpu_any is updated to respect higher launch dimensions now supported by the HIP runtime - Mark dynamic_shared_memory_test as cuda-only - Add arch specific checks for barriers to sorting.hlo 🎯 Justification Fixes failing unit tests on ROCm platform 🚀 Kind of Contribution 🐛 Bug Fix, 🧪 Tests Copybara import of the project: -- 472cd546318131ffc064e5996e489d9c9b2e40c2 by Harsha HS <[email protected]>: [ROCm] Fix failing unit tests on ROCm platform - layout_assignment tests are marked cuda-only. - sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made gpu agnostic by using canonical gpu name. -- 3bb942282579aed77f409e0730cd66bc1cd33665 by Harsha HS <[email protected]>: Fix tests which started to fail due to openxla/xla#33854 -- 850d95516c8921ef41fd1107195f7e6d6d032981 by Harsha HS <[email protected]>: HIP now respects higher launch dimension similar to CUDA -- b504a7ec6639e6ffedd595405cb7ebde4ade602c by Harsha HS <[email protected]>: Make dynamic_shared_memory_test cuda only -- 1e4e57ab9474217a2b99939acc1b9ec69642e663 by Harsha HS <[email protected]>: Add arch specific checks to sorting.hlo -- ce1241c38c4f753c27658cc3f237161ae420307b by Harsha HS <[email protected]>: Address review comments Merging this change closes #36046 Reverts bc8de59 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c38c4f753c27658cc3f237161ae420307b PiperOrigin-RevId: 855607651

Imported from GitHub PR openxla/xla#36046 📝 Summary of Changes - layout_assignment tests are marked cuda-only. - sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made gpu agnostic by using canonical gpu name. - cublas_gemm_rewriter_test_amdgpu_any fix unit test to remove padding for ROCm as introduced in openxla/xla#33854 - gpu_kernel_tiling_test_amdgpu_any is updated to respect higher launch dimensions now supported by the HIP runtime - Mark dynamic_shared_memory_test as cuda-only - Add arch specific checks for barriers to sorting.hlo 🎯 Justification Fixes failing unit tests on ROCm platform 🚀 Kind of Contribution 🐛 Bug Fix, 🧪 Tests Copybara import of the project: -- 472cd546318131ffc064e5996e489d9c9b2e40c2 by Harsha HS <[email protected]>: [ROCm] Fix failing unit tests on ROCm platform - layout_assignment tests are marked cuda-only. - sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made gpu agnostic by using canonical gpu name. -- 3bb942282579aed77f409e0730cd66bc1cd33665 by Harsha HS <[email protected]>: Fix tests which started to fail due to openxla/xla#33854 -- 850d95516c8921ef41fd1107195f7e6d6d032981 by Harsha HS <[email protected]>: HIP now respects higher launch dimension similar to CUDA -- b504a7ec6639e6ffedd595405cb7ebde4ade602c by Harsha HS <[email protected]>: Make dynamic_shared_memory_test cuda only -- 1e4e57ab9474217a2b99939acc1b9ec69642e663 by Harsha HS <[email protected]>: Add arch specific checks to sorting.hlo -- ce1241c38c4f753c27658cc3f237161ae420307b by Harsha HS <[email protected]>: Address review comments Merging this change closes #36046 Reverts 7e8aa5e FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c38c4f753c27658cc3f237161ae420307b PiperOrigin-RevId: 855607651

Imported from GitHub PR #36046 📝 Summary of Changes - layout_assignment tests are marked cuda-only. - sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made gpu agnostic by using canonical gpu name. - cublas_gemm_rewriter_test_amdgpu_any fix unit test to remove padding for ROCm as introduced in #33854 - gpu_kernel_tiling_test_amdgpu_any is updated to respect higher launch dimensions now supported by the HIP runtime - Mark dynamic_shared_memory_test as cuda-only - Add arch specific checks for barriers to sorting.hlo 🎯 Justification Fixes failing unit tests on ROCm platform 🚀 Kind of Contribution 🐛 Bug Fix, 🧪 Tests Copybara import of the project: -- 472cd54 by Harsha HS <[email protected]>: [ROCm] Fix failing unit tests on ROCm platform - layout_assignment tests are marked cuda-only. - sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made gpu agnostic by using canonical gpu name. -- 3bb9422 by Harsha HS <[email protected]>: Fix tests which started to fail due to #33854 -- 850d955 by Harsha HS <[email protected]>: HIP now respects higher launch dimension similar to CUDA -- b504a7e by Harsha HS <[email protected]>: Make dynamic_shared_memory_test cuda only -- 1e4e57a by Harsha HS <[email protected]>: Add arch specific checks to sorting.hlo -- ce1241c by Harsha HS <[email protected]>: Address review comments Merging this change closes #36046 FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c PiperOrigin-RevId: 856068530

Imported from GitHub PR #36046 📝 Summary of Changes - layout_assignment tests are marked cuda-only. - sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made gpu agnostic by using canonical gpu name. - cublas_gemm_rewriter_test_amdgpu_any fix unit test to remove padding for ROCm as introduced in #33854 - gpu_kernel_tiling_test_amdgpu_any is updated to respect higher launch dimensions now supported by the HIP runtime - Mark dynamic_shared_memory_test as cuda-only - Add arch specific checks for barriers to sorting.hlo 🎯 Justification Fixes failing unit tests on ROCm platform 🚀 Kind of Contribution 🐛 Bug Fix, 🧪 Tests Copybara import of the project: -- b6c6902 by Harsha HS <[email protected]>: [ROCm] Fix failing unit tests on ROCm platform - layout_assignment tests are marked cuda-only. - sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made gpu agnostic by using canonical gpu name. -- c05a4a4 by Harsha HS <[email protected]>: Fix tests which started to fail due to #33854 -- 0ce9520 by Harsha HS <[email protected]>: HIP now respects higher launch dimension similar to CUDA -- a0fe014 by Harsha HS <[email protected]>: Make dynamic_shared_memory_test cuda only -- 28569eb by Harsha HS <[email protected]>: Add arch specific checks to sorting.hlo -- d782553 by Harsha HS <[email protected]>: Address review comments -- 8c23b89 by Harsha HS <[email protected]>: Revert tf_cuda_test_tags to its original form to resolve CI errors Merging this change closes #36046 FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 8c23b89 PiperOrigin-RevId: 855607651

Imported from GitHub PR openxla/xla#36046 📝 Summary of Changes - layout_assignment tests are marked cuda-only. - sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made gpu agnostic by using canonical gpu name. - cublas_gemm_rewriter_test_amdgpu_any fix unit test to remove padding for ROCm as introduced in openxla/xla#33854 - gpu_kernel_tiling_test_amdgpu_any is updated to respect higher launch dimensions now supported by the HIP runtime - Mark dynamic_shared_memory_test as cuda-only - Add arch specific checks for barriers to sorting.hlo 🎯 Justification Fixes failing unit tests on ROCm platform 🚀 Kind of Contribution 🐛 Bug Fix, 🧪 Tests Copybara import of the project: -- b6c69025388e3528c1ff7a3ca2bef2389816973c by Harsha HS <[email protected]>: [ROCm] Fix failing unit tests on ROCm platform - layout_assignment tests are marked cuda-only. - sample_file_test needs higher autotuner level for MIOpen to return conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made gpu agnostic by using canonical gpu name. -- c05a4a4bacf1ecf813f056cc690b9dda1df6aafe by Harsha HS <[email protected]>: Fix tests which started to fail due to openxla/xla#33854 -- 0ce9520211fbaafadb42a0a41219351ca889354a by Harsha HS <[email protected]>: HIP now respects higher launch dimension similar to CUDA -- a0fe0145935d830a0265a7a3bbdac9a0bfd2f056 by Harsha HS <[email protected]>: Make dynamic_shared_memory_test cuda only -- 28569eb82b09fef0767be4b6fb25c44cee613fb1 by Harsha HS <[email protected]>: Add arch specific checks to sorting.hlo -- d7825531f4d3be3c0cf44001103cf1ac35732b38 by Harsha HS <[email protected]>: Address review comments -- 8c23b89f1a696fdf304d86c182ed70ffb2c4dc69 by Harsha HS <[email protected]>: Revert tf_cuda_test_tags to its original form to resolve CI errors Merging this change closes #36046 Reverts changelist 858710443 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 8c23b89f1a696fdf304d86c182ed70ffb2c4dc69 PiperOrigin-RevId: 855607651

Imported from GitHub PR #36046 📝 Summary of Changes - layout_assignment tests are marked cuda-only. - sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made GPU-agnostic by using the canonical GPU name. - cublas_gemm_rewriter_test_amdgpu_any is fixed to account for the removal of padding on ROCm introduced in #33854. - gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime. - Mark dynamic_shared_memory_test as cuda-only. - Add arch-specific checks for barriers to sorting.hlo. 🎯 Justification Fixes failing unit tests on the ROCm platform 🚀 Kind of Contribution 🐛 Bug Fix, 🧪 Tests Copybara import of the project: -- b6c6902 by Harsha HS <[email protected]>: [ROCm] Fix failing unit tests on ROCm platform - layout_assignment tests are marked cuda-only. - sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made GPU-agnostic by using the canonical GPU name. -- c05a4a4 by Harsha HS <[email protected]>: Fix tests which started to fail due to #33854 -- 0ce9520 by Harsha HS <[email protected]>: HIP now respects higher launch dimensions similar to CUDA -- a0fe014 by Harsha HS <[email protected]>: Make dynamic_shared_memory_test cuda only -- 28569eb by Harsha HS <[email protected]>: Add arch specific checks to sorting.hlo -- d782553 by Harsha HS <[email protected]>: Address review comments -- 8c23b89 by Harsha HS <[email protected]>: Revert tf_cuda_test_tags to its original form to resolve CI errors -- 95f983a by Harsha H S <[email protected]>: Update BUILD with absl/strings dependency Merging this change closes #36046 FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 95f983a PiperOrigin-RevId: 855607651

Imported from GitHub PR openxla/xla#36046 📝 Summary of Changes - layout_assignment tests are marked cuda-only. - sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made GPU-agnostic by using the canonical GPU name. - cublas_gemm_rewriter_test_amdgpu_any is fixed to account for the removal of padding on ROCm introduced in openxla/xla#33854. - gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime. - Mark dynamic_shared_memory_test as cuda-only. - Add arch-specific checks for barriers to sorting.hlo. 🎯 Justification Fixes failing unit tests on the ROCm platform 🚀 Kind of Contribution 🐛 Bug Fix, 🧪 Tests Copybara import of the project: -- b6c69025388e3528c1ff7a3ca2bef2389816973c by Harsha HS <[email protected]>: [ROCm] Fix failing unit tests on ROCm platform - layout_assignment tests are marked cuda-only. - sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made GPU-agnostic by using the canonical GPU name. -- c05a4a4bacf1ecf813f056cc690b9dda1df6aafe by Harsha HS <[email protected]>: Fix tests which started to fail due to openxla/xla#33854 -- 0ce9520211fbaafadb42a0a41219351ca889354a by Harsha HS <[email protected]>: HIP now respects higher launch dimensions similar to CUDA -- a0fe0145935d830a0265a7a3bbdac9a0bfd2f056 by Harsha HS <[email protected]>: Make dynamic_shared_memory_test cuda only -- 28569eb82b09fef0767be4b6fb25c44cee613fb1 by Harsha HS <[email protected]>: Add arch specific checks to sorting.hlo -- d7825531f4d3be3c0cf44001103cf1ac35732b38 by Harsha HS <[email protected]>: Address review comments -- 8c23b89f1a696fdf304d86c182ed70ffb2c4dc69 by Harsha HS <[email protected]>: Revert tf_cuda_test_tags to its original form to resolve CI errors -- 95f983a876d2c08e0f6cb71f7f74e92a6aa316c0 by Harsha H S <[email protected]>: Update BUILD with absl/strings dependency Merging this change closes #36046 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 95f983a876d2c08e0f6cb71f7f74e92a6aa316c0 PiperOrigin-RevId: 855607651

Imported from GitHub PR #36046 📝 Summary of Changes - layout_assignment tests are marked cuda-only. - sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made GPU-agnostic by using the canonical GPU name. - cublas_gemm_rewriter_test_amdgpu_any is fixed to account for the removal of padding on ROCm introduced in #33854. - gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime. - Mark dynamic_shared_memory_test as cuda-only. - Add arch-specific checks for barriers to sorting.hlo. 🎯 Justification Fixes failing unit tests on the ROCm platform 🚀 Kind of Contribution 🐛 Bug Fix, 🧪 Tests Copybara import of the project: -- b6c6902 by Harsha HS <[email protected]>: [ROCm] Fix failing unit tests on ROCm platform - layout_assignment tests are marked cuda-only. - sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made GPU-agnostic by using the canonical GPU name. -- c05a4a4 by Harsha HS <[email protected]>: Fix tests which started to fail due to #33854 -- 0ce9520 by Harsha HS <[email protected]>: HIP now respects higher launch dimensions similar to CUDA -- a0fe014 by Harsha HS <[email protected]>: Make dynamic_shared_memory_test cuda only -- 28569eb by Harsha HS <[email protected]>: Add arch specific checks to sorting.hlo -- d782553 by Harsha HS <[email protected]>: Address review comments -- 8c23b89 by Harsha HS <[email protected]>: Revert tf_cuda_test_tags to its original form to resolve CI errors -- 95f983a by Harsha H S <[email protected]>: Update BUILD with absl/strings dependency Merging this change closes #36046 COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 95f983a PiperOrigin-RevId: 859532960

Imported from GitHub PR openxla/xla#36046 📝 Summary of Changes - layout_assignment tests are marked cuda-only. - sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made GPU-agnostic by using the canonical GPU name. - cublas_gemm_rewriter_test_amdgpu_any is fixed to account for the removal of padding on ROCm introduced in openxla/xla#33854. - gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime. - Mark dynamic_shared_memory_test as cuda-only. - Add arch-specific checks for barriers to sorting.hlo. 🎯 Justification Fixes failing unit tests on the ROCm platform 🚀 Kind of Contribution 🐛 Bug Fix, 🧪 Tests Copybara import of the project: -- b6c69025388e3528c1ff7a3ca2bef2389816973c by Harsha HS <[email protected]>: [ROCm] Fix failing unit tests on ROCm platform - layout_assignment tests are marked cuda-only. - sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made GPU-agnostic by using the canonical GPU name. -- c05a4a4bacf1ecf813f056cc690b9dda1df6aafe by Harsha HS <[email protected]>: Fix tests which started to fail due to openxla/xla#33854 -- 0ce9520211fbaafadb42a0a41219351ca889354a by Harsha HS <[email protected]>: HIP now respects higher launch dimensions similar to CUDA -- a0fe0145935d830a0265a7a3bbdac9a0bfd2f056 by Harsha HS <[email protected]>: Make dynamic_shared_memory_test cuda only -- 28569eb82b09fef0767be4b6fb25c44cee613fb1 by Harsha HS <[email protected]>: Add arch specific checks to sorting.hlo -- d7825531f4d3be3c0cf44001103cf1ac35732b38 by Harsha HS <[email protected]>: Address review comments -- 8c23b89f1a696fdf304d86c182ed70ffb2c4dc69 by Harsha HS <[email protected]>: Revert tf_cuda_test_tags to its original form to resolve CI errors -- 95f983a876d2c08e0f6cb71f7f74e92a6aa316c0 by Harsha H S <[email protected]>: Update BUILD with absl/strings dependency Merging this change closes #36046 PiperOrigin-RevId: 859532960

Imported from GitHub PR openxla#36046 📝 Summary of Changes - layout_assignment tests are marked cuda-only. - sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this was coming from GetDebugOptionsForTest. - buffer_debug_log test is made GPU-agnostic by using the canonical GPU name. - cublas_gemm_rewriter_test_amdgpu_any is fixed to account for the removal of padding on ROCm introduced in openxla#33854. - gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime. - Mark dynamic_shared_memory_test as cuda-only. - Add arch-specific checks for barriers to sorting.hlo. 🎯 Justification Fixes failing unit tests on the ROCm platform

remove pad for gemms

8d52de9

dimitar-asenov approved these changes Nov 13, 2025

copybara-service Bot mentioned this pull request Nov 13, 2025

PR #33854: [ROCM[ Remove padding for gemms #33889

Closed

copybara-service Bot mentioned this pull request Nov 13, 2025

PR #33854: [ROCM[ Remove padding for gemms tensorflow/tensorflow#104280

Merged

copybara-service Bot closed this in 381af20 Nov 13, 2025

hsharsha added a commit to ROCm/xla that referenced this pull request Jan 8, 2026

Fix tests which started to fail due to openxla#33854

3bb9422

hsharsha mentioned this pull request Jan 8, 2026

[ROCm] Fix failing unit tests on ROCm platform #36046

Closed

copybara-service Bot mentioned this pull request Jan 13, 2026

PR #36046: [ROCm] Fix failing unit tests on ROCm platform #36330

Closed

copybara-service Bot mentioned this pull request Jan 13, 2026

PR #36046: [ROCm] Fix failing unit tests on ROCm platform tensorflow/tensorflow#108157

Merged

copybara-service Bot mentioned this pull request Jan 14, 2026

PR #36046: [ROCm] Fix failing unit tests on ROCm platform #36397

Closed

copybara-service Bot mentioned this pull request Jan 14, 2026

PR #36046: [ROCm] Fix failing unit tests on ROCm platform tensorflow/tensorflow#108257

Closed

pemeliya deleted the pemeliya/remove-pad-for-gemms-12.11.25 branch January 23, 2026 13:09

[ROCM[ Remove padding for gemms #33854

pemeliya wants to merge 1 commit into openxla:main from ROCm:pemeliya/remove-pad-for-gemms-12.11.25

pemeliya commented Nov 12, 2025