[ROCM] Remove padding for gemms #33854
Closed
pemeliya wants to merge 1 commit into
dimitar-asenov approved these changes Nov 13, 2025
copybara-service Bot pushed a commit that referenced this pull request Nov 13, 2025

Imported from GitHub PR #33854

📝 Summary of Changes
Disabled padding for bf16/fp16 gemms on ROCm.

🎯 Justification
Padding for gemms was added for parity with NVIDIA, but on closer inspection it offers no advantage on the ROCm platform. Furthermore, CublasPadForGemms wraps any padded dot op in pad / slice clauses, which may prevent certain optimizations (e.g. gemm_rewriter epilogue fusion).

🚀 Kind of Contribution
♻️ Cleanup

🧪 Unit Tests
Adapted the existing gemm_rewriter_test to account for padding on ROCm.

@xla-rotation could you have a look, please?

Copybara import of the project:
-- 8d52de9 by Pavel Emeliyanenko <[email protected]>:
remove pad for gemms

Merging this change closes #33854
FUTURE_COPYBARA_INTEGRATE_REVIEW=#33854 from ROCm:pemeliya/remove-pad-for-gemms-12.11.25 8d52de9
PiperOrigin-RevId: 831724544
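To make the pad / slice wrapping mentioned in the justification concrete, here is a minimal pure-Python sketch of the general idea: zero-pad both operands up to a size multiple, multiply, then slice the result back to the original shape. It is not the XLA implementation. The helper names (`round_up`, `pad`, `matmul`, `padded_matmul`) and the multiple of 8 are assumptions chosen for illustration only.

```python
def round_up(n, multiple):
    """Round n up to the nearest multiple of `multiple`."""
    return -(-n // multiple) * multiple


def pad(matrix, rows, cols):
    """Zero-pad a list-of-lists matrix to rows x cols."""
    padded = [list(r) + [0] * (cols - len(r)) for r in matrix]
    padded += [[0] * cols for _ in range(rows - len(matrix))]
    return padded


def matmul(a, b):
    """Naive matrix product; stands in for the gemm call."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def padded_matmul(a, b, multiple=8):
    """pad -> dot -> slice. The multiple of 8 is an illustrative assumption."""
    m, k, n = len(a), len(b), len(b[0])
    mp, kp, np_ = (round_up(d, multiple) for d in (m, k, n))
    full = matmul(pad(a, mp, kp), pad(b, kp, np_))
    # Slice away the rows and columns that exist only because of padding.
    return [row[:n] for row in full[:m]]


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
assert padded_matmul(a, b) == matmul(a, b)
```

The padded and unpadded products agree, so the wrapping only pays off if the backend runs faster on the padded shapes. In the padded form, the extra operations sit between the dot and whatever consumes its result, which gives some intuition for how they could get in the way of fusing follow-up work into the gemm.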
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 13, 2025

Imported from GitHub PR openxla/xla#33854

📝 Summary of Changes
Disabled padding for bf16/fp16 gemms on ROCm.

🎯 Justification
Padding for gemms was added for parity with NVIDIA, but on closer inspection it offers no advantage on the ROCm platform. Furthermore, CublasPadForGemms wraps any padded dot op in pad / slice clauses, which may prevent certain optimizations (e.g. gemm_rewriter epilogue fusion).

🚀 Kind of Contribution
♻️ Cleanup

🧪 Unit Tests
Adapted the existing gemm_rewriter_test to account for padding on ROCm.

@xla-rotation could you have a look, please?

Copybara import of the project:
-- 8d52de94e4fee78d999cdc0c28082b09ffc472cb by Pavel Emeliyanenko <[email protected]>:
remove pad for gemms

Merging this change closes #33854
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#33854 from ROCm:pemeliya/remove-pad-for-gemms-12.11.25 8d52de94e4fee78d999cdc0c28082b09ffc472cb
PiperOrigin-RevId: 831724544
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 13, 2025

Imported from GitHub PR openxla/xla#33854

📝 Summary of Changes
Disabled padding for bf16/fp16 gemms on ROCm.

🎯 Justification
Padding for gemms was added for parity with NVIDIA, but on closer inspection it offers no advantage on the ROCm platform. Furthermore, CublasPadForGemms wraps any padded dot op in pad / slice clauses, which may prevent certain optimizations (e.g. gemm_rewriter epilogue fusion).

🚀 Kind of Contribution
♻️ Cleanup

🧪 Unit Tests
Adapted the existing gemm_rewriter_test to account for padding on ROCm.

@xla-rotation could you have a look, please?

Copybara import of the project:
-- 8d52de94e4fee78d999cdc0c28082b09ffc472cb by Pavel Emeliyanenko <[email protected]>:
remove pad for gemms

Merging this change closes #33854
PiperOrigin-RevId: 831750726
hsharsha added a commit to ROCm/xla that referenced this pull request Jan 8, 2026
copybara-service Bot pushed a commit that referenced this pull request Jan 13, 2026

Imported from GitHub PR #36046

📝 Summary of Changes
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to remove padding for ROCm, as introduced in #33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as cuda-only.
- Add arch-specific checks for barriers to sorting.hlo.

🎯 Justification
Fixes failing unit tests on the ROCm platform.

🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests

Copybara import of the project:
-- 472cd54 by Harsha HS <[email protected]>:
[ROCm] Fix failing unit tests on ROCm platform
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
-- 3bb9422 by Harsha HS <[email protected]>:
Fix tests which started to fail due to #33854
-- 850d955 by Harsha HS <[email protected]>:
HIP now respects higher launch dimensions, similar to CUDA
-- b504a7e by Harsha HS <[email protected]>:
Make dynamic_shared_memory_test cuda only
-- 1e4e57a by Harsha HS <[email protected]>:
Add arch specific checks to sorting.hlo
-- ce1241c by Harsha HS <[email protected]>:
Address review comments

Merging this change closes #36046
FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c
PiperOrigin-RevId: 855607651
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 13, 2026

Imported from GitHub PR openxla/xla#36046

📝 Summary of Changes
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to remove padding for ROCm, as introduced in openxla/xla#33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as cuda-only.
- Add arch-specific checks for barriers to sorting.hlo.

🎯 Justification
Fixes failing unit tests on the ROCm platform.

🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests

Copybara import of the project:
-- 472cd546318131ffc064e5996e489d9c9b2e40c2 by Harsha HS <[email protected]>:
[ROCm] Fix failing unit tests on ROCm platform
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
-- 3bb942282579aed77f409e0730cd66bc1cd33665 by Harsha HS <[email protected]>:
Fix tests which started to fail due to openxla/xla#33854
-- 850d95516c8921ef41fd1107195f7e6d6d032981 by Harsha HS <[email protected]>:
HIP now respects higher launch dimensions, similar to CUDA
-- b504a7ec6639e6ffedd595405cb7ebde4ade602c by Harsha HS <[email protected]>:
Make dynamic_shared_memory_test cuda only
-- 1e4e57ab9474217a2b99939acc1b9ec69642e663 by Harsha HS <[email protected]>:
Add arch specific checks to sorting.hlo
-- ce1241c38c4f753c27658cc3f237161ae420307b by Harsha HS <[email protected]>:
Address review comments

Merging this change closes #36046
Reverts bc8de59
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c38c4f753c27658cc3f237161ae420307b
PiperOrigin-RevId: 855607651
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 13, 2026

Imported from GitHub PR openxla/xla#36046

📝 Summary of Changes
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to remove padding for ROCm, as introduced in openxla/xla#33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as cuda-only.
- Add arch-specific checks for barriers to sorting.hlo.

🎯 Justification
Fixes failing unit tests on the ROCm platform.

🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests

Copybara import of the project:
-- 472cd546318131ffc064e5996e489d9c9b2e40c2 by Harsha HS <[email protected]>:
[ROCm] Fix failing unit tests on ROCm platform
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
-- 3bb942282579aed77f409e0730cd66bc1cd33665 by Harsha HS <[email protected]>:
Fix tests which started to fail due to openxla/xla#33854
-- 850d95516c8921ef41fd1107195f7e6d6d032981 by Harsha HS <[email protected]>:
HIP now respects higher launch dimensions, similar to CUDA
-- b504a7ec6639e6ffedd595405cb7ebde4ade602c by Harsha HS <[email protected]>:
Make dynamic_shared_memory_test cuda only
-- 1e4e57ab9474217a2b99939acc1b9ec69642e663 by Harsha HS <[email protected]>:
Add arch specific checks to sorting.hlo
-- ce1241c38c4f753c27658cc3f237161ae420307b by Harsha HS <[email protected]>:
Address review comments

Merging this change closes #36046
Reverts 7e8aa5e
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c38c4f753c27658cc3f237161ae420307b
PiperOrigin-RevId: 855607651
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 13, 2026

Imported from GitHub PR openxla/xla#36046

📝 Summary of Changes
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to remove padding for ROCm, as introduced in openxla/xla#33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as cuda-only.
- Add arch-specific checks for barriers to sorting.hlo.

🎯 Justification
Fixes failing unit tests on the ROCm platform.

🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests

Copybara import of the project:
-- 472cd546318131ffc064e5996e489d9c9b2e40c2 by Harsha HS <[email protected]>:
[ROCm] Fix failing unit tests on ROCm platform
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
-- 3bb942282579aed77f409e0730cd66bc1cd33665 by Harsha HS <[email protected]>:
Fix tests which started to fail due to openxla/xla#33854
-- 850d95516c8921ef41fd1107195f7e6d6d032981 by Harsha HS <[email protected]>:
HIP now respects higher launch dimensions, similar to CUDA
-- b504a7ec6639e6ffedd595405cb7ebde4ade602c by Harsha HS <[email protected]>:
Make dynamic_shared_memory_test cuda only
-- 1e4e57ab9474217a2b99939acc1b9ec69642e663 by Harsha HS <[email protected]>:
Add arch specific checks to sorting.hlo
-- ce1241c38c4f753c27658cc3f237161ae420307b by Harsha HS <[email protected]>:
Address review comments

Merging this change closes #36046
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c38c4f753c27658cc3f237161ae420307b
PiperOrigin-RevId: 855607651
copybara-service Bot pushed a commit that referenced this pull request Jan 14, 2026

Imported from GitHub PR #36046

📝 Summary of Changes
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to remove padding for ROCm, as introduced in #33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as cuda-only.
- Add arch-specific checks for barriers to sorting.hlo.

🎯 Justification
Fixes failing unit tests on the ROCm platform.

🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests

Copybara import of the project:
-- 472cd54 by Harsha HS <[email protected]>:
[ROCm] Fix failing unit tests on ROCm platform
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
-- 3bb9422 by Harsha HS <[email protected]>:
Fix tests which started to fail due to #33854
-- 850d955 by Harsha HS <[email protected]>:
HIP now respects higher launch dimensions, similar to CUDA
-- b504a7e by Harsha HS <[email protected]>:
Make dynamic_shared_memory_test cuda only
-- 1e4e57a by Harsha HS <[email protected]>:
Add arch specific checks to sorting.hlo
-- ce1241c by Harsha HS <[email protected]>:
Address review comments

Merging this change closes #36046
FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c
PiperOrigin-RevId: 856068530
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 14, 2026

Imported from GitHub PR openxla/xla#36046

📝 Summary of Changes
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to remove padding for ROCm, as introduced in openxla/xla#33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as cuda-only.
- Add arch-specific checks for barriers to sorting.hlo.

🎯 Justification
Fixes failing unit tests on the ROCm platform.

🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests

Copybara import of the project:
-- 472cd546318131ffc064e5996e489d9c9b2e40c2 by Harsha HS <[email protected]>:
[ROCm] Fix failing unit tests on ROCm platform
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
-- 3bb942282579aed77f409e0730cd66bc1cd33665 by Harsha HS <[email protected]>:
Fix tests which started to fail due to openxla/xla#33854
-- 850d95516c8921ef41fd1107195f7e6d6d032981 by Harsha HS <[email protected]>:
HIP now respects higher launch dimensions, similar to CUDA
-- b504a7ec6639e6ffedd595405cb7ebde4ade602c by Harsha HS <[email protected]>:
Make dynamic_shared_memory_test cuda only
-- 1e4e57ab9474217a2b99939acc1b9ec69642e663 by Harsha HS <[email protected]>:
Add arch specific checks to sorting.hlo
-- ce1241c38c4f753c27658cc3f237161ae420307b by Harsha HS <[email protected]>:
Address review comments

Merging this change closes #36046
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c38c4f753c27658cc3f237161ae420307b
PiperOrigin-RevId: 856068530
copybara-service Bot pushed a commit that referenced this pull request Jan 19, 2026

Imported from GitHub PR #36046

📝 Summary of Changes
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to remove padding for ROCm, as introduced in #33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as cuda-only.
- Add arch-specific checks for barriers to sorting.hlo.

🎯 Justification
Fixes failing unit tests on the ROCm platform.

🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests

Copybara import of the project:
-- 472cd54 by Harsha HS <[email protected]>:
[ROCm] Fix failing unit tests on ROCm platform
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
-- 3bb9422 by Harsha HS <[email protected]>:
Fix tests which started to fail due to #33854
-- 850d955 by Harsha HS <[email protected]>:
HIP now respects higher launch dimensions, similar to CUDA
-- b504a7e by Harsha HS <[email protected]>:
Make dynamic_shared_memory_test cuda only
-- 1e4e57a by Harsha HS <[email protected]>:
Add arch specific checks to sorting.hlo
-- ce1241c by Harsha HS <[email protected]>:
Address review comments

Merging this change closes #36046
FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 ce1241c
PiperOrigin-RevId: 856068530
copybara-service Bot
pushed a commit
that referenced
this pull request
Jan 19, 2026
Imported from GitHub PR #36046
📝 Summary of Changes
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to expect no padding on ROCm, following #33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as cuda-only.
- Add arch-specific checks for barriers to sorting.hlo.
🎯 Justification
Fixes failing unit tests on the ROCm platform.
🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests
Copybara import of the project:
-- b6c6902 by Harsha HS: [ROCm] Fix failing unit tests on ROCm platform
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
-- c05a4a4 by Harsha HS: Fix tests which started to fail due to #33854
-- 0ce9520 by Harsha HS: HIP now respects higher launch dimensions, similar to CUDA
-- a0fe014 by Harsha HS: Make dynamic_shared_memory_test cuda only
-- 28569eb by Harsha HS: Add arch specific checks to sorting.hlo
-- d782553 by Harsha HS: Address review comments
-- 8c23b89 by Harsha HS: Revert tf_cuda_test_tags to its original form to resolve CI errors
Merging this change closes #36046
FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 8c23b89
PiperOrigin-RevId: 855607651
copybara-service Bot
pushed a commit
that referenced
this pull request
Jan 20, 2026
Imported from GitHub PR #36046
📝 Summary of Changes
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to expect no padding on ROCm, following #33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as cuda-only.
- Add arch-specific checks for barriers to sorting.hlo.
🎯 Justification
Fixes failing unit tests on the ROCm platform.
🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests
Copybara import of the project:
-- b6c6902 by Harsha HS: [ROCm] Fix failing unit tests on ROCm platform
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
-- c05a4a4 by Harsha HS: Fix tests which started to fail due to #33854
-- 0ce9520 by Harsha HS: HIP now respects higher launch dimensions, similar to CUDA
-- a0fe014 by Harsha HS: Make dynamic_shared_memory_test cuda only
-- 28569eb by Harsha HS: Add arch specific checks to sorting.hlo
-- d782553 by Harsha HS: Address review comments
-- 8c23b89 by Harsha HS: Revert tf_cuda_test_tags to its original form to resolve CI errors
Merging this change closes #36046
FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 8c23b89
PiperOrigin-RevId: 855607651
copybara-service Bot
pushed a commit
that referenced
this pull request
Jan 21, 2026
Imported from GitHub PR #36046
📝 Summary of Changes
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to expect no padding on ROCm, following #33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as cuda-only.
- Add arch-specific checks for barriers to sorting.hlo.
🎯 Justification
Fixes failing unit tests on the ROCm platform.
🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests
Copybara import of the project:
-- b6c6902 by Harsha HS: [ROCm] Fix failing unit tests on ROCm platform
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
-- c05a4a4 by Harsha HS: Fix tests which started to fail due to #33854
-- 0ce9520 by Harsha HS: HIP now respects higher launch dimensions, similar to CUDA
-- a0fe014 by Harsha HS: Make dynamic_shared_memory_test cuda only
-- 28569eb by Harsha HS: Add arch specific checks to sorting.hlo
-- d782553 by Harsha HS: Address review comments
-- 8c23b89 by Harsha HS: Revert tf_cuda_test_tags to its original form to resolve CI errors
Merging this change closes #36046
FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 8c23b89
PiperOrigin-RevId: 855607651
copybara-service Bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Jan 21, 2026
Imported from GitHub PR openxla/xla#36046
📝 Summary of Changes
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to expect no padding on ROCm, following openxla/xla#33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as cuda-only.
- Add arch-specific checks for barriers to sorting.hlo.
🎯 Justification
Fixes failing unit tests on the ROCm platform.
🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests
Copybara import of the project:
-- b6c69025388e3528c1ff7a3ca2bef2389816973c by Harsha HS: [ROCm] Fix failing unit tests on ROCm platform
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
-- c05a4a4bacf1ecf813f056cc690b9dda1df6aafe by Harsha HS: Fix tests which started to fail due to openxla/xla#33854
-- 0ce9520211fbaafadb42a0a41219351ca889354a by Harsha HS: HIP now respects higher launch dimensions, similar to CUDA
-- a0fe0145935d830a0265a7a3bbdac9a0bfd2f056 by Harsha HS: Make dynamic_shared_memory_test cuda only
-- 28569eb82b09fef0767be4b6fb25c44cee613fb1 by Harsha HS: Add arch specific checks to sorting.hlo
-- d7825531f4d3be3c0cf44001103cf1ac35732b38 by Harsha HS: Address review comments
-- 8c23b89f1a696fdf304d86c182ed70ffb2c4dc69 by Harsha HS: Revert tf_cuda_test_tags to its original form to resolve CI errors
Merging this change closes #36046
Reverts changelist 858710443
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 8c23b89f1a696fdf304d86c182ed70ffb2c4dc69
PiperOrigin-RevId: 855607651
copybara-service Bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Jan 21, 2026
Imported from GitHub PR openxla/xla#36046
📝 Summary of Changes
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to expect no padding on ROCm, following openxla/xla#33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as cuda-only.
- Add arch-specific checks for barriers to sorting.hlo.
🎯 Justification
Fixes failing unit tests on the ROCm platform.
🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests
Copybara import of the project:
-- b6c69025388e3528c1ff7a3ca2bef2389816973c by Harsha HS: [ROCm] Fix failing unit tests on ROCm platform
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
-- c05a4a4bacf1ecf813f056cc690b9dda1df6aafe by Harsha HS: Fix tests which started to fail due to openxla/xla#33854
-- 0ce9520211fbaafadb42a0a41219351ca889354a by Harsha HS: HIP now respects higher launch dimensions, similar to CUDA
-- a0fe0145935d830a0265a7a3bbdac9a0bfd2f056 by Harsha HS: Make dynamic_shared_memory_test cuda only
-- 28569eb82b09fef0767be4b6fb25c44cee613fb1 by Harsha HS: Add arch specific checks to sorting.hlo
-- d7825531f4d3be3c0cf44001103cf1ac35732b38 by Harsha HS: Address review comments
-- 8c23b89f1a696fdf304d86c182ed70ffb2c4dc69 by Harsha HS: Revert tf_cuda_test_tags to its original form to resolve CI errors
Merging this change closes #36046
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 8c23b89f1a696fdf304d86c182ed70ffb2c4dc69
PiperOrigin-RevId: 855607651
copybara-service Bot
pushed a commit
that referenced
this pull request
Jan 21, 2026
Imported from GitHub PR #36046
📝 Summary of Changes
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to expect no padding on ROCm, following #33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as cuda-only.
- Add arch-specific checks for barriers to sorting.hlo.
🎯 Justification
Fixes failing unit tests on the ROCm platform.
🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests
Copybara import of the project:
-- b6c6902 by Harsha HS: [ROCm] Fix failing unit tests on ROCm platform
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
-- c05a4a4 by Harsha HS: Fix tests which started to fail due to #33854
-- 0ce9520 by Harsha HS: HIP now respects higher launch dimensions, similar to CUDA
-- a0fe014 by Harsha HS: Make dynamic_shared_memory_test cuda only
-- 28569eb by Harsha HS: Add arch specific checks to sorting.hlo
-- d782553 by Harsha HS: Address review comments
-- 8c23b89 by Harsha HS: Revert tf_cuda_test_tags to its original form to resolve CI errors
-- 95f983a by Harsha H S: Update BUILD with absl/strings dependency
Merging this change closes #36046
FUTURE_COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 95f983a
PiperOrigin-RevId: 855607651
copybara-service Bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Jan 21, 2026
Imported from GitHub PR openxla/xla#36046
📝 Summary of Changes
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to expect no padding on ROCm, following openxla/xla#33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as cuda-only.
- Add arch-specific checks for barriers to sorting.hlo.
🎯 Justification
Fixes failing unit tests on the ROCm platform.
🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests
Copybara import of the project:
-- b6c69025388e3528c1ff7a3ca2bef2389816973c by Harsha HS: [ROCm] Fix failing unit tests on ROCm platform
- layout_assignment tests are marked cuda-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
-- c05a4a4bacf1ecf813f056cc690b9dda1df6aafe by Harsha HS: Fix tests which started to fail due to openxla/xla#33854
-- 0ce9520211fbaafadb42a0a41219351ca889354a by Harsha HS: HIP now respects higher launch dimensions, similar to CUDA
-- a0fe0145935d830a0265a7a3bbdac9a0bfd2f056 by Harsha HS: Make dynamic_shared_memory_test cuda only
-- 28569eb82b09fef0767be4b6fb25c44cee613fb1 by Harsha HS: Add arch specific checks to sorting.hlo
-- d7825531f4d3be3c0cf44001103cf1ac35732b38 by Harsha HS: Address review comments
-- 8c23b89f1a696fdf304d86c182ed70ffb2c4dc69 by Harsha HS: Revert tf_cuda_test_tags to its original form to resolve CI errors
-- 95f983a876d2c08e0f6cb71f7f74e92a6aa316c0 by Harsha H S: Update BUILD with absl/strings dependency
Merging this change closes #36046
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#36046 from ROCm:ci_fix_upstream_ut_20260107 95f983a876d2c08e0f6cb71f7f74e92a6aa316c0
PiperOrigin-RevId: 855607651
copybara-service Bot
pushed a commit
that referenced
this pull request
Jan 22, 2026
Imported from GitHub PR #36046
📝 Summary of Changes
- layout_assignment tests are marked CUDA-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to reflect the removal of padding on ROCm introduced in #33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as CUDA-only.
- Add arch-specific checks for barriers to sorting.hlo.
🎯 Justification
Fixes failing unit tests on the ROCm platform.
🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests
Copybara import of the project:
-- b6c6902 by Harsha HS <[email protected]>: [ROCm] Fix failing unit tests on ROCm platform
- layout_assignment tests are marked CUDA-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
-- c05a4a4 by Harsha HS <[email protected]>: Fix tests which started to fail due to #33854
-- 0ce9520 by Harsha HS <[email protected]>: HIP now respects higher launch dimensions, similar to CUDA
-- a0fe014 by Harsha HS <[email protected]>: Make dynamic_shared_memory_test CUDA-only
-- 28569eb by Harsha HS <[email protected]>: Add arch-specific checks to sorting.hlo
-- d782553 by Harsha HS <[email protected]>: Address review comments
-- 8c23b89 by Harsha HS <[email protected]>: Revert tf_cuda_test_tags to its original form to resolve CI errors
-- 95f983a by Harsha H S <[email protected]>: Update BUILD with absl/strings dependency
Merging this change closes #36046
COPYBARA_INTEGRATE_REVIEW=#36046 from ROCm:ci_fix_upstream_ut_20260107 95f983a
PiperOrigin-RevId: 859532960
copybara-service Bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Jan 22, 2026
Imported from GitHub PR openxla/xla#36046
📝 Summary of Changes
- layout_assignment tests are marked CUDA-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to reflect the removal of padding on ROCm introduced in openxla/xla#33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as CUDA-only.
- Add arch-specific checks for barriers to sorting.hlo.
🎯 Justification
Fixes failing unit tests on the ROCm platform.
🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests
Copybara import of the project:
-- b6c69025388e3528c1ff7a3ca2bef2389816973c by Harsha HS <[email protected]>: [ROCm] Fix failing unit tests on ROCm platform
- layout_assignment tests are marked CUDA-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
-- c05a4a4bacf1ecf813f056cc690b9dda1df6aafe by Harsha HS <[email protected]>: Fix tests which started to fail due to openxla/xla#33854
-- 0ce9520211fbaafadb42a0a41219351ca889354a by Harsha HS <[email protected]>: HIP now respects higher launch dimensions, similar to CUDA
-- a0fe0145935d830a0265a7a3bbdac9a0bfd2f056 by Harsha HS <[email protected]>: Make dynamic_shared_memory_test CUDA-only
-- 28569eb82b09fef0767be4b6fb25c44cee613fb1 by Harsha HS <[email protected]>: Add arch-specific checks to sorting.hlo
-- d7825531f4d3be3c0cf44001103cf1ac35732b38 by Harsha HS <[email protected]>: Address review comments
-- 8c23b89f1a696fdf304d86c182ed70ffb2c4dc69 by Harsha HS <[email protected]>: Revert tf_cuda_test_tags to its original form to resolve CI errors
-- 95f983a876d2c08e0f6cb71f7f74e92a6aa316c0 by Harsha H S <[email protected]>: Update BUILD with absl/strings dependency
Merging this change closes #36046
PiperOrigin-RevId: 859532960
nurmukhametov
pushed a commit
to ROCm/xla
that referenced
this pull request
Jan 22, 2026
Imported from GitHub PR openxla#36046
📝 Summary of Changes
- layout_assignment tests are marked CUDA-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to reflect the removal of padding on ROCm introduced in openxla#33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as CUDA-only.
- Add arch-specific checks for barriers to sorting.hlo.
🎯 Justification
Fixes failing unit tests on the ROCm platform.
nurmukhametov
pushed a commit
to ROCm/xla
that referenced
this pull request
Jan 23, 2026
Imported from GitHub PR openxla#36046
📝 Summary of Changes
- layout_assignment tests are marked CUDA-only.
- sample_file_test needs a higher autotuner level for MIOpen to return a conv algorithm. Earlier this came from GetDebugOptionsForTest.
- buffer_debug_log test is made GPU-agnostic by using the canonical GPU name.
- cublas_gemm_rewriter_test_amdgpu_any: fix the unit test to reflect the removal of padding on ROCm introduced in openxla#33854.
- gpu_kernel_tiling_test_amdgpu_any is updated to respect the higher launch dimensions now supported by the HIP runtime.
- Mark dynamic_shared_memory_test as CUDA-only.
- Add arch-specific checks for barriers to sorting.hlo.
🎯 Justification
Fixes failing unit tests on the ROCm platform.
📝 Summary of Changes
Disabled padding for bf16/fp16 gemms on ROCm.
🎯 Justification
Padding for gemms was added for parity with NVIDIA, but on closer inspection it offers no advantage on the ROCm platform. Furthermore, CublasPadForGemms wraps every padded dot op in pad / slice ops, which may prevent certain optimizations (e.g. gemm_rewriter epilogue fusion).
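For readers unfamiliar with the pad / slice wrapping mentioned in the justification, here is a minimal sketch of the idea in plain Python. It is not the XLA pass itself: nested lists stand in for HLO tensors, the helper names (`round_up`, `pad`, `dot`, `padded_dot`) are hypothetical, and the padding multiple of 8 is an assumption chosen for illustration. It shows that padding changes only the shape of the graph (pad, then dot, then slice), not the result.

```python
# Illustrative sketch only: nested lists stand in for HLO tensors.
# PAD_MULTIPLE = 8 is an assumed value for illustration, not taken from the PR.
PAD_MULTIPLE = 8


def round_up(n, multiple):
    """Return the smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple


def pad(matrix, rows, cols):
    """Zero-pad a row-major matrix on the bottom and right to rows x cols."""
    width = len(matrix[0])
    padded = [row + [0.0] * (cols - width) for row in matrix]
    padded += [[0.0] * cols for _ in range(rows - len(matrix))]
    return padded


def dot(a, b):
    """Naive matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def padded_dot(a, b, multiple=PAD_MULTIPLE):
    """pad -> dot -> slice: the graph shape produced by padding a gemm."""
    m, k, n = len(a), len(a[0]), len(b[0])
    mp, kp, np_ = (round_up(d, multiple) for d in (m, k, n))
    full = dot(pad(a, mp, kp), pad(b, kp, np_))
    # Slice the padded result back to the original m x n shape.
    return [row[:n] for row in full[:m]]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # 2 x 3
b = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # 3 x 2
assert padded_dot(a, b) == dot(a, b)
```

In the padded form the dot is no longer the last op producing the output, since a slice sits between it and any consumer. That is one plausible reading of why a later fusion step may fail to match it.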
🚀 Kind of Contribution
♻️ Cleanup
🧪 Unit Tests:
Adapted the existing gemm_rewriter_test to account for the absence of padding on ROCm.
@xla-rotation could you please have a look?