[ROCm] Relax tolerances for FP8 unit test for fp16 and bf16 cases #7655

rraminen · 2025-10-28T17:12:52Z

This PR relaxes the tolerance values to enable the following two unit tests with the FP16 and BF16 data types on ROCm:

unit/runtime/half_precision/test_fp8.py::TestFp8ComposabilityAcrossZero::test[bf16]
unit/runtime/half_precision/test_fp8.py::TestFp8ComposabilityAcrossZero::test[fp16]

jithunnair-amd · 2025-10-30T19:52:22Z

tests/unit/runtime/half_precision/test_fp8.py

-        if is_rocm_pytorch() and model_dtype == torch.float16:
-            rtol, atol = 3e-07, 3e-05
+        if is_rocm_pytorch() and base_datatype in ["fp16", "bf16"]:
+            rtol, atol = 1e-07, 1e-04
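The new pair is not uniformly looser than the old one: `rtol` drops from 3e-07 to 1e-07 while `atol` rises from 3e-05 to 1e-04. The sketch below assumes the test compares values with `torch.allclose`-style semantics (element-wise `|a - b| <= atol + rtol * |b|`). The exact comparison call in `test_fp8.py` is not shown in this diff. It computes where the two settings accept the same error.

```python
# Element-wise acceptance test used by torch.isclose / torch.allclose:
#     |a - b| <= atol + rtol * |b|
def allowed_error(b: float, rtol: float, atol: float) -> float:
    """Largest |a - b| that passes for a reference value b."""
    return atol + rtol * abs(b)


OLD = (3e-07, 3e-05)  # (rtol, atol) before this PR, applied to fp16 only
NEW = (1e-07, 1e-04)  # (rtol, atol) after this PR, applied to fp16 and bf16


def crossover() -> float:
    """Reference magnitude |b| at which OLD and NEW accept the same error."""
    (rtol_old, atol_old), (rtol_new, atol_new) = OLD, NEW
    return (atol_new - atol_old) / (rtol_old - rtol_new)


if __name__ == "__main__":
    print(f"crossover |b| ~ {crossover():.1f}")
    for b in (0.5, 10.0, 100.0, 1000.0):
        old, new = allowed_error(b, *OLD), allowed_error(b, *NEW)
        print(f"|b|={b:>6}: old={old:.2e}  new={new:.2e}  looser={new > old}")
```

Solving 3e-05 + 3e-07·|b| = 1e-04 + 1e-07·|b| gives |b| = 7e-05 / 2e-07 = 350. The new setting is therefore looser for reference values below about 350 in magnitude and slightly tighter above that. For moderate-magnitude values, the relaxation comes almost entirely from the larger `atol`.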


@rraminen Suggest figuring out whether this is hardware-specific, e.g. MI200 vs. MI300.

rraminen · 2025-11-10 (edited)

For ROCm, FP8 features are supported only on MI300, so this workaround is needed from MI300 onwards.

k-artem · 2025-11-18

Also, bf16 is currently disabled for ROCm; the PR to enable it is still awaiting review: https://github.com/deepspeedai/DeepSpeed/pull/7448/files#diff-56a6081d5f623bd0c17de268c74239976682acd6386a8b83639181aa3f5a39c4
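The review question of whether the override should be hardware-specific could be handled by keying it on the GPU architecture string. The sketch below is hypothetical. The helper names, `FP8_CAPABLE_ARCHS`, and the mapping of MI300-series parts to `gfx940`/`gfx941`/`gfx942` are assumptions, not part of this PR or of DeepSpeed. On ROCm builds of PyTorch, the arch string is assumed to come from `torch.cuda.get_device_properties(0).gcnArchName`, for example `'gfx942:sramecc+:xnack-'`. MI200-series parts (`gfx90a`) fall through to the defaults.

```python
from typing import Optional, Tuple

# Assumption: MI300-series GPUs report gfx940/gfx941/gfx942 as the base
# architecture. Extend this tuple for newer FP8-capable parts.
FP8_CAPABLE_ARCHS = ("gfx940", "gfx941", "gfx942")

RELAXED_TOLERANCES = (1e-07, 1e-04)  # (rtol, atol) from this PR


def arch_base(gcn_arch_name: str) -> str:
    """Drop target-feature suffixes, e.g. 'gfx942:sramecc+:xnack-' -> 'gfx942'."""
    return gcn_arch_name.split(":", 1)[0]


def fp8_tolerance_override(is_rocm: bool,
                           gcn_arch_name: Optional[str],
                           base_datatype: str) -> Optional[Tuple[float, float]]:
    """Hypothetical helper: return (rtol, atol) to use, or None to keep defaults.

    Applies the relaxed values only for fp16/bf16 on ROCm, and only on
    architectures listed in FP8_CAPABLE_ARCHS.
    """
    if not is_rocm or base_datatype not in ("fp16", "bf16"):
        return None
    if gcn_arch_name is None or arch_base(gcn_arch_name) not in FP8_CAPABLE_ARCHS:
        return None
    return RELAXED_TOLERANCES
```

In the test, `is_rocm` would come from the existing `is_rocm_pytorch()` check. A `None` result leaves the test's default tolerances untouched. Matching on the base `gfx` name rather than on marketing names keeps the allow-list explicit and easy to extend.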

Commit 21e6ecf: Relax tolerance

rraminen requested review from loadams, tjruwase and tohtana as code owners October 28, 2025 17:12

jithunnair-amd reviewed Oct 30, 2025


rraminen marked this pull request as draft October 30, 2025 20:04

Commit 1f4ad38: Merge branch 'master' into relax_tol_testFP8_ROCm

rraminen marked this pull request as ready for review November 10, 2025 17:09

rraminen marked this pull request as draft November 14, 2025 17:33
