
Conversation

@JingyaHuang (Contributor) commented Aug 11, 2022

Context

It was reported in huggingface/optimum#305 that mixed-precision training on DeBERTa with optimum.onnxruntime.ORTTrainer is broken.

After investigation, the breakage comes from mismatched input dtypes on some MatMul nodes: since #18272, some sqrt results are cast to fp32, and they need to be re-cast to fp16 before the MatMul ops. This PR corrects that dtype.

Besides, this PR also fixes the tracing of DeBERTa, which was not addressed in #18272.
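The dtype mismatch can be sketched outside the model. This is a minimal NumPy stand-in for the traced graph (hypothetical shapes, not the actual DeBERTa code): ONNX MatMul requires both inputs to share the same element type, so an fp32 sqrt result flowing into an fp16 MatMul invalidates the graph unless it is re-cast first.

```python
import numpy as np

# fp16 activations, as produced under mixed-precision training
q = np.random.randn(2, 8).astype(np.float16)
k = np.random.randn(8, 2).astype(np.float16)

# sqrt kept in fp32 for numerical stability (the cast introduced in #18272)
scale = np.sqrt(np.full((1,), q.shape[-1], dtype=np.float32))

# fp16 / fp32 promotes the quotient to fp32, so without a re-cast the
# traced graph would feed an fp32 input into a MatMul whose other input
# is fp16 -- invalid, since ONNX MatMul requires matching element types.
scaled_q = (q / scale).astype(np.float16)  # the fix: cast back to fp16

out = scaled_q @ k
assert out.dtype == np.float16
```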

Fixes huggingface/optimum#305
Fixes #18199

Who can review?

@michaelbenayoun @LysandreJik @sgugger

@HuggingFaceDocBuilderDev commented Aug 11, 2022

The documentation is not available anymore as the PR was closed or merged.

@sgugger (Collaborator) left a comment

Thanks for fixing!

@JingyaHuang JingyaHuang marked this pull request as ready for review August 11, 2022 16:16
@LysandreJik LysandreJik requested a review from lewtun August 16, 2022 07:29
@lewtun (Member) left a comment

Thanks a lot for fixing this @JingyaHuang - the changes also LGTM 🚀 !

@lewtun lewtun merged commit 86d0b26 into huggingface:main Aug 17, 2022
@JingyaHuang JingyaHuang deleted the fix-deberta-fp16 branch August 22, 2022 08:22


Development

Successfully merging this pull request may close these issues.

Exported DeBERTa ONNX model is incorrect

4 participants