
remove autocast() block; eval in fp16 #167

Merged
taoleicn merged 1 commit into 3.0.0-dev from 3.0.0-dev-tao on Mar 3, 2021

Conversation

@taoleicn (Contributor) commented on Mar 3, 2021

This PR removes the torch.amp.autocast(enabled=False) block in sru.ops.elementwise_recurrence_gpu. This simplifies the code and makes SRU compatible with both PyTorch native AMP and Apex.
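To illustrate what the removed block did: wrapping an op in a nested autocast(enabled=False) region forces it back to fp32 even when the caller runs under AMP. A minimal CPU sketch of the pattern, using bfloat16 autocast to stand in for CUDA fp16 (the tensor names here are illustrative, not from the SRU code):

```python
import torch

x = torch.randn(4, 8)
w = torch.randn(8, 8)

with torch.autocast("cpu", dtype=torch.bfloat16):
    # Old pattern (what the removed block did): opt this op out of AMP,
    # forcing the computation back to fp32.
    with torch.autocast("cpu", enabled=False):
        y_fp32 = x @ w   # runs in fp32

    # New behavior: the op simply inherits the ambient autocast dtype.
    y_amp = x @ w        # runs in bfloat16 under autocast
```

Removing the nested block means the recurrence kernel sees whatever dtype the surrounding AMP context chooses, which is what makes it work uniformly under both native AMP and Apex.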

The amp_recurrence_fp16 option controls whether the elementwise recurrence kernel runs in fp16. After some testing, we believe it can default to True, which gives an additional speed-up and reduces the memory footprint.
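A hypothetical sketch of how such a flag can gate the kernel's dtype (this is not the real SRU implementation; the function and its logic are illustrative only):

```python
import torch

def elementwise_recurrence(u, amp_recurrence_fp16=True):
    """Hypothetical sketch: gate the recurrence input dtype on a flag."""
    # When the flag is on and the tensor lives on a CUDA device, cast the
    # recurrence inputs to fp16; otherwise keep the caller's dtype.
    if amp_recurrence_fp16 and u.is_cuda:
        u = u.to(torch.float16)
    # ... the elementwise recurrence computation would run here ...
    return u

u = torch.randn(4, 8)            # CPU tensor, so the fp16 cast is skipped
out = elementwise_recurrence(u)  # dtype stays float32 on CPU
```

Defaulting the flag to True lets the kernel take the cheaper fp16 path whenever the hardware supports it, without callers having to opt in.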

In addition, we made minor changes to the SRUpp experimental code to run dev evaluation in fp16.

@taoleicn taoleicn requested a review from hpasapp March 3, 2021 00:23
@taoleicn (Contributor, Author) commented on Mar 3, 2021

Added torch.clear_autocast_cache to avoid OOM-related bugs:
huggingface/transformers#8403
pytorch/pytorch#48049
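For context, autocast keeps a cache of half-precision copies of weights for reuse within a region; the linked issues describe this cache growing until it causes OOM. A minimal sketch of the fix, again using CPU bfloat16 autocast in place of CUDA fp16 (the toy model here is an assumption, not SRUpp code):

```python
import torch

model = torch.nn.Linear(16, 16)

# Evaluate under autocast; autocast may cache downcast copies of parameters.
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    out = model(torch.randn(2, 16))   # output is bfloat16 under autocast

# Clear the cached half-precision weight copies so they cannot accumulate
# across repeated evaluation runs (the OOM scenario in the linked issues).
torch.clear_autocast_cache()
```

Calling torch.clear_autocast_cache() outside the autocast region is safe and simply frees any cached casts.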
