Conversation

sadra-barikbin (Collaborator) commented on Dec 11, 2022

Adding an assertion that recomputing the metric does not change its result makes _test_distrib_integration fail.
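
For reference, the idempotency check has roughly this shape (a minimal sketch, not the literal test code; the metric setup and names are illustrative):

import torch
from ignite.metrics import EpochMetric

def mse_fn(y_pred: torch.Tensor, y: torch.Tensor) -> float:
    return torch.mean((y_pred - y) ** 2).item()

metric = EpochMetric(mse_fn)
metric.update((torch.rand(8), torch.rand(8)))

first = metric.compute()
second = metric.compute()  # recompute with no updates in between
assert first == second  # fails in distributed runs before this fix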


vfdev-5 (Collaborator) commented on Dec 11, 2022

Seems like there is a bug here:

if ws > 1 and not self._is_reduced:
    # All gather across all processes
    _prediction_tensor = cast(torch.Tensor, idist.all_gather(_prediction_tensor))
    _target_tensor = cast(torch.Tensor, idist.all_gather(_target_tensor))
    self._is_reduced = True  # bug: the gathered tensors are local temporaries, not attributes

_prediction_tensor and _target_tensor are temporary local tensors, yet self._is_reduced is set to True, mirroring other metrics where the @sync_all_reduce decorator updates the metric's attributes and the flag prevents duplicated all_reduce calls. Here nothing persistent is updated, so on the next compute() the flag makes us skip the all_gather and each rank computes on its local data only, changing the result.
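
One straightforward fix, sketched below under the assumption that self._predictions and self._targets hold the accumulated local batches and that the file's existing imports (torch, typing.cast, ignite.distributed as idist) are in scope; the refactor actually merged in this PR may be structured differently. Since the gathered tensors are rebuilt locally on every call, the flag can simply be dropped and compute() re-gathers each time, which makes it idempotent:

def compute(self) -> float:
    # Rebuild local temporaries from the accumulated state on every call.
    _prediction_tensor = torch.cat(self._predictions, dim=0)
    _target_tensor = torch.cat(self._targets, dim=0)

    ws = idist.get_world_size()
    if ws > 1:
        # Gather into fresh temporaries each time; with no _is_reduced flag,
        # recomputing always starts from the same accumulated state and
        # yields the same result.
        _prediction_tensor = cast(torch.Tensor, idist.all_gather(_prediction_tensor))
        _target_tensor = cast(torch.Tensor, idist.all_gather(_target_tensor))

    return self.compute_fn(_prediction_tensor, _target_tensor)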

Thanks for catching that, @sadra-barikbin!

sadra-barikbin force-pushed the Refactor-EpochMetric-and-make-it-idempotent branch from d3779e8 to 1f05f23 on February 17, 2023 at 13:57
sadra-barikbin force-pushed the Refactor-EpochMetric-and-make-it-idempotent branch from 10d7542 to af77ead on February 17, 2023 at 15:13

vfdev-5 (Collaborator) left a comment

LGTM, thanks @sadra-barikbin

vfdev-5 enabled auto-merge (squash) on February 17, 2023 at 16:55
vfdev-5 merged commit 5d8d6bf into pytorch:master on Feb 17, 2023