As mentioned in #2552, it would be nice to improve the error message raised when a dataset fails to build because of duplicate example keys.
The current one is:
datasets.keyhash.DuplicatedKeysError: FAILURE TO GENERATE DATASET !
Found duplicate Key: 48
Keys should be unique and deterministic in nature
We could instead have something that guides the user in debugging the issue:
DuplicatedKeysError: both the 42nd and 1337th examples have the same key `48`.
Please fix the dataset script at <path/to/the/dataset/script>
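
To make the proposal concrete, here is a minimal sketch of how such a message could be produced. It is not the library's actual implementation: the `DuplicatedKeysError` class below is a local stand-in, and `check_unique_keys`, `first_index`, `second_index`, and `script_path` are hypothetical names chosen for illustration. It also assumes the keys can be seen one by one in generation order.

```python
class DuplicatedKeysError(Exception):
    """Local stand-in for illustration only, not the library's class."""

    def __init__(self, key, first_index, second_index, script_path=None):
        self.key = key
        self.first_index = first_index
        self.second_index = second_index
        message = (
            f"examples at index {first_index} and {second_index} "
            f"have the same key `{key}`."
        )
        if script_path is not None:
            message += f"\nPlease fix the dataset script at {script_path}"
        super().__init__(message)


def check_unique_keys(keys, script_path=None):
    """Raise on the first repeated key, reporting both example positions."""
    first_seen = {}  # key -> index of the example where it first appeared
    for index, key in enumerate(keys):
        if key in first_seen:
            raise DuplicatedKeysError(key, first_seen[key], index, script_path)
        first_seen[key] = index
```

The sketch keeps every key in memory to remember where it first appeared, which may not be acceptable for very large datasets. Reporting only the second position, or only the script path, would already be an improvement over the current message.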