@lhoestq (Member) commented Jun 22, 2022

Arrow accepts both pd.Timestamp and datetime.datetime objects when creating timestamp arrays.
However, converting a timestamp array back to Python objects always yields datetime.datetime objects.

This created an inconsistency between streaming and non-streaming modes. For example, the ett dataset outputs datetime.datetime objects in non-streaming mode but pd.Timestamp objects in streaming mode.

I fixed this by always converting pd.Timestamp to datetime.datetime during the example encoding step.
I applied the same fix to pd.Timedelta, which is converted to datetime.timedelta. Finally, I added an extra conversion step for pandas Series and DataFrame objects, so that timestamps and timedeltas passed inside them are converted too.
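The conversion described above can be sketched in plain pandas. This is not the code merged in this PR: the helper name to_python_objects is hypothetical, and the sketch only assumes the behavior stated here. pd.Timestamp becomes datetime.datetime, pd.Timedelta becomes datetime.timedelta, and a Series or DataFrame is unpacked into lists or dicts of plain Python values before encoding.

```python
import datetime
from typing import Any

import pandas as pd


def to_python_objects(obj: Any) -> Any:
    """Hypothetical helper: recursively replace pandas objects with stdlib equivalents."""
    if isinstance(obj, pd.Timestamp):
        # pd.Timestamp subclasses datetime.datetime; return the plain base type.
        return obj.to_pydatetime()
    if isinstance(obj, pd.Timedelta):
        # Likewise, pd.Timedelta subclasses datetime.timedelta.
        return obj.to_pytimedelta()
    if isinstance(obj, pd.DataFrame):
        # One list per column, e.g. {"date": [...], "value": [...]}.
        return {col: to_python_objects(obj[col]) for col in obj.columns}
    if isinstance(obj, pd.Series):
        # tolist() yields pd.Timestamp / pd.Timedelta items for datetime64 / timedelta64
        # data, so each item is converted recursively.
        return [to_python_objects(x) for x in obj.tolist()]
    if isinstance(obj, dict):
        return {k: to_python_objects(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_python_objects(x) for x in obj]
    return obj


# A hypothetical example mixing pandas scalars and a datetime64 Series.
example = {
    "start": pd.Timestamp("2016-07-01 00:00:00"),
    "lag": pd.Timedelta(hours=1),
    "target": pd.Series(pd.to_datetime(["2016-07-01", "2016-07-02"])),
}
encoded = to_python_objects(example)
print(type(encoded["start"]) is datetime.datetime, type(encoded["lag"]) is datetime.timedelta)
```

Because the conversion runs before the data reaches Arrow, the streaming path, which yields encoded examples without an Arrow round trip, should then return the same types as the non-streaming path.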

Fix #4533
Related to huggingface/dataset-viewer#397

@HuggingFaceDocBuilderDev commented Jun 22, 2022

The documentation is not available anymore as the PR was closed or merged.

@lhoestq (Member, Author) commented Jun 22, 2022

CI failures are unrelated to this PR; merging.

Successfully merging this pull request may close these issues.

Timestamp not returned as datetime objects in streaming mode