Conversation

@mariosasko
Collaborator

@mariosasko mariosasko commented May 17, 2022

This PR adds table_cast to the packaged loaders to fix casting to the Image/Audio, ArrayND and ClassLabel types. If none of these types is present in the builder.config.features dictionary, the built-in pa.Table.cast is used instead for better performance. Additionally, this PR adds cast_storage to ClassLabel to support string-to-int conversion in table_cast and to ensure that integer labels are in a valid range.
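
For readers who have not seen the ClassLabel behaviour described above, here is a minimal, library-free sketch of the idea: string labels are mapped to integer ids, and integer labels are checked against the number of classes. The helper name cast_labels, the pass-through of None, and the accepted range 0..num_classes-1 are assumptions made for illustration; this is not the cast_storage implementation from the PR, and the library's exact rules may differ.

```python
from typing import List, Optional, Sequence, Union

Label = Union[str, int, None]


def cast_labels(values: Sequence[Label], names: List[str]) -> List[Optional[int]]:
    """Hypothetical helper: map string labels to ids and range-check integer labels."""
    str2int = {name: idx for idx, name in enumerate(names)}
    num_classes = len(names)
    result: List[Optional[int]] = []
    for value in values:
        if value is None:
            # Missing labels are passed through unchanged (an assumption of this sketch).
            result.append(None)
        elif isinstance(value, str):
            # String labels must be one of the declared class names.
            if value not in str2int:
                raise ValueError(f"Unknown label name: {value!r}")
            result.append(str2int[value])
        else:
            # Integer labels must index into the declared class names.
            if not 0 <= value < num_classes:
                raise ValueError(f"Label id {value} is outside [0, {num_classes - 1}]")
            result.append(value)
    return result


labels = cast_labels(["neg", "pos", 1, None], names=["neg", "pos"])
print(labels)
```

A column that mixes names and ids is normalized to ids only, while an unknown name or an out-of-range id raises a ValueError instead of producing a silently wrong label.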

Fix #4210

This PR is also a solution for these (popular) discussions: https://discuss.huggingface.co/t/converting-string-label-to-int/2816 and https://discuss.huggingface.co/t/class-labels-for-custom-datasets/15130/2

TODO:

  • tests

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented May 17, 2022

The documentation is not available anymore as the PR was closed or merged.

@mariosasko mariosasko changed the title from "Add support for complex feature types to packaged loaders" to "Support complex feature types as features in packaged loaders" May 18, 2022
@mariosasko mariosasko marked this pull request as ready for review May 20, 2022 09:13
Member

@lhoestq lhoestq left a comment

Thanks! I added a few comments.

@mariosasko mariosasko requested a review from lhoestq May 23, 2022 16:14
Member

@lhoestq lhoestq left a comment

Thanks! I just added more comments about pa.Table.from_arrays.

The rest looks all good to me :)

Member

@lhoestq lhoestq left a comment

Thanks! LGTM :)

@mariosasko mariosasko merged commit bad842c into master May 31, 2022
@mariosasko mariosasko deleted the loaders-table-cast branch May 31, 2022 12:16
Development

Successfully merging this pull request may close these issues.

TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'

4 participants