Using a registry instead of calling globals for fetching feature types #6727
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
…tasets into fetch-features-from-registry
lhoestq
left a comment
Cool ! Though for now I'd keep this feature experimental if it's fine for you.
I just added a few comments:
|
looks like some files are missing in your google storage |
…tasets into fetch-features-from-registry
|
cc @mariosasko is it related to #6474 ? The files should ideally not move for backward compatibility anyway |
|
@lhoestq All the files are still there. The problem is that the …
@psmyth94 This has been fixed on …
git remote add upstream https://github.com/huggingface/datasets.git
git pull upstream main
git push
|
|
Thank you @mariosasko ! I'm updating this branch if you don't mind @psmyth94 |
lhoestq
left a comment
Looks all good now, thanks !
Benchmarks (PyArrow==8.0.0 and updated run; results not shown): benchmark_array_xd.json, benchmark_getitem_100B.json, benchmark_indices_mapping.json, benchmark_iterating.json, benchmark_map_filter.json
|

Hello,
When working with bio-data, each feature often has metadata associated with it (e.g. species, lineage, SNP position, etc.). To store this, I like to use the feature classes with an added metadata attribute. However, when saving or loading with custom features, you get an error, since that class doesn't exist in the global namespace in datasets.features.features.
We can avoid this by having a registry (like formatters) and registering the custom feature class in it; loading from disk then returns all of the metadata information.
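
For readers skimming the thread, here is a rough, library-independent sketch of the registry idea described above. It does not use the datasets package and is not the code from this PR: the names _FEATURE_REGISTRY, register_feature, ValueWithMetadata and feature_from_dict are all hypothetical and chosen only for illustration. It shows a lookup by type name failing for an unregistered class and succeeding, metadata included, once the class is registered.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict

# Hypothetical registry mapping a type name to a feature class
# (an assumption for illustration, not the datasets internals).
_FEATURE_REGISTRY: Dict[str, type] = {}


def register_feature(feature_cls: type, feature_type: str) -> None:
    """Make feature_cls discoverable under the name feature_type."""
    _FEATURE_REGISTRY[feature_type] = feature_cls


@dataclass
class ValueWithMetadata:
    """Hypothetical feature class carrying per-feature metadata."""

    dtype: str
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        # Store the class name so the loader knows which class to rebuild.
        return {"_type": type(self).__name__, **asdict(self)}


def feature_from_dict(obj: Dict[str, Any]) -> Any:
    """Rebuild a feature by looking its class up in the registry."""
    obj = dict(obj)  # copy so the caller's dict is left untouched
    feature_type = obj.pop("_type")
    try:
        feature_cls = _FEATURE_REGISTRY[feature_type]
    except KeyError:
        raise ValueError(f"Feature type '{feature_type}' not found") from None
    return feature_cls(**obj)


feature = ValueWithMetadata("int32", {"species": "E. coli", "lineage": "B"})
serialized = feature.to_dict()

# Without registration, loading fails because the class is unknown.
try:
    feature_from_dict(serialized)
    load_failed = False
except ValueError:
    load_failed = True

# After registration, the round trip keeps the metadata.
register_feature(ValueWithMetadata, "ValueWithMetadata")
restored = feature_from_dict(serialized)
```

The point of the design is that the loader no longer depends on what happens to be importable from one module's globals; any class registered under a name can be rebuilt from its serialized form.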