Dataset loading #406
Zhouxx0101 started this conversation in General
Replies: 1 comment · 1 reply
The question is probably best directed to https://github.com/huggingface/datasets. There are multiple options, including some more advanced ones. You can also load only certain columns from the dataset (e.g. just the target), which will reduce the time/memory footprint. Each option has some pros and cons; you need to weigh them based on your constraints and requirements. I recommend discussing this with ChatGPT or an LLM of your choice - it can provide more detailed, higher-quality suggestions.
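For reference, here is a minimal sketch of what loading a single config and keeping only the needed columns could look like with the Hugging Face `datasets` API. The config name `m4_hourly` and the column names below are assumptions and should be checked against the repository's README; whether streaming works depends on how the repo stores its data.

```python
from datasets import load_dataset

# Load one config from the Hub (config name "m4_hourly" is an assumption;
# see the repo README for the actual list of available configs).
ds = load_dataset("autogluon/chronos_datasets", "m4_hourly", split="train")

# Keep only the columns you actually need (column names assumed here);
# dropping unused columns reduces the memory footprint of downstream steps.
ds = ds.select_columns(["id", "timestamp", "target"])

# Alternatively, iterate over the dataset in streaming mode instead of
# downloading and preparing it in full (may not be supported for every repo).
ds_stream = load_dataset(
    "autogluon/chronos_datasets", "m4_hourly", split="train", streaming=True
)
for example in ds_stream.take(3):
    print(example["id"])
```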
I would like to select some datasets for pre-training. Could you please tell me whether the datasets on autogluon/chronos_datasets can be read locally using the load_from_disk function? Or is there another way to load datasets stored on disk? If I need to use multiple datasets, do I have to convert their formats separately? Thanks!
If the format needs to be converted to Arrow, how should that be done for large datasets such as weatherbench_hourly?
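As a rough sketch of one possible workflow (not an official recipe): load_from_disk only reads datasets previously written with save_to_disk, so each config you need could be downloaded once from the Hub and saved locally. The paths below are illustrative.

```python
from datasets import load_dataset, load_from_disk

# One-time step per config: download from the Hub and persist it locally
# in Arrow format (the local path is illustrative).
ds = load_dataset("autogluon/chronos_datasets", "weatherbench_hourly", split="train")
ds.save_to_disk("data/weatherbench_hourly")

# Later runs: read the saved Arrow dataset directly from disk,
# without touching the Hub again.
ds = load_from_disk("data/weatherbench_hourly")
```

Note that load_dataset already caches the prepared data as memory-mapped Arrow files under the local Hugging Face cache, so even a large config such as weatherbench_hourly does not need to fit in RAM; the explicit save_to_disk/load_from_disk step is mainly useful if you want the data at a fixed location of your own.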