load_from_disk and save_to_disk are not compatible with each other #2424

@roholazandie

Describe the bug

load_from_disk and save_to_disk are not compatible. Saving a dataset with save_to_disk works perfectly, but calling load_from_disk on the same directory throws an error saying it can't find state.json. It looks like load_from_disk only works on a single split.

Steps to reproduce the bug

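from datasets import Dataset, load_dataset

dataset = load_dataset("art")  # no split given, so this is a DatasetDict (one entry per split)
dataset.save_to_disk("mydir")
d = Dataset.load_from_disk("mydir")  # raises FileNotFoundError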

Expected results

save_to_disk and load_from_disk should be inverses of each other: loading the directory written by save_to_disk should work without any extra manipulation.

Actual results

FileNotFoundError: [Errno 2] No such file or directory: 'mydir/art/state.json'

Environment info

  • datasets version: 1.6.2
  • Platform: Linux-5.4.0-73-generic-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.7.10
  • PyTorch version (GPU?): 1.8.1+cu102 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:
