
Unable to load JSON saved using to_json #6891

@DarshanDeshpande


Describe the bug

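A dataset written to disk with Dataset.to_json() cannot be read back with the standard library's json.load(), although load_dataset("json", ...) reads the same file without problems.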

Steps to reproduce the bug

import json
from datasets import load_dataset

dataset = load_dataset("squad")
train_dataset, test_dataset = dataset["train"], dataset["validation"]
test_dataset.to_json("full_dataset.json")

# This works
loaded_test = load_dataset("json", data_files="full_dataset.json")

# This fails with json.decoder.JSONDecodeError: Extra data
with open("full_dataset.json", "r") as f:
    loaded_test = json.load(f)
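For context, the datasets documentation states that Dataset.to_json() passes lines=True and orient="records" to pandas by default, so the file holds JSON Lines (one JSON object per line) rather than a single JSON document. json.load() expects exactly one JSON value and raises "Extra data" when it reaches the second record. The stdlib-only sketch below reproduces that with a small in-memory stand-in for full_dataset.json; the records and the load_json_lines helper are hypothetical illustrations, not part of the datasets API.

```python
import io
import json

# Hypothetical stand-in for "full_dataset.json": two records in JSON Lines
# form (one object per line), the layout to_json() produces by default.
jsonl_text = '{"id": "a", "question": "Q1"}\n{"id": "b", "question": "Q2"}\n'


def load_json_lines(fp):
    """Hypothetical helper: parse a JSON Lines stream into a list of dicts.

    Each non-blank line is decoded independently with json.loads().
    """
    return [json.loads(line) for line in fp if line.strip()]


# json.load() parses the first object, then finds the second line left over.
load_error = None
try:
    json.load(io.StringIO(jsonl_text))
except json.JSONDecodeError as err:
    load_error = err.msg
    print("json.load failed:", err.msg)  # prints "json.load failed: Extra data"

# Reading line by line works, because every line is valid JSON on its own.
records = load_json_lines(io.StringIO(jsonl_text))
print(len(records), records[0]["id"])  # prints "2 a"
```

Against the real file, the same helper would be called as `with open("full_dataset.json") as f: rows = load_json_lines(f)`.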

Expected behavior

The file written by to_json() should be a single valid JSON document, so that it can be loaded with json.load().
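Until the writer behaves that way, one possible workaround is to convert the JSON Lines output into a single JSON array after writing it. The sketch below assumes the file fits in memory (a single SQuAD split should). The jsonl_to_json_array name is hypothetical, and in-memory streams stand in for the real input and output files.

```python
import io
import json


def jsonl_to_json_array(src, dst):
    """Hypothetical helper: rewrite JSON Lines records from src as one JSON array in dst.

    Loads every record into memory before writing, so it is only suitable
    for files that fit in RAM. Returns the number of records written.
    """
    records = [json.loads(line) for line in src if line.strip()]
    json.dump(records, dst, ensure_ascii=False)
    return len(records)


# In-memory stand-ins for the JSON Lines input and the converted output file.
src = io.StringIO('{"id": "a"}\n{"id": "b"}\n')
dst = io.StringIO()
n = jsonl_to_json_array(src, dst)

# The output is now a single JSON value, so json.load() accepts it.
dst.seek(0)
data = json.load(dst)
print(n, data)  # prints "2 [{'id': 'a'}, {'id': 'b'}]"
```

With real files, src would be open("full_dataset.json") and dst a new file opened for writing.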

Environment info

Colab: https://colab.research.google.com/drive/1st1iStFUVgu9ZPvnzSzL4vDeYWDwYpUm?usp=sharing
