Dataset JSON is incorrect #2743

@severo

Describe the bug

The JSON file generated for https://github.com/huggingface/datasets/blob/573f3d35081cee239d1b962878206e9abe6cde91/datasets/journalists_questions/journalists_questions.py is https://github.com/huggingface/datasets/blob/573f3d35081cee239d1b962878206e9abe6cde91/datasets/journalists_questions/dataset_infos.json.

The only config should be plain_text, but the first key in the JSON is journalists_questions (the dataset id) instead.

{
  "journalists_questions": {
    "description": "The journalists_questions corpus (version 1.0) is a collection of 10K human-written Arabic\ntweets manually labeled for question identification over Arabic tweets posted by journalists.\n",
    ...

Steps to reproduce the bug

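Open the two files linked above and compare the config name expected from the loading script (plain_text) with the top-level key in dataset_infos.json.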
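
The comparison can also be done programmatically. The following is only a minimal sketch, not part of the `datasets` library: `config_key_mismatch` is a hypothetical helper, and `sample` is a trimmed stand-in for the committed dataset_infos.json that keeps just the top-level key.

```python
import json


def config_key_mismatch(infos_text, expected_configs):
    """Compare the top-level keys of a dataset_infos.json string with the
    expected config names. Returns (unexpected_keys, missing_configs).

    Hypothetical helper for illustration; not an API of the datasets library.
    """
    infos = json.loads(infos_text)
    actual = set(infos)          # top-level keys are the recorded config names
    expected = set(expected_configs)
    return sorted(actual - expected), sorted(expected - actual)


# Trimmed stand-in for the committed file; only the top-level key matters here.
sample = '{"journalists_questions": {"description": "..."}}'

unexpected, missing = config_key_mismatch(sample, ["plain_text"])
print("unexpected keys:", unexpected)
print("missing configs:", missing)
```

With the stand-in above, the dataset id shows up as an unexpected key and plain_text is reported as missing, which is the mismatch described in this issue.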

Expected results

The first key should be plain_text:

{
  "plain_text": {
    "description": "The journalists_questions corpus (version 1.0) is a collection of 10K human-written Arabic\ntweets manually labeled for question identification over Arabic tweets posted by journalists.\n",
    ...

Actual results

{
  "journalists_questions": {
    "description": "The journalists_questions corpus (version 1.0) is a collection of 10K human-written Arabic\ntweets manually labeled for question identification over Arabic tweets posted by journalists.\n",
    ...

Labels: bug (Something isn't working)
