Added the HLGD dataset #2325

tingofurro · 2021-05-05T16:53:29Z

Added the Headline Grouping Dataset (HLGD), from the NAACL2021 paper: News Headline Grouping as a Challenging NLU Task
Dataset Link: https://github.com/tingofurro/headline_grouping
Paper link: https://people.eecs.berkeley.edu/~phillab/pdfs/NAACL2021_HLG.pdf

bhavitvyamalik

Looks really cool! I've commented a few changes that might help you pass the tests. Also, please add dummy_data and make sure this dataset passes real data and dummy data tests. You can find instructions for the same here.

bhavitvyamalik · 2021-05-06T08:26:46Z

datasets/hlgd/README.md

@@ -0,0 +1,192 @@
+---
+YAML tags:


Can you please remove the "YAML tags:" line? Removing it should let your check_code_quality test pass.

Alright it should be gone!

I've also added the dummy_data and run tests both with real and dummy data!
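For readers following the "YAML tags:" thread above, here is a minimal sketch of removing a stray placeholder line from a dataset card's front matter. It is not part of this PR: the helper name `strip_placeholder_line` is hypothetical, the sample card content is made up for illustration, and the front matter is assumed to be delimited by two `---` lines with the first one opening the file.

```python
def strip_placeholder_line(readme_text: str, placeholder: str = "YAML tags:") -> str:
    """Drop a stray placeholder line from the front matter of a dataset card.

    Hypothetical helper for illustration only; it is not part of the
    `datasets` library or of this PR.
    """
    lines = readme_text.splitlines()
    # Assumption: front matter starts on the very first line with `---`.
    if not lines or lines[0].strip() != "---":
        return readme_text
    # Find the closing `---` delimiter of the front matter.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return readme_text  # no closing delimiter, leave the text untouched
    # Keep every front-matter line except the placeholder.
    front = [ln for ln in lines[1:end] if ln.strip() != placeholder]
    return "\n".join(["---", *front, *lines[end:]]) + "\n"


# Illustrative card text (made up for this example).
card = "---\nYAML tags:\nlanguages:\n- en\n---\n# Dataset Card\n"
cleaned = strip_placeholder_line(card)
```

Text without front matter is returned unchanged, so the helper is safe to run on any README.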

bhavitvyamalik · 2021-05-06T08:28:57Z

datasets/hlgd/hlgd.py

+        # This method handles input defined in _split_generators to yield (key, example) tuples from the dataset.
+        # The `key` is here for legacy reason (tfds) and is not important in itself.
+
+        with open(filepath, "r") as f:


Please specify the encoding as well when you read the JSON file. Leaving it out also causes certain tests to fail.
with open(filepath, encoding="utf-8") as f:

Got it I've added the encoding!
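To illustrate the encoding suggestion above, here is a minimal standalone sketch of a generator that reads a JSON file with an explicit encoding and yields (key, example) tuples. It is not the actual `hlgd.py` code: the function name `generate_examples`, the assumption that the file holds a JSON array of objects, and the `headline` field are all made up for illustration.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterator, Tuple


def generate_examples(filepath: str) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (key, example) tuples from a JSON file (hypothetical sketch)."""
    # An explicit encoding keeps the result independent of the platform
    # default, which is not UTF-8 everywhere.
    with open(filepath, encoding="utf-8") as f:
        records = json.load(f)  # assumption: a JSON array of objects
    for idx, record in enumerate(records):
        yield idx, record


# Usage with a throwaway file containing a non-ASCII character.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump([{"headline": "Café reopens"}], f, ensure_ascii=False)
    examples = list(generate_examples(path))
```

Without the `encoding` argument, `open` falls back to the locale's preferred encoding, so the same file can decode differently (or fail) from one machine to another.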


bhavitvyamalik · 2021-05-06T18:13:33Z

datasets/hlgd/README.md

+extended:
+- original


I think you should remove this part here

Suggested change

-extended:
-- original

Alright, I've removed it, but I had built this YAML using this tool: https://huggingface.co/datasets/tagging/
Is it a problem with different versions of the YAML format?

In any case, it seems to have solved the problem, so thank you for the help figuring it out!

I use this for dataset tagging

tingofurro · 2021-05-10T15:44:02Z

Is there anything else needed from my end?

lhoestq

Excellent thank you !
Good job with the dataset script and the dataset card, they are really good.

I just left three comments:

datasets/hlgd/README.md

datasets/hlgd/hlgd.py

lhoestq

Thanks !

Merging since the CI error is unrelated to this PR and fixed on master

tingofurro · 2021-05-12T14:55:13Z

Thanks Bhavitvya and Quentin, this was very streamlined!

Added the HLGD dataset (huggingface#2325)

tingofurro added 5 commits May 5, 2021 11:59

Adding README and loading script for HLGD dataset

f19111f

Update README.md

78bedd1

Ran make style and flake8

991d2e0

Merge remote-tracking branch 'upstream/master' into hlgd

0aa5fd4

Added example data instance and data fields

ad5806d

bhavitvyamalik suggested changes May 6, 2021

[HLGD] Fixing bugs in dataset class, fixing typo in README and adding dummy data

5358c23

bhavitvyamalik reviewed May 6, 2021

Update README.md

be66eb6

lhoestq reviewed May 10, 2021

datasets/hlgd/README.md (outdated)

datasets/hlgd/README.md (outdated)

datasets/hlgd/hlgd.py (outdated)

[HLGD] Changed label names, README cleanup

1151f6d

lhoestq approved these changes May 12, 2021

lhoestq merged commit 6c5742c into huggingface:master May 12, 2021

JayantGoel001 added a commit to JayantGoel001/datasets-1 that referenced this pull request May 12, 2021

Merge pull request #12 from huggingface/master

df23281

Added the HLGD dataset (huggingface#2325)
