Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
71 changes: 64 additions & 7 deletions datasets/recipe_nlg/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -66,29 +66,63 @@ pretty_name: RecipeNLG

### Dataset Summary

[More Information Needed]
RecipeNLG: A Cooking Recipes Dataset for Semi-Structured Text Generation.

While the RecipeNLG dataset is based on the Recipe1M+ dataset, it greatly expands the number of recipes available.
The new dataset provides over 1 million new, preprocessed and deduplicated recipes on top of the Recipe1M+ dataset.

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]
The dataset is in English.

## Dataset Structure

### Data Instances

[More Information Needed]
```
{'id': 0,
'title': 'No-Bake Nut Cookies',
'ingredients': ['1 c. firmly packed brown sugar',
'1/2 c. evaporated milk',
'1/2 tsp. vanilla',
'1/2 c. broken nuts (pecans)',
'2 Tbsp. butter or margarine',
'3 1/2 c. bite size shredded rice biscuits'],
'directions': ['In a heavy 2-quart saucepan, mix brown sugar, nuts, evaporated milk and butter or margarine.',
'Stir over medium heat until mixture bubbles all over top.',
'Boil and stir 5 minutes more. Take off heat.',
'Stir in vanilla and cereal; mix well.',
'Using 2 teaspoons, drop and shape into 30 clusters on wax paper.',
'Let stand until firm, about 30 minutes.'],
'link': 'www.cookbooks.com/Recipe-Details.aspx?id=44874',
'source': 0,
'ner': ['brown sugar',
'milk',
'vanilla',
'nuts',
'butter',
'bite size shredded rice biscuits']}
```

### Data Fields

[More Information Needed]
- `id` (`int`): ID.
- `title` (`str`): Title of the recipe.
- `ingredients` (`list` of `str`): Ingredients.
- `directions` (`list` of `str`): Instruction steps.
- `link` (`str`): URL link.
- `source` (`ClassLabel`): Origin of each recipe record, with possible value {"Gathered", "Recipes1M"}:
- "Gathered" (0): Additional recipes gathered from multiple cooking web pages, using automated scripts in a web scraping process.
- "Recipes1M" (1): Recipes from "Recipe1M+" dataset.
- `ner` (`list` of `str`): NER food entities.

### Data Splits

[More Information Needed]
The dataset contains a single `train` split.

## Dataset Creation

Expand Down Expand Up @@ -146,11 +180,34 @@ pretty_name: RecipeNLG

### Licensing Information

[More Information Needed]
I (the "Researcher") have requested permission to use the RecipeNLG dataset (the "Dataset") at Poznań University of Technology (PUT). In exchange for such permission, Researcher hereby agrees to the following terms and conditions:

1. Researcher shall use the Dataset only for non-commercial research and educational purposes.
2. PUT makes no representations or warranties regarding the Dataset, including but not limited to warranties of non-infringement or fitness for a particular purpose.
3. Researcher accepts full responsibility for his or her use of the Dataset and shall defend and indemnify PUT, including its employees, Trustees, officers and agents, against any and all claims arising from Researcher's use of the Dataset including but not limited to Researcher's use of any copies of copyrighted images or text that he or she may create from the Dataset.
4. Researcher may provide research associates and colleagues with access to the Dataset provided that they first agree to be bound by these terms and conditions.
5. If Researcher is employed by a for-profit, commercial entity, Researcher's employer shall also be bound by these terms and conditions, and Researcher hereby represents that he or she is fully authorized to enter into this agreement on behalf of such employer.

### Citation Information

[More Information Needed]
```bibtex
@inproceedings{bien-etal-2020-recipenlg,
title = "{R}ecipe{NLG}: A Cooking Recipes Dataset for Semi-Structured Text Generation",
author = "Bie{\'n}, Micha{\l} and
Gilski, Micha{\l} and
Maciejewska, Martyna and
Taisner, Wojciech and
Wisniewski, Dawid and
Lawrynowicz, Agnieszka",
booktitle = "Proceedings of the 13th International Conference on Natural Language Generation",
month = dec,
year = "2020",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.inlg-1.4",
pages = "22--28",
}
```

### Contributions

Expand Down