Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions datasets/wiki_auto/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,7 @@ The `auto` config shows a pair of an English and corresponding Simple English Wi
'simple_article_url': 'https://simple.wikipedia.org/wiki?curid=702227'}}
```

Finally, the `auto_acl`, the `auto_full_no_split`, and the `auto_full_with_split` configs were obtained by selecting the aligned pairs of sentences from `auto` to provide a ready-to-go aligned dataset to train a sequence-to-sequence system. While `auto_acl` corresponds to the filtered version of the data used to train the systems in the paper, `auto_full_no_split` and `auto_full_with_split` correspond to the unfiltered versions with and without sentence splits respectively. In the `auto_full_with_split` config, we join the sentences in the simple article mapped to the same sentence in the complex article to capture sentence splitting. Split sentences are seperated by a `<SEP>` token. In the `auto_full_no_split` config, we do not join the splits and treat them as seperate pairs. An instance is a single pair of sentences:
Finally, the `auto_acl`, the `auto_full_no_split`, and the `auto_full_with_split` configs were obtained by selecting the aligned pairs of sentences from `auto` to provide a ready-to-go aligned dataset to train a sequence-to-sequence system. While `auto_acl` corresponds to the filtered version of the data used to train the systems in the paper, `auto_full_no_split` and `auto_full_with_split` correspond to the unfiltered versions with and without sentence splits respectively. In the `auto_full_with_split` config, we join the sentences in the simple article mapped to the same sentence in the complex article to capture sentence splitting. Split sentences are separated by a `<SEP>` token. In the `auto_full_no_split` config, we do not join the splits and treat them as separate pairs. An instance is a single pair of sentences:
```
{'normal_sentence': 'In early work , Rutherford discovered the concept of radioactive half-life , the radioactive element radon , and differentiated and named alpha and beta radiation .\n',
'simple_sentence': 'Rutherford discovered the radioactive half-life , and the three parts of radiation which he named Alpha , Beta , and Gamma .\n'}
Expand Down Expand Up @@ -240,4 +240,4 @@ You can cite the paper presenting the dataset as:

### Contributions

Thanks to [@yjernite](https://github.com/yjernite), [@mounicam](https://github.com/mounicam) for adding this dataset.
Thanks to [@yjernite](https://github.com/yjernite), [@mounicam](https://github.com/mounicam) for adding this dataset.