Commit 5abd91c

Merge branch 'main' of github.com:huggingface/datasets into fix-4591

2 parents: 88a1112 + e662d75

File tree: 817 files changed (+2613 / -2469 lines)


.circleci/config.yml

Lines changed: 4 additions & 4 deletions

@@ -19,7 +19,7 @@ jobs:
       - run: pip install .[tests]
       - run: pip install -r additional-tests-requirements.txt --no-deps
       - run: pip install pyarrow --upgrade
-      - run: HF_SCRIPTS_VERSION=master HF_ALLOW_CODE_EVAL=1 python -m pytest -d --tx 2*popen//python=python3.6 --dist loadfile -sv ./tests/
+      - run: HF_SCRIPTS_VERSION=main HF_ALLOW_CODE_EVAL=1 python -m pytest -d --tx 2*popen//python=python3.6 --dist loadfile -sv ./tests/

   run_dataset_script_tests_pyarrow_6:
     working_directory: ~/datasets
@@ -36,7 +36,7 @@ jobs:
       - run: pip install .[tests]
       - run: pip install -r additional-tests-requirements.txt --no-deps
       - run: pip install pyarrow==6.0.0
-      - run: HF_SCRIPTS_VERSION=master HF_ALLOW_CODE_EVAL=1 python -m pytest -d --tx 2*popen//python=python3.6 --dist loadfile -sv ./tests/
+      - run: HF_SCRIPTS_VERSION=main HF_ALLOW_CODE_EVAL=1 python -m pytest -d --tx 2*popen//python=python3.6 --dist loadfile -sv ./tests/

   run_dataset_script_tests_pyarrow_latest_WIN:
     working_directory: ~/datasets
@@ -56,7 +56,7 @@ jobs:
           pip install pyarrow --upgrade
       - run: |
           conda activate py37
-          $env:HF_SCRIPTS_VERSION="master"
+          $env:HF_SCRIPTS_VERSION="main"
           python -m pytest -n 2 --dist loadfile -sv ./tests/

   run_dataset_script_tests_pyarrow_6_WIN:
@@ -77,7 +77,7 @@ jobs:
           pip install pyarrow==6.0.0
       - run: |
           conda activate py37
-          $env:HF_SCRIPTS_VERSION="master"
+          $env:HF_SCRIPTS_VERSION="main"
           python -m pytest -n 2 --dist loadfile -sv ./tests/

   check_code_quality:
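
All four CI jobs flip the same switch: `HF_SCRIPTS_VERSION` now points the test run at the `main` branch instead of `master`. As a hypothetical sketch of how a test helper might consume such a variable (the env var name comes from the CI config above, but the resolution logic below is illustrative, not the library's actual implementation):

```python
import os

# Hypothetical sketch: resolve the branch that raw dataset-script URLs
# should point at, defaulting to "main" after the branch rename.
def resolve_scripts_version(default: str = "main") -> str:
    return os.environ.get("HF_SCRIPTS_VERSION", default)

def raw_script_url(dataset_name: str) -> str:
    # Build a raw.githubusercontent.com URL for the dataset script on
    # the resolved branch; path layout mirrors datasets/<name>/<name>.py.
    version = resolve_scripts_version()
    return (
        "https://raw.githubusercontent.com/huggingface/datasets/"
        f"{version}/datasets/{dataset_name}/{dataset_name}.py"
    )

# With HF_SCRIPTS_VERSION=main exported in CI, this yields a main-branch URL:
print(raw_script_url("squad"))
```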

.github/ISSUE_TEMPLATE/add-dataset.md

Lines changed: 1 addition & 1 deletion

@@ -14,4 +14,4 @@ assignees: ''
 - **Data:** *link to the Github repository or current dataset location*
 - **Motivation:** *what are some good reasons to have this dataset*

-Instructions to add a new dataset can be found [here](https://github.com/huggingface/datasets/blob/master/ADD_NEW_DATASET.md).
+Instructions to add a new dataset can be found [here](https://github.com/huggingface/datasets/blob/main/ADD_NEW_DATASET.md).

.github/workflows/benchmarks.yaml

Lines changed: 2 additions & 2 deletions

@@ -22,7 +22,7 @@ jobs:
           dvc repro --force

           git fetch --prune
-          dvc metrics diff --show-json master > report.json
+          dvc metrics diff --show-json main > report.json

           python ./benchmarks/format.py report.json report.md

@@ -35,7 +35,7 @@ jobs:
           dvc repro --force

           git fetch --prune
-          dvc metrics diff --show-json master > report.json
+          dvc metrics diff --show-json main > report.json

           python ./benchmarks/format.py report.json report.md

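This workflow now compares benchmark metrics against `main` via `dvc metrics diff --show-json` and then hands the JSON to `./benchmarks/format.py` to produce a markdown report. That script is not shown in this diff; as a minimal sketch of the conversion step, assuming dvc's documented diff shape of `{path: {metric: {"old": ..., "new": ..., "diff": ...}}}` (both the shape and this implementation are assumptions, not the repo's actual code):

```python
import json
import sys

# Minimal sketch (not the repo's actual benchmarks/format.py): render the
# JSON from `dvc metrics diff --show-json main` as a markdown table.
def format_report(report_path: str, output_path: str) -> None:
    with open(report_path) as f:
        report = json.load(f)

    lines = ["| path | metric | old | new | diff |", "|---|---|---|---|---|"]
    for path, metrics in report.items():
        for metric, values in metrics.items():
            lines.append(
                f"| {path} | {metric} | {values.get('old')} "
                f"| {values.get('new')} | {values.get('diff')} |"
            )

    with open(output_path, "w") as f:
        f.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    # Usage mirrors the workflow step: format.py report.json report.md
    format_report(sys.argv[1], sys.argv[2])
```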

.github/workflows/build_documentation.yml

Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@ name: Build documentation
 on:
   push:
     branches:
-      - master
+      - main
       - doc-builder*
       - v*-release


.github/workflows/test-audio.yml

Lines changed: 2 additions & 2 deletions

@@ -3,7 +3,7 @@ name: Test audio
 on:
   pull_request:
     branches:
-      - master
+      - main

 jobs:
   test:
@@ -27,4 +27,4 @@ jobs:
           pip install pyarrow --upgrade
       - name: Test audio with pytest
         run: |
-          HF_SCRIPTS_VERSION=master python -m pytest -n 2 -sv ./tests/features/test_audio.py
+          HF_SCRIPTS_VERSION=main python -m pytest -n 2 -sv ./tests/features/test_audio.py

.github/workflows/update-hub-repositories.yaml

Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@ name: Update Hub repositories
 on:
   push:
     branches:
-      - master
+      - main

 jobs:
   update-hub-repositories:

ADD_NEW_DATASET.md

Lines changed: 24 additions & 24 deletions

@@ -70,11 +70,11 @@ You are now ready to start the process of adding the dataset. We will create the

    ```bash
    git fetch upstream
-   git rebase upstream/master
+   git rebase upstream/main
    git checkout -b a-descriptive-name-for-my-changes
    ```

-   **Do not** work on the `master` branch.
+   **Do not** work on the `main` branch.

 3. Create your dataset folder under `datasets/<your_dataset_name>`:

@@ -96,9 +96,9 @@ You are now ready to start the process of adding the dataset. We will create the
 - Download/open the data to see how it looks like
 - While you explore and read about the dataset, you can complete some sections of the dataset card (the online form or the one you have just created at `./datasets/<your_dataset_name>/README.md`). You can just copy the information you meet in your readings in the relevant sections of the dataset card (typically in `Dataset Description`, `Dataset Structure` and `Dataset Creation`).

-  If you need more information on a section of the dataset card, a detailed guide is in the `README_guide.md` here: https://github.com/huggingface/datasets/blob/master/templates/README_guide.md.
+  If you need more information on a section of the dataset card, a detailed guide is in the `README_guide.md` here: https://github.com/huggingface/datasets/blob/main/templates/README_guide.md.

-  There is a also a (very detailed) example here: https://github.com/huggingface/datasets/tree/master/datasets/eli5.
+  There is a also a (very detailed) example here: https://github.com/huggingface/datasets/tree/main/datasets/eli5.

 Don't spend too much time completing the dataset card, just copy what you find when exploring the dataset documentation. If you can't find all the information it's ok. You can always spend more time completing the dataset card while we are reviewing your PR (see below) and the dataset card will be open for everybody to complete them afterwards. If you don't know what to write in a section, just leave the `[More Information Needed]` text.

@@ -109,31 +109,31 @@ Now let's get coding :-)

 The dataset script is the main entry point to load and process the data. It is a python script under `datasets/<your_dataset_name>/<your_dataset_name>.py`.

-There is a detailed explanation on how the library and scripts are organized [here](https://huggingface.co/docs/datasets/master/about_dataset_load.html).
+There is a detailed explanation on how the library and scripts are organized [here](https://huggingface.co/docs/datasets/main/about_dataset_load.html).

 Note on naming: the dataset class should be camel case, while the dataset short_name is its snake case equivalent (ex: `class BookCorpus` for the dataset `book_corpus`).

-To add a new dataset, you can start from the empty template which is [in the `templates` folder](https://github.com/huggingface/datasets/blob/master/templates/new_dataset_script.py):
+To add a new dataset, you can start from the empty template which is [in the `templates` folder](https://github.com/huggingface/datasets/blob/main/templates/new_dataset_script.py):

 ```bash
 cp ./templates/new_dataset_script.py ./datasets/<your_dataset_name>/<your_dataset_name>.py
 ```

-And then go progressively through all the `TODO` in the template 🙂. If it's your first dataset addition and you are a bit lost among the information to fill in, you can take some time to read the [detailed explanation here](https://huggingface.co/docs/datasets/master/dataset_script.html).
+And then go progressively through all the `TODO` in the template 🙂. If it's your first dataset addition and you are a bit lost among the information to fill in, you can take some time to read the [detailed explanation here](https://huggingface.co/docs/datasets/main/dataset_script.html).

 You can also start (or copy any part) from one of the datasets of reference listed below. The main criteria for choosing among these reference dataset is the format of the data files (JSON/JSONL/CSV/TSV/text) and whether you need or don't need several configurations (see above explanations on configurations). Feel free to reuse any parts of the following examples and adapt them to your case:

-- question-answering: [squad](https://github.com/huggingface/datasets/blob/master/datasets/squad/squad.py) (original data are in json)
-- natural language inference: [snli](https://github.com/huggingface/datasets/blob/master/datasets/snli/snli.py) (original data are in text files with tab separated columns)
-- POS/NER: [conll2003](https://github.com/huggingface/datasets/blob/master/datasets/conll2003/conll2003.py) (original data are in text files with one token per line)
-- sentiment analysis: [allocine](https://github.com/huggingface/datasets/blob/master/datasets/allocine/allocine.py) (original data are in jsonl files)
-- text classification: [ag_news](https://github.com/huggingface/datasets/blob/master/datasets/ag_news/ag_news.py) (original data are in csv files)
-- translation: [flores](https://github.com/huggingface/datasets/blob/master/datasets/flores/flores.py) (original data come from text files - one per language)
-- summarization: [billsum](https://github.com/huggingface/datasets/blob/master/datasets/billsum/billsum.py) (original data are in json files)
-- benchmark: [glue](https://github.com/huggingface/datasets/blob/master/datasets/glue/glue.py) (original data are various formats)
-- multilingual: [xquad](https://github.com/huggingface/datasets/blob/master/datasets/xquad/xquad.py) (original data are in json)
-- multitask: [matinf](https://github.com/huggingface/datasets/blob/master/datasets/matinf/matinf.py) (original data need to be downloaded by the user because it requires authentication)
-- speech recognition: [librispeech_asr](https://github.com/huggingface/datasets/blob/master/datasets/librispeech_asr/librispeech_asr.py) (original data is in .flac format)
+- question-answering: [squad](https://github.com/huggingface/datasets/blob/main/datasets/squad/squad.py) (original data are in json)
+- natural language inference: [snli](https://github.com/huggingface/datasets/blob/main/datasets/snli/snli.py) (original data are in text files with tab separated columns)
+- POS/NER: [conll2003](https://github.com/huggingface/datasets/blob/main/datasets/conll2003/conll2003.py) (original data are in text files with one token per line)
+- sentiment analysis: [allocine](https://github.com/huggingface/datasets/blob/main/datasets/allocine/allocine.py) (original data are in jsonl files)
+- text classification: [ag_news](https://github.com/huggingface/datasets/blob/main/datasets/ag_news/ag_news.py) (original data are in csv files)
+- translation: [flores](https://github.com/huggingface/datasets/blob/main/datasets/flores/flores.py) (original data come from text files - one per language)
+- summarization: [billsum](https://github.com/huggingface/datasets/blob/main/datasets/billsum/billsum.py) (original data are in json files)
+- benchmark: [glue](https://github.com/huggingface/datasets/blob/main/datasets/glue/glue.py) (original data are various formats)
+- multilingual: [xquad](https://github.com/huggingface/datasets/blob/main/datasets/xquad/xquad.py) (original data are in json)
+- multitask: [matinf](https://github.com/huggingface/datasets/blob/main/datasets/matinf/matinf.py) (original data need to be downloaded by the user because it requires authentication)
+- speech recognition: [librispeech_asr](https://github.com/huggingface/datasets/blob/main/datasets/librispeech_asr/librispeech_asr.py) (original data is in .flac format)

 While you are developing the dataset script you can list test it by opening a python interpreter and running the script (the script is dynamically updated each time you modify it):

@@ -286,18 +286,18 @@ Here are the step to open the Pull-Request on the main repo.
 It is a good idea to sync your copy of the code with the original
 repository regularly. This way you can quickly account for changes:

-- If you haven't pushed your branch yet, you can rebase on upstream/master:
+- If you haven't pushed your branch yet, you can rebase on upstream/main:

   ```bash
   git fetch upstream
-  git rebase upstream/master
+  git rebase upstream/main
   ```

 - If you have already pushed your branch, do not rebase but merge instead:

   ```bash
   git fetch upstream
-  git merge upstream/master
+  git merge upstream/main
   ```

 Push the changes to your account using:
@@ -334,7 +334,7 @@ Creating the dataset card goes in two steps:

 - **Very important as well:** On the right side of the tagging app, you will also find an expandable section called **Show Markdown Data Fields**. This gives you a starting point for the description of the fields in your dataset: you should paste it into the **Data Fields** section of the [online form](https://huggingface.co/datasets/card-creator/) (or your local README.md), then modify the description as needed. Briefly describe each of the fields and indicate if they have a default value (e.g. when there is no label). If the data has span indices, describe their attributes (character level or word level, contiguous or not, etc). If the datasets contains example IDs, state whether they have an inherent meaning, such as a mapping to other datasets or pointing to relationships between data points.

-  Example from the [ELI5 card](https://github.com/huggingface/datasets/tree/master/datasets/eli5#data-fields):
+  Example from the [ELI5 card](https://github.com/huggingface/datasets/tree/main/datasets/eli5#data-fields):

   Data Fields:
   - q_id: a string question identifier for each example, corresponding to its ID in the Pushshift.io Reddit submission dumps.
@@ -343,9 +343,9 @@ Creating the dataset card goes in two steps:
   - title_urls: list of the extracted URLs, the nth element of the list was replaced by URL_n


-- **Very nice to have but optional for now:** Complete all you can find in the dataset card using the detailed instructions for completed it which are in the `README_guide.md` here: https://github.com/huggingface/datasets/blob/master/templates/README_guide.md.
+- **Very nice to have but optional for now:** Complete all you can find in the dataset card using the detailed instructions for completed it which are in the `README_guide.md` here: https://github.com/huggingface/datasets/blob/main/templates/README_guide.md.

-  Here is a completed example: https://github.com/huggingface/datasets/tree/master/datasets/eli5 for inspiration
+  Here is a completed example: https://github.com/huggingface/datasets/tree/main/datasets/eli5 for inspiration

 If you don't know what to write in a field and can find it, write: `[More Information Needed]`

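The template and reference scripts this file links to all follow the same builder pattern. As a rough minimal sketch of that layout, assuming the public `datasets.GeneratorBasedBuilder` API (the `_URL` and field names below are hypothetical, not taken from any dataset in the repo):

```python
import json

import datasets

# Hypothetical source file; a real script would point at the dataset's
# actual download location.
_URL = "https://example.com/data.jsonl"

class MyDataset(datasets.GeneratorBasedBuilder):
    """Minimal sketch of a dataset script (camel-case class, snake-case folder)."""

    def _info(self):
        # Declare the schema of the examples this script yields.
        return datasets.DatasetInfo(
            description="A toy dataset used to illustrate the script layout.",
            features=datasets.Features(
                {"id": datasets.Value("string"), "text": datasets.Value("string")}
            ),
        )

    def _split_generators(self, dl_manager):
        # Download the data and declare one split per file.
        path = dl_manager.download_and_extract(_URL)
        return [
            datasets.SplitGenerator(
                name=datasets.Split.TRAIN, gen_kwargs={"filepath": path}
            )
        ]

    def _generate_examples(self, filepath):
        # Yield (key, example) pairs matching the features declared in _info.
        with open(filepath, encoding="utf-8") as f:
            for idx, line in enumerate(f):
                record = json.loads(line)
                yield idx, {"id": record["id"], "text": record["text"]}
```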

CONTRIBUTING.md

Lines changed: 6 additions & 6 deletions

@@ -41,7 +41,7 @@ If you would like to work on any of the open Issues:
    git checkout -b a-descriptive-name-for-my-changes
    ```

-   **do not** work on the `master` branch.
+   **do not** work on the `main` branch.

 4. Set up a development environment by running the following command in a virtual environment:

@@ -73,7 +73,7 @@ If you would like to work on any of the open Issues:

    ```bash
    git fetch upstream
-   git rebase upstream/master
+   git rebase upstream/main
    ```

 Push the changes to your account using:
@@ -97,15 +97,15 @@ Improving the documentation of datasets is an ever increasing effort and we invi

 If you see that a dataset card is missing information that you are in a position to provide (as an author of the dataset or as an experienced user), the best thing you can do is to open a Pull Request on the Hugging Face Hub. To to do, go to the "Files and versions" tab of the dataset page and edit the `README.md` file. We provide:

-* a [template](https://github.com/huggingface/datasets/blob/master/templates/README.md)
-* a [guide](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) describing what information should go into each of the paragraphs
-* and if you need inspiration, we recommend looking through a [completed example](https://github.com/huggingface/datasets/blob/master/datasets/eli5/README.md)
+* a [template](https://github.com/huggingface/datasets/blob/main/templates/README.md)
+* a [guide](https://github.com/huggingface/datasets/blob/main/templates/README_guide.md) describing what information should go into each of the paragraphs
+* and if you need inspiration, we recommend looking through a [completed example](https://github.com/huggingface/datasets/blob/main/datasets/eli5/README.md)

 Note that datasets that are outside of a namespace (`squad`, `imagenet-1k`, etc.) are maintained on GitHub. In this case you have to open a Pull request on GitHub to edit the file at `datasets/<dataset-name>/README.md`.

 If you are a **dataset author**... you know what to do, it is your dataset after all ;) ! We would especially appreciate if you could help us fill in information about the process of creating the dataset, and take a moment to reflect on its social impact and possible limitations if you haven't already done so in the dataset paper or in another data statement.

-If you are a **user of a dataset**, the main source of information should be the dataset paper if it is available: we recommend pulling information from there into the relevant paragraphs of the template. We also eagerly welcome discussions on the [Considerations for Using the Data](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md#considerations-for-using-the-data) based on existing scholarship or personal experience that would benefit the whole community.
+If you are a **user of a dataset**, the main source of information should be the dataset paper if it is available: we recommend pulling information from there into the relevant paragraphs of the template. We also eagerly welcome discussions on the [Considerations for Using the Data](https://github.com/huggingface/datasets/blob/main/templates/README_guide.md#considerations-for-using-the-data) based on existing scholarship or personal experience that would benefit the whole community.

 Finally, if you want more information on the how and why of dataset cards, we strongly recommend reading the foundational works [Datasheets for Datasets](https://arxiv.org/abs/1803.09010) and [Data Statements for NLP](https://www.aclweb.org/anthology/Q18-1041/).
