diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index f53a530e967b..6c7bf90b1dea 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -65,7 +65,8 @@ Awesome! Please provide the following information:
 
 If you are willing to contribute the model yourself, let us know so we can best guide you.
 
-We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder.
+We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them
+in the [`templates`](https://github.com/huggingface/transformers/templates) folder.
 
 ### Do you want a new feature (that is not a model)?
 
@@ -86,7 +87,9 @@ A world-class feature request addresses the following points:
 
 If your issue is well written we're already 80% of the way there by the time you post it.
 
-We have added **templates** to guide you in the process of adding a new example script for training or testing the models in the library. You can find them in the [`templates`](./templates) folder.
+We have added **templates** to guide you in the process of adding a new example script for training or testing the
+models in the library. You can find them in the [`templates`](https://github.com/huggingface/transformers/templates)
+folder.
 
 ## Start contributing! (Pull Requests)
 
@@ -206,15 +209,21 @@ Follow these steps to start contributing:
    to be merged;
 4. Make sure existing tests pass;
 5. Add high-coverage tests. No quality testing = no merge.
-   - If you are adding a new model, make sure that you use `ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)`, which triggers the common tests.
-   - If you are adding new `@slow` tests, make sure they pass using `RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`.
-   - If you are adding a new tokenizer, write tests, and make sure `RUN_SLOW=1 python -m pytest tests/test_tokenization_{your_model_name}.py` passes.
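The `ModelTester.all_model_classes` pattern mentioned in step 5 can be sketched as a mixin whose shared tests loop over the declared classes. This is an illustration of the mechanism only; the class and test names here are hypothetical, not the library's actual test harness:

```python
import unittest

class ModelTesterMixin:
    """Sketch: common tests that run once per class in all_model_classes."""

    all_model_classes = ()  # subclasses override with (MyModel, MyModelWithLMHead, ...)

    def test_can_instantiate_every_class(self):
        # A shared ("common") test exercising each declared model class.
        for model_class in self.all_model_classes:
            model = model_class()
            self.assertIsNotNone(model)

# Hypothetical model and its test case reusing the mixin's common tests:
class MyModel:
    pass

class MyModelTest(ModelTesterMixin, unittest.TestCase):
    all_model_classes = (MyModel,)
```

Declaring a new tuple in a subclass is enough to have every shared test run against the new model classes.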
-CircleCI does not run them.
-6. All public methods must have informative docstrings that work nicely with sphinx. See `modeling_ctrl.py` for an example.
+   - If you are adding a new model, make sure that you use
+     `ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)`, which triggers the common tests.
+   - If you are adding new `@slow` tests, make sure they pass using
+     `RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`.
+   - If you are adding a new tokenizer, write tests, and make sure
+     `RUN_SLOW=1 python -m pytest tests/test_tokenization_{your_model_name}.py` passes.
+   CircleCI does not run the slow tests.
+6. All public methods must have informative docstrings that work nicely with sphinx. See `modeling_ctrl.py` for an
+   example.
 
 ### Tests
 
-You can run 🤗 Transformers tests with `unittest` or `pytest`.
+An extensive test suite is included to test the library behavior and several examples. Library tests can be found in
+the [tests folder](https://github.com/huggingface/transformers/tree/master/tests) and examples tests in the
+[examples folder](https://github.com/huggingface/transformers/tree/master/examples).
 
 We like `pytest` and `pytest-xdist` because it's faster. From the root of
 the repository, here's how to run tests with `pytest` for the library:
@@ -261,7 +270,8 @@ $ python -m unittest discover -s examples -t examples -v
 
 ### Style guide
 
-For documentation strings, `transformers` follows the [google
-style](https://google.github.io/styleguide/pyguide.html).
+For documentation strings, `transformers` follows the [google style](https://google.github.io/styleguide/pyguide.html).
+Check our [documentation writing guide](https://github.com/huggingface/transformers/tree/master/docs#writing-documentation---specification)
+for more information.
 
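The `RUN_SLOW=1` gating used by the `@slow` tests above can be sketched as an environment-gated skip decorator. This is a sketch of the mechanism, not the library's actual implementation:

```python
import os
import unittest

def slow(test_item):
    """Sketch: skip the decorated test unless RUN_SLOW=1 is set in the environment."""
    if os.environ.get("RUN_SLOW", "0") != "1":
        # unittest.skip marks the test (or whole TestCase) as skipped.
        return unittest.skip("slow test; set RUN_SLOW=1 to run it")(test_item)
    return test_item
```

Tests decorated this way show up as skipped in a normal run and execute only when the environment variable is set, which is why a CI service that never sets `RUN_SLOW` never runs them.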
 #### This guide was heavily inspired by the awesome [scikit-learn guide to contributing](https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md)
diff --git a/docs/README.md b/docs/README.md
index c4360f430e24..1cfd8e01e4ba 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -42,20 +42,14 @@ pip install recommonmark
 
 ## Building the documentation
 
-Make sure that there is a symlink from the `example` file (in /examples) inside the source folder. Run the following
-command to generate it:
-
-```bash
-ln -s ../../examples/README.md examples.md
-```
-
 Once you have setup `sphinx`, you can build the documentation by running the following command in the `/docs` folder:
 
 ```bash
 make html
 ```
 
-A folder called ``_build/html`` should have been created. You can now open the file ``_build/html/index.html`` in your browser.
+A folder called ``_build/html`` should have been created. You can now open the file ``_build/html/index.html`` in your
+browser.
 
 ---
 **NOTE**
@@ -132,8 +126,8 @@ XXXConfig
    :members:
 ```
 
-This will include every public method of the configuration. If for some reason you wish for a method not to be displayed
-in the documentation, you can do so by specifying which methods should be in the docs:
+This will include every public method of the configuration. If for some reason you wish for a method not to be
+displayed in the documentation, you can do so by specifying which methods should be in the docs:
 
 ```
 XXXTokenizer
@@ -147,8 +141,8 @@ XXXTokenizer
 
 ### Writing source documentation
 
-Values that should be put in `code` should either be surrounded by double backticks: \`\`like so\`\` or be written as an object
-using the :obj: syntax: :obj:\`like so\`.
+Values that should be put in `code` should either be surrounded by double backticks: \`\`like so\`\` or be written as
+an object using the :obj: syntax: :obj:\`like so\`.
 When mentionning a class, it is recommended to use the :class: syntax as the mentioned class will be automatically
 linked by Sphinx: :class:\`transformers.XXXClass\`
diff --git a/docs/source/contributing.md b/docs/source/contributing.md
new file mode 120000
index 000000000000..f939e75f21a8
--- /dev/null
+++ b/docs/source/contributing.md
@@ -0,0 +1 @@
+../../CONTRIBUTING.md
\ No newline at end of file
diff --git a/docs/source/index.rst b/docs/source/index.rst
index bc94c6d5f7ca..e1f5902861e9 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -142,6 +142,7 @@ conversion utilities for the following models:
    converting_tensorflow_models
    migration
    torchscript
+   contributing
 
 .. toctree::
    :maxdepth: 2
diff --git a/docs/source/installation.md b/docs/source/installation.md
index 1c01696b767f..c80bd44fc68f 100644
--- a/docs/source/installation.md
+++ b/docs/source/installation.md
@@ -1,69 +1,102 @@
 # Installation
 
-Transformers is tested on Python 3.6+ and PyTorch 1.1.0
+🤗 Transformers is tested on Python 3.6+, and PyTorch 1.1.0+ or TensorFlow 2.0+.
 
-## With pip
+You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're
+unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). Create a virtual environment with the version of Python you're going
+to use and activate it.
 
-PyTorch Transformers can be installed using pip as follows:
+Now, if you want to use 🤗 Transformers, you can install it with pip. If you'd like to play with the examples, you
+must install it from source.
 
-``` bash
+## Installation with pip
+
+First you need to install one of, or both, TensorFlow 2.0 and PyTorch.
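The new `docs/source/contributing.md` entry above is a symlink (git file mode `120000`) whose stored content is just the relative link target `../../CONTRIBUTING.md`. A self-contained sketch of the same layout, recreated in a throwaway directory rather than a real checkout:

```python
import os
import tempfile

# Recreate the diff's symlink layout (docs/source/contributing.md -> ../../CONTRIBUTING.md)
# in a scratch directory so the sketch is self-contained.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "docs", "source"))
with open(os.path.join(root, "CONTRIBUTING.md"), "w") as f:
    f.write("# contributing\n")

link = os.path.join(root, "docs", "source", "contributing.md")
# The link target is stored relative to the link's own directory.
os.symlink("../../CONTRIBUTING.md", link)
```

Because the target is relative, reading the link resolves to the top-level file, so Sphinx can pick the contributing guide up from inside `docs/source` without duplicating its content.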
+Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available)
+and/or [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific
+install command for your platform.
+
+When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be installed using pip as follows:
+
+```bash
 pip install transformers
 ```
 
-## From source
+Alternatively, for CPU-support only, you can install 🤗 Transformers and PyTorch in one line with
+
+```bash
+pip install transformers[torch]
+```
+
+or 🤗 Transformers and TensorFlow 2.0 in one line with
+
+```bash
+pip install transformers[tf-cpu]
+```
+
+To check 🤗 Transformers is properly installed, run the following command:
+
+```bash
+python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I hate you'))"
+```
+
+It should download a pretrained model, then print something like
+
+```bash
+[{'label': 'NEGATIVE', 'score': 0.9991129040718079}]
+```
+
+(Note that TensorFlow will print additional stuff before that last statement.)
+
+## Installing from source
 
-To install from source, clone the repository and install with:
+To install from source, clone the repository and install with the following commands:
 
 ``` bash
 git clone https://github.com/huggingface/transformers.git
 cd transformers
-pip install .
+pip install -e .
+```
+
+Again, you can run
+
+```bash
+python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I hate you'))"
 ```
+
+to check 🤗 Transformers is properly installed.
 
 ## Caching models
 
 This library provides pretrained models that will be downloaded and cached locally. Unless you specify a location with
-`cache_dir=...` when you use the `from_pretrained` method, these models will automatically be downloaded in the
-folder given by the shell environment variable ``TRANSFORMERS_CACHE``. The default value for it will be the PyTorch
+`cache_dir=...` when you use methods like `from_pretrained`, these models will automatically be downloaded to the
+folder given by the shell environment variable ``TRANSFORMERS_CACHE``. The default value for it will be the PyTorch
 cache home followed by ``/transformers/`` (even if you don't have PyTorch installed). This is (by order of priority):
 
   * shell environment variable ``ENV_TORCH_HOME``
   * shell environment variable ``ENV_XDG_CACHE_HOME`` + ``/torch/``
   * default: ``~/.cache/torch/``
 
-So if you don't have any specific environment variable set, the cache directory will be at 
+So if you don't have any specific environment variable set, the cache directory will be at
 ``~/.cache/torch/transformers/``.
 
-**Note:** If you have set a shell enviromnent variable for one of the predecessors of this library
-(``PYTORCH_TRANSFORMERS_CACHE`` or ``PYTORCH_PRETRAINED_BERT_CACHE``), those will be used if there is no shell
+**Note:** If you have set a shell environment variable for one of the predecessors of this library
+(``PYTORCH_TRANSFORMERS_CACHE`` or ``PYTORCH_PRETRAINED_BERT_CACHE``), those will be used if there is no shell
 enviromnent variable for ``TRANSFORMERS_CACHE``.
 
-## Tests
-
-An extensive test suite is included to test the library behavior and several examples. Library tests can be found in the [tests folder](https://github.com/huggingface/transformers/tree/master/tests) and examples tests in the [examples folder](https://github.com/huggingface/transformers/tree/master/examples).
-
-Refer to the [contributing guide](https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md#tests) for details about running tests.
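The cache-location priority described in the installation section can be sketched as a small helper. The environment variable names follow the priority list exactly as the docs write them; this is illustrative only, not the library's actual resolution code:

```python
import os

def transformers_cache_home():
    # Priority order from the installation docs (a sketch, not the library's code):
    # 1. shell environment variable ENV_TORCH_HOME
    # 2. shell environment variable ENV_XDG_CACHE_HOME + /torch/
    # 3. default: ~/.cache/torch/
    # In every case the cache lives in a trailing /transformers/ subfolder.
    torch_home = os.environ.get("ENV_TORCH_HOME")
    if torch_home:
        return os.path.join(torch_home, "transformers")
    xdg_cache = os.environ.get("ENV_XDG_CACHE_HOME")
    if xdg_cache:
        return os.path.join(xdg_cache, "torch", "transformers")
    return os.path.join(os.path.expanduser("~"), ".cache", "torch", "transformers")
```

With no relevant environment variable set, the helper lands on `~/.cache/torch/transformers/`, matching the default stated above.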
-
-## OpenAI GPT original tokenization workflow
-
-If you want to reproduce the original tokenization process of the `OpenAI GPT` paper, you will need to install `ftfy` and `SpaCy`:
-
-``` bash
-pip install spacy ftfy==4.4.3
-python -m spacy download en
-```
-
-If you don't install `ftfy` and `SpaCy`, the `OpenAI GPT` tokenizer will default to tokenize using BERT's `BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
-
-## Note on model downloads (Continuous Integration or large-scale deployments)
+### Note on model downloads (Continuous Integration or large-scale deployments)
 
-If you expect to be downloading large volumes of models (more than 1,000) from our hosted bucket (for instance through your CI setup, or a large-scale production deployment), please cache the model files on your end. It will be way faster, and cheaper. Feel free to contact us privately if you need any help.
+If you expect to be downloading large volumes of models (more than 1,000) from our hosted bucket (for instance through
+your CI setup, or a large-scale production deployment), please cache the model files on your end. It will be way
+faster, and cheaper. Feel free to contact us privately if you need any help.
 
 ## Do you want to run a Transformer model on a mobile device?
 
 You should check out our [swift-coreml-transformers](https://github.com/huggingface/swift-coreml-transformers) repo.
-It contains a set of tools to convert PyTorch or TensorFlow 2.0 trained Transformer models (currently contains `GPT-2`, `DistilGPT-2`, `BERT`, and `DistilBERT`) to CoreML models that run on iOS devices.
+It contains a set of tools to convert PyTorch or TensorFlow 2.0 trained Transformer models (currently contains `GPT-2`,
+`DistilGPT-2`, `BERT`, and `DistilBERT`) to CoreML models that run on iOS devices.
-At some point in the future, you'll be able to seamlessly move from pre-training or fine-tuning models in PyTorch to productizing them in CoreML,
-or prototype a model or an app in CoreML then research its hyperparameters or architecture from PyTorch. Super exciting!
+At some point in the future, you'll be able to seamlessly move from pre-training or fine-tuning models in PyTorch or
+TensorFlow 2.0 to productizing them in CoreML, or prototype a model or an app in CoreML then research its
+hyperparameters or architecture from PyTorch or TensorFlow 2.0. Super exciting!
diff --git a/docs/source/model_doc/gpt.rst b/docs/source/model_doc/gpt.rst
index 449a85c3fec1..4c54dee70a58 100644
--- a/docs/source/model_doc/gpt.rst
+++ b/docs/source/model_doc/gpt.rst
@@ -38,6 +38,17 @@ Hugging Face showcasing the generative capabilities of several models. GPT is on
 
 The original code can be found `here `_.
 
+Note:
+
+If you want to reproduce the original tokenization process of the `OpenAI GPT` paper, you will need to install
+``ftfy`` and ``SpaCy``::
+
+    pip install spacy ftfy==4.4.3
+    python -m spacy download en
+
+If you don't install ``ftfy`` and ``SpaCy``, the :class:`transformers.OpenAIGPTTokenizer` will default to tokenizing
+using BERT's :obj:`BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't
+worry).
 
 OpenAIGPTConfig
 ~~~~~~~~~~~~~~~~~~~~~