
Add missing type hints #16059

@Rocketknight1

This issue is part of our Great Code Cleanup 2022. If you're interested in helping out, take a look at this thread, or come join us on Discord and talk with other contributors!

🚀 Add missing type hints

Type hints are used inconsistently in the transformers repo across both TF and PT models, and it would be nice to make them complete and consistent for the core models, especially because we want to develop features that depend on them!

Guide to contributing:

  1. Ensure you've read our contributing guidelines 📜
  2. Claim your architecture(s) in this thread (ensure no one is working on it). It's 100% okay to only take the TensorFlow or PyTorch version of a model, if you're not familiar with both frameworks! It's also okay to claim multiple models and group those changes into a single PR! 🎯
  3. Implement the changes as in Adding type hints for TFRoBERTa #16057 or Add type annotations for BERT and copies #16074 (see the diff on the model architectures for a few examples) 💪
  4. Open the PR and tag me in it. You should run make fixup at the end to do a code quality check before your final commit!

Tips for making your PR

  1. The files you need to edit will be in src/transformers/models/[model_name]/
  2. For TensorFlow, you want the modeling_tf_[model_name].py file. For PyTorch, you want the modeling_[model_name].py file.
  3. Remember, you do not have to cover every class in that file! The main thing we want to cover is the call (for TF) or forward (for PT) method for user-facing classes like TFRobertaForMaskedLM or RobertaForSequenceClassification. It's not necessary to add type hints to layers or base classes like RobertaModel or TFRobertaPreTrainedModel: these are trickier to write, and people generally do not use those classes as standalone models.
  4. If you're unfamiliar with how type hints work, you can read the Python library documentation on them, but it's probably even easier to just look at another PR that added them. Take a look at the list of changes in the pull requests linked above!
  5. The types will usually be obvious: most inputs are Optional[Union[np.ndarray, tf.Tensor]] for TF models and Optional[torch.Tensor] for PyTorch models, and boolean inputs are Optional[bool]. Pay attention to the first input of TF models, though, which is usually TFModelInputType, because Keras handles that first input in a special way! Other inputs to watch are past_key_values, whose type can vary between models, and the model output type. For TF base model classes like TFRobertaModel, you may have to look at the corresponding MainLayer (e.g. TFRobertaMainLayer) to figure out the right output type! Also note that the output may be a tuple if return_dict is False, in which case you should specify Union[Tuple, ...]. Finally, in TF models training is never None, so it should be training: bool and not training: Optional[bool].
  6. Note that some code is copied across our codebase. If you see a line like # Copied from transformers.models.bert..., this means that the code is copied from that source, and our scripts will automatically keep it in sync. If you see that, you should not edit the copied method! Instead, edit the original method it's copied from, and run make fixup to synchronize the change across all the copies. Be sure you have installed the development dependencies with pip install -e ".[dev]", as described in the contributor guidelines above, so that the code quality tools in make fixup can run.
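For reference, here is a minimal sketch of what tips 3 and 5 look like in practice. The class names ToyForSequenceClassification and TFToyForSequenceClassification are hypothetical stand-ins for classes like RobertaForSequenceClassification and TFRobertaForSequenceClassification, and the argument list is trimmed. The TYPE_CHECKING guard and from __future__ import annotations are only there so the sketch runs without PyTorch, TensorFlow or transformers installed; in a real modeling file those modules are already imported at the top, so you just write the annotations.

```python
from __future__ import annotations  # keep annotations as unevaluated strings

from typing import TYPE_CHECKING, Optional, Tuple, Union

if TYPE_CHECKING:
    # Seen only by static type checkers; real modeling files import these normally.
    import numpy as np
    import tensorflow as tf
    import torch
    from transformers.modeling_outputs import SequenceClassifierOutput
    from transformers.modeling_tf_outputs import TFSequenceClassifierOutput
    from transformers.modeling_tf_utils import TFModelInputType


class ToyForSequenceClassification:
    """Hypothetical stand-in for a PyTorch class like RobertaForSequenceClassification."""

    def forward(
        self,
        input_ids: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        labels: Optional[torch.Tensor] = None,
        output_attentions: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, SequenceClassifierOutput]:  # a tuple when return_dict is False
        raise NotImplementedError  # a type-hint PR leaves the real body unchanged


class TFToyForSequenceClassification:
    """Hypothetical stand-in for a TF class like TFRobertaForSequenceClassification."""

    def call(
        self,
        input_ids: Optional[TFModelInputType] = None,  # Keras treats the first input specially
        attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None,
        labels: Optional[Union[np.ndarray, tf.Tensor]] = None,
        output_attentions: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        training: bool = False,  # never None in TF models, so not Optional
    ) -> Union[Tuple, TFSequenceClassifierOutput]:
        raise NotImplementedError
```

past_key_values is deliberately left out of the sketch, since its type varies between models; check the model you are working on.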

How can I find models that need type hints?

I used to maintain a list here, but it got out of date (sorry!). Instead, you can use this Colab notebook: if you run it, it will show you the models in PyTorch or TF that are still missing type hints. Unlike my manually curated lists, it's guaranteed to be up to date, but do double-check that nobody else in the thread has claimed a model before you start, because the Colab code only registers type hints after the PR containing them is merged!
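If you want to spot-check a single class locally, a small helper built on Python's inspect module can list the unannotated arguments of its forward or call method. This is a hypothetical helper, not the code from the Colab notebook, and it only checks whether annotations are present, not whether they are correct.

```python
import inspect
from typing import Callable, List


def missing_type_hints(func: Callable) -> List[str]:
    """Return names of parameters (plus "return") that have no annotation."""
    sig = inspect.signature(func)
    missing = [
        name
        for name, param in sig.parameters.items()
        # `self` is conventionally left unannotated, so skip it
        if name != "self" and param.annotation is inspect.Parameter.empty
    ]
    if sig.return_annotation is inspect.Signature.empty:
        missing.append("return")
    return missing


class PartiallyHintedModel:
    """Hypothetical class with one annotated and one unannotated input."""

    def forward(self, input_ids, attention_mask: "Optional[torch.Tensor]" = None):
        return input_ids
```

For example, with transformers installed you could call missing_type_hints(RobertaForSequenceClassification.forward) and check whether the result is empty before claiming that model.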
