10 changes: 8 additions & 2 deletions docs/source/main_classes/model.rst
@@ -12,7 +12,9 @@ are common among all the models to:
- prune the attention heads of the model.

The other methods that are common to each model are defined in :class:`~transformers.modeling_utils.ModuleUtilsMixin`
(for the PyTorch models) and :class:`~transformers.modeling_tf_utils.TFModuleUtilsMixin` (for the TensorFlow models).
(for the PyTorch models) and :class:`~transformers.modeling_tf_utils.TFModuleUtilsMixin` (for the TensorFlow models) or,
for text generation, :class:`~transformers.generation_utils.GenerationMixin` (for the PyTorch models) and
:class:`~transformers.generation_tf_utils.TFGenerationMixin` (for the TensorFlow models).


``PreTrainedModel``
@@ -46,4 +48,8 @@ The other methods that are common to each model are defined in :class:`~transfor
Generative models
~~~~~~~~~~~~~~~~~

Coming soon
.. autoclass:: transformers.generation_utils.GenerationMixin
:members:

.. autoclass:: transformers.generation_tf_utils.TFGenerationMixin
:members:
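
Because these mixins are mixed into :class:`~transformers.PreTrainedModel` and :class:`~transformers.TFPreTrainedModel`, models with a language modeling head expose :obj:`generate` directly. A minimal sketch, assuming GPT-2 checkpoints (illustrative only, not part of this file)::

    from transformers import GPT2Tokenizer, GPT2LMHeadModel, TFGPT2LMHeadModel

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    input_ids_pt = tokenizer.encode("Hello, my dog is", return_tensors="pt")
    input_ids_tf = tokenizer.encode("Hello, my dog is", return_tensors="tf")

    # PyTorch model: generate() comes from GenerationMixin
    pt_model = GPT2LMHeadModel.from_pretrained("gpt2")
    print(tokenizer.decode(pt_model.generate(input_ids_pt, max_length=20)[0]))

    # TensorFlow model: generate() comes from TFGenerationMixin
    tf_model = TFGPT2LMHeadModel.from_pretrained("gpt2")
    print(tokenizer.decode(tf_model.generate(input_ids_tf, max_length=20)[0].numpy()))
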
2 changes: 1 addition & 1 deletion src/transformers/configuration_utils.py
@@ -91,7 +91,7 @@ class PretrainedConfig(object):
keep for top-k-filtering that will be used by default in the :obj:`generate` method of the model.
- **top_p** (:obj:`float`, `optional`, defaults to 1) -- Value that will be used by default in the
:obj:`generate` method of the model for ``top_p``. If set to float < 1, only the most probable tokens
with probabilities that add up to ``top_p`` or highest are kept for generation.
with probabilities that add up to ``top_p`` or higher are kept for generation.
- **repetition_penalty** (:obj:`float`, `optional`, defaults to 1) -- Parameter for repetition penalty
that will be used by default in the :obj:`generate` method of the model. 1.0 means no penalty.
- **length_penalty** (:obj:`float`, `optional`, defaults to 1) -- Exponential penalty to the length that
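
To make the ``top_p`` wording above concrete, here is a minimal numpy sketch of nucleus filtering — an illustration of the semantics only, not the library's internal implementation (which works on batched logits)::

    import numpy as np

    def nucleus_filter(probs, top_p=0.9):
        # Keep the smallest set of most probable tokens whose probabilities add up to top_p or higher.
        order = np.argsort(probs)[::-1]                  # most probable tokens first
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        kept = np.zeros_like(probs)
        kept[order[:cutoff]] = probs[order[:cutoff]]
        return kept / kept.sum()                         # renormalize over the kept tokens

    print(nucleus_filter(np.array([0.5, 0.3, 0.1, 0.05, 0.05]), top_p=0.9))
    # -> only the three most probable tokens survive
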
138 changes: 66 additions & 72 deletions src/transformers/generation_tf_utils.py
@@ -25,10 +25,15 @@

class TFGenerationMixin:
"""
A class contraining all of the functions supporting generation, to be used as a mixin in TFPreTrainedModel.
A class containing all of the functions supporting generation, to be used as a mixin in
:class:`~transformers.TFPreTrainedModel`.
"""

def prepare_inputs_for_generation(self, inputs, **kwargs):
"""
Implement in subclasses of :class:`~transformers.TFPreTrainedModel` for custom behavior to prepare inputs in the
generate method.
"""
return {"inputs": inputs}

def _use_cache(self, outputs, use_cache):
@@ -62,87 +67,76 @@ def generate(
decoder_start_token_id=None,
use_cache=None,
):
r""" Generates sequences for models with a LM head. The method currently supports greedy or penalized greedy decoding, sampling with top-k or nucleus sampling
and beam-search.

Adapted in part from `Facebook's XLM beam search code`_.
r"""
Generates sequences for models with a language modeling head. The method currently supports greedy decoding,
Contributor: We could maybe add here that this applies to all AUTO_MODEL_FOR_CAUSAL_LM and AUTO_MODEL_FOR_SEQ2SEQ -> not sure though whether this is very useful for the user.

Collaborator (author): Those are not documented, so it's not really useful inside the documentation.

beam-search decoding, sampling with temperature, sampling with top-k or nucleus sampling.

.. _`Facebook's XLM beam search code`:
https://github.com/facebookresearch/XLM/blob/9e6f6814d17be4fe5b15f2e6c43eb2b2d76daeb4/src/model/transformer.py#L529
Adapted in part from `Facebook's XLM beam search code
<https://github.com/facebookresearch/XLM/blob/9e6f6814d17be4fe5b15f2e6c43eb2b2d76daeb4/src/model/transformer.py#L529>`__.

Apart from :obj:`input_ids` and :obj:`attention_mask`, all the arguments below will default to the value of the
attribute of the same name inside the :class:`~transformers.PretrainedConfig` of the model. The default values
indicated below are the default values of those attributes in the config.
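
For example — a minimal sketch, assuming a GPT-2 checkpoint — a config attribute and the corresponding explicit argument interact like this::

    from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = TFGPT2LMHeadModel.from_pretrained("gpt2")
    input_ids = tokenizer.encode("Hello, my dog is", return_tensors="tf")

    model.config.max_length = 30              # becomes the default for every generate() call on this model
    model.generate(input_ids)                 # uses max_length=30 from the config
    model.generate(input_ids, max_length=15)  # an explicit argument overrides the config default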

Parameters:

input_ids: (`optional`) `tf.Tensor` of `dtype=tf.int32` of shape `(batch_size, sequence_length)`
The sequence used as a prompt for the generation. If `None` the method initializes
it as an empty `tf.Tensor` of shape `(1,)`.

max_length: (`optional`) int
The max length of the sequence to be generated. Between 1 and infinity. Default to 20.

min_length: (`optional`) int
The min length of the sequence to be generated. Between 0 and infinity. Default to 0.
do_sample: (`optional`) bool
If set to `False` greedy decoding is used. Otherwise sampling is used. Defaults to `False` as defined in `configuration_utils.PretrainedConfig`.

early_stopping: (`optional`) bool
if set to `True` beam search is stopped when at least `num_beams` sentences finished per batch. Defaults to `False` as defined in `configuration_utils.PretrainedConfig`.

num_beams: (`optional`) int
Number of beams for beam search. Must be between 1 and infinity. 1 means no beam search. Default to 1.

temperature: (`optional`) float
The value used to module the next token probabilities. Must be strictely positive. Default to 1.0.

top_k: (`optional`) int
The number of highest probability vocabulary tokens to keep for top-k-filtering. Between 1 and infinity. Default to 50.

top_p: (`optional`) float
The cumulative probability of parameter highest probability vocabulary tokens to keep for nucleus sampling. Must be between 0 and 1. Default to 1.

repetition_penalty: (`optional`) float
The parameter for repetition penalty. Between 1.0 and infinity. 1.0 means no penalty. Default to 1.0.

bos_token_id: (`optional`) int
Beginning of sentence token if no prompt is provided. Default to specicic model bos_token_id or None if it does not exist.

pad_token_id: (`optional`) int
Pad token. Defaults to pad_token_id as defined in the models config.

eos_token_id: (`optional`) int
EOS token. Defaults to eos_token_id as defined in the models config.

length_penalty: (`optional`) float
Exponential penalty to the length. Default to 1.

no_repeat_ngram_size: (`optional`) int
If set to int > 0, all ngrams of size `no_repeat_ngram_size` can only occur once.

bad_words_ids: (`optional`) list of lists of int
`bad_words_ids` contains tokens that are not allowed to be generated. In order to get the tokens of the words that should not appear in the generated text, use `tokenizer.encode(bad_word, add_prefix_space=True)`.

num_return_sequences: (`optional`) int
The number of independently computed returned sequences for each element in the batch. Default to 1.

attention_mask (`optional`) obj: `tf.Tensor` with `dtype=tf.int32` of same shape as `input_ids`
Mask to avoid performing attention on padding token indices.
Mask values selected in ``[0, 1]``:
``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.
Defaults to `None`.
input_ids (:obj:`tf.Tensor` of :obj:`dtype=tf.int32` and shape :obj:`(batch_size, sequence_length)`, `optional`):
The sequence used as a prompt for the generation. If :obj:`None`, the method initializes
it as an empty :obj:`tf.Tensor` of shape :obj:`(1,)`.
max_length (:obj:`int`, `optional`, defaults to 20):
The maximum length of the sequence to be generated.
min_length (:obj:`int`, `optional`, defaults to 10):
The minimum length of the sequence to be generated.
do_sample (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to use sampling; use greedy decoding otherwise.
early_stopping (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to stop the beam search when at least ``num_beams`` sentences are finished per batch or not.
num_beams (:obj:`int`, `optional`, defaults to 1):
Number of beams for beam search. 1 means no beam search.
temperature (:obj:`float`, `optional`, defaults to 1.0):
The value used to module the next token probabilities.
top_k (:obj:`int`, `optional`, defaults to 50):
The number of highest probability vocabulary tokens to keep for top-k-filtering.
top_p (:obj:`float`, `optional`, defaults to 1.0):
If set to float < 1, only the most probable tokens with probabilities that add up to ``top_p`` or
higher are kept for generation.
repetition_penalty (:obj:`float`, `optional`, defaults to 1.0):
@patrickvonplaten (Contributor, Aug 14, 2020): Maybe we can even link to the original paper here, where it is explained: https://arxiv.org/pdf/1909.05858.pdf (page 5, first equation), since it might be hard to understand what the penalty does exactly. But maybe this goes too far here... most of these params are explained at https://huggingface.co/blog/how-to-generate; maybe we can add a link above "Parameters".

The parameter for repetition penalty. 1.0 means no penalty.
pad_token_id (:obj:`int`, `optional`):
The id of the `padding` token.
bos_token_id (:obj:`int`, `optional`):
The id of the `beginning-of-stream` token.
eos_token_id (:obj:`int`, `optional`):
The id of the `end-of-stream` token.
Member: beginning-of-sequence and end-of-sequence?

length_penalty (:obj:`float`, `optional`, defaults to 1.0):
Exponential penalty to the length. 1.0 means no penalty.
@patrickvonplaten (Contributor, Aug 14, 2020): This param can be confusing, as a penalty < 1.0 penalizes length whereas a penalty > 1.0 encourages longer sequences. Maybe we can add a sentence like:
Note: In order to encourage the model to generate shorter sequences, `length_penalty` should be set < 1.0. In order to encourage the model to produce longer sequences, the parameter should be set to > 1.0.

Contributor: See #4915

no_repeat_ngram_size (:obj:`int`, `optional`, defaults to 0):
If set to int > 0, all ngrams of that size can only occur once.
bad_words_ids (:obj:`List[List[int]]`, `optional`):
List of lists of token ids that are not allowed to be generated. In order to get the tokens of the words that
should not appear in the generated text, use :obj:`tokenizer.encode(bad_word, add_prefix_space=True)`.
num_return_sequences (:obj:`int`, `optional`, defaults to 1):
The number of independently computed returned sequences for each element in the batch.
attention_mask (:obj:`tf.Tensor` of :obj:`dtype=tf.int32` and shape :obj:`(batch_size, sequence_length)`, `optional`):
Mask to avoid performing attention on padding token indices. Mask values are in ``[0, 1]``: 1 for
tokens that are not masked, and 0 for masked tokens.

If not provided, will default to a tensor the same shape as :obj:`input_ids` that masks the pad token.

`What are attention masks? <../glossary.html#attention-mask>`__

decoder_start_token_id=None: (`optional`) int
If an encoder-decoder model starts decoding with a different token than BOS.
Defaults to `None` and is changed to `BOS` later.

use_cache: (`optional`) bool
If `use_cache` is True, past key values are used to speed up decoding if applicable to model. Defaults to `True`.
decoder_start_token_id (:obj:`int`, `optional`):
If an encoder-decoder model starts decoding with a different token than `bos`, the id of that token.
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not the model should use the past key/values attentions (if applicable to the model) to
speed up decoding.
model_specific_kwargs:
Additional model specific kwargs will be forwarded to the :obj:`forward` function of the model.
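
Putting several of these arguments together — a minimal sketch assuming a GPT-2 checkpoint, not taken from this file's own (truncated) Examples section::

    from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = TFGPT2LMHeadModel.from_pretrained("gpt2")
    input_ids = tokenizer.encode("The weather today is", return_tensors="tf")

    outputs = model.generate(
        input_ids,
        do_sample=True,             # sample instead of greedy decoding
        max_length=40,
        top_k=50,                   # keep only the 50 most probable tokens at each step
        top_p=0.95,                 # nucleus sampling
        temperature=0.8,
        repetition_penalty=1.2,
        no_repeat_ngram_size=2,     # no 2-gram may appear twice
        num_return_sequences=3,     # outputs has shape (3, sequence_length) since batch_size is 1
    )
    for sequence in outputs.numpy().tolist():
        print(tokenizer.decode(sequence, skip_special_tokens=True))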

Return:

output: `tf.Tensor` of `dtype=tf.int32` shape `(batch_size * num_return_sequences, sequence_length)`
sequence_length is either equal to max_length or shorter if all batches finished early due to the `eos_token_id`
:obj:`tf.Tensor` of :obj:`dtype=tf.int32` and shape :obj:`(batch_size * num_return_sequences, sequence_length)`:
The generated sequences. The second dimension (sequence_length) is either equal to :obj:`max_length` or
shorter if all batches finished early due to the :obj:`eos_token_id`.
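
On the repetition-penalty discussion above: a minimal numpy sketch of the CTRL-style penalty (an illustration of the mechanism described in the linked paper, not the library's exact batched implementation)::

    import numpy as np

    def apply_repetition_penalty(next_token_logits, generated_ids, penalty=1.2):
        # Tokens that were already generated become less likely to be generated again.
        for token_id in set(generated_ids):
            if next_token_logits[token_id] > 0:
                next_token_logits[token_id] /= penalty   # shrink positive logits
            else:
                next_token_logits[token_id] *= penalty   # push negative logits further down
        return next_token_logits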

Examples::
