Generation doc #6470
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,10 +25,15 @@ | |
|
|
||
| class TFGenerationMixin: | ||
| """ | ||
| A class contraining all of the functions supporting generation, to be used as a mixin in TFPreTrainedModel. | ||
| A class containing all of the functions supporting generation, to be used as a mixin in | ||
| :class:`~transformers.TFPreTrainedModel`. | ||
| """ | ||
|
|
||
| def prepare_inputs_for_generation(self, inputs, **kwargs): | ||
| """ | ||
| Implement in subclasses of :class:`~transformers.TFPreTrainedModel` for custom behavior to prepare inputs in the | ||
| generate method. | ||
| """ | ||
| return {"inputs": inputs} | ||
|
|
||
| def _use_cache(self, outputs, use_cache): | ||
|
|
@@ -62,87 +67,76 @@ def generate( | |
| decoder_start_token_id=None, | ||
| use_cache=None, | ||
| ): | ||
| r""" Generates sequences for models with a LM head. The method currently supports greedy or penalized greedy decoding, sampling with top-k or nucleus sampling | ||
| and beam-search. | ||
|
|
||
| Adapted in part from `Facebook's XLM beam search code`_. | ||
| r""" | ||
| Generates sequences for models with a language modeling head. The method currently supports greedy decoding, | ||
| beam-search decoding, sampling with temperature, sampling with top-k or nucleus sampling. | ||
|
|
||
| .. _`Facebook's XLM beam search code`: | ||
| https://github.com/facebookresearch/XLM/blob/9e6f6814d17be4fe5b15f2e6c43eb2b2d76daeb4/src/model/transformer.py#L529 | ||
| Adapted in part from `Facebook's XLM beam search code | ||
| <https://github.com/facebookresearch/XLM/blob/9e6f6814d17be4fe5b15f2e6c43eb2b2d76daeb4/src/model/transformer.py#L529>`__. | ||
|
|
||
| Apart from :obj:`input_ids` and :obj:`attention_mask`, all the arguments below will default to the value of the | ||
| attribute of the same name inside the :class:`~transformers.PretrainedConfig` of the model. The default values | ||
| indicated are the default values of that config. | ||
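To illustrate the fallback described above, here is a hedged sketch (the checkpoint and the particular attributes set are only examples): any argument omitted from the :obj:`generate` call is read from the model config.

```python
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = TFGPT2LMHeadModel.from_pretrained("gpt2")

# Arguments not passed to generate() fall back to the model config,
# so setting them here changes the defaults for every later call.
model.config.max_length = 60
model.config.no_repeat_ngram_size = 3

input_ids = tokenizer.encode("The weather today", return_tensors="tf")
output = model.generate(input_ids)  # picks up max_length=60 and no_repeat_ngram_size=3
```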
|
|
||
| Parameters: | ||
|
|
||
| input_ids: (`optional`) `tf.Tensor` of `dtype=tf.int32` of shape `(batch_size, sequence_length)` | ||
| The sequence used as a prompt for the generation. If `None` the method initializes | ||
| it as an empty `tf.Tensor` of shape `(1,)`. | ||
|
|
||
| max_length: (`optional`) int | ||
| The max length of the sequence to be generated. Between 1 and infinity. Default to 20. | ||
|
|
||
| min_length: (`optional`) int | ||
| The min length of the sequence to be generated. Between 0 and infinity. Default to 0. | ||
| do_sample: (`optional`) bool | ||
| If set to `False` greedy decoding is used. Otherwise sampling is used. Defaults to `False` as defined in `configuration_utils.PretrainedConfig`. | ||
|
|
||
| early_stopping: (`optional`) bool | ||
| if set to `True` beam search is stopped when at least `num_beams` sentences finished per batch. Defaults to `False` as defined in `configuration_utils.PretrainedConfig`. | ||
|
|
||
| num_beams: (`optional`) int | ||
| Number of beams for beam search. Must be between 1 and infinity. 1 means no beam search. Default to 1. | ||
|
|
||
| temperature: (`optional`) float | ||
| The value used to module the next token probabilities. Must be strictely positive. Default to 1.0. | ||
|
|
||
| top_k: (`optional`) int | ||
| The number of highest probability vocabulary tokens to keep for top-k-filtering. Between 1 and infinity. Default to 50. | ||
|
|
||
| top_p: (`optional`) float | ||
| The cumulative probability of parameter highest probability vocabulary tokens to keep for nucleus sampling. Must be between 0 and 1. Default to 1. | ||
|
|
||
| repetition_penalty: (`optional`) float | ||
| The parameter for repetition penalty. Between 1.0 and infinity. 1.0 means no penalty. Default to 1.0. | ||
|
|
||
| bos_token_id: (`optional`) int | ||
| Beginning of sentence token if no prompt is provided. Default to specicic model bos_token_id or None if it does not exist. | ||
|
|
||
| pad_token_id: (`optional`) int | ||
| Pad token. Defaults to pad_token_id as defined in the models config. | ||
|
|
||
| eos_token_id: (`optional`) int | ||
| EOS token. Defaults to eos_token_id as defined in the models config. | ||
|
|
||
| length_penalty: (`optional`) float | ||
| Exponential penalty to the length. Default to 1. | ||
|
|
||
| no_repeat_ngram_size: (`optional`) int | ||
| If set to int > 0, all ngrams of size `no_repeat_ngram_size` can only occur once. | ||
|
|
||
| bad_words_ids: (`optional`) list of lists of int | ||
| `bad_words_ids` contains tokens that are not allowed to be generated. In order to get the tokens of the words that should not appear in the generated text, use `tokenizer.encode(bad_word, add_prefix_space=True)`. | ||
|
|
||
| num_return_sequences: (`optional`) int | ||
| The number of independently computed returned sequences for each element in the batch. Default to 1. | ||
|
|
||
| attention_mask (`optional`) obj: `tf.Tensor` with `dtype=tf.int32` of same shape as `input_ids` | ||
| Mask to avoid performing attention on padding token indices. | ||
| Mask values selected in ``[0, 1]``: | ||
| ``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens. | ||
| Defaults to `None`. | ||
| input_ids (:obj:`tf.Tensor` of :obj:`dtype=tf.int32` and shape :obj:`(batch_size, sequence_length)`, `optional`): | ||
| The sequence used as a prompt for the generation. If :obj:`None` the method initializes | ||
| it as an empty :obj:`tf.Tensor` of shape :obj:`(1,)`. | ||
| max_length (:obj:`int`, `optional`, defaults to 20): | ||
| The maximum length of the sequence to be generated. | ||
| min_length (:obj:`int`, `optional`, defaults to 0): | ||
| The minimum length of the sequence to be generated. | ||
| do_sample (:obj:`bool`, `optional`, defaults to :obj:`False`): | ||
| Whether or not to use sampling; use greedy decoding otherwise. | ||
| early_stopping (:obj:`bool`, `optional`, defaults to :obj:`False`): | ||
| Whether to stop the beam search when at least ``num_beams`` sentences per batch are finished. | ||
| num_beams (:obj:`int`, `optional`, defaults to 1): | ||
| Number of beams for beam search. 1 means no beam search. | ||
| temperature (:obj:`float`, `optional`, defaults to 1.0): | ||
| The value used to modulate the next token probabilities. | ||
| top_k (:obj:`int`, `optional`, defaults to 50): | ||
| The number of highest probability vocabulary tokens to keep for top-k-filtering. | ||
| top_p (:obj:`float`, `optional`, defaults to 1.0): | ||
| If set to float < 1, only the most probable tokens with probabilities that add up to ``top_p`` or | ||
| higher are kept for generation. | ||
| repetition_penalty (:obj:`float`, `optional`, defaults to 1.0): | ||
|
Contributor
Maybe we can link even to the original paper here, where it is explained: https://arxiv.org/pdf/1909.05858.pdf - page 5, first equation, since it might be hard to understand what the penalty does exactly. But maybe this goes too far here... most of these params are explained here: https://huggingface.co/blog/how-to-generate; maybe we can add a link above "Parameters".
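In the spirit of the comment above, here is a hedged NumPy sketch of the kind of rescoring a CTRL-style repetition penalty performs; the helper name and values are illustrative, not the exact implementation touched by this diff.

```python
import numpy as np


def apply_repetition_penalty(logits, generated_ids, penalty):
    """Penalize tokens that were already generated (CTRL-style rule, for illustration).

    Positive logits are divided by `penalty` and negative logits multiplied by it,
    so with penalty > 1.0 a repeated token always becomes less likely.
    """
    logits = logits.copy()
    for token_id in set(generated_ids):
        if logits[token_id] > 0:
            logits[token_id] /= penalty
        else:
            logits[token_id] *= penalty
    return logits


scores = np.array([2.0, 0.5, -1.0, 3.0])
# Token 3 was already generated: its logit drops from 3.0 to about 2.31 with penalty=1.3.
print(apply_repetition_penalty(scores, generated_ids=[3], penalty=1.3))
```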
||
| The parameter for repetition penalty. 1.0 means no penalty. | ||
| pad_token_id (:obj:`int`, `optional`): | ||
| The id of the `padding` token. | ||
| bos_token_id (:obj:`int`, `optional`): | ||
| The id of the `beginning-of-stream` token. | ||
| eos_token_id (:obj:`int`, `optional`): | ||
| The id of the `end-of-stream` token. | ||
|
||
| length_penalty (:obj:`float`, `optional`, defaults to 1.0): | ||
| Exponential penalty to the length. 1.0 means no penalty. | ||
|
Contributor
this param can be confusing as a penalty < 1.0 penalizes length whereas a penalty > 1.0 encourages longer sequences. Maybe we can add a sentence like:
Contributor
See #4915
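To make the behaviour described in these comments concrete, here is a hedged sketch of how a length penalty typically enters beam scoring; the exact formula is not shown in this diff, so treat this as an illustration of the usual sum-of-log-probs divided by length**penalty rule.

```python
def beam_score(sum_log_probs, length, length_penalty):
    # Log-probabilities are negative; dividing by a larger length ** penalty
    # moves long hypotheses closer to zero, i.e. makes them score better.
    return sum_log_probs / (length ** length_penalty)


print(beam_score(-10.0, 20, length_penalty=1.0))  # -0.5
print(beam_score(-10.0, 20, length_penalty=2.0))  # -0.025 -> longer sequences favoured
print(beam_score(-10.0, 20, length_penalty=0.5))  # about -2.24 -> longer sequences penalized
```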
||
| no_repeat_ngram_size (:obj:`int`, `optional`, defaults to 0): | ||
| If set to int > 0, all ngrams of that size can only occur once. | ||
| bad_words_ids (:obj:`List[List[int]]`, `optional`): | ||
| List of token ids that are not allowed to be generated. In order to get the tokens of the words that | ||
| should not appear in the generated text, use :obj:`tokenizer.encode(bad_word, add_prefix_space=True)`. | ||
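Building on the :obj:`tokenizer.encode(bad_word, add_prefix_space=True)` hint above, a short hedged sketch; the tokenizer and the word list are arbitrary examples.

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Each entry is the list of token ids for one banned word; add_prefix_space=True
# makes the BPE tokenizer encode the word as it would appear mid-sentence.
bad_words_ids = [tokenizer.encode(word, add_prefix_space=True) for word in ["stupid", "ugly"]]

# Later: model.generate(input_ids, bad_words_ids=bad_words_ids, ...)
```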
| num_return_sequences (:obj:`int`, `optional`, defaults to 1): | ||
| The number of independently computed returned sequences for each element in the batch. | ||
| attention_mask (:obj:`tf.Tensor` of :obj:`dtype=tf.int32` and shape :obj:`(batch_size, sequence_length)`, `optional`): | ||
| Mask to avoid performing attention on padding token indices. Mask values are in ``[0, 1]``, 1 for | ||
| tokens that are not masked, and 0 for masked tokens. | ||
sgugger marked this conversation as resolved.
|
||
|
|
||
| If not provided, will default to a tensor of the same shape as :obj:`input_ids` that masks the pad token. | ||
|
|
||
| `What are attention masks? <../glossary.html#attention-mask>`__ | ||
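As a hedged illustration of why the mask matters for batched prompts of different lengths (the model, tokenizer and padding choice are assumptions for the example):

```python
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = TFGPT2LMHeadModel.from_pretrained("gpt2")

# Padding the shorter prompt produces an attention_mask with 0s over the pad tokens.
batch = tokenizer(["Hello", "A much longer prompt about text generation"],
                  return_tensors="tf", padding=True)

output = model.generate(batch["input_ids"],
                        attention_mask=batch["attention_mask"],
                        pad_token_id=tokenizer.eos_token_id,
                        max_length=30)
```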
|
|
||
| decoder_start_token_id=None: (`optional`) int | ||
| If an encoder-decoder model starts decoding with a different token than BOS. | ||
| Defaults to `None` and is changed to `BOS` later. | ||
|
|
||
| use_cache: (`optional`) bool | ||
| If `use_cache` is True, past key values are used to speed up decoding if applicable to model. Defaults to `True`. | ||
| decoder_start_token_id (:obj:`int`, `optional`): | ||
| If an encoder-decoder model starts decoding with a different token than `bos`, the id of that token. | ||
| use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`): | ||
| Whether or not the model should use the past key/value attentions (if applicable to the model) to | ||
| speed up decoding. | ||
| model_specific_kwargs: | ||
| Additional model specific kwargs will be forwarded to the :obj:`forward` function of the model. | ||
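The :obj:`decoder_start_token_id` and :obj:`use_cache` arguments are easiest to see with an encoder-decoder model; here is a hedged sketch (T5 is used purely as an example, and passing the id explicitly is redundant since it already defaults to the config value).

```python
from transformers import T5Tokenizer, TFT5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = TFT5ForConditionalGeneration.from_pretrained("t5-small")

input_ids = tokenizer.encode("translate English to German: How are you?", return_tensors="tf")

# For encoder-decoder models decoding starts from decoder_start_token_id, not bos_token_id;
# it is passed explicitly here only for illustration.
output = model.generate(input_ids,
                        decoder_start_token_id=model.config.decoder_start_token_id,
                        use_cache=True)
print(tokenizer.decode(output[0].numpy(), skip_special_tokens=True))
```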
|
|
||
| Return: | ||
|
|
||
| output: `tf.Tensor` of `dtype=tf.int32` shape `(batch_size * num_return_sequences, sequence_length)` | ||
| sequence_length is either equal to max_length or shorter if all batches finished early due to the `eos_token_id` | ||
| :obj:`tf.Tensor` of :obj:`dtype=tf.int32` and shape :obj:`(batch_size * num_return_sequences, sequence_length)`: | ||
| The generated sequences. The second dimension (sequence_length) is either equal to :obj:`max_length` or | ||
| shorter if all batches finished early due to the :obj:`eos_token_id`. | ||
|
|
||
| Examples:: | ||
|
|
||
|
|
||
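The ``Examples::`` block is still empty in this revision; a hedged sketch of the kind of usage it could show follows (the checkpoint and the sampling values are arbitrary choices, not taken from the diff).

```python
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = TFGPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer.encode("The quick brown fox", return_tensors="tf")

# Greedy decoding, falling back to the config for everything not passed explicitly.
greedy_output = model.generate(input_ids, max_length=30)

# Top-k / nucleus sampling with a temperature and a repetition penalty.
sampled_output = model.generate(
    input_ids,
    do_sample=True,
    max_length=50,
    top_k=50,
    top_p=0.95,
    temperature=0.7,
    repetition_penalty=1.2,
    num_return_sequences=3,
)

for sequence in sampled_output:
    print(tokenizer.decode(sequence.numpy(), skip_special_tokens=True))
```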
we could maybe add here that this applies to all AUTO_MODEL_FOR_CAUSAL_LM and AUTO_MODEL_FOR_SEQ2SEQ -> not sure though whether this is very useful for the user

Those are not documented so it's not really useful inside the documentation.