
[Proposal] Adding new encoder_no_repeat_ngram_size to generate.#9984

Merged
Narsil merged 11 commits into
huggingface:masterfrom
Narsil:encoder_no_repeat_ngram_size
Feb 4, 2021
Conversation

@Narsil
Contributor

@Narsil Narsil commented Feb 3, 2021

What does this PR do?

Blenderbot results seemed off compared to the original ParlAI script
(https://parl.ai/projects/recipes/). Notably, the model tends
to repeat a lot of what was said earlier in the conversation.

The root cause is that ParlAI's `no_repeat_ngram_size` applies
to the `encoder_input_ids`, whereas HF's `no_repeat_ngram_size` applies
to the previously generated ids (within the decoder). Blenderbot keeps
the conversation history in the encoder input, which explains why HF's
implementation produced the repetitions.

This fix focuses on blenderbot, *not* blenderbot-small, and adds tests
for it, because the two variants are quite different in configuration.

This change includes:

  • Adding a new `EncoderNoRepeatLogitProcessor`.
  • Adding 1 new argument to `generate` (`encoder_no_repeat_ngram_size`).
  • Adding 1 new config parameter, `encoder_no_repeat_ngram_size`.
  • Adding 2 tests: one high-level pipeline test (whose inputs exhibited
    the repeat behavior) and one low-level test for `EncoderNoRepeatLogitProcessor`.
  • Refactoring `NoRepeatLogitProcessor` so its logic could be reused.
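
The mechanism can be sketched in standalone form (plain Python with hypothetical names of my choosing; the actual processor operates on batched torch tensors): precompute every n-gram in the encoder input, then mask any next token that would complete one of those n-grams during decoding.

```python
NEG_INF = float("-inf")

class EncoderNoRepeatNGramSketch:
    """Ban next tokens that would reproduce an n-gram from the encoder input.

    Hypothetical, simplified stand-in for the processor added in this PR:
    it works on plain lists of token ids rather than batched torch tensors.
    """

    def __init__(self, ngram_size, encoder_ids):
        self.n = ngram_size
        # Map each (n-1)-token prefix occurring in the encoder input
        # to the set of tokens that complete it there.
        self.completions = {}
        for i in range(len(encoder_ids) - ngram_size + 1):
            *prefix, last = encoder_ids[i : i + ngram_size]
            self.completions.setdefault(tuple(prefix), set()).add(last)

    def __call__(self, generated_ids, scores):
        # scores: per-token logits for the next position (list of floats)
        if len(generated_ids) >= self.n - 1:
            key = tuple(generated_ids[-(self.n - 1):])
            for tok in self.completions.get(key, ()):
                scores[tok] = NEG_INF  # make the banned token unchoosable
        return scores

# The conversation history [5, 6, 7, 8] sits in the encoder; decoding has
# produced [9, 5, 6], so emitting 7 would repeat the history 3-gram (5, 6, 7).
proc = EncoderNoRepeatNGramSketch(3, encoder_ids=[5, 6, 7, 8])
scores = proc([9, 5, 6], [0.0] * 10)
print(scores[7])  # -inf
```

This is the same trick as decoder-side `no_repeat_ngram_size`; the only difference is that the n-gram table is built from the encoder ids instead of the generated prefix.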

Further work:

  • The Blenderbot conversational pipeline still does not behave correctly,
    as the way input is prepared within the pipeline is still incorrect
    (follow-up PR).
  • Blenderbot lets the bot have personas, done by prepending
    "your persona: XXXX" to the input; this could also be explored
    in a follow-up PR.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@patrickvonplaten
@LysandreJik

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors which may be interested in your PR.

Contributor

@patrickvonplaten patrickvonplaten left a comment


Looks great to me! Thanks so much for diving into this and solving the blenderbot bug!
I like the design very much, as discussed offline! Super nice to be able to solve the problem in such a clean way :-)

Left a couple of nits, but overall looks good to me!

Narsil and others added 7 commits February 4, 2021 09:31
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Member

@LysandreJik LysandreJik left a comment


LGTM, this is indeed a clean fix. Do we know why our BlenderBot still behaves incorrectly compared to ParlAI?

Regarding personas, this could probably be handled directly in the ConversationalPipeline?

@LysandreJik
Member

Before merging, please take a look at the failing tests.

@Narsil
Contributor Author

Narsil commented Feb 4, 2021

LGTM, this is indeed a clean fix. Do we know why our BlenderBot still behaves incorrectly compared to ParlAI?

I need to look deeper. By default they use FP16, and the final scores still differ by an order of magnitude (I expect they correspond to different things), but when looking at the full beam searches they still look similar.

I've done step-by-step debugging, and scores within the beam search are very close for many steps.
This fix addresses the major drift, which would otherwise occur pretty fast.

Regarding personas, this could probably be handled directly in the ConversationalPipeline?

Yes exactly my opinion.

```python
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
    # B x num_beams
    num_hypos = scores.shape[0]
    num_beams = num_hypos // self.batch_size
```
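
For context, `scores` here has one row per hypothesis, i.e. `batch_size * num_beams` rows; integer-dividing a row index by the beam count recovers which batch entry (and hence which encoder input) that hypothesis belongs to. A plain-Python illustration (the variable names are mine, not from the PR):

```python
batch_size, num_beams = 2, 3
num_hypos = batch_size * num_beams  # number of rows in `scores`

# Each hypothesis row maps back to its batch entry, so a processor can
# look up the right encoder input ids for every beam.
batch_index = [hypo // num_beams for hypo in range(num_hypos)]
print(batch_index)  # [0, 0, 0, 1, 1, 1]
```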
Contributor


nice - yeah that's safe!

@Narsil
Contributor Author

Narsil commented Feb 4, 2021

@sgugger Can you take a look please?

@Narsil
Contributor Author

Narsil commented Feb 4, 2021

@LysandreJik I figured it out. It's because of some logic within `ConversationalPipeline` which is invalid for blenderbot.

Coming up with a follow-up PR.

Collaborator

@sgugger sgugger left a comment


LGTM, thanks for adding this!

Narsil and others added 2 commits February 4, 2021 14:30
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@Narsil Narsil merged commit aeb18b9 into huggingface:master Feb 4, 2021
@Narsil Narsil deleted the encoder_no_repeat_ngram_size branch February 4, 2021 14:00