
Conversation

@patil-suraj (Contributor)

This PR adds MBartForConditionalGeneration. Regarding #6416
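For reference, a minimal usage sketch (not taken from this PR; the en-ro checkpoint name comes from the archive list reviewed below, and the exact tokenizer helpers may differ between library versions):

from transformers import MBartForConditionalGeneration, MBartTokenizer

# Illustrative sketch: the en-ro checkpoint is fine-tuned for English-to-Romanian translation.
tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")

# Encode an English sentence and let the model generate the Romanian translation.
inputs = tokenizer("UN Chief Says There Is No Military Solution in Syria", return_tensors="pt")
generated_ids = model.generate(**inputs)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))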

@sshleifer

@patil-suraj changed the title from "Add mbart model" to "[WIP][MBartForConditionalGeneration] Add mbart model" on Aug 12, 2020
@patil-suraj (Contributor, Author)

The failure is coming from test_modeling_tf_electra.py

@patil-suraj changed the title from "[WIP][MBartForConditionalGeneration] Add mbart model" to "[WIP] MBartForConditionalGeneration" on Aug 12, 2020
@sshleifer (Contributor) left a comment

LGTM, thanks Suraj!

@sgugger (Collaborator) left a comment

Great work! Just left a few doc nits.


MBART is a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is one of the first methods for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text.

The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`_
Collaborator comment:

Suggested change:
- The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`_
+ The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__

(Using only one trailing underscore creates a named reference, so Sphinx will associate every ``here`` link in the docs with that same target, or complain about duplicate targets. A double underscore makes the reference anonymous, so it's best to always use two.)

MBART_PRETRAINED_MODEL_ARCHIVE_LIST = [
    "facebook/mbart-large-cc25",
    "facebook/mbart-large-en-ro",
    # See all BART models at https://huggingface.co/models?filter=mbart
]
Collaborator comment:

Suggested change:
- # See all BART models at https://huggingface.co/models?filter=mbart
+ # See all multilingual BART models at https://huggingface.co/models?filter=mbart


MBART_START_DOCSTRING = r"""
This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`_ sub-class.
Collaborator comment:

Suggested change:
- This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`_ sub-class.
+ This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__ sub-class.

codecov bot commented Aug 12, 2020

Codecov Report

Merging #6441 into master will increase coverage by 0.29%.
The diff coverage is 88.22%.

@@            Coverage Diff             @@
##           master    #6441      +/-   ##
==========================================
+ Coverage   79.77%   80.06%   +0.29%     
==========================================
  Files         148      156       +8     
  Lines       27214    28024     +810     
==========================================
+ Hits        21710    22438     +728     
- Misses       5504     5586      +82     
Impacted Files                                   | Coverage Δ
-------------------------------------------------|-------------------------
src/transformers/configuration_reformer.py       | 100.00% <ø> (ø)
src/transformers/data/test_generation_utils.py   | 0.00% <0.00%> (ø)
src/transformers/modeling_marian.py              | 90.00% <ø> (-0.91%) ⬇️
src/transformers/modeling_utils.py               | 87.35% <ø> (ø)
src/transformers/tokenization_bart.py            | 100.00% <ø> (+4.22%) ⬆️
src/transformers/trainer_tf.py                   | 12.25% <0.00%> (-0.13%) ⬇️
src/transformers/testing_utils.py                | 51.92% <28.57%> (-20.81%) ⬇️
src/transformers/trainer.py                      | 37.84% <37.50%> (-0.18%) ⬇️
src/transformers/modeling_tf_bert.py             | 98.38% <50.00%> (+1.79%) ⬆️
src/transformers/data/data_collator.py           | 90.90% <52.94%> (-5.68%) ⬇️
... and 69 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@patil-suraj (Contributor, Author)

Thanks @sgugger, I've applied the suggestions.

@LysandreJik (Member) left a comment

LGTM!

"test_modeling_tf_xlm_roberta.py",
"test_modeling_xlm_roberta.py",
"test_modeling_pegasus.py",
"test_modeling_mbart.py",
Member comment:

Looking forward to adding this to the common tests!

@patil-suraj changed the title from "[WIP] MBartForConditionalGeneration" to "MBartForConditionalGeneration" on Aug 13, 2020
@LysandreJik merged commit 680f133 into huggingface:master on Aug 14, 2020
sgugger added a commit that referenced this pull request Aug 14, 2020
* Generation doc

* MBartForConditionalGeneration (#6441)

* add MBartForConditionalGeneration

* style

* rebase and fixes

* add mbart test in TEST_FILES_WITH_NO_COMMON_TESTS

* fix docs

* don't ignore mbart

* doc

* fix mbart fairseq link

* put mbart before bart

* apply doc suggestions

* Use hash to clean the test dirs (#6475)

* Use hash to clean the test dirs

* Use hash to clean the test dirs

* Use hash to clean the test dirs

* fix

* [EncoderDecoder] Add Cross Attention for GPT2 (#6415)

* add cross attention layers for gpt2

* make gpt2 cross attention work

* finish bert2gpt2

* add explicit comments

* remove attention mask since not yet supported

* revert attn mask in pipeline

* Update src/transformers/modeling_gpt2.py

Co-authored-by: Sylvain Gugger <[email protected]>

* Update src/transformers/modeling_encoder_decoder.py

Co-authored-by: Sylvain Gugger <[email protected]>

Co-authored-by: Sylvain Gugger <[email protected]>

* Sort unique_no_split_tokens to make it deterministic (#6461)

* change unique_no_split_tokens's type to set

* use sorted list instead of set

* style

* Import accuracy_score (#6480)

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <[email protected]>

* Address comments

* Styling

* Generation doc

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <[email protected]>

* Address comments

* Styling

Co-authored-by: Suraj Patil <[email protected]>
Co-authored-by: Kevin Canwen Xu <[email protected]>
Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Quentin Lhoest <[email protected]>
Co-authored-by: gijswijnholds <[email protected]>
Co-authored-by: Lysandre Debut <[email protected]>
@sshleifer mentioned this pull request on Aug 17, 2020