Fix convert for newer megatron-lm bert model #14082
Conversation
|
I believe @stas00 has worked on a conversion between Megatron and HF Transformers - Stas, can you confirm you have used it and if so, can you take a look at this PR? |
|
Oh, I completely forgot I had a related PR waiting in Draft mode ;) I've switched #13928 out of Draft. So, yes, I did a lot of work on GPT2/Megatron recently. No, I haven't done any work on BERT/Megatron, so I don't know the nuances well enough to qualify as a reviewer for this one. @yoquankara, from a quick look I'd only suggest adding a proper way of saving the config instead of the manual saving, which misses some important bits, and maybe the tokenizer file addition code as well. For reference on the 2 changes, see the very end of my PR https://github.com/huggingface/transformers/pull/13928/files. You may also need to run the repo's code style tools. |
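For illustration, a minimal sketch of the config-saving suggestion (the values and dump path are hypothetical, not taken from this PR):

```python
# Hypothetical sketch: save the config through the official API instead of
# hand-writing the JSON, which can silently drop fields such as the model type.
from transformers import MegatronBertConfig

config = MegatronBertConfig(
    hidden_size=1024,           # illustrative 345m-style values
    num_hidden_layers=24,
    num_attention_heads=16,
)

# Instead of manually dumping config.to_dict() to a JSON file:
config.save_pretrained("path/to/pytorch_dump_folder")  # writes a complete config.json
```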
|
@stas00 Thank you for your review and the pointer to a proper way of saving the config! Regarding the tokenizer, Nvidia's Megatron BERT models use their own vocab files, while their GPT2 model uses the default GPT2 tokenizer. I've also run the code style checks. I will investigate the tokenizer handling further. |
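For reference, adding the tokenizer files could look roughly like this, assuming Nvidia's own vocab file is available locally (the paths are hypothetical):

```python
# Hypothetical sketch: wrap the Megatron BERT vocab file in a standard
# BertTokenizer and save it next to the converted model.
from transformers import BertTokenizer

tokenizer = BertTokenizer("path/to/megatron_bert_vocab.txt")
tokenizer.save_pretrained("path/to/pytorch_dump_folder")
```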
|
Code style has been fixed. |
|
@LysandreJik @stas00 |
|
I suppose we just want to make sure that the updated script works with both the old official checkpoints and checkpoints produced by the newer Megatron-LM, so we should test for both; see the sketch below. At least that's the validation process I did before proposing the GPT2 PR. |
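A minimal smoke test along those lines might look like this, assuming both checkpoints have already been converted into local folders (the folder names are hypothetical):

```python
# Hypothetical sketch: load each converted checkpoint and run one forward pass.
import torch
from transformers import MegatronBertModel

for folder in ("converted_old_official_345m", "converted_new_megatron_lm"):
    model = MegatronBertModel.from_pretrained(folder)
    input_ids = torch.tensor([[101, 7592, 2088, 102]])  # arbitrary token ids
    with torch.no_grad():
        output = model(input_ids)
    print(folder, output.last_hidden_state.shape)
```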
|
Thank you, totally makes sense. I'll find time to also test the old official model and post the validation result when finished. |
|
@LysandreJik - should we just merge this? Even if there are issues here, it's an improvement over the original version. |
|
@jdemouth, could you comment on this if you have a bit of bandwidth? Otherwise, let's go ahead and merge this next week. |
|
ping |
|
@yoquankara were you able to run tests to validate that it does not break the code for the older models? |
|
I have a feeling @yoquankara has either given up on our process or is busy with work/life. @jdemouth, should we merge it and deal with any potential problems if they arise? |
|
@stas00 - I agree. I think we should merge and we'll fix things if something breaks. |
stas00 left a comment
Thank you for bearing with us, @yoquankara
|
Thank you for taking the time to share your situation, @yoquankara. That is totally understandable - you had no obligation to do anything - we just wanted to know whether you wanted to continue being involved. We hope to see you in another PR in the future. |
|
Getting this error while converting the BERT checkpoint: io.UnsupportedOperation: seek. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead. |
|
You probably don't realize it, but your comment is not actionable, @kaushalshetty, since we have no idea what you did. Please file a proper bug report so that we can reproduce the problem; then it will be possible to act on it and help you. Make sure to include the full traceback in your report. Thank you! |
|
I am so sorry. I understand that. My bad!

Model I am using: Megatron-BERT (megatron-bert-uncased-345m)

The problem arises when using:
The tasks I am working on is:

To reproduce
Steps to reproduce the behavior:

Expected behavior
Expect the Megatron checkpoint to get converted to Hugging Face format. |
|
That's excellent, but in the future please open a new Issue; once a PR is merged or an Issue is closed it's very difficult to track these. I tested your use case and it worked for me with Python 3.8, and it indeed fails with Python 3.6. Note that I'm passing the actual checkpoint file instead of the zip file, in case it wasn't clear from the command line. Alternatively, you can upgrade to a newer Python version, as 3.6 is very old. @LysandreJik, @sgugger - what do we want to do here as a long-term fix? I propose to detect that it's Python 3.6 and refuse to deal with the zipped checkpoint, asserting with a message that asks the user to unzip it first; the same will need to be done for the GPT2 conversion script. There is no problem with py-3.7 and higher. |
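A sketch of the proposed guard, assuming the conversion script already has the checkpoint path in hand (the function name and message wording are illustrative):

```python
import sys
import zipfile

def refuse_zipped_checkpoint_on_py36(checkpoint_path):
    # torch.load needs a seekable stream when reading a member of a zip
    # archive, which only works reliably on Python 3.7+, so fail early on 3.6.
    if sys.version_info < (3, 7) and zipfile.is_zipfile(checkpoint_path):
        raise RuntimeError(
            f"{checkpoint_path} is a zip archive and cannot be loaded directly "
            "on Python 3.6. Please unzip it and pass the inner checkpoint file, "
            "or upgrade to Python 3.7+."
        )
```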
|
We might also start saying Transformers requires Python 3.7 or above since Python 3.6 is at the end of its life cycle. |
|
Oh, cool! Thanks, @sgugger. So should we just change Line 135 in 8481ece to 3.7.0? Then this Issue will get auto-resolved. @LysandreJik - is now a good time? |
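Assuming the referenced Line 135 is the python_requires entry in setup.py, the change would be a one-line bump (heavily trimmed sketch; all other setup() arguments are omitted):

```python
from setuptools import setup

setup(
    name="transformers",
    python_requires=">=3.7.0",  # assumed bump from ">=3.6.0"
)
```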
What does this PR do?
Because both GPT2 and BERT share the same underlying issue of different tensor ordering in newer Megatron-LM checkpoints, modifications similar to those in convert_megatron_gpt2_checkpoint.py are needed to convert newer Megatron-LM BERT models.
I verified that this fix was necessary to fine-tune a Megatron-LM BERT model correctly with the Transformers API.
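For context, the heart of the fix is re-permuting the fused query/key/value parameters, whose layout changed between Megatron-LM checkpoint versions. A sketch of that reordering, mirroring the helper used by the GPT2 converter (reproduced from memory, so treat it as an approximation rather than the exact code):

```python
import torch

def fix_query_key_value_ordering(param, checkpoint_version, num_splits, num_heads, hidden_size):
    # Permute a fused QKV weight or bias into the
    # [num_splits * num_heads * hidden_size, :] layout Transformers expects.
    input_shape = param.size()
    if checkpoint_version == 1.0:
        # Version 1.0 stores [num_heads * hidden_size * num_splits, :].
        saved_shape = (num_heads, hidden_size, num_splits) + input_shape[1:]
        param = param.view(*saved_shape)
        param = param.transpose(0, 2).transpose(1, 2).contiguous()
    elif checkpoint_version >= 2.0:
        # Newer versions store [num_heads * num_splits * hidden_size, :].
        saved_shape = (num_heads, num_splits, hidden_size) + input_shape[1:]
        param = param.view(*saved_shape)
        param = param.transpose(0, 1).contiguous()
    return param.view(*input_shape)
```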
Who can review?
@LysandreJik @jdemouth