Allow converting HF Falcon models with only one shard in memory at a time #1

Merged
jploski merged 1 commit into jploski:falcon40b from KerfuffleV2:feat-improve-falcon-convert-hf
Jun 14, 2023


@KerfuffleV2

This changes the conversion script so that it loads only one shard of a multi-part model into memory at a time. From what I have heard, Transformers never splits a single tensor across shards, so this should be safe.

Without this change, I wasn't able to convert the real 40B Falcon model with 64GB of RAM. I verified that the mini-Shakespeare model still converts correctly, and I get reasonable output from the quantized version.
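
For readers who want the idea in code, here is a minimal sketch of shard-at-a-time loading. It is not the code from this PR. It assumes the checkpoint directory contains the usual Hugging Face index file (`pytorch_model.bin.index.json`) whose `weight_map` maps each tensor name to the shard file that stores it. The names `shards_from_index`, `iter_tensors_one_shard_at_a_time`, and the `load_shard` callback are hypothetical and exist only for illustration.

```python
import gc
import json
from collections import defaultdict
from pathlib import Path
from typing import Any, Callable, Dict, Iterator, List, Tuple


def shards_from_index(index_path: Path) -> Dict[str, List[str]]:
    """Group tensor names by the shard file that stores them."""
    with open(index_path, "r", encoding="utf-8") as f:
        weight_map = json.load(f)["weight_map"]
    shards: Dict[str, List[str]] = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    return dict(shards)


def iter_tensors_one_shard_at_a_time(
    model_dir: Path,
    load_shard: Callable[[Path], Dict[str, Any]],  # hypothetical loader callback
    index_name: str = "pytorch_model.bin.index.json",
) -> Iterator[Tuple[str, Any]]:
    """Yield (name, tensor) pairs while holding only one shard at a time.

    This relies on the assumption stated in the PR description: a tensor
    is never split across shards, so each shard can be processed alone.
    """
    model_dir = Path(model_dir)
    shards = shards_from_index(model_dir / index_name)
    for shard_file in sorted(shards):
        state = load_shard(model_dir / shard_file)
        for name in shards[shard_file]:
            yield name, state[name]
        # Drop the shard before loading the next one. Memory is only freed
        # if the caller has not kept references to the yielded tensors.
        del state
        gc.collect()
```

With PyTorch checkpoints, `load_shard` could be something like `lambda p: torch.load(p, map_location="cpu")`; that choice is an assumption of this sketch, and the consumer should write each tensor out before asking for the next one so that peak memory stays near the size of a single shard.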

@cmp-nct

cmp-nct commented Jun 14, 2023

Just as a side note, I could convert the 40B model with 64GB of RAM; you just need plenty of fast swap and a bit of patience.
It swaps heavily and randomly for a while, then stabilizes (on Windows at least) and finishes.

In any case, great work to split it up :)

@KerfuffleV2
Author

> you just need plenty of fast swap and a bit of patience.

It's a shame I possess neither of those things!

But yeah, assuming you're willing to throw enough time and swap at the problem there's no memory issue that's insurmountable. :)

@KerfuffleV2 force-pushed the feat-improve-falcon-convert-hf branch from bc4dadb to ac64e94 on June 14, 2023 21:22
@KerfuffleV2 changed the title from "Allow converting Falcon models one part at a time" to "Allow converting HF Falcon models with only one shard in memory at a time" on Jun 14, 2023
@jploski merged commit cc8ac10 into jploski:falcon40b on Jun 14, 2023