Conversation


@mostafaelhoushi mostafaelhoushi commented Oct 28, 2024

I followed this blog and implemented early exit self-speculation here.

@mostafaelhoushi
Author

I ran this for a large number of models, comparing 2-model speculative decoding and self-speculative decoding, and put the results here (note there is a separate sheet for each task):
https://docs.google.com/spreadsheets/d/1YASFEJl5WPmiXbtW-5-PA5nVqtXlZM9YifaAuv49hhI/edit?usp=sharing

Cc @gante

@gante
Owner

gante commented Nov 4, 2024

Perfect, thank you for opening the PR 💛 I'll merge this one as soon as the transformers PR is merged (on main, early_exit doesn't exist as an arg to generate)


  if args.max_gpu_memory is None:  # fails if it doesn't fit in a GPU
-     max_memory = {0: "100GiB", "cpu": "0GiB"}
+     max_memory = {0: "100GiB", "cpu": "50GiB"}
Author

n00b question: why do we need to modify max_memory?
For Llama 2 70B run on 8 A100 GPUs, I had to replace this line with:
max_memory = None
to get it to work.

Owner

haha, for no good reason: long ago, when I wrote this script, my setup crashed if a) the model didn't fully fit on the GPU OR b) I didn't set a limit slightly below the device capacity -- e.g. 20GiB on an RTX 3090 (24GB)

It can and should be replaced.

Author

Haha ... OK. I submitted a commit that keeps max_memory as None if it is not specified by the user.
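A minimal sketch of what that logic might look like (the function name and the 50GiB CPU cap are illustrative, not the script's actual code):

```python
def build_max_memory(max_gpu_memory=None):
    """Pick a max_memory mapping for transformers' device_map="auto".

    If the user gives no limit, return None so accelerate infers
    per-device limits itself -- this is what made Llama 2 70B work
    across 8 A100s. Otherwise, cap GPU 0 at the requested amount and
    allow some CPU offload as a fallback.
    """
    if max_gpu_memory is None:
        # Let accelerate decide; avoids hard-coding a single-GPU limit.
        return None
    return {0: f"{max_gpu_memory}GiB", "cpu": "50GiB"}
```

The returned mapping would then be passed as `max_memory=` to `AutoModelForCausalLM.from_pretrained(..., device_map="auto")`.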

@mostafaelhoushi
Author

After merging huggingface/transformers#34240, I think this PR is ready to merge.
I have also renamed the early_exit argument to assistant_early_exit.
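For context, a toy sketch of the draft-and-verify loop underlying self-speculative decoding (which `assistant_early_exit` enables): the draft model is the same network exited at an early layer, modeled here as plain callables rather than real transformers models, with greedy acceptance.

```python
def speculative_step(draft_next, verify_batch, prefix, k=4):
    """One draft-and-verify step of (self-)speculative decoding.

    draft_next(seq) -> next token from the cheap draft model
    (in self-speculation: the same model, exited at an early layer).
    verify_batch(prefix, drafted) -> the token the full model would emit
    at each drafted position, computed in a single batched forward pass.
    Accepts the longest matching run of drafted tokens, then substitutes
    the full model's token at the first mismatch, so the output matches
    what the full model alone would have produced.
    """
    # 1) Draft k tokens autoregressively with the cheap model.
    drafted, seq = [], list(prefix)
    for _ in range(k):
        t = draft_next(seq)
        drafted.append(t)
        seq.append(t)

    # 2) Verify all k positions at once with the full model.
    target = verify_batch(prefix, drafted)

    # 3) Accept matches; on the first mismatch, take the full model's token.
    accepted = []
    for d, t in zip(drafted, target):
        if d == t:
            accepted.append(d)
        else:
            accepted.append(t)
            break
    return prefix + accepted
```

When the draft and full model agree on all k tokens, the step yields k tokens for roughly the cost of one full forward pass; on a mismatch, correctness is preserved because the full model's token is used from that point on.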

@mostafaelhoushi
Author

Just noticed there are conflicts. Working now on resolving them.

@mostafaelhoushi
Author

I re-ran commands to verify that they are working and fixed some further errors that popped up.

@mostafaelhoushi
Author

Hi @gante. Just a gentle ping in case you would like to merge the PR.

@scatyf3

scatyf3 commented Nov 27, 2025

This PR is helpful: since there is no official Hugging Face version of the LayerSkip benchmark, please merge it.

