Conversation

@ekagra-ranjan (Owner) commented Apr 9, 2025

I was looking into SD metrics in V1 and found that spec_decoding_stats is re-initialized every time we do an engine step, yet we use an observe function which, from the name, is supposed to aggregate over multiple observe calls. However, since it is re-initialized every time, there will always be exactly one observe call and no aggregation.

To enable AL (acceptance length) computation for checking correctness, this PR aggregates the metrics across steps in EngineCoreOutputs.scheduler_stats.
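To make the reported behaviour concrete, here is a minimal, self-contained sketch (an illustrative stand-in class, not the actual vLLM code) of why re-initializing the stats object every step defeats observe()'s aggregation:

class SpecDecodingStats:  # stand-in for illustration only
    def __init__(self):
        self.num_draft_tokens = 0
        self.num_accepted_tokens = 0

    def observe(self, num_draft_tokens, num_accepted_tokens):
        # accumulates only for the lifetime of this object
        self.num_draft_tokens += num_draft_tokens
        self.num_accepted_tokens += num_accepted_tokens

# what happens today: a fresh object per engine step
stats = SpecDecodingStats()   # step 1
stats.observe(4, 2)
stats = SpecDecodingStats()   # step 2: re-init discards step 1's counts
stats.observe(4, 3)
print(stats.num_draft_tokens, stats.num_accepted_tokens)   # 4 3, not 8 5 -- no aggregation across steps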

Comment on lines +109 to +112
num_draft_tokens = scheduler_stats.spec_decoding_stats.num_draft_tokens
num_accepted_tokens = scheduler_stats.spec_decoding_stats.num_accepted_tokens
num_spec_proposal = num_draft_tokens / args.num_spec_tokens
mean_accepted_tokens = 1 + num_accepted_tokens / num_spec_proposal
@ekagra-ranjan (Owner, Author) commented Apr 9, 2025
num_spec_proposal is the number of times the speculative decoding (SD) call was made.

mean_accepted_tokens = (sum of tokens generated over num_spec_proposal calls) / num_spec_proposal
                     = (num_spec_proposal + sum of accepted tokens over num_spec_proposal calls) / num_spec_proposal
                     = 1 + num_accepted_tokens / num_spec_proposal

(each SD proposal always yields one token from the target model in addition to the accepted draft tokens, hence the +1)
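For example, with made-up numbers and assuming args.num_spec_tokens = 3 draft tokens per proposal:

num_spec_tokens = 3         # draft tokens proposed per SD call (example value)
num_draft_tokens = 3000     # aggregated over all engine steps (made-up)
num_accepted_tokens = 1800  # aggregated over all engine steps (made-up)

num_spec_proposal = num_draft_tokens / num_spec_tokens               # 1000 SD calls
mean_accepted_tokens = 1 + num_accepted_tokens / num_spec_proposal   # 1 + 1.8 = 2.8

On average each SD call emits 2.8 tokens: the one token the target model always produces plus 1.8 accepted draft tokens.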

Comment on lines +573 to +574
# spec_decoding_stats: Optional[SpecDecodingStats] = None
spec_decoding_stats = self.spec_decoding_stats
@ekagra-ranjan (Owner, Author)
Cache the spec_decoding_stats so that it keeps a running metric instead of being re-initialized every engine step.
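A minimal sketch of the caching pattern (illustrative names, reusing the SpecDecodingStats stand-in from the sketch above, not the exact vLLM scheduler code): the stats object lives on the scheduler instead of being recreated inside every update, so observe() keeps running totals across steps.

class Scheduler:  # illustrative stand-in, not the actual vLLM scheduler
    def __init__(self):
        # created once and cached across engine steps
        self.spec_decoding_stats = SpecDecodingStats()

    def update_from_output(self, num_draft_tokens, num_accepted_tokens):
        # reuse the cached object instead of re-initializing it each step
        spec_decoding_stats = self.spec_decoding_stats
        spec_decoding_stats.observe(num_draft_tokens, num_accepted_tokens)
        return spec_decoding_stats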

model_dir = "meta-llama/Meta-Llama-3-8B-Instruct"
eagle_dir = "abhigoyal/EAGLE-LLaMA3-Instruct-8B-vllm"
# eagle_dir = "yuhuili/EAGLE-LLaMA3-Instruct-8B"
eagle_dir = "lmsys/sglang-EAGLE-LLaMA3-Instruct-8B"

@markmc commented May 9, 2025

hi @ekagra-ranjan

I was looking into SD metrics in V1 and found that spec_decoding_stats is re-initialized every time we do an engine step, yet we use an observe function which, from the name, is supposed to aggregate over multiple observe calls. However, since it is re-initialized every time, there will always be exactly one observe call and no aggregation.

Hmm, we discussed this on Slack shortly after you submitted this PR

spec_decoding_stats = self.make_spec_decoding_stats(
    spec_decoding_stats,

A new SpecDecodingStats should only be created once per update_from_output() call; we should aggregate across all requests in a single step.

    def make_spec_decoding_stats(
        self,
        spec_decoding_stats: Optional[SpecDecodingStats],
        ...
    ) -> Optional[SpecDecodingStats]:
        ...
        # created lazily: at most one SpecDecodingStats per update_from_output() call
        if spec_decoding_stats is None:
            spec_decoding_stats = SpecDecodingStats()
        ...
        return spec_decoding_stats

Your response, for reference:

Oh, the purpose of SpecDecodingStats is just to aggregate across the requests within a step, and it is re-initialized per step. Then it is working fine.

@ekagra-ranjan (Owner, Author)
Hi @markmc - yup, we are good. I am still using this hacky PR whenever I want to quickly find the AL for my evals, since vllm-project#16367 is still not merged.

github-actions bot commented Aug 8, 2025

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

github-actions bot added the stale label Aug 8, 2025
github-actions bot commented Sep 8, 2025

This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!

github-actions bot closed this Sep 8, 2025