[Feature]: Request for SmartSpec Method Support #5886

@bong-furiosa

Description

🚀 The feature, motivation and pitch

Recently, we read the paper in which the vLLM team proposed a method called SmartSpec.
We believe this work, which dynamically adjusts the speculation length inside a production LLM serving system, is more practical than existing studies on dynamic speculation length.

This idea could be applied to vLLM's current speculative decoding implementation with Batch Expansion enabled, and it might also apply to future versions of vLLM where Batch Expansion is disabled. A rough sketch of how we understand the method is included below.
(I am curious whether the SmartSpec research was conducted on vLLM with Batch Expansion enabled. 🤔)
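
To make the request more concrete, here is a minimal sketch of the core idea as we understand it: estimate goodput (accepted tokens per second) as a function of the proposed speculation length and the current batch, then pick the length that maximizes it. All function names, parameters, and the cost model below are our own illustrative assumptions, not SmartSpec or vLLM APIs.

```python
# Minimal sketch, NOT the actual SmartSpec or vLLM implementation:
# choose the speculation length k that maximizes estimated goodput
# (accepted tokens per second) for the current batch.
# All names and the cost model are illustrative assumptions.

def estimate_goodput(k: int, batch_size: int, acceptance_rate: float,
                     draft_time_per_token: float, target_step_time) -> float:
    """Estimated accepted tokens/sec if every request proposes k draft tokens."""
    if acceptance_rate >= 1.0:
        expected_accepted = k + 1.0  # all draft tokens plus the bonus token accepted
    else:
        # Standard geometric model: expected tokens produced per verification step.
        expected_accepted = (1.0 - acceptance_rate ** (k + 1)) / (1.0 - acceptance_rate)
    # One draft+verify cycle: k sequential draft steps plus one target-model step
    # whose cost grows with the number of tokens to verify (batch_size * (k + 1)).
    cycle_time = k * draft_time_per_token + target_step_time(batch_size * (k + 1))
    return batch_size * expected_accepted / cycle_time


def choose_speculation_length(batch_size: int, acceptance_rate: float,
                              draft_time_per_token: float, target_step_time,
                              max_k: int = 8) -> int:
    """Return the k in [0, max_k] with the highest estimated goodput."""
    return max(
        range(max_k + 1),
        key=lambda k: estimate_goodput(k, batch_size, acceptance_rate,
                                       draft_time_per_token, target_step_time),
    )


if __name__ == "__main__":
    # Toy numbers: 0.3 ms per draft token, target step time modeled as a
    # linear function of verified tokens (as if taken from a profiling pass).
    k = choose_speculation_length(
        batch_size=32,
        acceptance_rate=0.7,
        draft_time_per_token=0.3,
        target_step_time=lambda n_tokens: 5.0 + 0.01 * n_tokens,
    )
    print(f"chosen speculation length: {k}")
```

In a serving engine, something like `choose_speculation_length` could presumably be re-evaluated at every scheduling step using the live batch size and a running estimate of the acceptance rate.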

I wonder whether the SmartSpec method will be implemented in the main repository in the near future.

Alternatives

No response

Additional context

No response
