Closed
Labels
feature request (New feature or request)
Description
🚀 The feature, motivation and pitch
Recently, we read the paper in which the vLLM team proposed a method called SmartSpec.
Because SmartSpec dynamically adjusts the speculation length inside a production LLM serving system, we believe it is more practical than existing studies on dynamic speculation lengths.
The idea could be applied to the current vLLM speculative decoding path with Batch Expansion enabled, and it might also apply to future versions of vLLM with Batch Expansion disabled.
(I am curious whether the SmartSpec experiments were run on vLLM with Batch Expansion enabled. 🤔)
Will the SmartSpec method be implemented in the main repository in the near future?
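To make the kind of behavior we are asking about concrete, here is a minimal sketch of a dynamic speculation-length controller. This is purely illustrative: it is not SmartSpec's goodput-based scheduler, and the function name, thresholds, and bounds are all made up for the example.

```python
# Hypothetical illustration only (not SmartSpec's actual algorithm):
# adapt the number of speculative tokens per step based on the
# observed draft-token acceptance rate from the previous step.

def adjust_spec_len(current_len, accepted, proposed, min_len=1, max_len=8):
    """Grow the speculation length when most draft tokens are accepted,
    shrink it when most are rejected, otherwise leave it unchanged."""
    rate = accepted / proposed if proposed else 0.0
    if rate > 0.8:
        # Drafts are nearly always accepted: speculate deeper.
        return min(current_len + 1, max_len)
    if rate < 0.4:
        # Drafts are mostly rejected: speculating this far wastes compute.
        return max(current_len - 1, min_len)
    return current_len
```

For example, after a step where all 4 proposed tokens were accepted, `adjust_spec_len(4, accepted=4, proposed=4)` would return 5, while a step with only 1 of 4 accepted would shrink the length to 3. A real scheduler like SmartSpec reasons about batch-wide goodput rather than a single per-sequence acceptance rate.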
Alternatives
No response
Additional context
No response