Open
Labels: enhancement, good first issue, help wanted, high priority
Description
Motivation
We have already implemented initial support for EAGLE speculative decoding with the overlap scheduler. This issue is the roadmap for further feature support and optimizations. The initial skeleton code is in PR #11398.
The design illustration is here
Note
The argument `--enable-beta-spec` has been deprecated; please use `export SGLANG_ENABLE_SPEC_V2=1` to enable this feature.
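As a minimal sketch of how the environment variable might be set when launching a server from a Python script: the helper name `build_spec_v2_launch` and the placeholder model paths are hypothetical, and the `--model-path` / `--speculative-*` flags are assumptions that should be checked against the server arguments of the installed SGLang version. Nothing is executed; the helper only assembles the command and environment.

```python
import os
import sys


def build_spec_v2_launch(model_path, draft_model_path, base_env=None):
    """Return (argv, env) for a server launch with spec v2 enabled.

    Hypothetical helper for illustration; it does not start any process.
    """
    # Copy the environment so the caller's environment is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    # This variable replaces the deprecated --enable-beta-spec argument.
    env["SGLANG_ENABLE_SPEC_V2"] = "1"
    argv = [
        sys.executable, "-m", "sglang.launch_server",
        "--model-path", model_path,                          # assumed flag
        "--speculative-algorithm", "EAGLE",                  # assumed flag
        "--speculative-draft-model-path", draft_model_path,  # assumed flag
    ]
    return argv, env


if __name__ == "__main__":
    argv, env = build_spec_v2_launch("<target-model>", "<draft-model>")
    print("SGLANG_ENABLE_SPEC_V2=" + env["SGLANG_ENABLE_SPEC_V2"], " ".join(argv))
```

The returned pair can be passed to `subprocess.Popen(argv, env=env)` once the flags have been verified.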
page size & topk support
- Support page size > 1 @cicirori @hnyls2002 ([overlap-spec] support page size > 1 #11772)
- Support topk > 1 @vincentzed ([eagle overlap spec] wip impl top k > 1 in overlap eagle worker(v2) #11839)
- Support topk > 1 + page size > 1 @vincentzed
memory allocation
- over-allocation optimization @hnyls2002
- over-allocation with page size > 1 + topk > 1
Attention backend support
- Remove or make `verify_done.synchronize()` an option @hnyls2002
- Different attention backend support @Fridge003 @Qiaolin-Yu
sampling
- [Feature] Fully Overlap with spec v2 + Constrained Decoding #13019
- penalty support
- logprob support
speculative methods
- new speculative model worker interface (Abstraction for spec worker and code cleanup #11643)
- standalone speculative support @Qiaolin-Yu
- ngram speculative support @a4zhangfei
- Top `SpecTpWorker` for all speculative decoding backends @hnyls2002
- Make `SpecTpWorker` compatible with all `TpModelWorker` features.
- specialize for high throughput case (num_step=1, topk=1, num_verify_forward_pass_tokens=2) @yukavio
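The parameters of the high-throughput case can be read as a single-chain draft: with topk=1 each draft step proposes one token, so one step gives one draft token, and the verify pass covers that token plus the last accepted one. A minimal sketch of that arithmetic, assuming the chain relationship `num_steps + 1` holds for topk=1 (the function name is hypothetical, and the topk > 1 tree case is deliberately not modeled):

```python
def chain_verify_tokens(num_steps: int, topk: int = 1) -> int:
    """Tokens per verify forward pass for a topk=1 draft chain.

    Assumption for illustration: the verify pass sees the last accepted
    token plus one draft token per draft step.
    """
    if topk != 1:
        raise ValueError("this sketch only covers the topk=1 chain case")
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    return num_steps + 1


# The high-throughput configuration from the list: num_step=1, topk=1.
print(chain_verify_tokens(num_steps=1, topk=1))
```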
DP attention support
- Support idle batch @iforgetmyname
- cover testcases with dp-attention + overlap + spec @iforgetmyname
EP support
PD disaggregation
- Event loop adjust in Prefill / Decode worker @shaharmor98
- Cover testcases with PD-Disagg + overlap + spec @ShangmingCai
LoRA Support
Aggressive Optimizations
- Enable a separate `plan_stream
Related resources
No response