Description
Checklist
- I searched related issues but found no solution.
- The bug persists in the latest version.
- Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
- If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- Please use English. Otherwise, it will be closed.
Describe the bug
Reproduction
prefill:
python -m sglang.launch_server --model-path /work/models/ --port 30000 --trust-remote-code --host 0.0.0.0 --served-model-name xdeepseekv3testbo --disable-radix-cache --max-running-requests 512 --collect-tokens-histogram --chunked-prefill-size 4096 --tp 8 --pp-size 2 --context-length 131072 --mem-fraction-static 0.9 --page-size 64 --disaggregation-ib-device mlx5_bond_0,mlx5_bond_1,mlx5_bond_2,mlx5_bond_3 --enable-metrics --tokenizer-worker-num 8 --disaggregation-mode prefill --nnodes 2 --dist-init-addr deepseek-v32-deploy-prefill-0.deepseek-v32-deploy-prefill.aiservice:20102 --node-rank 0
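With --chunked-prefill-size 4096, the 131000-token benchmark prompt used below is prefilled in ceil(131000 / 4096) = 32 forward passes. A minimal sketch of that arithmetic; the "ceil of prompt length over chunk size" formula is the usual chunked-prefill convention and is assumed here, not read from the sglang source:

```python
import math

def num_prefill_chunks(prompt_tokens: int, chunk_size: int) -> int:
    """Number of forward passes needed to prefill a prompt when at most
    `chunk_size` tokens are processed per pass (assumed chunking rule)."""
    return math.ceil(prompt_tokens / chunk_size)

# Flags above: --chunked-prefill-size 4096; benchmark: --random-input 131000.
print(num_prefill_chunks(131000, 4096))  # → 32
```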
decode:
python -m sglang.launch_server --model-path /work/models/ --port 30000 --trust-remote --host 0.0.0.0 --served-model-name xdeepseekv3testbo --prefill-round-robin-balance --moe-a2a-backend deepep --crash-dump-folder /log --eplb-rebalance-layers-per-chunk 29 --init-expert-location /home/aiges/tuned/attachment_ep_statistics/decode_in500out1500.json --enable-dp-lm-head --page-size 64 --chunked-prefill-size 131072 --enable-dp-attention --tp-size 16 --dp-size 16 --deepep-mode low_latency --context-length 131073 --mem-fraction-static 0.75 --cuda-graph-max-bs 64 --max-running-requests 2048 --ep-num-redundant-experts 32 --moe-dense-tp-size 1 --disaggregation-ib-device mlx5_bond_0,mlx5_bond_1,mlx5_bond_2,mlx5_bond_3 --enable-metrics --tokenizer-worker-num 8 --disaggregation-mode decode --nnodes 2 --dist-init-addr deepseek-v32-deploy-decode-0.deepseek-v32-deploy-decode.aiservice:20102 --node-rank 0
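One thing worth double-checking in the decode flags: with --dp-size 16 and --max-running-requests 2048, each data-parallel rank can hold up to 2048 / 16 = 128 running requests, which exceeds --cuda-graph-max-bs 64. A quick sanity-check sketch; the even split across DP ranks and the fallback to eager execution for batches above the captured graph size are assumptions about typical behavior, not confirmed from the sglang source:

```python
def per_rank_batch(max_running_requests: int, dp_size: int) -> int:
    # Assumes running requests are spread evenly across DP ranks.
    return max_running_requests // dp_size

per_rank = per_rank_batch(2048, 16)
print(per_rank)       # 128
print(per_rank > 64)  # True: batches above --cuda-graph-max-bs 64 would
                      # presumably run outside the captured CUDA graphs
```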
minilb:
nohup python -m sglang_router.launch_router --mini-lb --prefill http://26.5.27.243:30000 --decode http://26.5.27.239:30000 --port 8000 --host 0.0.0.0 --pd-disaggregation &
benchmark:
python3 -m sglang.bench_serving --host 127.0.0.1 --port 8000 --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts 1 --random-input 131000 --random-output 1 --max-concurrency 1 --warmup-requests 2 --backend sglang --dataset-name random --random-range-ratio 1 --tokenizer /work/models --model /work/models
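As a quick check that this benchmark request fits the configured window: the prefill server is launched with --context-length 131072, and a request needs its input plus output tokens to fit. A minimal sketch of that check (the "input + output ≤ context length" admission rule is the common convention and assumed here):

```python
def fits_context(input_len: int, output_len: int, context_length: int) -> bool:
    # A request is admissible only if prompt plus generated tokens
    # fit in the configured context window (assumed admission rule).
    return input_len + output_len <= context_length

# Benchmark flags: --random-input 131000 --random-output 1;
# prefill server: --context-length 131072.
print(fits_context(131000, 1, 131072))  # True, with 71 tokens to spare
```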
On the prefill node, you will then see the error.
Environment
4 nodes, each with 8× NVIDIA H20 96 GB GPUs (2 prefill nodes + 2 decode nodes)
