
[Bug] Deepseek3.2 PD 128k indexer cache transfer error #15532

@whybeyoung

Description


Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

Prefill node: [screenshot of the indexer cache transfer error; image not reproduced here]

Reproduction

prefill

python -m sglang.launch_server --model-path /work/models/ --port 30000 --trust-remote-code --host 0.0.0.0 --served-model-name xdeepseekv3testbo --disable-radix-cache --max-running-requests 512 --collect-tokens-histogram --chunked-prefill-size 4096 --tp 8 --pp-size 2 --context-length 131072 --mem-fraction-static 0.9 --page-size 64 --disaggregation-ib-device mlx5_bond_0,mlx5_bond_1,mlx5_bond_2,mlx5_bond_3 --enable-metrics --tokenizer-worker-num 8 --disaggregation-mode prefill --nnodes 2 --dist-init-addr deepseek-v32-deploy-prefill-0.deepseek-v32-deploy-prefill.aiservice:20102 --node-rank 0

decode

python -m sglang.launch_server --model-path /work/models/ --port 30000 --trust-remote-code --host 0.0.0.0 --served-model-name xdeepseekv3testbo --prefill-round-robin-balance --moe-a2a-backend deepep --crash-dump-folder /log --eplb-rebalance-layers-per-chunk 29 --init-expert-location /home/aiges/tuned/attachment_ep_statistics/decode_in500out1500.json --enable-dp-lm-head --page-size 64 --chunked-prefill-size 131072 --enable-dp-attention --tp-size 16 --dp-size 16 --deepep-mode low_latency --context-length 131073 --mem-fraction-static 0.75 --cuda-graph-max-bs 64 --max-running-requests 2048 --ep-num-redundant-experts 32 --moe-dense-tp-size 1 --disaggregation-ib-device mlx5_bond_0,mlx5_bond_1,mlx5_bond_2,mlx5_bond_3 --enable-metrics --tokenizer-worker-num 8 --disaggregation-mode decode --nnodes 2 --dist-init-addr deepseek-v32-deploy-decode-0.deepseek-v32-deploy-decode.aiservice:20102 --node-rank 0

minilb:

nohup python -m sglang_router.launch_router --mini-lb --prefill http://26.5.27.243:30000 --decode http://26.5.27.239:30000 --port 8000 --host 0.0.0.0 --pd-disaggregation &

benchmark:

python3 -m sglang.bench_serving --host 127.0.0.1 --port 8000 --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json --num-prompt 1 --random-input 131000 --random-output 1 --max-concurrency 1 --warmup-requests 2 --backend sglang --dataset-name random --random-range-ratio 1 --tokenizer /work/models --model /work/models
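For context, a minimal sketch (my own arithmetic, not from the sglang source) of how close this 131000-token prompt comes to the KV-page budget implied by `--page-size 64` and `--context-length 131072` on the prefill node — the request consumes essentially every available page:

```python
import math

# Values taken from the launch/benchmark commands above
PAGE_SIZE = 64           # --page-size 64
CONTEXT_LENGTH = 131072  # --context-length 131072 (prefill node)
PROMPT_TOKENS = 131000   # --random-input 131000

# Pages needed for the prompt vs. pages the context length allows
pages_needed = math.ceil(PROMPT_TOKENS / PAGE_SIZE)
pages_available = CONTEXT_LENGTH // PAGE_SIZE

print(pages_needed, pages_available)  # 2047 2048
```

So the benchmark leaves only one free 64-token page, which is why the failure shows up specifically at this near-128k input length.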

On the prefill node, you will then see the error shown above.

Environment

4 nodes, each with 8× H20 96G GPUs

CC @ShangmingCai @hnyls2002 @YAMY1234 @Fridge003 @xu-yfei
