Description
Checklist
- I searched related issues but found no solution.
- The bug persists in the latest version.
- Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
- If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- Please use English. Otherwise, it will be closed.
Describe the bug
Reproduction
prefill:
python -m sglang.launch_server --model-path /work/models/ --port 30000 --trust-remote-code --host 0.0.0.0 --served-model-name xdeepseekv3testbo --disable-radix-cache --max-running-requests 512 --collect-tokens-histogram --chunked-prefill-size 4096 --tp 8 --pp-size 2 --context-length 131072 --mem-fraction-static 0.9 --page-size 64 --disaggregation-ib-device mlx5_bond_0,mlx5_bond_1,mlx5_bond_2,mlx5_bond_3 --enable-metrics --tokenizer-worker-num 8 --disaggregation-mode prefill --nnodes 2 --dist-init-addr deepseek-v32-deploy-prefill-0.deepseek-v32-deploy-prefill.aiservice:20102 --node-rank 0
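With --chunked-prefill-size 4096, the 131000-token benchmark prompt used below is prefilled in ceil(131000 / 4096) = 32 forward passes. A minimal sketch of that arithmetic; the "ceil of prompt length over chunk size" formula is the usual chunked-prefill convention and is assumed here, not read from the sglang source:

```python
import math

def num_prefill_chunks(prompt_tokens: int, chunk_size: int) -> int:
    """Number of forward passes needed to prefill a prompt when at most
    `chunk_size` tokens are processed per pass (assumed chunking rule)."""
    return math.ceil(prompt_tokens / chunk_size)

# Flags above: --chunked-prefill-size 4096; benchmark: --random-input 131000.
print(num_prefill_chunks(131000, 4096))  # → 32
```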
decode:
python -m sglang.launch_server --model-path /work/models/ --port 30000 --trust-remote --host 0.0.0.0 --served-model-name xdeepseekv3testbo --prefill-round-robin-balance --moe-a2a-backend deepep --crash-dump-folder /log --eplb-rebalance-layers-per-chunk 29 --init-expert-location /home/aiges/tuned/attachment_ep_statistics/decode_in500out1500.json --enable-dp-lm-head --page-size 64 --chunked-prefill-size 131072 --enable-dp-attention --tp-size 16 --dp-size 16 --deepep-mode low_latency --context-length 131073 --mem-fraction-static 0.75 --cuda-graph-max-bs 64 --max-running-requests 2048 --ep-num-redundant-experts 32 --moe-dense-tp-size 1 --disaggregation-ib-device mlx5_bond_0,mlx5_bond_1,mlx5_bond_2,mlx5_bond_3 --enable-metrics --tokenizer-worker-num 8 --disaggregation-mode decode --nnodes 2 --dist-init-addr deepseek-v32-deploy-decode-0.deepseek-v32-deploy-decode.aiservice:20102 --node-rank 0
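One thing worth double-checking in the decode flags: with --dp-size 16 and --max-running-requests 2048, each data-parallel rank can hold up to 2048 / 16 = 128 running requests, which exceeds --cuda-graph-max-bs 64. A quick sanity-check sketch; the even split across DP ranks and the fallback to eager execution for batches above the captured graph size are assumptions about typical behavior, not confirmed from the sglang source:

```python
def per_rank_batch(max_running_requests: int, dp_size: int) -> int:
    # Assumes running requests are spread evenly across DP ranks.
    return max_running_requests // dp_size

per_rank = per_rank_batch(2048, 16)
print(per_rank)       # 128
print(per_rank > 64)  # True: batches above --cuda-graph-max-bs 64 would
                      # presumably run outside the captured CUDA graphs
```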
minilb:
nohup python -m sglang_router.launch_router --mini-lb --prefill http://26.5.27.243:30000 --decode http://26.5.27.239:30000 --port 8000 --host 0.0.0.0 --pd-disaggregation &
benchmark:
python3 -m sglang.bench_serving --host 127.0.0.1 --port 8000 --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts 1 --random-input 131000 --random-output 1 --max-concurrency 1 --warmup-requests 2 --backend sglang --dataset-name random --random-range-ratio 1 --tokenizer /work/models --model /work/models
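As a quick check that this benchmark request fits the configured window: the prefill server is launched with --context-length 131072, and a request needs its input plus output tokens to fit. A minimal sketch of that check (the "input + output ≤ context length" admission rule is the common convention and assumed here):

```python
def fits_context(input_len: int, output_len: int, context_length: int) -> bool:
    # A request is admissible only if prompt plus generated tokens
    # fit in the configured context window (assumed admission rule).
    return input_len + output_len <= context_length

# Benchmark flags: --random-input 131000 --random-output 1;
# prefill server: --context-length 131072.
print(fits_context(131000, 1, 131072))  # True, with 71 tokens to spare
```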
On the prefill node, you will then see the error.
Environment
4 nodes, each with 8× NVIDIA H20 96 GB GPUs (2 prefill nodes + 2 decode nodes)
