- 💼 LLM Inference Optimization Engineer at Baidu, focusing on distributed serving and quantization.
- 🛠 Stack: PyTorch, CUDA/CUTLASS, nsys/ncu
- 🧑‍💻 Open-source contributor to SGLang and vLLM, working on PD disaggregation and quantization.
- 📖 Selected posts: DeepSeek FP8 Block-wise Quantization Explained
- 📷 Amateur photographer. My work can be found at https://photo.leoneo.top
- 🤝 Always happy to collaborate on LLM infra.
- 📫 Contact: hongbosherlock@gmail.com | Homepage
Pinned repositories:
- sgl-project/sglang: SGLang is a high-performance serving framework for large language models and multimodal models.
- vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs.
- Infrasys-AI/AISystem: AISystem covers the full AI system stack, including AI chips, AI compilers, and AI inference and training frameworks.



