- 💼 LLM Inference Optimization Engineer at Baidu, focusing on distributed serving and quantization.
- 🛠 Stack: PyTorch, CUDA/CUTLASS, nsys/ncu
- 🧑‍💻 Open-source contributor to SGLang and vLLM, working on PD disaggregation and quantization.
- 📖 Selected posts: DeepSeek FP8 Block-wise Quantization Explained
- 📷 Amateur photographer. My work can be found at https://photo.leoneo.top
- 🤝 Always happy to collaborate on LLM infra.
- 📫 Contact: hongbosherlock@gmail.com | Homepage
Pinned repositories:
- sgl-project/sglang: SGLang is a high-performance serving framework for large language models and multimodal models.
- vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs.
- Infrasys-AI/AISystem: AISystem covers the full AI system stack, including AI chips, AI compilers, and AI inference and training frameworks.



