- Sun Yat-sen University
- China
- https://kevincheung2259.github.io/
Pinned
-
vllm (forked from vllm-project/vllm)
A high-throughput and memory-efficient inference and serving engine for LLMs
Python 1
-
LMCache (forked from LMCache/LMCache)
Supercharge Your LLM with the Fastest KV Cache Layer
Python
-
production-stack (forked from FlowGPT/production-stack)
vLLM's reference system for K8S-native cluster-wide deployment with community-driven performance optimization
Python
-
mms (forked from alpa-projects/mms)
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23); AdaptServe: Auto-Scalable DL Serving with Dynamic Model Parallelism (HPSC 25)
Jupyter Notebook
-
kvcache-hit-rate-calculator
Computing the theoretical KVCache hit rate in the LLM serving system.
Python 1
-
llm-inference-benchmarking (forked from FlowGPT/llm-inference-benchmarking)