MLServe

Scaling Machine Learning Workloads on GPU Clusters

Configured an NVIDIA GPU cluster with CUDA, cuDNN, and jaxlib.
Used Alpa on top of the Ray framework to scale inference workloads across GPUs, combining model parallelism with statistical multiplexing.
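
To illustrate why statistical multiplexing helps with bursty inference traffic, here is a minimal, self-contained sketch in plain Python. It is not Alpa's or Ray's API and is not taken from this repository. The `simulate` helper, the synthetic arrival times, and the fixed `SERVICE_TIME` are all hypothetical. It also assumes that either GPU can serve either model (the kind of placement model parallelism makes possible) and ignores any overhead from splitting a model across devices.

```python
import heapq


def simulate(arrivals, num_gpus, service_time):
    """Serve requests first-come-first-served on `num_gpus` identical GPUs.

    Returns the mean latency (queueing delay plus service time) per request.
    """
    # Min-heap holding the time at which each GPU next becomes free.
    free_at = [0.0] * num_gpus
    heapq.heapify(free_at)
    total_latency = 0.0
    for arrival in sorted(arrivals):
        gpu_free = heapq.heappop(free_at)   # earliest available GPU
        start = max(arrival, gpu_free)      # wait if that GPU is still busy
        finish = start + service_time
        heapq.heappush(free_at, finish)
        total_latency += finish - arrival
    return total_latency / len(arrivals)


# Hypothetical bursty traffic: model A bursts at t=0, model B at t=10.
burst_a = [0.0, 0.0, 0.0, 0.0]
burst_b = [10.0, 10.0, 10.0, 10.0]
SERVICE_TIME = 1.0  # assumed identical for both models

# Dedicated layout: each model is pinned to its own GPU.
n_total = len(burst_a) + len(burst_b)
dedicated = (
    simulate(burst_a, 1, SERVICE_TIME) * len(burst_a)
    + simulate(burst_b, 1, SERVICE_TIME) * len(burst_b)
) / n_total

# Multiplexed layout: both GPUs form one pool that serves both models.
multiplexed = simulate(burst_a + burst_b, 2, SERVICE_TIME)

print(f"dedicated mean latency:   {dedicated}")
print(f"multiplexed mean latency: {multiplexed}")
```

With these synthetic bursts, the dedicated layout averages 2.5 time units per request and the pooled layout 1.5, because each burst can spill onto the GPU that would otherwise sit idle. Real gains depend on traffic patterns, memory limits, and parallelism overhead.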

