Title: Building Open Source Inference Serving Systems
Speaker: Simon Mo
Advisors: Joseph Gonzalez, Ion Stoica
Date: Wednesday, May 6, 2026
Time: 11:00 AM–12:00 PM Pacific Time
This is a hybrid event held in person and virtually over Google Meet.
In-Person Location: Soda 510
Abstract:
Inference serving has become its own area of systems research. This talk reflects on seven years of open-source work across three layers of the stack: SLO-aware pipeline serving (Clipper, InferLine, Ray Serve), virtual memory for KV caches (vLLM), and GPU kernel multiplexing below the model boundary. Alongside the technical work, it considers how open-source adoption and systems research intertwine, in particular how artifacts deployed at scale reshape the research questions that follow.