Dissertation Talk: Building Open Source Inference Serving Systems – Simon Mo

Title: Building Open Source Inference Serving Systems

Speaker: Simon Mo

Advisors: Joseph Gonzalez, Ion Stoica

Date: Wednesday, May 6, 2026

Time: 11:00 AM–12:00 PM Pacific Time

This is a hybrid event held in person and virtually over Google Meet.

In-Person Location: Soda 510

Abstract:

Inference serving has become its own area of systems research. This talk reflects on seven years of open-source work across three layers of the stack: SLO-aware pipeline serving (Clipper, InferLine, Ray Serve), virtual memory for KV caches (vLLM), and GPU kernel multiplexing below the model boundary. Alongside the technical work, it considers how open-source adoption and systems research intertwine, and how artifacts deployed at scale reshape the research questions that follow.