Dissertation Talk: Lianmin Zheng – Efficient and Scalable Systems for Large Language Models

Title: Efficient and Scalable Systems for Large Language Models
Speaker: Lianmin Zheng
Advisors: Ion Stoica and Joseph E. Gonzalez

Date: Friday, Dec 15, 2023
Time: 10:00 AM – 11:00 AM PST
Location (In-person): Soda 465H. This is a hybrid event held in person and virtually over Zoom.

Abstract: In this talk, I will present our work on full-stack, scalable system support for large language models. First, I will introduce Alpa, an automatic model parallelism system for scalable distributed training. Next, I will cover optimizations for scalable serving, including cache management, high-performance tensor program generation, and handling request bursts. Lastly, I will discuss the scalable evaluation of open-ended chat assistants using Chatbot Arena, a crowd-sourced benchmark platform, and LLM-as-a-Judge, a method for approximating human preferences.