Dissertation Talk: An Extensible Architecture for Distributed Heterogeneous Processing – Frank Sifei Luan

Title: An Extensible Architecture for Distributed Heterogeneous Processing
Speaker: Frank Sifei Luan
Advisor: Ion Stoica

Date: Friday, December 6th, 2024
Time: 9:00 AM – 10:00 AM

Location: Soda Hall, 465H

Abstract: 
The exponential growth in AI compute demand has far outpaced advances in single-node processing power. This widening gap has made distributed heterogeneous processing essential for modern AI applications.

In this talk, I will present an extensible system architecture for distributed heterogeneous processing. First, I will discuss the streaming batch model, which enables efficient heterogeneous execution and dynamic adaptation to varying workloads under heavy memory pressure. The model is implemented in the Ray Data library, which has been used for Stable Diffusion pre-training. Second, I will introduce Exoshuffle, a distributed shuffle library that provides flexible control over shuffle execution without sacrificing performance, demonstrating that complex data operations can be implemented efficiently as application libraries rather than in purpose-built systems. Exoshuffle was used to set a new world record for the most cost-efficient sorting of data on a public cloud.
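
To make the streaming batch model concrete, below is a minimal, hypothetical Ray Data pipeline of the kind this execution model targets: a CPU preprocessing operator feeding a GPU inference operator, with batches streamed between the two so intermediate data stays bounded. The synthetic dataset, toy model, and parameter values are placeholders for illustration, not the pipeline discussed in the talk.

```python
import numpy as np
import ray
import torch

# Synthetic "images": 1,000 rows of 3x64x64 float tensors (placeholder data).
ds = ray.data.from_items(
    [{"image": np.random.rand(3, 64, 64).astype("float32")} for _ in range(1000)]
)

def preprocess(batch):
    # CPU stage: toy per-batch normalization, runs as many parallel tasks.
    imgs = np.asarray(batch["image"], dtype="float32")
    batch["image"] = (imgs - imgs.mean()) / (imgs.std() + 1e-6)
    return batch

class Embedder:
    """GPU stage: one toy model replica per actor (for illustration only)."""

    def __init__(self):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = torch.nn.Sequential(
            torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 128)
        ).to(self.device)

    def __call__(self, batch):
        x = torch.as_tensor(np.asarray(batch["image"])).to(self.device)
        with torch.no_grad():
            batch["embedding"] = self.model(x).cpu().numpy()
        return batch

pipeline = (
    ds.map_batches(preprocess, batch_size=64)  # CPU operator
    .map_batches(
        Embedder,
        batch_size=64,
        concurrency=2,  # pool of 2 actors
        num_gpus=1,     # one GPU per model replica (assumes a GPU cluster)
    )
)

# Execution is streaming: batches flow through the CPU and GPU operators
# concurrently instead of materializing the whole dataset between stages.
for batch in pipeline.iter_batches(batch_size=64):
    pass  # a training loop or writer would consume batches here
```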
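
To illustrate the "shuffle as an application library" argument behind Exoshuffle, here is a toy map/reduce-style shuffle written with plain Ray tasks and the shared object store. This is only a sketch of the idea under simple assumptions, not Exoshuffle's implementation; the partitioning scheme and input data are made up for the example.

```python
import ray

ray.init(ignore_reinit_error=True)

NUM_REDUCERS = 4

@ray.remote(num_returns=NUM_REDUCERS)
def map_task(block, num_reducers):
    # Partition one input block by key hash; Ray keeps each output
    # partition in the distributed object store.
    outputs = [[] for _ in range(num_reducers)]
    for key, value in block:
        outputs[hash(key) % num_reducers].append((key, value))
    return tuple(outputs)

@ray.remote
def reduce_task(*partitions):
    # Merge every partition destined for this reducer and sort by key.
    return sorted(kv for part in partitions for kv in part)

# Toy input: 4 blocks of (key, value) pairs (placeholder data).
blocks = [
    [(i * 13 % 97, i) for i in range(b * 25, (b + 1) * 25)] for b in range(4)
]

# Map stage: each task returns one object per reducer.
map_outputs = [map_task.remote(block, NUM_REDUCERS) for block in blocks]

# Reduce stage: reducer j pulls the j-th output of every map task.
results = ray.get(
    [
        reduce_task.remote(*[outs[j] for outs in map_outputs])
        for j in range(NUM_REDUCERS)
    ]
)
```

Because the map and reduce stages are ordinary tasks scheduled over a generic object store, the library can change partitioning or merging strategies without modifying the underlying system, which is the flexibility the talk refers to.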