Speakers: Robert Nishihara, Richard Liaw, and Amjad Almahairi
Location: Soda 510
Date: November 22nd, 2024
Time: 12pm-1pm PST
Title: AI Infrastructure at Anyscale
Abstract:
Anyscale was founded in 2019 to commercialize Ray and to simplify distributed machine learning for organizations around the world. In this talk, we’ll cover the story behind Anyscale and take a deep dive into Ray Data and Ray Compiled Graphs (RCG), two key frameworks we’ve developed to solve emerging AI infrastructure challenges faced by leading companies today.

Ray Data is a distributed data processing engine for multimodal data and AI workloads. We’ll discuss how its support for heterogeneous hardware and streaming execution makes Ray Data uniquely suited to data-intensive training and inference workloads. We’ll also share our experience deploying Ray Data in production with companies running critical AI applications, including Runway, Instacart, and Canva.

Next, we’ll cover Ray Compiled Graphs, a new set of primitives in Ray for heterogeneous training workloads. Compiled Graphs provides optimized communication across accelerators for workloads with complex control flow, enabling efficient training of sophisticated models such as multimodal architectures and large language models. We’ll demonstrate how RCG achieves significant performance improvements over existing solutions, including up to 40% better throughput per dollar for multimodal training compared with traditional approaches.

Throughout the talk, we’ll share our perspective on the future of distributed ML infrastructure and discuss upcoming challenges in scaling AI systems, from batch LLM inference to automated resource selection in cloud environments.
Bio:
Robert Nishihara – Co-creator of Ray and co-founder of Anyscale. Robert completed his PhD in machine learning and distributed systems at UC Berkeley under Michael Jordan, following his mathematics degree from Harvard.
Richard Liaw – Founding engineer at Anyscale. Richard worked on Ray as part of his PhD at UC Berkeley under Ion Stoica and Joseph Gonzalez before joining Anyscale full-time in 2020.
Amjad Almahairi – Research Scientist at Anyscale focusing on distributed training systems. Prior to joining Anyscale, Amjad was at Meta, where he was a member of the Llama 2 team. He holds a PhD in machine learning from the University of Montreal (2018).