Sky Systems Seminar: Shivaram Venkataraman (UW Madison) – Resource Efficient Large Scale ML: Plan Before You Run

Speaker: Shivaram Venkataraman

Location: Soda 510

Date: December 1, 2023

Time: 11am-12pm PST

Title: Resource Efficient Large Scale ML: Plan Before You Run

Abstract:

As ML on structured data becomes prevalent across enterprises, improving resource efficiency is crucial to lowering costs and energy consumption. Designing systems for learning on structured data is challenging because of the large number of model parameters and the complex data access patterns involved. We identify that current systems are bottlenecked by data movement, which results in poor resource utilization and inefficient training.
In this talk, I will describe our work on developing systems that plan data access ahead of time to yield drastic improvements in resource efficiency. I will first describe Marius, a system for training ML models on billion-edge graphs using a single machine. Marius is designed as an out-of-core, pipelined training system and includes new buffer-aware data orderings that minimize disk accesses. I will then describe BagPipe, a recently developed system that lowers remote data access overheads for distributed training of recommendation models while maintaining synchronous training semantics. Finally, I will discuss how our design approach can also be extended cluster-wide to improve resource utilization across ML training jobs.
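To make the "plan before you run" idea concrete, below is a minimal, illustrative Python sketch, not the actual Marius or BagPipe implementation: a data access order is fixed ahead of time, and a bounded prefetcher overlaps loading with training steps so compute is not stalled on disk or network reads. All names here (plan_order, load_partition, train_step, the buffer depth) are hypothetical placeholders for illustration only.

```python
import queue
import threading
import time

# Toy partitions standing in for on-disk graph shards or remote embedding data.
PARTITIONS = [f"partition_{i}" for i in range(8)]

def plan_order(partitions):
    """Placeholder for an ahead-of-time access plan.
    The systems in the talk compute much smarter, buffer-aware orderings;
    here we simply keep the given order."""
    return list(partitions)

def load_partition(name):
    """Simulate a slow disk or remote read."""
    time.sleep(0.1)
    return f"data({name})"

def prefetcher(plan, out_q):
    """Load partitions in the planned order, staying only a bounded
    number of partitions ahead of the training loop."""
    for name in plan:
        out_q.put(load_partition(name))   # blocks when the buffer is full
    out_q.put(None)                       # sentinel: no more partitions

def train_step(batch):
    """Simulate one training step on an in-memory partition."""
    time.sleep(0.1)

if __name__ == "__main__":
    plan = plan_order(PARTITIONS)
    buf = queue.Queue(maxsize=2)          # bounded in-memory buffer
    threading.Thread(target=prefetcher, args=(plan, buf), daemon=True).start()

    start = time.time()
    while (batch := buf.get()) is not None:
        train_step(batch)                 # compute overlaps with the next read
    print(f"done in {time.time() - start:.2f}s")
```

Because the access order is known in advance, the loader can stay ahead of the training loop, which is the intuition behind the resource-efficiency gains discussed in the talk.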

Bio: Shivaram Venkataraman is an Assistant Professor in the Computer Science Department at the University of Wisconsin-Madison. His research interests are in designing systems and algorithms for large-scale data analysis and machine learning. Previously, he completed his PhD at UC Berkeley, where he was advised by Ion Stoica and Mike Franklin. His work has been recognized with an NSF CAREER award, a SIGMOD Systems Award, and a SACM Student Choice Professor of the Year award.