Speaker: Marco Serafini
Location: Soda 510
Date: November 8th, 2024
Time: 12pm-1pm PST
Title: Harnessing Structure with Parallelism: Scalable Systems for GNN Training
Abstract:
In many machine learning domains such as knowledge graphs, social networks, and recommendation systems, training data is interconnected in a graph structure. Graph Neural Networks (GNNs) leverage both data and structure to learn accurate representations for tasks like node classification, link prediction, and question answering, often achieving state-of-the-art performance. This ability to model structural dependencies, however, introduces significant challenges when scaling and parallelizing GNN training.
In this talk, I will discuss systems challenges and solutions for efficient and scalable GNN training. I will start by comparing the two main classes of GNN training systems: full-graph and mini-batch. Our recent comprehensive evaluation shows that mini-batch systems not only converge faster but also achieve similar or, in some cases, superior accuracy compared to full-graph systems across a range of datasets, GNN models, and hardware configurations.
Mini-batch GNN training faces two key bottlenecks: the irregular and expensive sampling operation required to bound the mini-batch size, and redundant computation and data loading across GPUs due to overlapping vertices and edges across micro-batches. I will present two approaches we developed to address these issues: NextDoor, which introduces GPU-based GNN sampling and optimizes it with transit parallelism, and GSplit, which employs split parallelism and avoids redundancy by splitting mini-batches online at each training iteration.
Bio:
Marco Serafini is an assistant professor at the Manning College of Information and Computer Sciences at UMass Amherst. His research focuses on designing data systems that support emerging data analytics and machine learning applications while abstracting the complexity of parallel and distributed computing. He specializes in systems for graph learning, mining, and storage, as well as cloud data management systems, and his work has been applied to big-data platforms such as Apache ZooKeeper and Storm.
Marco has served on the Program Committees of major systems and database conferences, including SOSP, OSDI, EuroSys, ASPLOS, SIGMOD, VLDB, and ICDE. He has been Program Chair of the LADIS and APSys workshops, Track Program Chair for ICDCS, and Associate Editor for SIGMOD.