Speaker: Abhinav Venigalla
Location: Soda 380 (Note the change in venue)
Date: April 12, 2024
Time: 11am-12pm PDT (Note the change in time)
Title:
Systems and Optimizations for Training MoEs at Scale
Abstract:
DBRX is a new Mixture-of-Experts (MoE) open model trained from scratch in ~3 months by the Databricks Mosaic research team. This talk will cover how DBRX was built using both open-source tools (StreamingDataset, Composer, Megablocks, etc.) and internal Databricks infrastructure. It will also cover the systems and optimization challenges of training MoEs at scale (on 3072 NVIDIA H100 GPUs), how the team arrived at key modeling decisions, and directions for future improvement.
Bio:
Abhinav Venigalla is an NLP Architect at Databricks, where he helps organizations train and deploy their own custom language models. He recently led the development of the open model DBRX. Prior to that, he was the second engineer at MosaicML and a member of the Algorithms team at Cerebras Systems.