Speakers: Jeffrey Pang and Karthik Lakshminarayanan
Location: Soda 510
Date: October 13, 2023
Time: 11am-12pm PDT
Automating a Lakehouse: Opportunities for AI-driven automation in data platforms
Abstract: Recently, the data management landscape introduced the “data lakehouse,” which merges the benefits of data lakes and data warehouses. The goal of a lakehouse is to enable efficient data storage, processing, analytics, and machine learning all within a single platform. In this talk, we describe the Databricks Lakehouse architecture—in particular, we examine some examples of how a data platform can use data emitted by its own systems to train models and optimize itself. Our results demonstrate significant opportunities to optimize the experience of data applications, cloud storage data layouts, and serverless compute using AI, when given sufficient data. The lakehouse architecture is still in its infancy, and we believe these results show there is ample room to make data platforms simpler, faster, and cheaper with AI.
Jeffrey Pang is a Distinguished Engineer at Databricks, where he works on multi-tenant control plane architecture, storage infrastructure, reliable distributed systems, software development lifecycle, and other parts of the Databricks Lakehouse. He previously worked on large-scale network data collection, analysis, and anomaly detection at AT&T Labs-Research. He received his Ph.D. in Computer Science from Carnegie Mellon University, and a BA in Computer Science from UC Berkeley.
Karthik Lakshminarayanan is a VP of Engineering at Databricks, where he leads the Application Platform organization that focuses on building frameworks and systems to enable all Databricks engineering teams to build and deploy large-scale systems safely and swiftly. He previously worked on building large-scale serving and data infrastructure in Google Ads for 10 years, and prior to that at Conviva for 4 years. He received his Ph.D. in Computer Science from UC Berkeley, advised by Prof. Ion Stoica.