Sky Seminar: Justin Levandoski (Google) – Google BigQuery: from Google Cloud to a multi-cloud lakehouse

Speaker: Justin Levandoski
Location: Soda 430-438, Woz Lounge
Date: February 9, 2023
Time: 12-1pm PST

Title: Google BigQuery: from Google Cloud to a multi-cloud lakehouse

Abstract:
Google BigQuery is a serverless, scalable, and cost-effective cloud data warehouse. Having evolved from internal Google infrastructure (Dremel), BigQuery is unique in a number of dimensions. This talk provides an overview of some of the unique architectural aspects of BigQuery and how it provides a true serverless and multi-tenant warehousing solution to customers. We then cover recent work on extending BigQuery to become a multi-cloud “lakehouse”, including (1) BigQuery Omni, our infrastructure that allows us to ship BigQuery on other clouds; (2) BigLake, extending BigQuery infrastructure to provide first-class support for external query processing engines (e.g., Spark) and open formats on object storage; and (3) BQML and Object Tables, support for processing unstructured data (e.g., images, documents, video) and performing ML inference completely within BigQuery.

Bio:
Justin Levandoski is a Director of Engineering at Google working on BigQuery, where he leads the Lake Analytics and Omni cross-cloud data warehousing infrastructure efforts. Prior to Google, he was a principal engineer at Amazon Web Services (AWS), where he worked on Amazon Aurora, a cloud-native operational database system. Before that, he was a member of the database group at Microsoft Research, where he worked on main-memory databases, database support for new hardware platforms, transaction processing, and cloud computing. His research was commercialized in a number of Microsoft products, including the SQL Server Hekaton main-memory database engine, Azure CosmosDB, and Bing. He was an associate editor for IEEE TKDE and has served on program and organizing committees for top database conferences such as ACM SIGMOD, VLDB, ICDE, CIDR, and HPTS.