Database Seminar: Peter Boncz (CWI) – “Towards a new Big Data file format”

Speaker: Peter Boncz
Location: Soda 510
Date: January 18, 2024
Time: 1pm-2pm PST
Title: “Towards a new Big Data file format”

Abstract: Currently popular formats like Parquet and ORC are widely used in databases, data warehousing and data science. So-called “data lakes” now store zettabytes of data in the cloud. While many ideas behind these formats are still valid and are the foundation of their success, one can argue there is room for improvement. Wide-data parallelism is now prevalent in hardware (SIMD CPUs, but also GPUs and TPUs) and sparse, wide and nested tables are becoming ever more popular, driven by ML workloads. In this talk I will describe some ideas and recent results (FastLanes, ALP) that are elements of our agenda to help create a next-generation big data format.

Bio: Peter Boncz holds appointments as a tenured researcher at CWI and professor at VU University Amsterdam. His academic background is in database systems, with the open-source column store MonetDB the outcome of his PhD. He has a track record of bridging the gap between academia and commercial application, founding multiple startups. In 2008 he co-founded Vectorwise around the analytical database system of the same name, which pioneered vectorized query execution and lightweight data compression, both of which have since been adopted broadly in analytical database systems. He is currently on sabbatical at MotherDuck.