Sky Seminar Series: Fatma Ozcan (Google) – ML, LLMs and Data Management


Speaker: Fatma Ozcan
Location: Soda 510
Date: February 23, 2024
Time: 12-1pm PST

Title: ML, LLMs and Data Management

Abstract:
In this talk, we discuss using ML and LLMs for data management problems. In the first part, we review natural language interfaces to data, and how LLMs are fueling new interest and solutions in this space. After reviewing some of the existing issues, we argue that we need semantic data models and better context to improve the accuracy of existing solutions. By enriching our understanding of the underlying data with semantics, we can generate more precise input context and guide the prompts toward better SQL generation. We will also show other applications where a better understanding of data semantics leads to higher accuracy.

In the second part of the talk, we introduce foundational cost models for databases. For learned database tasks, the state of the art is one-off models that need to be trained individually per task and even per database, which causes extremely high training overheads. We argue that a new learning paradigm is needed, one that moves away from one-off models towards more generalizable models that can be used with only minimal overhead for an unseen database on a wide spectrum of tasks. To realize this goal, we build models that learn representations of data and queries, which can then be used to solve various database tasks. We will present our early prototype and initial results, and conclude with a research roadmap with many open challenges.

Bio:
Fatma Ozcan is a Principal Engineer at Systems Research@Google. Before that, she was a Distinguished Research Staff Member and a senior manager at IBM Almaden Research Center. Her current research focuses on ML for databases, and on platforms and infrastructure for large-scale data analysis. Dr. Ozcan received her PhD in computer science from the University of Maryland, College Park, and her BSc in computer engineering from METU, Ankara. She has over 23 years of experience in industrial research, and has delivered core technologies into IBM products. She has been a contributor to various SQL standards, including SQL/XML, SQL/JSON and SQL/PTF. Dr. Ozcan has co-authored several conference papers and patents. She received the VLDB Women in Database Research Award in 2022. She is an ACM Distinguished Member, and the vice chair of ACM SIGMOD.