An Engine for Querying Data with LLMs
The semantic capabilities of language models (LMs) have the potential to enable rich analytics and reasoning over vast knowledge corpora. Unfortunately, existing systems lack high-level abstractions to perform semantic queries at scale. We introduce semantic operators, a declarative programming interface that extends the relational model with composable AI-based operations for semantic queries over datasets (e.g., sorting or aggregating records using natural language criteria). Each operator can be implemented and optimized in multiple ways, opening a rich space of execution plans, much as relational operators do. We implement our operators and several optimizations for them in LOTUS (LLMs Over Tables of Unstructured and Structured Data), an open-source query engine with a Pandas-like API.
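To make the idea of a semantic operator with more than one execution plan concrete, here is a minimal sketch in plain Python. It is not the LOTUS API: the names `sem_filter_naive`, `sem_filter_cascade`, `toy_oracle`, and `toy_proxy` are hypothetical, the "model" is a keyword-matching stand-in rather than a real LM call, and the cascade shown is just one plausible optimization chosen for illustration, not necessarily one that LOTUS implements.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Oracle = Callable[[str, Record], bool]   # stand-in for an expensive LM judgment
Proxy = Callable[[str, Record], float]   # stand-in for a cheap scoring model


def toy_oracle(criterion: str, record: Record) -> bool:
    """Hypothetical 'LM': true if any criterion word occurs in the abstract."""
    text = record["abstract"].lower()
    return any(word in text for word in criterion.lower().split())


def toy_proxy(criterion: str, record: Record) -> float:
    """Hypothetical cheap scorer: fraction of criterion words found in the abstract."""
    text = record["abstract"].lower()
    words = criterion.lower().split()
    return sum(word in text for word in words) / len(words)


def sem_filter_naive(records: List[Record], criterion: str,
                     oracle: Oracle) -> Tuple[List[Record], int]:
    """Plan 1: ask the oracle about every record."""
    kept = [r for r in records if oracle(criterion, r)]
    return kept, len(records)


def sem_filter_cascade(records: List[Record], criterion: str, proxy: Proxy,
                       oracle: Oracle, low: float,
                       high: float) -> Tuple[List[Record], int]:
    """Plan 2: trust the proxy when it is confident, else fall back to the oracle."""
    kept: List[Record] = []
    oracle_calls = 0
    for r in records:
        score = proxy(criterion, r)
        if score >= high:        # confidently relevant: keep without an oracle call
            kept.append(r)
        elif score > low:        # uncertain: defer to the expensive oracle
            oracle_calls += 1
            if oracle(criterion, r):
                kept.append(r)
        # score <= low: confidently irrelevant, drop without an oracle call
    return kept, oracle_calls


papers = [
    {"title": "A", "abstract": "We study query optimization for relational databases."},
    {"title": "B", "abstract": "A recipe for sourdough bread."},
    {"title": "C", "abstract": "Database indexing with learned models."},
]

naive_kept, naive_calls = sem_filter_naive(papers, "database query", toy_oracle)
fast_kept, fast_calls = sem_filter_cascade(
    papers, "database query", toy_proxy, toy_oracle, low=0.0, high=1.0
)
```

Both plans answer the same declarative request (keep the records that match a natural language criterion), but they differ in how many expensive model calls they make. On this toy data the two plans agree; with a real proxy the thresholds would trade accuracy against cost, which is the kind of choice a query optimizer can make on the user's behalf.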
Contributors
Liana Patel, Siddharth Jha, Carlos Guestrin, Matei Zaharia