A Benchmark for Table-Augmented Generation

Table-Augmented Generation (TAG) is a unified, general-purpose paradigm for answering natural language questions over databases. The TAG model captures a wide range of interactions between the LM and the database that prior methods such as Text2SQL and RAG leave unexplored. We provide the first benchmark for the TAG problem and find that standard methods struggle to answer such queries, confirming the need for further research in this area.
The initial TAG v1 benchmark is built upon the BIRD Text2SQL benchmark. To increase query complexity and challenge the capabilities of LLMs, the queries in the TAG benchmark have been modified to require either world knowledge or semantic reasoning that goes beyond the information explicitly available in the database. We select and modify a subset of match-based, comparison, ranking, and aggregation queries from BIRD: 80 queries in total, 20 of each type. Of these, 40 require parametric knowledge from the model and the other 40 require the model to perform semantic reasoning.
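As a rough illustration, a TAG-style pipeline can be viewed as three stages: synthesize a query from the question, execute it against the database, and generate a natural-language answer over the results. The sketch below stubs out both LM calls with fixed logic and uses a toy in-memory SQLite table; the function names, table, and question are illustrative assumptions, not the benchmark's actual implementation.

```python
import sqlite3

def synthesize_query(question: str) -> str:
    # Stage 1 (query synthesis): an LM would translate the natural
    # language question into SQL here. Stubbed with a fixed query.
    return "SELECT name, rating FROM movies ORDER BY rating DESC LIMIT 1"

def execute_query(conn: sqlite3.Connection, sql: str) -> list:
    # Stage 2 (query execution): run the synthesized SQL on the database.
    return conn.execute(sql).fetchall()

def generate_answer(question: str, rows: list) -> str:
    # Stage 3 (answer generation): an LM would reason over the retrieved
    # rows, possibly adding world knowledge or semantic reasoning that
    # the raw values alone do not provide. Stubbed with a template.
    name, rating = rows[0]
    return f"The top-rated movie is {name} (rating {rating})."

def tag(conn: sqlite3.Connection, question: str) -> str:
    # End-to-end pipeline: synthesis -> execution -> generation.
    return generate_answer(question, execute_query(conn, synthesize_query(question)))

# Toy database for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (name TEXT, rating REAL)")
conn.executemany("INSERT INTO movies VALUES (?, ?)",
                 [("A", 8.1), ("B", 9.2), ("C", 7.5)])

print(tag(conn, "Which movie is the highest rated?"))
```

In a real TAG system, stages 1 and 3 are LM calls, and stage 3 is where the benchmark's knowledge and reasoning queries diverge from plain Text2SQL: the final answer depends on more than the returned rows.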
Contributors
Asim Biswal, Liana Patel, Siddharth Jha, Amog Kamsetty, Shu Liu, Joseph E. Gonzalez, Carlos Guestrin, Matei Zaharia
Publications
CoRR – Text2SQL is Not Enough: Unifying AI and Databases with TAG