Speaker: Batya Kenig
Location: Soda 380
Date: October 4, 2023
Time: 11 AM – 12 PM PT
Title: “*Decomposing Data with Approximate Acyclic Schemas*”
Abstract: Acyclic schemas have numerous applications in databases and in machine learning, such as improved design, more efficient storage, and increased performance for queries and machine learning algorithms. In this work, we address the problem of automatically fitting an acyclic schema to a universal relation. An acyclic schema is lossless with respect to a universal relation if joining the projections of the universal relation onto the relations of the schema yields the original universal relation. An intuitive and standard measure of the loss entailed by an acyclic schema is the number of redundant tuples generated by its associated join, that is, tuples in the join result that do not appear in the original relation. We demonstrate how to characterize this loss using an information-theoretic measure, and how this measure can help discover numerous acyclic schemas that accurately fit the data.
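The tuple-counting notion of loss described in the abstract can be illustrated with a small sketch. This is not the speaker's method or code: the helper names (`project`, `natural_join`, `redundant_tuples`), the toy relation over attributes `A`, `B`, `C`, and the schema `{A,B}, {B,C}` are all assumptions chosen for illustration, and the information-theoretic measure mentioned in the abstract is not shown.

```python
from itertools import product

def project(rows, attrs):
    """Project dict-shaped rows onto attrs, dropping duplicate tuples."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}

def natural_join(left, right):
    """Natural join of two sets of tuples made of (attribute, value) pairs."""
    out = set()
    for l, r in product(left, right):
        ld, rd = dict(l), dict(r)
        # Keep the pair only if it agrees on every shared attribute.
        if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
            out.add(tuple(sorted({**ld, **rd}.items())))
    return out

def redundant_tuples(rows, schema):
    """Tuples produced by joining the projections that are not in rows."""
    original = {tuple(sorted(r.items())) for r in rows}
    joined = None
    for attrs in schema:
        p = project(rows, attrs)
        joined = p if joined is None else natural_join(joined, p)
    joined = {tuple(sorted(t)) for t in joined}
    return joined - original

# Hypothetical schema with two relations sharing attribute B.
schema = [("A", "B"), ("B", "C")]

# B does not separate A from C here, so the join creates extra tuples.
lossy = [
    {"A": 1, "B": "x", "C": "p"},
    {"A": 2, "B": "x", "C": "q"},
]
# Each B value pairs with a single (A, C) combination: the schema is lossless.
lossless = [
    {"A": 1, "B": "x", "C": "p"},
    {"A": 2, "B": "y", "C": "q"},
]

print(len(redundant_tuples(lossy, schema)))     # 2
print(len(redundant_tuples(lossless, schema)))  # 0
```

In the first toy relation, joining the two projections yields four tuples, two of which were never in the original data, so the loss under this measure is 2. In the second, the join reproduces the relation exactly and the loss is 0.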