Speaker: Ashot Vardanian
Location: Soda 510
Date: May 5, 2023
Time: 12-1pm PDT
Vector Search at Scale: Bottlenecks and Solutions
Modern CLIP-like AI models embed multi-modal unstructured data, such as images and text, into a shared vector space. Vector search indexes retrieve similar objects from such collections in sub-linear time. With the rise of LLMs, embeddings achieve strong results on broad tasks, extending their applicability beyond classical semantic search. These new embeddings are much larger, often reaching 2,048 dimensions, introducing new bottlenecks and making vector search performance even more crucial.
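To make the problem concrete, here is a minimal, hypothetical sketch of the baseline that indexes improve upon: a brute-force cosine-similarity scan in pure Python. All names are illustrative; a real engine would use SIMD kernels and an index structure (e.g. HNSW) to avoid touching every vector.

```python
import math
import random

def cosine_similarity(a, b):
    # Dot product divided by the product of the vectors' magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query, collection, k=3):
    # Linear scan: O(n * d) work per query; vector search indexes
    # exist precisely to bring this down to sub-linear time.
    scored = sorted(
        ((cosine_similarity(query, v), i) for i, v in enumerate(collection)),
        reverse=True,
    )
    return [i for _, i in scored[:k]]

random.seed(0)
d = 2048  # the dimensionality mentioned above
vectors = [[random.gauss(0, 1) for _ in range(d)] for _ in range(100)]
print(brute_force_search(vectors[42], vectors))
```

Even this toy example hints at the cost: every query reads all n * d floats from memory, which is why the scan, not the arithmetic, dominates at scale.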
In this talk, we will define the vector search problem and survey the algorithms that solve it. We will dissect existing state-of-the-art vector search engines, identify their bottlenecks, and demonstrate improvements through our open-source implementation.
Our insight is that vector search is a memory-bound problem, so understanding what the hardware can do for us is the key to performing better. We will discuss the roles of memory layout, the memory allocator, and quantization in vector search systems, and demonstrate ways to improve each. We will show that these insights are general and apply to many domains beyond vector search, as demonstrated by several of the products offered by Unum.
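As one illustration of the quantization theme, here is a minimal, hypothetical sketch of symmetric scalar quantization, which stores each dimension as a signed byte instead of a 32-bit float. The function names and the per-vector scale scheme are assumptions for illustration, not the talk's actual method.

```python
import struct

def quantize_i8(vector):
    # Symmetric scalar quantization: map [-max_abs, +max_abs] to [-127, 127],
    # keeping one float scale per vector for reconstruction.
    max_abs = max(abs(x) for x in vector) or 1.0
    scale = 127.0 / max_abs
    return [round(x * scale) for x in vector], scale

def dequantize_i8(codes, scale):
    return [c / scale for c in codes]

v = [0.12, -0.98, 0.45, 0.77]
codes, scale = quantize_i8(v)
restored = dequantize_i8(codes, scale)

# Each component is recovered to within one quantization step.
assert all(abs(a - b) <= 1.0 / scale for a, b in zip(v, restored))

# Memory footprint: 1 byte per dimension instead of 4 for float32.
print(len(struct.pack(f"{len(codes)}b", *codes)))  # 4 bytes, vs. 16 as float32
```

For a memory-bound workload, the 4x reduction in bytes read per query translates almost directly into throughput, which is why quantization matters even before any change to the index structure.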
Ashot Vardanian is a CS and AI researcher, founder of Unum, and the organizer of Armenia’s C++ and Systems Design groups. He has been investing in cloud companies and analyzing their software since 2014. Today, he is rethinking the overall cloud service architecture with vertical optimizations, aiming for efficiency across Key-Value Stores, Communication Libraries, Vector Search, and Multi-Modal Neural Networks.