vCache

Reliable and Efficient Semantic Prompt Caching

vCache is the first semantic prompt cache that guarantees a user-defined error-rate bound. Semantic caching reduces LLM latency and cost by returning cached model responses for semantically similar prompts, not just exact matches. vCache replaces static similarity thresholds with online-learned, embedding-specific decision boundaries, so no manual threshold tuning is required. This enables reliable reuse of cached responses across any embedding model or workload.
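
To illustrate the core idea, here is a minimal sketch of a semantic cache with per-entry decision thresholds. It is not the vCache API: the class and method names (SemanticCacheSketch, lookup, observe) are illustrative, and the simple threshold nudge is only a stand-in for vCache's statistically grounded, online-learned decision boundaries.

```python
import numpy as np

class SemanticCacheSketch:
    """Toy semantic cache: each cached entry keeps its own similarity
    threshold, adjusted from correctness feedback on cache hits."""

    def __init__(self, init_threshold=0.95, step=0.01):
        self.entries = []                 # list of [embedding, response, threshold]
        self.init_threshold = init_threshold
        self.step = step

    @staticmethod
    def _cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def insert(self, emb, response):
        """Store a new prompt embedding and its LLM response."""
        self.entries.append([np.asarray(emb, dtype=float), response, self.init_threshold])

    def lookup(self, emb):
        """Return (index, response) of the nearest entry if it clears that
        entry's own threshold; otherwise (None, None) -> cache miss."""
        if not self.entries:
            return None, None
        emb = np.asarray(emb, dtype=float)
        sims = [self._cosine(emb, e[0]) for e in self.entries]
        i = int(np.argmax(sims))
        if sims[i] >= self.entries[i][2]:
            return i, self.entries[i][1]
        return None, None

    def observe(self, index, was_correct):
        """Feedback on a hit: tighten the entry's threshold after a wrong
        reuse, relax it slightly after a correct one."""
        self.entries[index][2] += -self.step if was_correct else self.step


# Usage: cache a response, then reuse it for a near-duplicate prompt.
cache = SemanticCacheSketch()
cache.insert([0.1, 0.9, 0.2], "Paris")
idx, resp = cache.lookup([0.11, 0.88, 0.21])   # semantically similar -> hit
if resp is not None:
    cache.observe(idx, was_correct=True)
```

Because each entry learns its own boundary from observed hits and misses, the cache can stay within a target error rate without a single hand-picked global threshold, which is the behavior vCache provides with formal guarantees.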


Contributors

Luis Gaspar Schroeder, Aditya Desai, Alejandro Cuadron, Kyle Chu, Shu Liu, Mark Zhao, Stephan Krusche, Alfons Kemper, Ion Stoica, Matei Zaharia, Joseph E. Gonzalez