Title: Understanding Models When Everything Is a Vibe
Speaker: Lisa Dunlap
Advisors: Joseph Gonzalez and Trevor Darrell (and unofficially Jacob Steinhardt)
Date: Wednesday, April 15th
Time: 12:00PM-1:00PM (Pacific Time)
This is a hybrid event held in person and virtually over Zoom.
Location (In-person): Berkeley Way West, 8th Floor multi-purpose area
Abstract: I suppose all good things must come to an end….
Benchmarks have driven remarkable progress in AI, but as generative models get deployed for open-ended tasks, the hard problem has shifted from capability to understanding: what are our models actually doing, where do they fail, and do the signals we optimize reflect what users want? This talk is a walk through my PhD spent poking at these questions, and the shenanigans I passed off as research along the way. I’ll show how a surprising amount of evaluation reduces to one primitive (compare two piles of model inputs or outputs, see what falls out), share what we found building Chatbot Arena, and confirm that yes, every LLM has its own distinct drunk personality.