Dissertation Talk: Algorithmic Methods For Efficient Deep Learning Inference on the Edge – Eyal Sela

Title: Algorithmic Methods For Efficient Deep Learning Inference on the Edge
Speaker: Eyal Sela
Advisors: Ion Stoica and Joseph Gonzalez

Date: Thursday, May 7th
Time: 10AM-11AM (Pacific Time)

This is a hybrid event held in person and virtually over Zoom.
Location (In-person): Soda 511

Abstract:
Applications such as live video analytics, robotics, and autonomous vehicles require streaming perception to respond rapidly to evolving scenes, while balancing accuracy, latency, and compute cost. In conventional perception, scaling to larger models or higher-resolution inputs can improve accuracy. In streaming settings, however, the added latency can make predictions stale or unusable, and the accuracy increase from more computation can vary substantially across a video.
To better exploit high-compute methods, this talk frames streaming perception as a test-time compute allocation problem: rather than fixing one inference configuration before deployment, systems should decide at runtime when additional computation is likely to improve end-to-end streaming perception quality. The central observation is that the trade-off among accuracy, latency, and cost is context-dependent and can be estimated from the scene dynamics. Building on this observation, I show how streaming perception systems can selectively invoke high-compute methods when their expected accuracy benefit outweighs their latency and cost.
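
As a rough illustration of the decision rule described in the abstract, the following is a minimal sketch and not the speaker's method. It assumes that the accuracy benefit, added latency, and added cost have already been estimated for the current scene, and that they can be compared through a simple weighted sum. The names `Estimate` and `should_use_high_compute`, the weights, and the linear form are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Hypothetical per-scene estimates for invoking a high-compute method."""
    accuracy_gain: float    # expected accuracy improvement
    added_latency_s: float  # expected extra latency, in seconds
    added_cost: float       # expected extra compute cost, arbitrary units


def should_use_high_compute(est: Estimate,
                            latency_weight: float = 1.0,
                            cost_weight: float = 1.0) -> bool:
    """Return True when the expected gain exceeds the weighted penalty.

    The linear weighting is an assumption made for illustration only.
    """
    penalty = latency_weight * est.added_latency_s + cost_weight * est.added_cost
    return est.accuracy_gain > penalty


# A slowly changing scene might tolerate extra latency ...
print(should_use_high_compute(Estimate(0.10, 0.02, 0.03)))
# ... while a fast-moving one might weight latency more heavily.
print(should_use_high_compute(Estimate(0.10, 0.02, 0.03), latency_weight=5.0))
```

In this sketch the context dependence enters through the estimates and weights, which a real system would presumably derive from the scene dynamics at runtime.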