GSO

Challenging Software Optimization Tasks for Evaluating SWE-Agents

Developing high-performance software is a complex task that requires specialized expertise. GSO (Global Software Optimization) is a benchmark for evaluating language models' capabilities in developing high-performance software. To construct it, we develop an automated pipeline that generates and executes synthetic end-to-end performance tests, using them to analyze repository commit histories.
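An end-to-end performance test of this kind can be pictured as a timed workload run against a version of a codebase. A minimal sketch, assuming a simple wall-clock harness (the workload and function names are illustrative, not part of the GSO pipeline):

```python
import timeit

def workload():
    # Hypothetical end-to-end workload exercising a library hot path
    data = list(range(100_000))
    return sorted(data, key=lambda x: -x)

def measure(repeats=5):
    # Best-of-N wall-clock timing reduces scheduler and warm-up noise
    return min(timeit.repeat(workload, number=1, repeat=repeats))

baseline = measure()
print(f"workload time: {baseline:.4f}s")
```

Running the same harness at two different commits gives the before/after timings from which a speedup can be computed.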

We identify 102 challenging optimization tasks across 10 codebases, spanning diverse domains and programming languages. In each GSO task, an agent is provided with a codebase and a performance test as a precise specification, and is tasked with improving the code's runtime efficiency. Evaluation measures both the correctness of the model-generated patch and its performance relative to the expert developer commit that serves as the target.
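The correctness-plus-performance criterion can be sketched as a simple check: a patch counts as a success only if it passes the correctness tests and recovers some fraction of the expert commit's speedup. A hypothetical scoring helper (the threshold and function names are illustrative assumptions, not the benchmark's actual metric):

```python
def speedup(before_s: float, after_s: float) -> float:
    # Speedup ratio: how many times faster the patched code runs
    return before_s / after_s

def is_success(correct: bool, model_after_s: float,
               base_s: float, expert_after_s: float,
               threshold: float = 0.95) -> bool:
    # Hypothetical criterion: success requires correctness AND a
    # speedup reaching at least `threshold` of the expert's speedup.
    if not correct:
        return False
    return speedup(base_s, model_after_s) >= threshold * speedup(base_s, expert_after_s)

# Example: baseline 10s, expert patch 2s (5x), model patch 2.5s (4x)
print(is_success(True, 2.5, 10.0, 2.0))  # 4x < 0.95 * 5x = 4.75x -> False
```

Anchoring the metric to the expert commit keeps the target per-task: a 4x speedup may fail on a task where the developer achieved 5x, yet succeed where the developer achieved 2x.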


Contributors

Manish Shetty, Naman Jain, Jinjian Liu, Vijay Kethanaboyina, Koushik Sen, Ion Stoica