Continual Learning Bench

The First Realistic Benchmark for Measuring How AI Systems Can Improve in Online Settings

Continual Learning Bench 1.0 is a benchmark of expert-validated task sequences across multiple real-world domains, where:

  • Tasks are not independent
  • Systems are allowed (and expected) to change during evaluation
  • Performance depends on how the system uses what it has seen before (see the protocol sketch after this list)
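
To make these properties concrete, here is a minimal sketch of the kind of sequential evaluation loop they imply. It is purely illustrative: the names (Task, System, score, evaluate_sequence) are hypothetical, not the benchmark's actual API, and the exact-match grader stands in for whatever domain-specific grading a real task sequence would use.

    # Illustrative sketch of a continual evaluation loop; not the benchmark's API.
    from dataclasses import dataclass
    from typing import Protocol


    @dataclass
    class Task:
        prompt: str     # task description shown to the system
        reference: str  # held-out reference used for grading


    class System(Protocol):
        def solve(self, prompt: str) -> str: ...
        def update(self, prompt: str, answer: str, score: float) -> None: ...


    def score(answer: str, reference: str) -> float:
        # Placeholder grader (exact match); real tasks would use domain graders.
        return float(answer.strip() == reference.strip())


    def evaluate_sequence(system: System, tasks: list[Task]) -> list[float]:
        # Tasks are run in order, and the system is allowed to change between
        # tasks: update() is where any continual learning happens.
        scores = []
        for task in tasks:
            answer = system.solve(task.prompt)
            s = score(answer, task.reference)
            system.update(task.prompt, answer, s)
            scores.append(s)
        return scores

The key point is the update call: unlike static benchmarks, the system evaluated on the last task in a sequence need not be the same system that saw the first.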

We release Continual Learning Bench 1.0 with a set of difficult task sequences from domains including software engineering, data science, and strategic modeling. These tasks leave significant headroom for more sophisticated continual learning methods to improve on in the realistic continual settings we design. In fact, vanilla in-context learning systems, like the baseline sketched below, are among the best we've evaluated!
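
One simple way such a vanilla in-context learning baseline could look under the interface sketched above (again illustrative, not the benchmark's actual baseline): it never updates weights and builds no retrieval index, adapting only by carrying transcripts of earlier tasks in its prompt. Here call_llm is a hypothetical stand-in for any chat-completion API.

    # Illustrative in-context learning baseline; call_llm is a placeholder.
    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model API here")


    class InContextBaseline:
        def __init__(self) -> None:
            self.history: list[str] = []  # transcripts of earlier tasks

        def solve(self, prompt: str) -> str:
            # Prepend everything seen so far to the new task prompt.
            context = "\n\n".join(self.history)
            return call_llm(f"{context}\n\nNew task:\n{prompt}" if context else prompt)

        def update(self, prompt: str, answer: str, score: float) -> None:
            # No weight updates, no retrieval index: just grow the context.
            self.history.append(f"Task:\n{prompt}\nAnswer:\n{answer}\nScore: {score}")

Passing an InContextBaseline to the evaluate_sequence loop sketched earlier yields about the simplest possible continual system: each new task is answered with the full history of prior tasks in context.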


Contributors

Parth Asawa, Chris Glaze, Gabe Orlanski, Benji Xu, Ramya Ramakrishnan, Asim Biswal, Vincent Sunn Chen, Frederic Sala, Matei Zaharia, Joseph E. Gonzalez

Publications