BenchJack

AI Agent Benchmark Hackability Scanner

BenchJack is a hackability scanner for AI agent benchmarks. It runs a multi-phase audit pipeline that combines static analysis tools with AI-powered deep inspection via Claude Code or Codex, and it streams results to a live web dashboard as they arrive.

Point it at any benchmark repo. BenchJack will tell you whether an agent can cheat.


Contributors

Hao Wang

Publications