Search Arena
A Crowdsourced In-The-Wild Evaluation Platform for Search-Augmented LLM Systems Based on Human Preference
R2E-Gym
Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
BARE
A Method for Combining Base Language Models and Instruction-Tuned Language Models for Better Synthetic Data Generation
Ember
A Compositional Framework for Building and Deploying Large Inference-Time Scaling Architectures and Strategies
LOTUS
Easily Build Knowledge-Intensive LLM Applications That Reason Over Your Data With LOTUS!
RouteLLM
A Framework for Serving and Evaluating LLM Routers – Save LLM Costs Without Compromising Quality!
Berkeley Function-Calling Leaderboard
Measuring Function-Calling Capabilities of Different LLMs
Skydentity
Let Orchestrators Run Your Workloads on Your Cloud Resources Without Handing Over Your Cloud Credentials and Data
RAFT
“Retrieval-Augmented Fine-Tuning” combines the benefits of Retrieval-Augmented Generation and Fine-Tuning for better domain adaptation
Arena Hard
An Automatic Pipeline to Build High-Quality LLM Benchmarks with High Separability and Agreement with Human Preference from Live Data
Gorilla
Gorilla is an open-source, state-of-the-art LLM that invokes API calls to interact with services!
SkyPilot
SkyPilot is a framework for running LLMs, AI, and batch jobs on any infrastructure, offering maximum cost savings, highest GPU availability, and managed execution.