An Inference-Time Scaling Architecture Framework

Ember is a compositional framework for building and deploying large inference-time scaling architectures and strategies. We call these architectures Networks of Networks, or NONs for short. They can be used to push past the quality or reliability frontier of today’s frontier LLMs, or to reach comparable quality at 1/1000th of the cost or less.
Contributors
Jared Quincy Davis (F, S), Marquita Ellis (I), Diana Arroyo (I), Pravein Govindan Kannan (I), Paul Castro (I), Siddharth Sharma (F, S), Parth Asawa (B), Alan Zhu (B), Connor Chow (B), Jason Lee (B), Jay Adityanag Tipirneni (B), Chad Ferguson (B), Kathleen Ge (B), Kunal Agrawal (B), Rishab Bhatia (B), Rohan Penmatcha (B), Sai Kolasani (B), Théo Jaffrelot Inizan (B), Lingjiao Chen (MS), Omar Khattab (D, MT), Deepak Narayanan (N), Long Fei (F), Aparajit Raghavan (F), Eyal Cidon (F), Jacob Schein (F), Prasanth Somasundar (F), Boris Hanin (F, P), James Zou (S), Alex Dimakis (B), Joey Gonzalez (B), Peter Bailis (G, S), Ion Stoica (A, B, D), Matei Zaharia (D, B)
Affiliations: F = Foundry (MLFoundry), D = Databricks, I = IBM Research, S = Stanford University, B = UC Berkeley, MT = MIT, N = NVIDIA, MS = Microsoft, A = Anyscale, G = Google, P = Princeton