Arena turns AI evaluation into $100M business
Arena, the AI leaderboard and evaluation platform, has reached $100M in annualised run-rate revenue after commercialising model evaluation tools.

AI model evaluation is turning from a research utility into a real business.
What happened
Arena, which began as a UC Berkeley research project, has reached $100M in annualised run-rate revenue eight months after launching its commercial service.
Its public leaderboard is built from more than 10M user evaluations; the paid product gives model labs and enterprises deeper analytics on model performance.
Why it matters
As AI models multiply, companies need better ways to compare them. Raw benchmark scores are not enough when teams care about reliability, user preference, cost and performance across real tasks.
Arena’s growth suggests that AI evaluation is becoming an infrastructure layer in its own right, not just a side project for researchers.
The bigger picture
The model market is getting crowded, and buyers need trust signals. Evaluation platforms can become the neutral measurement layer between AI labs, enterprises and developers.
If model choice becomes more dynamic, tools that explain which model performs best for which job could become core enterprise software.
