[State of Evals] LMArena's $1.7B Vision — Anastasios Angelopoulos, LMArena

Latent Space: The AI Engineer Podcast • January 06, 2026 • Solo Episode

Guests

No guests identified for this episode.

Description

We are reupping this episode after LMArena announced their fresh Series A (https://www.theinformation.com/articles/ai-evaluation-startup-lmarena-valued-1-7-billion-new-funding-round?rc=luxwz4), raising $150M at a $1.7B valuation, with $30M in annualized consumption revenue (about $2.5M MRR) after their September evals product launch.

From building LMArena in a Berkeley basement to raising $100M and becoming the de facto leaderboard for frontier AI, Anastasios Angelopoulos returns to Latent Space to recap 2025 at one of the most influential platforms in AI, trusted by millions of users, every major lab, and the entire industry to answer one question: which model is actually best for real-world use cases?

We caught up with Anastasios live at NeurIPS 2025 to dig into the origin story (spoiler: it started as an academic project incubated by Anjney Midha at a16z, who formed an entity and gave grants before they even committed to starting a company); why they decided to spin out instead of staying academic or nonprofit (the only way to scale was to build a company); how they're spending that $100M (inference costs, React migration off Gradio, and hiring world-class talent across ML, product, and go-to-market); the Leaderboard Illusion controversy and why their response demolished the paper's claims (factual errors, misrepresentation of open vs. closed source sampling, and ignoring the transparency of preview testing that the community loves); why platform integrity comes first (the public leaderboard is a charity, not a pay-to-play system: models can't pay to get on, can't pay to get off, and scores reflect millions of real votes); how they're expanding into occupational verticals (medicine, legal, finance, creative marketing) and multimodal arenas (video coming soon); why consumer retention is earned every single day (sign-in and persistent history were the unlock, but users are fickle and can leave at any moment); and his vision for Arena as the central evaluation platform that provides the North Star for the industry: constantly fresh, immune to overfitting, and grounded in millions of real-world conversations from real users.

We discuss:

The $100M raise: use of funds is primarily inference costs (funding free usage for tens of millions of monthly conversations), React migration off Gradio (custom loading icons, better developer hiring, more flexibility), and hiring world-class talent

The scale: 250M+ conversations on the platform, tens of millions per month, 25% of users work in software for a living, and half of users are now logged in

The Leaderboard Illusion controversy: Cohere researchers claimed undisclosed private testing created inequities, but Arena's response demolished the paper's claims by pointing out factual errors (misrepresented open vs. closed source sampling, ignored transparency of preview testing that the community loves)

Why preview testing is loved by the community: secret codenames (Gemini Nano Banana, named after PM Naina's nickname), early access to unreleased models, and the thrill of being first to vote on frontier capabilities

The Nano Banana moment: changed Google's market share overnight, billions of dollars in stock movement, and validated that multimodal models (image generation, video) are economically critical for marketing, design, and AI-for-science

New categories: occupational and expert arenas (medicine, legal, finance, creative marketing), Code Arena, and video arena coming soon

Chapters

00:00:00 Introduction: Anastasios from Arena and the LM Arena Journey
00:01:36 The Anjney Midha Incubation: From Berkeley Basement to Startup
00:02:47 The Decision to Start a Company: Scaling Beyond Academia
00:03:38 The $100M Raise: Use of Funds and Platform Economics
00:05:10 Arena's User Base: 5M+ Users and Diverse Demographics
00:06:02 The Competitive Landscape: Artificial Analysis, AI.xyz, and Arena's Differentiation
00:08:12 Educational Value and Learning from the Community
00:08:41 Technical Migration: From Gradio to React and Platform Evolution
00:10:18 Leaderboard Delusion Paper: Addressing Critiques and Maintaining Integrity
00:12:29 Nano Banana Moment: How Preview Models Create Market Impact
00:13:41 Multimodal AI and Image Generation: From Skepticism to Economic Value
00:15:37 Core Principles: Platform Integrity and the Public Leaderboard as Charity
00:18:29 Future Roadmap: Expert Categories, Multimodal, Video, and Occupational Verticals
00:19:10 API Strategy and Focus: Doing One Thing Well
00:19:51 Community Management and Retention: Sign-In, History, and Daily Value
00:22:21 Partnerships and Agent Evaluation: From Devon to Full-Featured Harnesses
00:21:49 Hiring and Building a High-Performance Team

Audio