
Episodes
CUGA Agent: From Benchmarks to Business Impact of IBM's Generalist Agent | 2/11/2026 | 23:04
TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture | 11/24/2025 | 23:44
Meta AI Researcher Explains ARE and Gaia2: Scaling Up Agent Environments and Evaluations | 11/10/2025 | 22:34
Georgia Tech's Santosh Vempala Explains Why Language Models Hallucinate, His Research With OpenAI | 10/14/2025 | 31:24
Atropos Health’s Arjun Mukerji, PhD, Explains RWESummary: A Framework and Test for Choosing LLMs to Summarize Real-World Evidence (RWE) Studies | 9/22/2025 | 26:22
Stan Miasnikov, Distinguished Engineer, AI/ML Architecture, Consumer Experience at Verizon Walks Us Through His New Paper | 9/6/2025 | 48:11
Small Language Models are the Future of Agentic AI | 9/5/2025 | 31:15
Watermarking for LLMs and Image Models | 7/30/2025 | 42:56
Self-Adapting Language Models: Paper Authors Discuss Implications | 7/8/2025 | 31:26
The Illusion of Thinking: What the Apple AI Paper Says About LLM Reasoning | 6/20/2025 | 30:35
Accurate KV Cache Quantization with Outlier Tokens Tracing | 6/4/2025 | 25:11
Scalable Chain of Thoughts via Elastic Reasoning | 5/16/2025 | 28:54
Sleep-time Compute: Beyond Inference Scaling at Test-time | 5/2/2025 | 30:24
LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection | 4/18/2025 | 27:19
AI Benchmark Deep Dive: Gemini 2.5 and Humanity's Last Exam | 4/4/2025 | 26:11
Model Context Protocol (MCP) | 3/25/2025 | 15:03
AI Roundup: DeepSeek’s Big Moves, Claude 3.7, and the Latest Breakthroughs | 2/28/2025 | 30:23
How DeepSeek is Pushing the Boundaries of AI Development | 2/21/2025 | 29:54
Multiagent Finetuning: A Conversation with Researcher Yilun Du | 2/4/2025 | 30:03
Training Large Language Models to Reason in Continuous Latent Space | 1/14/2025 | 24:58