Deep Papers

Episodes

CUGA Agent: From Benchmarks to Business Impact of IBM's Generalist Agent	2/11/2026	23:04
TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture	11/24/2025	23:44
Meta AI Researcher Explains ARE and Gaia2: Scaling Up Agent Environments and Evaluations	11/10/2025	22:34
Georgia Tech's Santosh Vempala Explains Why Language Models Hallucinate, His Research With OpenAI	10/14/2025	31:24
Atropos Health’s Arjun Mukerji, PhD, Explains RWESummary: A Framework and Test for Choosing LLMs to Summarize Real-World Evidence (RWE) Studies	9/22/2025	26:22
Stan Miasnikov, Distinguished Engineer, AI/ML Architecture, Consumer Experience at Verizon Walks Us Through His New Paper	9/6/2025	48:11
Small Language Models are the Future of Agentic AI	9/5/2025	31:15
Watermarking for LLMs and Image Models	7/30/2025	42:56
Self-Adapting Language Models: Paper Authors Discuss Implications	7/8/2025	31:26
The Illusion of Thinking: What the Apple AI Paper Says About LLM Reasoning	6/20/2025	30:35
Accurate KV Cache Quantization with Outlier Tokens Tracing	6/4/2025	25:11
Scalable Chain of Thoughts via Elastic Reasoning	5/16/2025	28:54
Sleep-time Compute: Beyond Inference Scaling at Test-time	5/2/2025	30:24
LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection	4/18/2025	27:19
AI Benchmark Deep Dive: Gemini 2.5 and Humanity's Last Exam	4/4/2025	26:11
Model Context Protocol (MCP)	3/25/2025	15:03
AI Roundup: DeepSeek’s Big Moves, Claude 3.7, and the Latest Breakthroughs	2/28/2025	30:23
How DeepSeek is Pushing the Boundaries of AI Development	2/21/2025	29:54
Multiagent Finetuning: A Conversation with Researcher Yilun Du	2/4/2025	30:03
Training Large Language Models to Reason in Continuous Latent Space	1/14/2025	24:58