RESEARCH PAPERS
Full list of 16 new papers referenced in today's briefing.
Recursive Criticism and Improvement (RCI) for Complex Reasoning
Introduces a self-correction framework where LLMs critique their own outputs before final generation, improving accuracy on reasoning benchmarks by 14%.
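The critique-then-revise loop at the heart of RCI can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; `call_llm` is a hypothetical stand-in for any provider's completion call.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; swap in your
    # provider's SDK. Here it just echoes, for illustration only.
    return f"[model response to: {prompt[:40]}]"

def rci_answer(question: str, rounds: int = 2) -> str:
    """Recursive Criticism and Improvement: draft, critique, revise."""
    answer = call_llm(f"Answer the question:\n{question}")
    for _ in range(rounds):
        critique = call_llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "List any errors or gaps in this answer."
        )
        answer = call_llm(
            f"Question: {question}\nAnswer: {answer}\n"
            f"Critique: {critique}\nRewrite the answer, fixing every issue raised."
        )
    return answer
```

The key design choice is that critique and revision are separate calls: asking the model to find flaws before letting it rewrite is what drives the reported accuracy gain.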
Anthropic Computer Use 2.0: Enhanced Browser Control
Major update to Computer Use capability with improved screenshot understanding, faster response times, and new safety guardrails for enterprise deployment.
OpenAI Realtime API: Voice-First Prompt Engineering
Best practices for designing prompts that work seamlessly in voice-first applications, including handling interruptions and maintaining conversational context.
Structured Outputs v2: Achieving 99.8% Schema Reliability
Deep dive into the constrained decoding techniques behind the new Structured Outputs API, eliminating the need for complex JSON-formatting prompts.
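The core idea behind constrained decoding can be shown with a toy example: at every step, the sampler is restricted to tokens the target grammar allows, so a malformed output is impossible by construction. Everything below (the mini-grammar, `allowed_fn`, `logits_fn`) is an illustrative assumption, not the API's actual internals.

```python
def constrained_decode(logits_fn, allowed_fn, max_len=20):
    """Greedy decoding under a hard mask: only grammar-legal
    tokens are ever candidates (toy sketch of the technique)."""
    out = []
    while len(out) < max_len:
        allowed = allowed_fn(out)
        if not allowed:          # grammar complete: stop
            break
        scores = logits_fn(out)
        out.append(max(allowed, key=lambda t: scores.get(t, float("-inf"))))
    return out

# Toy grammar for a one-key JSON object: { "k" : <digit> }
def allowed_fn(prefix):
    grammar = ['{', '"k"', ':', 'DIGIT', '}']
    if len(prefix) >= len(grammar):
        return []
    slot = grammar[len(prefix)]
    return [str(d) for d in range(10)] if slot == 'DIGIT' else [slot]

def logits_fn(prefix):
    # Hypothetical model scores; the model happens to prefer "7".
    return {'{': 1.0, '"k"': 1.0, ':': 1.0, '7': 2.0, '}': 1.0}
```

Because the mask, not the prompt, enforces the schema, no JSON-formatting instructions are needed in the prompt at all.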
Prompt-to-SQL Optimization via Schema-First Context Injection
A new few-shot prompting technique that prioritizes schema definitions over natural language instructions, reducing SQL generation errors by 22%.
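The schema-first layout the paper advocates is easy to picture as a prompt builder: DDL first, few-shot pairs next, the question last. This is a sketch of the idea under that ordering assumption, not the authors' exact template.

```python
def build_sql_prompt(schema_ddl: str, question: str, examples=()) -> str:
    """Schema-first prompt: the DDL leads, so the model grounds
    column and table names before reading any instructions."""
    parts = ["-- Database schema:", schema_ddl]
    for q, sql in examples:          # optional few-shot (question, SQL) pairs
        parts += [f"-- Question: {q}", sql]
    parts += [f"-- Question: {question}", "-- Write one SQL query:"]
    return "\n".join(parts)
```

Usage: `build_sql_prompt("CREATE TABLE users(id INT, name TEXT);", "How many users are there?")` yields a prompt where the schema precedes everything else, which is the placement the 22% error reduction is attributed to.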
From Chains to Graphs: Orchestrating Cyclic Agent Workflows
A conceptual framework for moving beyond linear prompt chains to state-machine based graphs (LangGraph) for more resilient agent behaviors.
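The move from chains to graphs boils down to one structural change: a router chooses the next node from the current state, so edges can point backwards. A minimal generic sketch (not the LangGraph API; all names here are hypothetical):

```python
# Nodes are functions on a shared state dict; a router picks the
# next node, so the flow can cycle (e.g. loop back on failure).
def run_graph(nodes, router, state, start, max_steps=10):
    current = start
    for _ in range(max_steps):       # hard cap guards against infinite loops
        state = nodes[current](state)
        current = router(current, state)
        if current is None:          # router signals termination
            break
    return state

# Hypothetical two-node workflow: draft -> check, retry draft on failure.
def draft(state):
    state["attempts"] = state.get("attempts", 0) + 1
    state["text"] = f"draft v{state['attempts']}"
    return state

def check(state):
    state["ok"] = state["attempts"] >= 2   # pretend v2 passes review
    return state

def route(node, state):
    if node == "draft":
        return "check"
    return None if state["ok"] else "draft"
```

A linear chain cannot express the `check -> draft` back-edge; the state-machine formulation makes retry and self-correction first-class.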
Chain-of-Thought v2: Dynamic Step Allocation
Proposes a method where the model dynamically decides how many reasoning steps to allocate based on problem complexity, rather than a fixed prompt.
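One simple way to realize a variable step budget is to map a self-rated complexity score to a step count and bake that count into the prompt. The rating scale and budget table below are illustrative assumptions, not values from the paper.

```python
def steps_for(complexity: int) -> int:
    """Map a 1-5 self-rated complexity score to a reasoning-step
    budget (hypothetical table; clamps out-of-range ratings)."""
    return {1: 1, 2: 3, 3: 5, 4: 8, 5: 12}[max(1, min(5, complexity))]

def dynamic_cot_prompt(problem: str, complexity: int) -> str:
    n = steps_for(complexity)
    return (f"Problem: {problem}\n"
            f"Think in at most {n} numbered steps, then state the answer.")
```

In practice the complexity score would come from a cheap first-pass model call, so easy problems spend one step while hard ones get a dozen.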
Llama Guard 4: Specialized Safety Prompting
Research on using specialized safety models to filter inputs/outputs without degrading the performance of the main reasoning model.
The Limits of In-Context Learning for Long-Tail Knowledge
Empirical study showing that RAG is still superior to massive context windows for retrieving obscure, long-tail facts.
NeMo SteerLM: Aligning Models During Inference
Technique to adjust model attributes (helpfulness, humor, creativity) in real-time during inference using steering vectors.
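The core operation behind steering vectors is a single addition to a layer's hidden states at inference time. A toy NumPy sketch (a real deployment hooks this into a transformer layer; the "humor" direction here is a made-up placeholder):

```python
import numpy as np

def apply_steering(hidden, vector, strength=1.0):
    """Add a scaled steering vector to hidden states, nudging the
    model's activations toward a target attribute direction."""
    return hidden + strength * vector

# Toy demo: (tokens, hidden_dim) dummy activations, all zeros.
h = np.zeros((4, 8))
humor_dir = np.full(8, 0.5)      # hypothetical "humor" direction
steered = apply_steering(h, humor_dir, strength=2.0)
```

Because `strength` is just a scalar applied at inference, attributes can be dialed up or down per request without retraining, which is the appeal of the approach.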
Zephyr-Beta: Fine-Tuning on Synthetic Instructions
Demonstrates that high-quality synthetic data generated by larger models can effectively fine-tune smaller models to state-of-the-art performance.
Bedrock Agents: Traceability in Enterprise Workflows
Whitepaper on ensuring auditability and traceability when deploying autonomous agents in regulated enterprise environments.
Rerank 3: Optimizing Retrieval for Multilingual Contexts
Analysis of the new Rerank 3 model's ability to improve RAG performance across 100+ languages without translating documents.
Mixture-of-Experts Prompting Strategies
Best practices for prompting MoE models, specifically how to route queries to specific experts implicitly through phrasing.
Gemini 2.0: Multimodal Prompting Nuances
Guide to interleaving text, image, and video inputs effectively to maximize reasoning performance in the new Gemini 2.0 architecture.
Claude Artifacts: Best Practices for Interactive Outputs
Comprehensive guide to prompting Claude for generating interactive artifacts including code, visualizations, and documents.
