Agentic Evaluations Workshop Deep Dive

Agentic Evaluations Workshop - Deep Dive on the Future on Evals for Agents.

As agents evolve from text conversations to autonomous agents capable of multi-step reasoning, tool use, and real-world task ...

RL for Agents Workshop - Deep Dive on Training Agents with RL and Open Source

Reinforcement learning is becoming central to ...

Ship Real Agents: Hands-On Evals for Agentic Applications — Laurie Voss, Arize

Most agents get tested by running a few queries and checking whether the output looks right. Laurie calls this the vibes problem: it doesn't catch ...

Agentic Evaluations at Scale, For Everybody — Nicholas Kang & Michael Aaron, Google DeepMind

On SWE-Bench Pro, six frontier models land within a couple of percentage points of each other. The harness they run inside shifts ...

Agentic AI Engineering: Complete 4-Hour Workshop feat. MCP, CrewAI and OpenAI Agents SDK

In this comprehensive hands-on ...

Amazon Bedrock AgentCore Deep dive series: AgentCore Evaluations | AWS Show and Tell

In this episode of "AWS Show and Tell", we will ...

Agentic RAG in Production: Orchestration, Evaluation & ROI - Rohit Bhardwaj

Many RAG initiatives stall after early demos because they hallucinate, break under orchestration, or fail to show measurable ...

Agentic Automation for Testers – A Hands-On Deep Dive

As AI continues to reshape software ...

Stanford CS230 | Autumn 2025 | Lecture 8: Agents, Prompts, and RAG

For more information about Stanford's Artificial Intelligence professional and graduate programs, visit: https://stanford.io/ai ...

How to Systematically Setup LLM Evals (Metrics, Unit Tests, LLM-as-a-Judge)

Want to learn real AI Engineering? Go here: https://go.datalumina.com/iIO93Ps Want to start freelancing? Let me help: ...

Pass the AB-100 Exam: Agentic AI Fundamentals, Ecosystem, and Solution Planning (2-Hour Intensive)

Master the fundamentals of the Microsoft AB-100: ...

Case Study + Deep Dive: Telemedicine Support Agents with LangGraph/MCP - Dan Mason

We've all seen website chatbots that can look up an order or answer a basic question, but what does it take to build ...

Stanford Webinar - Agentic AI: A Progression of Language Model Usage

For more information about Stanford's Artificial Intelligence programs visit: https://stanford.io/ai In this webinar, you will gain an ...