Agentic Evaluations What To Do - Detailed Analysis & Overview

LLM as a Judge: Scaling AI Evaluation Strategies

Agentic Evals by Shishir Patil

Shishir Patil, a Research Scientist at Meta, delivered a presentation on AI agents and their ...

Complete Beginner's Course on AI Evaluations in 50 Minutes (2025) | Aman Khan

Today, I want to share a new episode with Aman Khan. The best way to learn about AI

How to Evaluate Agents: Galileo’s Agentic Evaluations in Action

Evaluating AI agents is one of the toughest challenges in the world of LLMs—but it doesn't have to be. In this video, we walk you ...

Agentic Evaluations at Scale, For Everybody — Nicholas Kang & Michael Aaron, Google DeepMind

On SWE-Bench Pro, six frontier models land within a couple of percentage points of each other. The harness they run inside shifts ...

How to evaluate agents in practice

Evaluating Agents with ADK → https://goo.gle/testagent This video applies the theory of AI agent

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 8 - LLM Evaluation

For more information about Stanford's graduate programs, visit: https://online.stanford.edu/graduate-education November 21, ...

Beginner's Guide to Agent Evaluations

When companies deploy their agents into production, a key challenge emerges: how to evaluate whether the agent is performing ...

How to Systematically Setup LLM Evals (Metrics, Unit Tests, LLM-as-a-Judge)

The agent evaluation revolution

This video introduces a new series on testing AI agents, focusing on why traditional

Why AI evals are the hottest new skill for product builders | Hamel Husain & Shreya Shankar

Hamel Husain and Shreya Shankar teach the world's most popular course on AI evals and have trained over 2000 PMs and ...

What is Agentic AI and How Does it Work?

What exactly is

Agentic Evaluations Workshop - Deep Dive on the Future on Evals for Agents.

As agents evolve from text conversations to autonomous agents capable of multi-step reasoning, tool use, and real-world task ...