The agent evaluation revolution

This video introduces a new series on testing AI agents and explains why traditional evaluation methods fall short for autonomous systems. It covers what "agent evaluation" actually means: assessing the entire AI stack, from the LLM "brain" to external tools and memory. We walk through a full-stack checklist for system-level testing and highlight the unique challenges of multi-agent evaluation, using a real-life example to illustrate these concepts.
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#GoogleCloud #AIAgents
Speakers: Annie Wang
Products Mentioned: AI Infrastructure
