This AI agent runs on Cloud Run + NVIDIA GPUs
Source code for the smart health agent → https://goo.gle/4nJsFax
Have you ever wondered how to build a real AI agent application on a serverless NVIDIA GPU? In this video, Martin Omander (Google) sits down with Jay Rodge (NVIDIA) to walk through a complete setup. Jay demonstrates a smart health agent that runs on Cloud Run with an NVIDIA L4 GPU. Watch along as the duo dive right into the code and architecture.
See how Martin and Jay run open source models like Gemma with Ollama on Cloud Run, use LangGraph to build a multi-agent workflow (RAG + tools), explain the architecture for splitting an app into a CPU frontend (with Gradio) and a GPU backend, and discuss why a developer would host their own model vs. calling a managed API.
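As a rough illustration of the frontend/backend split described above, here is a minimal sketch of how a CPU frontend might call a GPU backend that serves a model through Ollama's /api/generate endpoint. This is not the code from the video: the backend URL, the model tag, and the function names are all assumptions for illustration, and a real Cloud Run service would typically also need an authentication header.

```python
import json
import urllib.request

# Hypothetical URL of the GPU-backed Cloud Run service running Ollama.
BACKEND_URL = "https://ollama-backend-example.a.run.app"


def build_generate_request(prompt, model="gemma3:4b", base_url=BACKEND_URL):
    """Build a POST request for Ollama's /api/generate endpoint.

    The model tag is an assumption; use whichever model the backend has pulled.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_model(prompt):
    """Send the prompt to the backend and return the generated text."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=120) as resp:
        # With "stream": False, Ollama returns one JSON object whose
        # "response" field holds the full completion.
        return json.loads(resp.read())["response"]
```

The frontend (for example, a Gradio handler) would call ask_model() with the user's message, so the CPU service stays lightweight while the GPU service does the inference.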
Chapters:
0:00 - Intro
0:40 - Demo of the smart health app
2:25 - How the app was built
5:00 - Code for multi-agent
5:33 - LangGraph vs ADK
5:50 - Hosting an LLM vs calling Gemini API
6:40 - Developer experience
7:06 - Wrap up
Watch more Serverless Expeditions → https://goo.gle/ServerlessExpeditions
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#GoogleCloud #CloudRun #Serverless
Speakers: Martin Omander, Jay Rodge
Products Mentioned: Cloud Run, Agent Development Kit