Use GPUs in Cloud Run
Sign up for the preview → https://goo.gle/3NnobXv
GPU best practices → https://goo.gle/4elRpBE
Run LLM inference on Cloud Run GPUs with Ollama → https://goo.gle/3BwN6F1
Cloud Run, known for its scalability, now supports GPUs, opening the door to serverless machine learning inference. Join Googlers Martin Omander and Wietse Venema for a practical demonstration of deploying Gemma 2, Google's open large language model, with Ollama on Cloud Run.
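A deployment along the lines shown in the demo can be sketched with the gcloud CLI. This is an illustrative sketch, not the exact command from the video: the service name, region, and resource sizes are assumptions, and the preview-era `gcloud beta run deploy` GPU flags are used.

```shell
# Hypothetical sketch: deploy the Ollama container to Cloud Run with one NVIDIA L4 GPU.
# Service name (ollama-gemma), region, CPU, and memory values are illustrative assumptions.
gcloud beta run deploy ollama-gemma \
  --image ollama/ollama \
  --region us-central1 \
  --gpu 1 \
  --gpu-type nvidia-l4 \
  --cpu 8 \
  --memory 32Gi \
  --no-cpu-throttling \
  --max-instances 1
```

On success, gcloud prints the service URL, which fronts the Ollama API.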
Chapters:
0:00 - Intro
0:22 - Google Vertex AI vs GPUs with Cloud Run
1:12 - AI app architecture
2:04 - [Demo] Deploying Ollama API
3:26 - [Demo] Testing the deployment
5:28 - [Demo] Build & deploy the front end
6:02 - How do GPUs scale on Cloud Run?
6:34 - Where are Gemma 2 model files stored?
7:12 - Getting started with GPUs in Cloud Run
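The "Testing the deployment" step in the chapters above can be sketched as a request to Ollama's REST API. `SERVICE_URL` is a placeholder for the URL gcloud prints after deployment; the prompt is illustrative.

```shell
# Hypothetical sketch: ask the deployed Ollama service to run Gemma 2.
# SERVICE_URL is a placeholder for your Cloud Run service URL.
curl "$SERVICE_URL/api/generate" \
  -d '{"model": "gemma2", "prompt": "Why is the sky blue?", "stream": false}'
```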
More Resources:
Cloud Run pricing → https://goo.gle/3BeMhAD
Watch more Serverless Expeditions → https://goo.gle/ServerlessExpeditions
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#ServerlessExpeditions #GoogleCloud
Speakers: Martin Omander, Wietse Venema
Products Mentioned: Cloud Run, Gemma
Google Cloud Tech
Helping you build what's next with secure infrastructure, developer tools, APIs, data analytics and machine learning.