Use GPUs in Cloud Run
Sign up for the preview → https://goo.gle/3NnobXv
GPU best practices → https://goo.gle/4elRpBE
Run LLM inference on Cloud Run GPUs with Ollama → https://goo.gle/3BwN6F1
Cloud Run, known for its scalability, now supports GPUs, opening the door to serverless machine learning inference. Join Googlers Martin Omander and Wietse Venema for a practical demonstration of deploying Gemma 2, Google's open large language model, with Ollama on Cloud Run.
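A deployment along the lines shown in the demo can be sketched with the gcloud CLI. This is an illustrative sketch, not the exact command from the video: the service name, region, and resource sizes are assumptions, and the preview-era `gcloud beta run deploy` GPU flags are used.

```shell
# Hypothetical sketch: deploy the Ollama container to Cloud Run with one NVIDIA L4 GPU.
# Service name (ollama-gemma), region, CPU, and memory values are illustrative assumptions.
gcloud beta run deploy ollama-gemma \
  --image ollama/ollama \
  --region us-central1 \
  --gpu 1 \
  --gpu-type nvidia-l4 \
  --cpu 8 \
  --memory 32Gi \
  --no-cpu-throttling \
  --max-instances 1
```

On success, gcloud prints the service URL, which fronts the Ollama API.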
Chapters:
0:00 - Intro
0:22 - Google Vertex AI vs GPUs with Cloud Run
1:12 - AI app architecture
2:04 - [Demo] Deploying Ollama API
3:26 - [Demo] Testing the deployment
5:28 - [Demo] Build & deploy the front end
6:02 - How do GPUs scale on Cloud Run?
6:34 - Where are Gemma 2 model files stored?
7:12 - Getting started with GPUs in Cloud Run
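The "Testing the deployment" step in the chapters above can be sketched as a request to Ollama's REST API. `SERVICE_URL` is a placeholder for the URL gcloud prints after deployment; the prompt is illustrative.

```shell
# Hypothetical sketch: ask the deployed Ollama service to run Gemma 2.
# SERVICE_URL is a placeholder for your Cloud Run service URL.
curl "$SERVICE_URL/api/generate" \
  -d '{"model": "gemma2", "prompt": "Why is the sky blue?", "stream": false}'
```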
More Resources:
Cloud Run pricing → https://goo.gle/3BeMhAD
Watch more Serverless Expeditions → https://goo.gle/ServerlessExpeditions
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#ServerlessExpeditions #GoogleCloud
Speakers: Martin Omander, Wietse Venema
Products Mentioned: Cloud Run, Gemma
Google Cloud Tech
Helping you build what's next with secure infrastructure, developer tools, APIs, data analytics and machine learning.