Deploy open models with TGI on Cloud Run

Tutorial: How to deploy Gemma 2 on Cloud Run with TGI → https://goo.gle/3Yoztjh
Get started with Cloud Run GPU → https://goo.gle/4ec7mJS
Docs: Text Generation Inference → https://goo.gle/4e7qusz
Serve text generation with fast token generation, at a fraction of the cost of traditional methods. Watch along to learn how to deploy the Gemma 2 model to Cloud Run using Hugging Face Text Generation Inference (TGI), with Wietse Venema (Google) and Alvaro Bartolome (Hugging Face).
More resources:
Gemma 2 (9b) on the Hugging Face Hub → https://goo.gle/3C1vX6R
Hugging Face Deep Learning Containers for Google Cloud → https://goo.gle/3BPaYUM
Watch more Google Cloud: Building with Hugging Face → https://goo.gle/BuildWithHuggingFace
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#GoogleCloud #HuggingFace
Speakers: Wietse Venema, Alvaro Bartolome
Products Mentioned: Gemma, Hugging Face, Cloud Run
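
Once a TGI service like the one described above is running, a client can send it a prompt over HTTP. The following is a minimal sketch, not the method shown in the video: it assumes the service is reachable at a base URL you supply (the `http://localhost:8080` in the usage note is a placeholder), that no extra authentication header is needed on that path, and that TGI's `/generate` route is used. The function names `build_generate_request` and `generate` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.request


def build_generate_request(base_url, prompt, max_new_tokens=64):
    """Build a POST request for TGI's /generate route.

    base_url is an assumption: wherever your TGI service is reachable.
    """
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, prompt, max_new_tokens=64, timeout=60):
    """Send the prompt and return the generated text from the JSON response."""
    request = build_generate_request(base_url, prompt, max_new_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["generated_text"]
```

With a service listening locally, usage would look like `generate("http://localhost:8080", "What is Cloud Run?")`. Building the request separately from sending it keeps the payload easy to inspect before any network call is made.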
