Quick Gemma 2 deployment with Hugging Face
Tutorial: Serve Gemma on GKE with TGI → https://goo.gle/4fFKt2Q
Learn more about Text Generation Inference (TGI) from Hugging Face → https://goo.gle/4e7qusz
Hugging Face Deep Learning containers for Google Cloud → https://goo.gle/3BPaYUM
Text Generation Inference (TGI) is a toolkit for deploying and serving large language models (LLMs), enabling high-performance text generation for the most popular open LLMs. Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Watch along as Googlers Wietse Venema and Mofi Rahman demonstrate how to deploy the 27-billion-parameter Gemma 2 model on Google Kubernetes Engine using Hugging Face TGI.
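The deployment described above can be sketched as a Kubernetes manifest. This is a minimal, illustrative config only, not the tutorial's exact manifest: the image reference, secret name, and GPU count are assumptions (the tutorial may use a Hugging Face Deep Learning container for Google Cloud instead of the upstream TGI image), and it presumes a GKE cluster with a GPU node pool plus a pre-created Hugging Face token secret, since Gemma is a gated model.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tgi-gemma          # hypothetical name for this sketch
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tgi-gemma
  template:
    metadata:
      labels:
        app: tgi-gemma
    spec:
      containers:
        - name: tgi
          # Illustrative image; a Hugging Face DLC for Google Cloud may be used instead.
          image: ghcr.io/huggingface/text-generation-inference:latest
          args: ["--model-id", "google/gemma-2-27b-it"]
          env:
            - name: HF_TOKEN            # Gemma is gated; the token's account needs access
              valueFrom:
                secretKeyRef:
                  name: hf-secret       # assumed pre-created Kubernetes secret
                  key: hf_api_token
          ports:
            - containerPort: 80
          resources:
            limits:
              nvidia.com/gpu: "2"      # illustrative; a 27B model typically needs multiple accelerators
```

Once the pod is running, the server can be reached (for example via `kubectl port-forward`) and queried through TGI's HTTP API, e.g. a POST to `/generate` with an `inputs` prompt. See the linked tutorial for the exact, tested manifest and cluster setup.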
Watch more Google Cloud: Building with Hugging Face → https://goo.gle/BuildWithHuggingFace
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#GoogleCloud #HuggingFace
Speakers: Wietse Venema, Mofi Rahman
Products Mentioned: Gemma, Hugging Face Deep Learning containers, Google Kubernetes Engine
Google Cloud Tech
Helping you build what's next with secure infrastructure, developer tools, APIs, data analytics, and machine learning.