Accelerating AI inference workloads
Deploying AI models at scale demands high-performance inference capabilities. Google Cloud offers a range of Cloud Tensor Processing Units (TPUs) and NVIDIA-powered graphics processing unit (GPU) VMs. Join Debi Cabrera as she sits down with Alex Spiridonov, Group Product Manager, to discuss key considerations for choosing between TPUs and GPUs for your inference needs. Watch to understand the cost implications, learn how to deploy and optimize your inference pipeline on Google Cloud, and more!
Chapters:
0:00 - Meet Alex
2:52 - Balancing cost and efficiency
5:51 - TPU vs GPU for AI models
8:21 - Getting started with Google Cloud TPUs and GPUs
10:05 - Common challenges with inference optimization
12:10 - Available resources for AI inference workloads
13:13 - Wrap up
Resources:
Watch the full session here → https://goo.gle/3JC32qx
Check out Alex’s blog post → https://goo.gle/3wa2DZb
JetStream GitHub → https://goo.gle/49SoSRj
MaxDiffusion GitHub → https://goo.gle/4aQ1g11
MaxText GitHub → https://goo.gle/49SoYZb
Watch more Cloud Next 2024 → https://goo.gle/Next-24
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#GoogleCloudNext #GoogleGemini
Event: Google Cloud Next 2024
Speakers: Debi Cabrera, Alex Spiridonov
Products Mentioned: Cloud TPUs, Cloud GPUs