Unlocking Local LLMs with Quantization - Marc Sun, Hugging Face
This talk will share the story of quantization, its rise in popularity, and its current status in the open-source community. We'll begin by reviewing key quantization papers, such as QLoRA by Tim Dettmers and GPTQ by Elias Frantar. Next, we'll demonstrate how quantization can be applied at various stages of model development, including pre-training, fine-tuning, and inference. Specifically, we'll share our experience in pre-training a 1.58-bit model, show how fine-tuning is achievable using PEFT + QLoRA, and discuss optimizing inference performance with torch.compile or custom kernels. Finally, we'll highlight efforts within the community to make quantized models more accessible, including how transformers incorporate state-of-the-art quantization schemes and how to run GGUF models from llama.cpp.
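To make the fine-tuning part of the abstract concrete, below is a minimal sketch of a QLoRA-style setup with transformers + PEFT, assuming bitsandbytes and a CUDA GPU are available; the model id, LoRA rank, and target modules are illustrative choices, not values given in the talk.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # hypothetical base model for illustration

# 4-bit NF4 quantization with double quantization, as introduced in the QLoRA paper.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Freeze the quantized base weights and attach trainable low-rank adapters.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,                                # illustrative rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # illustrative target modules
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

For the GGUF part of the abstract, recent transformers releases can also load llama.cpp GGUF checkpoints by passing a gguf_file argument to from_pretrained; exact support depends on the installed version.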