Name: Multi-Node Finetuning LLMs on Kubernetes: A Practitioner’s Guide - Ashish Kamra & Boaz Ben Shabat, Red Hat
Start: 2024-12-11T14:55:00+0530
End: 2024-12-11T15:30:00+0530

In-person
11-12 December

Wednesday December 11, 2024 2:55pm - 3:30pm IST

Room 1

Large Language Model (LLM) finetuning on private enterprise data has emerged as an important strategy for improving model performance on specific downstream tasks. This process, however, demands substantial compute resources and presents some unique challenges in Kubernetes environments. This session offers a practical, step-by-step guide to implementing multi-node LLM finetuning on Kubernetes clusters with GPUs, using PyTorch FSDP and the Kubeflow Training Operator. We'll cover preparing a Kubernetes cluster for LLM finetuning; optimizing cluster, system, and network configurations; and comparing the performance of various network topologies, including pod networking, secondary networks, and GPUDirect RDMA over Ethernet. By the end of this session, attendees will understand the intricacies of multi-node LLM finetuning on Kubernetes and be equipped to introduce it in their own production Kubernetes environments.

Speakers

Ashish Kamra

Senior Manager, Red Hat

Dr. Ashish Kamra is an accomplished engineering leader with over 15 years of experience managing high-performing teams in AI, machine learning, and cloud computing. He joined Red Hat in March 2017, where he currently serves as the Senior Manager of AI Performance. In this...

Boaz Ben Shabat

Senior AI Performance Engineer, Red Hat

I am an engineer with extensive experience in optimizing large-scale, high-performance computing environments. My expertise includes network architecture, system performance tuning, and cloud infrastructure. I excel in solving complex technical challenges and improving efficiency...

AI + ML

Content Experience Level: Intermediate

KubeCon + CloudNativeCon India 2024
