The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon India 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.
Please note: This schedule is automatically displayed in India Standard Time (UTC+5:30). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis.
Large Language Model (LLM) fine-tuning on enterprise private data has emerged as an important strategy for enhancing model performance on specific downstream tasks. This process, however, demands substantial compute resources and presents unique challenges in Kubernetes environments. This session offers a practical, step-by-step guide to implementing multi-node LLM fine-tuning on GPU-equipped Kubernetes clusters, utilizing PyTorch FSDP and the Kubeflow training operator. We'll cover preparing a Kubernetes cluster for LLM fine-tuning; optimizing cluster, system, and network configurations; and comparing the performance of various network topologies, including pod networking, secondary networks, and GPUDirect RDMA over Ethernet. By the end of this session, the audience will have a comprehensive understanding of the intricacies of multi-node LLM fine-tuning on Kubernetes, empowering them to introduce it in their own production Kubernetes environments.
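To make the setup concrete, here is a minimal sketch of the kind of FSDP training script such a job runs, assuming it is launched by torchrun or a Kubeflow PyTorchJob (which set the rank environment variables); the model, data, and hyperparameters are placeholders, not the presenters' actual workload.

```python
# Minimal multi-node FSDP sketch. Launch with torchrun or a Kubeflow
# PyTorchJob; either sets RANK, WORLD_SIZE, MASTER_ADDR, and LOCAL_RANK.
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")               # NCCL backend for GPU collectives
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in for an LLM; a real job would load a pretrained transformer.
    model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8).cuda()
    # FSDP shards parameters, gradients, and optimizer state across all ranks.
    model = FSDP(model)

    optim = torch.optim.AdamW(model.parameters(), lr=1e-5)
    for step in range(100):                       # placeholder training loop
        batch = torch.randn(8, 16, 512, device="cuda")  # stand-in for token batches
        loss = model(batch).pow(2).mean()         # dummy loss for illustration
        loss.backward()
        optim.step()
        optim.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```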
Dr. Ashish Kamra is an accomplished engineering leader with over 15 years of experience managing high-performing teams in AI, machine learning, and cloud computing. He joined Red Hat in March 2017, where he currently serves as Senior Manager of AI Performance. In this...
I am an engineer with extensive experience in optimizing large-scale, high-performance computing environments. My expertise includes network architecture, system performance tuning, and cloud infrastructure. I excel in solving complex technical challenges and improving efficiency...
Wednesday December 11, 2024 2:55pm - 3:30pm IST
Auditorium
This session will explore the AI/ML Framework (AIMLFW) within the O-RAN Software Community (O-RAN SC), designed for dynamic and efficient 5G network management.
Key Topics:
- Introduction to O-RAN SC and AIMLFW: Overview of O-RAN’s architecture and mission.
- AI/ML Use Cases in O-RAN: Real-world applications like traffic prediction and anomaly detection, supported by AIMLFW’s scalable platform.
- Architecture and Components of AIMLFW:
  * Kubeflow for Model Training
  * KServe for Model Deployment (see the sketch after this list)
  * O-RAN Specification for AI/ML Workload Deployment
  * Core ML Lifecycle Components
- Challenges and Solutions in AI/ML Deployment: Addressing common challenges in distributed 5G environments.
- Future Directions and Community Collaboration: Potential integration with Flyte and MLflow for enhanced AI/ML workflow management.
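As a concrete illustration of the KServe deployment step, here is a minimal sketch using the KServe Python SDK; the model name, namespace, and storage URI are hypothetical, and AIMLFW's actual deployment path follows the O-RAN specification rather than this exact call.

```python
# Sketch: deploying a trained model (e.g. a traffic-prediction model) as a
# KServe InferenceService. Name, namespace, and storage URI are placeholders.
from kubernetes import client
from kserve import (
    KServeClient, constants,
    V1beta1InferenceService, V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec, V1beta1SKLearnSpec,
)

isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_GROUP + "/v1beta1",
    kind="InferenceService",
    metadata=client.V1ObjectMeta(name="traffic-predictor", namespace="aimlfw"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            # Points at model artifacts produced by the Kubeflow training step.
            sklearn=V1beta1SKLearnSpec(storage_uri="s3://models/traffic-predictor")
        )
    ),
)

KServeClient().create(isvc)  # KServe reconciles pods, routing, and autoscaling
```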
Subhash Kumar Singh is a Senior Chief Engineer at Samsung, where he leads the AI/ML Framework (AIMLFW) project within the O-RAN Software Community (SC). Over the years, Subhash has been actively involved in several prominent open-source communities. His extensive experience in these...
Wednesday December 11, 2024 3:45pm - 4:20pm IST
Auditorium
In the era of digital transformation, PepsiCo is leading the way in integrating edge computing to ensure real-time data processing across its network. Utilizing lightweight Kubernetes distributions like K3s and RKE2, PepsiCo has built a platform that boosts computational capabilities at edge locations. Supported by Rancher and Longhorn, this platform enables efficient microservices deployment, providing the agility needed to meet dynamic market demands. A key component is the deployment of advanced ML models for camera and video inference, which requires substantial GPU resources. PepsiCo employs cutting-edge GPU sharing techniques to optimize these costly assets, improving performance and scalability while reducing costs. Join us to explore PepsiCo's edge computing strategy, its use of lightweight Kubernetes, and its innovative GPU sharing techniques. Learn how PepsiCo is harnessing edge computing to drive operational excellence and sales growth and to maintain a competitive edge.
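The abstract does not name the sharing mechanism PepsiCo uses; one common approach is NVIDIA device-plugin time-slicing, where several pods each request a GPU and time-share a single physical device. A minimal sketch with the Kubernetes Python client, with all names and images as placeholders:

```python
# Sketch: with NVIDIA device-plugin time-slicing enabled on the node, each
# "nvidia.com/gpu: 1" request is a shared slice rather than a whole GPU.
from kubernetes import client, config

def create_inference_pod(name: str) -> None:
    config.load_kube_config()  # or load_incluster_config() inside the cluster
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="video-inference",
                    image="registry.example.com/video-inference:latest",  # placeholder
                    resources=client.V1ResourceRequirements(
                        # A shared slice when time-slicing is configured.
                        limits={"nvidia.com/gpu": "1"},
                    ),
                )
            ],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="edge", body=pod)

if __name__ == "__main__":
    # Two pods time-share one physical GPU when the device plugin advertises slices.
    for i in range(2):
        create_inference_pod(f"camera-infer-{i}")
```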
I currently hold the position of Deputy Director of Integration Engineering at PepsiCo. With 17 years of experience, I specialize in platform engineering and application development. With certifications in CKA, CKS, K3s, and Edge Architect, I've spent 6 years in platform strategy...
Praseed Naduvath is a techno-manager with over 18 years in IT, specializing in cloud infrastructure, container orchestration, and service mesh technologies. A Certified Kubernetes Administrator and Security Specialist, he excels in managing and securing complex Kubernetes environments...
Wednesday December 11, 2024 4:50pm - 5:25pm IST
Auditorium
OPEA, the Open Platform for Enterprise AI, is a new project under the Linux Foundation. It provides a framework of composable microservices for state-of-the-art GenAI systems, including LLMs, data stores, and prompt engines, to expedite enterprise adoption, along with blueprints of end-to-end workflows for popular use cases such as ChatQnA, CodeGen, and RAG systems. This talk explores the practical steps for deploying Generative AI (GenAI) applications in cloud-native environments using OPEA. It will show deployments on Kubernetes clusters across a range of hyperscalers. A wide variety of data stores, including open source vector databases and managed services, will be used to demonstrate RAG capabilities. We will show the results of scale testing with more than 50K documents. Attendees will learn how to deploy GenAI applications in a cloud-native way using OPEA, and explicit contribution opportunities will be shared.
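For orientation, here is a toy sketch of the RAG pattern that a blueprint like ChatQnA composes from microservices; the embedding and generation stubs below stand in for the separate embedding, retriever, and LLM services of a real OPEA deployment and are not OPEA's API.

```python
# Toy RAG-pattern sketch: retrieve the nearest documents, then build a prompt.
# In an OPEA deployment each step is its own microservice; these are stubs.
import math

DOCS = [
    "Kubernetes schedules containers across a cluster.",
    "KServe serves machine learning models on Kubernetes.",
    "Vector databases index embeddings for similarity search.",
]

def embed(text: str) -> list[float]:
    # Stub embedding: normalized character-frequency vector. A real system
    # calls an embedding-model service instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - 97] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank documents by cosine similarity to the query embedding.
    q = embed(query)
    return sorted(DOCS, key=lambda d: -sum(a * b for a, b in zip(q, embed(d))))[:k]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return prompt  # a real system sends this prompt to an LLM service

print(answer("How are models served on Kubernetes?"))
```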
Arun Gupta is vice president of Open Ecosystem Initiatives at Intel Corporation. He has been an open source strategist, advocate, and practitioner for over two decades. He has taken companies such as Apple, Amazon, and Sun Microsystems through systemic changes to embrace open source principles...
Wednesday December 11, 2024 5:40pm - 6:15pm IST
Auditorium
This session showcases Azimuth, AstraZeneca's cutting-edge enterprise cloud-native machine learning platform. It is built on Kubernetes and integrates a diverse array of cloud-native tools, enabling seamless development, deployment, and management of machine learning workflows. My presentation will delve into the architecture, key components, real-world applications, and the integration with Cloudability for cost management, highlighting Azimuth's role in empowering data science teams and accelerating innovation within AstraZeneca. The tech stack includes Kubeflow, Weights & Biases, Ray, Volcano Scheduler, Grafana, Prometheus, ELK, Harbor, NetApp ONTAP FSx, Kyverno, GitHub Actions, ArgoCD, Argo Rollouts, and CloudNativePG, among others.
As organisations increasingly integrate AI solutions, the demand for environmentally sustainable practices within this space has never been more critical. This presentation delves into the collaborative effort between the Cloud Native Computing Foundation (CNCF) AI Working Group and TAG Environmental Sustainability to define a repeatable design approach aimed at fostering sustainable AI in cloud-native environments. Our discussion will outline the crucial considerations in such an approach, including efficient management of compute resources, storage optimisation, and advanced networking solutions. Attendees will gain insights into the lifecycle of AI/ML deployments, from inception through operation, emphasising resilience, scalability, and resource efficiency. By highlighting innovative "green" strategies, this session will provide actionable best practices and recommendations, alongside a forward-looking perspective on future trends and research directions in sustainable AI.
Vincent Caldeira, CTO of Red Hat in APAC, is responsible for strategic partnerships and technology strategy. Named a top CTO in APAC in 2023, he has 20+ years in IT, excelling in technology transformation in finance. An authority in open source, cloud computing, and digital transformation...
Thursday December 12, 2024 3:45pm - 4:20pm IST
Auditorium
Machine learning platforms aim to streamline the workflow for ML practitioners, allowing them to focus on developing their models while the platform handles repetitive tasks like packaging code, dependencies, and configurations. Traditional methods using Dockerfiles require ML engineers to navigate complex Linux processes and maintain multiple Dockerfiles for different projects, which can be time-consuming and error-prone. Additionally, security mandates for regular patching and updates add further complexity. Join this talk to explore how Cloud Native Buildpacks can simplify and secure MLOps deployments. By automating the packaging of ML projects, including custom libraries and hardware specifications, Buildpacks enhance flexibility, maintainability, and security. This approach reduces the operational burden on both developers and security teams, ensuring a more efficient and scalable MLOps deployment process.
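As a flavour of the workflow, here is a minimal sketch that drives the pack CLI from Python to build an ML project image without a Dockerfile; the image name and builder are illustrative choices, and pack must be installed locally.

```python
# Sketch: building an ML project image with Cloud Native Buildpacks via the
# pack CLI. Image name and builder are placeholders.
import subprocess

def buildpack_build(image: str, path: str = ".") -> None:
    # pack detects the project type (e.g. a Python app), resolves its
    # dependencies, and produces an OCI image with patchable, rebasable layers.
    subprocess.run(
        [
            "pack", "build", image,
            "--path", path,
            "--builder", "paketobuildpacks/builder-jammy-base",  # example builder
        ],
        check=True,
    )

if __name__ == "__main__":
    buildpack_build("registry.example.com/ml-service:latest")
```

Because base layers are rebasable, security patches can later be applied with a rebase rather than a full rebuild, which is the maintenance burden the talk highlights.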
Aditya Soni is a DevOps/SRE tech professional. He has worked with product- and service-based companies including Red Hat and Searce, and is currently positioned at Forrester Research as a DevOps Engineer II. He holds AWS, GCP, Azure, Red Hat, and Kubernetes certifications. He is a CNCF Ambassador...
Senior Solution Engineer @ VMware by Broadcom
Suman is a Senior Cloud Native Architect at VMware. He is a consultant and advisor for the Tanzu platform and helps customers and users in their journey of app modernisation adoption and cultural shift with DevOps best practices. Suman is a distinguished speaker at many community meetups...
Thursday December 12, 2024 4:50pm - 5:25pm IST
Auditorium
Deploying large language models (LLMs) is inherently complex, challenging, and expensive. This case study demonstrates how Kubernetes, specifically KServe with the Modelcar OCI storage backend, simplifies the deployment and management of private LLM services. First, we explore how KServe enables efficient and scalable model serving within a Kubernetes environment, allowing seamless integration and optimized GPU utilization. Second, we delve into how Modelcar OCI artifacts streamline artifact delivery beyond container images, reducing duplicate storage usage, increasing download speeds, and minimizing governance overhead. The session will cover implementation details, benefits, best practices, and lessons learned. Walk away knowing how to leverage Kubernetes, KServe, and OCI artifacts to enhance your MLOps journey, achieving significant efficiency gains and overcoming common challenges in deploying and scaling private LLM services.
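To illustrate the idea, here is a minimal sketch of an InferenceService whose weights are pulled from an OCI artifact rather than object storage, again via the KServe Python SDK; the registry path, model format, and namespace are hypothetical, and OCI/modelcar support must be enabled in your KServe installation.

```python
# Sketch: an InferenceService with an oci:// storage URI, so model weights are
# delivered as an OCI artifact ("modelcar") alongside the serving container.
from kubernetes import client
from kserve import (
    KServeClient, constants,
    V1beta1InferenceService, V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec, V1beta1ModelSpec, V1beta1ModelFormat,
)

isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_GROUP + "/v1beta1",
    kind="InferenceService",
    metadata=client.V1ObjectMeta(name="private-llm", namespace="llm-serving"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            model=V1beta1ModelSpec(
                model_format=V1beta1ModelFormat(name="huggingface"),  # placeholder
                # Weights pulled from a registry and mounted beside the server;
                # registry layer caching avoids duplicate storage and re-downloads.
                storage_uri="oci://registry.example.com/models/private-llm:v1",
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"},
                ),
            )
        )
    ),
)

KServeClient().create(isvc)
```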
Mayuresh Krishna is the CTO and Co-Founder of initializ.ai, where he drives product engineering, building AI models and private AI services. He previously worked at VMware Tanzu as a Solution Engineering Leader and at Pivotal Software as a Senior Platform Architect.
Thursday December 12, 2024 5:40pm - 6:15pm IST
Auditorium