Scalable Machine Learning with PyTorch, Kubeflow and Kubernetes

Track: Artificial Intelligence
In the dynamic world of machine learning, data scientists and engineers face a constant challenge: designing, training, and deploying models at scale while maintaining flexibility and reproducibility. PyTorch, a popular deep learning framework; Kubeflow, a machine learning platform built on Kubernetes; and Kubernetes, a leading container orchestration system, have emerged as a powerful combination for meeting these demands. With Kubeflow, data teams can manage distributed training, hyperparameter tuning, and model serving while leveraging the scalability and fault tolerance of Kubernetes. This talk will explore the integration of PyTorch with Kubeflow on the Kubernetes orchestration platform to efficiently design and train models at any scale. To begin, we will provide an overview of PyTorch, highlighting its strengths in building dynamic and efficient deep learning models. Next, we will delve into Kubeflow's core capabilities, explaining how it extends Kubernetes to orchestrate end-to-end machine learning workflows.
Mo Haghighi
Dr Mo Haghighi is a distinguished engineer and director for Cloud Platform and Infrastructure at Discover Financial Services. His current focus is hybrid and multi-cloud strategy, application modernisation, and automating application and workload migration across public and private clouds. Previously, he held various leadership positions as a program director at IBM, where he led Developer Ecosystem and Cloud Engineering teams in 27 countries across Europe, the Middle East and Africa. Prior to IBM, he was a research scientist at Intel and a Java developer at Sun Microsystems/Oracle. Mo holds a PhD in computer science, and his primary areas of expertise are distributed and edge computing, cloud native, IoT and AI, with several publications and patents in those areas. Mo is a regular keynote speaker at major developer conferences including DevOpsCon, Java/Code One, Codemotion, DevRelCon, O’Reilly, The Next Web, DevNexus, IEEE/ACM, ODSC, AiWorld, CloudConf and PyCon.