Building Reproducible Distributed Applications at Scale
Offered By: EuroPython Conference via YouTube
Course Description
Overview
Explore the challenges of packaging Python code for distributed computing environments, and the solutions to them, in this conference talk. Compare several methods for deploying Python code to compute clusters, and examine the role that Python's pickle serialization and self-contained executables play in each. Learn why shipping code becomes complex on large clusters with thousands of nodes running jobs such as TensorFlow or Spark. See how to run a PySpark job on S3 storage using PEX as the self-contained executable artifact. Gain insight into generalizing these concepts to other job types, virtual environments, and distributed storage systems. Walk away with an understanding of Python packaging challenges for distributed applications and with practical code samples you can apply to your own projects.
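The idea of a self-contained executable artifact can be illustrated with a minimal sketch that uses only the standard library. It is not the code from the talk and it does not use PEX itself: it relies on Python's `zipapp` module, the zip-application mechanism that PEX files also build on, and PEX additionally bundles third-party dependencies, which this sketch does not. The `build_and_run` helper, the `app.pyz` file name, and the printed message are hypothetical names chosen for illustration.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_and_run(message: str) -> str:
    """Package a tiny app into one .pyz file, then run that single file.

    Hypothetical helper for illustration; a real cluster job would copy
    the artifact to shared storage instead of running it locally.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "app"
        src.mkdir()
        # __main__.py is the entry point Python looks for inside a zip application.
        (src / "__main__.py").write_text(
            "import sys\nprint('job received:', sys.argv[1])\n"
        )
        artifact = Path(tmp) / "app.pyz"
        # Bundle the source directory into a single archive file.
        zipapp.create_archive(src, target=artifact)
        # Any compatible interpreter can execute the archive directly.
        result = subprocess.run(
            [sys.executable, str(artifact), message],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


output = build_and_run("hello")
print(output)
```

Because the artifact is a single file, it can be uploaded once to shared storage and fetched by every node, which is the property the overview attributes to self-contained executables.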
Syllabus
Fabian Höring - Building reproducible distributed applications at scale
Taught by
EuroPython Conference
Related Courses
Fundamentals of Scalable Data Science - IBM via Coursera
Data Science and Engineering with Spark - Berkeley University of California via edX
Master of Machine Learning and Data Science - Imperial College London via Coursera
Data Analysis Using Pyspark - Coursera Project Network via Coursera
Building Machine Learning Pipelines in PySpark MLlib - Coursera Project Network via Coursera