Dependency Management in Spark Connect - Simple, Isolated, Powerful
Offered By: Databricks via YouTube
Course Description
Overview
Explore the session-based dependency management system in Spark Connect, introduced in Apache Spark™ 3.5.0, in this 20-minute conference talk by Databricks engineers Akhil Gudesa and Hyukjin Kwon. The talk covers the challenges of managing application environments in distributed computing and shows how Spark Connect addresses the limitations of static dependency setups. Learn how the Artifact API lets you update dependencies dynamically at runtime while keeping them strictly isolated between sessions. Practical examples show how to create, package, use, and update custom isolated environments for both Python and Scala applications. The talk also points to additional resources on Data Lakehouse architecture and Lakehouse Fundamentals Training.
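As a rough illustration of the runtime dependency updates described above, the following is a minimal sketch and not code taken from the talk. It assumes a PySpark 3.5+ Spark Connect session object that exposes `addArtifacts` with the `pyfile` and `archive` keyword arguments. The helper name `add_session_dependencies` and all file names are hypothetical.

```python
def add_session_dependencies(spark, py_files=(), archives=()):
    """Upload dependencies to one Spark Connect session (illustrative sketch).

    `spark` is assumed to be a Spark Connect SparkSession (PySpark 3.5+).
    Artifacts added this way are scoped to that session, so other sessions
    on the same cluster should not see them.
    Returns the number of artifacts uploaded.
    """
    uploaded = 0
    for path in py_files:
        # Python modules or zip files that UDFs in this session should import.
        spark.addArtifacts(path, pyfile=True)
        uploaded += 1
    for path in archives:
        # Packed environments, for example a conda-pack tarball.
        spark.addArtifacts(path, archive=True)
        uploaded += 1
    return uploaded


# Hypothetical usage against a running Spark Connect server:
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
#   add_session_dependencies(
#       spark,
#       py_files=["helpers.py"],
#       archives=["pyspark_conda_env.tar.gz#environment"],
#   )
```

Because the helper only calls methods on the session passed to it, a second session created separately would need its own uploads, which is the isolation property the talk emphasizes.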
Syllabus
Dependency Management in Spark Connect: Simple, Isolated, Powerful
Taught by
Databricks
Related Courses
CS115x: Advanced Apache Spark for Data Science and Data Engineering (University of California, Berkeley via edX)
Big Data Analytics (University of Adelaide via edX)
Big Data Essentials: HDFS, MapReduce and Spark RDD (Yandex via Coursera)
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames (Yandex via Coursera)
Introduction to Apache Spark and AWS (University of London International Programmes via Coursera)