GraphX - Graph Processing in a Distributed Dataflow Framework
Offered By: USENIX via YouTube
Course Description
Overview
Explore GraphX, a graph processing framework embedded within Apache Spark, in this conference talk from OSDI '14. Dive into the advantages of using general-purpose distributed dataflow systems for graph processing, challenging the notion that specialized graph systems are necessary. Learn how GraphX implements graph-specific optimizations using basic dataflow operators and achieves performance parity with specialized systems. Discover how this approach enables low-cost fault tolerance and supports a wider range of computations. Examine real-world workload evaluations, benchmarks for PageRank and Connected Components, and a demonstration of a small pipeline in GraphX. Gain insights into modern analytics, graph-parallel patterns, representation techniques, and join site selection using routing tables.
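To make the idea of expressing a graph algorithm with basic dataflow operators concrete, here is a minimal sketch of PageRank written as join, map, and group-by steps over a vertex collection and an edge collection. It is plain Python, not GraphX's Scala API. The function name `pagerank`, its parameters, and the unnormalized update rule are assumptions chosen for illustration, not taken from the talk.

```python
from collections import defaultdict


def pagerank(vertices, edges, num_iters=20, damping=0.85):
    """PageRank as join / map / group-by steps over two collections.

    Illustrative only: a single-machine sketch of the graph-parallel
    pattern, not the GraphX implementation.
    """
    # Out-degree per source vertex: a group-by-and-count over the edges.
    out_degree = defaultdict(int)
    for src, _dst in edges:
        out_degree[src] += 1

    # Vertex collection as a keyed table: vertex id -> rank.
    ranks = {v: 1.0 for v in vertices}

    for _ in range(num_iters):
        # Join each edge with its source vertex's rank, then map the
        # joined record to a message (destination, contribution).
        messages = [(dst, ranks[src] / out_degree[src]) for src, dst in edges]

        # Group by destination and sum the contributions.
        incoming = defaultdict(float)
        for dst, contribution in messages:
            incoming[dst] += contribution

        # Map over the vertex collection to apply the rank update.
        ranks = {v: (1.0 - damping) + damping * incoming[v] for v in vertices}

    return ranks


if __name__ == "__main__":
    vertices = ["a", "b", "c"]
    edges = [("a", "b"), ("b", "c"), ("c", "a")]
    for vertex, rank in sorted(pagerank(vertices, edges).items()):
        print(vertex, round(rank, 3))
```

Each iteration touches only the two collections through joins and aggregations, which is the kind of computation a general-purpose dataflow engine already knows how to distribute and recover after failures.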
Syllabus
Intro
Modern Analytics
Separate Systems
Key Question
Graph-Parallel Pattern
Graph System Optimizations
Representation
Graph Operators (Scala)
Join Site Selection using Routing Tables
Additional Optimizations
PageRank Benchmark
Connected Components Benchmark
A Small Pipeline in GraphX
Taught by
USENIX
Related Courses
Functional Programming Principles in Scala (École Polytechnique Fédérale de Lausanne via Coursera)
Functional Program Design in Scala (École Polytechnique Fédérale de Lausanne via Coursera)
Parallel programming (École Polytechnique Fédérale de Lausanne via Coursera)
Big Data Analysis with Scala and Spark (École Polytechnique Fédérale de Lausanne via Coursera)
Functional Programming in Scala Capstone (École Polytechnique Fédérale de Lausanne via Coursera)