Your Coflow has Many Flows - Sampling them for Fun and Speed
Offered By: USENIX via YouTube
Course Description
Overview
Explore a conference talk on improving coflow scheduling to speed up data-intensive applications. Learn about Philae, an online coflow scheduler that exploits the spatial dimension of coflows, namely that each coflow consists of many flows, to reduce the overhead of learning coflow sizes. Discover how it samples a few flows of each coflow to estimate the average flow size, and from that the coflow size, and then applies Shortest Coflow First scheduling. Examine how robust sampling-based learning is to flow size skew and why it scales better. Analyze performance results against the prior scheduler Aalo, which show significant reductions in coflow completion time across various testbed sizes and production cluster traces. Gain insights into the technical side of coflow scheduling, including challenges, practical issues, and comparisons with other approaches such as Coda.
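The sampling idea described above can be illustrated with a minimal sketch. This is not Philae's implementation: the `Coflow` class, the function names, the default of three pilot flows, uniform random pilot selection, and the assumption that the scheduler knows each coflow's flow count are all hypothetical choices made for illustration. The sketch estimates each coflow's total size as the sampled average flow size multiplied by the number of flows, then orders coflows shortest first.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Coflow:
    # Hypothetical container: a name plus the true size of each flow in bytes.
    # A real scheduler would only learn a flow's size once that flow finishes.
    name: str
    flow_sizes: List[float]


def estimate_coflow_size(coflow: Coflow, num_pilots: int, rng: random.Random) -> float:
    """Estimate total coflow size from a small random sample of pilot flows."""
    if not coflow.flow_sizes:
        return 0.0
    # Sample at most num_pilots flows (assumed policy: uniform, without replacement).
    k = min(num_pilots, len(coflow.flow_sizes))
    pilots = rng.sample(coflow.flow_sizes, k)
    average_flow_size = sum(pilots) / k
    # Assumes the number of flows in the coflow is known to the scheduler.
    return average_flow_size * len(coflow.flow_sizes)


def shortest_coflow_first(coflows: List[Coflow], num_pilots: int = 3, seed: int = 0) -> List[str]:
    """Return coflow names ordered by estimated size, smallest first."""
    rng = random.Random(seed)
    estimates = {c.name: estimate_coflow_size(c, num_pilots, rng) for c in coflows}
    return [c.name for c in sorted(coflows, key=lambda c: estimates[c.name])]


if __name__ == "__main__":
    coflows = [
        Coflow("A", [10.0] * 4),    # 4 flows of 10 bytes
        Coflow("B", [100.0] * 50),  # 50 flows of 100 bytes
        Coflow("C", [1.0] * 5),     # 5 flows of 1 byte
    ]
    print(shortest_coflow_first(coflows))  # ['C', 'A', 'B']
```

With uniform flow sizes, as in this usage example, the estimates are exact. With skewed flow sizes the estimate depends on which flows are sampled, which is the robustness question the talk addresses.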
Syllabus
Introduction
Big Data Analytics
MapReduce
Communication Phase
Coflow Abstraction
Online Coflow Scheduling
Proposed Online Coflow
Outline
Example
Primary Drawbacks
Intrinsic Overhead
Round-Robin
Recap
Doubts about Sampling
Practical Issues
Evaluation of Philae
Philae Speedup
Philae Job Speed
Philae Sensitivity
Summary
Mario Agassi
Practical Challenges
Comparison with Coda
Taught by
USENIX
Related Courses
Big Data Analytics in Healthcare - Georgia Institute of Technology via Udacity
Mining Massive Datasets - Stanford University via edX
The Caltech-JPL Summer School on Big Data Analytics - California Institute of Technology via Coursera
Big Data Analytics for Healthcare - Georgia Institute of Technology via Coursera
Data Lakes for Big Data - EdCast