Scaling Machine Learning Feature Engineering in Apache Spark at Facebook

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Machine Learning Courses, Data Processing Courses, Facebook Courses, Feature Engineering Courses

Course Description

Overview

Explore a 21-minute conference talk on scaling machine learning feature engineering in Apache Spark at Facebook. The talk covers the implementation of the Feature Injection and Feature Reaping techniques, including Spark Core and Spark SQL enhancements, indexed/aligned tables, and the new ORC FlatMap encoding. Learn about Catalyst optimizations, new ORC physical encodings for feature maps, and the process of writing and committing indexed feature tables. Gain insights into Facebook's approach to improving prediction model quality through efficient data management and processing in Spark.

Syllabus

Intro
Machine Learning at Facebook
Data Layouts (Tables and Physical Encodings)
Background: Apache ORC
How is a Feature Map Stored in ORC?
Introducing: ORC Flattened Map
Feature Reaping
Introducing: Aligned Table
Query Plan for Aligned Table
Reading Aligned Tables
End to End Performance
Summary
Future Work


Taught by

Databricks