Deep Learning Pipelines for High Energy Physics Using Apache Spark and Distributed Keras
Offered By: Databricks via YouTube
Course Description
Overview
Syllabus
Intro
Experimental High Energy Physics is Data Intensive
Key Data Processing Challenge
Data Flow at LHC Experiments
R&D - Data Pipelines
Particle Classifiers Using Neural Networks
Deep Learning Pipeline for Physics Data
Analytics Platform at CERN
Hadoop and Spark Clusters at CERN
Step 1: Data Ingestion • Read input files: 4.5 TB from a custom (ROOT) format (see the ingestion sketch after this syllabus)
Feature Engineering
Step 2: Feature Preparation • Features are converted to formats suitable for training (see the feature-preparation sketch after this syllabus)
Performance and Lessons Learned • Data preparation is CPU bound
Neural Network Models and Hyper-Parameter Tuning
Hyper-Parameter Tuning - DNN • Hyper-parameter tuning of the DNN model
Deep Learning at Scale with Spark
Spark, Analytics Zoo and BigDL
BigDL Runs as Standard Spark Programs
BigDL Parameter Synchronization
Model Development - DNN for HLF • Model is instantiated using the Keras-compatible API provided by Analytics Zoo (see the model sketch after this syllabus)
Model Development - GRU + HLF • A more complex network topology, combining a GRU on Low-Level Features with a DNN on High-Level Features (see the topology sketch after this syllabus)
Distributed Training
Performance and Scalability of Analytics Zoo/BigDL
Results - Model Performance
Workload Characterization
Training with TensorFlow 2.0 • Training and test data (see the training sketch after this syllabus)
Recap: our Deep Learning Pipeline with Spark
Model Serving and Future Work
Summary • The use case addresses the need for higher efficiency in event filtering at LHC experiments • Spark and Python notebooks
Labeled Data for Training and Test • Simulated events: software simulators are used to generate events
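
The data-ingestion step reads the experiment's ROOT files into Spark DataFrames. Below is a minimal sketch assuming the spark-root data source is used; the package coordinates, format name, and input path are illustrative assumptions, not details confirmed by the talk.

```python
# Minimal sketch, assuming the spark-root data source is available.
# Package coordinates, format name, and paths are assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hep-data-ingestion")
    # assumed spark-root package coordinates
    .config("spark.jars.packages", "org.diana-hep:spark-root_2.11:0.1.16")
    .getOrCreate()
)

# Read the custom ROOT format directly into a Spark DataFrame
events = (
    spark.read
    .format("org.dianahep.sparkroot")   # assumed data-source name
    .load("/data/lhc/events/*.root")    # hypothetical path
)

events.printSchema()
print("number of events:", events.count())
```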
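The feature-preparation step converts per-event columns into a layout suitable for training. A minimal sketch using Spark ML's VectorAssembler follows; the column names, the "label" column, and the output path are hypothetical.

```python
# Minimal sketch of assembling per-event columns into a feature vector.
# Column names, the "label" column, and the output path are hypothetical.
from pyspark.ml.feature import VectorAssembler

hlf_columns = ["missing_et", "n_jets", "lead_lepton_pt"]  # hypothetical feature columns
assembler = VectorAssembler(inputCols=hlf_columns, outputCol="features")

# 'events' is the DataFrame produced by the ingestion sketch above
prepared = assembler.transform(events).select("features", "label")

# Persist in a columnar format suitable as training input
prepared.write.mode("overwrite").parquet("/data/lhc/prepared/")  # hypothetical path
```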
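The "DNN for HLF" model is instantiated through the Keras-compatible API provided by Analytics Zoo. Below is a hedged sketch of what such a model definition could look like; the layer sizes, 14-feature input, and 3-class softmax output are assumptions for illustration.

```python
# Minimal sketch using Analytics Zoo's Keras-compatible API.
# Layer sizes, input width, and class count are assumptions.
from zoo.pipeline.api.keras.models import Sequential
from zoo.pipeline.api.keras.layers import Dense

model = Sequential()
model.add(Dense(50, activation="relu", input_shape=(14,)))  # assumed: 14 high-level features
model.add(Dense(20, activation="relu"))
model.add(Dense(10, activation="relu"))
model.add(Dense(3, activation="softmax"))                   # assumed: 3 event classes

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```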
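The "GRU + HLF" topology combines a recurrent branch over the sequence of low-level particle features with the high-level features fed to dense layers. The sketch below approximates that topology with the tf.keras functional API (the talk builds it with Analytics Zoo's Keras-compatible API); the sequence length, feature counts, and layer sizes are assumptions.

```python
# Minimal sketch of a GRU + HLF topology with the tf.keras functional API.
# All shapes and layer sizes are assumptions for illustration.
import tensorflow as tf

seq_in = tf.keras.Input(shape=(801, 19), name="low_level_features")  # assumed shape
hlf_in = tf.keras.Input(shape=(14,), name="high_level_features")     # assumed shape

gru_out = tf.keras.layers.GRU(50)(seq_in)                  # recurrent branch over the particle sequence
merged = tf.keras.layers.Concatenate()([gru_out, hlf_in])  # join with high-level features
hidden = tf.keras.layers.Dense(25, activation="relu")(merged)
output = tf.keras.layers.Dense(3, activation="softmax")(hidden)

model = tf.keras.Model(inputs=[seq_in, hlf_in], outputs=output)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```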
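The talk also covers training with TensorFlow 2.0. A minimal, hedged example of fitting the HLF classifier with tf.keras and a MirroredStrategy follows; the synthetic arrays stand in for the prepared training data, and the shapes and hyper-parameters are placeholders rather than the talk's actual setup.

```python
# Minimal sketch of training the HLF classifier with TensorFlow 2.0 / tf.keras.
# Synthetic data, shapes, and hyper-parameters are placeholders.
import numpy as np
import tensorflow as tf

x_train = np.random.rand(1000, 14).astype("float32")  # assumed: 14 features per event
y_train = np.random.randint(0, 3, size=1000)          # assumed: 3 classes

strategy = tf.distribute.MirroredStrategy()           # use all local GPUs if present
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(50, activation="relu", input_shape=(14,)),
        tf.keras.layers.Dense(20, activation="relu"),
        tf.keras.layers.Dense(10, activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, batch_size=128)
```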
Taught by
Databricks
Related Courses
Neural Networks for Machine Learning - University of Toronto via Coursera
Good Brain, Bad Brain: Basics - University of Birmingham via FutureLearn
Statistical Learning with R - Stanford University via edX
Machine Learning 1—Supervised Learning - Brown University via Udacity
Fundamentals of Neuroscience, Part 2: Neurons and Networks - Harvard University via edX