Deep Learning Pipelines for High Energy Physics Using Apache Spark and Distributed Keras
Offered By: Databricks via YouTube
Course Description
Overview
Syllabus
Intro
Experimental High Energy Physics is Data Intensive
Key Data Processing Challenge
Data Flow at LHC Experiments
R&D - Data Pipelines
Particle Classifiers Using Neural Networks
Deep Learning Pipeline for Physics Data
Analytics Platform at CERN
Hadoop and Spark Clusters at CERN
Step 1: Data Ingestion • Read input files: 4.5 TB from custom (ROOT) format
Feature Engineering
Step 2: Feature Preparation Features are converted to formats suitable for training
Performance and Lessons Learned • Data preparation is CPU bound
Neural Network Models and
Hyper-Parameter Tuning-DNN • Hyper-parameter tuning of the DNN model
Deep Learning at Scale with Spark
Spark, Analytics Zoo and BigDL
BigDL Run as Standard Spark Programs
BigDL Parameter Synchronization
Model Development - DNN for HLF • Model is instantiated using the Keras- compatible API provided by Analytics Zoo
Model Development - GRU + HLF A more complex network topology, combining a GRU of Low Level Feature + a DNN of High Level Features
Distributed Training
Performance and Scalability of Analytics Zoo/BigDL
Results - Model Performance
Workload Characterization
Training with TensorFlow 2.0 Training and test data
Recap: our Deep Learning Pipeline with Spark
Model Serving and Future Work
Summary • The use case developed addresses the needs for higher efficiency in event filtering at LHC experiments • Spark, Python notebooks
Labeled Data for Training and Test • Simulated events Software simulators are used to generate events
Taught by
Databricks
Related Courses
Vers l'infiniment petit - Voyages de l'infiniment grand à l'infiniment petitÉcole Polytechnique via Coursera Introduction to Particle Accelerators (NPAP MOOC)
Lund University via Coursera Vers l'infiniment petit
École Polytechnique via Coursera Question Reality: Matter
Dartmouth College via Coursera The Higgs Boson and Beyond
The Great Courses Plus