From Telemetry Data to CSVs with Python, Spark and Azure Databricks
Offered By: EuroPython Conference via YouTube
Course Description
Overview
Explore an ETL solution for transforming telemetry data into CSV files using Python, Spark, and Azure Databricks in this EuroPython 2021 conference talk. Learn how Tenova, an engineering company, collects and processes data from industrial equipment to support data scientists and process engineers in developing analytics solutions and retraining AI models. Discover how Databricks Notebooks use PySpark and Pandas to turn raw JSON Lines files into formatted CSVs that meet specific requirements: one set of data per device, one file per day, and a value recorded at midnight. Gain insights into the architecture, the handling of variables, and the role of each notebook in the transformation, as well as the integration with Azure Data Factory for daily execution.
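The requirements listed in the overview (per-device data, one file per day, a value at midnight) can be illustrated with a minimal sketch. It is not the code from the talk: the talk's notebooks use PySpark and Pandas, while this example uses only the Python standard library to show the logic. The field names (`device`, `ts`, `variable`, `value`), the sample records, and the function `build_daily_csvs` are hypothetical, and reading "midnight value recording" as carrying each variable's last known value forward to 00:00 of the next file is an assumption.

```python
import csv
import io
import json
from collections import defaultdict
from datetime import datetime, time

# Hypothetical JSON Lines input: one telemetry reading per line.
RAW_JSONL = """\
{"device": "furnace-1", "ts": "2021-07-01T08:15:00+00:00", "variable": "temp", "value": 1510.0}
{"device": "furnace-1", "ts": "2021-07-01T21:40:00+00:00", "variable": "temp", "value": 1498.5}
{"device": "furnace-2", "ts": "2021-07-01T09:00:00+00:00", "variable": "temp", "value": 900.0}
{"device": "furnace-1", "ts": "2021-07-02T06:05:00+00:00", "variable": "pressure", "value": 2.1}
"""


def build_daily_csvs(jsonl_text):
    """Return {(device, 'YYYY-MM-DD'): csv_text}, one CSV per device per day."""
    # Parse each non-empty line as a JSON object and sort by timestamp.
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    for record in records:
        record["ts"] = datetime.fromisoformat(record["ts"])
    records.sort(key=lambda record: record["ts"])

    rows = defaultdict(list)  # (device, day) -> [(timestamp, variable, value)]
    last = defaultdict(dict)  # device -> {variable: last known value}

    for record in records:
        device = record["device"]
        day = record["ts"].date()
        key = (device, day.isoformat())
        if key not in rows:
            # First reading of a new day for this device: assumed rule is to
            # start the file with every variable's last known value at 00:00.
            midnight = datetime.combine(day, time.min, tzinfo=record["ts"].tzinfo)
            for variable, value in sorted(last[device].items()):
                rows[key].append((midnight, variable, value))
        rows[key].append((record["ts"], record["variable"], record["value"]))
        last[device][record["variable"]] = record["value"]

    # Render each (device, day) group as CSV text.
    files = {}
    for key, group in rows.items():
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["timestamp", "variable", "value"])
        for timestamp, variable, value in group:
            writer.writerow([timestamp.isoformat(), variable, value])
        files[key] = buffer.getvalue()
    return files


if __name__ == "__main__":
    for (device, day), text in sorted(build_daily_csvs(RAW_JSONL).items()):
        print(f"--- {device}_{day}.csv")
        print(text, end="")
```

In this sketch a device with no readings on a given day gets no file for that day. A production pipeline would likely also persist the last known values between daily runs, which is presumably the purpose of the "Update Last Values Notebook" named in the syllabus.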
Syllabus
Intro
Title
Agenda
Nicol Giso
Customer plants
Tenova Industrial IoT platform
Data security
Data application
Sample file
Requirements
Architecture
Variables
Merge Notebook
Update Last Values Notebook
Read Table Notebook
Update DataFrame
Timestamp Notebook
Data Notebook
Databricks Release Pipeline
Azure Databricks
Next steps
Questions
Taught by
EuroPython Conference
Related Courses
Interprofessional Healthcare Informatics
University of Minnesota via Coursera
Data Science at Scale - Capstone Project
University of Washington via Coursera
Implementing ETL with SQL Server Integration Services
Microsoft via edX
Introduzione a R (Introduction to R)
University of Modena and Reggio Emilia via EduOpen
Практики работы с данными средствами Power Query и Power Pivot (Data Handling Practices with Power Query and Power Pivot)
Saint Petersburg State University via Coursera