From Telemetry Data to CSVs with Python, Spark and Azure Databricks
Offered By: EuroPython Conference via YouTube
Course Description
Overview
Explore an ETL solution for transforming telemetry data into CSV files using Python, Spark, and Azure Databricks in this EuroPython 2021 conference talk. Learn how Tenova, an engineering company, collects and processes data from industrial equipment so that data scientists and process engineers can develop analytics solutions and retrain AI models. Discover how Databricks Notebooks use PySpark and Pandas to turn raw JSON Lines files into formatted CSVs that meet three requirements: data separated by device, one file generated per day, and a value recorded at midnight. Gain insights into the architecture, the handling of variables, and the role of each notebook in the transformation, as well as the integration with Azure Data Factory for daily execution.
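The three requirements named in the overview (device-specific data, daily files, midnight values) can be illustrated with a minimal sketch. This is not the notebooks shown in the talk: it uses only the Python standard library as a stand-in for the PySpark and Pandas logic, and the field names (`deviceId`, `timestamp`, `variable`, `value`), the sample records, and the function `build_daily_csvs` are all assumptions made for illustration. It also assumes the midnight value is the last known value carried forward from earlier data.

```python
import csv
import io
import json
from collections import defaultdict
from datetime import datetime

# Hypothetical raw telemetry in JSON Lines form; field names are assumptions.
RAW_JSONL = """\
{"deviceId": "furnace-1", "timestamp": "2021-07-01T22:15:00", "variable": "temp", "value": 1510.0}
{"deviceId": "furnace-1", "timestamp": "2021-07-02T06:30:00", "variable": "temp", "value": 1525.5}
{"deviceId": "furnace-2", "timestamp": "2021-07-02T09:00:00", "variable": "temp", "value": 1490.0}
"""


def build_daily_csvs(jsonl_text):
    """Return {(device, 'YYYY-MM-DD'): csv_text}, one CSV per device per day."""
    # Parse one JSON object per non-empty line.
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    # ISO 8601 timestamps in the same format sort chronologically as strings.
    records.sort(key=lambda r: (r["deviceId"], r["timestamp"]))

    # Group records by device and calendar day.
    grouped = defaultdict(list)
    for r in records:
        day = datetime.fromisoformat(r["timestamp"]).date().isoformat()
        grouped[(r["deviceId"], day)].append(r)

    last_values = defaultdict(dict)  # device -> {variable: last value seen}
    outputs = {}
    for device, day in sorted(grouped):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["timestamp", "variable", "value"])
        # Midnight rows: carry forward the last known value of each variable
        # (assumed interpretation of the "midnight value" requirement).
        for variable, value in sorted(last_values[device].items()):
            writer.writerow([f"{day}T00:00:00", variable, value])
        # Rows for the readings that actually occurred on this day.
        for r in grouped[(device, day)]:
            writer.writerow([r["timestamp"], r["variable"], r["value"]])
            last_values[device][r["variable"]] = r["value"]
        outputs[(device, day)] = buf.getvalue()
    return outputs


daily_csvs = build_daily_csvs(RAW_JSONL)
for (device, day), text in sorted(daily_csvs.items()):
    print(f"--- {device}_{day}.csv ---")
    print(text, end="")
```

In this sketch a file is produced only for days on which a device reported data, and the last values are held in memory. A scheduled daily job would need to persist them between runs, which is presumably the role of the "Update Last Values Notebook" listed in the syllabus.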
Syllabus
Intro
Title
Agenda
Nicolò Giso
Customer plants
Tenova Industrial IoT platform
Data security
Data application
Sample file
Requirements
Architecture
Variables
Merge Notebook
Update Last Values Notebook
Read Table Notebook
Update DataFrame
Timestamp Notebook
Data Notebook
Databricks Release Pipeline
Azure Databricks
Next steps
Questions
Taught by
EuroPython Conference
Related Courses
Understanding China, 1700-2000: A Data Analytic Approach, Part 1 (The Hong Kong University of Science and Technology via Coursera)
The Analytics Edge (Massachusetts Institute of Technology via edX)
大数据与信息传播 Big Data and Information Dissemination (Fudan University via Coursera)
The Future of Fashion (Marist College via Independent)
The Mobile Consumer (Marist College via Independent)