From Telemetry Data to CSVs with Python, Spark and Azure Databricks
Offered By: EuroPython Conference via YouTube
Course Description
Overview
Explore an ETL solution for transforming telemetry data into CSV files using Python, Spark, and Azure Databricks in this EuroPython 2021 conference talk. Learn how Tenova, an engineering company, collects and processes data from industrial equipment to support data scientists and process engineers in developing analytics solutions and retraining AI models. Discover the implementation of Databricks Notebooks that use PySpark and Pandas to transform raw JSON Lines files into formatted CSVs, meeting requirements such as per-device output files, daily file generation, and recording of values at midnight. Gain insights into the architecture, variable handling, and the individual notebooks involved in this data transformation process, as well as the integration with Azure Data Factory for daily execution.
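The notebooks themselves are not published alongside the talk, so the following is only a minimal sketch of the kind of pipeline described above: reading JSON Lines telemetry with PySpark and writing one CSV per device per day via Pandas. The paths, column names (device_id, timestamp, variable, value), and file-naming scheme are illustrative assumptions, not the talk's actual schema; the midnight-value step covered in the talk is omitted here.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical locations; the talk does not disclose its storage layout.
RAW_PATH = "/mnt/telemetry/raw/*.jsonl"
OUT_DIR = "/mnt/telemetry/csv"

spark = SparkSession.builder.appName("telemetry-to-csv").getOrCreate()

# JSON Lines holds one JSON object per line, which spark.read.json parses directly.
raw = spark.read.json(RAW_PATH)

# Assumed schema: each record carries a device id, a timestamp, a variable name, and a value.
daily = (
    raw.withColumn("date", F.to_date("timestamp"))
       .select("device_id", "date", "timestamp", "variable", "value")
)

# Write one CSV per device per day, mirroring the per-device, daily-file requirement.
for row in daily.select("device_id", "date").distinct().collect():
    subset = daily.filter(
        (F.col("device_id") == row["device_id"]) & (F.col("date") == row["date"])
    )
    # toPandas() collects the (small) daily slice so it can be written as a single CSV file.
    subset.toPandas().to_csv(
        f"{OUT_DIR}/{row['device_id']}_{row['date']}.csv", index=False
    )
```

In a Databricks deployment such a notebook would typically be triggered once a day by an Azure Data Factory pipeline, as the talk describes for its own setup.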
Syllabus
Intro
Title
Agenda
Nicolò Giso
Customer plants
Tenova Industrial IoT platform
Data security
Data application
Sample file
Requirements
Architecture
Variables
Merge Notebook
Update Last Values Notebook
Read Table Notebook
Update DataFrame
Timestamp Notebook
Data Notebook
Databricks Release Pipeline
Azure Databricks
Next steps
Questions
Taught by
EuroPython Conference
Related Courses
Azure Data Engineer con Databricks y Azure Data Factory (Coursera Project Network via Coursera)
Operationalizing Microsoft Azure AI Solutions (Pluralsight)
Building Your First ETL Pipeline Using Azure Databricks (Pluralsight)
Implementing an Azure Databricks Environment in Microsoft Azure (Pluralsight)
Building Batch Data Processing Solutions in Microsoft Azure (Pluralsight)