
From Telemetry Data to CSVs with Python, Spark and Azure Databricks

Offered By: EuroPython Conference via YouTube


Course Description

Overview

Explore an ETL solution that transforms telemetry data into CSV files using Python, Spark, and Azure Databricks in this EuroPython 2021 conference talk. Learn how Tenova, an engineering company, collects and processes data from industrial equipment so that data scientists and process engineers can develop analytics solutions and retrain AI models. See how Databricks notebooks use PySpark and pandas to turn raw JSON Lines files into formatted CSVs that meet specific requirements: data split by device, one file generated per day, and values recorded at midnight. The talk covers the architecture, how variables are handled, the role of each notebook in the transformation, and the integration with Azure Data Factory, which runs the process daily.
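To make the three requirements concrete, here is a minimal sketch using only the Python standard library. It is not the talk's implementation (which uses PySpark and pandas inside Databricks notebooks). Everything below is an assumption for illustration: the function name `jsonl_to_daily_csvs`, the record fields `device`, `ts`, `var` and `value`, the long-format CSV layout, naive ISO 8601 timestamps, and the reading of "midnight value" as carrying each variable's last known value forward to 00:00 of the next day.

```python
import csv
import io
import json
from collections import defaultdict
from datetime import datetime, time


def jsonl_to_daily_csvs(lines, last_values=None):
    """Split telemetry JSON Lines into one CSV text per (device, day).

    Hypothetical record schema (an assumption, not from the talk):
        {"device": "d1", "ts": "2021-07-01T08:15:00", "var": "temp", "value": 20.5}
    Each day's CSV starts with a 00:00:00 row per variable holding the last
    value seen before that day (from `last_values` or from earlier days).
    Returns (files, state): files maps (device, "YYYY-MM-DD") -> CSV text,
    state maps device -> {variable: last value} for use in the next run.
    """
    # Copy the incoming last values so the caller's dict is not mutated.
    state = {dev: dict(vals) for dev, vals in (last_values or {}).items()}
    grouped = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the .jsonl input
        rec = json.loads(line)
        ts = datetime.fromisoformat(rec["ts"])
        grouped[(rec["device"], ts.date())].append((ts, rec["var"], rec["value"]))

    files = {}
    # Sorting by (device, date) processes each device's days chronologically,
    # so midnight values carry forward from the most recent earlier day.
    for device, day in sorted(grouped):
        dev_state = state.setdefault(device, {})
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["timestamp", "variable", "value"])
        # Midnight rows: last known value of every variable for this device.
        midnight = datetime.combine(day, time.min).isoformat()
        for var in sorted(dev_state):
            writer.writerow([midnight, var, dev_state[var]])
        # The day's own readings, in time order; each one updates the state.
        for ts, var, value in sorted(grouped[(device, day)], key=lambda r: (r[0], r[1])):
            writer.writerow([ts.isoformat(), var, value])
            dev_state[var] = value
        files[(device, day.isoformat())] = buf.getvalue()
    return files, state


if __name__ == "__main__":
    sample = [
        '{"device": "d1", "ts": "2021-07-01T10:00:00", "var": "temp", "value": 20.5}',
        '{"device": "d1", "ts": "2021-07-02T09:30:00", "var": "temp", "value": 21.0}',
    ]
    files, last = jsonl_to_daily_csvs(sample)
    print(files[("d1", "2021-07-02")])
```

In this sketch the 2021-07-02 file opens with a `2021-07-02T00:00:00,temp,20.5` row carried over from the previous day. The returned `state` can be persisted between daily runs so that the first file of a new run still gets its midnight values; a device with no readings on a given day produces no file.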

Syllabus

Intro
Title
Agenda
Nicolò Giso
Customer plants
Tenova Industrial IoT platform
Data security
Data application
Sample file
Requirements
Architecture
Variables
Merge Notebook
Update Last Values Notebook
Read Table Notebook
Update DataFrame
Timestamp Notebook
Data Notebook
Databricks Release Pipeline
Azure Databricks
Next steps
Questions


Taught by

EuroPython Conference
