YoVDO

From Telemetry Data to CSVs with Python, Spark and Azure Databricks

Offered By: EuroPython Conference via YouTube

Tags

EuroPython Courses
Python Courses
pandas Courses
PySpark Courses
Data Transformation Courses
Data Analytics Courses
Azure Databricks Courses

Course Description

Overview

Explore an ETL solution for transforming telemetry data into CSV files using Python, Spark, and Azure Databricks in this EuroPython 2021 conference talk. Learn how Tenova, an engineering company, collects and processes data from industrial equipment so that data scientists and process engineers can develop analytics solutions and retrain AI models. See how Databricks notebooks use PySpark and pandas to turn raw JSON Lines files into formatted CSVs that meet specific requirements: one set of files per device, one file per day, and a recorded value at midnight. The talk also covers the architecture, variable handling, the role of each notebook in the transformation, and the integration with Azure Data Factory for daily execution.
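The requirements named above (per-device files, one file per day, a value recorded at midnight) can be illustrated with a small sketch. This is not the code from the talk: it uses pandas only rather than PySpark, and the field names (device, ts, variable, value), the sample records, the file naming scheme, and the function name daily_frames are all assumptions made for illustration. It also interprets "midnight value" as carrying the last known reading forward to 00:00 of each day, which is one plausible reading of the requirement.

```python
import json

import pandas as pd

# Hypothetical JSON Lines input; the field names are assumptions.
RAW = """\
{"device": "furnace-1", "ts": "2021-07-01T22:15:00", "variable": "temp", "value": 1510.0}
{"device": "furnace-1", "ts": "2021-07-02T06:30:00", "variable": "temp", "value": 1495.5}
{"device": "furnace-2", "ts": "2021-07-02T09:00:00", "variable": "temp", "value": 870.0}
"""


def daily_frames(raw_jsonl: str) -> dict:
    """Return {file_name: DataFrame}, one frame per device per day."""
    records = [json.loads(line) for line in raw_jsonl.splitlines() if line.strip()]
    df = pd.DataFrame(records)
    df["ts"] = pd.to_datetime(df["ts"])
    df = df.sort_values("ts")

    frames = {}
    for device, dev_df in df.groupby("device"):
        # One row per timestamp, one column per variable.
        wide = dev_df.pivot_table(
            index="ts", columns="variable", values="value", aggfunc="last"
        )
        # Add a 00:00 row for every day that has data.
        days = wide.index.normalize().unique()
        wide = wide.reindex(wide.index.union(days)).sort_index()
        wide.index.name = "ts"
        # Fill only the midnight rows with the last known values.
        filled = wide.ffill()
        wide.loc[days] = filled.loc[days]
        # A midnight row with no earlier reading carries no information.
        wide = wide.dropna(how="all")

        for day, day_df in wide.groupby(wide.index.normalize()):
            frames[f"{device}_{day:%Y-%m-%d}.csv"] = day_df
    return frames


frames = daily_frames(RAW)
```

Each frame could then be written out with day_df.to_csv(file_name). In a Databricks setting the same grouping would more likely be expressed with PySpark before handing smaller per-device frames to pandas, but that split is a guess about the design rather than something the description states.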

Syllabus

Intro
Title
Agenda
Nicol Giso
Customer plants
Tenova Industrial IoT platform
Data security
Data application
Sample file
Requirements
Architecture
Variables
Merge Notebook
Update Last Values Notebook
Read Table Notebook
Update DataFrame
Timestamp Notebook
Data Notebook
Databricks Release Pipeline
Azure Databricks
Next steps
Questions


Taught by

EuroPython Conference

Related Courses

Interprofessional Healthcare Informatics
University of Minnesota via Coursera
Data Science at Scale - Capstone Project
University of Washington via Coursera
Implementing ETL with SQL Server Integration Services
Microsoft via edX
Introduction to R
University of Modena and Reggio Emilia via EduOpen
Data Handling Practices with Power Query and Power Pivot
Saint Petersburg State University via Coursera