Using Open Source Tech to Swap Out Components of Your Data Pipeline

Offered By: Devoxx via YouTube

Tags

Devoxx Courses Java Courses Cloud Computing Courses Apache Airflow Courses Apache Kafka Courses Apache Beam Courses Data Processing Courses Data Pipelines Courses

Course Description

Overview

Explore the evolution of data pipelines in this 48-minute conference talk from Devoxx. Learn how open-source technologies like Apache Beam and Apache Airflow have revolutionized data processing, offering flexibility and cost-effectiveness compared to traditional monolithic stacks. Discover how to schedule and run both streaming and batch data processing jobs using the same underlying code. Follow a practical demonstration of building a data pipeline that connects Apache Kafka, Hadoop, Flink, and Hive, and see how to easily transition to Pub/Sub, Dataflow, and BigQuery by modifying a few lines of Java in Apache Beam. Gain insights into deploying these solutions across various cloud platforms, including Oracle Cloud. Presented by Rustam Mehmandarov, a Java Champion and Google Developers Expert for Cloud, this talk covers pipeline definition, data overview, parallel processing, file structures, local and cloud runners, and key takeaways for implementing flexible, scalable data pipelines using open-source technologies.
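To give a flavor of the component swap the talk describes, below is a minimal Apache Beam sketch in Java. It is not code from the presentation: the class name, broker address, topic name, uppercase transform, and output path are illustrative assumptions. The point it demonstrates is that the source, sink, and runner can each be swapped with a small, local change, while the transform logic in the middle stays the same.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SwappablePipeline {
  public static void main(String[] args) {
    // The runner is picked on the command line, not in the code, e.g.
    //   --runner=FlinkRunner                                  (on-premises Flink)
    //   --runner=DataflowRunner --project=... --region=...    (Google Cloud Dataflow)
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    Pipeline p = Pipeline.create(options);

    // Source: string values from a Kafka topic (broker and topic names are made up).
    // The record cap keeps this example a bounded batch job; drop it and add
    // windowing to run the same code as a streaming job.
    PCollection<String> lines =
        p.apply("ReadFromKafka",
                KafkaIO.<String, String>read()
                    .withBootstrapServers("kafka:9092")
                    .withTopic("events")
                    .withKeyDeserializer(StringDeserializer.class)
                    .withValueDeserializer(StringDeserializer.class)
                    .withMaxNumRecords(1000)
                    .withoutMetadata())
         .apply("Values", Values.create());

    // Moving to the cloud only touches the source, e.g. (hypothetical topic path):
    //   p.apply("ReadFromPubSub",
    //       PubsubIO.readStrings().fromTopic("projects/my-project/topics/events"));

    // The transform in the middle is runner-agnostic and stays untouched.
    PCollection<String> upper =
        lines.apply("ToUpperCase",
            MapElements.into(TypeDescriptors.strings())
                .via((String s) -> s.toUpperCase()));

    // Sink: plain text files on local disk or HDFS; swap in
    // BigQueryIO.writeTableRows() when the target is BigQuery.
    upper.apply("WriteOutput", TextIO.write().to("output/result"));

    p.run().waitUntilFinish();
  }
}

The same compiled pipeline can then be launched against a local runner, a Flink cluster, or Dataflow purely by changing command-line flags, which is the flexibility the talk argues for.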

Syllabus

Introduction
About Rustam
Pipeline definition
Data overview
Tabular form
Parallel processing
Parallelism
File Structure
Local Runner
Cloud Runner
Dataflow
File System
Build Success
Takeaways


Taught by

Devoxx

Related Courses

Coding the Matrix: Linear Algebra through Computer Science Applications
Brown University via Coursera
كيف تفكر الآلات - مقدمة في تقنيات الحوسبة
King Fahd University of Petroleum and Minerals via Rwaq (رواق)
Datascience et Analyse situationnelle : dans les coulisses du Big Data
IONIS via IONIS
Data Lakes for Big Data
EdCast
統計学Ⅰ:データ分析の基礎 (ga014)
University of Tokyo via gacco