YoVDO

Tools and Approaches for Migrating Big Datasets to the Cloud

Offered By: Devoxx via YouTube

Tags

Devoxx Courses
Big Data Courses
Cloud Architecture Courses
Cloud Migration Courses

Course Description

Overview

Explore tools and strategies for migrating large-scale datasets to cloud platforms in this 47-minute Devoxx conference talk. Delve into the experiences of the Hotels.com big data platform team as they tackle the challenges of moving extensive data sets and pipelines from on-premises clusters to cloud-based solutions. Discover two open-source tools developed to overcome unexpected obstacles: Circus Train, a dataset replication tool for copying Hive tables between clusters and clouds, and Waggle Dance, a federated Hive query service enabling data querying across multiple Hive metastores. Learn about the unique features of these tools, their advantages over existing solutions, and how they've been successfully implemented to build a petabyte-scale platform now utilized by other Expedia brands. Gain insights into real-world problems and solutions encountered in a large, organically grown corporation, moving beyond idealized architectures to practical applications in big data migration.

Syllabus

Introduction
Agenda
Company structure
Data processing
Migrating jobs first
It's going to take years
Dataset replication
Finding an open source solution
Naming your project
Configuration
Distributed Copy
Hive diff
Other features
Bridging multiple clusters
Waggle Dance
Hive CLI example
Priori pattern
Cloud architecture


Taught by

Devoxx

Related Courses

Cloud Computing Engineering and Management
University System of Maryland via edX
Migrating Workloads to Azure
Microsoft via edX
Exam Readiness: AWS Certified Solutions Architect - Professional (Digital)
Amazon via Independent
AWS Fundamentals: Migrating to the Cloud
Amazon Web Services via Coursera
Upgrade2Success – Mastering HCM Migration
SAP Learning