Tools and Approaches for Migrating Big Datasets to the Cloud
Offered By: Devoxx via YouTube
Course Description
Overview
Explore tools and strategies for migrating large-scale datasets to cloud platforms in this 47-minute Devoxx conference talk. Delve into the experiences of the Hotels.com big data platform team as they tackle the challenges of moving large datasets and pipelines from on-premises clusters to the cloud. Discover two open-source tools developed to overcome unexpected obstacles: Circus Train, a dataset replication tool for copying Hive tables between clusters and clouds, and Waggle Dance, a federated Hive query service that enables querying data across multiple Hive metastores. Learn about the unique features of these tools, their advantages over existing solutions, and how they have been used to build a petabyte-scale platform that other Expedia brands now use. Gain insights into real-world problems and solutions encountered in a large, organically grown corporation, moving beyond idealized architectures to practical applications in big data migration.
Syllabus
Introduction
Agenda
Company structure
Data processing
Migrating jobs first
It's going to take years
Dataset replication
Finding an open source solution
Naming your project
Configuration
Distributed Copy
Hive Diff
Other features
Bridging multiple clusters
Waggle Dance
Hive CLI example
Priori pattern
Cloud architecture
Taught by
Devoxx
Related Courses
Cloud Computing Engineering and Management
University System of Maryland via edX
Migrating Workloads to Azure
Microsoft via edX
Exam Readiness: AWS Certified Solutions Architect - Professional (Digital)
Amazon via Independent
AWS Fundamentals: Migrating to the Cloud
Amazon Web Services via Coursera
Upgrade2Success – Mastering HCM Migration
SAP Learning