YoVDO

Chameleon: Expanding Open-Source Ambari for HPC

Offered By: Linux Foundation via YouTube

Tags

High Performance Computing Courses, Scientific Computing Courses, InfiniBand Courses

Course Description

Overview

Explore a conference talk detailing the Chameleon project, which expands Apache Ambari for High Performance Computing (HPC) environments. Learn about the convergence of HPC and big data processing, and how Chameleon addresses the growing demand for advanced data processing capabilities in scientific computing. Discover the project's key features, including Lustre filesystem management, HPC resource monitoring for GPUs and Infiniband, and enhanced YARN application monitoring using Linux performance tools. Gain insights into the motivation behind Chameleon, its architecture for Hadoop-on-Lustre execution, and the implementation of dynamic metrics management. Understand how Chameleon streamlines HPC-based big data platform operations and management through its dynamic dashboard, bridging the gap between traditional HPC simulations and modern big data analytics.

Syllabus

Chameleon: Expanding Open-Source Ambari for HPC
Motivation (Trend Perspective): HPC and big data are converging
Motivation (Application Perspective): Genome analysis by our scientists; some stages of the data pipeline begin to support big data platforms
Ambari Overview: Apache Ambari is a 100% open-source platform for provisioning, managing, and monitoring Hadoop clusters
Extension Points for Custom Service Development: An Ambari view is a plugin that provides a way to connect custom functions to the web UI; an Ambari stack defines everything needed to define services such as HDFS and YARN
lustrefs Management Service: Lustre kernel installation function (LustrekernelUpdater)
Account Management Service: Hadoop does not support strong authentication by default; Hadoop supports Kerberos for that, but causes performance
Hadoop-on-Lustre Architecture: Comparison between HDFS and Lustre
Related Works for Hadoop-on-Lustre: Xyratex: a MapReduce job shows theoretical performance gains on an appropriately designed Lustre-based HPC cluster with an InfiniBand network; Seagate's lustrefs plugin
Hadoop-on-Lustre Execution Environment: Works for diskless clusters backed by Lustre; uses secure container configuration for multi-tenancy
Motivation: Dynamic metrics management is required
YARN Application Monitoring Service: Time-series data monitoring
TimescaleDB: Open-source time-series database optimized for fast
Data Management Structure: Alter Table
HPC Resources Monitoring: Provides HPC monitoring information through the web UI
Summary: HPC and big data convergence is erasing the distinction between the data analytics and computational science ecosystems. Chameleon is a big data platform operation and management system designed for HPC environments. Chameleon helps merge the Hadoop ecosystem with Lustre.


Taught by

Linux Foundation

Related Courses

Scientific Computing
University of Washington via Coursera
Biology Meets Programming: Bioinformatics for Beginners
University of California, San Diego via Coursera
High Performance Scientific Computing
University of Washington via Coursera
Practical Numerical Methods with Python
George Washington University via Independent
Julia Scientific Programming
University of Cape Town via Coursera