Production Machine Learning Systems
Offered By: Google via Google Cloud Skills Boost
Course Description
Overview
This course covers how to implement the various flavors of production ML systems: static, dynamic, and continuous training; static and dynamic inference; and batch and online processing. You delve into TensorFlow abstraction levels, the various options for distributed training, and how to write distributed training models with custom estimators. This is the second course in the Advanced Machine Learning on Google Cloud series. After completing this course, enroll in the Image Understanding with TensorFlow on Google Cloud course.
Syllabus
- Introduction to Advanced Machine Learning on Google Cloud
  - Advanced Machine Learning on Google Cloud
  - Welcome
- Architecting Production ML Systems
  - Architecting ML systems
  - Data extraction, analysis, and preparation
  - Model training, evaluation, and validation
  - Trained model, prediction service, and performance monitoring
  - Training design decisions
  - Serving design decisions
  - Designing from scratch
  - Using Vertex AI
  - Lab introduction: Structured data prediction
  - Structured data prediction using Vertex AI Platform
  - Quiz: Architecting production ML systems
  - Readings: Architecting production ML systems
- Designing Adaptable ML Systems
  - Introduction
  - Adapting to data
  - Changing distributions
  - Lab: Adapting to data
  - Right and wrong decisions
  - System failure
  - Concept drift
  - Actions to mitigate concept drift
  - TensorFlow data validation
  - Components of TensorFlow data validation
  - Lab Introduction: Introduction to TensorFlow Data Validation
  - Introduction to TensorFlow Data Validation
  - Lab Introduction: Advanced Visualizations with TensorFlow Data Validation
  - Advanced Visualizations with TensorFlow Data Validation
  - Mitigating training-serving skew through design
  - Vertex AI: Training and Serving a Custom Model
  - Diagnosing a production model
  - Quiz: Designing adaptable ML systems
  - Readings: Designing adaptable ML systems
- Designing High-Performance ML Systems
  - Introduction
  - Training
  - Predictions
  - Why distributed training is needed
  - Distributed training architectures
  - TensorFlow distributed training strategies
  - Mirrored strategy
  - Multi-worker mirrored strategy
  - TPU strategy
  - Parameter server strategy
  - Lab Introduction: Distributed Training with Keras
  - Distributed Training with Keras
  - Training on large datasets with tf.data API
  - Lab Introduction: TPU-speed Data Pipelines
  - TPU Speed Data Pipelines
  - Inference
  - Quiz: Designing high-performance ML systems
  - Readings: Designing high-performance ML systems
- Building Hybrid ML Systems
  - Introduction
  - Machine Learning on Hybrid Cloud
  - Kubeflow
  - Lab Introduction: Kubeflow Pipelines with AI Platform
  - Running Pipelines on Vertex AI 2.5
  - TensorFlow Lite
  - Optimizing TensorFlow for mobile
  - Summary
  - Quiz: Hybrid ML systems
  - Readings: Hybrid ML systems
- Summary
  - Course summary
  - Production Machine learning systems - readings
  - All quiz questions and answers
- Course Resources
  - Architecting Production ML Systems Course Resources
- Your Next Steps
  - Course Badge
Related Courses
- Google Cloud Fundamentals: Core Infrastructure (Google via Coursera)
- Google Cloud Big Data and Machine Learning Fundamentals (Google Cloud via Coursera)
- Serverless Data Analysis with Google BigQuery and Cloud Dataflow en Français (Google Cloud via Coursera)
- Essential Google Cloud Infrastructure: Foundation (Google Cloud via Coursera)
- Elastic Google Cloud Infrastructure: Scaling and Automation (Google Cloud via Coursera)