Distributed Multi-worker TensorFlow Training on Kubernetes
Offered By: Google via Google Cloud Skills Boost
Course Description
Overview
In this hands-on lab, you will use Google Kubernetes Engine (GKE) and Kubeflow TFJob to scale out distributed TensorFlow training.
Syllabus
- GSP775
- Overview
- Setup and requirements
- Lab tasks
- Task 1. Creating a GKE cluster
- Task 2. Deploying
- Task 3. Creating a Cloud Storage bucket
- Task 4. Preparing TFJob
- Task 5. Submitting the TFJob
- Task 6. Monitoring the TFJob
- Congratulations
Related Courses
- Custom and Distributed Training with TensorFlow (DeepLearning.AI via Coursera)
- Architecting Production-ready ML Models Using Google Cloud ML Engine (Pluralsight)
- Building End-to-end Machine Learning Workflows with Kubeflow (Pluralsight)
- Deploying PyTorch Models in Production: PyTorch Playbook (Pluralsight)
- Inside TensorFlow (TensorFlow via YouTube)