Multimodal Generative AI: Vision, Speech, and Assistants

Offered By: Codio via Coursera

Tags

Generative AI Courses, Computer Vision Courses, ChatGPT Courses, Speech to Text Courses, Text to Speech Courses, Multimodal AI Courses

Course Description

Overview

We are introducing a new course to replace the "Coding with ChatGPT" course in the Generative AI specialization. This updated course covers materials, models, and content released in 2024. New additions include material on using AI for image-to-text (vision), text-to-speech, speech-to-text, and the Assistants API. All of these topics come with new labs, lessons, and exercises.

Syllabus

  • Image to text
    • Welcome to Week 1 of the course. These assignments cover vision and image-to-text capabilities. You'll learn how to analyze and interpret images using AI. The module ends with graded summative assessments.
  • Text to Speech
    • Welcome to Week 2 of the course. This week focuses on understanding the fundamentals of text-to-speech (TTS). These assignments cover generating spoken audio in different voices. The module ends with graded summative assessments.
  • Speech to Text
    • Welcome to Week 3 of the course. You'll learn the basics of Whisper and work with ChatGPT to enhance and optimize your use of the Whisper API. The module ends with graded summative assessments.
  • Assistants
    • Welcome to Week 4 of the course. These assignments cover the basics of the Assistants API, including its purpose, primary components, and available tools such as Code Interpreter, File Search, and Function Calling. The module ends with graded summative assessments.

Taught by

Kevin Noelsaint

Related Courses

The AI Engineer Path
Scrimba
AWS Cloud Quest: Machine Learning
Amazon Web Services via AWS Skill Builder
AWS SimuLearn: Text-to-Speech
Amazon Web Services via AWS Skill Builder
Build a Serverless Text-to-Speech Application with Amazon Polly
Amazon Web Services via AWS Skill Builder
Build a Serverless Text-to-Speech Application with Amazon Polly (Korean)
Amazon Web Services via AWS Skill Builder