YoVDO

Multimodal Generative AI: Technology Overview and Business Implications

Offered By: Applied Singularity via YouTube

Tags

Multimodal AI Courses, Data Science Courses, Machine Learning Courses, Deep Learning Courses, Computer Vision Courses, Neural Networks Courses, Generative AI Courses, LLaVA Courses

Course Description

Overview

Explore multimodal generative AI in this 1-hour 38-minute conference talk from Microsoft Reactor Bengaluru. Delve into the technical aspects of training generative AI systems that handle multiple input types at once, including text, images, and audio. Learn about the business applications, limitations, and costs of these systems. Gain insights into LLaVA (Large Language-and-Vision Assistant), an open-source multimodal system. Discover key concepts such as data gathering, outliers, nonlinearities, and the differences between statistics and AI. Examine practical examples such as reading analog gauges and building conversational systems. Understand the architecture, the training data sets, and challenges such as catastrophic forgetting and the repeatability crisis. Investigate advanced topics including eigenvalue decomposition and visual inspection techniques. Access the accompanying presentation slides for a deeper understanding of the material covered.

Syllabus

Intro
What is a Mode
Why Everyone Gathers Data
Outliers
Assumption of Distribution
Nonlinearities
Statistics vs AI
Vision vs Text
Netnet
Demo
Analog Gauges
Conversational Systems
Architecture
Example
How Did They Get the Yes
Where Did They Get the Data
Training Data Set
Mechanical Turk
Catastrophic Forgetting
Repeatability Crisis
Why I Love This Space
Eigenvalue Decomposition
Visual Inspection


Taught by

Applied Singularity

Related Courses

LLaVA: The New Open Access Multimodal AI Model
1littlecoder via YouTube
Autogen and Local LLMs Create Realistic Stable Diffusion Model Autonomously
kasukanra via YouTube
Image Annotation with LLaVA and Ollama
Sam Witteveen via YouTube
Unraveling Multimodality with Large Language Models
Linux Foundation via YouTube
Efficient and Portable AI/LLM Inference on the Edge Cloud - Workshop
Linux Foundation via YouTube