YoVDO

Autogen and Local LLMs Create Realistic Stable Diffusion Model Autonomously

Offered By: kasukanra via YouTube

Tags

Stable Diffusion Courses Artificial Intelligence Courses Image Processing Courses Selenium WebDriver Courses AI Agents Courses AutoGen Courses LLaVA Courses llama.cpp Courses

Course Description

Overview

Explore an in-depth video tutorial on harnessing AI Agents and local large language models to autonomously create a realistic style model for SDXL. Dive into the technical aspects of setting up and utilizing various AI frameworks, including LLaVA, Mistral, and llama.cpp. Learn about installing and configuring Chrome and Chromedriver, working with Selenium WebDriver, and implementing Autogen code. Discover techniques for image fetching, processing, and upscaling using Topaz Gigapixel AI. Follow along as the presenter demonstrates the installation and usage of text-generation-webui, explores API interactions, and compares different prompting strategies. Gain insights into troubleshooting common issues and optimizing model performance. This comprehensive guide covers everything from initial setup to advanced implementation, providing valuable knowledge for AI enthusiasts and developers working with SDXL and large language models.
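The image-fetching step described above — reading the page source and swapping each low-resolution link for its highest-resolution counterpart — can be sketched roughly as follows. The URL suffix convention (`_thumb`/`_full`) and helper names are assumptions for illustration only; the actual site markup handled in the video will differ.

```python
# Hypothetical sketch of the fetch-images step: scan raw page source for
# image URLs and rewrite each thumbnail link to a full-resolution link.
# The "_thumb" -> "_full" suffix pattern is an assumed convention.
import re

def upscale_link(url: str) -> str:
    """Rewrite a low-resolution image URL to its highest-resolution form."""
    return re.sub(r"_thumb(\.\w+)$", r"_full\1", url)

def extract_image_links(page_source: str) -> list:
    """Pull candidate image URLs out of raw page source."""
    return re.findall(r'https?://[^"\'\s]+\.(?:jpg|jpeg|png)', page_source)

if __name__ == "__main__":
    html = '<img src="https://example.com/art/portrait_thumb.jpg">'
    links = [upscale_link(u) for u in extract_image_links(html)]
    print(links)  # ['https://example.com/art/portrait_full.jpg']
```

In the video, the page source itself is obtained through Selenium WebDriver rather than a static download, since the gallery pages render their image grid with JavaScript.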

Syllabus

Introduction
Technical Design Flowchart
Installing Chrome
Chromedriver not available and how to fix it
Testing Selenium WebDriver
Autogen code overview
AI Agents more in-depth
Fetch image overview
Accessing page source
Gotcha with page source
Renaming low resolution link to highest resolution link
Testing the fetch_images script
Revisiting the Autogen code
Autogen in action
Checking the downloaded images
Organizing the images
Using Topaz Gigapixel AI to upscale images
Loading LLM framework overview
Installing text-generation-webui
Showing git hash for text-generation-webui
Downloading llava-v1.5-13b-GPTQ
Support LLaVA v1.5 pull request
Commit log for LLaVA v1.5
Original LLaVA v1.5-13b repository
Possible way to load llava-v1.5-13b using the --load-in-4bit flag in the readme
Downloading the model through CLI
Model Placement in text-generation-webui directory
Multimodal documentation for starting up API
Command to start the server
text-generation-webui GUI
Looking at pull request to see suggested settings
Changing presets in text-generation-webui
Initial trials in the GUI
Comparing concise and verbose prompt instruction
Testing out the text-generation-webui API
Getting the IP address of Windows from inside Linux
Finding the endpoint/API examples
Testing the API request
Comparing results between the API and the GUI
llava-v1.5-13b responding in another language: hallucination?
Using Replicate's original llava-v1.5-13b model
Bringing up concise vs. verbose prompt again
Setting up the Replicate API key locally
Setting up a Python call to Replicate
Running the iterative Replicate code
Downloading the llava-v1.5-7b model
Setting up llama.cpp framework
Adding models to llama.cpp
Showing llama.cpp commit hash
Starting up llama.cpp server
llama.cpp GUI
llama.cpp API code overview
llama.cpp server API documentation
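The final chapters cover llama.cpp's built-in HTTP server and its API. A minimal sketch of querying the server's /completion endpoint is shown below; the endpoint and JSON fields follow llama.cpp's server documentation, while the port, sampling values, and prompt are assumptions for illustration.

```python
# Minimal sketch of calling the llama.cpp server API.
# Assumes a server already running locally (e.g. started on port 8080).
import json
import urllib.request

def build_payload(prompt: str, n_predict: int = 128) -> dict:
    """Assemble the JSON body for llama.cpp's /completion endpoint."""
    return {"prompt": prompt, "n_predict": n_predict, "temperature": 0.7}

def complete(prompt: str, host: str = "http://127.0.0.1:8080") -> str:
    """POST a prompt to the server and return the generated text."""
    req = urllib.request.Request(
        f"{host}/completion",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["content"]

if __name__ == "__main__":
    print(complete("Describe this art style in one sentence:"))
```

The same request shape works from the shell via curl, which is how the video's API code overview exercises the server before wrapping the calls in Python.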


Taught by

kasukanra

Related Courses

LLaVA: The New Open Access Multimodal AI Model
1littlecoder via YouTube
Image Annotation with LLaVA and Ollama
Sam Witteveen via YouTube
Unraveling Multimodality with Large Language Models
Linux Foundation via YouTube
Efficient and Portable AI/LLM Inference on the Edge Cloud - Workshop
Linux Foundation via YouTube
Training and Serving Custom Multi-modal Models - IDEFICS 2 and LLaVA Llama 3
Trelis Research via YouTube