YoVDO

Preprocessing Unstructured Data for LLM Applications

Offered By: DeepLearning.AI via Coursera

Tags

Unstructured Data Courses
JSON Courses
Data Normalization Courses
Vision Transformers Courses

Course Description

Overview

Improving a RAG system's performance depends on efficiently processing diverse unstructured data sources. In this course, you'll learn techniques for representing unstructured data such as text, images, and tables from many different sources, and you'll implement them to extend your LLM RAG pipeline to Excel, Word, PowerPoint, PDF, and EPUB files.

Join this course and learn:

1. How to preprocess data for LLM application development, with a focus on working with different document types.
2. How to extract and normalize various documents into a common JSON format and enrich it with metadata to improve search results.
3. Techniques for document image analysis, including layout detection and vision transformers, to extract and understand PDFs, images, and tables.
4. How to build a RAG bot that can ingest different documents such as PDFs, PowerPoints, and Markdown files.

Apply the skills you learn in this course to real-world scenarios to enhance your RAG application and expand its versatility.
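To make the idea of a "common JSON format enriched with metadata" more concrete, here is a minimal sketch using only the Python standard library. It is not the course's code or any particular library's API: the function names `normalize` and `filter_by_metadata`, the element schema (`type`, `text`, `metadata`), the title heuristic, and the sample file names are all assumptions made for illustration. Real pipelines would get the text blocks from a document parser rather than from hand-written strings.

```python
import json
from pathlib import Path


def normalize(filename: str, blocks: list[str]) -> list[dict]:
    """Turn raw text blocks from one document into uniform, JSON-ready elements.

    Hypothetical schema: every element has "type", "text", and "metadata".
    """
    filetype = Path(filename).suffix.lstrip(".").lower()
    elements = []
    for index, block in enumerate(blocks):
        text = " ".join(block.split())  # collapse runs of whitespace
        if not text:
            continue  # skip blocks that are empty after cleaning
        # Crude assumption: short blocks without closing punctuation are titles.
        is_title = len(text) < 60 and text[-1] not in ".!?:"
        elements.append({
            "type": "Title" if is_title else "NarrativeText",
            "text": text,
            "metadata": {
                "filename": filename,
                "filetype": filetype,
                "element_index": index,
            },
        })
    return elements


def filter_by_metadata(elements: list[dict], **criteria) -> list[dict]:
    """Keep only elements whose metadata matches every given key/value pair."""
    return [
        e for e in elements
        if all(e["metadata"].get(k) == v for k, v in criteria.items())
    ]


# Two made-up documents of different types end up in the same shape.
corpus = (
    normalize("report.pdf", ["Quarterly  Results", "Revenue grew in the third quarter."])
    + normalize("slides.pptx", ["Roadmap", "   "])
)

if __name__ == "__main__":
    print(json.dumps(corpus, indent=2))
```

Because every element carries the same metadata keys regardless of source file type, a retrieval step can narrow candidates before or alongside similarity search, for example `filter_by_metadata(corpus, filetype="pptx")` to search slides only.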

Syllabus

  • Preprocessing Unstructured Data for LLM Applications

Taught by

Matthew Robinson

Related Courses

MongoDB for DBAs
MongoDB University
MongoDB for Node.js Developers
MongoDB University
Web Engineering II: Developing Mobile HTML5 Apps
Technische Hochschule Mittelhessen via iversity
Programming Mobile Services for Android Handheld Systems: Communication
Vanderbilt University via Coursera
HTML, CSS, and Javascript for Web Developers
Johns Hopkins University via Coursera