
Let's Talk About Raw Documents - Extracting Structured Data for ML Pipelines

Offered By: MLOps.community via YouTube

Tags

MLOps Courses, Sentiment Analysis Courses, Data Extraction Courses, Data Preprocessing Courses, API Development Courses, Open Source Courses

Course Description

Overview

A 50-minute MLOps Community Meetup talk by Crag Wolfe, Infrastructure Team Lead at Unstructured.io, on processing raw documents in modern ML pipelines, with a focus on extracting structured data from a variety of file formats. The talk introduces Unstructured.io's open-source libraries and their NLP-focused approach, shows how to rapidly build custom preprocessing APIs, walks through the SEC Filing Section Pipeline, and covers sentiment analysis models. It includes a detailed demo and a developer quick start guide, followed by discussion of scaling issues, document editing, and future directions beyond NLP. Crag Wolfe also talks about his background in back-end engineering and NLP startups, and links are provided for connecting with the MLOps community.

Syllabus

Introduction to Crag Wolfe
Agenda
Unstructured.io introduction
The open-source community
The goal
Rapidly build custom preprocessing API
Staging
Demo
Developer quick start
SEC Filing Section Pipeline
Section 1: Pulling in Raw Documents
Section 2: Reading the Document
Section 3: Custom Partitioning Bricks
Section 4: Cleaning Bricks
Section 5: Staging Bricks
Section 6: Define the Pipeline API
SEC Sentiment Analysis Model notebook
Stage for transformers
Training a summarization model with Unstructured + Argilla + Huggingface
Crag's previous engineering experience
Deciding what to tackle next
Editing documents
Scaling issues
Moving out of NLP
Wrap up
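
The partitioning, cleaning, and staging steps named in the syllabus can be pictured with a rough sketch like the one below. It is an illustration only, written in plain Python: the `Element` class, the `partition`, `clean`, and `stage` functions, and the sample filing text are all hypothetical and are not Unstructured.io's actual API or the code shown in the talk.

```python
import json
import re
from dataclasses import dataclass, asdict


@dataclass
class Element:
    """Hypothetical stand-in for a document element (not the library's class)."""
    kind: str  # "Title" or "NarrativeText"
    text: str


def partition(raw: str) -> list[Element]:
    """Partitioning step: split raw text on blank lines.

    Assumption for this sketch: a short, all-uppercase block is a title.
    """
    elements = []
    for block in re.split(r"\n\s*\n", raw):
        if not block.strip():
            continue
        is_title = block.strip().isupper() and len(block.split()) <= 8
        elements.append(Element("Title" if is_title else "NarrativeText", block))
    return elements


def clean(element: Element) -> Element:
    """Cleaning step: collapse runs of whitespace and trim the ends."""
    return Element(element.kind, re.sub(r"\s+", " ", element.text).strip())


def stage(elements: list[Element]) -> str:
    """Staging step: serialize elements to JSON for a downstream tool."""
    return json.dumps([asdict(e) for e in elements])


# Made-up sample text standing in for one section of a filing.
raw_filing = "ITEM 1A. RISK FACTORS\n\nOur   business is subject\nto many risks.\n\n\n"
elements = [clean(e) for e in partition(raw_filing)]
staged = stage(elements)
print(staged)
```

Each step takes the previous step's output, so a pipeline of this shape could be wrapped in a single function and exposed as an API endpoint; how the talk actually does this is covered in Section 6.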


Taught by

MLOps.community

Related Courses

Crie sua página pessoal usando React e Github Pages
Coursera Project Network via Coursera
Introduction to RISC-V
Linux Foundation via edX
C# Framework Design
LinkedIn Learning
GitHub Basics Course (How To)
Treehouse
Android Development from Scratch to Create Cool Apps!
Udemy