Let's Talk About Raw Documents - Extracting Structured Data for ML Pipelines

Offered By: MLOps.community via YouTube

Course Description

Overview

This 50-minute MLOps Community Meetup talk features Crag Wolfe, Infrastructure Team Lead at Unstructured.io. It covers raw document processing in modern ML pipelines, with a focus on extracting structured data from various file formats. The talk introduces Unstructured.io's open-source libraries and their NLP-focused approach, shows how to rapidly build custom preprocessing APIs, walks through the SEC Filing Section Pipeline, and touches on sentiment analysis models. It also includes a detailed demo, a developer quick start guide, and discussions of scaling issues, document editing, and future directions beyond NLP. Crag Wolfe draws on his experience in back-end engineering and NLP startups, and links for connecting with the MLOps community are provided.
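
As a rough illustration of the partition, clean, and stage steps that the syllabus lists for the SEC Filing Section Pipeline, here is a minimal sketch in plain Python. It does not use the Unstructured.io libraries or reproduce their API. Every name in it (Element, partition_text, clean_whitespace, stage_as_records, run_pipeline) is hypothetical, and the title-detection heuristic is an assumption made only for this example.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Element:
    """A labeled chunk of a document (hypothetical type for this sketch)."""
    kind: str  # "Title" or "NarrativeText"
    text: str


def partition_text(raw: str) -> List[Element]:
    """Split raw text on blank lines and label each block."""
    elements = []
    for block in re.split(r"\n\s*\n", raw):
        stripped = block.strip()
        if not stripped:
            continue
        # Assumed heuristic: a short block without closing punctuation is a title.
        is_title = len(stripped.split()) <= 8 and not stripped.endswith((".", "!", "?", ":"))
        elements.append(Element("Title" if is_title else "NarrativeText", block))
    return elements


def clean_whitespace(element: Element) -> Element:
    """Collapse runs of whitespace (including line breaks) into single spaces."""
    return Element(element.kind, re.sub(r"\s+", " ", element.text).strip())


def stage_as_records(elements: List[Element]) -> List[Dict[str, str]]:
    """Convert elements into plain dicts that a downstream tool could consume."""
    return [{"type": e.kind, "text": e.text} for e in elements]


def run_pipeline(raw: str) -> List[Dict[str, str]]:
    """Chain the three steps: partition, then clean, then stage."""
    return stage_as_records([clean_whitespace(e) for e in partition_text(raw)])


# Made-up sample text standing in for a raw filing excerpt.
SAMPLE = (
    "ITEM 1A.  RISK FACTORS\n\n"
    "Our business is subject to\n   numerous risks.\n\n\n"
    "We may not   achieve profitability."
)

records = run_pipeline(SAMPLE)
for record in records:
    print(record)
```

Keeping each step as a small function that takes and returns simple values makes it easy to swap one step out, for example replacing the heuristic partitioner with a library call, without touching the rest of the chain.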

Syllabus

Introduction to Crag Wolfe
Agenda
Unstructured.io introduction
The open-source community
The goal
Rapidly build custom preprocessing API
Staging
Demo
Developer quick start
SEC Filing Section Pipeline
Section 1: Pulling in Raw Documents
Section 2: Reading the Document
Section 3: Custom Partitioning Bricks
Section 4: Cleaning Bricks
Section 5: Staging Bricks
Section 6: Define the Pipeline API
SEC Sentiment Analysis Model notebook
Stage for transformers
Training a summarization model with Unstructured + Argilla + Huggingface
Crag's previous engineering experience
Deciding what to tackle next
Editing documents
Scaling issues
Moving out of NLP
Wrap up

Taught by

MLOps.community
