YoVDO

Let's Talk About Raw Documents - Extracting Structured Data for ML Pipelines

Offered By: MLOps.community via YouTube

Tags

MLOps Courses, Sentiment Analysis Courses, Data Extraction Courses, Data Preprocessing Courses, API Development Courses, Open Source Courses

Course Description

Overview

A 50-minute MLOps Community Meetup talk by Crag Wolfe, Infrastructure Team Lead at Unstructured.io, on processing raw documents in modern ML pipelines and extracting structured data from a variety of file formats. The talk introduces Unstructured.io's open-source libraries and their NLP-focused approach, shows how to rapidly build a custom preprocessing API, and walks through the SEC Filing Section Pipeline and a sentiment analysis model. It includes a detailed demo and a developer quick start guide, followed by discussion of scaling issues, document editing, and future directions beyond NLP. Wolfe also draws on his background in back-end engineering and NLP startups. Links for connecting with the MLOps community are provided.

Syllabus

Introduction to Crag Wolfe
Agenda
Unstructured.io introduction
The open-source community
The goal
Rapidly build custom preprocessing API
Staging
Demo
Developer quick start
SEC Filing Section Pipeline
Section 1: Pulling in Raw Documents
Section 2: Reading the Document
Section 3: Custom Partitioning Bricks
Section 4: Cleaning Bricks
Section 5: Staging Bricks
Section 6: Define the Pipeline API
SEC Sentiment Analysis Model notebook
Stage for transformers
Training a summarization model with Unstructured + Argilla + Huggingface
Crag's previous engineering experience
Deciding what to tackle next
Editing documents
Scaling issues
Moving out of NLP
Wrap up


Taught by

MLOps.community

Related Courses

Text Mining and Analytics
University of Illinois at Urbana-Champaign via Coursera
Introduction to Natural Language Processing
University of Michigan via Coursera
Enabling Technologies for Data Science and Analytics: The Internet of Things
Columbia University via edX
Machine Learning Capstone: An Intelligent Application with Deep Learning
University of Washington via Coursera
moocTLH: Nuevos retos en las tecnologías del lenguaje humano
Universidad de Alicante via Miríadax