High Volume PDF Text Extraction Using Python Open-Source Tools
Offered By: EuroPython Conference via YouTube
Course Description
Overview
Explore high-volume PDF text extraction techniques using Python open-source tools in this EuroPython 2023 conference talk. Learn about the importance of extracting information from large volumes of PDF documents for corporate decision-making and long-term forecasting. Discover how to tackle the challenges of processing unstructured data and integrating OCR capabilities. Gain insights into achieving top-tier performance and maximum extraction detail using an open-source toolset designed for Big Data scenarios. Understand the "need for speed" in text extraction and how to effectively recreate structured information from millions of pages of documents.
Syllabus
High Volume PDF Text Extraction using Python Open-Source Tools — Harald Lieder
Taught by
EuroPython Conference
Related Courses
A Brief History of Data Storage
EuroPython Conference via YouTube
Breaking the Stereotype - Evolution & Persistence of Gender Bias in Tech
EuroPython Conference via YouTube
We Can Get More from Spatial, GIS, and Public Domain Datasets
EuroPython Conference via YouTube
Using NLP to Detect Knots in Protein Structures
EuroPython Conference via YouTube
The Challenges of Doing Infra-As-Code Without "The Cloud"
EuroPython Conference via YouTube