YoVDO

Extracting Tabular Data from PDFs with Camelot and Excalibur

Offered By: EuroPython Conference via YouTube

Tags

EuroPython Courses, Python Courses, Data Extraction Courses, Batch Processing Courses

Course Description

Overview

Explore techniques for extracting tabular data from PDFs using open-source Python tools in this EuroPython 2019 conference talk. Learn about the challenges of working with PDF tables and discover how Camelot and Excalibur can provide efficient solutions. Gain hands-on experience with installing and using these tools to extract, process, and export tabular data from PDFs. Understand the process of defining extraction rules, automating batch processing, and exporting data in various formats including CSV, Excel, JSON, HTML, and pandas DataFrames. Discover how to maintain control over sensitive PDF documents while efficiently extracting structured data for further analysis and processing.
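As a rough illustration of the extract-and-export workflow described above, here is a minimal Python sketch. It assumes the camelot-py package is installed (install extras vary by version) and relies on Camelot's read_pdf function and the to_csv, to_excel, to_json, and to_html table methods. The helper names export_path, extract_tables, and extract_batch, the output naming scheme, and all file paths are hypothetical and not taken from the talk.

```python
from pathlib import Path

# Export formats named in the description, mapped to the Camelot table
# method that writes each one and to a file extension (the naming is our choice).
EXPORTERS = {
    "csv": ("to_csv", ".csv"),
    "excel": ("to_excel", ".xlsx"),
    "json": ("to_json", ".json"),
    "html": ("to_html", ".html"),
}


def export_path(pdf_path, index, fmt, out_dir="."):
    """Build a hypothetical output path such as out/report-table-0.csv."""
    _, extension = EXPORTERS[fmt]
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}-table-{index}{extension}"


def extract_tables(pdf_path, pages="1", flavor="lattice", fmt="csv", out_dir="."):
    """Extract tables from one PDF and write each table to its own file."""
    # Third-party dependency, imported here so the helpers above work without it.
    import camelot

    tables = camelot.read_pdf(str(pdf_path), pages=pages, flavor=flavor)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for index, table in enumerate(tables):
        target = export_path(pdf_path, index, fmt, out_dir)
        method_name, _ = EXPORTERS[fmt]
        getattr(table, method_name)(str(target))  # e.g. table.to_csv(path)
        written.append(target)
    return written


def extract_batch(pdf_dir, **kwargs):
    """Run extract_tables over every PDF in a directory (simple batch mode)."""
    results = {}
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        results[pdf_path.name] = extract_tables(pdf_path, **kwargs)
    return results
```

A call such as extract_tables("report.pdf", pages="1-3", fmt="excel", out_dir="out") would write one file per detected table. The "lattice" flavor targets tables with ruled cell borders, while "stream" targets tables separated by whitespace. Each extracted table also exposes a df attribute holding a pandas DataFrame. Because everything runs locally, the PDF never leaves the machine.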

Syllabus

Introduction
Outline
Camelot
PDF Structure
Copy-Paste from PDF
Tabular
Camelot Excalibur
Camelot features
Installing Camelot
Demo
How to use Camelot
Excalibur table list
Excalibur table object
Exporting
Output
Export
Plotting
Grid Plot
Excalibur
Refresh
Excalibur UI
Exporting Data
Rules
Results
Future Improvements
Questions


Taught by

EuroPython Conference

Related Courses

A Brief History of Data Storage
EuroPython Conference via YouTube
Breaking the Stereotype - Evolution & Persistence of Gender Bias in Tech
EuroPython Conference via YouTube
We Can Get More from Spatial, GIS, and Public Domain Datasets
EuroPython Conference via YouTube
Using NLP to Detect Knots in Protein Structures
EuroPython Conference via YouTube
The Challenges of Doing Infra-As-Code Without "The Cloud"
EuroPython Conference via YouTube