State of the Art Natural Language Understanding at Scale - David Talby

Offered By: Open Data Science via YouTube

Tags

Apache Spark Courses
Machine Learning Courses
Deep Learning Courses
PySpark Courses
Lemmatization Courses

Course Description

Overview

Explore state-of-the-art natural language understanding at scale in this 50-minute conference talk from ODSC West 2018. Learn what makes language hard to process and how Spark NLP, an NLP library for Apache Spark, extends the Spark ML pipeline APIs to support distributed, optimized NLP and ML pipelines. Discover core NLP algorithms, including lemmatization, part-of-speech tagging, dependency parsing, named entity recognition, spell checking, and sentiment detection. Follow along with demonstrations of building common pipelines with PySpark in notebooks. Gain insights into benchmarks, design best practices, and performance optimizations for NLP, ML, and deep learning pipelines on Spark. Understand the latest improvements in Spark NLP, including native Spark extensions, embedded TensorFlow, and advanced word embeddings. Learn about practical applications such as e-discovery and domain-specific sentiment analysis models. Get guidance on choosing a first NLP project and setting realistic expectations for natural language processing at scale.

Syllabus

Intro
CONTENTS
THEN, YOU NEED TO UNDERSTAND LANGUAGE
WHAT MAKES LANGUAGE HARD
INTRODUCING SPARK NLP
THE PERFORMANCE BOTTLENECK
SPARK NLP 2017: NATIVE SPARK EXTENSION
BENCHMARK: TRAINING
BENCHMARK: SCALING
SPARK NLP 2018 IMPROVEMENTS
FRICTIONLESS REUSE & OPTIMIZATION
WHAT EXACTLY IS "STATE OF THE ART"?
NAMED ENTITY RECOGNITION
NER WITH DEEP LEARNING
WORD EMBEDDINGS
EMBEDDINGS: THE NEXT GENERATION
SPARK NLP 2018: EMBEDDED TENSORFLOW
PERFORMANCE: THE NEXT LEVEL
SENTIMENT ANALYSIS
SO, TRAIN YOUR OWN DOMAIN-SPECIFIC MODELS
E-DISCOVERY
WHAT'S A GOOD FIRST NLP PROJECT?
WHAT EXPECTATIONS SHOULD I SET?


Taught by

Open Data Science

Related Courses

Fundamentals of Scalable Data Science - IBM via Coursera
Data Science and Engineering with Spark - Berkeley University of California via edX
Master of Machine Learning and Data Science - Imperial College London via Coursera
Data Analysis Using Pyspark - Coursera Project Network via Coursera
Building Machine Learning Pipelines in PySpark MLlib - Coursera Project Network via Coursera