Unveiling Clustering in BERTopic Topic Modeling
Offered By: Conf42 via YouTube
Course Description
Overview
Explore how clustering works in BERTopic topic modeling in this 27-minute conference talk from Conf42 ML 2023. The talk covers topic modeling use cases, the reasons for choosing BERTopic, and its end-to-end flow. Learn about the HDBSCAN clustering algorithm, its foundations in DBSCAN, and how it uses k-NN and minimum spanning trees to perform density-based spatial clustering. Discover the stability score "λ" and its role in determining the final clusters. A practical demo and accompanying explanation examine HDBSCAN's performance, strengths, and weaknesses. The talk closes with future scope and references for further exploration of this topic modeling technique.
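The talk's own demo is not reproduced here. As a rough illustration of two ideas named in the overview (using k-NN to set a per-point radius, then building a minimum spanning tree over the points), the following is a minimal sketch in plain Python. It assumes Euclidean distance and a made-up toy dataset. The function names are hypothetical and are not taken from BERTopic or the hdbscan library. The later steps of HDBSCAN (condensing the hierarchy and selecting clusters by stability) are omitted.

```python
import math


def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (itself excluded)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def mutual_reachability(points, core, i, j):
    """Largest of the two core distances and the direct distance between i and j."""
    return max(core[i], core[j], math.dist(points[i], points[j]))


def minimum_spanning_tree(points, k):
    """Prim's algorithm on the mutual reachability graph; returns (i, j, weight) edges."""
    core = core_distances(points, k)
    # best[j] = (cheapest known weight connecting j to the tree, tree endpoint)
    best = {
        j: (mutual_reachability(points, core, 0, j), 0)
        for j in range(1, len(points))
    }
    edges = []
    while best:
        j = min(best, key=lambda idx: best[idx][0])
        weight, i = best.pop(j)
        edges.append((i, j, weight))
        # The newly added point may offer cheaper links to the remaining points.
        for m in best:
            w = mutual_reachability(points, core, j, m)
            if w < best[m][0]:
                best[m] = (w, j)
    return edges


# Hypothetical toy data: two tight groups that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
mst_edges = minimum_spanning_tree(points, k=2)
longest = max(mst_edges, key=lambda e: e[2])
print(longest)
```

In this toy example the longest tree edge is the one that bridges the two groups. Removing heavy edges like this one is what splits the tree into a hierarchy of progressively denser clusters.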
Syllabus
intro
preface
who are we?
agenda
topic modeling use case
why bertopic?
bertopic end-to-end flow
clustering
dataset description
demo
what is hdbscan?
to understand hdbscan we need to know dbscan
what if there was no fixed radius?
k-nn algorithm to define radius
minimum spanning tree finds density and hierarchy
density based spatial clustering
stability score "λ"
final clusters
hdbscan steps
hdbscan - performance comparison
hdbscan - strengths and weaknesses
conclusion and future scope
references & resources
thank you
Taught by
Conf42
Related Courses
Graph Partitioning and Expanders (Stanford University via NovoEd)
The Analytics Edge (Massachusetts Institute of Technology via edX)
More Data Mining with Weka (University of Waikato via Independent)
Mining Massive Datasets (Stanford University via edX)
The Caltech-JPL Summer School on Big Data Analytics (California Institute of Technology via Coursera)