Building a Knowledge Graph with Spark and NLP for Novel Drug Recommendations
Offered By: Databricks via YouTube
Course Description
Overview
Syllabus
Intro
Drug discovery is hard
AstraZeneca introduced the "5R" framework
5R has had a significant impact in improving our efficiency
We are investing in new sources of data and faster validation
We need tools to make sense of data & make better and faster decisions
Finding a drug target can be formulated as a hybrid recommendation problem
• Scientists need to parse large amounts of information and make a ranking prediction
• Different formats, data models, locations
Multiple objective optimization
Traditional recsys approaches
We assemble a large-scale knowledge graph from public and AZ internal data
KG pipeline on
Pipeline - series of notebooks
Pipeline stages
Node dictionary
Mappings table
Edge assertions
Keep evidence & context for each assertion
Focus on NLP
Use natural language processing to extract precise information at scale
NLP Termite on Spark
Syntax parsing increases precision of entity recognition
Relationships from literature reduce sparsity of the biological KG
Language models lead to improvements in recall and precision
Learned sentence representation can be used for downstream tasks
Graph embedding pipeline
Approximate nearest neighbor search
Lessons learned
Acknowledgements
Taught by
Databricks
Related Courses
Mining Massive Datasets
Stanford University via edX
Nearest Neighbor Collaborative Filtering
University of Minnesota via Coursera
Practical Deep Learning For Coders
fast.ai via Independent
Data Mining: Theories and Algorithms for Tackling Big Data | 数据挖掘:理论与算法
Tsinghua University via edX
ความรู้พื้นฐานเกี่ยวกับบิ๊กดาตา | Big Data Concept
Sukhothai Thammathirat Open University via ThaiMOOC