Data Science
Offered By: University of California, San Diego via edX
Course Description
Overview
Excel in Data Science, one of the hottest fields in tech today. Learn how to gain new insights from big data by asking the right questions, manipulating data sets and visualizing your findings in compelling ways.
In this MicroMasters program, you will develop a well-rounded understanding of the mathematical and computational tools that form the basis of data science and how to use those tools to make data-driven business recommendations.
This MicroMasters program encompasses two sides of data science learning: the mathematical and the applied.
Mathematical courses cover probability, statistics, and machine learning. The applied courses cover the use of specific toolkits and languages, such as Python, NumPy, Matplotlib, pandas and SciPy, along with the Jupyter notebook environment and Apache Spark, to delve into real-world data.
You will learn how to collect, clean and analyze big data using popular open-source software, which will allow you to perform large-scale data analysis and present your findings in a convincing, visual way. Combined with expertise in a particular type of business, these skills will make you a highly desirable employee.
Syllabus
Course 1: Python for Data Science
Learn to use powerful open-source Python tools, including pandas, Git and Matplotlib, to manipulate, analyze, and visualize complex datasets.
Course 2: Probability and Statistics in Data Science using Python
Using Python, learn statistical and probabilistic approaches to understand and gain insights from data.
Course 3: Machine Learning Fundamentals
Understand machine learning's role in data-driven modeling, prediction, and decision-making.
Course 4: Big Data Analytics Using Spark
Learn how to analyze large datasets using Jupyter notebooks, MapReduce and Spark as a platform.
Courses
- Probability and Statistics in Data Science using Python
The job of a data scientist is to glean knowledge from complex and noisy datasets.
Reasoning about uncertainty is inherent in the analysis of noisy data. Probability and Statistics provide the mathematical foundation for such reasoning.
In this course, part of the Data Science MicroMasters program, you will learn the foundations of probability and statistics. You will learn the mathematical theory and get hands-on experience applying it to actual data using Jupyter notebooks.
Concepts covered include: random variables, dependence, correlation, regression, PCA, entropy and MDL.
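Three of the concepts listed above (correlation, regression and entropy) can be illustrated with a minimal sketch in plain Python. The function names and the sample data are assumptions made up for illustration and are not taken from the course materials.

```python
import math
from collections import Counter


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def least_squares_line(xs, ys):
    """Slope and intercept of the simple linear regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def shannon_entropy(samples):
    """Empirical Shannon entropy (in bits) of a sequence of discrete outcomes."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up illustrative data: ys is an exact linear function of xs.
xs = [1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]

print(pearson_correlation(xs, ys))
print(least_squares_line(xs, ys))
print(shannon_entropy(["H", "T", "H", "T"]))
```

On real, noisy data the correlation would fall strictly between -1 and 1, and the fitted line would only approximate the points.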
- Machine Learning Fundamentals
Do you want to build systems that learn from experience? Or exploit data to create simple predictive models of the world?
In this course, part of the Data Science MicroMasters program, you will learn a variety of supervised and unsupervised learning algorithms, and the theory behind those algorithms.
Using real-world case studies, you will learn how to classify images, identify salient topics in a corpus of documents, partition people according to personality profiles, and automatically capture the semantic structure of words and use it to categorize documents.
Armed with the knowledge from this course, you will be able to analyze many different types of data and to build descriptive and predictive models.
All programming examples and assignments will be in Python, using Jupyter notebooks.
- Big Data Analytics Using Spark
In data science, data is called "big" if it cannot fit into the memory of a single standard laptop or workstation.
The analysis of big datasets requires using a cluster of tens, hundreds or thousands of computers. Effectively using such clusters requires distributed file systems, such as the Hadoop Distributed File System (HDFS), and corresponding computational models, such as Hadoop, MapReduce and Spark.
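The MapReduce computational model mentioned above can be sketched in a single Python process. This is only an illustration of the map, shuffle and reduce phases using a word count. It does not use the Hadoop or Spark APIs, and the function names are assumptions chosen for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Made-up input for illustration.
print(word_count(["big data", "Big clusters"]))
```

On a real cluster, the map and reduce steps would run in parallel on different machines, and the shuffle step would move data across the network, which is typically where the bottlenecks arise.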
In this course, part of the Data Science MicroMasters program, you will learn what the bottlenecks are in massively parallel computation and how to use Spark to minimize these bottlenecks.
You will learn how to perform supervised and unsupervised machine learning on massive datasets using the Machine Learning Library (MLlib).
In this course, as in the other ones in this MicroMasters program, you will gain hands-on experience using PySpark within the Jupyter notebook environment.
Taught by
Leo Porter, Alon Orlitsky, Yoav Freund, Sanjoy Dasgupta and Ilkay Altintas
Related Courses
Big Data Essentials
A Cloud Guru
Big Data
University of Adelaide via edX
Advanced Data Science with IBM
IBM via Coursera
Amazon EMR Getting Started (Indonesian)
Amazon Web Services via AWS Skill Builder
Analisar e preparar dados com o Amazon SageMaker Data Wrangler e o Amazon EMR (Português (Brasil)) | Lab - Analyze and Prepare Data with Amazon SageMaker Data Wrangler and Amazon EMR (Portuguese (Brazil))
Amazon Web Services via AWS Skill Builder