Foundations of Data Science
Offered By: University of California, Berkeley via edX
Course Description
Overview
As the demand for data science skills rises around the world, this Professional Certificate by BerkeleyX will teach you how to combine data with Python programming skills to ask questions and explore problems that you may encounter in a future job, in any field of study, and even in everyday life. This course will give you a new lens to explore the issues and problems that you care about.
Among others, Berkeley’s online data science program has been endorsed by Google’s Vice President of Education and Microsoft’s Corporate Vice President of Cloud AI. The program is based on Data 8, UC Berkeley’s fastest-growing class, taken by more than 3300 students each year as they start their data science journey. Data 8 is recognized among institutions as the preeminent introductory data science course and has helped place UC Berkeley at the forefront of democratizing data science for all. Top-ranked universities from around the world, such as Yale, Cornell, and NYU, have followed Berkeley’s lead by creating their own versions of UC Berkeley’s groundbreaking data science course. Through this professional certificate, students all over the world can now take UC Berkeley’s most popular course.
This course is accessible to students who have not previously taken statistics or computer science courses. No prior programming experience is assumed or necessary. Through instructor-guided videos and labs, you will learn about topics ranging from fundamental data science concepts to machine learning methods.
This program will help you become a data scientist by teaching you how to analyze a diverse set of real data sets including economic data, geographic data, and public health data. Typically, the information will be incomplete and there will be some uncertainty involved. You will learn how to conduct inference, which will help you quantify uncertainty and measure the accuracy of your estimates. Finally, you will put all of your knowledge together and learn about prediction using machine learning. We all have to be able to think critically and make decisions based on data. Thus, the program aims to make data science accessible to everyone. The program focuses on a set of core concepts and techniques that have broad applicability. Unlike “bootcamps” for programmers, this program presents data science as a way of thinking, in which interpretation and communication are as important as computation and statistical methods.
You don’t have to download any software – a browser is all you need. Open up a window and prepare to have some fun.
Syllabus
Course 1: Data Science: Computational Thinking with Python
Learn the basics of computational thinking, an essential skill in today’s data-driven world, using the popular programming language, Python.
Course 2: Data Science: Inferential Thinking through Simulations
Learn how to test hypotheses, draw inferences, and make robust conclusions based on data.
Course 3: Data Science: Machine Learning and Predictions
Learn how to use machine learning, with a focus on regression and classification, to automatically identify patterns in your data and make better predictions.
Courses
- Data Science: Computational Thinking with Python
We live in an era of unprecedented access to data. Understanding how to organize and leverage the vast amounts of information at our disposal is a critical skill that allows us to draw inferences about the world and make informed decisions. This course will introduce you to that skill.
To work with large amounts of data, you will need to harness the power of computation through programming. This course teaches you basic programming skills for manipulating data. You will learn how to use Python to organize and manipulate data in tables, and to visualize data effectively. No prior experience with programming or Python is needed, nor is any statistics background necessary.
The examples given in the course involve real world data from diverse settings. Not all data is numerical – you will work with different types of data from a variety of domains. Though the term “data science” is relatively new, the fundamental ideas of data science are not. The course includes powerful examples that span the centuries from the Victorian era to the present day.
This course emphasizes learning through doing: you will work on large real-world data sets through interactive assignments to apply the skills you learn. Throughout, the underlying thread is that data science is a way of thinking, not just an assortment of methods. You will also hone your interpretation and communication skills, which are essential skills for data scientists.
- Data Science: Machine Learning and Predictions
One of the principal responsibilities of a data scientist is to make reliable predictions based on data. When the amount of data available is enormous, it helps if some of the analysis can be automated. Machine learning is a way of identifying patterns in data and using them to automatically make predictions or decisions. In this data science course, you will learn basic concepts and elements of machine learning.
The two main methods of machine learning you will focus on are regression and classification. Regression is used when you seek to predict a numerical quantity. Classification is used when you try to predict a category (e.g., given information about a financial transaction, predict whether it is fraudulent or legitimate).
For regression, you will learn how to measure the correlation between two variables and compute a best-fit line for making predictions when the underlying relationship is linear. The course will also teach you how to quantify the uncertainty in your prediction using the bootstrap method. These techniques will be motivated by a wide range of examples.
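As a rough illustration of the regression ideas described above, the sketch below computes a correlation coefficient and a best-fit line using only the Python standard library. The function names and the small data set are hypothetical and chosen for illustration; they are not taken from the course materials.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Population standard deviation."""
    mu = mean(values)
    return math.sqrt(mean([(v - mu) ** 2 for v in values]))

def standard_units(values):
    """Express each value as a number of SDs above or below the mean."""
    mu, spread = mean(values), sd(values)
    return [(v - mu) / spread for v in values]

def correlation(xs, ys):
    """Correlation coefficient r: the mean product of standard units."""
    pairs = zip(standard_units(xs), standard_units(ys))
    return mean([a * b for a, b in pairs])

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    slope = correlation(xs, ys) * sd(ys) / sd(xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept

# Hypothetical data: x values and roughly linear y values.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predicted y for a new x of 6
print(round(correlation(xs, ys), 3), round(slope, 3), round(intercept, 3))
```

A line fitted this way is only a sensible predictor when the underlying relationship is roughly linear, which is the assumption the paragraph above calls out.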
For classification, you will learn the k-nearest neighbor classification algorithm, learn how to measure the effectiveness of your classifier, and apply it to real-world tasks including medical diagnoses and predicting genres of movies.
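The k-nearest neighbor idea mentioned above can be sketched in a few lines. This is a minimal version that assumes numeric features, Euclidean distance, and a simple majority vote; the two-cluster data set and the labels "A" and "B" are made up for illustration.

```python
import math
from collections import Counter

def distance(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(training, point, k):
    """Predict a label by majority vote among the k nearest training rows.

    training is a list of (features, label) pairs.
    """
    nearest = sorted(training, key=lambda row: distance(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(training, test, k):
    """Proportion of held-out test rows that are classified correctly."""
    correct = sum(classify(training, features, k) == label
                  for features, label in test)
    return correct / len(test)

# Hypothetical data: two well-separated clusters.
training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
test = [((1.1, 1.0), "A"), ((5.1, 5.0), "B")]

print(accuracy(training, test, k=3))
```

Measuring accuracy on rows that were not used for training is what guards against the overly optimistic estimates that come from testing a classifier on its own training data.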
The course will highlight the assumptions underlying the techniques, and will provide ways to assess whether those assumptions are good. It will also point out pitfalls that lead to overly optimistic or inaccurate predictions.
- Data Science: Inferential Thinking through Simulations
Using real-world examples from a wide range of domains including law, medicine, and football, you’ll learn how data scientists make conclusions about unknowns based on the data available.
Often, the data we have are not complete, yet we’d still like to draw inferences about the world and quantify the uncertainty in our conclusions. This is called statistical inference. In this course, you will learn the framework for statistical inference and apply it to real-world data sets.
Notably, you will learn how to conduct hypothesis testing—comparing theoretical predictions to actual data, and choosing whether to accept those predictions. You will utilize the power of computation to conduct simulations by which you can evaluate theories or hypotheses about how the world works. This course will teach you the power of statistical inference: given a random sample, how do we predict some quantity that we cannot observe directly?
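To make the simulation approach to hypothesis testing concrete, here is a small sketch that tests whether a coin is fair. The scenario (60 heads in 100 tosses), the choice of test statistic, and all function names are assumptions made for illustration rather than an example from the course.

```python
import random

def simulate_heads(num_tosses, rng):
    """Simulate tossing a fair coin and count the heads."""
    return sum(rng.random() < 0.5 for _ in range(num_tosses))

def empirical_p_value(observed_heads, num_tosses, repetitions=5000, seed=0):
    """Estimate a p-value under the null hypothesis that the coin is fair.

    The test statistic is the distance between the number of heads and the
    number expected from a fair coin.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    expected = num_tosses / 2
    observed_stat = abs(observed_heads - expected)
    at_least_as_extreme = 0
    for _ in range(repetitions):
        simulated_stat = abs(simulate_heads(num_tosses, rng) - expected)
        if simulated_stat >= observed_stat:
            at_least_as_extreme += 1
    return at_least_as_extreme / repetitions

# Hypothetical observation: 60 heads in 100 tosses.
print(empirical_p_value(observed_heads=60, num_tosses=100))
```

A small p-value means that results as extreme as the observed one are rare when the null hypothesis is true, which counts as evidence against it.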
You will also learn how to quantify the uncertainty in the conclusions you draw from hypothesis testing. This helps assess whether patterns that appear to be present in the data actually represent a true relationship in the world, or whether they might merely reflect random fluctuations due to chance. Throughout this course, we will go over multiple methods for estimation and hypothesis testing, based on simulations and the bootstrap method. Finally, you will learn about randomized controlled experiments and how to draw conclusions about causality.
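One common way to use the bootstrap for estimation is a percentile confidence interval, sketched below for a sample mean. The sample values, the 95% level, and the simple index-based percentile rule are illustrative assumptions, not the course's own implementation.

```python
import random

def bootstrap_means(sample, repetitions, rng):
    """Resample with replacement and record the mean of each resample."""
    n = len(sample)
    means = []
    for _ in range(repetitions):
        resample = [rng.choice(sample) for _ in range(n)]
        means.append(sum(resample) / n)
    return means

def percentile_interval(values, level=95):
    """Approximate the middle `level` percent of the values."""
    ordered = sorted(values)
    tail = (100 - level) / 2 / 100
    lo_index = int(tail * len(ordered))
    hi_index = max(lo_index, int((1 - tail) * len(ordered)) - 1)
    return ordered[lo_index], ordered[hi_index]

# Hypothetical sample of measurements.
sample = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]
rng = random.Random(0)  # seeded so the sketch is reproducible

low, high = percentile_interval(bootstrap_means(sample, 2000, rng))
print(low, high)
```

The width of the interval gives a sense of how much the estimate might vary from one random sample to another.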
The course emphasizes the conceptual basis of inference, the logic of the decision-making process, and the sound interpretation of results.
Taught by
John DeNero, David Wagner and Ani Adhikari