数据科学 | Data Science
Offered By: Tsinghua University via edX
Course Description
Overview
Data is becoming ubiquitous in everyday life and is greatly affecting the way we interact with the world around us. Data Science, the art of working with data, is an essential skill for empowering your future career in finance, telecommunications, IT, the Internet, AI, consulting, education, transportation, healthcare, and more.
In this concise yet comprehensive program, you will gain a complete picture of data analytics and a good understanding of popular data mining algorithms and systems. You will also gain first-hand experience implementing and applying these techniques. Most importantly, the program serves as a good starting point for anyone who wants to become more competitive in the job market, and it paves a solid road for continuing your education.
Tsinghua University is one of the top universities in China and is also highly ranked worldwide. Founded in 2001, its Graduate School at Shenzhen aims to provide high-quality graduate research training by combining the research strength of Tsinghua University with the booming IT and Internet industry in Shenzhen, one of the most innovative cities in China.
Syllabus
Course 1: Data Mining: Theories and Algorithms for Tackling Big Data | 数据挖掘:理论与算法
Unraveling the mysteries of Data Mining and Big Data, this course is a must-have for any budding Data Scientist. The most interesting theory plus the most useful algorithms equals data science you cannot afford to miss.
Course 2: Data Science: A New Way of Thinking | 数据科学导论
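As the introductory, awareness-building course for data science, this course uses a vivid teaching style to introduce students to the basic knowledge, core concepts, and ways of thinking behind data mining and big data, and it sketches the promise of data science from the perspectives of engineering technology, legal norms, and applied practice.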
Course 3: Big Data Machine Learning | 大数据机器学习
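Big Data Machine Learning is a foundational theory course for senior undergraduates and graduate students in information science. It aims to develop students' ability to understand the theoretical foundations of big data machine learning in depth, master its methods, and solve practical problems. The main contents are the basic theory of statistical learning, the basic methods of machine learning, and the theory and methods of deep learning.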
Course 4: Advanced Big Data Systems | 高级大数据系统
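The implementation, optimization, and application of advanced big data systems, including the principles, implementation, and optimization strategies of distributed file systems, MapReduce/Spark, Storm/Spark Streaming, and Mahout.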
Course 5: Big Data and Intellectual Property Law and Practice | 知识产权法律及实务
Understand how to use and protect intellectual property as it pertains to data analytics and practice in China. Be prepared for global competition in the knowledge economy era.
Course 6: Data Visualization | 数据可视化
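Data visualization is an information technology dedicated to turning abstract data or concepts into visual forms that people can readily understand and absorb. It is a typical interdisciplinary field.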
Courses
-
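This course covers the core techniques of data mining in full, including data preprocessing, classification, clustering, regression, association, recommendation, ensemble learning, and evolutionary computation. It seeks the best balance among breadth, depth, and fun, presenting in a lively and humorous way the core ideas and key techniques of data mining along with important points that related courses and textbooks rarely cover. The course suits students of any major, as well as engineers and technical professionals, who are interested in big data and data science. Rather than pursuing pure theoretical derivation, it integrates theory with practice so that students acquire living, useful knowledge that is truly their own, especially the research methods and ways of thinking used in data analysis.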
Despite the large volume of data mining papers and tutorials available on the web, aspiring data scientists find it surprisingly difficult to locate an overview that blends clarity, technical depth and breadth with enough amusement to make big data analytics engaging. This course does just that.
Each module starts with an interesting real-world example that gives rise to the specific research question of interest.
Students are then presented with a general idea of how to tackle this problem along with some intuitive and straightforward approaches.
Finally, a number of representative algorithms are introduced along with concrete examples that show how they function in practice.
While theoretical analysis sometimes overcomplicates things for students, here it’s applied to help them better understand the key features of the techniques.
-
As the introductory, awareness-building course for data science, this course uses a vivid teaching style to introduce students to the basic knowledge, core concepts, and ways of thinking behind data mining and big data, and it sketches the promise of data science from the perspectives of engineering technology, legal norms, and applied practice.
This is an introductory course suitable for university students from diverse backgrounds who are interested in getting into the fascinating world of data science. Existing online data science courses mainly focus on learning specific algorithms and other purely technical content. By contrast, data science is an application-oriented, highly interdisciplinary domain that requires systematic knowledge from a variety of sources. In addition to learning algorithms, students also need to appreciate the challenges that people may face in the real world as well as the relationship between data and human society. The purpose of this course is to provide a comprehensive understanding of the key issues in the era of big data and to promote data awareness, helping students lay a solid foundation for subsequent data science courses.
-
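This course focuses on the implementation, optimization, and application of advanced big data systems, including the principles, implementation, and optimization strategies of distributed file systems, MapReduce/Spark, Storm/Spark Streaming, and Mahout.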
Recent years have seen AI technology spread rapidly into many areas of industry. Big data systems, the foundation that enables today's data-driven AI, are thus becoming critically important. This course introduces students to the basic concepts of big data systems, covering how data is effectively stored, processed, and analyzed. We start from the general principles of distributed system design. We then provide frameworks for how storage, computation, and network capabilities are scaled in big data systems. Finally, to make these design principles easy to follow, our case studies use real industrial systems to demonstrate how the principles are applied in practice and how the performance and limitations of such systems are analyzed.
-
An ongoing challenge for machine learning is how to deal with big data. Machine learning problems that involve large-scale data are now widespread, and designing machine learning algorithms that meet the needs of big data processing is a hot research topic in the big data era. Big Data Machine Learning is a foundational theory course for senior undergraduates and graduate students in information science. Its purpose is to develop students' ability to understand the theoretical basis of big data machine learning, to master its methods firmly, and to solve practical problems. The course focuses on machine learning and deep learning methods, with the aim of applying them to big data. The main contents of the course include:
- The basic theories of statistical learning
- The basic methods of machine learning
- The theories and methods of deep learning
-
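This course aims to give students a set of analytical tools for understanding and applying the intellectual property legal system. It discusses hot topics and key issues in intellectual property protection in China from both local and global perspectives.
Using the case method, the course gives a broad introduction to intellectual property, explores its content and characteristics, and provides practical training in how to identify, manage, use, and protect it. The course focuses on patents and also covers trademarks, copyright, and trade secrets. To deepen and broaden students' knowledge and understanding, it pays special attention to frontier issues such as the international intellectual property system, gene patents, patent pools, and technical standards.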
-
This course is suitable for university students of all majors and practitioners in various disciplines who are interested in visually exploring and understanding their data. Data visualization is an interdisciplinary field concerned with the visual representation of data and information. It turns abstract data or concepts into visual forms that people can readily understand, with the aim of communicating messages clearly and effectively using principled graphical means.
Instead of solely pursuing theoretical knowledge and abstract concepts, the course seamlessly connects theory with practice so that students learn useful data visualization techniques through a series of well-designed case studies. It systematically covers the fundamentals of visualization as well as its history and state of the art. By completing this course, students will appreciate both the beauty and the power of data visualization and gain rich hands-on experience implementing popular visualization techniques.
Taught by
Chun Yuan, Juan He, Zhi Wang and Bo Yuan