Notice: Short Course by Professor Chuanhai Liu
Posted by: Shen Tong    Date: 2018-05-08

  


Course title: Introduction to a Multithreaded and Distributed R for Big Data Analysis

Instructor: Professor Chuanhai Liu (Purdue University, USA)

Schedule:

May 15, 2018, 19:00-21:30

May 17, 2018, 19:00-21:30

May 22, 2018, 19:00-21:30

May 24, 2018, 19:00-21:30

May 26, 2018, 14:00-17:00

May 26, 2018, 19:00-21:30

May 28, 2018, 19:00-21:30

May 30, 2018, 19:00-21:30


Venue: Conference Room of the School of Statistics and Mathematics, 4th Floor, Wenbo Building

Course Description:

R is one of the most popular software environments for data analysis. Over the past decade or so, considerable effort has gone into making R usable for big data analysis, through projects such as Tessera, Revolution R, and SparkR, to name a few. All of them build on Java-based frameworks such as Hadoop and Spark.

In this workshop, we introduce an entirely new alternative: SupR, a multithreaded and distributed R. The SupR prototype (http://www.stat.purdue.edu/~chuanhai/SupR/index.html) was built by modifying the existing internal C implementation of R (version 3.1.1). Its key features are (1) an R-style front end that preserves the existing R syntax and basic internal data structures, (2) a Java-like multithreading model, (3) a Spark-like cluster computing environment, and (4) a simple built-in distributed file system.
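To give a rough sense of what a "Java-like multithreading model" refers to, the sketch below uses Python's standard `threading` module, whose design is itself loosely based on Java's threading model. This is not SupR code and does not show SupR's API; the names `SharedTotal`, `SumWorker`, and `parallel_sum` are hypothetical and exist only for illustration. As in Java, a thread is an object that subclasses `Thread` and overrides `run()`, it is launched with `start()` and awaited with `join()`, and shared state is guarded by a lock, playing the role of a Java `synchronized` method.

```python
import threading


class SharedTotal:
    """Hypothetical accumulator; add() plays the role of a Java synchronized method."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def add(self, amount):
        # Only one thread at a time may update the shared total.
        with self._lock:
            self.value += amount


class SumWorker(threading.Thread):
    """Hypothetical Java-style thread: subclass Thread and override run()."""

    def __init__(self, data, lo, hi, total):
        super().__init__()
        self.data, self.lo, self.hi, self.total = data, lo, hi, total

    def run(self):
        # Each thread sums its own slice, then publishes the partial result.
        self.total.add(sum(self.data[self.lo:self.hi]))


def parallel_sum(data, n_threads=4):
    """Split data into contiguous chunks and sum them in separate threads."""
    total = SharedTotal()
    chunk = max(1, (len(data) + n_threads - 1) // n_threads)  # ceiling division
    workers = [SumWorker(data, i, min(i + chunk, len(data)), total)
               for i in range(0, len(data), chunk)]
    for w in workers:
        w.start()  # like Thread.start() in Java
    for w in workers:
        w.join()   # wait until every worker has finished
    return total.value


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))  # 5050
```

In CPython the global interpreter lock keeps this CPU-bound summation from actually running in parallel, so the example illustrates the programming model rather than a speedup.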

Students are expected to bring their own laptops to install VirtualBox, Ubuntu, and the current version of SupR. More information on installation and course materials is available at http://www.stat.purdue.edu/~chuanhai/SupR/release/

Course Outline:

1. Introduction to SupR

2. R Fundamentals

3. Object-Oriented Programming in SupR

4. Multithreading

5. Advanced Multithreading

6. Multithreaded Graphics and Iterative Parallel Algorithms

7. Distributed File Systems and Databases

8. Cluster Computing

9. MapReduce Functions

10. Embedding Python and Other Systems

11. Apache Hadoop and Spark (optional)

12. C Interface and Beyond

About the Instructor:

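Chuanhai Liu is a Professor in the Department of Statistics at Purdue University. He received his Ph.D. from the Department of Statistics at Harvard University in 1994. Professor Liu is an internationally recognized expert in statistical computing and statistical inference; he has published more than 70 papers in international statistics journals as well as two books in English. He is a Fellow of the American Statistical Association (ASA) and a member of the International Statistical Institute (ISI). He serves or has served as an Associate Editor of leading statistics journals, including the Journal of the American Statistical Association, Statistica Sinica, the Journal of Computational and Graphical Statistics, and the Journal of Multivariate Analysis. On the industry side, he spent more than ten years at Bell Laboratories, where he was among the first statisticians worldwide to focus on big data and gained extensive experience in developing statistical software platforms.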