Lecture title: Interaction-based learning in big data: A method of Partitions and I-score
Speaker: Professor Shaw-Hwa Lo (羅小華), Department of Statistics, Columbia University, USA
Time: 10:00 a.m., June 11, 2015
Venue: Classroom 4404B, Jiuli Campus
Abstract: We consider a computer-intensive approach, Partition Retention (PR) (Chernoff, Lo and Zheng (2009)), based on an earlier method (Lo and Zheng (2002)), for detecting which of many potential explanatory variables have an influence on a dependent variable Y. This approach is suited to detecting influential variables in groups, where causal effects depend on the confluence of values of several variables. Guided by a measure of influence I, it has the advantage of avoiding a difficult direct analysis involving possibly thousands of variables. The main objective is to discover the influential variables rather than to measure their effects. Once they are detected, the problem of dealing with a much smaller group of influential variables should be amenable to standard analysis. We confine our attention to locating a few needles in a haystack.
The quality of the selected variables is evaluated in two ways: first by classification error rates, then by functional relevance using external biological knowledge. We demonstrate that (1) classification error rates can be significantly reduced by considering interactions; and (2) incorporating interaction information into data analysis can be very rewarding in generating novel scientific findings. Heuristic explanations of why and when the proposed methods may lead to such a dramatic (classification/predictive) gain are briefly discussed.
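For readers unfamiliar with the idea of an influence measure computed over partitions, the following is a minimal Python sketch, not the speaker's implementation. It assumes discrete explanatory variables and uses an unnormalized score of the form I = sum over cells j of n_j^2 (Ybar_j - Ybar)^2, where the cells are formed by the joint values of a chosen subset of variables. This formula is not stated in the abstract and is an assumption here; the function names `i_score` and `backward_drop`, and the toy data, are hypothetical.

```python
from collections import defaultdict
from itertools import product


def i_score(rows, y, subset):
    """Assumed unnormalized influence score for the variables in `subset`.

    Observations are partitioned into cells by their joint values on
    `subset`; each cell j contributes n_j^2 * (mean_j - overall_mean)^2,
    computed here as (sum_j - n_j * overall_mean)^2.
    """
    y_bar = sum(y) / len(y)
    cells = defaultdict(list)
    for row, yi in zip(rows, y):
        cells[tuple(row[k] for k in subset)].append(yi)
    return sum((sum(v) - len(v) * y_bar) ** 2 for v in cells.values())


def backward_drop(rows, y, start):
    """Greedy backward dropping (illustrative): repeatedly remove the
    variable whose removal leaves the highest score, and return the
    best-scoring subset seen along the way."""
    current = tuple(start)
    best, best_score = current, i_score(rows, y, current)
    while len(current) > 1:
        candidates = [tuple(v for v in current if v != d) for d in current]
        current = max(candidates, key=lambda s: i_score(rows, y, s))
        score = i_score(rows, y, current)
        if score > best_score:
            best, best_score = current, score
    return best, best_score


# Toy data: Y depends only on the interaction of variables 0 and 1 (XOR);
# variable 2 is pure noise.
rows = list(product([0, 1], repeat=3))
y = [r[0] ^ r[1] for r in rows]

print(i_score(rows, y, (0,)))         # 0.0: no marginal signal
print(i_score(rows, y, (0, 1)))       # 4.0: the interacting pair
print(i_score(rows, y, (0, 1, 2)))    # 2.0: adding noise lowers the score
print(backward_drop(rows, y, (0, 1, 2)))  # ((0, 1), 4.0)
```

In this toy case neither variable 0 nor variable 1 shows any influence on its own, while the pair scores highest, which illustrates why searching over groups of variables can find effects that one-at-a-time screening misses.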
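About the speaker:
Professor Shaw-Hwa Lo graduated from the University of California, Berkeley, CA, and previously taught in the Department of Statistics, Rutgers University, and the Department of Statistics, Harvard University. Professor Lo later joined the Department of Statistics at Columbia University and served as its chair from 1998 to 2004.
Professor Lo has outstanding academic attainments, is an influential scientist in statistics, mathematics, and big data processing, and is a Fellow of the American Statistical Association. Professor Lo's major contribution, the HTHA method, is an important method in statistics and computer science.
Faculty and students interested in machine learning, big data processing, statistics, and related areas of mathematics are welcome to attend.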