Seminar

Wei-Yin Loh

Selection for nonparametric regression when the number of predictor variables is large

Data sets with very many predictor variables often present special difficulties for model fitting. If some of the variables are irrelevant and their number grows, the prediction accuracy of most model-fitting algorithms will deteriorate. One way to slow the rate of deterioration is to reduce the number of variables by first eliminating those that do not significantly affect the response variable. Although there are numerous variable selection techniques for linear models, it is only very recently that corresponding techniques for nonparametric models have been proposed. One technique uses the variable importance scores from Random Forest (Tuv, Borisov, and Torkkola, 2006) and another is a local polynomial-based tube-hunting method called EARTH (Doksum, Tang, and Tsui, 2007). We introduce yet another approach, based on the GUIDE (Loh, 2002) regression tree algorithm, and evaluate the computational and statistical effectiveness of the three methods on real and simulated data sets.

Seminar Date:
October 29, 2008