
Recursive bias estimation for multivariate regression smoothers

Published online by Cambridge University Press:  10 October 2014

Pierre-André Cornillon
Affiliation:
IRMAR, UMR 6625, Univ. Rennes 2, 35043 Rennes, France
N. W. Hengartner
Affiliation:
Stochastics Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
E. Matzner-Løber
Affiliation:
Lab. Mathématiques Appliquées, Agrocampus Ouest et Univ. Rennes 2, 35043 Rennes, France. eml@uhb.fr

Abstract

This paper presents a practical and simple fully nonparametric multivariate smoothing procedure that adapts to the underlying smoothness of the true regression function. Our estimator is easily computed by successive application of existing base smoothers (without the need to select an optimal smoothing parameter), such as thin-plate spline or kernel smoothers. The resulting smoother has better out-of-sample predictive capabilities than the underlying base smoother, or than competing structurally constrained models (MARS, GAM), for small dimension (3 ≤ d ≤ 7) and moderate sample size (n ≤ 1000). Moreover, our estimator remains useful when d > 10, where, to our knowledge, no other adaptive fully nonparametric regression estimator is available without structural assumptions such as additivity. On a real example, the Boston Housing Data, our method reduces the out-of-sample prediction error by 20%. An R package, ibr, available on CRAN, implements the proposed multivariate nonparametric method.
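
The recursive scheme sketched in the abstract can be illustrated in a few lines of R. The sketch below is not the authors' implementation: the base smoother (a deliberately oversmoothing Gaussian kernel smoother), the fixed iteration count, and the names base_smoother and ibr_sketch are assumptions made for exposition; the actual method and its data-driven stopping rule are those of the ibr package.

# Illustrative sketch of iterative bias reduction (not the authors' code).
# Pilot estimator: an oversmoothing Gaussian kernel smoother.
base_smoother <- function(x, bandwidth) {
  # n x n linear smoother matrix S for the design points in x
  d <- as.matrix(dist(x))
  w <- exp(-0.5 * (d / bandwidth)^2)
  sweep(w, 1, rowSums(w), "/")        # normalize so each row sums to one
}

ibr_sketch <- function(x, y, bandwidth, n_iter) {
  S <- base_smoother(x, bandwidth)
  m <- S %*% y                        # oversmoothed pilot fit
  for (k in seq_len(n_iter)) {
    bias_hat <- S %*% (y - m)         # smooth the residuals to estimate the bias
    m <- m + bias_hat                 # correct the fit
  }
  drop(m)
}

# Toy example: d = 3 covariates, n = 200 observations.
set.seed(1)
x <- matrix(runif(600), ncol = 3)
y <- sin(2 * pi * x[, 1]) + x[, 2]^2 + rnorm(200, sd = 0.2)
fit <- ibr_sketch(x, y, bandwidth = 0.5, n_iter = 20)

After k corrections the fit is m_k = (I - (I - S)^(k+1)) y, so each iteration removes bias at the cost of added variance, and the number of iterations plays the role of the smoothing parameter, selected by a data-driven criterion in the package. In practice one would call the CRAN package directly, for instance library(ibr); fit <- ibr(x, y) (the exact argument names are an assumption; see the package documentation at the URL in the references below).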

Type
Research Article
Copyright
© EDP Sciences, SMAI 2014


References

Abdous, B., Computationally efficient classes of higher-order kernel functions. Can. J. Statist. 23 (1995) 21–27.
L. Breiman, Using adaptive bagging to debias regressions. Technical Report 547, Dept. of Statistics, UC Berkeley (1999).
Breiman, L. and Friedman, J., Estimating optimal transformations for multiple regression and correlation. J. Amer. Stat. Assoc. 80 (1985) 580–598.
Bühlmann, P. and Yu, B., Boosting with the L2 loss: regression and classification. J. Amer. Stat. Assoc. 98 (2003) 324–339.
P.-A. Cornillon, N. Hengartner and E. Matzner-Løber, Recursive bias estimation and L2 boosting. Technical report, arXiv:0801.4629 (2008).
P.-A. Cornillon, N. Hengartner and E. Matzner-Løber, ibr: Iterative Bias Reduction. CRAN (2010). http://cran.r-project.org/web/packages/ibr/index.html.
P.-A. Cornillon, N. Hengartner, N. Jégou and E. Matzner-Løber, Iterative bias reduction: a comparative study. Statist. Comput. (2012).
Craven, P. and Wahba, G., Smoothing noisy data with spline functions. Numer. Math. 31 (1979) 377–403.
Di Marzio, M. and Taylor, C., On boosting kernel regression. J. Statist. Plan. Infer. 138 (2008) 2483–2498.
R. Eubank, Nonparametric regression and spline smoothing. Dekker, 2nd edition (1999).
W. Feller, An Introduction to Probability Theory and Its Applications, vol. 2. Wiley (1966).
Friedman, J., Multivariate adaptive regression splines. Ann. Statist. 19 (1991) 1–67.
Friedman, J., Greedy function approximation: a gradient boosting machine. Ann. Statist. 29 (2001) 1189–1232.
Friedman, J. and Stuetzle, W., Projection pursuit regression. J. Amer. Statist. Assoc. 76 (1981) 817–823.
Friedman, J., Hastie, T. and Tibshirani, R., Additive logistic regression: a statistical view of boosting. Ann. Statist. 28 (2000) 337–407.
C. Gu, Smoothing spline ANOVA models. Springer (2002).
L. Györfi, M. Kohler, A. Krzyżak and H. Walk, A Distribution-Free Theory of Nonparametric Regression. Springer-Verlag (2002).
D. Harrison and D. Rubinfeld, Hedonic prices and the demand for clean air. J. Environ. Econ. Manag. 5 (1978) 81–102.
T. Hastie and R. Tibshirani, Generalized Additive Models. Chapman & Hall (1995).
R.A. Horn and C.R. Johnson, Matrix Analysis. Cambridge University Press (1985).
Hurvich, C., Simonoff, J. and Tsai, C.-L., Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J. Roy. Stat. Soc. B 60 (1998) 271–294.
Lepski, O., Asymptotically minimax adaptive estimation. I: Upper bounds. Optimally adaptive estimates. Theory Probab. Appl. 37 (1991) 682–697.
Li, K.-C., Asymptotic optimality for C_p, C_L, cross-validation and generalized cross-validation: discrete index set. Ann. Statist. 15 (1987) 958–975.
Ridgeway, G., Additive logistic regression: a statistical view of boosting: discussion. Ann. Statist. 28 (2000) 393–400.
L. Schwartz, Analyse IV applications à la théorie de la mesure. Hermann (1993).
W. Stuetzle and Y. Mittal, Some comments on the asymptotic behavior of robust smoothers, in Smoothing Techniques for Curve Estimation, edited by T. Gasser and M. Rosenblatt. Springer-Verlag (1979) 191–195.
J. Tukey, Exploratory Data Analysis. Addison-Wesley (1977).
F. Utreras, Convergence rates for multivariate smoothing spline functions. J. Approx. Theory (1988) 1–27.
J. Wendelberger, Smoothing Noisy Data with Multivariate Splines and Generalized Cross-Validation. PhD thesis, University of Wisconsin (1982).
Wood, S., Thin plate regression splines. J. R. Statist. Soc. B 65 (2003) 95–114.
Yang, Y., Combining different procedures for adaptive regression. J. Mult. Analysis 74 (2000) 135–161.