
Variable selection through CART

Published online by Cambridge University Press:  22 October 2014

Marie Sauvé
Affiliation:
Lycée Jules Haag, 25000 Besançon, France. m.sauve@sfr.fr
Christine Tuleau-Malot
Affiliation:
Laboratoire Jean-Alexandre Dieudonné, CNRS UMR 6621, Université de Nice Sophia-Antipolis Parc Valrose, 06108 Nice cedex 2, France; christine.malot@unice.fr

Abstract

This paper deals with variable selection in the regression and binary classification frameworks. It proposes an automatic and exhaustive procedure that relies on the CART algorithm and on model selection via penalization. This theoretical work aims at determining adequate penalties, i.e. penalties that yield oracle-type inequalities justifying the performance of the proposed procedure. Since the exhaustive procedure cannot be carried out when the number of variables is too large, a more practical procedure is also proposed and still theoretically validated. A simulation study completes the theoretical results.
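The exhaustive procedure described above can be sketched in a few lines: for every non-empty subset M of variables, fit a CART-style tree using only the variables in M, then select the subset minimizing a penalized empirical risk, crit(M) = err(M) + pen(|M|, n). The sketch below is illustrative only: it uses a depth-one regression stump as a stand-in for a full pruned CART tree, and the penalty shape `c * |M| * log(p) / n` is an assumption of this example, not the calibrated penalty derived in the paper.

```python
from itertools import combinations
import math

def stump_risk(X, y, j):
    """In-sample MSE of the best one-split regression stump on variable j
    (a simplified stand-in for a fully grown and pruned CART tree)."""
    n = len(y)
    mean = sum(y) / n
    best = sum((v - mean) ** 2 for v in y) / n          # no-split baseline
    order = sorted(range(n), key=lambda i: X[i][j])
    for cut in range(1, n):
        left = [y[i] for i in order[:cut]]
        right = [y[i] for i in order[cut:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        mse = (sum((v - ml) ** 2 for v in left)
               + sum((v - mr) ** 2 for v in right)) / n
        best = min(best, mse)
    return best

def exhaustive_selection(X, y, p, c=0.5):
    """Minimize err(M) + pen(|M|, n) over all non-empty variable subsets M.
    The penalty shape below is illustrative, not the paper's calibrated one."""
    n = len(y)
    best_crit, best_subset = float("inf"), None
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):
            err = min(stump_risk(X, y, j) for j in subset)  # best tree in M
            pen = c * k * math.log(max(p, 2)) / n
            if err + pen < best_crit:
                best_crit, best_subset = err + pen, subset
    return best_subset
```

On a toy sample where the response is a step function of the second variable, the procedure selects the singleton containing that variable: adding the uninformative variable does not reduce the empirical risk, so the penalty term tips the criterion in favor of the smaller subset. The loop over all 2^p − 1 subsets also makes plain why the exhaustive procedure is infeasible for large p, motivating the paper's more practical variant.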

Type
Research Article
Copyright
© EDP Sciences, SMAI 2014


References

Arlot, S. and Bartlett, P., Margin adaptive model selection in statistical learning. Bernoulli 17 (2011) 687–713.
Birgé, L. and Massart, P., Minimal penalties for Gaussian model selection. Probab. Theory Relat. Fields 138 (2007) 33–73.
Breiman, L., Random forests. Mach. Learn. 45 (2001) 5–32.
Breiman, L. and Cutler, A., Random forests. http://www.stat.berkeley.edu/users/breiman/RandomForests/ (2005).
Breiman, L., Friedman, J., Olshen, R. and Stone, C., Classification and Regression Trees. Chapman and Hall (1984).
Díaz-Uriarte, R. and Alvarez de Andrés, S., Gene selection and classification of microarray data using random forest. BMC Bioinform. 7 (2006) 1–13.
Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R., Least angle regression. Ann. Stat. 32 (2004) 407–499.
Fan, J. and Lv, J., A selective overview of variable selection in high dimensional feature space. Stat. Sin. 20 (2010) 101–148.
Furnival, G.M. and Wilson, R.W., Regression by leaps and bounds. Technometrics 16 (1974) 499–511.
Genuer, R., Poggi, J.M. and Tuleau-Malot, C., Variable selection using random forests. Pattern Recognit. Lett. 31 (2010) 2225–2236.
Gey, S., Margin adaptive risk bounds for classification trees. Preprint hal-00362281.
Gey, S. and Nédélec, E., Model selection for CART regression trees. IEEE Trans. Inf. Theory 51 (2005) 658–670.
Ghattas, B. and Ben Ishak, A., Sélection de variables pour la classification binaire en grande dimension: comparaisons et application aux données de biopuces. Journal de la société française de statistique 149 (2008) 43–66.
Grömping, U., Estimators of relative importance in linear regression based on variance decomposition. The American Statistician 61 (2007) 139–147.
Guyon, I. and Elisseeff, A., An introduction to variable and feature selection. J. Mach. Learn. Res. 3 (2003) 1157–1182.
Guyon, I., Weston, J., Barnhill, S. and Vapnik, V.N., Gene selection for cancer classification using support vector machines. Mach. Learn. 46 (2002) 389–422.
Hastie, T., Tibshirani, R. and Friedman, J., The Elements of Statistical Learning. Springer (2001).
Hesterberg, T., Choi, N.H., Meier, L. and Fraley, C., Least angle regression and L1 penalized regression: a review. Stat. Surv. 2 (2008) 61–93.
Kohavi, R. and John, G.H., Wrappers for feature subset selection. Artificial Intelligence 97 (1997) 273–324.
Koltchinskii, V., Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Stat. 34 (2006) 2593–2656.
Mammen, E. and Tsybakov, A., Smooth discrimination analysis. Ann. Stat. 27 (1999) 1808–1829.
Massart, P., Some applications of concentration inequalities to statistics. Annales de la faculté des sciences de Toulouse 9 (2000) 245–303.
Massart, P., Concentration Inequalities and Model Selection. Lect. Notes Math. Springer (2003).
Massart, P. and Nédélec, E., Risk bounds for statistical learning. Ann. Stat. 34 (2006).
Poggi, J.M. and Tuleau, C., Classification supervisée en grande dimension. Application à l'agrément de conduite automobile. Revue de Statistique Appliquée LIV (2006) 41–60.
Rio, E., Une inégalité de Bennett pour les maxima de processus empiriques. Ann. Inst. Henri Poincaré, Probab. Stat. 38 (2002) 1053–1057.
Saltelli, A., Chan, K. and Scott, M., Sensitivity Analysis. Wiley (2000).
Sauvé, M., Histogram selection in non-Gaussian regression. ESAIM PS 13 (2009) 70–86.
Sauvé, M. and Tuleau-Malot, C., Variable selection through CART. Preprint hal-00551375.
Sobol, I.M., Sensitivity estimates for nonlinear mathematical models. Math. Mod. Comput. Experiment 1 (1993) 271–280.
Tibshirani, R., Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 58 (1996) 267–288.
Tsybakov, A.B., Optimal aggregation of classifiers in statistical learning. Ann. Stat. 32 (2004) 135–166.
