
Robust Wasserstein profile inference and applications to machine learning

  • Jose Blanchet (a1), Yang Kang (a2) and Karthyek Murthy (a3)

Abstract

We show that several machine learning estimators, including the square-root least absolute shrinkage and selection operator (square-root LASSO) and regularized logistic regression, can be represented as solutions to distributionally robust optimization problems. The associated uncertainty regions are based on suitably defined Wasserstein distances. Hence, our representations allow us to view regularization as a result of introducing an artificial adversary that perturbs the empirical distribution to account for out-of-sample effects in loss estimation. In addition, we introduce RWPI (robust Wasserstein profile inference), a novel inference methodology which extends the use of methods inspired by empirical likelihood to the setting of optimal transport costs (of which Wasserstein distances are a particular case). We use RWPI to show how to optimally select the size of uncertainty regions, and as a consequence we are able to choose regularization parameters for these machine learning estimators without the use of cross-validation. Numerical experiments are also given to validate our theoretical findings.
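To make the representation concrete, here is a sketch of the identity in the square-root LASSO case, in notation assumed for illustration (the paper's own statement may differ in details): write \(P_n\) for the empirical distribution of the data \((X_1, Y_1), \ldots, (X_n, Y_n)\), let \(D_c\) denote the optimal transport cost induced by a ground cost \(c\) that charges \(\|x - x'\|_q^2\) for moving predictors and forbids moving responses, and let \(\delta > 0\) be the radius of the uncertainty region. Then, with \(1/p + 1/q = 1\),

\[
\min_{\beta} \sup_{P \,:\, D_c(P, P_n) \le \delta} \sqrt{\mathbb{E}_P\big[(Y - \beta^\top X)^2\big]}
= \min_{\beta} \left\{ \sqrt{\mathbb{E}_{P_n}\big[(Y - \beta^\top X)^2\big]} + \sqrt{\delta}\,\|\beta\|_p \right\},
\]

so the worst-case (distributionally robust) objective coincides with the square-root LASSO objective with regularization parameter \(\lambda = \sqrt{\delta}\). RWPI then calibrates \(\delta\) as a quantile of the associated profile function, which is how a regularization parameter is obtained without cross-validation.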

Corresponding author

*Postal address: Management Science and Engineering, Stanford University, 475 Via Ortega, Stanford, CA 94305, USA.
**Postal address: Columbia University, 1255 Amsterdam Avenue, Rm 1005, New York, NY 10027, USA.
***Postal address: Singapore University of Technology and Design, 8 Somapah Road, Singapore 487372, Singapore.

Footnotes

The supplementary material for this article can be found at http://doi.org/10.1017/jpr.2019.49

Supplementary materials

Blanchet et al. supplementary material: PDF (816 KB)
