FUNCTIONAL SEQUENTIAL TREATMENT ALLOCATION WITH COVARIATES

Published online by Cambridge University Press:  16 March 2023

Anders Bredahl Kock
Affiliation:
University of Oxford
David Preinerstorfer
Affiliation:
University of St.Gallen
Bezirgen Veliyev*
Affiliation:
Aarhus University
* Address correspondence to Bezirgen Veliyev, Department of Economics and Business Economics, Aarhus University, Fuglesangs Alle 4, 8210 Aarhus V, Denmark; e-mail: bveliyev@econ.au.dk.

Abstract

We consider a sequential treatment problem with covariates. Given a realization of the covariate vector, instead of targeting the treatment with the highest conditional expectation, the decision-maker targets the treatment that maximizes a general functional of the conditional potential outcome distribution, e.g., a conditional quantile, a trimmed mean, or a socioeconomic functional such as an inequality, welfare, or poverty measure. We develop expected regret lower bounds for this problem and construct a near minimax optimal sequential assignment policy.
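
As an illustration of why the choice of functional matters, the following minimal sketch compares the treatment selected under the mean with the treatment selected under the median and a trimmed mean. It is not the paper's assignment policy: the outcome values, the treatment labels "A" and "B", and the helper names `trimmed_mean` and `best_treatment` are all hypothetical, and the outcomes stand in for observations collected at a single covariate value.

```python
from statistics import mean, median


def trimmed_mean(xs, k=1):
    """Mean after dropping the k smallest and k largest observations."""
    xs = sorted(xs)
    return mean(xs[k:len(xs) - k])


def best_treatment(outcomes, functional):
    """Return the treatment label whose outcome sample maximizes `functional`."""
    return max(outcomes, key=lambda t: functional(outcomes[t]))


# Hypothetical outcomes observed at one covariate value (illustrative only).
outcomes = {
    "A": [1, 1, 1, 1, 16],  # higher mean, driven by a single large outcome
    "B": [3, 3, 3, 3, 3],   # lower mean, identical outcomes for everyone
}

by_mean = best_treatment(outcomes, mean)
by_median = best_treatment(outcomes, median)
by_trimmed = best_treatment(outcomes, trimmed_mean)

print(by_mean, by_median, by_trimmed)
```

In this toy sample the mean favors treatment A, while the median and the trimmed mean both favor treatment B, so a decision-maker who targets a functional other than the expectation can rank the same treatments differently.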

Type
ARTICLES
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Footnotes

We are grateful to the Editor, a Co-Editor, and three anonymous referees for their valuable comments.
