Partially Observed Markov Decision Processes

Book description

Covering formulation, algorithms, and structural results, and linking theory to real-world applications in controlled sensing (including social learning, adaptive radars and sequential detection), this book focuses on the conceptual foundations of partially observed Markov decision processes (POMDPs). It emphasizes structural results in stochastic dynamic programming, enabling graduate students and researchers in engineering, operations research, and economics to understand the underlying unifying themes without getting weighed down by mathematical technicalities. Bringing together research from across the literature, the book provides an introduction to nonlinear filtering followed by a systematic development of stochastic dynamic programming, lattice programming and reinforcement learning for POMDPs. Questions addressed in the book include: when does a POMDP have a threshold optimal policy? When are myopic policies optimal? How do local and global decision makers interact in adaptive decision making in multi-agent social learning where there is herding and data incest? And how can sophisticated radars and sensors adapt their sensing in real time?


[1] D., Aberdeen and J., Baxter, Scaling internal-state policy-gradient methods for POMDPs. In International Conference on Machine Learning, pp. 3–10, 2002.
[2] J., Abounadi, D. P., Bertsekas and V., Borkar, Learning algorithms for Markov decision processes with average cost. SIAM Journal on Control and Optimization, 40(3):681–98, 2001.
[3] D., Acemoglu and A., Ozdaglar, Opinion dynamics and learning in social networks. Dynamic Games and Applications, 1(1):3–49, 2011.
[4] S., Afriat, The construction of utility functions from expenditure data. International Economic Review, 8(1):67–77, 1967.
[5] S., Afriat, Logic of Choice and Economic Theory. (Oxford: Clarendon Press, 1987).
[6] S., Agrawal and N., Goyal, Analysis of Thompson sampling for the multi-armed bandit problem. In Proceedings 25th Annual Conference Learning Theory, volume 23, 2012.
[7] R., Ahuja and J., Orlin, Inverse optimization. Operations Research, 49(5):771–83, 2001.
[8] I. F., Akyildiz, W., Su, Y., Sankarasubramaniam and E., Cayirci, Wireless sensor networks: A survey. Computer Networks, 38(4):393–422, 2002.
[9] A., Albore, H., Palacios and H., Geffner, A translation-based approach to contingent planning. In International Joint Conference on Artificial Intelligence, pp. 1623–28, 2009.
[10] S. C., Albright, Structural results for partially observed Markov decision processes. Operations Research, 27(5):1041–53, Sept.–Oct. 1979.
[11] E., Altman, Constrained Markov Decision Processes. (London: Chapman and Hall, 1999).
[12] E., Altman, B., Gaujal and A., Hordijk, Discrete-Event Control of Stochastic Networks: Multimodularity and Regularity. (Springer-Verlag, 2004).
[13] T., Ben-Zvi and A., Grosfeld-Nir, Partially observed Markov decision processes with binomial observations. Operations Research Letters, 41(2):201–6, 2013.
[14] M., Dorigo and M., Gambardella, Ant-Q: A reinforcement learning approach to the traveling salesman problem. In Proceedings of the 12th International Conference on Machine Learning, pp. 252–60, 1995.
[15] R., Amir, Supermodularity and complementarity in economics: An elementary survey. Southern Economic Journal, 71(3):636–60, 2005.
[16] M. S., Andersland and D., Teneketzis, Measurement scheduling for recursive team estimation. Journal of Optimization Theory and Applications, 89(3):615–36, June 1996.
[17] B. D. O., Anderson and J. B., Moore, Optimal Filtering. (Englewood Cliffs, NJ: Prentice Hall, 1979).
[18] B. D. O., Anderson and J. B., Moore, Optimal Control: Linear Quadratic Methods. (Englewood Cliffs, NJ: Prentice Hall, 1989).
[19] S., Andradottir, A global search method for discrete stochastic optimization. SIAM Journal on Optimization, 6(2):513–30, May 1996.
[20] S., Andradottir, Accelerating the convergence of random search methods for discrete stochastic optimization. ACM Transactions on Modelling and Computer Simulation, 9(4):349–80, Oct. 1999.
[21] A., Arapostathis, V., Borkar, E., Fernández-Gaucherand, M. K., Ghosh and S. I., Marcus, Discrete-time controlled Markov processes with average cost criterion: A survey. SIAM Journal on Control and Optimization, 31(2):282–344, 1993.
[22] P., Artzner, F., Delbaen, J., Eber and D., Heath, Coherent measures of risk. Mathematical Finance, 9(3):203–28, July 1999.
[23] P., Artzner, F., Delbaen, J., Eber, D., Heath and H., Ku, Coherent multiperiod risk adjusted values and Bellman's principle. Annals of Operations Research, 152(1):5–22, 2007.
[24] K. J., Åström, Optimal control of Markov processes with incomplete state information. Journal of Mathematical Analysis and Applications, 10(1):174–205, 1965.
[25] R., Atar and O., Zeitouni, Lyapunov exponents for finite state nonlinear filtering. SIAM Journal on Control and Optimization, 35(1):36–55, 1997.
[26] S., Athey, Monotone comparative statics under uncertainty. The Quarterly Journal of Economics, 117(1):187–223, 2002.
[27] P., Auer, N., Cesa-Bianchi and P., Fischer, Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–56, 2002.
[28] A. Y., Ng and S., Russell, Algorithms for inverse reinforcement learning. In Proceedings of the 17th International Conference on Machine Learning, pp. 663–70, 2000.
[29] A., Banerjee, A simple model of herd behavior. Quarterly Journal of Economics, 107(3):797–817, August 1992.
[30] A., Banerjee, X., Guo and H., Wang, On the optimality of conditional expectation as a Bregman predictor. IEEE Transactions on Information Theory, 51(7):2664–9, 2005.
[31] T., Banerjee and V., Veeravalli, Data-efficient quickest change detection with on-off observation control. Sequential Analysis, 31:40–77, 2012.
[32] Y., Bar-Shalom, X. R., Li and T., Kirubarajan, Estimation with Applications to Tracking and Navigation. John Wiley, New York, 2008.
[33] J. S., Baras and A., Bensoussan, Optimal sensor scheduling in nonlinear filtering of diffusion processes. SIAM Journal on Control and Optimization, 27(4):786–813, July 1989.
[34] G., Barles and P. E., Souganidis, Convergence of approximation schemes for fully nonlinear second order equations. In Asymptotic Analysis, number 4, pp. 2347–9, 1991.
[35] P., Bartlett and J., Baxter, Estimation and approximation bounds for gradient-based reinforcement learning. Journal of Computer and System Sciences, 64(1):133–50, 2002.
[36] M., Basseville and I. V., Nikiforov, Detection of Abrupt Changes: Theory and Applications. Information and System Sciences Series. (Englewood Cliffs, NJ: Prentice Hall, 1993).
[37] N., Bäuerle and U., Rieder, More risk-sensitive Markov decision processes. Mathematics of Operations Research, 39(1):105–20, 2013.
[38] L. E., Baum and T., Petrie, Statistical inference for probabilistic functions of finite state Markov chains. Annals of Mathematical Statistics, 37:1554–63, 1966.
[39] L. E., Baum, T., Petrie, G., Soules and N., Weiss, A maximisation technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41(1):164–71, 1970.
[40] R., Bellman, Dynamic Programming. 1st edition (Princeton, NJ: Princeton University Press, 1957).
[41] M., Benaim and M., Faure, Consistency of vanishingly smooth fictitious play. Mathematics of Operations Research, 38(3):437–50, Aug. 2013.
[42] M., Benaim, J., Hofbauer and S., Sorin, Stochastic approximations and differential inclusions. SIAM Journal on Control and Optimization, 44(1):328–48, 2005.
[43] M., Benaim, J., Hofbauer and S., Sorin, Stochastic approximations and differential inclusions, Part II: Applications. Mathematics of Operations Research, 31(3):673–95, 2006.
[44] M., Benaim and J., Weibull, Deterministic approximation of stochastic evolution in games. Econometrica, 71(3):873–903, 2003.
[45] V. E., Beneš, Exact finite-dimensional filters for certain diffusions with nonlinear drift. Stochastics, 5:65–92, 1981.
[46] A., Bensoussan, Stochastic Control of Partially Observable Systems. (Cambridge University Press, 1992).
[47] A., Bensoussan and J., Lions, Impulsive Control and Quasi-Variational Inequalities. (Paris: Gauthier-Villars, 1984).
[48] A., Benveniste, M., Metivier and P., Priouret, Adaptive Algorithms and Stochastic Approximations, volume 22 of Applications of Mathematics. (Springer-Verlag, 1990).
[49] D. P., Bertsekas, Dynamic Programming and Optimal Control, volume 1 and 2. (Belmont, MA: Athena Scientific, 2000).
[50] D. P., Bertsekas, Nonlinear Programming. (Belmont, MA: Athena Scientific, 2000).
[51] D. P., Bertsekas, Dynamic programming and suboptimal control: A survey from ADP to MPC. European Journal of Control, 11(4):310–34, 2005.
[52] D. P., Bertsekas and S. E., Shreve, Stochastic Optimal Control: The Discrete-Time Case. (New York, NY: Academic Press, 1978).
[53] D. P., Bertsekas and J. N., Tsitsiklis, Neuro-Dynamic Programming. (Belmont, MA: Athena Scientific, 1996).
[54] D. P., Bertsekas and H., Yu, Q-learning and enhanced policy iteration in discounted dynamic programming. Mathematics of Operations Research, 37(1):66–94, 2012.
[55] L., Bianchi, M., Dorigo, L., Gambardella and W., Gutjahr, A survey on metaheuristics for stochastic combinatorial optimization. Natural Computing: An International Journal, 8(2):239–87, 2009.
[56] S., Bikhchandani, D., Hirshleifer and I., Welch, A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of Political Economy, 100(5):992–1026, October 1992.
[57] P., Billingsley, Statistical inference for Markov processes, volume 2. (University of Chicago Press, 1961).
[58] P., Billingsley, Convergence of Probability Measures. (New York, NY: John Wiley, 1968).
[59] P., Billingsley, Probability and Measure. (New York, NY: John Wiley, 1986).
[60] S., Blackman and R., Popoli, Design and Analysis of Modern Tracking Systems. (Artech House, 1999).
[61] R., Bond, C., Fariss, J., Jones, A., Kramer, C., Marlow, J., Settle and J., Fowler, A 61-million-person experiment in social influence and political mobilization. Nature, 489:295–8, September 2012.
[62] J. G., Booth and J. P., Hobert, Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal Royal Statistical Society, 61:265–85, 1999.
[63] V. S., Borkar, Stochastic Approximation. A Dynamical Systems Viewpoint. (Cambridge University Press, 2008).
[64] S., Bose, G., Orosel, M., Ottaviani and L., Vesterlund, Dynamic monopoly pricing and herding. The RAND Journal of Economics, 37(4):910–28, 2006.
[65] S., Boucheron, G., Lugosi and P., Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence. (Oxford University Press, 2013).
[66] S., Boyd, P., Diaconis and L., Xiao, Fastest mixing Markov chain on a graph. SIAM Review, 46(4):667–89, 2004.
[67] S., Boyd, N., Parikh, E., Chu, B., Peleato and J., Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
[68] S., Boyd and L., Vandenberghe, Convex Optimization. (Cambridge University Press, 2004).
[69] P., Bremaud. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. (Springer-Verlag, 1999).
[70] R. W., Brockett and J. M. C., Clark. The geometry of the conditional density equation. In O. L. R., Jacobs et al., editors, Analysis and Optimization of Stochastic Systems, pp. 299–309 (New York, 1980).
[71] S., Bubeck and N., Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multiarmed bandit problems. arXiv preprint arXiv:1204.5721, 2012.
[72] S., Bundfuss and M., Dür, Algorithmic copositivity detection by simplicial partition. Linear Algebra and Its Applications, 428(7):1511–23, 2008.
[73] S., Bundfuss and M., Dür, An adaptive linear approximation algorithm for copositive programs. SIAM Journal on Optimization, 20(1):30–53, 2009.
[74] P. E., Caines. Linear Stochastic Systems. (John Wiley, 1988).
[75] E. J., Candès and T., Tao, The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–80, May 2010.
[76] O., Cappe, E., Moulines and T., Ryden, Inference in Hidden Markov Models. (Springer-Verlag, 2005).
[77] A. R., Cassandra, Tony's POMDP page.
[78] A. R., Cassandra, Exact and Approximate Algorithms for Partially Observed Markov Decision Process. PhD thesis, Dept. Computer Science, Brown University, 1998.
[79] A. R., Cassandra, A survey of POMDP applications. In Working Notes of AAAI 1998 Fall Symposium on Planning with Partially Observable Markov Decision Processes, pp. 17–24, 1998.
[80] A. R., Cassandra, L., Kaelbling and M. L., Littman, Acting optimally in partially observable stochastic domains. In AAAI, volume 94, pp. 1023–8, 1994.
[81] A. R., Cassandra, M. L., Littman and N. L., Zhang, Incremental pruning: A simple fast exact method for partially observed Markov decision processes. In Proceedings of the 13th Annual Conference on Uncertainty in Artificial Intelligence (UAI-97). (Providence, RI 1997).
[82] C. G., Cassandras and S., Lafortune, Introduction to Discrete Event Systems. (Springer-Verlag, 2008).
[83] O., Cavus and A., Ruszczynski, Risk-averse control of undiscounted transient Markov models. SIAM Journal on Control and Optimization, 52(6):3935–66, 2014.
[84] C., Chamley, Rational Herds: Economic Models of Social Learning. (Cambridge University Press, 2004).
[85] C., Chamley, A., Scaglione and L., Li, Models for the diffusion of beliefs in social networks: An overview. IEEE Signal Processing Magazine, 30(3):16–29, 2013.
[86] W., Chiou, A note on estimation algebras on nonlinear filtering theory. Systems and Control Letters, 28:55–63, 1996.
[87] J. M. C., Clark, The design of robust approximations to the stochastic differential equations of nonlinear filtering. In J. K., Skwirzynski, editor, Communication Systems and Random Processes Theory, Darlington 1977. (Alphen aan den Rijn: Sijthoff and Noordhoff, 1978).
[88] T. F., Coleman and Y., Li, An interior trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization, 6(2):418–45, 1996.
[89] T. M., Cover and M. E., Hellman, The two-armed-bandit problem with time-invariant finite memory. IEEE Transactions on Information Theory, 16(2):185–95, 1970.
[90] T. M., Cover and J. A., Thomas, Elements of Information Theory. (Wiley-Interscience, 2006).
[91] A., Dasgupta, R., Kumar and D., Sivakumar, Social sampling. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 235–43, (Beijing, 2012). ACM.
[92] M. H. A., Davis, On a multiplicative functional transformation arising in nonlinear filtering theory. Z. Wahrscheinlichkeitstheorie verw. Gebiete, 54:125–39, 1980.
[93] S., Dayanik and C., Goulding, Detection and identification of an unobservable change in the distribution of a Markov-modulated random sequence. IEEE Transactions on Information Theory, 55(7):3323–45, 2009.
[94] A. P., Dempster, N. M., Laird and D. B., Rubin, Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B, 39:1–38, 1977.
[95] E., Denardo and U., Rothblum, Optimal stopping, exponential utility, and linear programming. Mathematical Programming, 16(1):228–44, 1979.
[96] C., Derman, G. J., Lieberman and S. M., Ross, Optimal system allocations with penalty cost. Management Science, 23(4):399–403, December 1976.
[97] R., Douc, E., Moulines and T., Ryden, Asymptotic properties of the maximum likelihood estimator in autoregressive models with Markov regime. The Annals of Statistics, 32(5):2254–304, 2004.
[98] A., Doucet, N., De Freitas and N., Gordon, editors, Sequential Monte Carlo Methods in Practice. (Springer-Verlag, 2001).
[99] A., Doucet, S., Godsill and C., Andrieu, On sequential Monte-Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10:197–208, 2000.
[100] A., Doucet, N., Gordon and V., Krishnamurthy, Particle filters for state estimation of jump Markov linear systems. IEEE Transactions on Signal Processing, 49:613–24, 2001.
[101] A., Doucet and A. M., Johansen, A tutorial on particle filtering and smoothing: Fifteen years later. In D., Crisan and B., Rozovsky, editors, Oxford Handbook on Nonlinear Filtering. (Oxford University Press, 2011).
[102] E., Dynkin, Controlled random sequences. Theory of Probability & Its Applications, 10(1):1–14, 1965.
[103] J. N., Eagle, The optimal search for a moving target when the search path is constrained. Operations Research, 32:1107–15, 1984.
[104] R. J., Elliott, L., Aggoun and J. B., Moore, Hidden Markov Models – Estimation and Control. (New York, NY: Springer-Verlag, 1995).
[105] R. J., Elliott and V., Krishnamurthy, Exact finite-dimensional filters for maximum likelihood parameter estimation of continuous-time linear Gaussian systems. SIAM Journal on Control and Optimization, 35(6):1908–23, November 1997.
[106] R. J., Elliott and V., Krishnamurthy, New finite dimensional filters for estimation of discrete-time linear Gaussian models. IEEE Transactions on Automatic Control, 44(5):938–51, May 1999.
[107] Y., Ephraim and N., Merhav, Hidden Markov processes. IEEE Transactions on Information Theory, 48:1518–69, June 2002.
[108] S. N., Ethier and T. G., Kurtz, Markov Processes—Characterization and Convergence. (Wiley, 1986).
[109] J., Evans and V., Krishnamurthy, Hidden Markov model state estimation over a packet switched network. IEEE Transactions on Signal Processing, 47(8):2157–66, August 1999.
[110] R., Evans, V., Krishnamurthy and G., Nair, Networked sensor management and data rate control for tracking maneuvering targets. IEEE Transactions on Signal Processing, 53(6):1979–91, June 2005.
[111] M., Fanaswala and V., Krishnamurthy, Syntactic models for trajectory constrained track-before-detect. IEEE Transactions on Signal Processing, 62(23):6130–42, 2014.
[112] M., Fanaswala and V., Krishnamurthy, Detection of anomalous trajectory patterns in target tracking via stochastic context-free grammars and reciprocal process models. IEEE Journal of Selected Topics in Signal Processing, 7(1):76–90, Feb. 2013.
[113] M., Fazel, H., Hindi and S. P., Boyd, Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. In Proceedings of the 2003 American Control Conference, 2003.
[114] E., Feinberg and A., Shwartz, editors, Handbook of Markov Decision Processes. (Springer-Verlag, 2002).
[115] J. A., Fessler and A. O., Hero. Space–Alternating Generalized Expectation–Maximization algorithm. IEEE Transactions on Signal Processing, 42(10):2664–77, 1994.
[116] J., Filar, L., Kallenberg and H., Lee, Variance-penalized Markov decision processes. Mathematics of Operations Research, 14(1):147–61, 1989.
[117] W. H., Fleming and H. M., Soner, Controlled Markov Processes and Viscosity Solutions, volume 25. (Springer Science & Business Media, 2006).
[118] A., Fostel, H., Scarf and M., Todd. Two new proofs of Afriat's theorem. Economic Theory, 24(1):211–19, 2004.
[119] D., Fudenberg and D. K., Levine, The Theory of Learning in Games. (MIT Press, 1998).
[120] D., Fudenberg and D. K., Levine, Consistency and cautious fictitious play. Journal of Economic Dynamics and Control, 19(5-7):1065–89, 1995.
[121] F. R., Gantmacher, Matrix Theory, volume 2. (New York, NY: Chelsea Publishing Company, 1960).
[122] A., Garivier and E., Moulines. On upper-confidence bound policies for switching bandit problems. In Algorithmic Learning Theory, pp. 174–88. Springer, 2011.
[123] E., Gassiat and S., Boucheron, Optimal error exponents in hidden Markov models order estimation. IEEE Transactions on Information Theory, 49(4):964–80, 2003.
[124] D., Ghosh, Maximum likelihood estimation of the dynamic shock-error model. Journal of Econometrics, 41(1):121–43, 1989.
[125] J. C., Gittins, Multi–Armed Bandit Allocation Indices. (Wiley, 1989).
[126] S., Goel and M. J., Salganik, Respondent-driven sampling as Markov chain Monte Carlo. Statistics in Medicine, 28:2209–29, 2009.
[127] G., Golubev and R., Khasminskii, Asymptotic optimal filtering for a hidden Markov model. Math. Methods Statist., 7(2):192–208, 1998.
[128] N. J., Gordon, D. J., Salmond and A. F. M., Smith, Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140(2):107–13, 1993.
[129] M., Granovetter, Threshold models of collective behavior. American Journal of Sociology, 83(6):1420–43, May 1978.
[130] A., Grosfeld-Nir, Control limits for two-state partially observable Markov decision processes. European Journal of Operational Research, 182(1):300–4, 2007.
[131] D., Guo, S., Shamai and S., Verdú, Mutual information and minimum mean-square error in Gaussian channels. IEEE Transactions on Information Theory, 51(4):1261–82, 2005.
[132] M., Hamdi, G., Solman, A., Kingstone and V., Krishnamurthy, Social learning in a human society: An experimental study. arXiv preprint arXiv:1408.5378, 2014.
[133] J. D., Hamilton and R., Susmel, Autoregressive conditional heteroskedasticity and changes in regime. Journal of Econometrics, 64(2):307–33, 1994.
[134] J. E., Handschin and D. Q., Mayne, Monte Carlo techniques to estimate the conditional expectation in multi-stage non-linear filtering. International Journal Control, 9(5):547–59, 1969.
[135] E. J., Hannan and M., Deistler, The Statistical Theory of Linear Systems. Wiley Series in Probability and Mathematical Statistics. (New York, NY: John Wiley, 1988).
[136] T., Hastie, R., Tibshirani and J., Friedman, The Elements of Statistical Learning. (Springer-Verlag, 2009).
[137] M., Hauskrecht, Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 13(1):33–94, 2000.
[138] S., Haykin, Cognitive radio: Brain-empowered wireless communications. IEEE Journal on Selected Areas Communications, 23(2):201–20, Feb. 2005.
[139] S., Haykin, Adaptive Filter Theory, 5th edition. Information and System Sciences Series. (Prentice Hall, 2013).
[140] D. D., Heckathorn, Respondent-driven sampling: A new approach to the study of hidden populations. Social Problems, 44:174–99, 1997.
[141] D. D., Heckathorn, Respondent-driven sampling II: Deriving valid population estimates from chain-referral samples of hidden populations. Social Problems, 49:11–34, 2002.
[142] M. E., Hellman and T. M., Cover, Learning with finite memory. The Annals of Mathematical Statistics, 41(3):765–82, 1970.
[143] O., Hernández-Lerma and J. B., Lasserre, Discrete-Time Markov Control Processes: Basic Optimality Criteria. (New York, NY: Springer-Verlag, 1996).
[144] D. P., Heyman and M. J., Sobel, Stochastic Models in Operations Research, volume 2. (McGraw-Hill, 1984).
[145] N., Higham and L., Lin, On pth roots of stochastic matrices. Linear Algebra and Its Applications, 435(3):448–63, 2011.
[146] Y.-C., Ho and X.-R., Cao. Discrete Event Dynamic Systems and Perturbation Analysis. (Boston, MA: Kluwer Academic, 1991).
[147] J., Hofbauer and W., Sandholm, On the global convergence of stochastic fictitious play. Econometrica, 70(6):2265–94, November 2002.
[148] R. A., Horn and C. R., Johnson, Matrix Analysis. (Cambridge University Press, 2012).
[149] R. A., Howard, Dynamic Probabilistic Systems, volume 1: Markov Models. (New York: John Wiley, 1971).
[150] R. A., Howard, Dynamic Probabilistic Systems, volume 2: Semi-Markov and Decision Processes. (New York: John Wiley, 1971).
[151] D., Hsu, S., Kakade and T., Zhang, A spectral algorithm for learning hidden Markov models. Journal of Computer and System Sciences, 78(5):1460–80, 2012.
[152] S.-P., Hsu, D.-M., Chuang and A., Arapostathis, On the existence of stationary optimal policies for partially observed MDPs under the long-run average cost criterion. Systems & Control Letters, 55(2):165–73, 2006.
[153] M., Huang and S., Dey, Stability of Kalman filtering with Markovian packet losses. Automatica, 43(4):598–607, 2007.
[154] I., Arasaratnam and S., Haykin, Cubature Kalman filters. IEEE Transactions on Automatic Control, 54(6):1254–69, 2009.
[155] K., Iida, Studies on the Optimal Search Plan, volume 70 of Lecture Notes in Statistics. (Springer-Verlag, 1990).
[156] M. O., Jackson. Social and Economic Networks. (Princeton, NJ: Princeton University Press, 2010).
[157] M. R., James, V., Krishnamurthy and F., LeGland, Time discretization of continuous-time filters and smoothers for HMM parameter estimation. IEEE Transactions on Information Theory, 42(2):593–605, March 1996.
[158] M. R., James, J. S., Baras and R. J., Elliott, Risk-sensitive control and dynamic games for partially observed discrete-time nonlinear systems. IEEE Transactions on Automatic Control, 39(4):780–92, April 1994.
[159] B., Jamison, Reciprocal processes. Probability Theory and Related Fields, 30(1):65–86, 1974.
[160] A. H., Jazwinski, Stochastic Processes and Filtering Theory. (NJ: Academic Press, 1970).
[161] A., Jobert and L. C. G., Rogers, Valuations and dynamic convex risk measures. Mathematical Finance, 18(1):1–22, 2008.
[162] L., Johnston and V., Krishnamurthy, Opportunistic file transfer over a fading channel: A POMDP search theory formulation with optimal threshold policies. IEEE Transactions on Wireless Communications, 5(2):394–405, Feb. 2006.
[163] T., Kailath, Linear Systems. (NJ: Prentice Hall, 1980).
[164] R. E., Kalman, A new approach to linear filtering and prediction problems. Trans. ASME, Series D (J. Basic Engineering), 82:35–45, March 1960.
[165] R. E., Kalman, When is a linear control system optimal? J. Basic Engineering, 51–60, April 1964.
[166] R. E., Kalman and R. S., Bucy, New results in linear filtering and prediction theory. Trans. ASME, Series D (J. Basic Engineering), 83:95–108, March 1961.
[167] I., Karatzas and S., Shreve, Brownian Motion and Stochastic Calculus, 2nd edition. (Springer, 1991).
[168] S., Karlin, Total Positivity, volume 1. (Stanford University, 1968).
[169] S., Karlin and Y., Rinott, Classes of orderings of measures and related correlation inequalities. I. Multivariate totally positive distributions. Journal of Multivariate Analysis, 10(4):467–98, December 1980.
[170] S., Karlin and H. M., Taylor, A Second Course in Stochastic Processes. (Academic Press, 1981).
[171] K. V., Katsikopoulos and S. E., Engelbrecht, Markov decision processes with delays and asynchronous cost collection. IEEE Transactions on Automatic Control, 48(4):568–74, 2003.
[172] J., Keilson and A., Kester, Monotone matrices and monotone Markov processes. Stochastic Processes and Their Applications, 5(3):231–41, 1977.
[173] H. K., Khalil, Nonlinear Systems 3rd edition. (Prentice Hall, 2002).
[174] M., Kijima, Markov Processes for Stochastic Modelling. (Chapman and Hall, 1997).
[175] A. N., Kolmogorov, Interpolation and extrapolation of stationary random sequences. Bull. Acad. Sci. U.S.S.R, Ser. Math., 5:3–14, 1941.
[176] A. N., Kolmogorov, Stationary sequences in Hilbert space. Bull. Math. Univ. Moscow, 2(6), 1941.
[177] L., Kontorovich and K., Ramanan, Concentration inequalities for dependent random variables via the martingale method. The Annals of Probability, 36(6):2126–58, 2008.
[178] V., Krishnamurthy, Algorithms for optimal scheduling and management of hidden Markov model sensors. IEEE Transactions on Signal Processing, 50(6):1382–97, June 2002.
[179] V., Krishnamurthy, Bayesian sequential detection with phase-distributed change time and nonlinear penalty: A lattice programming POMDP approach. IEEE Transactions on Information Theory, 57(10):7096–124, October 2011.
[180] V., Krishnamurthy, How to schedule measurements of a noisy Markov chain in decision making? IEEE Transactions on Information Theory, 59(7):4440–61, July 2013.
[181] V., Krishnamurthy and F., Vazquez-Abad, Gradient based policy optimization of constrained unichain Markov decision processes. In S., Cohen, D., Madan, and T., Siu, editors, Stochastic Processes, Finance and Control: A Festschrift in Honor of Robert J. Elliott. (World Scientific, 2012).
[182] V., Krishnamurthy, R., Bitmead, M., Gevers and E., Miehling, Sequential detection with mutual information stopping cost: Application in GMTI radar. IEEE Transactions on Signal Processing, 60(2):700–14, 2012.
[183] V., Krishnamurthy and D., Djonin, Structured threshold policies for dynamic sensor scheduling: A partially observed Markov decision process approach. IEEE Transactions on Signal Processing, 55(10):4938–57, Oct. 2007.
[184] V., Krishnamurthy and D. V., Djonin, Optimal threshold policies for multivariate POMDPs in radar resource management. IEEE Transactions on Signal Processing, 57(10), 2009.
[185] V., Krishnamurthy, O. N., Gharehshiran and M., Hamdi, Interactive sensing and decision making in social networks. Foundations and Trends in Signal Processing, 7(1-2):1–196, 2014.
[186] V., Krishnamurthy and W., Hoiles, Online reputation and polling systems: Data incest, social learning and revealed preferences. IEEE Transactions on Computational Social Systems, 1(3):164–79, January 2015.
[187] V., Krishnamurthy and U., Pareek, Myopic bounds for optimal policy of POMDPs: An extension of Lovejoy's structural results. Operations Research, 62(2):428–34, 2015.
[188] V., Krishnamurthy and H. V., Poor, Social learning and Bayesian games in multiagent signal processing: How do local and global decision makers interact? IEEE Signal Processing Magazine, 30(3):43–57, 2013.
[189] V., Krishnamurthy and C., Rojas, Reduced complexity HMM filtering with stochastic dominance bounds: A convex optimization approach. IEEE Transactions on Signal Processing, 62(23):6309–22, 2014.
[190] V., Krishnamurthy, C., Rojas and B., Wahlberg, Computing monotone policies for Markov decision processes by exploiting sparsity. In 3rd Australian Control Conference (AUCC), 1–6. IEEE, 2013.
[191] V., Krishnamurthy and B., Wahlberg, POMDP multiarmed bandits – structural results. Mathematics of Operations Research, 34(2):287–302, May 2009.
[192] V., Krishnamurthy and G., Yin, Recursive algorithms for estimation of hidden Markov models and autoregressive models with Markov regime. IEEE Transactions on Information Theory, 48(2):458–76, February 2002.
[193] P. R., Kumar and P., Varaiya, Stochastic Systems – Estimation, Identification and Adaptive Control. (Prentice Hall, 1986).
[194] H., Kurniawati, D., Hsu and W. S., Lee, SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In 2008 Robotics: Science and Systems Conference, Zurich, Switzerland, 2008.
[195] T. G., Kurtz, Approximation of Population Processes, volume 36. SIAM, 1981.
[196] H. J., Kushner, Dynamical equations for optimal nonlinear filtering. Journal of Differential Equations, 3:179–90, 1967.
[197] H. J., Kushner, A robust discrete state approximation to the optimal nonlinear filter for a diffusion. Stochastics, 3(2):75–83, 1979.
[198] H. J., Kushner, Approximation and Weak Convergence Methods for Random Processes, with Applications to Stochastic Systems Theory. (Cambridge, MA: MIT Press, 1984).
[199] H. J., Kushner and D. S., Clark, Stochastic Approximation Methods for Constrained and Unconstrained Systems. (Springer-Verlag, 1978).
[200] H. J., Kushner and G., Yin, Stochastic Approximation Algorithms and Recursive Algorithms and Applications, 2nd edition. (Springer-Verlag, 2003).
[201] T., Lai and H., Robbins, Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4–22, 1985.
[202] A., Lansky, A., Abdul-Quader, M., Cribbin, T., Hall, T. J., Finlayson, R., Garffin, L. S., Lin and P., Sullivan, Developing an HIV behavioral surveillance system for injecting drug users: the National HIV Behavioral Surveillance System. Public Health Reports, 122(S1):48–55, 2007.
[203] S., Lee, Understanding respondent driven sampling from a total survey error perspective. Survey Practice, 2(6), 2009.
[204] F., LeGland and L., Mevel, Exponential forgetting and geometric ergodicity in hidden Markov models. Mathematics of Control, Signals, and Systems, 13(1):63–93, 2000.
[205] B. G., Leroux, Maximum-likelihood estimation for hidden Markov models. Stochastic Processes and Their Applications, 40:127–43, 1992.
[206] R., Levine and G., Casella, Implementations of the Monte Carlo EM algorithm. Journal of Computational and Graphical Statistics, 10(3):422–39, September 2001.
[207] T., Lindvall, Lectures on the Coupling Method. (Courier Dover Publications, 2002).
[208] M., Littman, A. R., Cassandra and L., Kaelbling, Learning policies for partially observable environments: Scaling up. In ICML, volume 95, pages 362–70. Citeseer, 1995.
[209] M. L., Littman, Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[210] M. L., Littman, A tutorial on partially observable Markov decision processes. Journal of Mathematical Psychology, 53(3):119–25, 2009.
[211] C., Liu and D. B., Rubin, The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence. Biometrika, 81(4):633–48, 1994.
[212] J. S., Liu, Monte Carlo Strategies in Scientific Computing. (Springer-Verlag, 2001).
[213] J. S., Liu and R., Chen, Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical Association, 93:1032–44, 1998.
[214] K., Liu and Q., Zhao, Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Transactions on Information Theory, 56(11):5547–67, 2010.
[215] Z., Liu and L., Vandenberghe, Interior-point method for nuclear norm approximation with application to system identification. SIAM Journal on Matrix Analysis and Applications, 31(3):1235–56, 2009.
[216] L., Ljung, Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control, AC-22(4):551–75, 1977.
[217] L., Ljung, System Identification, 2nd edition. (Prentice Hall, 1999).
[218] L., Ljung and T., Söderström, Theory and Practice of Recursive Identification. (Cambridge, MA: MIT Press, 1983).
[219] I., Lobel, D., Acemoglu, M., Dahleh and A. E., Ozdaglar, Preliminary results on social learning with partial observations. In Proceedings of the 2nd International Conference on Performance Evaluation Methodologies and Tools. (Nantes, France, 2007). ACM.
[220] A., Logothetis and A., Isaksson, On sensor scheduling via information theoretic criteria. In Proc. American Control Conf., pages 2402–06, (San Diego, 1999).
[221] D., López-Pintado, Diffusion in complex social networks. Games and Economic Behavior, 62(2):573–90, 2008.
[222] T. A., Louis, Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, 44(B):226–33, 1982.
[223] W. S., Lovejoy, On the convexity of policy regions in partially observed systems. Operations Research, 35(4):619–21, July–August 1987.
[224] W. S., Lovejoy, Ordered solutions for dynamic programs. Mathematics of Operations Research, 12(2):269–76, 1987.
[225] W. S., Lovejoy, Some monotonicity results for partially observed Markov decision processes. Operations Research, 35(5):736–43, September–October 1987.
[226] W. S., Lovejoy, Computationally feasible bounds for partially observed Markov decision processes. Operations Research, 39(1):162–75, January–February 1991.
[227] W. S., Lovejoy, A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 28:47–66, 1991.
[228] M., Luca, Reviews, Reputation, and Revenue: The Case of, Technical Report 12-016. Harvard Business School, September 2011.
[229] D. G., Luenberger, Optimization by Vector Space Methods. (New York, NY: John Wiley, 1969).
[230] I., MacPhee and B., Jordan, Optimal search for a moving target. Probability in the Engineering and Information Sciences, 9:159–82, 1995.
[231] C. D., Manning and H., Schütze, Foundations of Statistical Natural Language Processing. (Cambridge, MA: The MIT Press, 1999).
[232] S. I., Marcus, Algebraic and geometric methods in nonlinear filtering. SIAM Journal on Control and Optimization, 22(6):817–44, November 1984.
[233] S. I., Marcus and A. S., Willsky, Algebraic structure and finite dimensional nonlinear estimation. SIAM J. Math. Anal., 9(2):312–27, April 1978.
[234] H., Markowitz, Portfolio selection. The Journal of Finance, 7(1):77–91, 1952.
[235] D. Q., Mayne, J. B., Rawlings, C. V., Rao and P., Scokaert, Constrained model predictive control: Stability and optimality. Automatica, 36(6):789–814, 2000.
[236] G. J., McLachlan and T., Krishnan, The EM Algorithm and Extensions. Wiley series in probability and statistics. Applied probability and statistics. (New York, NY: John Wiley, 1996).
[237] L., Meier, J., Perschon and R. M., Dressler, Optimal control of measurement subsystems. IEEE Transactions on Automatic Control, 12(5):528–36, October 1967.
[238] J. M., Mendel, Maximum-Likelihood Deconvolution: A Journey into Model-Based Signal Processing. (Springer-Verlag, 1990).
[239] X. L., Meng, On the rate of convergence of the ECM algorithm. The Annals of Statistics, 22(1):326–39, 1994.
[240] S. P., Meyn and R. L., Tweedie, Markov Chains and Stochastic Stability. (Cambridge University Press, 2009).
[241] P., Milgrom, Good news and bad news: Representation theorems and applications. Bell Journal of Economics, 12(2):380–91, 1981.
[242] P., Milgrom and C., Shannon, Monotone comparative statics. Econometrica, 62(1):157–180, 1994.
[243] R. R., Mohler and C. S., Hwang, Nonlinear data observability and information. Journal of Franklin Institute, 325(4):443–64, 1988.
[244] G. E., Monahan, A survey of partially observable Markov decision processes: Theory, models and algorithms. Management Science, 28(1), January 1982.
[245] P., Del Moral, Feynman-Kac Formulae – Genealogical and Interacting Particle Systems with Applications. (Springer-Verlag, 2004).
[246] W., Moran, S., Suvorova and S., Howard, Application of sensor scheduling concepts to radar. In A., Hero, D., Castanon, D., Cochran and K., Kastella, editors, Foundations and Applications for Sensor Management, pages 221–56. (Springer-Verlag, 2006).
[247] G. B., Moustakides, Optimal stopping times for detecting changes in distributions. Annals of Statistics, 14:1379–87, 1986.
[248] A., Muller, How does the value function of a Markov decision process depend on the transition probabilities? Mathematics of Operations Research, 22:872–85, 1997.
[249] A., Muller and D., Stoyan, Comparison Methods for Stochastic Models and Risk. (Wiley, 2002).
[250] M. F., Neuts, Structured Stochastic Matrices of M/G/1 Type and Their Applications. (Marcel Dekker, 1989).
[251] A., Ng and M., Jordan, Pegasus: A policy search method for large MDPs and POMDPs. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pages 406–15. (Morgan Kaufmann Publishers Inc., 2000).
[252] M. H., Ngo and V., Krishnamurthy, Optimality of threshold policies for transmission scheduling in correlated fading channels. IEEE Transactions on Communications, 57(8):2474–83, 2009.
[253] M. H., Ngo and V., Krishnamurthy, Monotonicity of constrained optimal transmission policies in correlated fading channels with ARQ. IEEE Transactions on Signal Processing, 58(1):438–51, 2010.
[254] N., Noels, C., Herzet, A., Dejonghe, V., Lottici, H., Steendam, M., Moeneclaey, M., Luise and L., Vandendorpe, Turbo synchronization: an EM algorithm interpretation. In Proceedings of IEEE International Conference on Communications ICC'03, volume 4, 2933–7. IEEE, 2003.
[255] M., Ottaviani and P., Sørensen, Information aggregation in debate: Who should speak first? Journal of Public Economics, 81(3):393–421, 2001.
[256] C. H., Papadimitriou and J. N., Tsitsiklis, The complexity of Markov decision processes. Mathematics of Operations Research, 12(3):441–50, 1987.
[257] E., Pardoux, Equations du filtrage nonlineaire de la prediction et du lissage. Stochastics, 6:193–231, 1982.
[258] R., Parr and S., Russell, Approximating optimal policies for partially observable stochastic domains. In IJCAI, volume 95, pages 1088–94. (Citeseer, 1995).
[259] R., Pastor-Satorras and A., Vespignani, Epidemic spreading in scale-free networks. Physical Review Letters, 86(14):3200, 2001.
[260] S., Patek, On partially observed stochastic shortest path problems. In Proceedings of 40th IEEE Conference on Decision and Control, pages 5050–5, Orlando, Florida, 2001.
[261] G., Pflug, Optimization of Stochastic Models: The Interface between Simulation and Optimization. Kluwer Academic Publishers, 1996.
[262] J., Pineau, G., Gordon and S., Thrun, Point-based value iteration: An anytime algorithm for POMDPs. In IJCAI, volume 3, 1025–32, 2003.
[263] M. L., Pinedo, Scheduling: Theory, Algorithms, and Systems. (Springer-Verlag, 2012).
[264] L. K., Platzman, Optimal infinite-horizon undiscounted control of finite probabilistic systems. SIAM Journal on Control and Optimization, 18:362–80, 1980.
[265] S. M., Pollock, A simple model of search for a moving target. Operations Research, 18:893–903, 1970.
[266] B. T., Polyak and A. B., Juditsky, Acceleration of stochastic approximation by averaging. SIAM Journal of Control and Optimization, 30(4):838–55, July 1992.
[267] H. V., Poor, Quickest detection with exponential penalty for delay. Annals of Statistics, 26(6):2179–205, 1998.
[268] H. V., Poor and O., Hadjiliadis, Quickest Detection. (Cambridge University Press, 2008).
[269] H. V., Poor, An Introduction to Signal Detection and Estimation, 2nd edition. (Springer-Verlag, 1993).
[270] B. M., Pötscher and I. R., Prucha, Dynamic Nonlinear Econometric Models: Asymptotic Theory. (Springer-Verlag, 1997).
[271] K., Premkumar, A., Kumar and V. V., Veeravalli, Bayesian Quickest Transient Change Detection. In Proceedings of International Workshop in Applied Probability, Madrid, 2010.
[272] M., Puterman, Markov Decision Processes. (John Wiley, 1994).
[273] J., Quah and B., Strulovici, Aggregating the single crossing property. Econometrica, 80(5):2333–48, 2012.
[274] L. R., Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–85, 1989.
[275] V., Raghavan and V., Veeravalli, Bayesian quickest change process detection. In ISIT, 644–648, Seoul, 2009.
[276] F., Riedel, Dynamic coherent risk measures. Stochastic Processes and Their Applications, 112(2):185–200, 2004.
[277] U., Rieder, Structural results for partially observed control models. Methods and Models of Operations Research, 35(6):473–90, 1991.
[278] U., Rieder and R., Zagst, Monotonicity and bounds for convex stochastic control models. Mathematical Methods of Operations Research, 39(2):187–207, June 1994.
[279] B., Ristic, S., Arulampalam and N., Gordon, Beyond the Kalman Filter: Particle Filters for Tracking Applications. (Artech, 2004).
[280] C. P., Robert and G., Casella, Monte Carlo Statistical Methods. (Springer-Verlag, 2013).
[281] R. T., Rockafellar and S., Uryasev, Optimization of conditional value-at-risk. Journal of Risk, 2:21–42, 2000.
[282] S., Ross, Arbitrary state Markovian decision processes. The Annals of Mathematical Statistics, 2118–22, 1968.
[283] S., Ross, Introduction to Stochastic Dynamic Programming. (San Diego, CA: Academic Press, 1983).
[284] S., Ross, Simulation, 5th edition. (Academic Press, 2013).
[285] D., Rothschild and J., Wolfers, Forecasting elections: Voter intentions versus expectations, 2010.
[286] N., Roy, G., Gordon and S., Thrun, Finding approximate POMDP solutions through belief compression. Journal of Artificial Intelligence Research, 23:1–40, 2005.
[287] W., Rudin, Principles of Mathematical Analysis. (McGraw-Hill, 1976).
[288] A., Ruszczyński, Risk-averse dynamic programming for Markov decision processes. Mathematical Programming, 125(2):235–61, 2010.
[289] T., Sakaki, M., Okazaki and Y., Matsuo, Earthquake shakes Twitter users: Real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, pages 851–60. (New York, 2010). ACM.
[290] A., Sayed, Adaptive Filters. (Wiley, 2008).
[291] A. H., Sayed, Adaptation, learning, and optimization over networks. Foundations and Trends in Machine Learning, 7(4–5):311–801, 2014.
[292] M., Segal and E., Weinstein, A new method for evaluating the log-likelihood gradient, the Hessian, and the Fisher information matrix for linear dynamic systems. IEEE Transactions on Information Theory, 35(3):682–7, May 1989.
[293] E., Seneta, Non-Negative Matrices and Markov Chains. (Springer-Verlag, 1981).
[294] L. I., Sennott, Stochastic Dynamic Programming and the Control of Queueing Systems. (Wiley, 1999).
[295] M., Shaked and J. G., Shanthikumar, Stochastic Orders. (Springer-Verlag, 2007).
[296] G., Shani, R., Brafman and S., Shimony, Forward search value iteration for POMDPs. In IJCAI, 2619–24, 2007.
[297] G., Shani, J., Pineau and R., Kaplow, A survey of point-based POMDP solvers. Autonomous Agents and Multi-Agent Systems, 27(1):1–51, 2013.
[298] A. N., Shiryaev, On optimum methods in quickest detection problems. Theory of Probability and Its Applications, 8(1):22–46, 1963.
[299] R. H., Shumway and D. S., Stoffer, An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 253–64, 1982.
[300] R., Simmons and S., Konig, Probabilistic navigation in partially observable environments. In Proceedings of 14th International Joint Conference on Artificial Intelligence, 1080–87, (Montreal, CA: Morgan Kaufmann).
[301] S., Singh and V., Krishnamurthy, The optimal search for a Markovian target when the search path is constrained: the infinite horizon case. IEEE Transactions on Automatic Control, 48(3):487–92, March 2003.
[302] R. D., Smallwood and E. J., Sondik, Optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21:1071–88, 1973.
[303] J. E., Smith and K. F., McCardle, Structural properties of stochastic dynamic programs. Operations Research, 50(5):796–809, 2002.
[304] T., Smith and R., Simmons, Heuristic search value iteration for POMDPs. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 520–7. (AUAI Press, 2004).
[305] V., Solo and X., Kong, Adaptive Signal Processing Algorithms – Stability and Performance. (NJ: Prentice Hall, 1995).
[306] E. J., Sondik, The Optimal Control of Partially Observed Markov Processes. PhD thesis, Electrical Engineering, Stanford University, 1971.
[307] E. J., Sondik, The optimal control of partially observable Markov processes over the infinite horizon: discounted costs. Operations Research, 26(2):282–304, March–April 1978.
[308] M., Spaan and N., Vlassis, Perseus: Randomized point-based value iteration for POMDPs. J. Artif. Intell. Res. (JAIR), 24:195–220, 2005.
[309] J., Spall, Introduction to Stochastic Search and Optimization. (Wiley, 2003).
[310] L., Stone, What's happened in search theory since the 1975 Lanchester prize. Operations Research, 37(3):501–06, May–June 1989.
[311] R. L., Stratonovich, Conditional Markov processes. Theory of Probability and Its Applications, 5(2):156–78, 1960.
[312] J., Surowiecki, The Wisdom of Crowds. (New York, NY: Anchor, 2005).
[313] R., Sutton and A., Barto, Reinforcement Learning: An Introduction. (Cambridge, MA: MIT Press, 1998).
[314] T., Moon and T., Weissman, Universal Filtering Via Hidden Markov Modeling. IEEE Transactions on Information Theory, 54(2):692–708, 2008.
[315] M. A., Tanner, Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. Springer series in statistics. (New York, NY: Springer-Verlag, 1993).
[316] M. A., Tanner and W. A., Wong, The calculation of posterior distributions by data augmentation. J. Am. Statis. Assoc., 82:528–40, 1987.
[317] A. G., Tartakovsky and V. V., Veeravalli, General asymptotic Bayesian theory of quickest change detection. Theory of Probability and Its Applications, 49(3):458–97, 2005.
[318] R., Tibshirani, Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
[319] P., Tichavsky, C. H., Muravchik and A., Nehorai, Posterior Cramér-Rao bounds for discrete-time nonlinear filtering. IEEE Transactions on Signal Processing, 46(5):1386–96, May 1998.
[320] L., Tierney, Markov chains for exploring posterior distributions. The Annals of Statistics, 1701–28, 1994.
[321] D. M., Topkis, Minimizing a submodular function on a lattice. Operations Research, 26:305–21, 1978.
[322] D. M., Topkis, Supermodularity and Complementarity. (Princeton, NJ: Princeton University Press, 1998).
[323] D., van Dyk and X., Meng, The art of data augmentation. Journal of Computational and Graphical Statistics, 10(1):1–50, 2001.
[324] L., Vandenberghe and S., Boyd, Semidefinite programming. SIAM Review, 38(1):49–95, 1996.
[325] V. N., Vapnik, Statistical Learning Theory. (Wiley, 1998).
[326] H., Varian, The nonparametric approach to demand analysis. Econometrica, 50(1):945–73, 1982.
[327] H., Varian, Non-parametric tests of consumer behaviour. The Review of Economic Studies, 50(1):99–110, 1983.
[328] H., Varian, Revealed preference and its applications. The Economic Journal, 122(560):332–8, 2012.
[329] F., Vega-Redondo, Complex Social Networks, volume 44. (Cambridge University Press, 2007).
[330] S., Verdu, Multiuser Detection. (Cambridge University Press, 1998).
[331] B., Wahlberg, S., Boyd, M., Annergren and Y., Wang, An ADMM algorithm for a class of total variation regularized estimation problems. In Proceedings 16th IFAC Symposium on System Identification, July 2012.
[332] A., Wald, Note on the consistency of the maximum likelihood estimate. The Annals of Mathematical Statistics, 595–601, 1949.
[333] E., Wan and R., Van Der Merwe, The unscented Kalman filter for nonlinear estimation. In Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000. AS-SPCC. The IEEE 2000, pages 153–8. IEEE, 2000.
[334] C. C., White and D. P., Harrington, Application of Jensen's inequality to adaptive suboptimal design. Journal of Optimization Theory and Applications, 32(1):89–99, 1980.
[335] L. B., White and H. X., Vu, Maximum likelihood sequence estimation for hidden reciprocal processes. IEEE Transactions on Automatic Control, 58(10):2670–74, 2013.
[336] W., Whitt, Multivariate monotone likelihood ratio and uniform conditional stochastic order. Journal of Applied Probability, 19:695–701, 1982.
[337] P., Whittle, Multi-armed bandits and the Gittins index. J. R. Statist. Soc. B, 42(2):143–9, 1980.
[338] N., Wiener, The Extrapolation, Interpolation and Smoothing of Stationary Time Series. (New York, NY: John Wiley, 1949).
[339] J., Williams, J., Fisher and A., Willsky, Approximate dynamic programming for communication-constrained sensor network management. IEEE Transactions on Signal Processing, 55(8):4300–11, 2007.
[340] E., Wong and B., Hajek, Stochastic Processes in Engineering Systems, 2nd edition. (Berlin: Springer-Verlag, 1985).
[341] W. M., Wonham, Some applications of stochastic differential equations to optimal nonlinear filtering. SIAM J. Control, 2(3):347–69, 1965.
[342] C. F. J., Wu, On the convergence properties of the EM algorithm. Annals of Statistics, 11(1):95–103, 1983.
[343] J., Xie, S., Sreenivasan, G., Kornis, W., Zhang, C., Lim and B., Szymanski, Social consensus through the influence of committed minorities. Physical Review E, 84(1):011130, 2011.
[344] B., Yakir, A. M., Krieger and M., Pollak, Detecting a change in regression: First-order optimality. Annals of Statistics, 27(6):1896–1913, 1999.
[345] D., Yao and P., Glasserman, Monotone Structure in Discrete-Event Systems. (Wiley, 1st edition, 1994).
[346] G., Yin, C., Ion and V., Krishnamurthy, How does a stochastic optimization/approximation algorithm adapt to a randomly evolving optimum/root with jump Markov sample paths. Mathematical Programming, 120(1):67–99, 2009.
[347] G., Yin and V., Krishnamurthy, LMS algorithms for tracking slow Markov chains with applications to hidden Markov estimation and adaptive multiuser detection. IEEE Transactions on Information Theory, 51(7), July 2005.
[348] G., Yin, V., Krishnamurthy and C., Ion, Regime switching stochastic approximation algorithms with application to adaptive discrete stochastic optimization. SIAM Journal on Optimization, 14(4):1187–1215, 2004.
[349] G., Yin and Q., Zhang, Discrete-time Markov Chains: Two-Time-Scale Methods and Applications, volume 55. (Springer, 2006).
[350] S., Young, M., Gasic, B., Thomson and J., Williams, POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE, 101(5):1160–79, 2013.
[351] F., Yu and V., Krishnamurthy, Optimal joint session admission control in integrated WLAN and CDMA cellular network. IEEE Transactions on Mobile Computing, 6(1):126–39, January 2007.
[352] M., Zakai, On the optimal filtering of diffusion processes. Z. Wahrscheinlichkeitstheorie verw. Gebiete, 11:230–43, 1969.
[353] Q., Zhao, L., Tong, A., Swami and Y., Chen, Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework. IEEE Journal on Selected Areas in Communications, pages 589–600, 2007.
[354] K., Zhou, J., Doyle and K., Glover, Robust and Optimal Control, volume 40. (NJ: Prentice Hall, 1996).