

Published online by Cambridge University Press:  13 March 2020

Ron Kohavi
Diane Tang
Ya Xu
Trustworthy Online Controlled Experiments
A Practical Guide to A/B Testing
pp. 246–265
Publisher: Cambridge University Press
Print publication year: 2020



Abadi, Martin, Chu, Andy, Goodfellow, Ian, McMahan, H. Brendan, Mironov, Ilya, Talwar, Kunal, and Zhang, Li. 2016. “Deep Learning with Differential Privacy.” Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.
Abrahamse, Peter. 2016. “How 8 Different A/B Testing Tools Affect Site Speed.” CXL: All Things Data-Driven Marketing. May 16.
ACM. 2018. ACM Code of Ethics and Professional Conduct. June 22.
Alvarez, Cindy. 2017. Lean Customer Development: Building Products Your Customers Will Buy. O’Reilly.
Angrist, Joshua D., and Pischke, Jörn-Steffen. 2014. Mastering ‘Metrics: The Path from Cause to Effect. Princeton University Press.
Angrist, Joshua D., and Pischke, Jörn-Steffen. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press.
Apple, Inc. 2017. “Phased Release for Automatic Updates Now Available.” June 5.
Apple, Inc. 2018. “Use Low Power Mode to Save Battery Life on Your iPhone.” Apple. September 25.
Athey, Susan, and Imbens, Guido. 2016. “Recursive Partitioning for Heterogeneous Causal Effects.” PNAS: Proceedings of the National Academy of Sciences. 7353–7360.
Azevedo, Eduardo M., Deng, Alex, Olea, Jose Montiel, Rao, Justin M., and Weyl, E. Glen. 2019. “A/B Testing with Fat Tails.” February 26. Available at SSRN.
Backstrom, Lars, and Kleinberg, Jon. 2011. “Network Bucket Testing.” WWW ‘11 Proceedings of the 20th International Conference on World Wide Web. Hyderabad, India: ACM. 615–624.
Bailar, John C. 1983. “Introduction.” In Clinical Trials: Issues and Approaches, by Shapiro, Stuart and Louis, Thomas. Marcel Dekker.
Bakshy, Eytan, Balandat, Max, and Kashin, Kostya. 2019. “Open-sourcing Ax and BoTorch: New AI tools for adaptive experimentation.” Facebook Artificial Intelligence. May 1.
Bakshy, Eytan, and Frachtenberg, Eitan. 2015. “Design and Analysis of Benchmarking Experiments for Distributed Internet Services.” WWW ‘15: Proceedings of the 24th International Conference on World Wide Web. Florence, Italy: ACM. 108–118.
Bakshy, Eytan, Eckles, Dean, and Bernstein, Michael. 2014. “Designing and Deploying Online Field Experiments.” International World Wide Web Conference (WWW 2014).
Barajas, Joel, Akella, Ram, Hotan, Marius, and Flores, Aaron. 2016. “Experimental Designs and Estimation for Online Display Advertising Attribution in Marketplaces.” Marketing Science: the Marketing Journal of the Institute for Operations Research and the Management Sciences 35: 465–483.
Barrilleaux, Bonnie, and Wang, Dylan. 2018. “Spreading the Love in the LinkedIn Feed with Creator-Side Optimization.” LinkedIn Engineering. October 16.
Basin, David, Debois, Soren, and Hildebrandt, Thomas. 2018. “On Purpose and by Necessity: Compliance under the GDPR.” Financial Cryptography and Data Security 2018. IFCA. Preproceedings 21.
Benbunan-Fich, Raquel. 2017. “The Ethics of Online Research with Unsuspecting Users: From A/B Testing to C/D Experimentation.” Research Ethics 13 (3–4): 200–218.
Benjamin, Daniel J., Berger, James O., Johannesson, Magnus, Nosek, Brian A., Wagenmakers, E.-J., Berk, Richard, Bollen, Kenneth A., et al. 2017. “Redefine Statistical Significance.” Nature Human Behaviour 2 (1): 6–10.
Beshears, John, Choi, James J., Laibson, David, Madrian, Brigitte C., and Milkman, Katherine L. 2011. The Effect of Providing Peer Information on Retirement Savings Decisions. NBER Working Paper Series, National Bureau of Economic Research.
Billingsley, Patrick. 1995. Probability and Measure. Wiley.
Blake, Thomas, and Coey, Dominic. 2014. “Why Marketplace Experimentation is Harder Than it Seems: The Role of Test-Control Interference.” EC ’14 Proceedings of the Fifteenth ACM Conference on Economics and Computation. Palo Alto, CA: ACM. 567–582.
Blank, Steven Gary. 2005. The Four Steps to the Epiphany: Successful Strategies for Products that Win.
Blocker, Craig, Conway, John, Demortier, Luc, Heinrich, Joel, Junk, Tom, Lyons, Louis, and Punzi, Giovanni. 2006. “Simple Facts about P-Values.” The Rockefeller University. January 5.
Bodlewski, Mike. 2017. “When Slower UX is Better UX.” Web Designer Depot. Sep 25.
Bojinov, Iavor, and Shephard, Neil. 2017. “Time Series Experiments and Causal Estimands: Exact Randomization Tests and Trading.” arXiv of Cornell University. July 18. arXiv:1706.07840.
Borden, Peter. 2014. “How Optimizely (Almost) Got Me Fired.” The SumAll Blog: Where E-commerce and Social Media Meet. June 18.
Bowman, Douglas. 2009. “Goodbye, Google.” stopdesign. March 20.
Box, George E.P., Hunter, J. Stuart, and Hunter, William G. 2005. Statistics for Experimenters: Design, Innovation, and Discovery. 2nd edition. John Wiley & Sons, Inc.
Brooks Bell. 2015. “Click Summit 2015 Keynote Presentation.” Brooks Bell.
Brown, Morton B. 1975. “A Method for Combining Non-Independent, One-Sided Tests of Significance.” Biometrics 31 (4): 987–992.
Brutlag, Jake, Abrams, Zoe, and Meenan, Pat. 2011. “Above the Fold Time: Measuring Web Page Performance Visually.” Velocity: Web Performance and Operations Conference.
Buhrmester, Michael, Kwang, Tracy, and Gosling, Samuel. 2011. “Amazon’s Mechanical Turk: A New Source of Inexpensive, Yet High-Quality Data?” Perspectives on Psychological Science, Feb 3.
Campbell, Donald T. 1979. “Assessing the Impact of Planned Social Change.” Evaluation and Program Planning 2: 67–90.
Card, David, and Krueger, Alan B. 1994. “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania.” The American Economic Review 84 (4): 772–793.
Casella, George, and Berger, Roger L. 2001. Statistical Inference. 2nd edition. Cengage Learning.
CDC. 2015. The Tuskegee Timeline. December.
Chamandy, Nicholas. 2016. “Experimentation in a Ridesharing Marketplace.” Lyft Engineering. September 2.
Chan, David, Ge, Rong, Gershony, Ori, Hesterberg, Tim, and Lambert, Diane. 2010. “Evaluating Online Ad Campaigns in a Pipeline: Causal Models at Scale.” Proceedings of ACM SIGKDD.
Chapelle, Olivier, Joachims, Thorsten, Radlinski, Filip, and Yue, Yisong. 2012. “Large-Scale Validation and Analysis of Interleaved Search Evaluation.” ACM Transactions on Information Systems, February.
Chaplin, Charlie. 1964. My Autobiography. Simon & Schuster.
Charles, Reichardt S., and Melvin, Mark M. 2004. “Quasi Experimentation.” In Handbook of Practical Program Evaluation, by Wholey, Joseph S., Hatry, Harry P. and Newcomer, Kathryn E. Jossey-Bass.
Chatham, Bob, Temkin, Bruce D., and Amato, Michelle. 2004. A Primer on A/B Testing. Forrester Research.
Chen, Nanyu, Liu, Min, and Xu, Ya. 2019. “How A/B Tests Could Go Wrong: Automatic Diagnosis of Invalid Online Experiments.” WSDM ‘19 Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. Melbourne, VIC, Australia: ACM. 501–509.
Chrystal, K. Alec, and Mizen, Paul D. 2001. Goodhart’s Law: Its Origins, Meaning and Implications for Monetary Policy. Prepared for the Festschrift in honor of Charles Goodhart held on 15–16 November 2001 at the Bank of England.
Coey, Dominic, and Cunningham, Tom. 2019. “Improving Treatment Effect Estimators Through Experiment Splitting.” WWW ’19: The Web Conference. San Francisco, CA, USA: ACM. 285–295.
Collis, David. 2016. “Lean Strategy.” Harvard Business Review 62–68.
Concato, John, Shah, Nirav, and Horwitz, Ralph I. 2000. “Randomized, Controlled Trials, Observational Studies, and the Hierarchy of Research Designs.” The New England Journal of Medicine 342 (25): 1887–1892.
Cox, David Roxbee. 1958. Planning of Experiments. New York: John Wiley.
Croll, Alistair, and Yoskovitz, Benjamin. 2013. Lean Analytics: Use Data to Build a Better Startup Faster. O’Reilly Media.
Crook, Thomas, Frasca, Brian, Kohavi, Ron, and Longbotham, Roger. 2009. “Seven Pitfalls to Avoid when Running Controlled Experiments on the Web.” KDD ’09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 1105–1114.
Cross, Robert G., and Dixit, Ashutosh. 2005. “Customer-centric Pricing: The Surprising Secret for Profitability.” Business Horizons, 488.
Deb, Anirban, Bhattacharya, Suman, Gu, Jeremey, Zhuo, Tianxia, Feng, Eva, and Liu, Mandie. 2018. “Under the Hood of Uber’s Experimentation Platform.” Uber Engineering. August 28.
Deng, Alex. 2015. “Objective Bayesian Two Sample Hypothesis Testing for Online Controlled Experiments.” Florence, IT: ACM. 923–928.
Deng, Alex, and Hu, Victor. 2015. “Diluted Treatment Effect Estimation for Trigger Analysis in Online Controlled Experiments.” WSDM ’15: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. Shanghai, China: ACM. 349–358.
Deng, Alex, Lu, Jiannan, and Chen, Shouyuan. 2016. “Continuous Monitoring of A/B Tests without Pain: Optional Stopping in Bayesian Testing.” 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA). Montreal, QC, Canada: IEEE.
Deng, Alex, Knoblich, Ulf, and Lu, Jiannan. 2018. “Applying the Delta Method in Metric Analytics: A Practical Guide with Novel Ideas.” 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
Deng, Alex, Lu, Jiannan, and Litz, Jonathan. 2017. “Trustworthy Analysis of Online A/B Tests: Pitfalls, Challenges and Solutions.” WSDM: The Tenth International Conference on Web Search and Data Mining. Cambridge, UK.
Deng, Alex, Xu, Ya, Kohavi, Ron, and Walker, Toby. 2013. “Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data.” WSDM 2013: Sixth ACM International Conference on Web Search and Data Mining.
Deng, Shaojie, Longbotham, Roger, Walker, Toby, and Xu, Ya. 2011. “Choice of Randomization Unit in Online Controlled Experiments.” Joint Statistical Meetings Proceedings. 4866–4877.
Denrell, Jerker. 2005. “Selection Bias and the Perils of Benchmarking.” Harvard Business Review 83 (4): 114–119.
Dickhaus, Thorsten. 2014. Simultaneous Statistical Inference: With Applications in the Life Sciences. Springer.
Dickson, Paul. 1999. The Official Rules and Explanations: The Original Guide to Surviving the Electronic Age With Wit, Wisdom, and Laughter. Federal Street Pr.
Djulbegovic, Benjamin, and Hozo, Iztok. 2002. “At What Degree of Belief in a Research Hypothesis Is a Trial in Humans Justified?” Journal of Evaluation in Clinical Practice, June 13.
Dmitriev, Pavel, and Xian, Wu. 2016. “Measuring Metrics.” CIKM: Conference on Information and Knowledge Management. Indianapolis, IN.
Dmitriev, Pavel, Gupta, Somit, Kim, Dong Woo, and Vaz, Garnet. 2017. “A Dirty Dozen: Twelve Common Metric Interpretation Pitfalls in Online Controlled Experiments.” Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2017). Halifax, NS, Canada: ACM. 1427–1436.
Dmitriev, Pavel, Frasca, Brian, Gupta, Somit, Kohavi, Ron, and Vaz, Garnet. 2016. “Pitfalls of Long-Term Online Controlled Experiments.” 2016 IEEE International Conference on Big Data (Big Data). Washington DC. 1367–1376.
Doerr, John. 2018. Measure What Matters: How Google, Bono, and the Gates Foundation Rock the World with OKRs. Portfolio.
Doll, Richard. 1998. “Controlled Trials: the 1948 Watershed.” BMJ.
Dutta, Kaushik, and Vadermeer, Debra. 2018. “Caching to Reduce Mobile App Energy Consumption.” ACM Transactions on the Web (TWEB), February 12(1): Article No. 5.
Dwork, Cynthia, and Roth, Aaron. 2014. “The Algorithmic Foundations of Differential Privacy.” Foundations and Trends in Computer Science 211–407.
Eckles, Dean, Karrer, Brian, and Ugander, Johan. 2017. “Design and Analysis of Experiments in Networks: Reducing Bias from Interference.” Journal of Causal Inference 5(1).
Edgington, Eugene S. 1972. “An Additive Method for Combining Probability Values from Independent Experiments.” The Journal of Psychology 80 (2): 351–363.
Edmonds, Andy, White, Ryen W., Morris, Dan, and Drucker, Steven M. 2007. “Instrumenting the Dynamic Web.” Journal of Web Engineering (3): 244–260.
Efron, Bradley, and Tibshirani, Robert J. 1994. An Introduction to the Bootstrap. Chapman & Hall/CRC.
EGAP. 2018. “10 Things to Know About Heterogeneous Treatment Effects.” EGAP: Evidence in Government and Politics.
Ehrenberg, A.S.C. 1975. “The Teaching of Statistics: Corrections and Comments.” Journal of the Royal Statistical Society. Series A 138 (4): 543–545.
Eisenberg, Bryan. 2005. “How to Improve A/B Testing.” ClickZ Network. April 29.
Eisenberg, Bryan. 2004. A/B Testing for the Mathematically Disinclined. May 7.
Eisenberg, Bryan, and Quarto-vonTivadar, John. 2008. Always Be Testing: The Complete Guide to Google Website Optimizer. Sybex.
eMarketer. 2016. “Microsoft Ad Revenues Continue to Rebound.” April 20.
European Commission. 2016. EU GDPR.ORG.
Fabijan, Aleksander, Dmitriev, Pavel, Olsson, Helena Holmstrom, and Bosch, Jan. 2018. “Online Controlled Experimentation at Scale: An Empirical Survey on the Current State of A/B Testing.” Euromicro Conference on Software Engineering and Advanced Applications (SEAA). Prague, Czechia. doi:10.1109/SEAA.2018.00021.
Fabijan, Aleksander, Dmitriev, Pavel, Olsson, Helena Holmstrom, and Bosch, Jan. 2017. “The Evolution of Continuous Experimentation in Software Product Development: from Data to a Data-Driven Organization at Scale.” ICSE ’17 Proceedings of the 39th International Conference on Software Engineering. Buenos Aires, Argentina: IEEE Press. 770–780.
Fabijan, Aleksander, Gupchup, Jayant, Gupta, Somit, Omhover, Jeff, Qin, Wen, Vermeer, Lukas, and Dmitriev, Pavel. 2019. “Diagnosing Sample Ratio Mismatch in Online Controlled Experiments: A Taxonomy and Rules of Thumb for Practitioners.” KDD ‘19: The 25th SIGKDD International Conference on Knowledge Discovery and Data Mining. Anchorage, Alaska, USA: ACM.
Fabijan, Aleksander, Dmitriev, Pavel, McFarland, Colin, Vermeer, Lukas, Olsson, Helena Holmström, and Bosch, Jan. 2018. “Experimentation Growth: Evolving Trustworthy A/B Testing Capabilities in Online Software Companies.” Journal of Software: Evolution and Process 30 (12:e2113).
FAT/ML. 2019. Fairness, Accountability, and Transparency in Machine Learning.
Fisher, Ronald Aylmer. 1925. Statistical Methods for Research Workers. Oliver and Boyd.
Forte, Michael. 2019. “Misadventures in experiments for growth.” The Unofficial Google Data Science Blog. April 16.
Freedman, Benjamin. 1987. “Equipoise and the Ethics of Clinical Research.” The New England Journal of Medicine 317 (3): 141–145.
Gelman, Andrew, and Carlin, John. 2014. “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors.” Perspectives on Psychological Science 9 (6): 641–651. doi:10.1177/1745691614551642.
Gelman, Andrew, and Little, Thomas C. 1997. “Poststratification into Many Categories Using Hierarchical Logistic Regression.” Survey Methodology 23 (2): 127–135.
Georgiev, Georgi Zdravkov. 2019. Statistical Methods in Online A/B Testing: Statistics for Data-Driven Business Decisions and Risk Management in e-Commerce. Independently published. www.abtestingstats.com
Georgiev, Georgi Zdravkov. 2018. “Analysis of 115 A/B Tests: Average Lift is 4%, Most Lack Statistical Power.” Analytics Toolkit. June 26.
Gerber, Alan S., and Green, Donald P. 2012. Field Experiments: Design, Analysis, and Interpretation. W. W. Norton & Company.
Goldratt, Eliyahu M. 1990. The Haystack Syndrome. North River Press.
Goldstein, Noah J., Martin, Steve J., and Cialdini, Robert B. 2008. Yes!: 50 Scientifically Proven Ways to Be Persuasive. Free Press.
Goodhart, Charles A. E. 1975. Problems of Monetary Management: The UK Experience. Vol. 1, in Papers in Monetary Economics, by Reserve Bank of Australia.
Google. 2019. Processing Logs at Scale Using Cloud Dataflow. March 19.
Google. 2011. “Ads Quality Improvements Rolling Out Globally.” Google Inside AdWords. October 3.
Google Console. 2019. “Release App Updates with Staged Rollouts.” Google Console Help.
Google, Helping Advertisers Comply with the GDPR. 2019. Google Ads Help.
Gordon, Brett R., Zettelmeyer, Florian, Bhargava, Neha, and Chapsky, Dan. 2018. “A Comparison of Approaches to Advertising Measurement: Evidence from Big Field Experiments at Facebook (forthcoming at Marketing Science).”
Goward, Chris. 2015. “Delivering Profitable ‘A-ha!’ Moments Everyday.” Conversion Hotel. Texel, The Netherlands.
Goward, Chris. 2012. You Should Test That: Conversion Optimization for More Leads, Sales and Profit or The Art and Science of Optimized Marketing. Sybex.
Greenhalgh, Trisha. 2014. How to Read a Paper: The Basics of Evidence-Based Medicine. BMJ Books.
Greenhalgh, Trisha. 1997. “How to Read a Paper: Getting Your Bearings (deciding what the paper is about).” BMJ 315 (7102): 243–246. doi:10.1136/bmj.315.7102.243.
Greenland, Sander, Senn, Stephen J., Rothman, Kenneth J., Carlin, John B., Poole, Charles, Goodman, Steven N., and Altman, Douglas G. 2016. “Statistical Tests, P Values, Confidence Intervals, and Power: a Guide to Misinterpretations.” European Journal of Epidemiology 31 (4): 337–350.
Grimes, Carrie, Tang, Diane, and Russell, Daniel M. 2007. “Query Logs Alone are not Enough.” International Conference of the World Wide Web, May.
Grove, Andrew S. 1995. High Output Management. 2nd edition. Vintage.
Groves, Robert M., Fowler, Floyd J. Jr, Couper, Mick P., Lepkowski, James M., Singer, Eleanor, and Tourangeau, Roger. 2009. Survey Methodology, 2nd edition. Wiley.
Gui, Han, Xu, Ya, Bhasin, Anmol, and Han, Jiawei. 2015. “Network A/B Testing From Sampling to Estimation.” WWW ’15 Proceedings of the 24th International Conference on World Wide Web. Florence, IT: ACM. 399–409.
Gupta, Somit, Ulanova, Lucy, Bhardwaj, Sumit, Dmitriev, Pavel, Raff, Paul, and Fabijan, Aleksander. 2018. “The Anatomy of a Large-Scale Online Experimentation Platform.” IEEE International Conference on Software Architecture.
Gupta, Somit, Kohavi, Ronny, Tang, Diane, Xu, Ya, et al. 2019. “Top Challenges from the first Practical Online Controlled Experiments Summit.” Edited by Dong, Xin Luna, Teredesai, Ankur and Zafarani, Reza. SIGKDD Explorations (ACM) 21 (1).
Guyatt, Gordon H., Sackett, David L., Sinclair, John C., Hayward, Robert, Cook, Deborah J., and Cook, Richard J. 1995. “Users’ Guides to the Medical Literature: IX. A method for Grading Health Care Recommendations.” Journal of the American Medical Association (JAMA) 274 (22): 1800–1804.
Harden, K. Paige, Mendle, Jane, Hill, Jennifer E., Turkheimer, Eric, and Emery, Robert E. 2008. “Rethinking Timing of First Sex and Delinquency.” Journal of Youth and Adolescence 37 (4): 373–385.
Harford, Tim. 2014. The Undercover Economist Strikes Back: How to Run – or Ruin – an Economy. Riverhead Books.
Hauser, John R., and Katz, Gerry. 1998. “Metrics: You Are What You Measure!” European Management Journal 16 (5): 516–528.
Health and Human Services. 2018a. Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule.
Health and Human Services. 2018b. Health Information Privacy.
Health and Human Services. 2018c. Summary of the HIPAA Privacy Rule.
Hedges, Larry, and Olkin, Ingram. 2014. Statistical Methods for Meta-Analysis. Academic Press.
Hemkens, Lars, Contopoulos-Ioannidis, Despina, and Ioannidis, John. 2016. “Routinely Collected Data and Comparative Effectiveness Evidence: Promises and Limitations.” CMAJ, May 17.
HIPAA Journal. 2018. What is Considered Protected Health Information Under HIPAA. April 2.
Hochberg, Yosef, and Benjamini, Yoav. 1995. “Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society, Series B 57 (1): 289–300.
Hodge, Victoria, and Austin, Jim. 2004. “A Survey of Outlier Detection Methodologies.” Journal of Artificial Intelligence Review. 85–126.
Hohnhold, Henning, O’Brien, Deirdre, and Tang, Diane. 2015. “Focus on the Long-Term: It’s better for Users and Business.” Proceedings 21st Conference on Knowledge Discovery and Data Mining (KDD 2015). Sydney, Australia: ACM.
Holson, Laura M. 2009. “Putting a Bolder Face on Google.” NY Times. February 28.
Holtz, David Michael. 2018. “Limiting Bias from Test-Control Interference In Online Marketplace Experiments.” DSpace@MIT.
Hoover, Kevin D. 2008. “Phillips Curve.” In Henderson, R. David, Concise Encyclopedia of Economics.
Huang, Jason, Reiley, David, and Raibov, Nickolai M. 2018. “Measuring Consumer Sensitivity to Audio Advertising: A Field Experiment on Pandora Internet Radio.” David Reiley, Jr. April 21.
Huang, Jeff, White, Ryen W., and Dumais, Susan. 2012. “No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search.” Proceedings of SIGCHI.
Huang, Yanping, You, Jane, Wang, Iris, Cao, Feng, and Gao, Ian. 2015. Data Science Interviews Exposed. CreateSpace.
Hubbard, Douglas W. 2014. How to Measure Anything: Finding the Value of Intangibles in Business. 3rd edition. Wiley.
Huffman, Scott. 2008. Search Evaluation at Google. September 15.
Imbens, Guido W., and Rubin, Donald B. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press.
Ioannidis, John P. 2005. “Contradicted and Initially Stronger Effects in Highly Cited Clinical Research.” The Journal of the American Medical Association 294 (2).
Jackson, Simon. 2018. “How increases the power of online experiments with CUPED.” January 22.
Joachims, Thorsten, Granka, Laura, Pan, Bing, Hembrooke, Helene, and Gay, Geri. 2005. “Accurately Interpreting Clickthrough Data as Implicit Feedback.” SIGIR, August.
Johari, Ramesh, Pekelis, Leonid, Koomen, Pete, and Walsh, David. 2017. “Peeking at A/B Tests.” KDD ’17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Halifax, NS, Canada: ACM. 1517–1525.
Kaplan, Robert S., and Norton, David P. 1996. The Balanced Scorecard: Translating Strategy into Action. Harvard Business School Press.
Katzir, Liran, Liberty, Edo, and Somekh, Oren. 2012. “Framework and Algorithms for Network Bucket Testing.” Proceedings of the 21st International Conference on World Wide Web 1029–1036.
Kaushik, Avinash. 2006. “Experimentation and Testing: A Primer.” Occam’s Razor. May 22.
Keppel, Geoffrey, Saufley, William H., and Tokunaga, Howard. 1992. Introduction to Design and Analysis. 2nd edition. W.H. Freeman and Company.
Kesar, Alhan. 2018. 11 Ways to Stop FOOC’ing up your A/B tests. August 9.
King, Gary, and Nielsen, Richard. 2018. Why Propensity Scores Should Not Be Used for Matching. Working paper.
King, Rochelle, Churchill, Elizabeth F., and Tan, Caitlin. 2017. Designing with Data: Improving the User Experience with A/B Testing. O’Reilly Media.
Kingston, Robert. 2015. Does Optimizely Slow Down a Site’s Performance. January 18.
Knapp, Michael S., Swinnerton, Juli A., Copland, Michael A., and Monpas-Huber, Jack. 2006. Data-Informed Leadership in Education. Center for the Study of Teaching and Policy, University of Washington, Seattle, WA: Wallace Foundation.
Kohavi, Ron. 2019. “HiPPO FAQ.” ExP Experimentation Platform.
Kohavi, Ron. 2016. “Pitfalls in Online Controlled Experiments.” CODE ’16: Conference on Digital Experimentation. MIT.
Kohavi, Ron. 2014. “Customer Review of A/B Testing: The Most Powerful Way to Turn Clicks Into Customers.” May 27.
Kohavi, Ron. 2010. “Online Controlled Experiments: Listening to the Customers, not to the HiPPO.” Keynote at EC10: the 11th ACM Conference on Electronic Commerce.
Kohavi, Ron. 2003. Real-world Insights from Mining Retail E-Commerce Data. Stanford, CA, May 22.
Kohavi, Ron, and Longbotham, Roger. 2017. “Online Controlled Experiments and A/B Tests.” In Encyclopedia of Machine Learning and Data Mining, by Sammut, Claude and Webb, Geoffrey I. Springer.
Kohavi, Ron, and Longbotham, Roger. 2010. “Unexpected Results in Online Controlled Experiments.” SIGKDD Explorations, December.
Kohavi, Ron and Parekh, Rajesh. 2003. “Ten Supplementary Analyses to Improve E-commerce Web Sites.” WebKDD.
Kohavi, Ron, and Thomke, Stefan. 2017. “The Surprising Power of Online Experiments.” Harvard Business Review (September–October): 74–92.
Kohavi, Ron, Crook, Thomas, and Longbotham, Roger. 2009. “Online Experimentation at Microsoft.” Third Workshop on Data Mining Case Studies and Practice Prize.
Kohavi, Ron, Longbotham, Roger, and Walker, Toby. 2010. “Online Experiments: Practical Lessons.” IEEE Computer, September: 82–85.
Kohavi, Ron, Tang, Diane, and Xu, Ya. 2019. “History of Controlled Experiments.” Practical Guide to Trustworthy Online Controlled Experiments.
Kohavi, Ron, Deng, Alex, Longbotham, Roger, and Xu, Ya. 2014. “Seven Rules of Thumb for Web Site.” Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14).
Kohavi, Ron, Longbotham, Roger, Sommerfield, Dan, and Henne, Randal M. 2009. “Controlled Experiments on the Web: Survey and Practical Guide.” Data Mining and Knowledge Discovery 18: 140–181.
Kohavi, Ron, Deng, Alex, Frasca, Brian, Longbotham, Roger, Walker, Toby, and Xu, Ya. 2012. “Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained.” Proceedings of the 18th Conference on Knowledge Discovery and Data Mining.
Kohavi, Ron, Deng, Alex, Frasca, Brian, Walker, Toby, Xu, Ya, and Pohlmann, Nils. 2013. “Online Controlled Experiments at Large Scale.” KDD 2013: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Kohavi, Ron, Messner, David, Eliot, Seth, Ferres, Juan Lavista, Henne, Randy, Kannappan, Vignesh, and Wang, Justin. 2010. “Tracking Users’ Clicks and Submits: Tradeoffs between User Experience and Data Loss.” Experimentation Platform. September 28.
Kramer, Adam, Guillory, Jamie, and Hancock, Jeffrey. 2014. “Experimental evidence of massive-scale emotional contagion through social networks.” PNAS, June 17.
Kuhn, Thomas. 1996. The Structure of Scientific Revolutions. 3rd edition. University of Chicago Press.
Laja, Peep. 2019. “How to Avoid a Website Redesign FAIL.” CXL. March 8.
Lax, Jeffrey R., and Phillips, Justin H. 2009. “How Should We Estimate Public Opinion in The States?” American Journal of Political Science 53 (1): 107–121.
Lee, Jess. 2013. Fake Door. April 10.
Lee, Minyong R, and Shen, Milan. 2018. “Winner’s Curse: Bias Estimation for Total Effects of Features in Online Controlled Experiments.” KDD 2018: The 24th ACM Conference on Knowledge Discovery and Data Mining. London: ACM.
Lehmann, Erich L., and Romano, Joseph P. 2005. Testing Statistical Hypotheses. Springer.
Levy, Steven. 2014. “Why The New Obamacare Website is Going to Work This Time.”
Lewis, Randall A., Rao, Justin M., and Reiley, David. 2011. “Proceedings of the 20th ACM International World Wide Web Conference (WWW20).” 157–166.
Li, Lihong, Chu, Wei, Langford, John, and Schapire, Robert E. 2010. “A Contextual-Bandit Approach to Personalized News Article Recommendation.” WWW 2010: Proceedings of the 19th International Conference on World Wide Web. Raleigh, North Carolina.
Linden, Greg. 2006. Early Amazon: Shopping Cart Recommendations. April 25.
Linden, Greg. 2006. “Marissa Mayer at Web 2.0.” Geeking with Greg. November 9.
Linowski, Jakub. 2018a. Good UI: Learn from What We Try and Test.
Linowski, Jakub. 2018b. No Coupon.
Liu, Min, Sun, Xiaohui, Varshney, Maneesh, and Xu, Ya. 2018. “Large-Scale Online Experimentation with Quantile Metrics.” Joint Statistical Meeting, Statistical Consulting Section. Alexandria, VA: American Statistical Association. 2849–2860.
Loukides, Michael, Mason, Hilary, and Patil, D.J. 2018. Ethics and Data Science. O’Reilly Media.
Lu, Luo, and Liu, Chuang. 2014. “Separation Strategies for Three Pitfalls in A/B Testing.” KDD User Engagement Optimization Workshop. New York.
Lucas, Robert E. 1976. Econometric Policy Evaluation: A Critique. Vol. 1. In The Phillips Curve and Labor Markets, by Brunner, K. and Meltzer, A., 19–46. Carnegie-Rochester Conference on Public Policy.
Malinas, Gary, and Bigelow, John. 2004. “Simpson’s Paradox.” Stanford Encyclopedia of Philosophy. February 2.
Manzi, Jim. 2012. Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society. Basic Books.
Marks, Harry M. 1997. The Progress of Experiment: Science and Therapeutic Reform in the United States, 1900–1990. Cambridge University Press.
Marsden, Peter V., and Wright, James D. 2010. Handbook of Survey Research, 2nd Edition. Emerald Publishing Group Limited.
Marsh, Catherine, and Elliott, Jane. 2009. Exploring Data: An Introduction to Data Analysis for Social Scientists. 2nd edition. Polity.
Martin, Robert C. 2008. Clean Code: A Handbook of Agile Software Craftsmanship. Prentice Hall.
Mason, Robert L., Gunst, Richard F., and Hess, James L. 1989. Statistical Design and Analysis of Experiments With Applications to Engineering and Science. John Wiley & Sons.
McChesney, Chris, Covey, Sean, and Huling, Jim. 2012. The 4 Disciplines of Execution: Achieving Your Wildly Important Goals. Free Press.
McClure, Dave. 2007. Startup Metrics for Pirates: AARRR!!! August 8.
McCrary, Justin. 2008. “Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test.” Journal of Econometrics (142): 698–714.
McCullagh, Declan. 2006. AOL’s Disturbing Glimpse into Users’ Lives. August 9.
McFarland, Colin. 2012. Experiment!: Website Conversion Rate Optimization with A/B and Multivariate Testing. New Riders.
McGue, Matt. 2014. Introduction to Human Behavioral Genetics, Unit 2: Twins: A Natural Experiment. Coursera.
McKinley, Dan. 2013. Testing to Cull the Living Flower. January.
McKinley, Dan. 2012. Design for Continuous Experimentation: Talk and Slides. December 22.
Mechanical Turk. 2019. Amazon Mechanical Turk.
Meenan, Patrick, Feng, Chao (Ray), and Petrovich, Mike. 2013. “Going Beyond Onload – How Fast Does It Feel?” Velocity: Web Performance and Operations conference, October 14–16.
Meyer, Michelle N. 2018. “Ethical Considerations When Companies Study – and Fail to Study – Their Customers.” In The Cambridge Handbook of Consumer Privacy, by Selinger, Evan, Polonetsky, Jules and Tene, Omer. Cambridge University Press.
Meyer, Michelle N. 2015. “Two Cheers for Corporate Experimentation: The A/B Illusion and the Virtues of Data-Driven Innovation.” 13 Colo. Tech. L.J. 273.
Meyer, Michelle N. 2012. Regulating the Production of Knowledge: Research Risk–Benefit Analysis and the Heterogeneity Problem. 65 Administrative Law Review 237; Harvard Public Law Working Paper.
Meyer, Michelle N., Heck, Patrick R., Holtzman, Geoffrey S., Anderson, Stephen M., Cai, William, Watts, Duncan J., and Chabris, Christopher F. 2019. “Objecting to Experiments that Compare Two Unobjectionable Policies or Treatments.” PNAS: Proceedings of the National Academy of Sciences (National Academy of Sciences).
Milgram, Stanley. 2009. Obedience to Authority: An Experimental View. Harper Perennial Modern Thought.
Mitchell, Carl, Litz, Jonathan, Vaz, Garnet, and Drake, Andy. 2018. “Metrics Health Detection and AA Simulator.” Microsoft ExP (internal). August 13.
Moran, Mike. 2008. Multivariate Testing in Action: Quicken Loan’s Regis Hadiaris on multivariate testing. December.
Moran, Mike. 2007. Do It Wrong Quickly: How the Web Changes the Old Marketing Rules. IBM Press.
Mosteller, Frederick, Gilbert, John P., and McPeek, Bucknam. 1983. “Controversies in Design and Analysis of Clinical Trials.” In Clinical Trials, by Shapiro, Stanley H. and Louis, Thomas A. New York, NY: Marcel Dekker, Inc.
MR Web. 2014. “Obituary: Audience Measurement Veteran Tony Twyman.” Daily Research News Online. November 12.
Mudholkar, Govind S., and George, E. Olusegun. 1979. “The Logit Method for Combining Probabilities.” Edited by Rustagi, J. Symposium on Optimizing Methods in Statistics. Academic Press. 345–366.
Mueller, Hendrik, and Sedley, Aaron. 2014. “HaTS: Large-Scale In-Product Measurement of User Attitudes & Experiences with Happiness Tracking Surveys.” OZCHI, December.
Neumann, Chris. 2017. Does Optimizely Slow Down a Site’s Performance? October 18.
Newcomer, Kathryn E., Hatry, Harry P., and Wholey, Joseph S. 2015. Handbook of Practical Program Evaluation (Essential Texts for Nonprofit and Public Leadership and Management). Wiley.
Neyman, J. 1923. “On the Application of Probability Theory to Agricultural Experiments.” Statistical Science 465–472.
NSF. 2018. Frequently Asked Questions and Vignettes: Interpreting the Common Rule for the Protection of Human Subjects for Behavioral and Social Science Research.
Office for Human Research Protections. 1991. Federal Policy for the Protection of Human Subjects (‘Common Rule’).
Optimizely. 2018. “A/A Testing.” Optimizely.
Optimizely. 2018. “Implement the One-Line Snippet for Optimizely X.” Optimizely. February 28.
Optimizely. 2018. Optimizely Maturity Model.
Orlin, Ben. 2016. Why Not to Trust Statistics. July 13.
Owen, Art, and Varian, Hal. 2018. Optimizing the Tie-Breaker Regression Discontinuity Design. August.
Oxford Centre for Evidence-based Medicine. 2009. Oxford Centre for Evidence-based Medicine – Levels of Evidence. March.
Park, David K., Gelman, Andrew, and Bafumi, Joseph. 2004. “Bayesian Multilevel Estimation with Poststratification: State-Level Estimates from National Polls.” Political Analysis 375–385.
Parmenter, David. 2015. Key Performance Indicators: Developing, Implementing, and Using Winning KPIs. 3rd edition. John Wiley & Sons, Inc.
Pearl, Judea. 2009. Causality: Models, Reasoning and Inference. 2nd edition. Cambridge University Press.
Pekelis, Leonid. 2015. “Statistics for the Internet Age: The Story behind Optimizely’s New Stats Engine.” Optimizely. January 20.
Pekelis, Leonid, Walsh, David, and Johari, Ramesh. 2015. “The New Stats Engine.” Optimizely.
Peterson, Eric T. 2005. Web Site Measurement Hacks. O’Reilly Media.
Peterson, Eric T. 2004. Web Analytics Demystified: A Marketer’s Guide to Understanding How Your Web Site Affects Your Business. Celilo Group Media and CafePress.
Pfeffer, Jeffrey, and Sutton, Robert I. 1999. The Knowing-Doing Gap: How Smart Companies Turn Knowledge into Action. Harvard Business Review Press.
Phillips, A. W. 1958. “The Relation between Unemployment and the Rate of Change of Money Wage Rates in the United Kingdom, 1861–1957.” Economica, New Series 25 (100): 283–299.
Porter, Michael E. 1998. Competitive Strategy: Techniques for Analyzing Industries and Competitors. Free Press.
Porter, Michael E. 1996. “What is Strategy.” Harvard Business Review 61–78.
Quarto-vonTivadar, John. 2006. “AB Testing: Too Little, Too Soon.” Future Now.
Radlinski, Filip, and Craswell, Nick. 2013. “Optimized Interleaving For Online Retrieval Evaluation.” International Conference on Web Search and Data Mining. Rome, IT: ACM. 245–254.
Rae, Barclay. 2014. “Watermelon SLAs – Making Sense of Green and Red Alerts.” Computer Weekly. September.
RAND. 1955. A Million Random Digits with 100,000 Normal Deviates. Glencoe, Ill: Free Press.
Rawat, Girish. 2018. “Why Most Redesigns fail.” freeCodeCamp. December 4.
Razali, Nornadiah Mohd, and Wah, Yap Bee. 2011. “Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests.” Journal of Statistical Modeling and Analytics, January 1: 21–33.
Reinhardt, Peter. 2016. Effect of Mobile App Size on Downloads. October 5.
Resnick, David. 2015. What is Ethics in Research & Why is it Important? December 1.
Ries, Eric. 2011. The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses. Crown Business.
Rodden, Kerry, Hutchinson, Hilary, and Fu, Xin. 2010. “Measuring the User Experience on a Large Scale: User-Centered Metrics for Web Applications.” Proceedings of CHI, April.
Romano, Joseph, Shaikh, Azeem M., and Wolf, Michael. 2016. “Multiple Testing.” In The New Palgrave Dictionary of Economics. Palgrave Macmillan.
Rosenbaum, Paul R, and Rubin, Donald B. 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika 70 (1): 41–55.
Rossi, Peter H., Lipsey, Mark W., and Freeman, Howard E. 2004. Evaluation: A Systematic Approach. 7th edition. Sage Publications, Inc.
Roy, Ranjit K. 2001. Design of Experiments using the Taguchi Approach: 16 Steps to Product and Process Improvement. John Wiley & Sons, Inc.
Rubin, Donald B. 1990. “Formal Modes of Statistical Inference for Causal Effects.” Journal of Statistical Planning and Inference 25 (3): 279–292.
Rubin, Donald B. 1974. “Estimating Causal Effects of Treatment in Randomized and Nonrandomized Studies.” Journal of Educational Psychology 66 (5): 688–701.
Rubin, Kenneth S. 2012. Essential Scrum: A Practical Guide to the Most Popular Agile Process. Addison-Wesley Professional.
Russell, Daniel M., and Grimes, Carrie. 2007. “Assigned Tasks Are Not the Same as Self-Chosen Web Searches.” HICSS'07: 40th Annual Hawaii International Conference on System Sciences, January.
Saint-Jacques, Guillaume B., Aral, Sinan, Airoldi, Edoardo, Brynjolfsson, Erik, and Xu, Ya. 2018. “The Strength of Weak Ties: Causal Evidence using People-You-May-Know Randomizations.” 141–152.
Saint-Jacques, Guillaume, Varshney, Maneesh, Simpson, Jeremy, and Xu, Ya. 2018. “Using Ego-Clusters to Measure Network Effects at LinkedIn.” Workshop on Information Systems and Economics. San Francisco, CA.
Samarati, Pierangela, and Sweeney, Latanya. 1998. “Protecting Privacy When Disclosing Information: k-anonymity and its Enforcement through Generalization and Suppression.” Proceedings of the IEEE Symposium on Research in Security and Privacy.
Schrage, Michael. 2014. The Innovator’s Hypothesis: How Cheap Experiments Are Worth More than Good Ideas. MIT Press.
Schrijvers, Ard. 2017. “Mobile Website Too Slow? Your Personalization Tools May Be to Blame.” Bloomreach. February 2.
Schurman, Eric, and Brutlag, Jake. 2009. “Performance Related Changes and their User Impact.” Velocity 09: Velocity Web Performance and Operations Conference.
Scott, Steven L. 2010. “A modern Bayesian look at the multi-armed bandit.” Applied Stochastic Models in Business and Industry 26 (6): 639–658.
Segall, Ken. 2012. Insanely Simple: The Obsession That Drives Apple’s Success. Portfolio Hardcover.
Senn, Stephen. 2012. “Seven myths of randomisation in clinical trials.” Statistics in Medicine. doi:10.1002/sim.5713.
Shadish, William R., Cook, Thomas D., and Campbell, Donald T. 2001. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. 2nd edition. Cengage Learning.
Simpson, Edward H. 1951. “The Interpretation of Interaction in Contingency Tables.” Journal of the Royal Statistical Society, Ser. B, 238–241.
Sinofsky, Steven, and Iansiti, Marco. 2009. One Strategy: Organization, Planning, and Decision Making. Wiley.
Siroker, Dan, and Koomen, Pete. 2013. A/B Testing: The Most Powerful Way to Turn Clicks Into Customers. Wiley.
Soriano, Jacopo. 2017. “Percent Change Estimation in Large Scale Online Experiments.” November 3.
Souders, Steve. 2013. “Moving Beyond window.onload().” High Performance Web Sites Blog. May 13.
Souders, Steve. 2009. Even Faster Web Sites: Performance Best Practices for Web Developers. O’Reilly Media.
Souders, Steve. 2007. High Performance Web Sites: Essential Knowledge for Front-End Engineers. O’Reilly Media.
Spitzer, Dean R. 2007. Transforming Performance Measurement: Rethinking the Way We Measure and Drive Organizational Success. AMACOM.
Stephens-Davidowitz, Seth, Varian, Hal, and Smith, Michael D. 2017. “Super Returns to Super Bowl Ads?” Quantitative Marketing and Economics, March 1: 1–28.
Sterne, Jim. 2002. Web Metrics: Proven Methods for Measuring Web Site Success. John Wiley & Sons, Inc.
Strathern, Marilyn. 1997. “‘Improving ratings’: Audit in the British University System.” European Review 5 (3): 305–321. doi:10.1002/(SICI)1234-981X(199707)5:33.0.CO;2-4.
Student. 1908. “The Probable Error of a Mean.” Biometrika 6 (1): 1–25.
Sullivan, Nicole. 2008. “Design Fast Websites.” Slideshare. October 14.
Tang, Diane, Agarwal, Ashish, O’Brien, Deirdre, and Meyer, Mike. 2010. “Overlapping Experiment Infrastructure: More, Better, Faster Experimentation.” Proceedings 16th Conference on Knowledge Discovery and Data Mining.
The Guardian. 2014. OKCupid: We Experiment on Users. Everyone does. July 29.
The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. 1979. The Belmont Report. April 18.
Thistlethwaite, Donald L., and Campbell, Donald T. 1960. “Regression-Discontinuity Analysis: An Alternative to the Ex-Post Facto Experiment.” Journal of Educational Psychology 51 (6): 309–317.
Thomke, Stefan H. 2003. “Experimentation Matters: Unlocking the Potential of New Technologies for Innovation.”
Tiffany, Kaitlyn. 2017. “This Instagram Story Ad with a Fake Hair in It is Sort of Disturbing.” The Verge. December 11.
Tolomei, Sam. 2017. Shrinking APKs, growing installs. November 20.
Tutterow, Craig, and Saint-Jacques, Guillaume. 2019. Estimating Network Effects Using Naturally Occurring Peer Notification Queue Counterfactuals. February 19.
Tyler, Mary E., and Ledford, Jerri. 2006. Google Analytics. Wiley Publishing, Inc.
Tyurin, I.S. 2009. “On the Accuracy of the Gaussian Approximation.” Doklady Mathematics 429 (3): 312–316.
Ugander, Johan, Karrer, Brian, Backstrom, Lars, and Kleinberg, Jon. 2013. “Graph Cluster Randomization: Network Exposure to Multiple Universes.” Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 329–337.
van Belle, Gerald. 2008. Statistical Rules of Thumb. 2nd edition. Wiley-Interscience.
Vann, Michael G. 2003. “Of Rats, Rice, and Race: The Great Hanoi Rat Massacre, an Episode in French Colonial History.” French Colonial History 4: 191–203.
Varian, Hal. 2016. “Causal inference in economics and marketing.” Proceedings of the National Academy of Sciences of the United States of America 7310–7315.
Varian, Hal R. 2007. “Kaizen, That Continuous Improvement Strategy, Finds Its Ideal Environment.” The New York Times. February 8.
Vaver, Jon, and Koehler, Jim. 2012. Periodic Measurement of Advertising Effectiveness Using Multiple-Test Period Geo Experiments. Google Inc.
Vaver, Jon, and Koehler, Jim. 2011. Measuring Ad Effectiveness Using Geo Experiments. Google, Inc.
Vickers, Andrew J. 2009. What Is a p-value Anyway? 34 Stories to Help You Actually Understand Statistics. Pearson.
Vigen, Tyler. 2018. Spurious Correlations.
Wager, Stefan, and Athey, Susan. 2018. “Estimation and Inference of Heterogeneous Treatment Effects using Random Forests.” Journal of the American Statistical Association 113 (523): 1228–1242.
Wasserman, Larry. 2004. All of Statistics: A Concise Course in Statistical Inference. Springer.
Weiss, Carol H. 1997. Evaluation: Methods for Studying Programs and Policies. 2nd edition. Prentice Hall.
Wider Funnel. 2018. “The State of Experimentation Maturity 2018.” Wider Funnel.
Wikipedia contributors, Above the Fold. 2014. Wikipedia, The Free Encyclopedia. Jan.
Wikipedia contributors, Cobra Effect. 2019. Wikipedia, The Free Encyclopedia.
Wikipedia contributors, Data Dredging. 2019. Data dredging.
Wikipedia contributors, Eastern Air Lines Flight 401. 2019. Wikipedia, The Free Encyclopedia.
Wikipedia contributors, List of .NET libraries and frameworks. 2019.
Wikipedia contributors, Logging as a Service. 2019. Logging as a Service.
Wikipedia contributors, Multiple Comparisons Problem. 2019. Wikipedia, The Free Encyclopedia.
Wikipedia contributors, Perverse Incentive. 2019.
Wikipedia contributors, Privacy by Design. 2019. Wikipedia, The Free Encyclopedia.
Wikipedia contributors, Semmelweis Reflex. 2019. Wikipedia, The Free Encyclopedia.
Wikipedia contributors, Simpson’s Paradox. 2019. Wikipedia, The Free Encyclopedia. Accessed February 28, 2008.
Wolf, Talia. 2018. “Why Most Redesigns Fail (and How to Make Sure Yours Doesn’t).” GetUplift.
Xia, Tong, Bhardwaj, Sumit, Dmitriev, Pavel, and Fabijan, Aleksander. 2019. “Safe Velocity: A Practical Guide to Software Deployment at Scale using Controlled Rollout.” ICSE: 41st ACM/IEEE International Conference on Software Engineering. Montreal, Canada.
Xie, Huizhi, and Aurisset, Juliette. 2016. “Improving the Sensitivity of Online Controlled Experiments: Case Studies at Netflix.” KDD ’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY: ACM. 645–654.
Xu, Ya, and Chen, Nanyu. 2016. “Evaluating Mobile Apps with A/B and Quasi A/B Tests.” KDD ’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, California, USA: ACM. 313–322.
Xu, Ya, Duan, Weitao, and Huang, Shaochen. 2018. “SQR: Balancing Speed, Quality and Risk in Online Experiments.” 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. London: Association for Computing Machinery. 895–904.
Xu, Ya, Chen, Nanyu, Fernandez, Adrian, Sinno, Omar, and Bhasin, Anmol. 2015. “From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks.” KDD ’15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Sydney, NSW, Australia: ACM. 2227–2236.
Yoon, Sangho. 2018. Designing A/B Tests in a Collaboration Network.
Young, S. Stanley, and Karr, Allan. 2011. “Deming, data and observational studies: A process out of control and needing fixing.” Significance 8 (3).
Zhang, Fan, Joseph, Joshy, Rickabaugh, James Alexander, and Zhuang, Peng. 2018. Client-Side Activity Monitoring. US Patent US 10,165,071 B2. December 25.
Zhao, Zhenyu, Chen, Miao, Matheson, Don, and Stone, Maria. 2016. “Online Experimentation Diagnosis and Troubleshooting Beyond AA Validation.” DSAA 2016: IEEE International Conference on Data Science and Advanced Analytics. IEEE. 498–507.