Why you care: The choice of randomization unit is critical in experiment design, as it affects both the user experience and which metrics can be used to measure the impact of an experiment. When building an experimentation system, you need to think through which options you want to make available. Understanding the options and the considerations for choosing among them will lead to improved experiment design and analysis.
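As an illustration of what choosing a randomization unit means in practice, here is a minimal Python sketch. It assumes a hash-based assignment scheme; the function name `assign_variant`, the experiment name, and the ID formats are hypothetical and not taken from the text. The unit is simply whatever identifier is hashed: hashing a user ID gives that user one consistent variant, whereas hashing a page-view ID lets the same user see different variants on different page views.

```python
import hashlib

def assign_variant(unit_id, experiment_name, variants=("control", "treatment")):
    """Deterministically map a randomization unit to a variant.

    Hypothetical helper: the unit may be a user ID, session ID, or
    page-view ID, depending on the chosen randomization unit.
    """
    key = f"{experiment_name}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# User-level randomization: the same user always gets the same variant.
user_variant = assign_variant("user-42", "longer-ad-titles")

# Page-level randomization: each page view is assigned independently,
# so one user may see both variants during a visit.
page_variants = [
    assign_variant(f"user-42/pageview-{i}", "longer-ad-titles") for i in range(5)
]
```

Including the experiment name in the hashed key is one way to keep assignments in different experiments from lining up with each other; it is a design choice in this sketch, not a requirement stated in the text.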
Why you care: Triggering provides experimenters with a way to improve sensitivity (statistical power) by filtering out noise created by users who could not have been impacted by the experiment. As organizational experimentation maturity improves, we see more triggered experiments being run.
Why you care: Running A/A tests is a critical part of establishing trust in an experimentation platform. A/A tests are useful precisely because they fail so often in practice, and each failure leads to re-evaluating assumptions and identifying bugs.
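The idea behind an A/A test can be sketched with a small simulation. This is a minimal example under stated assumptions, not the platform check described in the chapter: the metric is drawn from a normal distribution, the p-value uses a large-sample z-test, and all names and parameters (`simulate_aa_tests`, 1,000 tests, 200 users per variant) are hypothetical. Because both groups receive the identical experience, roughly 5% of tests should come out significant at the 0.05 level; a much higher rate would point to a problem in the assignment or the analysis.

```python
import math
import random
import statistics

def two_sample_p_value(a, b):
    """Two-sided p-value from a large-sample z-test on the difference in means."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_aa_tests(num_tests=1000, users_per_variant=200, seed=0):
    """Run repeated A/A tests in which both groups share one distribution."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(num_tests):
        a = [rng.gauss(10.0, 3.0) for _ in range(users_per_variant)]
        b = [rng.gauss(10.0, 3.0) for _ in range(users_per_variant)]
        p_values.append(two_sample_p_value(a, b))
    return p_values

p_values = simulate_aa_tests()
false_positive_rate = sum(p < 0.05 for p in p_values) / len(p_values)
print(f"Share of A/A tests with p < 0.05: {false_positive_rate:.3f}")
```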
As discussed in Chapter 1, running trustworthy controlled experiments is the scientific gold standard in evaluating many (but not all) ideas and making data-informed decisions. What may be less clear is that making controlled experiments easy to run also accelerates innovation: as the quotation from Moran above shows, it decreases the cost of trying new ideas and of learning from them in a virtuous feedback loop. In this chapter, we focus on what it takes to build a robust and trustworthy experiment platform. We start by introducing experimentation maturity models that show the various phases an organization generally goes through when starting to do experiments, and then we dive into the technical details of building an experimentation platform.
Why you care: Understanding the ethics of experiments is critical for everyone, from leadership to engineers to product managers to data scientists; all should be informed and mindful of the ethical considerations. Controlled experiments, whether in technology, anthropology, psychology, sociology, or medicine, are conducted on actual people. Here are questions and concerns to consider when determining when to seek expert counsel regarding the ethics of your experiments.
Why you care: Guardrail metrics are critical metrics designed to alert experimenters about violated assumptions. There are two types of guardrail metrics: organizational and trust-related. Chapter 7 discusses organizational guardrails that are used to protect the business, and this chapter describes the Sample Ratio Mismatch (SRM) in detail, which is a trust-related guardrail. The SRM guardrail should be included for every experiment, as it is used to ensure the internal validity and trustworthiness of the experiment results. A few other trust-related guardrail metrics are also described here.
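A Sample Ratio Mismatch check can be sketched as a chi-squared goodness-of-fit test on the user counts. The Python below is a minimal sketch assuming two variants and a known design ratio; the function name `srm_p_value` and the counts in the usage lines are hypothetical. With two variants the test has one degree of freedom, so the p-value can be computed from the standard library via `math.erfc` without extra dependencies.

```python
import math

def srm_p_value(control_users, treatment_users, expected_control_ratio=0.5):
    """P-value for a Sample Ratio Mismatch check between two variants.

    Chi-squared goodness-of-fit test with one degree of freedom, comparing
    observed user counts with the counts expected from the design ratio.
    """
    total = control_users + treatment_users
    expected_control = total * expected_control_ratio
    expected_treatment = total - expected_control
    chi2 = (
        (control_users - expected_control) ** 2 / expected_control
        + (treatment_users - expected_treatment) ** 2 / expected_treatment
    )
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

balanced = srm_p_value(5000, 5000)        # counts match the 50/50 design
skewed = srm_p_value(821588, 815482)      # hypothetical counts with a small skew
```

A very small p-value (a threshold such as 0.001 is an assumption here, chosen to keep false alarms rare) suggests that the observed split is unlikely under the design ratio, so the results should not be trusted until the cause is found.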
William Anthony Twyman was a UK radio and television audience measurement veteran (MR Web 2014) credited with formulating Twyman’s law, although he apparently never explicitly put it in writing, and multiple variants of it exist, as shown in the above quotations.
In Chapter 1, we reviewed what controlled experiments are and the importance of getting real data for decision making rather than relying on intuition. The example in this chapter explores the basic principles of designing, running, and analyzing an experiment. These principles apply wherever software is deployed, including web servers and browsers, desktop applications, mobile applications, game consoles, assistants, and more. To keep it simple and concrete, we focus on a website optimization example. In Chapter 12, we highlight the differences when running experiments for thick clients, such as native desktop and mobile apps.
Why you care: Before you can run any experiments, you must have instrumentation in place to log what is happening to the users and the system (e.g., website, application). Moreover, every business should have a baseline understanding of how the system is performing and how users interact with it, which requires instrumentation. When running experiments, having rich data about what users saw, their interactions (e.g., clicks, hovers, and time-to-click), and system performance (e.g., latencies) is critical.
Why you care: You can run experiments either on a thin client, such as a web browser, or on a thick client, such as a native mobile app or a desktop client app. Changes to a webpage, whether frontend or backend, are fully controlled by the server. This is very different from a thick client. With the explosive growth of mobile usage, the number of experiments running on mobile apps has also grown (Xu and Chen 2016). Understanding how thin and thick clients differ in release process, infrastructure, and user behavior helps ensure trustworthy experiments.
In 2012, an employee working on Bing, Microsoft’s search engine, suggested changing how ad headlines display (Kohavi and Thomke 2017). The idea was to lengthen the title line of ads by combining it with the text from the first line below the title, as shown in Figure 1.1.