Behavioural economic findings have influenced not only economic theory (Rabin, 1998; Chetty, 2015; Thaler, 2016), but also public policy-making (Thaler & Sunstein, 2008; Shafir, 2013; Sunstein, 2016b; Oliver, 2017). However, at least three methodological shortcomings have been identified that have the potential to reduce the effectiveness and ethical legitimacy of behavioural public policies. First, not much is known about the generalizability of behavioural findings from the laboratory to the real world where public policies take effect (Levitt & List, 2007; Gneezy & Imas, 2017; Galizzi & Navarro-Martinez, 2018). Second, much of the existing behavioural public policy literature has focused on identifying ‘what works’ and less on investigating ‘why’ behavioural interventions work. Mechanistic evidence about the ‘why’, however, is important to ensure that policies are effective, robust, persistent and welfare-improving in their target environments (Harrison, 2014; Grüne-Yanoff, 2015). Finally, the field has not agreed upon a way to identify true preferences when decision-making can be biased and thus lacks a welfare standard to evaluate behavioural public policies (Beshears et al., 2008; Hausman, 2012; Infante et al., 2016; Sugden, 2017).
This paper discusses how naturalistic monitoring of people's everyday decision-making biases can help to overcome these shortcomings. Naturalistic monitoring (or ecological momentary assessment) describes the observation of people's behaviours and experiences ‘in the wild’ (i.e., in people's natural environments where most of their economic decision-making takes place) (Shiffman et al., 2008). It includes self-reported measures, but also more objective observations, for example of individuals’ psychobiology (such as heart rate and skin conductance) and GPS data (Daly et al., 2014). One of the most popular naturalistic monitoring tools is the Day Reconstruction Method (DRM). The DRM was developed by Kahneman et al. (2004a) as a cost-effective way to measure both how people allocate their time and how they feel in their everyday lives. So far, the use of the DRM has been limited to measuring the determinants and consequences of subjective well-being in everyday life.
This paper discusses how the DRM can improve behavioural public policy-making by providing data about when, how and why decision-making biases occur in everyday life. The DRM, and naturalistic monitoring more generally, can be a helpful complement to existing methods of obtaining data for behavioural policy design and evaluation. It can complement laboratory experiments by identifying the relevance of decision-making biases in the real world, as well as identifying situational factors that influence the extent to which decision-making is biased in everyday life. The DRM can complement field experiments and randomized controlled trials (RCTs) by providing information about the mechanisms that determine why a given policy intervention is effective in everyday life or not. This mechanistic evidence is essential to ensuring that the intervention will also work in other contexts. The DRM can also provide a way to identify what people want in their everyday lives (or their ‘subjective preferences’), which might be used as a standard to assess the welfare implications of behavioural policies.
While the DRM has shortcomings, which we discuss below, we see its biggest promise in its capacity to provide information about the high-frequency decisions that specific sub-populations make in their everyday lives. While common surveys and experiments are well-suited to analysing low-frequency decisions made with substantial deliberation, the DRM provides a window into the otherwise difficult to observe behavioural patterns of everyday life and their determinants. Some of these determinants are not under the influence of individuals or policy-makers (e.g., time of the day, weather, etc.), but others are (e.g., congestion, density of fast food restaurants, medication, work email policies, etc.). Due to its scalability, the DRM has the capacity to distinguish between different sub-populations and thus allows for the identification of situational factors that induce detrimental decision-making within a heterogeneous population (e.g., in terms of economic status, family status, income, education, financial literacy, geographical location, personality traits and economic preferences, etc.). Such a fine-grained analysis in terms of both the situational determinants of behaviour and the different sub-populations is well suited to supporting the discovery of actionable behavioural policy interventions. The DRM can thus complement the growing literature that uses behavioural findings to change the choice architecture of everyday life in order to encourage welfare-enhancing decision-making (Thaler & Sunstein, 2008).
Naturalistic monitoring and the DRM
Naturalistic monitoring describes the observation of people's behaviours and experiences ‘in the wild’ (i.e., in people's natural environments) (Shiffman et al., 2008). The DRM is one of the most popular naturalistic monitoring tools. It was developed by Kahneman et al. (2004a) and combines features of experience sampling, time-budget elicitation and classical survey questions. The DRM aims to capture an entire day's experience while curtailing opportunities for recall biases, limiting participant burden and providing a cost-effective alternative to more burdensome experience sampling methodologies (Sonnenberg et al., 2012; Diener & Tay, 2014). The DRM is widely used to measure subjective well-being in everyday life and has been applied both in large representative population surveys such as the American Time Use Survey and the German Socio-Economic Panel (Krueger et al., 2009; Anusic et al., 2017) and in smaller-scale studies focused on specific topics (Srivastava et al., 2008; Daly et al., 2010; Knabe et al., 2010; Bakker et al., 2013; Daly et al., 2014; Ishio & Abe, 2017; Lee et al., 2017).
In a typical DRM study, participants are first asked to complete a personal diary in which they divide their previous day into ‘episodes’, as if each episode were a scene in a movie. Participants are asked to reflect upon what they did and how they felt during each episode. In the second phase, participants complete a survey in which they are asked questions about each episode chronologically. These questions can address any themes the researchers are interested in. Kahneman et al. (2004a) asked about the episodes’ start and end times, the type of activity participants were engaged in (e.g., commuting to work, having a meal, exercising), where they were and the emotional states they were in (e.g., happiness, boredom, hunger). In such a setup, the DRM can elicit whether particular subjective experiences are correlated with situational aspects in everyday life (for more detailed descriptions of the DRM, see Kahneman et al., 2004a, 2004b; National Research Council (US), 2012; Diener & Tay, 2014).
The most prominent alternative to the DRM is the Experience Sampling Method (ESM). The ESM is a real-time data capture tool developed by Larson and Csikszentmihalyi (1983) in which participants are prompted at random intervals during the day through an electronic device (today it would be a mobile phone; Hofmann & Patel, 2015) to record what they are doing and feeling at that moment. The ESM is sometimes considered the gold standard of naturalistic monitoring as it circumvents memory biases and allows for the elicitation of additional objective data from the smartphone. However, ESM studies are relatively expensive and burdensome and can suffer from low response rates, which makes it difficult to obtain large samples and to attach the studies to ongoing surveys (National Research Council (US), 2012). DRM studies are comparatively easy to conduct and can be managed by most researchers without the help of technicians (Anusic et al., 2017). Moreover, due to the ESM's invasive nature, study participants might become conscious of their actions throughout the day, which could change their behaviour. Finally, the ESM and the DRM provide conceptually similar data (Kahneman et al., 2004a; Dockray et al., 2010; Sonnenberg et al., 2012). While this paper makes a general case for informing behavioural policy-making with naturalistic monitoring techniques, the DRM is arguably the most suitable method due to its cost-effectiveness and scalability, especially when analysing large samples.
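To make the two-phase structure concrete, the following is a minimal sketch of how the resulting episode-level data might be represented and summarized. The field names, the 0 to 6 happiness scale and the duration weighting are illustrative assumptions, not the instrument used in any of the cited studies.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Episode:
    """One diary episode; all field names are illustrative assumptions."""
    start: datetime
    end: datetime
    activity: str    # e.g. "commuting", "meal", "exercising"
    location: str
    happiness: int   # self-rating on a hypothetical 0-6 scale

    @property
    def minutes(self) -> float:
        # Episode length in minutes, derived from start and end times.
        return (self.end - self.start).total_seconds() / 60


def mean_happiness_by_activity(episodes):
    """Duration-weighted mean happiness rating for each activity."""
    totals = {}  # activity -> [sum of rating * minutes, total minutes]
    for ep in episodes:
        acc = totals.setdefault(ep.activity, [0.0, 0.0])
        acc[0] += ep.happiness * ep.minutes
        acc[1] += ep.minutes
    return {a: s / m for a, (s, m) in totals.items() if m > 0}
```

Weighting by episode length is one possible design choice: it prevents a short episode from counting as much as a long one when ratings are aggregated over a day.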
Using naturalistic monitoring to inform behavioural public policy
The DRM is frequently used to measure subjective well-being in daily life, and a number of researchers have suggested using subjective well-being data to refocus or inform policy decisions (Kahneman & Sugden, 2005). More generally, subjective well-being research has emerged as a growing literature (Frey & Stutzer, 2002; Blanchflower & Oswald, 2004), and the case has been made to use affective, cognitive and eudaimonic forms of subjective well-being as alternative, non-monetary objectives for societal improvement (Layard, 2006; Dolan & White, 2007; Stiglitz et al., 2010; Odermatt & Stutzer, 2017). This section discusses three reasons why we consider it useful for behavioural public policy-making to broaden the analysis of everyday life beyond subjective well-being and also to monitor decision-making biases using naturalistic monitoring tools such as the DRM.
Improving ecological validity
Behavioural economics is the study of how real-world ‘Humans’ (rather than the ‘Econs’ from most economics textbooks) make economic decisions (Thaler & Sunstein, 2008; Dhami, 2016). Despite behavioural economics’ emphasis on the real world, however, the field is built on findings from laboratory experiments and surveys that are often abstract, that situate participants in artificial contexts and that often rely on student samples. Recognizing a gap between measurement in artificial contexts and the real world, Levitt and List (2007) argue that “perhaps the most fundamental question in experimental economics is whether findings from the lab are likely to provide reliable inferences outside of the laboratory” (p. 170). The related “fundamental” question about the extent to which people make the same mistakes in the real world as those found in the laboratory has not yet been answered systematically. Recent studies do not paint an overly optimistic picture of the generalizability of behavioural laboratory findings, as they suggest in many cases that there are no significant associations between economic preferences measured in the laboratory and theoretically related behaviour outside the laboratory (Delaney & Lades, 2017; Gneezy & Imas, 2017; Galizzi & Navarro-Martinez, 2018). The relevance of behavioural economic laboratory experiments to help us better understand decision-making in everyday life may well be limited in the sense that experimental studies and their results do not tell us much about the real world. A range of behavioural biases identified in behavioural economics may characterize laboratory but not real-world decision-making.
Psychologists have long been concerned about how to generalize behaviour from experimental settings to the real world (e.g., Loewenstein, 1999; Kaplan & Stone, 2013). A common strategy in psychology to overcome problems of low ecological validity is to design experimental stimuli that are good representations of naturally occurring environments (Brunswik, 1956). But most experimental economic studies, on which many behavioural public policies rely, cannot (and do not aim to) reflect contextual factors that affect decision-making in everyday life. On the contrary, most economic experiments are designed as context-free representations of the payoff structures in real-world situations. While economic experiments have many advantages (e.g., the ability to control other variables, ease of replication, high internal validity and reliability, focus on clean causal effects), they abstract from realistic frames, put participants into unfamiliar roles in unfamiliar contexts that do not reflect real-world situations and encourage reflective decision-making rather than the automatic decision-making that is more common in the real world.
In order to increase the relevance of behavioural economics for a better understanding of how people make decisions in their real lives, studies can be conducted ‘in vivo’ (i.e., in the real world where everyday decision-making takes place). By definition, naturalistic measurement therefore does not share the ecological validity problems that characterize some laboratory experiments. Most importantly for behavioural scientists, naturalistic monitoring allows us to make inferences about how, when and where decision-making biases occur in real life. Providing information about the extent to which decision-making biases are relevant in everyday life is key for the cost–benefit analysis that should inform any policy or regulatory intervention (Sunstein, 2016a). As such, naturalistic monitoring can complement (not substitute) experimental studies by providing data on the real-world relevance of decision-making biases that were previously identified in highly controlled laboratory experiments.
For example, a recent stream of naturalistic monitoring research has provided novel findings on self-control (Hofmann et al., 2012; Delaney & Lades, 2017; Milyavskaya & Inzlicht, 2017; Wilkowski et al., 2018), which is historically among the most prominent topics in the behavioural sciences (Elster, 1979; Hoch & Loewenstein, 1991; Thaler, 2018). In their seminal study on self-control in everyday life, Hofmann et al. (2012) used the ESM to provide a detailed picture of everyday desires and self-control failures. Among other results, they found that desires are frequent, variable in intensity and mostly unproblematic in everyday life. However, they also found a non-trivial number of self-control failures. Their participants enacted 17% of the desires despite resistance attempts. The study by Delaney and Lades (2017) used the DRM to replicate most of the findings presented by Hofmann et al. (2012) and showed that participants enact more than 30% of the desires despite resistance attempts. This study also showed that the most typical behavioural economic measure of self-control, namely present bias as measured using a financial inter-temporal choice task, is not significantly correlated with any aspect of self-control in everyday life. These findings show both that self-control failures are indeed prevalent in everyday life and that typical experimental measures of self-control in behavioural economics seem to measure another phenomenon entirely.
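The enactment rates reported above can be read as a simple proportion over reported desires. The sketch below shows one way such a rate might be computed from episode-level reports. It assumes that the denominator is the set of desires that participants attempted to resist and that each report carries two boolean fields with the hypothetical names `resisted` and `enacted`; neither assumption is taken from the cited studies' codebooks.

```python
def self_control_failure_rate(desires):
    """Share of resisted desires that were enacted anyway.

    Each desire is a dict with boolean keys 'resisted' and 'enacted'
    (hypothetical field names). Returns None if no desire was resisted,
    because the rate is undefined in that case.
    """
    resisted = [d for d in desires if d["resisted"]]
    if not resisted:
        return None
    failures = sum(1 for d in resisted if d["enacted"])
    return failures / len(resisted)


reports = [
    {"resisted": True, "enacted": True},    # counted as a failure
    {"resisted": True, "enacted": False},   # successful resistance
    {"resisted": False, "enacted": True},   # enacted without conflict
    {"resisted": False, "enacted": False},  # desire simply not acted on
]
rate = self_control_failure_rate(reports)
```

In this sketch, a desire that is enacted without any resistance attempt does not count as a failure, which mirrors the distinction between conflicted and unproblematic desires drawn in the text.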
Obtaining mechanistic evidence
At the core of behavioural economics’ relevance for policy-making is the suggestion of changing the choice architecture in addition to (or even rather than) educating individuals to change behaviour and intervening with harder regulation or mandates (Thaler & Sunstein, 2008). For example, presenting healthy food first in cafeterias and simplifying forms to nudge individuals to behave differently can complement and substitute for awareness campaigns that aim to encourage healthy eating and filling in forms correctly. Data on the effectiveness of such nudges often come from RCTs that are conducted with high ecological validity in the relevant real-world contexts, often in collaboration with businesses and/or policy-makers (Harrison & List, 2004; Halpern, 2015; Duflo, 2017; Gneezy & Imas, 2017). RCTs are often considered the gold standard if the aim is to identify the effect of an intervention in a given context, and an impressive body of literature has now emerged examining the causal impact on behaviour of changing aspects of the choice environment that, from a neoclassical perspective, should not affect people's decisions.
However, several scholars have pointed out that the treatment effects from RCTs themselves cannot provide a basis for developing theoretical accounts of decision-making in real-world economic environments. RCTs have also been criticized for providing information that is not necessarily generalizable to other contexts, for being limited to evaluations of observables and average effects and for not providing data on the latent welfare consequences of interventions (Harrison, 2014; Grüne-Yanoff, 2015; Deaton & Cartwright, 2018). In the context of this paper, most importantly, RCTs do not explore the mechanisms that explain why interventions work, but instead focus on ‘what works’. These mechanisms are typically assumed on the basis of theory inspired by experimental findings from laboratory environments.
The rich data that naturalistic monitoring provides allow for the testing of detailed mechanisms of behavioural change. Similar to Harrison (2014), who suggested complementing field studies with experimental studies on risk attitudes, subjective risk perception and time preferences, we suggest that it is of value to complement RCTs with naturalistic monitoring data. The main advantage of naturalistic monitoring for the evaluation of behaviourally informed policies is that we can directly measure the choice architecture, as well as its influence on real-world decision-making and everyday behaviour. While the DRM has not yet been used to evaluate treatment effects on behavioural change, it has been used to evaluate the effects of an early intervention policy on the subjective well-being of mothers (Doyle et al., 2017), which shows that it is possible to integrate a DRM element into RCTs. By measuring the effects of different variations of the choice architecture on decision-making patterns and the outcomes of these decisions, we will better understand the mechanisms that can explain why an intervention is effective or not. This will help us to better understand how choice architecture informs individual decision-making in daily life. As such, RCTs augmented by naturalistic monitoring can help address one of the key methodological weaknesses of behaviourally informed policies by supporting the design of policies that are ‘effective, robust, persistent or welfare-improving’ in their target environments (Grüne-Yanoff, 2015).
For example, the recent naturalistic monitoring studies on everyday desires and self-control (e.g., Hofmann et al., 2012; Delaney & Lades, 2017) provide mechanistic evidence about when, where, whether and why self-control failures in everyday life occur. These studies show that it is possible to identify different decision-making processes that can lead to similar behaviours. For example, eating a delicious but unhealthy snack might be the result of a self-control failure (e.g., when the person is on a diet) or might be in line with higher-order goals (e.g., when the snack is considered a reward or a treat). By asking whether the person attempted to resist eating the snack, researchers are able to differentiate between these decision-making processes leading to the same outcome [Footnote 1]. Naturalistic monitoring studies can also identify individual differences and their links to everyday behaviours. Hofmann et al. (2012) showed that personality is a relatively strong predictor of desire strength and conflict strength in everyday life, while situational factors, such as alcohol consumption, are stronger predictors of attempts to resist and eventually enacting desired behaviours. This suggests that behavioural interventions modifying the choice architecture are most effective when attempting to influence later stages in the decision-making process from desire to enactment. Finally, the findings from Delaney and Lades (2017) suggest that everyday self-control failures are more likely to be due to visceral influences rather than the decreasing impatience that much of the inter-temporal choice literature suggests as the most likely mechanism for self-control failures.
Identifying true preferences
Arguably the most important implication of behavioural economics for welfare economics is that choices do not necessarily reveal true preferences if these choices are based on cognitive biases. Since Pareto's (1971) argument for the removal of psychological considerations from economic theory and Samuelson's (1938) development of revealed preference theory, economic analysis has been focused on choice and on other observable conditions. If people choose good A when good B is available, good A is ‘revealed preferred’ to B. Policies that provide people with A are assumed to be better for individual welfare than policies that provide B. However, if people do not always decide rationally and with perfect willpower (e.g., when the decision to choose A is the result of a self-control failure), choices do not necessarily reveal true preferences, and policies might fail to maximize individual welfare as they recover preferences from biased or weak-willed choices. And even under the assumption that people make rational and controlled decisions, choice alone, in the absence of knowledge about individual beliefs, cannot reveal preference (Hausman, 2000). Thus, new ways of identifying individual preferences, or other appropriate welfare measures, are needed in order to evaluate policy interventions (Beshears et al., 2008; Chetty, 2015).
Some soft paternalists argue that the achievement of generally held higher-order goals (e.g., in terms of health, wealth and well-being) is a good welfare standard, as these higher-order goals represent what individuals want when not influenced by bounded rationality and limited willpower (Thaler & Sunstein, 2008). Others are sceptical as to whether policy-makers can identify whether people have these higher-order preferences (Rizzo & Whitman, 2009). Strategies have been devised to identify preferences when decisions can be biased. For example, Beshears et al. (2008) provide several strategies to align revealed and normative preferences; Hausman's (2012) ‘preference purification’ approach suggests reconstructing preferences by controlling for bias; and Bernheim (2016) argues for using only the subset of decisions that is clearly unbiased when identifying preferences. All of these approaches have in common that they require some information on the behavioural mechanisms that underlie potentially biased decision-making.
As mentioned in the previous subsection, naturalistic monitoring can provide such information on behavioural mechanisms. For welfare analysis, it is particularly useful to differentiate between biased and unbiased decision-making, as well as between dynamically consistent and dynamically inconsistent decision-making (e.g., to identify whether a behaviour is or is not the result of a self-control failure). Following Bernheim (2016), one could use naturalistic monitoring techniques in order to identify those behaviours that are not driven by cognitive biases and self-control failures and use only those unbiased and dynamically consistent choices to recover true preferences. Similarly, naturalistic monitoring can help test whether people have stable and well-defined preferences – conditional on context. By measuring decisions and associated contexts repeatedly, clear patterns might emerge that allow the prediction of future choices in similar contexts and that provide a basis for recovering stable underlying preferences. Finally, by using naturalistic monitoring techniques, one can elicit the subjective beliefs that individuals have at the moment of making a decision. This approach might overcome the problem posed by Hausman (2000), who argues that choices alone can reveal neither beliefs nor preferences, because any choice is consistent with any preference, given the right set of beliefs [Footnote 2].
Naturalistic monitoring also allows for a direct measurement of what people want in their everyday lives and whether they get it. For example, Hofmann et al. (2012) and Delaney and Lades (2017) provide detailed accounts of everyday desires and their satisfaction. Preference satisfaction approaches have historically been used by many economists to guide welfare evaluations, and data from everyday life can inform us as to whether people satisfy their short-term preferences or not. This approach of measuring subjective preferences directly is analogous to previous research that measures experiential utility in everyday life (Kahneman et al., 2004a). But rather than measuring how people feel, these new approaches measure what people want (i.e., their ‘wantability’; Fisher, 1918) in everyday life. Such direct measures of preference and of their satisfaction could complement indirect preference measures that rely on choice data in order to reveal preferences. If the satisfaction of short-term desires is taken as the guide to welfare, a policy that gives people what they want at any given moment would be preferred over a policy that restricts choice. Similarly, different desires could be ranked according to their normative weight in the sense of a hierarchy of needs, as suggested already by Maslow (1943) and discussed in Witt (2017). A benevolent dictator might come up with a list of short-term desires that policies should encourage and with another list of desires to discourage. We do not claim that these would be the normatively best evaluation criteria, but merely state that naturalistic monitoring allows for quantifying preferences and their satisfaction in everyday life. Whatever welfare criterion related to everyday life one favours, naturalistic monitoring can provide data to assess the success of a policy based on this welfare criterion [Footnote 3].
Finally, the most common way to identify preferences using naturalistic monitoring techniques is to decompose and compare different components of subjective well-being. Based on the distinction between life satisfaction as the evaluative component of subjective well-being and momentary happiness as its experiential component, Knabe et al. (2010) used the DRM to show that unemployed people are dissatisfied with life, but have a good day in terms of momentary happiness. They explain the relatively high momentary happiness of the unemployed by the lack of time spent at work, which is one of the least enjoyable activities. In a follow-up study, Knabe et al. (2017) show that workfare participants’ life satisfaction is between that of employed and unemployed people and that workfare participants’ emotional well-being is the highest of these three groups.
Assessing the DRM as a tool for behavioural public policy: methodological considerations
There are several situations in which the DRM can be used in public policy contexts. However, the benefits and costs of using the DRM, and naturalistic monitoring more generally, depend on a number of methodological considerations. This section discusses the opportunities and limitations of using the DRM with the purpose of informing behavioural public policies.
Methodological options and opportunities
The aim of the seminal DRM study by Kahneman et al. (2004a) was to measure experiences, settings, activities and time allocation in people's daily lives. The study's focus was on presenting affect ratings and their correlations with contemporaneous situational factors such as activities, interaction partners and time of the day. The study design, however, is versatile and allows for various modifications, depending on the specific research question. For example, in Kahneman et al. (2004a), participants completed the survey on a paper-and-pencil basis. But recent research has moved towards digital versions of the DRM, sometimes completely digital and sometimes with a paper diary to keep personal details anonymous (e.g., National Research Council (US), 2012; Bakker et al., 2013; Delaney & Lades, 2017). Moreover, in most DRM studies, participants are told that the typical length of an episode is between 15 minutes and 2 hours, but it is also possible to predefine certain time intervals, for example 2-hour intervals, and ask participants follow-up questions regarding this interval. The time intervals can be shortened and the absolute number of episodes reduced if the research is about rather short-lived aspects of everyday life, such as specific decisions and their psychological and situational correlates. The predefined time intervals also reduce individual heterogeneity in terms of the number of reported episodes (Diener & Tay, 2014). It is also possible to deviate from the episode as the unit of analysis according to which the second phase of the DRM is structured. The follow-up questions would then be concerned with one of the other questions asked in the first phase, such as activity, decision or social interaction. For example, in the American Time Use Survey (ATUS) well-being module, after documenting time use for the full previous day, respondents are asked to rate their affect during certain activities rather than episodes (National Research Council (US), 2012).
Abbreviated versions of the DRM have also been used to keep participant burden low. For example, in the German Socio-Economic Panel, participants are asked to complete the full first-phase diary, but in the second phase, follow-up questions are asked only about some randomly sampled episodes (Anusic et al., 2017). Such abbreviated versions of the DRM provide results that are similar to more comprehensive versions (Miret et al., 2012). It is also possible to ask participants to recall only a subset of episodes in the first phase (e.g., only the morning, afternoon or evening, or beginning with a randomly specified point in time yesterday) and to ask participants to answer follow-up questions for all recalled episodes. In the ATUS, the full day is reconstructed in terms of activities, and three activities are randomly selected for follow-up questions. The English Longitudinal Study of Ageing focuses on seven specific activities (watching TV, working or volunteering, walking or exercising, engaging in health-related activities other than walking/exercising, travelling or commuting, spending time with family or friends and spending time at home alone) and asks participants follow-up questions only for these. Some research has also used the same concept but has focused on specific events, creating so-called ‘event reconstruction studies’ (Grube et al., 2008). For these studies, it is essential to invite only participants who experienced the event under investigation in the recent past.
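As an illustration of the abbreviated design described above, in which a full first-phase diary is followed by follow-up questions on a random subset of episodes, the selection step might be sketched as follows. This is a minimal sketch under stated assumptions: the function name `sample_followup_episodes`, the dictionary layout of an episode and the example diary are hypothetical, and uniform sampling without replacement is assumed here for simplicity. Actual surveys may apply different selection rules.

```python
import random


def sample_followup_episodes(diary, k=3, seed=None):
    """Pick k episodes from a first-phase diary for second-phase follow-up.

    Assumption: every episode is equally likely to be chosen (uniform
    sampling without replacement). The chosen episodes are returned in
    their original diary order so follow-up questions stay chronological.
    """
    rng = random.Random(seed)
    if k >= len(diary):
        # Short diaries: ask follow-up questions about every episode.
        return list(diary)
    chosen_positions = sorted(rng.sample(range(len(diary)), k))
    return [diary[i] for i in chosen_positions]


# Hypothetical first-phase diary (illustrative entries, not real data).
diary = [
    {"start": "07:00", "end": "07:45", "activity": "breakfast"},
    {"start": "07:45", "end": "08:30", "activity": "commuting"},
    {"start": "08:30", "end": "12:00", "activity": "working"},
    {"start": "12:00", "end": "12:45", "activity": "lunch"},
    {"start": "12:45", "end": "17:00", "activity": "working"},
    {"start": "17:00", "end": "18:00", "activity": "exercising"},
    {"start": "18:00", "end": "22:00", "activity": "time with family"},
]

followups = sample_followup_episodes(diary, k=3, seed=42)
for episode in followups:
    print(episode["start"], episode["end"], episode["activity"])
```

Fixing the seed makes the selection reproducible for a given respondent, which can be useful when a digital survey is interrupted and resumed. Whether to sample uniformly or, for instance, in proportion to episode duration is a design choice that the text above leaves open.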
Limitations of the DRM as a tool for decision-making research
While we have suggested that the DRM is a valuable tool for obtaining detailed information on decisions made in everyday life in order to inform behavioural public policy-making, it is important to be clear about the limitations of the method in these contexts. In particular, the DRM is not suited to investigating possible factors behind low-frequency decisions. For example, when the question is whether a situational factor influences rare decisions such as buying a car, choosing a mortgage or choosing a spouse, DRM data will not be useful, as they will likely miss the right moment. But even when analysing high-frequency decisions, some limitations need to be acknowledged before using DRM data for policy purposes. Some of the limitations are the same as in other studies that rely on self-reports (e.g., the validity, reliability and sensitivity of measurement instruments, dishonest reporting, social desirability bias, norms, self-image considerations and reactivity to assessment procedures), but others are more DRM-specific. The National Research Council (US) (2012) discusses question-order effects, scale effects and survey-mode effects.
The most obvious DRM-specific limitation is that it relies on participants’ memory. Memories are subject to a number of biases, such as a reliance on routine to infer yesterday's likely activities, the peak-end bias in memory and the overstating of previous emotions and preferences (Diener & Tay, 2014). Moreover, the mood during the completion of the questionnaire can influence the revivification of yesterday and thus how participants recall their past day (Schwarz & Clore, 1983). Another important aspect to consider when looking at the reliability of participants’ memories is the structure of these memories. People rely on mental scripts and episodes in their memory to retrace their steps – and they are comfortable with a certain level of detail (e.g., “I had breakfast,” not “I had a meal” or “I had a continental breakfast”) (Tourangeau et al., 2000). Deviating from this preferred level of detail could mean a deterioration in the quality of the data being collected. Finally, individual differences in people's memories should also be considered – age and health, in particular, are likely to play a role. That said, the DRM was explicitly designed to minimize memory bias, and evidence from studies comparing the outcomes of the DRM with experience sampling measures in real time suggests that the DRM largely achieves its goals of assessing people's episodic feelings and experiences without being distorted by memory and other biases (Dockray et al., 2010; Sonnenberg et al., 2012; Kim et al., 2013).
Another potential issue relates to the episode as the unit of measurement. A single episode might contain various feelings and decisions, and it is not known whether single responses can represent a full episode that might last up to several hours (Diener & Tay, 2014). Moreover, differential responding patterns of subgroups can lead to misleading conclusions about actual experiences. For example, if older people were more open to acknowledging a bias than younger people, or if men were more likely to underreport socially inappropriate behaviours than women, differences in DRM data would not reflect real differences in people's everyday lives. This would raise doubts about the comparability of results across subgroups of the population.
Finally, naturalistic monitoring does not allow for much control over the study environment compared to laboratory studies or RCTs. This means that a DRM study might be hampered by more threats to internal validity: confounding factors affecting outcomes are more difficult to avoid, and this may threaten causal inferences and external validity (e.g., Jiménez-Buedo, 2011). Where spurious relationships cannot be ruled out, rival hypotheses to the researcher's original causal hypothesis may be developed. However, this is precisely where the features of the DRM can shine, through follow-up questions about episodes that allow for in-depth probing in order to map out potential factors, identify causal mechanisms and identify alternative relationships.
Conclusion: evaluating the effects of behavioural policies in everyday life
Behavioural economics shows that people are boundedly rational, have limited willpower and do not always act in ways economics textbooks would suggest (Dhami, 2016). These deviations from rationality and dynamic consistency are often systematic and predictable, as shown repeatedly in laboratory experiments (Camerer et al., 2004). Such findings have started to change economic theory (Rabin, 1998) and have substantially reformed policy-making worldwide, particularly in the UK, through the foundation of behavioural insights teams and nudge units (Jolls et al., 1998; Thaler & Sunstein, 2008; Halpern, 2015). The evidence on which many of these behavioural interventions rely often comes from laboratory environments (which often put study participants into rather artificial decision situations) and RCTs (which provide information about what works, but not about the underlying mechanisms).
This paper discussed how a popular naturalistic measurement tool, the Day Reconstruction Method (DRM), can be used to inform behavioural public policies by providing mechanistic evidence on how people make decisions in the real world. We suggest that the DRM is a valuable addition to the behavioural scientist's toolbox and can complement ordinary surveys, observational data, laboratory experiments and RCTs. The key benefit of the DRM, which sets it apart from alternative approaches, is that it allows for measuring decision-making in naturalistic, everyday contexts. It is thus a method that helps to quantify the extent to which behavioural biases change our behaviour in the real world and to identify where, when and why decision-making biases occur. The DRM can show, for example, whether there are correlations between biases and simultaneous situational factors such as location, activity, social interaction partner and internal state. Measuring everyday contexts and their effects on decision-making can also help to design better behavioural policies that change the choice architecture in order to nudge people to make better decisions. Such behavioural public policy interventions should be informed by domain-specific naturalistic monitoring studies in which detailed information about a particular type of phenomenon is elicited and where domain-specific context variables can be identified.
For future research, there are several potential applications of the DRM in behavioural science and behavioural public policy. The method can be used to measure the prevalence of almost any behavioural concept in every domain of life. It can measure, for example, in which real-life situations people are particularly risk or loss averse, or similarly examine the influence of everyday anchors, defaults and social norms or identities on everyday economic behaviour. The key challenge for these future studies is to design survey questions that are as similar as possible to the concepts usually identified in decision-making experiments. For several concepts (e.g., risk aversion), verbal survey questions that measure individual differences have already been designed (Weber et al., 2002). Future research can adapt these questions to relate them to intra-individual changes that can differ across situations in everyday life. Such studies will then be able to quantify how prevalent behavioural biases are, and also explore in what contexts these biases are particularly likely to arise.
A key challenge for future research is to integrate DRM studies in causal designs. Since the DRM is a survey that can be completed in one sitting, it can be easily added to existing RCTs. DRM studies also lend themselves well to evaluating large policies where DRM data from before the policy implementation can be compared to DRM data gathered after the implementation. Another branch of future research should deal with methodological issues. For example, different versions of the DRM (full versus abbreviated, online versus analogue, one day versus multiple days, different reinstantiation procedures, etc.) should be compared in order to identify the effects that design choices have on participants’ response patterns. It will also be important to further test the reliability, validity and accuracy of DRM data by comparing them with experience sampling data. Cognitive testing in interviews and focus groups should be conducted to make sure that the question wording used does not confuse the participants. Moreover, filling out the DRM itself can change behaviour, and the potential of the DRM as a behavioural intervention should be explored. If we better understand these methodological issues, DRM studies measuring behavioural concepts could be integrated into existing large-scale, nationally representative time use surveys. This strategy would help us to gain a better understanding as to how individuals from different sub-populations differ in terms of the decisions they make in their everyday lives.
Leonhard Lades has been supported by a grant from the Irish Environmental Protection Agency (project name: Enabling Transition; project number: 2017-CCRP-FS.32).