## 1 Introduction

Risk preferences influence a range of economic and social outcomes and behaviors, including human capital investment and career decisions (Weiss, 1972; Snow and Warren, 1990), technology adoption and producer behaviour (Chavas and Holt, 1996; Liu, 2013) and the use of financial instruments (Jacobson and Petrie, 2009). Measuring risk preferences has largely been restricted to lab research, which often relies on financially incentivizing research subjects. With the growth of field experiments in development economics, it is unclear whether this approach can be scaled up in the field: risk preference modules can be time-consuming and costly, and paying incentives may fall outside researchers’ budgets or be logistically challenging. This study shows that a shorter version of a commonly used measure, that of Holt and Laury (2002), administered without monetary incentives, produces risk profiles similar to those of the more standard and expensive approach in which real monetary incentives are used.

The measure of Holt and Laury (2002), hereafter HL, has been widely used to estimate risk parameters of utility functions. It allows researchers to profile individuals based on their risk preferences, to study correlates of risk and choices, and to test theories of decision making. Other rigorous measures of risk, such as the Balloon Analogue Risk Task (BART) and the Domain-Specific Risk-Taking (DOSPERT) scale, have been used to test correlations with risky behaviors such as addiction and safety- and health-related behaviours (Lejuez et al., 2002; Blais and Weber, 2006), among other domains.^{1} We focus on the HL measure here because it allows us to compare our results to a wide range of lab studies using this measure.

A common practice among researchers when eliciting risk preferences in the lab is to use monetary incentives (non-hypothetical). However, in the field, the use of such incentives is less common. In particular, when data collection involves thousands of observations, as in national household surveys or impact evaluation data, researchers tend to use measures without monetary incentives (hypothetical).

The use of incentive-compatible devices to elicit risk preferences has two important implications: *i*) subjects make real choices with monetary consequences (hence, risk preferences are not self-reported as in hypothetical measures) and *ii*) final earnings depend on their choices and on chance. Across different devices, subjects are aware of what they can earn and under which circumstances (Holt and Laury, 2002; Gneezy and Potters, 1997; Eckel and Grossman, 2008; Charness and Viceisza, 2016; Crosetto and Filippin, 2013).

In the field, researchers must consider various practical issues. First, the use of money or gifts may put researchers at risk: when the experiment is run in highly deprived areas, carrying any type of reward can compromise their safety.^{2} Second, the use of lotteries creates inequalities among subjects, at least in the short run, between winners and losers, and this may create tensions in the communities. These inequalities may also trigger frictions between subjects and researchers.^{3}

The use of monetary incentives — the gold standard in lab experimental economics — instead of hypothetical choices, as in observational studies (and many impact evaluations), is a matter of interest for empirical researchers. Two arguments for incentives are often made: *i*) when payoffs are hypothetical and subjects do not risk their own money, they are more likely to be risk loving, and *ii*) in the absence of monetary incentives, subjects do not put enough effort into the task and consequently make random choices.^{4} While economists often presume that subjects will not work harder if they do not earn money, psychologists tend to assume that most subjects are intrinsically motivated and will therefore exercise steady effort even in the absence of rewards (Camerer and Hogarth, 1999).

However, paying in the lab and in the field may also adversely affect responses and behaviors. According to Harrison and List (2004), the nature of the stakes can affect field and lab responses differently: subjects may adjust their behaviour to the stake level. In the field, when surveys are used for data collection in impact evaluation studies, stakes may affect not only the reliability of the measure itself but may also induce Hawthorne effects that compromise the identification of treatment effects. This might be the case if subjects believe that certain answers will increase their chances of receiving higher stakes in subsequent rounds of data collection. Levitt and List (2007) also warn about the inaccurate inference that may emerge if the analyst does not account for differences in stakes across settings.^{5}

While the question of the effect of incentives on risk elicitation is not new in the lab, the evidence on their effect in the field is mixed and scarce. *In the lab*, Wiseman and Levin (1996) found that student subjects made the same risky decisions under real and hypothetical consequences. In the same line, Kühberger et al. (2002) found that hypothetical choices match real choices for small as well as for large payoffs. Conversely, Holt and Laury (2002, 2005) found that increasing the size of real payoffs leads to more risk-averse behavior than hypothetical payments. Etchart-Vincent and l’Haridon (2011) and Barreda-Tarrazona et al. (2011) found that real and hypothetical choices differ in the gain domain. Overall, looking across different types of economic experiments, the evidence suggests that monetary rewards matter when effort or performance responds to such incentives; Camerer and Hogarth (1999) provide a detailed review of 74 experiments. According to this review, the tasks in which incentives seem to affect performance involve recall, probability judgement and clerical work. Conversely, tasks related to trading in markets, bargaining in games and choosing risky gambles were not found to be sensitive to such incentives.

*In the field*, using a sample of Rwandan adults who face financial decisions, Jacobson and Petrie (2009) found that hypothetical and real choices resulted in the same risk preference profiles. Comparing low and high stakes among tree-planters in Canada, Bellemare and Shearer (2010) found no differences in risk preferences between the two types of payments. However, studies with samples of farmers find the opposite. In Senegal, Charness and Viceisza (2016) compared hypothetical questions with incentivized measures and found that women were more likely to report willingness to take risks under the hypothetical measure; and when increasing the level of incentives in India, Binswanger (1980) found that risk aversion tended to increase. The need for additional evidence on the elicitation of risk preferences in the field is highlighted by Levitt and List (2007), who argue that the choices individuals make depend not only on financial implications, but also on the nature and degree of others’ scrutiny, the context in which a decision is embedded, and the manner in which subjects are selected to participate.

The present study aims to answer the following question: can we elicit risk preferences in the field without using monetary incentives? We offer a simple and cheap method to elicit such preferences in the field, which we tested in three countries. To our knowledge, this is the first study using the same elicitation method in both the lab and the field. Our lab exercise took place in Spain and was replicated in the field in two developing countries, Honduras and Nigeria. Using the data from these three countries, we compare three different payment schemes widely used in economics and psychology.

We test to what extent the use of hypothetical payoffs affects the measurement of risk preferences. We use the common measure of Holt and Laury (2002), as it is often used to estimate parameters relevant to risk attitudes. Our study has two important features. *First*, we ran a lab experiment in Spain (April 2019) in which a reduced version of the Holt and Laury (2002) task with only 5 choices (hereafter HL5, half of the 10-choice list) was introduced to reduce the number of choices and the time needed. Subjects were randomly assigned to three treatment arms with probability 1/3. The arms differ only in their payment schemes: real payment, paying 1 out of 10 subjects (BRIS)^{6} and no payment at all (hypothetical).

*Second*, we bring our reduced form of the Holt-Laury task to the field. We ran field experiments in two low- and middle-income countries, Nigeria and Honduras. The former took advantage of a large-scale randomized controlled trial of an educational intervention that targeted parents of 6–9 year-old children. To make our studies as comparable as possible, the study in Honduras used the same target population (i.e., parents of 6–9 year-olds) and the same survey instrument as in Nigeria. The main difference is that the Nigeria study was conducted in a rural setting in Kano, in northern Nigeria, while the Honduras study was conducted in Copán, in an urban and peri-urban setting. Eliciting risk preferences in developing countries is relevant because these countries are the main target of development agencies promoting long-term investments (e.g., human capital, technology adoption, saving), outcomes for which risk preferences are a key determinant. For these countries, we tested whether paying subjects or not produces different risk preference profiles. As in the lab experiment, all subjects were randomly assigned (*p*=1/3) to real payment, payment with probability 1 out of 10, or no payment at all. To compare risk preferences across treatment arms, we contrasted three dimensions of the measure: the number of inconsistent subjects (and consistency *per* individual), the number of safe options and response time.

We find that hypothetical and probabilistic payments have no impact on consistency, the number of safe choices or response time. This suggests that when field researchers elicit risk preferences using hypothetical measures, they can trust that the resulting profiles are homogeneous across payment schemes (i.e., real, probabilistic and hypothetical), and that elicitation can be done faster (and therefore more cheaply) by using a short version of the Holt-Laury task. However, paying only a fraction of the sample (probabilistic payment) seemed to introduce noise into the elicitation of risk preferences in the lab: subjects became more risk loving when only a fraction of them was paid than when all of them were paid.

Our study was partly run to inform the data collection of a large-scale impact evaluation that took place in northern Nigeria. Had our findings shown differences between hypothetical and non-hypothetical measures, they would have supported the use of incentives in the field. In our experiment, the cost per subject of eliciting our measures was equivalent to 2.75 US dollars; for the impact evaluation data collection, this would have implied a cost of approximately 25.5 thousand US dollars. Our findings on the comparability of risk preference profiles across payment schemes support the use of hypothetical measures for subsequent analysis.

We now proceed to describe our experimental design, as well as the details of the three experiments that make up this study. Our last section concludes and discusses some recommendations for future data collection. All studies were pre-registered at AsPredicted: the lab experiment on April 10, 2019, and the field experiments on April 25.

## 2 Experimental Design

When eliciting risk preferences in the field, observational and experimental studies commonly face the pressing issue of the time enumerators spend visiting households. This issue becomes more salient when subjects’ attention and time are limited by domestic or job-related chores. To address this issue, we propose a short version of the measure suggested by Holt and Laury (2002) with 5 choices (HL5) instead of 10 (HL), which allows the elicitation of risk preferences in less time (Table 1).^{7} We chose these 5 choices, over other possibilities, to keep the extreme choices of the standard HL: the decision in which the high payoff has probability *q*=0.1 (and the low payoff probability 1−*q*=0.9) and the complementary case, in which *q*=0.9 and 1−*q*=0.1. Our selection of middle points (*q*=0.4, 0.5, 0.6) was based on previous evidence suggesting that the majority of subjects switch between options at these points.

The use of a short (trimmed) version of the HL might have a direct impact on the elicitation of risk preferences; for instance, under the short version subjects are more likely to be consistent (never choosing A after B has been chosen once, reading down Table 1) than under the long version. However, our main focus in this study is to compare differences in risk preference profiles across payment schemes.^{8} Hence, we use the HL5 measure in Studies 1, 2 and 3. As in the HL, every subject is asked to choose between two lotteries, A and B, each offering a high payment with probability *q* and a low payment with probability (1−*q*).

Note (Figure 1) that the first choice in Table 1 is “trivial”, since the expected value of A is much higher than that of B (82N vs. 21.8N), while the opposite happens in decision 5 (98N vs. 180.2N). In decisions 2, 3 and 4 the expected values of A and B are much closer (88N vs. 81.2N, 90N vs. 101N and 92N vs. 120.8N, respectively), so subjects’ preferences for risk — rather than expected value — might play a critical role in the choice between A and B. Under risk neutrality, subjects should select *A*, *A*, *B*, *B*, *B*. Under consistency, *A*, *B*, *B*, *B*, *B* implies risk loving behaviour and *B*, *B*, *B*, *B*, *B* strong risk loving behaviour. On the contrary, *A*, *A*, *A*, *B*, *B* is indicative of slight risk aversion, *A*, *A*, *A*, *A*, *B* of risk aversion and *A*, *A*, *A*, *A*, *A* of strong risk aversion. This means that only very risk loving individuals would choose option B in the first decision and only very risk-averse individuals would choose A in the last decision.
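The expected values above pin down the underlying payoffs (lottery A: 100N or 80N; lottery B: 200N or 2N, values inferred from the quoted expected values rather than stated directly in the text). A minimal sketch, under that assumption, reproduces the five expected-value pairs and the risk-neutral choice sequence:

```python
def expected_value(q, high, low):
    """Expected value of a lottery paying `high` with probability q, else `low`."""
    return q * high + (1 - q) * low

# Probabilities of the high payoff in decisions 1-5 of the HL5
QS = [0.1, 0.4, 0.5, 0.6, 0.9]

# Payoffs in Naira, inferred from the expected values quoted in the text
EV_A = [expected_value(q, 100, 80) for q in QS]   # 82, 88, 90, 92, 98
EV_B = [expected_value(q, 200, 2) for q in QS]    # 21.8, 81.2, 101, 120.8, 180.2

# A risk-neutral subject picks the lottery with the larger expected value
RISK_NEUTRAL = ["A" if a > b else "B" for a, b in zip(EV_A, EV_B)]
print(RISK_NEUTRAL)  # ['A', 'A', 'B', 'B', 'B']
```

The computed sequence matches the risk-neutral prediction *A*, *A*, *B*, *B*, *B* stated above; shifting the switch point earlier or later maps onto the risk loving and risk averse profiles, respectively.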

### 2.1 Dimensions of Risk Preferences

In Studies 1, 2 and 3, we compare five dimensions of the elicitation of risk preferences under the three payment schemes. The first is *consistency*, *C* _{i}, that is, whether subjects make consistent choices. A typical example of inconsistency is multiple switching, from A to B and then from B to A; indeed, any switch from B to A is an inconsistency (see Amador-Hidalgo et al. (2021) for a discussion). The second dimension is the *number of safe choices*, or risk aversion, *RA* _{i} (A instead of B), under consistency. We also explore the number of safe choices for all subjects regardless of consistency, *RA* _{i}(*all*). As the fourth dimension we compute *response time* (*RT* _{i}), that is, the duration of the task; this dimension was measured only in Studies 2 and 3. Finally, we compute a Goodman and Kruskal’s γ for each subject as a measure of the consistency observed within the sequence of choices.^{9} Here we summarize our dimensions:

*C* _{i} Consistency: whether the subject makes *consistent* choices (=1) or not (=0).

*RA* _{i} Risk aversion: number of safe choices (A) for *consistent* subjects only, taking values from 0 to 5, where 5 refers to extremely risk averse.

*RA* _{i} (all) Risk aversion: number of safe choices (A) for *all* subjects, taking values from 0 to 5, where 5 refers to extremely risk averse.

*RT* _{i} Time: number of seconds spent by the subject to complete the task.

γ_{i} Consistency *per* subject: number of choice pairs which are consistent over the total number of possible pairs, taking values from 0 to 1, where 1 refers to perfect consistency.
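The paper does not spell out formulas for these dimensions; a minimal sketch, operationalizing the verbal definitions above under the assumption that any choice pair with B followed later by A counts as inconsistent, could look like this:

```python
from itertools import combinations

def consistent(choices):
    """C_i: 1 if the subject never switches back from B to A, reading down the list."""
    return int(all(not (c1 == "B" and c2 == "A")
                   for c1, c2 in zip(choices, choices[1:])))

def safe_choices(choices):
    """RA_i: number of safe (A) choices, from 0 to 5."""
    return choices.count("A")

def gamma(choices):
    """gamma_i: share of choice pairs (i < j) that are consistent,
    i.e. not a B followed later by an A. Ranges from 0 to 1."""
    pairs = list(combinations(choices, 2))
    ok = sum(1 for c1, c2 in pairs if not (c1 == "B" and c2 == "A"))
    return ok / len(pairs)

print(consistent(list("AABBB")), safe_choices(list("AABBB")), gamma(list("ABABB")))
```

For example, the sequence A, B, A, B, B has one inconsistent pair (the B in decision 2 followed by the A in decision 3) out of ten possible pairs, so *C* _{i}=0 but γ_{i}=0.9, illustrating how γ_{i} grades the severity of inconsistency rather than flagging it binarily.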

### 2.2 Experimental Arms and Implementation

Each of Studies 1, 2 and 3 has three experimental arms, to which subjects were assigned with probability 1/3. The main difference across arms is the probability *p* of receiving a payment:

For each study, the entire sample was randomly assigned to one of three treatment arms: *T* _{R} refers to real payments, *T* _{B} to payments made to 1 out of 10 subjects, and *T* _{H} to hypothetical payments. Under *T* _{R}, all subjects are certain of receiving a payment, whereas under *T* _{H} subjects are certain of receiving no payment. Subjects were fully aware of their payment scheme before the elicitation of their preferences.

Study 1 was based on a self-administered questionnaire for which subjects were invited to a computer lab, whereas Studies 2 and 3 were conducted in the field by enumerators trained by the authors. Using enumerators meant that subjects did not self-manage the instructions; these were read and explained by the enumerator. Enumerators used computer-assisted personal interviewing (CAPI) questionnaires in Nigeria and paper-based interviewing in Honduras. In both cases, enumerators received a list of households to visit, including the type of questionnaire to apply (i.e., the *T* _{R}, *T* _{B} or *T* _{H} questionnaire). The authors conducted the random allocation of treatments prior to the visits to the communities, and the enumerators had no influence on this assignment. As an additional check, a field coordinator monitored the correct use of the lists on the ground.

Prior to running both field experiments, the authors piloted the risk preference questionnaire with around 20 subjects to ensure that the translations into Hausa (Nigeria) and Spanish (Honduras) were appropriate to the context; all questionnaires and instructions were originally written in English. Data collection for both studies involved a one-day training of enumerators, which in all cases included a theoretical explanation of risk preferences and enumerator-subject role play to help enumerators understand the questions. For both field experiments, enumerators conducted all interviews face to face in the subjects’ households, with only one experimental subject interviewed per household.^{10}

## 3 Study 1 — Lab in Spain

This lab experiment was run in the School of Economics and Business at the University of Sevilla, the largest public university in Andalusia (Southern Spain). The experiment used paper-based questionnaires and was conducted during the last week of April 2019.

In most lab experiments, subjects are university students who self-select into the experiment and hold high levels of education (see Exadaktylos et al., 2013). The main advantage of lab experiments is the absolute control over the conditions faced by the subject: a) subjects cannot interact among themselves unless interaction is required by the experiment, and b) there are no external distractions. For these reasons, a lab experiment is the cleanest approach to test our main research question: *do monetary incentives make a difference in the elicitation of risk preferences?*

The HL5 used in the lab is identical to the multiple price list (MPL) shown in the example in Section 2; we simply adjusted the monetary values to be meaningful to Spanish subjects. Thus, for Study 1, lottery A offers 5 euros with probability *q* and 4 euros with probability (1−*q*); lottery B offers 10 and 0.1 euros with probabilities *q* and (1−*q*), respectively.

### 3.1 Sample and balance

The entire sample consisted of *n*=178 Spanish subjects.^{11} Each subject was randomly assigned to one of three payment schemes (*T* _{R}: 60, *T* _{B}: 57, *T* _{H}: 62). Table 2 shows the balance between sub-samples: there are no differences across treatments in age, gender, cognitive reflection test (CRT) scores or course year (freshman, sophomore, etc.).

Note: H refers to hypothetical, R to real payment and B to probabilistic payment with a 1 out of 10 chance of winning (BRIS).

### 3.2 Results

Table 3 shows the impact of payment schemes on the risk preference measure. The top of the table focuses on consistency (*C* _{i}), where *n* _{c} corresponds to the number of consistent subjects. The second row shows the number of safe choices (*RA* _{i}) under consistency, and the third, *RA* _{i} (*all*), the number of safe choices regardless of the consistency of the subjects’ choices. The last row focuses on the gamma consistency measure (γ_{i}).

Note: Columns 4 and 5 display LPM (*C* _{i}), negative binomial (*RA* _{i} & *RA* _{i} (*all*)) and OLS (γ_{i}) coefficients for each outcome variable (row) and for the Real and BRIS treatments (columns). Our reference group is the hypothetical treatment (*T* _{H}). In *RA* _{i} (*all*) we use all subjects, while in *RA* _{i} only the consistent ones.

Columns 1–3 show different statistics for each treatment. The percentage of subjects making consistent choices is shown in row 1; these percentages are not significantly different between treatments. Rows 2 and 3 show the mean and the standard deviation of the number of safe choices; again, we observe no significant differences.

Finally, columns 4 and 5 provide the regression coefficients of each treatment for the two types of outcome variables: consistency and risk aversion. For *C* _{i}, we estimate a linear probability model (LPM), for both *RA* _{i} measures a negative binomial regression model and for γ_{i} an OLS regression.^{12} *T* _{B} refers to the comparison between BRIS and hypothetical payoffs, whereas *T* _{R} refers to the comparison between real and hypothetical payoffs.

Column 5 in Table 3 shows that risk preferences measured under real payments do not differ from those under hypothetical payments. This holds for the number of consistent individuals (*C* _{i}), the level of risk aversion (*RA* _{i} and *RA* _{i} (*all*)) and the gamma consistency measure (γ_{i}).^{13}

The cumulative distributions of the number of safe choices by treatment shown in Figure 2 confirm the results on the level of risk aversion (*RA* _{i}) for treatment *T* _{R}, but not for *T* _{B}. While there is no difference between *T* _{H} and *T* _{R} (black and blue lines, respectively), the cumulative distribution of *T* _{B} lies above these two, which suggests that paying 1 out of 10 leads people to make fewer safe choices than the other two payment schemes. We performed Kolmogorov-Smirnov tests for equality of distributions between the hypothetical and each of the incentivized schemes. The results confirm that there are no statistical differences between the distributions of safe choices under the hypothetical and real treatments; however, there are differences for the BRIS treatment when compared with the hypothetical and real payment schemes (*T* _{H} vs. *T* _{R}, *p*=0.80; *T* _{H} vs. *T* _{B}, *p*=0.03; *T* _{R} vs. *T* _{B}, *p*=0.02). The same conclusion holds for *RA* _{i} (*all*). Since *T* _{B} involves two lotteries (being selected for payment and the HL itself), a possible explanation is that subjects consider the two lotteries to be correlated.^{14}

One particular feature highlighted by Camerer and Hogarth (1999) is higher variance in choices or in task performance when using hypothetical measures. To study this, we performed a series of variance ratio tests. In all of them, we do not reject the null hypothesis of equal variances across treatments: *RA* _{i} (*T* _{H} vs. *T* _{R}, *p*=0.18; *T* _{H} vs. *T* _{B}, *p*=0.41; *T* _{R} vs. *T* _{B}, *p*=0.13). The same conclusion holds for *RA* _{i}(*all*) (*T* _{H} vs. *T* _{R}, *p*=0.29; *T* _{H} vs. *T* _{B}, *p*=0.41; *T* _{R} vs. *T* _{B}, *p*=0.22).^{15}
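The study's raw data are not reproduced here, but the two test statistics used above are simple to compute. A minimal stdlib-only sketch (the sample vectors are hypothetical, not the study's data):

```python
import statistics

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic:
    the largest gap between the two empirical CDFs."""
    def ecdf(sample, v):
        return sum(1 for e in sample if e <= v) / len(sample)
    grid = sorted(set(x) | set(y))
    return max(abs(ecdf(x, v) - ecdf(y, v)) for v in grid)

def variance_ratio(x, y):
    """F statistic for the variance-ratio test: ratio of sample variances."""
    return statistics.variance(x) / statistics.variance(y)

# Hypothetical safe-choice counts (0-5) for two treatment arms
arm_h = [0, 1, 2, 2, 3, 3, 4, 5]
arm_b = [0, 0, 1, 1, 2, 2, 3, 4]
print(ks_statistic(arm_h, arm_b), variance_ratio(arm_h, arm_b))
```

In practice one would use a library routine (e.g. `scipy.stats.ks_2samp`) to obtain p-values; the sketch only shows the statistics themselves.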

To summarize our main findings for Study 1:

#### Result 1 — Lab (H vs. R):

*Compared to hypothetical payments in the lab: paying all the subjects has no impact on consistency or number of safe choices.*

#### Result 2 — Lab (H vs. B):

*Compared to hypothetical payments in the lab: paying 1/10 of the subjects has no impact on consistency, but decreases the level of risk aversion.*

Study 1 analyses risky decision-making in the lab. We address whether the payment scheme impacts the measurement of risk preferences.

Result 1 shows that paying all the subjects or none has no significant effect on consistency or the number of safe choices. This is a critical issue for field data collection, since samples (and therefore costs) are substantial when household visits are required.

Result 2, on the other hand, shows that BRIS payments may affect risk taking (although they do not alter consistency).

Overall, Study 1 served to test whether incentives are important (and apparently they are not). In the following two sections, we test Results 1 and 2 in the field — in Nigeria and Honduras — using the short version of the Holt-Laury task.

## 4 Study 2 — Field in Nigeria

This study was conducted in Kano (Nigeria) in April 2019. We ran the field experiment in three rural villages in Kano: Dorayi, Ja’en and Gidan Maharba, where 360 households were randomly selected. Our sample in each village was selected according to the eligibility criterion of having at least one child between 6 and 9 years old. We selected households with these characteristics as this experiment was used to inform the elicitation of risk preferences for a large-scale impact evaluation that took place in rural areas in Northern Nigeria.

The experiment was composed of four modules: social norms (coordination games), subjective expectations, time and risk preferences. The HL5 was the last module we elicited. Random assignment to *T* _{R}, *T* _{B} or *T* _{H} treatments remained the same throughout the entire experiment. Response time (*RT* _{i}) was collected using CAPI for all four modules.

To elicit risk preferences, we used the same HL5 as in the lab experiment, adapting the currency to Naira. On average, subjects earned 126 Naira (equivalent to 0.33 US dollars); see the example in Section 2.^{16}

### 4.1 Sample and balance

The sample size is *n*=360 (by treatment, *T* _{R}: 120, *T* _{B}: 124, *T* _{H}: 116). In this sample, 25% have no education, 8% have completed primary education and 32% have completed secondary education. Around 40% of the sample is female and the average age is 39 years. This sample is quite different from the lab sample, but it reflects common characteristics of the populations studied in development economics, which are usually the target of interventions aiming to increase savings and human capital investment.

Table 4 shows the balance across treatments. We observe significant differences in age: subjects in the real and BRIS treatments (*T* _{R} and *T* _{B}) are on average 2.8 years older than those in the hypothetical treatment (*T* _{H}). However, these differences are no longer significant after adjusting the p-values using Bonferroni corrections.

Note: H refers to hypothetical, R to real payment and B to probabilistic payment with a 1 out of 10 chance of winning (BRIS). Significance levels: ^{***} *p*<0.01 and ^{**} *p*<0.05. ^{+} Sufficient refers to having sufficient money to eat.

### 4.2 Results

Table 5 reports a summary of results. The top of the table shows consistency (*C* _{i}), where *n* _{c} and % correspond to the number and percentage of consistent subjects, respectively. The second and third rows focus on the number of safe choices, or risk aversion (*RA* _{i} and *RA* _{i} (*all*)), the fourth reports response time (*RT* _{i}) and the last row focuses on the gamma consistency measure (γ_{i}). The number of observations per treatment appears at the bottom of the table. Columns 1, 2 and 3 report consistency, the number of safe choices, response time and the gamma measure by experimental arm (*T* _{R}, *T* _{B} and *T* _{H}). The last columns show the regression coefficients of the outcome variables on the treatment variables *T* _{B} and *T* _{R}, with the hypothetical measure as the reference group.

Note: Columns 4 and 5 display LPM (*C* _{i}), negative binomial (*RA* _{i} & *RA* _{i} (*all*)) and OLS (*RT* _{i} in logs and γ_{i}) coefficients for each outcome variable (row) and for the Real and BRIS treatments (columns). Our reference group is the hypothetical treatment (*T* _{H}). The time variable is expressed in logs.

As shown in Table 5, the fraction of consistent individuals, the mean number of safe choices and the response time are nearly the same across treatments; none of the differences is significant. To test differences across treatments, as in Study 1, we ran an LPM for *C* _{i} and negative binomial regressions for *RA* _{i} and *RA* _{i} (*all*); for *RT* _{i} and γ_{i}, we ran OLS regressions. The last two columns show the results of these regressions: column (4) focuses on the BRIS treatment and column (5) on the real payment. The reference group in both cases is the hypothetical payment.

For *C* _{i} (first row), we do not find significant differences between treatments in the probability of making consistent choices. The estimated impact of paying 1/10 (vs. hypothetical payment) is not significant (*p*=0.227), and the same holds when comparing the real payment with the hypothetical one (*p*=0.957). Our results are robust to different model specifications; see the supplement.

For *RA* _{i} and *RA* _{i} (*all*) (second and third rows), we do not find significant differences in the number of safe choices either. The estimated coefficients reported in columns (4) and (5) are not significantly different from zero (*RA* _{i}: *p*=0.303, *p*=0.832; *RA* _{i} (*all*): *p*=0.501, *p*=0.990).^{17} As in Study 1, we analyzed the variance of *RA* _{i} across treatments and find no significant differences (*T* _{H} vs. *T* _{R}, *p*=0.29; *T* _{H} vs. *T* _{B}, *p*=0.48; *T* _{R} vs. *T* _{B}, *p*=0.31). The same conclusion holds when using all choices, *RA* _{i} (*all*) (*T* _{H} vs. *T* _{R}, *p*=0.23; *T* _{H} vs. *T* _{B}, *p*=0.38; *T* _{R} vs. *T* _{B}, *p*=0.33).

For *RT* _{i}, subjects making incentivized choices are 4 seconds faster than those making hypothetical choices, but the difference is not significant (the coefficient is reported after transforming back from logs). When comparing the hypothetical payment with paying 1/10, the mode of payment makes no difference: subjects take the same time. For the γ_{i} consistency measure (last row), we do not find any effect of *T* _{B} or *T* _{R} relative to the hypothetical measure (*p*=0.337 and *p*=0.747).^{18}

Finally, Figure 3 shows the cumulative distributions of the number of safe choices, *RA* _{i}, for the three treatments. The three lines are similar to each other and cross for some values of safe choices. Kolmogorov-Smirnov tests find no statistical differences between the distributions (*T* _{H} vs. *T* _{R}, *p*=0.99; *T* _{H} vs. *T* _{B}, *p*=0.43; *T* _{R} vs. *T* _{B}, *p*=0.99). The same conclusion holds for *RA* _{i} (*all*).

In this study, we find that the payment scheme has no impact on consistency, the number of safe choices or the response time. We also find no differences in the distributions of these outcomes across payment schemes.

To summarize our main findings for Study 2:

#### Result 3 — Nigeria (H vs. R):

*Compared to hypothetical payments in the field: paying all subjects has no impact on consistency, number of safe choices or response time.*

#### Result 4 — Nigeria (H vs. B):

*Compared to hypothetical payments in the field: Paying 1/10 of the subjects has no impact on consistency, number of safe choices or response time.*

Our findings in Study 2 show that the elicitation of risk preferences with monetary incentives in the field provides the same results as the elicitation without incentives.

## 5 Study 3 — Field in Honduras

We ran the same experiment in Honduras. Our aim with this sample was to test whether Results 3 and 4 replicate in a different location, having subjects with similar socioeconomic characteristics to the Nigerian sample, but living in peri-urban areas. Our goal responds to the increasing demand for replication and validation of experimental studies (Reference Banerjee, Chassang, Snowberg, Banerjee and DufloBanerjee et al., 2017, Reference Peters, Langbein and RobertsPeters et al., 2018, Reference Al-Ubaydli, List and SuskindAl-Ubaydli et al., 2017). Study 3 took place in Santa Rosa de Copán (Honduras) between May 1 and 14, 2019 in the districts of Osorio, El Carmen, Prado Alto and Santa Teresa. We selected 360 households based on the same eligibility criterion as the Nigerian sample: having a child between 6 and 9 years old.

As in Nigeria, the experiment consisted of four tasks (coordination, expectations, risk and time preferences). The main difference with the Nigerian experiment is that the risk preference task was the third one, instead of last one as in Study 2. The original random assignment of subjects to *T* _{R}, *T* _{B} or *T* _{H} remained the same for the entire experiment. *RT* _{i} was recorded by the enumerator for the entire block of risk preferences.

To elicit risk preferences, we used the same reduced HL task (HL5) as in Studies 1 and 2, adapted to the local currency (Lempiras). On average, subjects earned 18 Lempiras in the HL5. For the first set of choices, lottery A paid 50L or 40L (instead of 100N and 80N) and lottery B paid 100L or 1L (instead of 200N and 2N). We followed the same payment structure as described in Section 2.Footnote ^{19}

### 5.1 Sample and balance

The total sample consisted of 360 subjects, each randomly assigned to one of three arms, resulting in the following distribution: *T* _{R}: 109, *T* _{B}: 126, *T* _{H}: 125. Table 6 shows the balance across treatments. We found significant differences in age: subjects allocated to the real payment (*T* _{R}) were 3.3 years younger than those allocated to the hypothetical payment (*T* _{H}). However, this difference is not economically important (9% of the average age).

Note: H refers to hypothetical, R real payment and B probabilistic payment with 1 out of 10 chance of winning (BRIS). Inference was made using robust standard errors. Significance levels:

^{**} *p*<0.05.

^{+} Sufficient refers to sufficient money to eat.

### 5.2 Results

Table 7 summarizes the results. As in Study 2, the top of the table refers to consistency (*C* _{i}), the second and third rows to risk aversion (*RA* _{i} and *RA* _{i} (*all*)), the fourth to response time (*RT* _{i}) and the last row to the gamma consistency measure (γ_{i}). The number of observations per treatment appears at the bottom of the table. Columns 1, 2 and 3 respectively show the relevant values for *T* _{R}, *T* _{B} and *T* _{H}. We use LPM, negative binomial and OLS regressions, as in Study 1 and Study 2. Column 4 focuses on the BRIS payment and column 5 on the real payment; in both cases the reference group is the hypothetical payment. For consistency *C* _{i} (first row), neither *T* _{B} nor *T* _{R} is significant (*p*=0.605 and *p*=0.378, respectively). These results are robust to different specifications (see the supplement). Our payment schemes also have no impact on risk aversion. However, we found a marginally significant effect of both treatments on response time.
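The LPM estimates for consistency have a simple interpretation that a minimal sketch makes explicit: with a single treatment dummy, the LPM slope is exactly the difference in consistency rates between the two arms. The snippet below uses simulated data (all sample sizes and rates are hypothetical, not the study's).

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated consistency indicators (1 = consistent choice pattern)
# for two arms; illustrative values only, not the study's data.
c_hyp = rng.binomial(1, 0.85, size=125)   # hypothetical arm (reference)
c_real = rng.binomial(1, 0.83, size=109)  # real-payment arm

# Linear probability model: regress C_i on an intercept and a
# real-payment dummy via least squares.
y = np.concatenate([c_hyp, c_real]).astype(float)
d = np.concatenate([np.zeros(len(c_hyp)), np.ones(len(c_real))])
X = np.column_stack([np.ones_like(d), d])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# beta[1] equals the raw difference in consistency rates between arms.
print(beta[1], c_real.mean() - c_hyp.mean())
```

Inference in the paper additionally uses robust standard errors, which this sketch omits.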

Note: Column 4 and 5 display LPM (*C* _{i}), negative binomial (*RA* _{i} & *RA* _{i} (*all*)) and OLS coefficients (*RT* _{i} in logs and γ_{i}) for each outcome variable (row) and for the Real and BRIS treatments (column). Our reference group is the hypothetical treatment (*T* _{H}). The time variable is expressed in logs.

As before, Figure 4 shows the cumulative distributions for the three treatments and confirms the results found on the level of risk aversion (*RA* _{i}). While there is no difference between the *T* _{R} and *T* _{B} lines, the *T* _{H} line lies below the other two. However, a Kolmogorov-Smirnov test found no significant differences between treatments in their cumulative distributions (*T* _{H} vs. *T* _{R}, *p*=0.80; *T* _{H} vs. *T* _{B}, *p*=0.99; *T* _{R} vs. *T* _{B}, *p*=0.98), and the same conclusion holds for *RA* _{i} (*all*). As in Study 1 and Study 2, our results suggest no difference in variance for *RA* _{i} (*T* _{H} vs. *T* _{R}, *p*=0.42; *T* _{H} vs. *T* _{B}, *p*=0.37; *T* _{R} vs. *T* _{B}, *p*=0.45) or *RA* _{i} (*all*) (*T* _{H} vs. *T* _{R}, *p*=0.48; *T* _{H} vs. *T* _{B}, *p*=0.47; *T* _{R} vs. *T* _{B}, *p*=0.48).

For *RT* _{i}, we find a positive and marginally significant effect of *T* _{B} and *T* _{R} on response time (*p*=0.08 and *p*=0.05). These results suggest that hypothetical measures might be quicker to implement (or that subjects pay less attention to them) than incentivized ones. Finally, for the γ_{i} consistency measure, we do not find significant differences between the incentivized measures (*T* _{B} and *T* _{R}) and the hypothetical one (*p*=0.724 and *p*=0.585, respectively). As before, these results are robust when using the Kendall correlation between gamma and the categorical variable that distinguishes between treatments (*p*=0.46).

Our main findings from Study 3 show that the payment scheme has no impact on consistency or the number of safe options, and that there are no differences in the distributions of either variable. However, our results show that paying all subjects or 1 out of 10 subjects increases the response time by 20 and 15 seconds, respectively.

To summarize our main findings for Study 3:

#### Result 5 — Honduras (H vs. R):

*Result 3 is partially replicated in Honduras: Compared to hypothetical payments in the field, paying all subjects has no impact on consistency or number of safe choices. R (weakly) increases response time.*

#### Result 6 — Honduras (H vs. B):

*Result 4 is partially replicated in Honduras: Compared to hypothetical payments in the field, paying 1 out of 10 subjects has no impact on consistency or number of safe choices. B (weakly) increases response time.*

Our findings from Study 3 reinforce our conclusion from Study 2: the elicitation of risk preferences with monetary incentives in the field yields the same results as the hypothetical scheme. When comparing our field results with previous studies, the percentage of risk averse subjects in Nigeria and Honduras, 65.7% and 29.8% respectively, falls within the range observed in other countries: 40% of risk averse subjects in Senegal (Reference Charness and ViceiszaCharnessViceisza, 2016) and 85% in India (Reference Chakravarty, Harrison, Haruvy and RutströmChakravarty et al., 2011). In the lab, our study finds that 74.3% of our subjects are risk averse, similar to other lab experiments, which range from 64% to 84% (Reference Charness, Eckel, Gneezy and KajackaiteCharness et al., 2018, Reference TaylorTaylor, 2013).

To test whether alternative specifications provide different results, we estimated a Random Utility ModelFootnote ^{20} on both the Honduras and Nigeria data. Using the whole sample (or even restricting our analysis to consistent subjects only), our findings remain the same. We also applied this method to the lab data with identical results.

Our findings in Nigeria and Honduras are important for two reasons. *First*, existing hypothetical measures collected in the field are indeed informative. *Second*, future measurements of risk preferences in the field do not need to use monetary incentives to get accurate proxies of risk preferences.

## 6 Equivalence Tests

In this section, we test whether our estimates differ across treatments by comparing them against a range of values rather than a single point estimate.

A possible alternative is to explore whether the observed effect is large enough to be deemed worthwhile. This procedure is called *equivalence* testing (Reference LakensLakens, 2017, Reference WellekWellek, 2010). It tests whether the observed effect falls within or outside an equivalence interval defined by two exogenous bounds: lower (−γ_{L}) and upper (γ_{U}). To test for equivalence, a two one-sided tests (TOST) approach is applied. Two composite null hypotheses are tested: *H* _{01}: γ≤−γ_{L} and *H* _{02}: γ≥γ_{U}. When both null hypotheses are rejected, we can conclude that −γ_{L}<γ<γ_{U}, that is, the observed effect falls within the equivalence bounds and is *close enough to zero to be practically equivalent* (Reference LakensLakens, 2017).

Our objective and exogenous bounds are based on (Reference Holt and LauryHoltLaury, 2005), who found a difference of one in the average number of safe choices between real and hypothetical incentives. Hence, our equivalence level is equal to one, and we define the equivalence interval for each TOST as *H* _{01}: γ≤−1 and *H* _{02}: γ≥1. To check the robustness of our results, we also use two additional equivalence levels: 0.50 and 0.75.
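The TOST procedure described above can be sketched for a difference in mean safe choices. This is a minimal sketch, assuming independent samples and a pooled degrees-of-freedom t approximation (a common simplification); the data are simulated (not the study's), and the ±1 bounds mirror the equivalence level defined above.

```python
import numpy as np
from scipy import stats

def tost(x, y, low, high):
    """Two one-sided tests (TOST) for the difference in means of x and y.

    H01: mean(x) - mean(y) <= low;  H02: mean(x) - mean(y) >= high.
    Equivalence is concluded when both are rejected; the reported
    p-value is the larger of the two one-sided p-values.
    """
    diff = np.mean(x) - np.mean(y)
    se = np.sqrt(np.var(x, ddof=1) / len(x) + np.var(y, ddof=1) / len(y))
    df = len(x) + len(y) - 2  # pooled df, a simplifying assumption
    p_low = stats.t.sf((diff - low) / se, df)     # reject H01 if small
    p_high = stats.t.cdf((diff - high) / se, df)  # reject H02 if small
    return diff, max(p_low, p_high)

rng = np.random.default_rng(2)
# Simulated safe-choice counts for two arms; illustrative only.
safe_h = rng.integers(0, 6, size=125).astype(float)
safe_r = rng.integers(0, 6, size=109).astype(float)

diff, p = tost(safe_h, safe_r, low=-1.0, high=1.0)
print(f"diff = {diff:.3f}, TOST p-value = {p:.3f}")
```

A small TOST p-value (both one-sided nulls rejected) supports equivalence within the ±1 bounds; the 0.50 and 0.75 robustness levels simply shrink `low` and `high`.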

For a thorough analysis of our main findings, we use both the null hypothesis significance test (NHST) and the equivalence test (ET).Footnote ^{21} Table 8 summarizes the results of the equivalence tests. Panel A shows that in the lab, paying 1 out of 10 subjects yields *relevant* differences with respect to hypothetical decisions, while using real payments yields *equivalent* results. In both field studies (panels B and C), equivalence tests suggest that both treatments (real and BRIS) yield results equivalent to the hypothetical payment scheme.

Note: P-value refers to the maximum p-value obtained from the two one-sided tests. Conclus. shows the conclusion considering both tests (NHST and ET). The baseline is the group making decisions with hypothetical money (*T* _{H}).

To summarize the equivalence tests results:

### Result 7 — Equivalence (H vs. R):

*Hypothetical and real measures yield equivalent results in both the lab and the field.*

### Result 8 — Equivalence (H vs. B):

*BRIS and hypothetical payment mechanisms yield equivalent results* **only** *in the field.*

## 7 Conclusion

This paper studies the impact of monetary incentives in the elicitation of risk preferences using (Reference Holt and LauryHoltLaury, 2002). We report on a lab and two field experiments with different subject pools. Our lab experiment (Study 1 — Spain) took place in a developed country using a pool of undergraduate students, the population commonly used in lab experiments. The second and third experiments (Studies 2 and 3 — Nigeria and Honduras) were implemented in rural villages of a middle-income country and in peri-urban towns of a low-income country, in both cases using a subject pool with similar characteristics: parents of children between 6 and 9 years old living in deprived areas.

Our study answers a simple question: *Can we elicit risk preferences in the field without using monetary incentives to pay subjects?* We find that it is possible. Lab experimentalists, who have widely used monetary incentives when eliciting such measures, can trust the measures that their field counterparts collect, in most cases, using hypothetical payments. Field experimentalists can now trust that these measures are consistent across payment schemes (i.e., real, probabilistic and hypothetical), and that elicitation can be done faster (and therefore cheaper) by using a short version of the Holt-Laury approach.

Our concrete findings are summarized here. Using a short version of (Reference Holt and LauryHoltLaury, 2002) in the lab and the field produces the same risk preference profiles (in terms of consistency and safe choices) regardless of the payment scheme (hypothetical or real payment). In particular:

When comparing the HL5 measures, hypothetical (no payment) and real payment schemes generate the same consistency and risk levels, that is, they yield equivalent measures.

When comparing the HL5 measures, the hypothetical (no payment) and BRIS schemes generate the same consistency and risk levels in the field, while in the lab the measurement of risk preferences displays higher levels of risk-loving attitudes under the probabilistic scheme.

Empirical researchers interested in eliciting risk preferences at a low cost, but also concerned about the potential negative consequences of using lotteries in the field, may want to consider our reduced version of the hypothetical HL to measure risk preferences without incentives. This approach is faster (and cheaper) and does not create asymmetries among subjects. Field experimentalists who aim not only at reducing data collection costs but also at minimising feelings of unfairness among experimental subjects should consider hypothetical payment schemes to minimise potential frictions between subjects and researchers, or between winners and losers.