“Economic man” in cross-cultural perspective: Behavioral experiments in 15 small-scale societies
Published online by Cambridge University Press: 22 December 2005
Researchers from across the social sciences have found consistent deviations from the predictions of the canonical model of self-interest in hundreds of experiments from around the world. This research, however, cannot determine whether the uniformity results from universal patterns of human behavior or from the limited cultural variation available among the university students used in virtually all prior experimental work. To address this, we undertook a cross-cultural study of behavior in ultimatum, public goods, and dictator games in a range of small-scale societies exhibiting a wide variety of economic and cultural conditions. We found, first, that the canonical model – based on self-interest – fails in all of the societies studied. Second, our data reveal substantially more behavioral variability across social groups than has been found in previous research. Third, group-level differences in economic organization and the structure of social interactions explain a substantial portion of the behavioral variation across societies: the higher the degree of market integration and the higher the payoffs to cooperation in everyday life, the greater the level of prosociality expressed in experimental games. Fourth, the available individual-level economic and demographic variables do not consistently explain game behavior, either within or across groups. Fifth, in many cases experimental play appears to reflect the common interactional patterns of everyday life.
- Research Article
- Behavioral and Brain Sciences, Volume 28, Issue 6, December 2005, pp. 795-815
- Copyright © Cambridge University Press 2005
1. We extend this axiom to cover cases in which individuals maximize the expected utility of their material gains to address the question of risk aversion, but use this simpler formulation otherwise.
2. Most of this group-level variation is not likely to be explained by differences in sample size between our efforts and those of laboratory experimentalists. First, our experiments mostly used sample sizes on a par with, or larger than, university-based experiments. The robust UG pattern that motivated us is based on numerous samples of 25 to 30 pairs. For example, Roth et al.'s (1991) four-country study used samples of 27, 29, 30, and 30 pairs. Comparably, the Machiguenga, Hadza, Mapuche, and Tsimane studies used 21, 55, 34, and 70 pairs. Overall, our mean sample size was 38, compared to 29 for Roth et al. Second, the regressions on UG offer shown below explain a substantial portion of the between-group variation (which is unlikely to arise via sample variation). Third, we compared this standard regression to a weighted regression (using 1/√n as the weight) and found little difference in the results, which shows that the sample size variation is likely not having important effects. Fourth, we regressed sample size on the groups’ deviations from the overall mean (across groups) and found no significant relationship (p = 0.41).
3. The two-dimensional intervals were calculated using the following procedure: For a sample of n data points, we created a randomized “bootstrap” sample by sampling n times from the offer distribution with replacement. For each randomly sampled offer, we randomly sampled a rejection (e.g., if we sampled an offer of 40%, and two out of three 40% offers were rejected, we sampled a rejection with probability 2/3 and an acceptance otherwise). This yielded a single “pseudosample” of n offers and an associated rejection profile of zeroes or ones for each offer. We then used the rejection profile to estimate an IMO (explained in the Appendix of Henrich et al. 2004). This single resampling produced a mean offer and IMO. This procedure was repeated 1,000 times. Each repetition generated a mean offer/IMO pair. The two-dimensional intervals drew an ellipse around the 900 pseudosamples (out of the 1,000) that were closest to the mean; that is, the smallest circle that included all 900 pseudosampled [mean offer, IMO] pairs. Small samples generate large confidence intervals because the mean of a pseudosample of n draws, made with replacement, can be quite different from the mean of the actual sample.
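The resampling procedure in note 3 can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' code. All function names are hypothetical. `simple_imo` is only a stand-in for the IMO estimator described in the Appendix of Henrich et al. (2004): it returns the offer level with the highest empirical expected proposer income. The retained pairs are chosen by plain Euclidean distance from the centroid rather than by fitting an ellipse.

```python
import random
from statistics import mean


def simple_imo(offers, rejected):
    """Hypothetical stand-in for the IMO estimator: the observed offer
    level that maximizes (1 - offer) * empirical acceptance rate."""
    accepted = {}
    for offer, rej in zip(offers, rejected):
        accepted.setdefault(offer, []).append(1 - rej)
    return max(sorted(accepted), key=lambda o: (1 - o) * mean(accepted[o]))


def bootstrap_pairs(offers, rejected, reps=1000, seed=0):
    """Return `reps` (mean offer, IMO) pairs.

    offers   -- offers as fractions of the stake, one per proposer
    rejected -- parallel list of 0/1 flags (1 = offer was rejected)
    """
    rng = random.Random(seed)
    n = len(offers)

    # Empirical rejection rate at each observed offer level.
    outcomes = {}
    for offer, rej in zip(offers, rejected):
        outcomes.setdefault(offer, []).append(rej)
    rejection_rate = {o: mean(r) for o, r in outcomes.items()}

    pairs = []
    for _ in range(reps):
        # Draw n offers with replacement, then draw a rejection for each
        # one with the rejection rate observed at that offer level.
        sample = [rng.choice(offers) for _ in range(n)]
        profile = [1 if rng.random() < rejection_rate[o] else 0 for o in sample]
        pairs.append((mean(sample), simple_imo(sample, profile)))
    return pairs


def closest_fraction(pairs, keep=0.9):
    """Keep the share of pairs closest to the centroid (simplified:
    Euclidean distance, not a fitted ellipse)."""
    cx = mean(p[0] for p in pairs)
    cy = mean(p[1] for p in pairs)
    ranked = sorted(pairs, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    return ranked[: round(len(pairs) * keep)]
```

With `reps=1000` and `keep=0.9`, `closest_fraction` returns the 900 pairs that the interval would enclose.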
4. A simple measure of our confidence that the average offer is above the estimated IMO is the percentage of resampled points that lie below the 45-degree unity line (this is an exact numerical measure of “how much” of the ellipse crosses to the right of and below the 45-degree line). These percentages are 13.7% (Pittsburgh), 0.0% (Achuar), 0.0% (Shona), 58.9% (Sangu farmers), 0.0% (Sangu herders), 1.5% (Mapuche), 1.2% (Machiguenga), 25.5% (Hadza), and 0.0% (Orma). (These figures do not match up perfectly with the visual impression from Figures 4a and 4b because the ellipses enclose the tightest cluster of 900 points, so the portion of an ellipse that overlaps the line may actually contain no simulated observations, or may contain a higher density of simulated observations across the 45-degree line.) Note that the only group for which this percentage is above half is the Sangu farmers. Even the Pittsburgh (student) offers, which are widely interpreted as consistent with expected income maximization (i.e., average offers are around the IMO; see Roth et al. 1991), are shown to be too high to be consistent with expected income maximization.
The ellipses are flat and elongated because we are much less confident about the true IMOs in each group than we are about the mean offers. This is a reflection of the fact that small statistical changes in the rejections lead to large differences in our estimates of the IMOs. Since rejections may be the tail that wags the dog of proposer offers, our low confidence in what the true IMOs are is a reminder that better methods are needed for measuring what people are likely to reject. The second phase of our project addresses this directly.
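The percentage reported in note 4 is a simple count over the resampled pairs. The sketch below assumes that the IMO is plotted on the horizontal axis and the mean offer on the vertical axis, so that "below the 45-degree line" means a mean offer smaller than the IMO; that axis convention, like the function name, is an assumption made for illustration.

```python
def share_below_unity_line(pairs):
    """Fraction of resampled (mean_offer, imo) pairs with mean_offer < imo.

    Assumes the IMO is on the horizontal axis and the mean offer on the
    vertical axis (an assumption; the figures are not reproduced here).
    """
    pairs = list(pairs)
    below = sum(1 for mean_offer, imo in pairs if mean_offer < imo)
    return below / len(pairs)


# Made-up pairs for illustration only: two of the four lie below the line.
toy_pairs = [(0.4, 0.5), (0.5, 0.3), (0.3, 0.3), (0.2, 0.4)]
toy_share = share_below_unity_line(toy_pairs)
```

A small share indicates that nearly all resampled mean offers exceed the corresponding IMO.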
5. An individual for whom ρ < 1 is risk averse, ρ = 1 is risk neutral, and ρ > 1 is risk preferring. We calculated the values of ρ for which the observed mean offer maximized the expected utility of the proposers, where the expectation is taken over all possible offers and the estimated likelihood of their being rejected. See the Appendix of Henrich et al. (2004) for details on this calculation.
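The calculation in note 5 can be illustrated with a short sketch. It assumes a power utility function u(x) = x^ρ, which matches the stated interpretation of ρ but is not confirmed here as the authors' functional form; the actual procedure is in the Appendix of Henrich et al. (2004). The function names and the acceptance probabilities are hypothetical and are not data from the study.

```python
def best_offer(accept_prob, rho, pie=1.0):
    """Offer that maximizes expected utility under u(x) = x ** rho.

    accept_prob -- dict mapping offer (fraction of the pie) to the
                   estimated probability that the offer is accepted
    rho         -- risk parameter (<1 averse, 1 neutral, >1 preferring)
    """
    def expected_utility(offer):
        # The proposer keeps pie * (1 - offer) if accepted, nothing otherwise.
        return accept_prob[offer] * (pie * (1 - offer)) ** rho

    return max(sorted(accept_prob), key=expected_utility)


def rhos_rationalizing(accept_prob, observed_offer, grid):
    """Values of rho on the grid for which the observed offer is optimal."""
    return [rho for rho in grid if best_offer(accept_prob, rho) == observed_offer]


# Illustrative acceptance curve (made up for this example).
toy_accept = {0.1: 0.3, 0.2: 0.5, 0.3: 0.8, 0.4: 0.95, 0.5: 1.0}
toy_rhos = rhos_rationalizing(toy_accept, 0.4, [0.1, 0.5, 1.0, 2.0])
```

In this toy example a strongly risk-averse proposer (ρ = 0.1) prefers the safe 50% offer, while a risk-preferring one (ρ = 2) drops to 30%. Only the intermediate grid values make an observed offer of 40% optimal.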
6. Because the numbers of rejections were small, some of our estimates of risk aversion are imprecise. Accordingly, one concern is that more reasonable estimates of risk aversion might fit the data nearly as well as the best fit. To test for this possibility, we computed the difference between the best-fit value of r and 0.81, the value estimated by Tversky and Kahneman (1992) from laboratory data on risky decision making. The differences were small for some data sets and quite large for others. In addition, there is a positive but non-significant correlation between the deviation of observed behavior from the IMO and this measure of the precision of the r estimate. Therefore, it seems unlikely that risk aversion is an important explanation of our observations.
7. Among nonstudent adults in industrialized societies, DG offers are higher, with means between 40 and 50%, and modes at 50% (Carpenter et al. 2005; Henrich & Henrich in press, Ch. 8).
8. Since completing this project, our research team has decided to avoid any use of deception in future work. We also hope to set this as the standard for experimental work in anthropology.
9. Of course, some variations might matter a lot in some places but not in others. This kind of culture-method interaction is in itself an important kind of cultural variation.
10. It is important to distinguish between classes of games in assessing the impact of methodological variables. Many of the largest effects of methodological and contextual variables have been observed in dictator games (DGs) rather than in ultimatum games (UGs) (e.g., Camerer 2003, Ch. 2; Hoffman et al. 1998). This is not surprising since the DG is a “weak situation.” Absent a strong social norm or strategic forces constraining how much to give, methodological and contextual variables have a fighting chance to have a large impact. In contrast, UG offers are strategically constrained by the possibility of rejection; that is, a wide range of rejection frequency curves will lead to a narrow range of optimal offers. As a result, we should expect less empirical variation in UGs than in DGs. Therefore, one cannot simply say “context matters a lot” without referring to specific games.
11. Relative wealth was measured by the in-group percentile ranking of each individual, with the measure of individual wealth varying among groups: for the Orma and Mapuche we used the total cash value of livestock, while among the Au, Gnau, and Machiguenga we used total cash cropping land. In the UG, estimates of relative wealth were available only for seven groups.
12. The original MacArthur-funded proposal is available at http://www.hss.caltech.edu/roots-of-sociality/phase-i/.
13. Abigail Barr suggested this procedure.
14. Three exercises were performed to test robustness. First, because the sample sizes vary across groups by a factor of almost 10, it is possible that the results are disproportionately influenced by groups with small samples. To correct for this, we ran weighted least squares in which observations were weighted by 1/√n. This gives univariate standardized coefficients of 0.61 (t = 3.80, p < 0.01) for PC and 0.41 (t = 2.28, p < 0.05) for MI, close to those from ordinary least squares in Table 5. Second, we reran the (univariate) regressions, switching every pair of adjacent expressed ranks in the variables PC and MI, one pair at a time. For example, the societies ranked 1 and 2 were artificially re-ranked 2 and 1, respectively, then the regression was re-estimated using the switched ranks. This comparison tells us how misleading our conclusions would be if the ranks were really 2 and 1 but were mistakenly switched. For PC, this procedure gave standardized univariate values of β_PC ranging from 0.53 to 0.66, with t-statistics from 3.0 to 4.5 (all p < 0.01). For MI, the corresponding estimates range from 0.37 to 0.45, with t-statistics from 2.0 to 2.6 (all p < 0.05, one-tailed). These results mean that even if small mistakes were made in ranking groups on PC and MI, the same results are derived as if the mistakes had not been made. The third robustness check added quadratic and cubic terms (e.g., MI² and MI³). This is an omnibus check for a misspecification in which the ordered ranks are mistakenly entered linearly, but identical numerical differences in ranks actually have larger and smaller effects (e.g., the difference between the impacts of rank 1 and rank 2 may be smaller than between 9 and 10, which can be captured by a quadratic function of the rank). The quadratic and cubic terms actually lower the adjusted R² dramatically for MI, and increase it only slightly (from 0.60 to 0.63) for PC, which indicates that squared and cubic terms add no predictive power.
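The first two robustness exercises in note 14 can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, ranks are assumed to be distinct integers 1..K, and the slope returned is an unstandardized univariate coefficient, whereas the note reports standardized coefficients and t-statistics.

```python
from math import sqrt


def wls_slope(x, y, weights):
    """Slope of a univariate weighted least-squares regression of y on x."""
    total = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total
    sxy = sum(w * (xi - mean_x) * (yi - mean_y) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    return sxy / sxx


def adjacent_swap_slopes(ranks, y, weights):
    """Re-estimate the slope after swapping each adjacent pair of ranks,
    one pair at a time (ranks k and k + 1 trade places)."""
    slopes = []
    for k in range(1, max(ranks)):
        swapped = [k + 1 if r == k else k if r == k + 1 else r for r in ranks]
        slopes.append(wls_slope(swapped, y, weights))
    return slopes


def inverse_sqrt_weights(sample_sizes):
    """Weights of 1/sqrt(n), following the weighting stated in the note."""
    return [1 / sqrt(n) for n in sample_sizes]
```

If the slopes returned by `adjacent_swap_slopes` stay close to the slope estimated from the original ranks, small ranking mistakes would not change the conclusion, which is the logic of the second exercise.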
15. This is true even for situations of n-person cooperation, if punishing strategies also exist (Boyd & Richerson 1992; Henrich & Boyd 2001).
16. Hoffman et al. (1994) reported similar effects of “social distance” and construal in the UG; for example, players offer less (and appear to accept less) when bargaining is described as a seller naming a take-it-or-leave-it price to a buyer, rather than as a simple sharing of money.
17. It is a common misconception that decision-making models rooted in the preferences, beliefs, and constraints approach are inconsistent with notions of evolved modularity and domain-specificity. Such models, however, are mute on this debate, and merely provide a tractable approach for describing how situational (e.g., payoff) information is integrated with coevolved motivations. This implies nothing about the cognitive architecture that infers, formulates, and/or biases beliefs and preferences, nor about what kinds of situations activate which human motivations. It is our view that the science of human behavior needs both proximate models that integrate and weight motivations and beliefs, and rich cognitive theories about how information is prioritized and processed.