
Quantifying Bias from Measurable and Unmeasurable Confounders Across Three Domains of Individual Determinants of Political Preferences

Published online by Cambridge University Press:  22 February 2022

Rafael Ahlskog*
Department of Government, Uppsala University, Box 514, 75120 Uppsala, Sweden
Sven Oskarsson
Department of Government, Uppsala University, Box 514, 75120 Uppsala, Sweden
Corresponding author Rafael Ahlskog


A core part of political research is to identify how political preferences are shaped. The nature of these questions is such that robust causal identification is often difficult to achieve, and we are often left with observational methods that we know have limited causal validity. The purpose of this paper is to measure the magnitude of bias stemming from both measurable and unmeasurable confounders across three broad domains of individual determinants of political preferences: socio-economic factors, moral values, and psychological constructs. We leverage a unique combination of rich Swedish registry data for a large sample of identical twins, with a comprehensive battery of 34 political preference measures, and build a meta-analytical model comparing our most conservative observational (naive) estimates with discordant twin estimates. This allows us to infer the amount of bias from unobserved genetic and shared environmental factors that remains in the naive models for our predictors, while avoiding precision issues common in family-based designs. The results are sobering: in most cases, substantial bias remains in naive models. A rough heuristic is that about half of the effect size even in conservative observational estimates is composed of confounding.

This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence, which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
© The Author(s) 2022. Published by Cambridge University Press on behalf of the Society for Political Methodology

1 Introduction

The last decades have seen a steady rise, across the social sciences, in the interest in methods for robust causal inference (Angrist and Pischke 2010; Clark and Golder 2015). This movement, sometimes referred to as the causal identification revolution (Huber 2013), has been spurred by the growing realization that conventional observational methods, utilizing various statistical adjustments for possible confounding factors, often fall short of identifying credible causal effects. An explosion in the use of alternative observational designs, such as instrumental variables, regression discontinuities, or natural experiments, as well as actual experimental designs both in the lab and in the field (Druckman et al. 2006), has followed.

A core part of public opinion research, and arguably of political science as a discipline, is to identify how political preferences (such as attitudes about taxation, redistribution, family planning, foreign and environmental policy, etc.) are shaped by things like economic factors, social context, personality traits, education and skills. Scholarly work ranging from classical political economy and sociology (Lipset 1960; Marx 1977), via overarching paradigms like rational choice theory or early life socialization (Jennings and Niemi 1968; 1981) to modern work on the psychological “endophenotypes” underpinning our politics (Oskarsson et al. 2015; Smith et al. 2011) has argued about various ways in which individual circumstances and traits can affect placement on ideological issues.

The nature of these questions, however, is such that robust causal identification is often difficult to achieve: experimental interventions into the most important determinants of political preferences are often either unethical, impossible, or prohibitively expensive to implement,Footnote 1 while credible instruments and discontinuities for many of these factors are rare. Where such designs are viable, they may also be of limited value for the substantive questions we are after. As has been argued elsewhere (e.g., Deaton 2009), we often end up saying something credible about the effect among a very narrow group of people (e.g., “compliers” in the case of experiments or IV estimation, or the specific set of people around a regression discontinuity (RDD) threshold) and very little about anyone else, in effect trading external for internal validity (see McDermott 2012 for an overview). Using individual fixed effects estimation with panel data also excludes all predictors that are stable over time. For a fairly large set of important research questions regarding political preference formation, it therefore seems like we are stuck with observational methods whose validity we now know is limited. This raises a crucial question: just how much bias should we expect to find in such estimates? In other words, how wrong will our best guess be?

The purpose of this paper is to attempt to measure the magnitude of bias stemming from both measurable and unmeasurable confounders for three broad and well-established domains of individual determinants of political preferences: socioeconomic factors, moral values, and psychological constructs. To accomplish this, we leverage a unique combination of rich Swedish registry data and a large sample of identical twins, with a comprehensive survey battery of political preference measures. Departing from the registry sources, we attempt to construct the best possible, conservative observational (naive) models, incorporating not only individual, but also family and contextual level controls. We then contrast this to a discordant twin design which also factors out unmeasured genetic and shared familial factors. Doing this across the full range of political preference measures allows us to meta-analyze the average effect size for each model and independent variable. The differences between the naive and discordant twin models can then be used to infer the average degree of confounding stemming from genetic and shared familial factors in the naive models, for each predictor separately. This meta-analytical procedure solves a number of precision problems associated with family-based designs.

The results are sobering: for a large set of important determinants, a substantial bias seems to remain even in conservative naive models. In a majority of cases, half or more of the naive effect size appears to be composed of confounding, and in no cases are the naive effect sizes underestimated. The implications of this are important. First of all, it provides a reasonable bound on effect estimates stemming from observational methods without similar adjustments for unobserved confounders. While the degree of bias will vary depending on both predictors and outcomes, a rough but useful heuristic derived from the results of this paper is that effect sizes are often about half as big as they appear. Second, future research will have to consider more carefully the confounding effects of genetic factors and elements of the rearing environment that are not easily captured and controlled for.

2 Background

Statistical controls often go a long way in removing spurious, or noncausal, covariation between two variables of interest. However, the degree to which it is possible to remove all bias in this way is crucially dependent on whether or not one can actually measure and correctly specify the variables causing this spurious covariation.

As a salient example, a growing number of studies have documented that, just like other human traits (Polderman et al. 2015), individual variation in political behavior is also to some degree influenced by genetics (Alford, Funk, and Hibbing 2005; Hatemi et al. 2014). This raises the spectre of genetic confounding: traits might be correlated because they are influenced by the same genetic architecture.

Controlling for genetic factors requires genetically informative data. This might mean actual genetic sequencing data. However, modern genomic research tells us that complex social phenotypes are influenced by a very large number, possibly millions, of genetic variants (Chabris et al. 2015), making required sample sizes for analyses controlling for all appropriate genetic variants, if they were known, impossibly large. Moreover, aggregative methods (like adding up all previously identified genetic variants in a so-called polygenic index) have not yet reached a predictive capacity that matches the known magnitude of genetic influences, and will therefore remove only part of the genetic confounding (Young 2019).

Similarly, there might be a number of environmental variables shared in families that are difficult to measure accurately and therefore difficult to control for (parenting style, culture, etc). There is no lack of potential confounders that we can think of, but perhaps we should worry most about the things we cannot think of.

Arguably the most powerful way of controlling for both genetic and other familial factors simultaneously is to use known family relationships to partial out these influences. Specifically, the existence of identical twins gives us access to a type of natural experiment that allows us to completely rule out genetic effects as well as family environment. This is often called the “discordant MZ-twin” model (Vitaro, Brendgen, and Arseneault 2009), and boils down to comparing individuals within identical twin pairs: if the twin with, say, higher education also prefers more stringent environmental policies, this association at least cannot be attributed to the confounding effect of genetics or shared family environment. This approach differs from traditional twin methods in behavior genetics (like variance decomposition) in that it does not seek to map the extent of genetic influence, but instead attempts to find causal relationships between environmental variables free from familial confounding.

Our aim is to quantify the degree of bias both captured by, and remaining in, well-specified observational models of political preference formation. To accomplish this, we will contrast meta-analyzed results for naive models using a comprehensive and conservative set of statistical controls to the results from discordant twin models, for a wide range of political preference measures.

We have tried to cover three general and well-established domains of predictors. The first domain is socioeconomic factors; here, the predictors education, income, and wealth are included. The idea that these types of factors are important for political preference formation is arguably as old as political economy itself. The connection between an individual’s level of education and their politics is well established, with results tending to show higher education associated with more liberal or left-leaning preferences (Dunn 2011; Weakliem 2002, although see Marshall 2016). Similarly, the idea that wealth and income are important determinants of political preferences is central to both the patrimonial voting literature (Lewis-Beck, Nadeau, and Foucault 2013; Quinlan and Okolikj 2019; Ahlskog and Brännlund 2021) and public choice theory more broadly (e.g., Meltzer and Richard 1981).

The second domain is moral and social attitudes. In this domain, we have included social trust, altruism and antisocial attitudes, and utilitarian judgment. Social trust, the tendency to think people in general can be trusted (Van Lange 2015), has been linked to a variety of political preferences, such as support for right-wing populists (Berning and Ziller 2016; Koivula, Saarinen, and Räsänen 2017), attitudes on immigration (e.g., Herreros and Criado 2009) and the size of the welfare state (Bjornskov and Svendsen 2013). Altruism or other-regarding preferences have been proposed to be connected to redistributive politics (Epper, Fehr, and Senn 2020) as well as the general left–right continuum (Zettler and Hilbig 2010). Finally, utilitarian judgment has recently been connected to ideological dimensions such as Right-Wing Authoritarianism and Social Dominance Orientation (Bostyn, Roets, and Van Hiel 2016). Both altruism and utilitarian judgment are also related to the care/harm dimension of Moral Foundations Theory (Graham et al. 2013), which has been suggested to be more prevalent among individuals with liberal political attitudes (Graham et al. 2011).

The third domain is psychological constructs, and for this domain we have included risk preferences, extraversion, locus of control, and IQ. Risk aversion has been suggested to be a crucial determinant of certain political preferences (Dimick and Stegmueller 2015; Cai, Liu, and Wang 2020). Various personality domains have also been proposed to be important. Although it would be preferable to have data for all of the Big Five domains, we can only include extraversion due to data limitations. There is some evidence that extraversion is connected to social conservatism (Carney et al. 2008), although this has been disputed (Gerber et al. 2010). The construct locus of control, a measure of the extent to which individuals feel responsible for their own life outcomes, has been shown to vary with political affiliation, with conservatives generally having a stronger internal locus of control (Gootnick 1974; Sweetser 2014). Finally, research on the connection between cognitive capacity and political orientation is diverse, with results generally indicating that intelligence predicts more liberal attitudes (Deary, Barry, and Gale 2008; Schoon et al. 2010) but also right-wing economic attitudes (Morton, Tyran, and Wengström 2011).

3 Data

The main data come from a large sample of identical twins in the Swedish Twin Registry (STR). The STR is a near-complete nation-wide register of twins established in the 1950s, now containing more than 200,000 individuals (Zagai et al. 2019). Apart from being linkable to other public registers, the STR also frequently conducts its own surveys, making it not only the largest, but also one of the richest twin data sources available.

Political preference measures are taken from the SALTY survey from the STR. The SALTY survey was conducted in 2009–2010 in a total sample of 11,482 individuals born between 1943 and 1958, and contains measures of, among other things, psychological constructs, economic behavior, moral and political attitudes, and behavior and health measures. Importantly for our purposes, the survey contains a comprehensive battery of 34 political preference measures. We use these 34 items as our outcome space. The items present specific policy proposals spanning issue dimensions from economic and social policy (e.g., Taxes should be cut or Decrease income inequality in society) to environmental and foreign policy (e.g., Ban private cars in the inner cities or Sweden should leave the EU), and ask the respondent to indicate to what degree they agree with these proposals on a 1–5 scale. The choice of policy preferences used in the survey overlaps with previous waves of the Swedish Election Study (Holmberg and Oscarsson 2017), which in turn partially overlaps with election studies in several other countries. A full list of the items can be found in Supplementary Appendix A.

Data for the predictors outlined in the background section are gathered from a number of register and survey sources. Precise definitions and sources can also be found in Supplementary Appendix A.

3.1 Additional Datasets

The external validity of the main results is limited by three factors. First, the twin population might differ from the nontwin population simply by virtue of being twins. Second, the subsample of the STR used in this study consists of individuals who have agreed to participate in genotyping (Magnusson et al. 2013), which may signal civic-mindedness that makes them different from the rest of the population. Third, Swedes might differ from other nationals.

To check the external validity of the empty versus naive model changes, data from election surveys in Sweden, Denmark, Norway, and the UK will be leveraged. These contain attitude data that can be matched to some of the outcomes used in the main data, as well as variables for a few predictors and controls. Details on these models can be found in Supplementary Appendix C.

4 Method

The method employed follows three steps for each predictor separately. First, three regression models (empty, naive, and within, as outlined below) are run for each political preference outcome in the sample of complete twin pairs. Second, a meta-analytical average for all outcomes, per model, is calculated. Third, this average effect size is compared across models to see how it changes with specification.Footnote 2

This procedure is intended to solve two fundamental problems. The first problem is the one broadly outlined above: it allows us to infer how much confounding each specification successfully captures. The second problem that it solves is that statistical precision is often severely reduced when moving from between- to within-pair estimates, for two reasons. First, since there are half as many pairs as there are individuals, adding pair fixed effects decreases the degrees of freedom by $n/2$. Other things being equal, the standard errors should then be inflated by almost the square root of 2, that is, roughly $1.4$. Second, since we are removing all factors shared by the twins, within-pair differences are going to be much smaller than the differences between any two randomly selected individuals in the population. This results in less variation in the exposure of interest and therefore less precision (Vitaro et al. 2009). As a consequence, a change in the effect size when going from a naive to a within-pair model is more likely to come about by pure chance than when simply comparing two different naive specifications.

The precision problem is at least partially solved by the aggregation of many outcomes: while we should expect standard errors to be higher in the discordant models, the coefficients should not change in any systematic direction if the naive effect sizes are unbiased. Systematic changes in the average effect size across the different preference items is therefore a consequence of model choice (and, we argue, a reduction in bias) rather than variance artefacts.

4.1 Models

4.1.1 Empty

Three models of increasing robustness will be tested in two stages of comparisons. The first (the “empty,” e) model will be used as a reference point and controls only for sex, age fixed effects, and their interaction:

(1) $$ \begin{align} y^e_{ij} = a + b^e_{j}x_{ki} + b_{2}\text{sex}_{i} + \sum_{a=1} \left( c_a \text{age}_{ia} +d_a \text{sex}_{i}\times\text{age}_{ia} \right) + e_i, \end{align} $$

where i denotes an individual twin and j denotes the outcome.

4.1.2 Naive

The second model (the “naive” model, n), and hence the first model comparison, adds a comprehensive set of controls available in the register data. The ambition is to produce as robust a model as possible with conventional statistical controls. The controls include possible contextual (municipal fixed effects), familial (parental birth years, income, and education) and individual (occupational codes, income, and education) confounders. In total, this should produce a model that is fairly conservative:

(2) $$ \begin{align} y^n_{ij} = a + b^n_{j}x_{ki} + b_{2}\text{sex}_{i} + \sum_{a=1} \left( c_a \text{age}_{ia} +d_a \text{sex}_{i}\times\text{age}_{ia} \right) + \mathbf{b\chi_i} + e_i, \end{align} $$

where $\mathbf {\chi _i}$ is the vector of naive controls. Complete definitions of all naive controls can be found in Supplementary Appendix A.Footnote 3

4.1.3 Within

Finally, the third model (the “within” model, w) adds twin-pair fixed effects, producing a discordant twin design.Footnote 4 This controls for all unobserved variables shared within an identical twin pair, that is, genetic factors, upbringing and home environment, as well as possible neighborhood and network effects:

(3) $$ \begin{align} y^w_{ij} = a + b^w_{j}x_{ki} + b_{2}\text{sex}_{i} + \sum_{a=1} \left( c_a \text{age}_{ia} +d_a \text{sex}_{i}\times\text{age}_{ia} \right) + \mathbf{b\chi_i} + \sum_{p=1} \phi_p P_{ip} + e_i, \end{align} $$

where $\sum \phi _p P_{ip}$ are the twin-pair fixed effects for twin pair p. Note that when adding these pair fixed effects, the age and sex variables as well as many of the controls will automatically be dropped since they are shared within pairs.
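The logic of the within model can be illustrated with a small simulation. This is a minimal sketch rather than the authors' code: the data-generating process, the function names (`simulate_twins`, `pooled_slope`, `within_slope`) and every parameter value are assumptions made for illustration, and the pooled slope here uses no controls at all, so it is cruder than the naive model of Equation (2).

```python
import numpy as np


def simulate_twins(n_pairs=5000, true_effect=0.1, seed=0):
    """Simulate twin pairs in which a pair-shared factor confounds x and y.

    All values are illustrative assumptions, not estimates from the paper.
    """
    rng = np.random.default_rng(seed)
    # Shared factor (genes plus family environment): identical within a pair.
    shared = rng.normal(size=(n_pairs, 1))
    # Predictor: shared component plus an individual-specific component.
    x = shared + rng.normal(size=(n_pairs, 2))
    # Outcome: true effect of x, plus the shared confounder, plus noise.
    y = true_effect * x + shared + rng.normal(size=(n_pairs, 2))
    return x, y


def pooled_slope(x, y):
    """Bivariate OLS slope that ignores the pair structure."""
    xc = x.ravel() - x.mean()
    yc = y.ravel() - y.mean()
    return float((xc * yc).sum() / (xc * xc).sum())


def within_slope(x, y):
    """OLS slope after subtracting pair means (pair fixed effects)."""
    xw = x - x.mean(axis=1, keepdims=True)
    yw = y - y.mean(axis=1, keepdims=True)
    return float((xw * yw).sum() / (xw * xw).sum())


if __name__ == "__main__":
    x, y = simulate_twins()
    print("pooled slope:", round(pooled_slope(x, y), 3))
    print("within slope:", round(within_slope(x, y), 3))
```

With two twins per pair, subtracting pair means gives the same slope as including one dummy per pair, which is also why anything constant within a pair (age, sex, parental variables) drops out. In this simulated setup the pooled slope absorbs the shared factor, while the within slope should land near the assumed true effect, with more sampling noise.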

4.2 Changes Between Models

In all models, standardized regression coefficients are used to facilitate aggregation and comparison. Furthermore, to be able to calculate meaningful averages of coefficients across the full range of outcomes, all outcomes $y_j$ are transformed to correspond to positive coefficients in the baseline model (i.e., empty when comparing empty vs. naive, and naive when comparing naive vs. within), such that

(4) $$ \begin{align} \begin{aligned} y^{e*}_j &= |y^e_j| , \\ y^{n*}_j &= 6 - y_j \text{ if } b^e_{j}<0 \end{aligned} \end{align} $$

is used when introducing the naive controls, and

(5) $$ \begin{align} \begin{aligned} y^{n*}_j & = |y^n_j| , \\ y^{w*}_j & = 6 - y_j \text{ if } b^n_{j}<0 \end{aligned} \end{align} $$

is used when moving from the naive to the within model.Footnote 5 To see why this transformation is necessary, consider its absence—the average effect size would reflect both negative and positive effects and thus go toward zero, and the average effect size would be the result of the arbitrary coding of the items. Norming the sign by the previous model makes sure that changes to the average when new controls are introduced have an interpretation. Since there are two model comparisons (naive following empty, and within following naive), the naive model will also exist in two versions: one normed with Equation (4) to compare with the empty model (and hence with coefficients on both sides of the null), and one renormed with Equation (5) used as the departure point for the within model (and hence with non-negative coefficients).
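The sign-norming step can be sketched as follows. This is one possible reading of Equations (4) and (5), not the authors' implementation: the helper names `slope` and `norm_outcome` are hypothetical, and the bivariate slope stands in for whichever baseline model (empty or naive) supplies the sign.

```python
import numpy as np

SCALE_MAX = 5  # items are answered on a 1-5 scale, so reversal is 6 - y


def slope(x, y):
    """Bivariate OLS slope, used here as a stand-in for the baseline model."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float((xc * (y - y.mean())).sum() / (xc * xc).sum())


def norm_outcome(y, baseline_coef):
    """Reverse-code the outcome if the baseline coefficient is negative."""
    y = np.asarray(y, dtype=float)
    if baseline_coef < 0:
        return (SCALE_MAX + 1) - y
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [5, 4, 4, 2, 1]           # negatively related to x
    b_base = slope(x, y)          # negative in the baseline model
    y_star = norm_outcome(y, b_base)
    print(b_base, slope(x, y_star))  # same magnitude, opposite sign
```

Because the reversal is decided by the baseline model only, the coefficient in the next model is free to fall on either side of zero, which is what lets the average move toward the null when added controls remove confounding.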

When moving from the naive model to the discordant twin model, we should expect the precision in the estimates to decrease more than when moving from the empty to the naive model. This, as elaborated above, follows from the introduction of twin-pair fixed effects and implies that point estimates for any given preference measure can change substantially due to random chance. In expectation, however, the change between models is zero under the null hypothesis that the naive model captures all confounding. Leveraging the fact that there are 34 different outcomes in the same sample therefore allows us to make inferences about the average degree of confounding remaining: systematic differences between the naive and within models should not arise as a consequence of statistical imprecision.

The first step in assessing how the effects change as a consequence of model choice is to calculate the overall average effect size B for all outcomes, for each specification and predictor:

(6) $$ \begin{align} B^m_v = \frac{1}{k} \sum^{k}_{j=1} b^{m*}_{vj}. \end{align} $$

Here, k is the number of dependent variables (i.e., 34 for the main models), m is the model (out of e, n, or w) and v is the predictor.

To know whether changes between models are meaningful, we also need standard errors for this mean. Since political preferences tend to vary along given dimensions (such as left–right), it is also necessary to account for the correlation structure of the outcome space. Unless completely independent, the effective number of outcomes is lower than 34. The standard error for B, when adjusted for the correlation matrix of the outcome space, is therefore the meta-analytical standard error for overlapping samples (Borenstein et al. 2009, Ch. 24):

(7) $$ \begin{align} SE_{B^m_v} = \sqrt{\text{V}\left( \frac{1}{k} \sum^{k}_{j=1} b^{m*}_{vj} \right) } = \left( \frac{1}{k} \right) \sqrt{ \left( \sum^{k}_{j=1} V_{b^{m*}_{vj}} + \sum^{k}_{j\ne l} (r_{jl}\sqrt{V_{b^{m*}_{vj}}}\sqrt{V_{b^{m*}_{vl}}}) \right)}, \end{align} $$

where $r_{jl}$ is the pairwise correlation between preference measures j and l.Footnote 6 To see how this correction affects the standard error, we can consider how it changes as the correlations between the preference measures change. In the special case where the outcomes are completely uncorrelated, such that $r_{jl}=0 \; \forall j \ne l$, the formula collapses to $\dfrac {1}{k}\sqrt {\sum ^{k}_{j=1} V_{b^{m*}_{vj}}}$ and is decreasing in the number of outcomes: each preference measure adds independent information. As the correlation between outcomes increases, the standard error also increases by the corresponding fraction of the product of the individual item standard errors, and approaches, in the case where $r_{jl}=1 \; \forall j,l$, the simple average of all included standard errors, such that any additional outcome provides no additional information.
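Equations (6) and (7) can be written compactly in code. The sketch below is illustrative only: the function names `average_effect` and `average_effect_se` and the example numbers are assumptions, and the correlation matrix is taken as given rather than estimated from data.

```python
import numpy as np


def average_effect(coefs):
    """Equation (6): simple mean of the sign-normed coefficients."""
    return float(np.mean(coefs))


def average_effect_se(ses, corr):
    """Equation (7): standard error of the mean of correlated estimates.

    ses  : standard errors of the k item-level coefficients
    corr : k x k correlation matrix of the outcomes (ones on the diagonal)
    """
    ses = np.asarray(ses, dtype=float)
    corr = np.asarray(corr, dtype=float)
    k = ses.size
    # Diagonal terms are the variances; off-diagonal terms are
    # r_jl * se_j * se_l, matching the two sums in Equation (7).
    cov = corr * np.outer(ses, ses)
    return float(np.sqrt(cov.sum()) / k)


if __name__ == "__main__":
    ses = [0.02, 0.03, 0.04]                            # made-up values
    print(average_effect_se(ses, np.eye(3)))            # uncorrelated outcomes
    print(average_effect_se(ses, np.ones((3, 3))))      # perfectly correlated
```

The two limiting cases described in the text fall out directly: with an identity correlation matrix the result equals $\frac{1}{k}\sqrt{\sum_j V_j}$, and with all correlations equal to one it equals the simple average of the item standard errors.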

5 Results

The first set of results, shown in Figures 1 and 2, includes all 34 political preference outcomes.Footnote 7 The lightest bars represent the meta-analyzed empty models, that is, the average standardized coefficients across all 34 outcomes, with only sex, age, and their interaction as controls.Footnote 8 The largest average coefficient is evident for education years at 11.9% of a standard deviation, trailing all the way down to 3.4% for utilitarian judgment.

Figure 1 Main results, all outcomes, empty versus naive. Average beta coefficients across all outcomes, per model and predictor. 90% confidence intervals shown.

Figure 2 Main results, all outcomes, naive versus within. Average beta coefficients across all outcomes, per model and predictor. 90% confidence intervals shown.

We can also see changes to the average effect sizes when moving from the empty to the naive models. It is evident that the extensive controls introduced in the naive models in many cases draw the average effect size substantially toward the null. For example, the naive effect estimate for education years is roughly 67% of the empty effect estimate, for IQ roughly 65% and for work income roughly 47%. In most cases, the reduction in the effect estimate is also itself significant, the exceptions being altruism, antisocial attitudes, and utilitarian judgment, where the average empty estimates were close to zero to begin with.

Furthermore, we can see what happens when we move from the naive models to the within models in Figure 2. Comparing the renormed naive to the within models, substantial chunks of the effect sizes are again removed. For example, for education years only 33% of the naive effect size remains, for work income 36% and for trust roughly 25%. For these, as well as college education and risk preferences, the majority of the naive effect size appears to be attributable to unmeasured confounding shared within twin pairs, whereas for net wealth and extraversion it is roughly half. This reduction is itself significant for education years, college, gross and net wealth, work income, trust, and risk preferences. In other cases, most notably utilitarian judgment, the within models are not at all or only slightly closer to zero than the naive models indicating that unmeasured familial confounding is not biasing these results appreciably.

Not all political preferences are theoretically plausibly connected to each predictor, however. Including all political preference outcomes is therefore going to push estimates for all models toward zero. For this reason, it is also interesting to restrict the analysis to some set of key outcomes per predictor. Barring a carefully constructed set of hypotheses, this can be done in an atheoretical fashion in two different ways. First of all, we report results restricted to the outcomes that were initially statistically significant ($p<0.05$) in the naive models. Selecting on significance is known to produce the “winner’s curse,” meaning that the expected effect size is inflated. Apart from testing a more “lenient” set of outcomes, this procedure therefore also partially mimics the effects of publication bias (Young, Ioannidis, and Al-Ubaydli 2008).
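The inflation produced by selecting on significance can be seen in a short simulation. This is a hedged sketch under assumed values: the true effect, the standard error, the normal approximation for the test, and the function name `selected_mean` are all illustrative and are not taken from the paper.

```python
import numpy as np


def selected_mean(true_effect=0.05, se=0.03, n_draws=100_000, seed=0):
    """Compare the mean of all estimates with the mean of significant ones.

    Estimates are drawn around an assumed true effect; 'significant' means
    |estimate / se| > 1.96, i.e. two-sided p < 0.05 under a normal
    approximation.
    """
    rng = np.random.default_rng(seed)
    estimates = rng.normal(loc=true_effect, scale=se, size=n_draws)
    significant = np.abs(estimates / se) > 1.96
    return float(estimates.mean()), float(estimates[significant].mean())


if __name__ == "__main__":
    all_mean, sig_mean = selected_mean()
    print("mean of all estimates:        ", round(all_mean, 4))
    print("mean of significant estimates:", round(sig_mean, 4))
```

In this setup the unselected mean stays close to the assumed true effect, while the mean among significant draws is noticeably larger, because only draws that happened to land far from zero pass the filter.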

In Figure 3, the Winner’s curse results selected on naive significance are presented (restricted to predictors with at least five outcomes passing the threshold). As expected, there is a general inflation in effect sizes (the extent to which this is due to the Winner’s curse versus a better selection of outcomes is not possible to evaluate with the current methodology). However, there is still a substantial reduction in the average effect sizes for all predictors when moving to within-pair variation. For example, whereas previously the within pair effect of IQ was 84% of the naive effect and the reduction not significant, it is now 52% and significant. The reductions are also significant for all shown predictors except locus of control and antisocial attitudes.

Figure 3 Winner’s curse: naive significance selection. Average beta coefficients for outcomes with $p<0.05$ in naive model, per model and predictor. 90% confidence intervals shown. Only predictors with at least five included outcomes shown (number of included outcomes in parentheses).

Another way of atheoretically picking the “right” outcomes to include for each predictor is to look at the naive effect size instead of statistical significance. We have set the (somewhat arbitrary) threshold of having a standardized coefficient of at least $\beta>0.1$ . This filters on substantive rather than statistical significance and is going to lead to higher average effect sizes in both models for purely mechanical reasons, in a fashion similar to the Winner’s curse.

Figure 4 presents the results selected on naive effect sizes (again restricted to predictors with at least five outcomes passing the threshold). The picture is consistent with the previous results—reductions when moving to within-pair variation are substantial (the remaining effect is generally in the range of 40–60%) and are significant for the shown predictors except antisocial attitudes and IQ.

Figure 4 Naive effect size selection. Average beta coefficients for outcomes with $\beta>0.1$ in naive model, per model and predictor. 90% confidence intervals shown. Only predictors with at least five included outcomes shown (number of included outcomes in parentheses).

5.1 Robustness

To evaluate the external validity of the results, we have matched political preference items from Swedish, Danish, Norwegian, and British election surveys to the corresponding items in SALTY and run empty and naive models for a small subset of predictors. These results can be found in Supplementary Appendix C. Both average effect sizes and reductions in effect sizes between models are comparable when using data from the Swedish election study. This indicates that the overall results are not driven by the particular characteristics of the twin sample. Furthermore, effect size reductions are roughly comparable when using data from other Nordic countries. The least consistent results are obtained for college and income in the British sample, where the included controls make sizable dents in the effect sizes in the STR data, but almost none in the British data. This could stem from differences in data quality, but could also be taken to imply that the remaining familial bias is even larger in the British data. It is also possible that it reflects other institutional differences between the UK and the other included countries. In almost all cases, the matched within-pair models in the STR data further cut the effect sizes dramatically, which indicates that the overall pattern of results should be externally valid.

Furthermore, to check the robustness of the results to violations of the independence assumption, we tested a set of contact rate interaction models. These are outlined in detail in Supplementary Appendix B. In summary, effect size reductions may be slightly inflated, but not significantly so for any predictor except IQ.

6 Discussion

The feasibility of observational methods for capturing causal effects on political preferences depends on the extent to which they are able to remove confounding variation. In this study, we have shown that for a fairly large set of predictors, even conservative observational models will often still suffer from substantial bias. This bias can be found across predictors in all of the three domains we were able to investigate, but was particularly pronounced in the domain most often assumed to be crucial for political preference formation: socioeconomic factors like education, income, and wealth.

Using discordant twin analyses to estimate the remaining level of confounding is not without problems. This study was partly designed to overcome one of the main issues (decreased precision), but others remain and should be acknowledged. In particular, there is always a risk that some remaining factor in the unique environment confounds the relationship. We have gone to great lengths to include as powerful statistical controls as possible, but this risk can never be fully eliminated. However, unique environmental confounders will be missing for both naive and within-pair models, and will therefore not detract from the main take-home message: observational models with a seemingly robust and conservative set of controls are likely substantially biased by familial factors that are difficult to capture.
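The logic of the discordant twin comparison can be sketched with simulated data. Everything in the block is an assumption made for illustration: a single shared "family" factor stands in for genetic and shared environmental confounding, there is no unique environmental confounder, and the parameter values are arbitrary.

```python
import random

def ols_slope(x, y):
    """Slope from a bivariate least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_twins(n_pairs=20000, true_beta=0.2, seed=7):
    rng = random.Random(seed)
    x_all, y_all, dx, dy = [], [], [], []
    for _ in range(n_pairs):
        family = rng.gauss(0, 1)  # shared by both twins, unobserved
        pair = []
        for _ in range(2):
            x = family + rng.gauss(0, 1)
            y = true_beta * x + 0.5 * family + rng.gauss(0, 1)
            pair.append((x, y))
            x_all.append(x)
            y_all.append(y)
        # With two members per pair, differencing removes the pair fixed effect
        dx.append(pair[0][0] - pair[1][0])
        dy.append(pair[0][1] - pair[1][1])
    naive = ols_slope(x_all, y_all)
    # Regression through the origin on the within-pair differences
    within = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    return naive, within

naive_beta, within_beta = simulate_twins()
print(f"naive: {naive_beta:.2f}, within-pair: {within_beta:.2f}")
```

In this setup the naive slope absorbs the shared family factor, while the within-pair slope stays close to the assumed true effect. A confounder in the unique environment would bias both estimates, which is the caveat raised above.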

Another issue that could be important in the present case is random measurement error in the predictor. This matters because the attenuating effect of measurement error is potentially magnified when adding twin-pair fixed effects, at least in bivariate models (Griliches 1979; see footnote 9). In some of the more important cases, this would be less of an issue since data often come from registers with high reliability (e.g., education, wealth, and income), but for some of the psychological constructs it might cause an artificially large “bias reduction” that is actually attributable to magnified attenuation bias. This problem does not apply to systematic measurement error; while a problem in its own right, it will attenuate estimates from all models equally. Unfortunately, without a good idea of the degree of measurement error (i.e., test–retest reliability ratios) for the specific items used to operationalize the predictors in this study, the possible magnitude of this problem is difficult to assess, and such an assessment would only be valid for bivariate comparisons.

Finally, an issue with discordant twin designs that has been discussed in the econometric literature (dating back to Griliches (1979)) is that while twin fixed effects do filter out much of the endogenous variation, they also filter out exogenous variation. If the proportion between the two is unaltered, we would be in no better situation than without the discordant design altogether (see, e.g., Bound and Solon (1999) for a more thorough discussion). However, starting from the plausible assumption that the net effect of unobserved confounders is to inflate, rather than suppress, effect estimates, our within-pair models are still going to provide an upper bound. As such, any error in our estimated bias reductions would make them too small rather than artificially large. This assumption can be bolstered by the fact that the net effect of the observable confounders (comparing empty to naive models in Figure 1) is inflationary.

Taking our aggregate effect size reductions at face value, a reasonable heuristic appears to be that we should expect roughly half of the effect size from naive observational methods to be composed of confounding. This result is largely in line with what tends to show up in discordant twin designs with other political outcomes. In Weinschenk and Dawes (2019), the estimated effect of education on political knowledge is reduced by about 72% when going from a naive to a discordant twin model. Similarly, in a recent paper on the relationship between political attitudes and participation by Weinschenk et al. (2021), the effect size decreased by 60%, 38%, and 35% in Germany, the United States, and Sweden, respectively (in Denmark it was found to disappear completely). Looking at education and political participation, Dinesen et al. (2016) found that the effect decreased by 53% in the United States, and disappeared completely in Denmark and Sweden. Thus, it appears likely that our proposed general rule of thumb would also apply to outcomes other than political preferences, like political participation, civicness, knowledge, and similar types of behaviors.

The pattern of results shown in this paper should be a strong reminder that observational estimates are likely to be substantially biased, even when a conservative set of controls is used. In short, causal conclusions in these situations are rarely warranted. This should not discourage researchers from the well-established approach of using the multiple regression toolkit on observational data. On the contrary, as we point out in the introduction, it is in many cases the only tool available to us. However, it underscores the necessity of refraining from using causal language and from making policy recommendations that will, in many cases, fall short in the real world.


This work was supported by Riksbankens Jubileumsfond [P18:-0728:1].

Conflicts of Interest

The authors would like to declare no conflicts of interest.

Data Availability Statement

This paper is based on proprietary register data held by Statistics Sweden and the Swedish Twin Register. The full code for obtaining the results reported in the paper, as well as intermediate (aggregate) data, is available from the Political Analysis Dataverse (Ahlskog and Oskarsson 2022). The Dataverse also contains a detailed description of how to apply for access to the register data.

Supplementary Material

For supplementary material accompanying this paper, please visit


1 Unethical, because interventions required to change, for example, someone’s psychological disposition, would be unethical in themselves, or because they may change political attitudes (and therefore outcomes of political processes) that are not normatively neutral. Impossible, because many things are not sensitive to manipulation (e.g., cognitive capacity). Prohibitively expensive, for example, where income and wealth are concerned.

2 The code for reproducing our results, as well as intermediate level data, are available from this article’s Dataverse at

3 Models without education and income as individual controls are also reported in Supplementary Appendix B.

4 This is implemented in Stata using the reghdfe command and absorbing twin-pair identifiers.

5 The correct mirror transformation is $\max(y)+\min(y)-y_j$, which reduces to $6-y_j$ since all outcomes range from 1 to 5.
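A minimal sketch of the mirror transformation, assuming the 1 to 5 response scale stated in the footnote (the function name `mirror` is illustrative):

```python
def mirror(y, lo=1, hi=5):
    """Reverse a response scale: max(y) + min(y) - y."""
    return hi + lo - y

print([mirror(v) for v in [1, 2, 3, 4, 5]])  # [5, 4, 3, 2, 1]
```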

6 A minor detail is that this formula is for completely overlapping samples. While the samples for the different preference outcomes are going to vary slightly due to nonresponse on certain preference issues, this nonoverlap is negligible and will lead to the estimated standard errors being marginally too conservative.

7 An alternative way of viewing specific political preferences is to see them as instantiations of latent ideological constructs. In Supplementary Appendix B, we also present detailed results using indices derived from the first five principal components of the outcome space. The results with these reduced attitude dimensions are qualitatively identical to those presented here.

8 Tables with all details can be found in Supplementary Appendix B, and histograms of the complete effect size distributions for all models and predictors are found in Supplementary Appendix D.

9 In bivariate models with twin-pair fixed effects, the attenuation becomes a simple function of the within-pair correlation $\rho$ in the predictor: $\beta=\dfrac{\hat{\beta}}{1-r_e/(1-\rho)}$, where $r_e$ is the ratio of measurement error. In this case, attenuation bias is the same or larger in the within-pair models, but in models with multiple independent variables, the change could go in either direction.
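As a purely illustrative calculation (the values are assumptions, not estimates from this study), read $r_e$ as the share of the observed predictor variance that is due to measurement error, and take $r_e=0.1$ and $\rho=0.5$. The within-pair and cross-sectional bivariate estimates are then attenuated as

$$\hat{\beta}_{\text{within}}=\beta\left(1-\frac{r_e}{1-\rho}\right)=\beta\left(1-\frac{0.1}{0.5}\right)=0.8\,\beta,\qquad \hat{\beta}_{\text{naive}}=\beta\,(1-r_e)=0.9\,\beta .$$

Under these assumed values, the within-pair estimate would be about 11% smaller than the naive one ($0.8/0.9\approx 0.89$) from measurement error alone, without any confounding being removed.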


Ahlskog, R., and Brännlund, A. 2021. “Uncovering the Source of Patrimonial Voting: Evidence from Swedish Twin Pairs.” Political Behavior.
Ahlskog, R., and Oskarsson, S. 2022. “Replication Code for: Quantifying Bias from Measurable and Unmeasurable Confounders across Three Domains of Individual Determinants of Political Preferences.” Harvard Dataverse. UNF:6:phnYW+g6qZF/LeaiAs2EzA== [fileUNF]
Alford, J. R., Funk, C. L., and Hibbing, J. R. 2005. “Are Political Orientations Genetically Transmitted?” American Political Science Review 99 (2): 153–167.
Angrist, J. D., and Pischke, J. S. 2010. “The Credibility Revolution in Empirical Economics.” Journal of Economic Perspectives 24 (2): 3–30.
Berning, C., and Ziller, C. 2016. “Social Trust and Radical Rightwing Populist Party Preferences.” Acta Politica 52: 1–20.
Bjornskov, C., and Svendsen, G. T. 2013. “Does Social Trust Determine the Size of the Welfare State?” Public Choice 157 (1–2): 269–286.
Borenstein, M., Hedges, L., Higgins, J., and Rothstein, H. 2009. Introduction to Meta-Analysis. West Sussex: Wiley.
Bostyn, D. H., Roets, A., and Van Hiel, A. 2016. “Right-Wing Attitudes and Moral Cognition.” Personality and Individual Differences 96: 164–171.
Bound, J., and Solon, G. 1999. “Double Trouble: On the Value of Twins-Based Estimation of the Return to Schooling.” Economics of Education Review 18: 169–182.
Cai, M., Liu, P., and Wang, H. 2020. “Political Trust, Risk Preferences, and Policy Support.” World Development 125: 104687.
Carney, D. R., Jost, J. T., Gosling, S. D., and Potter, J. 2008. “The Secret Lives of Liberals and Conservatives.” Political Psychology 29 (6): 807–840.
Chabris, C. F., Lee, J. J., Cesarini, D., Benjamin, D. J., and Laibson, D. I. 2015. “The Fourth Law of Behavior Genetics.” Current Directions in Psychological Science 24 (4): 304–312.
Clark, W. R., and Golder, M. 2015. “Big Data, Causal Inference, and Formal Theory: Contradictory Trends in Political Science?: Introduction.” PS: Political Science and Politics 48 (1): 65–70.
Deary, I., Barry, D., and Gale, C. 2008. “Childhood Intelligence Predicts Voter Turnout, Voting Preferences, and Political Involvement in Adulthood.” Intelligence 36: 548–555.
Deaton, A. 2009. “Instruments of Development: Randomization in the Tropics, and the Search for the Elusive Keys to Economic Development.” National Bureau of Economic Research Working Paper 14690.
Dimick, M., and Stegmueller, D. 2015. “The Political Economy of Risk and Ideology.” SOEP Papers 809-2015.
Dinesen, P. T., et al. 2016. “Estimating the Impact of Education on Political Participation.” Political Behavior 38: 579–601.
Druckman, J. N., Green, D. P., Kuklinski, J. H., and Lupia, A. 2006. “The Growth and Development of Experimental Research in Political Science.” The American Political Science Review 100 (4): 627–635.
Dunn, K. 2011. “Left-Right Identification and Education in Europe: A Contingent Relationship.” Comparative European Politics 9: 292–316.
Epper, T., Fehr, E., and Senn, J. 2020. “Other-Regarding Preferences and Redistributive Politics.” ECON, Working Paper 339, Department of Economics, University of Zurich.
Gerber, A. S., Huber, G. A., Doherty, D., Dowling, C. M., and Ha, S. E. 2010. “Personality and Political Attitudes.” American Political Science Review 104 (1): 111–133.
Gootnick, A. T. 1974. “Locus of Control and Political Participation of College Students.” Journal of Consulting and Clinical Psychology 42 (1): 54–58.
Graham, J., Nosek, B. A., Haidt, J., Iyer, R., Koleva, S., and Ditto, P. H. 2011. “Mapping the Moral Domain.” Journal of Personality and Social Psychology 101 (2): 366–385.
Graham, J., et al. 2013. “Moral Foundations Theory: The Pragmatic Validity of Moral Pluralism.” Advances in Experimental Social Psychology 47: 55–130.
Griliches, Z. 1979. “Sibling Models and Data in Economics: Beginnings of a Survey.” Journal of Political Economy 87: S37–S64.
Hatemi, P. K., et al. 2014. “Genetic Influences on Political Ideologies: Twin Analyses of 19 Measures of Political Ideologies from Five Democracies and Genome-Wide Findings from Three Populations.” Behavior Genetics 44: 282–294.
Herreros, F., and Criado, H. 2009. “Social Trust, Social Capital and Perceptions of Immigration.” Political Studies 57 (2): 337–355.
Holmberg, S., and Oscarsson, H. E. 2017. “Svensk valundersökning 2010.” Svensk Nationell Datatjänst. Version 1.0.
Huber, J. 2013. “Is Theory Getting Lost in the ‘Identification Revolution’?” The Political Economist, Summer: 1–3.
Jennings, M. K., and Niemi, R. G. 1968. “The Transmission of Political Values from Parent to Child.” American Political Science Review 62 (1): 169–184.
Jennings, M. K., and Niemi, R. G. 1981. Generations and Politics: A Panel Study of Young Adults and their Parents. Princeton: Princeton University Press.
Koivula, A., Saarinen, A., and Räsänen, P. 2017. “Political Party Preferences and Social Trust in Four Nordic Countries.” Comparative European Politics 15: 1030–1051.
Lewis-Beck, M., Nadeau, R., and Foucault, M. 2013. “The Compleat Economic Voter.” British Journal of Political Science 43 (2): 241–261.
Lipset, S. M. 1960. Political Man. New York: Anchor Books.
Magnusson, P. K. E., et al. 2013. “The Swedish Twin Registry: Establishment of a Biobank and Other Recent Developments.” Twin Research and Human Genetics 16 (1): 317–329.
Marshall, J. 2016. “Education and Voting Conservative: Evidence from a Major Schooling Reform in Great Britain.” Journal of Politics 78 (2): 382–395.
Marx, K. 1977. A Contribution to the Critique of Political Economy. Moscow: Progress Publishers.
McDermott, R. 2012. “Internal and External Validity.” In Cambridge Handbook of Experimental Political Science, edited by Druckman, J. N., Green, D. P., Kuklinski, J. H., and Lupia, A., 27–40. Cambridge: Cambridge University Press.
Meltzer, A. H., and Richard, S. F. 1981. “A Rational Theory of the Size of Government.” Journal of Political Economy 89 (5): 914–927.
Morton, R., Tyran, J. R., and Wengström, E. 2011. “Income and Ideology: How Personality Traits, Cognitive Abilities, and Education Shape Political Attitudes.” Department of Economics, University of Copenhagen, Discussion Paper 11-08.
Oskarsson, S., et al. 2015. “Linking Genes and Political Orientations: Cognitive Ability as Mediator Hypothesis.” Political Psychology 36 (6): 649–665.
Polderman, T. J. C., et al. 2015. “Meta-Analysis of the Heritability of Human Traits Based on Fifty Years of Twin Studies.” Nature Genetics 47: 702–709.
Quinlan, S., and Okolikj, M. 2019. “Patrimonial Economic Voting: A Cross-National Analysis of Asset Ownership and the Vote.” Journal of Elections, Public Opinion and Parties.
Schoon, I., Cheng, H., Gale, C., Batty, D., and Deary, I. 2010. “Social Status, Cognitive Ability, and Educational Attainment as Predictors of Liberal Social Attitudes and Political Trust.” Intelligence 38: 144–150.
Smith, K. B., Oxley, D. R., Hibbing, M. V., Alford, J. R., and Hibbing, J. R. 2011. “Linking Genetics and Political Attitudes: Reconceptualizing Political Ideology.” Political Psychology 32 (3): 369–397.
Sweetser, K. D. 2014. “Partisan Personality: The Psychological Differences Between Democrats and Republicans, and Independents Somewhere in Between.” American Behavioral Scientist 58 (9): 1183–1194.
Van Lange, P. A. M. 2015. “Generalized Trust: Four Lessons from Genetics and Culture.” Current Directions in Psychological Science 24 (1): 71–76.
Vitaro, F., Brendgen, M., and Arseneault, L. 2009. “The Discordant MZ-Twin Method: One Step Closer to the Holy Grail of Causality.” International Journal of Behavioral Development 33 (4): 376–382.
Weakliem, D. L. 2002. “The Effects of Education on Political Opinions.” International Journal of Public Opinion Research 14 (2): 141–157.
Weinschenk, A. C., and Dawes, C. T. 2019. “The Effect of Education on Political Knowledge: Evidence from Monozygotic Twins.” American Politics Research 47 (3): 530–548.
Weinschenk, A. C., Dawes, C. T., Oskarsson, S., Klemmensen, R., and Norgaard, A. S. 2021. “The Relationship between Political Attitudes and Political Participation.” Electoral Studies 69: 102269.
Young, A. I. 2019. “Solving the Problem of Missing Heritability.” PLoS Genetics 15 (6): e1008222.
Young, N. S., Ioannidis, J., and Al-Ubaydli, O. 2008. “Why Current Publication Practices May Distort Science.” PLoS Medicine 5 (10): e201.
Zagai, U., Lichtenstein, P., Pedersen, N. L., and Magnusson, P. K. E. 2019. “The Swedish Twin Registry: Content Management as a Research Infrastructure.” Twin Research and Human Genetics 22 (6): 672–680.
Zettler, I., and Hilbig, B. E. 2010. “Attitudes of the Selfless: Explaining Political Orientation with Altruism.” Personality and Individual Differences 48 (3): 338–342.
Figure 1 Main results, all outcomes, empty versus naive. Average beta coefficients across all outcomes, per model and predictor. 90% confidence intervals shown.

Figure 2 Main results, all outcomes, naive versus within. Average beta coefficients across all outcomes, per model and predictor. 90% confidence intervals shown.
