## 1 Introduction

Analysis of dyadic data (data for which each observation represents a pair of units, for example, countries) is common in quantitative empirical research in international relations. Seminal theories such as the democratic trade hypothesis (Bliss and Russett 1998; Dixon and Moon 1993; Green, Kim, and Yoon 2001; Mansfield, Milner, and Rosendorff 2000), democratic peace theory (Dafoe 2011; Gartzke 2007; Imai and Lo 2021; Oneal and Russett 2001), liberal peace theory (Oneal and Russett 2001), and democratic alliance formation (Gibler and Wolford 2006; Simon and Gartzke 1996) claim empirical support (or lack thereof) from the analysis of dyadic data.

Dyadic data have a unique dependency structure, one where repeated observations of dyads are likely correlated with one another (as in panel datasets) *and* dyads that share a common member are likely correlated with one another. In particular, because multiple dyads can share members, model errors can be correlated across dyads. Only accounting for the correlations between repeated observations of dyads (e.g., by using dyad-clustered standard errors or fixed effects) and ignoring correlations across dyads assumes dyad-level events are independent. This assumption contradicts substantive knowledge of the dependencies between many types of dyads common to the social sciences, such as dyads of countries. As Erikson, Pinto, and Rader (2017) note, “when a nation undergoes a pro-democratic revolution or, alternatively, when democratic leaders are deposed in a coup, the change ripples through all the nation’s many dyads.”

The idea that dyadic data exhibit a unique clustering structure that needs to be addressed methodologically in empirical work is not novel to political scientists. Random effects models have been proposed for dyads (Cameron and Golotvina 2005), Erikson *et al.* (2017) proposed a permutation testing framework that accounts for dyadic structure, and fully parametric analyses have accounted for dyadic structure and network structure (Hays, Kachi, and Franzese Jr. 2010). Previous research has also established that failing to properly account for dyadic clustering may result in underestimation of the size of standard errors and confidence intervals (Aronow, Samii, and Assenova 2015; Cameron and Miller 2014; Erikson *et al.* 2017). However, these methodological insights have not yielded a corresponding change in the way in which applied scholars conduct their research.

Recent work has developed standard error estimators that account for dyadic clustering. Cameron and Miller (2014), Aronow *et al.* (2015), and Tabord-Meehan (2019), building on Fafchamps and Gubert (2007), have developed and studied dyadic cluster-robust standard errors (DCRSEs). Using these DCRSEs, we reanalyzed all articles published in *International Organization* over the course of just over 6 years (January 2014 to January 2020) that feature dyadic data, none of which originally implemented DCRSEs in their primary dyadic analyses. We find that DCRSEs are on average approximately twice as large as published standard errors, but that most findings remain statistically significant. While the literature therefore dramatically understates uncertainty, the estimated coefficients are usually large enough to remain statistically significant at conventional levels. To facilitate accounting for dyadic clustering in future research, we also offer software in both R and Stata that computes DCRSEs.

Our primary contributions are therefore: (1) to empirically assess the degree to which uncertainty has been underestimated in previous research due to the presence of dyadic clustering and (2) to increase the accessibility of potential solutions to dyadic clustering for applied scholars.

Note, however, that DCRSEs are not a panacea—the underlying theory and data generating process of the empirical setting should be taken into account prior to choosing an estimation strategy. When there is dependence between non-incident dyads (i.e., dyads that do not share a common member), DCRSEs will still underestimate uncertainty, just as only clustering on repeated dyads underestimates uncertainty when there is dependence across incident dyads. DCRSEs should therefore not be considered a replacement for approaches to clustering that (appropriately) account for greater amounts of dependence in the data. The reanalysis we present therefore offers a lower bound on the severity of the consequences of inadequate clustering practices in previous research, or, in other words, reveals the extent to which dyadic clustering *alone* is jeopardizing the reliability of research.

## 2 Why Is Dyadic Clustering a Problem?

Dyadic data contain a dependency structure whereby repeated observations of dyads are allowed to be correlated with one another, and, importantly, dyads that share any common member are *also* allowed to be correlated with one another.

It may be helpful to illustrate the substantive assumptions implicit in assuming that dyad-level events are independent with common examples from IR theory. As Cranmer and Desmarais (2016) and Maoz *et al.* (2019) note, assuming dyadic independence in WWII-era conflict implies that the conflict between Germany and Poland is unrelated to the conflict between Germany and Great Britain. This assumption is not realistic, as we know that Great Britain used the German invasion of Poland to justify its declaration of war on Germany. Similarly, Neumayer and Plümper (2010) and Poast (2016) note that bilateral trade or investment treaties are influenced by the other treaties each member may already be a part of. Assuming independence would, for example, imply that a bilateral trade deal between the US and the UK is unrelated to post-Brexit UK–EU trade negotiations. We provide an illustration using bilateral trade flows in the following section. Note that we set aside separate issues relating to analysis of time-series cross-sectional data (Beck and Katz 1995; Blackwell and Glynn 2018).

### 2.1 Toy Example: Bilateral Trade

Consider an example in which we observe trade volume for a set of country-country-year dyads. For U.S.–U.K. trade volume, any observations that include either the US or the UK may be correlated with observations of any other dyad that also includes either the US or the UK, respectively. Table 1 illustrates the difference in assumptions about the dependencies between countries under traditional clustering by repeated dyad only, and with full dyadic clustering. Table 1 highlights that under clustering by repeated dyad only, all country groups that do not share *both* members are assumed to be independent. By contrast, under dyadic clustering, only country groups that share no members are assumed to be independent.

This clustering structure affects statistical inference. To illustrate this, suppose we were interested in characterizing the variance of the average level of commerce between the US and the UK $(Y_{US-UK})$ and between the US and France $(Y_{US-FR})$. We can compute the sample mean of their outcomes: $\hat \mu = \frac{1}{2} (Y_{US-UK} + Y_{US-FR})$. *If* we were to assume that dyads were statistically independent of one another, the variances are simply additive (i.e., there is no covariance term) and we could compute the variance of $\hat \mu$ as

$$\text{Var}(\hat \mu) = \frac{1}{4} \left[ \text{Var}(Y_{US-UK}) + \text{Var}(Y_{US-FR}) \right].$$

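However, with dyadic data, this may not be reasonable, especially when country-level factors are likely to impact dyadic outcomes. To see this, suppose that the true data-generating process is additive among countries, so that

$$Y_{US-UK} = a X_{US} + b X_{UK} + U_{US-UK},$$

$$Y_{US-FR} = c X_{US} + d X_{FR} + U_{US-FR},$$

where the *X* and the *U* are independent, and all *X* are pairwise-independent. In this instance, the true variance of $\hat \mu$ is

$$\text{Var}(\hat \mu) = \frac{1}{4} \left[ \text{Var}(Y_{US-UK}) + \text{Var}(Y_{US-FR}) + 2ac \, \text{Var}(X_{US}) \right].$$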

When *a* and *c* are of the same sign—that is, the dyads share a positive correlation—the naive characterization of the variance understates the true sampling variability. Our setting did not involve any “network effects” between countries: the problem emerges solely from the fact that a single country (here, the US) is, mechanically, present in more than one dyad in the data.
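As a rough numerical check of this toy example, the sketch below assumes an additive process of the form $Y_{US-UK} = a X_{US} + b X_{UK} + U_{US-UK}$ and $Y_{US-FR} = c X_{US} + d X_{FR} + U_{US-FR}$, with all *X* and *U* terms mutually independent. The function name, coefficient values, and variances are hypothetical and chosen only for illustration.

```python
def variances_of_mean(a, b, c, d, var_x_us, var_x_uk, var_x_fr, var_u_uk, var_u_fr):
    """Return (naive, true) variance of mu_hat = (Y_US_UK + Y_US_FR) / 2.

    Assumed (hypothetical) additive process:
        Y_US_UK = a * X_US + b * X_UK + U_US_UK
        Y_US_FR = c * X_US + d * X_FR + U_US_FR
    with all X and U terms mutually independent.
    """
    var_y_uk = a ** 2 * var_x_us + b ** 2 * var_x_uk + var_u_uk
    var_y_fr = c ** 2 * var_x_us + d ** 2 * var_x_fr + var_u_fr
    # The only shared component is X_US, so it alone drives the covariance.
    cov = a * c * var_x_us
    naive = 0.25 * (var_y_uk + var_y_fr)           # ignores the covariance term
    true = 0.25 * (var_y_uk + var_y_fr + 2 * cov)  # includes it
    return naive, true


# Illustrative values: unit coefficients and unit variances everywhere.
naive, true = variances_of_mean(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
print(naive, true)  # 1.5 2.0

# Flipping the sign of c makes the shared component induce negative covariance.
naive_neg, true_neg = variances_of_mean(1.0, 1.0, -1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
print(naive_neg, true_neg)  # 1.5 1.0
```

With these illustrative values, the naive variance is three quarters of the true variance when *a* and *c* share a sign, and it overstates the true variance when their signs differ.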

## 3 Standard Error Estimation with Dyadic Data

Our approach to standard error estimation largely follows the logic of Cameron and Miller (2014), in which errors are likely correlated between dyad observations that have a country in common. To ease exposition, we consider the linear model of $Y_{ijt}$ on regressors $x_{ijt}$,

$$Y_{ijt} = x_{ijt}^{\top} \beta + u_{ijt},$$

where $Y_{ijt}$ is the level of commerce between countries *i* and *j* in time period or observation *t* and $\beta$ is the slope that we obtain from fitting this model to the entire population. Under the exogeneity condition $\text{E}[u_{ijt} \mid x_{ijt}] = 0$ and the usual regularity conditions, the parameters of this model can be estimated using ordinary least squares.

The question is then how to estimate uncertainty in this model. Generically, the variance of an estimated parameter from a linear model can be represented in a symmetric form that resembles a sandwich, with two identical “bread” matrices and a “meat” matrix multiplied together in the order of bread, meat, and bread again (Aronow and Miller 2019; Davidson and MacKinnon 2004; Greene 2002):

$$\text{Var}(\hat{\beta}) = (X^{\top} X)^{-1} X^{\top} \Omega X (X^{\top} X)^{-1},$$

where *X* denotes the matrix of regressors, and $\Omega$ is the variance–covariance matrix of model errors, such that $\Omega_{ijt,i'j't'} = \text{E}[u_{ijt} u_{i'j't'}]$. Robust sandwich estimators are formed by assuming that some elements of $\Omega$ are equal to zero, and then substituting residuals for errors, and means for expectations. Thus, the empirical variance–covariance matrix of model residuals, $\hat{\Omega}$, can be plugged into the above expression of variance to arrive at a variance estimator of the following form:^{1}

$$\hat{V} = (X^{\top} X)^{-1} X^{\top} \hat{\Omega} X (X^{\top} X)^{-1}.$$

So long as there is not “too much” clustering (for a precise statement, see Aronow, Crawford, and Zubizarreta 2018), $\hat{V}$ will be a consistent estimator of the variance–covariance matrix of the sampling distribution of $\hat{\beta}$. The question then becomes, what restrictions on $\Omega$ are suitable for the problem at hand? The simplest case is to assume that there is no dependence across observations whatsoever in the data, which is the assumption for non-clustered robust standard errors (RSEs): if $i \neq i'$, $j \neq j'$, **or** $t \neq t'$, then $\text{E}[u_{ijt} u_{i'j't'}] = 0$; that is, the errors are uncorrelated. With six observations, Table 2 demonstrates the variance–covariance structure assumed by the naive approach.

In practice, the naive approach is widely recognized as inappropriate in the context of international relations. It is expected that, for example, changes in bilateral trade relations will have impacts beyond the immediate dyad and across time periods.

The most common approach is clustering by dyad, where it is assumed that errors are correlated when $i = i'$ and $j = j'$, regardless of time period. With the same six observations, Table 3 demonstrates the additional clustering permitted.

In Table 3, we can see that while this approach does account for within-dyad correlations across time, all observations in the matrix of model errors that do not share both members of the dyad are still assumed to be uncorrelated (i.e., $\text {E}[u_{ijt} u_{i'j't'}] = 0$ ).

By contrast, our approach only assumes independence when country pairs share no members. With the same six observations, we can see the clustering permitted by the dyadic clustering approach.

In the matrix shown in Table 4—full dyadic clustering—we can now see that the only observations assumed to be independent are those for which the dyad does not share any members with other observations.
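The three sets of restrictions can be compared side by side in a short sketch. The six observations below are hypothetical stand-ins (they are not taken from the tables), and the function names are illustrative. An entry of 1 marks a pair of observations whose error covariance is left unrestricted, and 0 marks a pair assumed to be uncorrelated.

```python
# Six hypothetical dyad-period observations (i, j, t); not the paper's own table.
OBS = [
    ("US", "UK", 1), ("US", "UK", 2),
    ("US", "FR", 1), ("US", "FR", 2),
    ("UK", "FR", 1), ("DE", "JP", 1),
]


def may_correlate(o1, o2, scheme):
    """True if the scheme leaves E[u_o1 * u_o2] unrestricted (not forced to zero)."""
    d1, d2 = frozenset(o1[:2]), frozenset(o2[:2])  # treat dyads as undirected
    if scheme == "naive":
        return o1 == o2        # same dyad and same period only
    if scheme == "repeated_dyad":
        return d1 == d2        # same dyad, any period
    if scheme == "dyadic":
        return bool(d1 & d2)   # dyads share at least one member
    raise ValueError(f"unknown scheme: {scheme}")


def structure(scheme):
    """0/1 matrix of unrestricted error covariances under the given scheme."""
    return [[int(may_correlate(a, b, scheme)) for b in OBS] for a in OBS]


def count_free(scheme):
    """Number of unrestricted entries in the 6 x 6 matrix."""
    return sum(map(sum, structure(scheme)))


for scheme in ("naive", "repeated_dyad", "dyadic"):
    print(scheme, count_free(scheme))
    for row in structure(scheme):
        print("  ", row)
```

In this illustrative sample, the naive approach leaves 6 of the 36 entries unrestricted, clustering by repeated dyad leaves 10, and dyadic clustering leaves 26; only the DE–JP observation, which shares no member with any other dyad, remains independent of the rest.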

More specifically, our approach to DCRSEs follows Aronow *et al.* (2015), which, in practice, allows for the DCR variance estimator to be decomposed entirely using robust variance estimators that are readily computed using popular statistical software packages.^{2} The DCR variance estimator in this decomposed form is

$$\hat{V}_r = \sum_{i=1}^{N} \hat{V}_{c,i} - \hat{V}_D - (N - 2) \hat{V}_0,$$

where $\hat{V}_r$ is the estimated DCR variance–covariance matrix for longitudinal data; $\hat{V}_{c,i}$ is the estimated dyad-member-*i*-specific clustered variance–covariance matrix; $\hat{V}_D$ is the estimated repeated-dyad clustered variance–covariance matrix; $\hat{V}_0$ is the estimated heteroskedasticity-consistent variance–covariance matrix; and *N* is the number of unique dyad members. Taking the square root of the diagonal of the DCR variance–covariance matrix yields DCRSEs for all model parameters.^{3}
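The decomposition can be checked numerically. The sketch below is a minimal illustration, not the authors' software: it assumes the decomposition takes the form $\sum_i \hat{V}_{c,i} - \hat{V}_D - (N-2)\hat{V}_0$ with *N* the number of dyad members, simulates a small hypothetical panel, and fits a regression through the origin with a single regressor so that every quantity is a scalar. All names and data are made up for the example.

```python
import itertools
import random

random.seed(0)
UNITS = ["A", "B", "C", "D"]  # hypothetical dyad members
N = len(UNITS)

# Simulated undirected dyads observed in two periods: (i, j, t, x, y).
obs = []
for i, j in itertools.combinations(UNITS, 2):
    for t in (1, 2):
        x = random.gauss(0, 1)
        obs.append((i, j, t, x, 0.5 * x + random.gauss(0, 1)))

# OLS through the origin with one regressor keeps the sandwich algebra scalar.
sxx = sum(o[3] ** 2 for o in obs)
beta_hat = sum(o[3] * o[4] for o in obs) / sxx
scores = [o[3] * (o[4] - beta_hat * o[3]) for o in obs]  # regressor times residual
dyads = [frozenset(o[:2]) for o in obs]
n = len(obs)


def sandwich(linked):
    """Variance estimate whose 'meat' sums score products over linked pairs."""
    meat = sum(scores[p] * scores[q]
               for p in range(n) for q in range(n) if linked(p, q))
    return meat / sxx ** 2


v_0 = sandwich(lambda p, q: p == q)                # heteroskedasticity-consistent
v_d = sandwich(lambda p, q: dyads[p] == dyads[q])  # clustered on repeated dyads


def v_member(i):
    """Cluster all observations containing member i; others are singletons."""
    return sandwich(lambda p, q: p == q or (i in dyads[p] and i in dyads[q]))


v_decomposed = sum(v_member(i) for i in UNITS) - v_d - (N - 2) * v_0
v_direct = sandwich(lambda p, q: bool(dyads[p] & dyads[q]))  # share any member
print(v_direct, v_decomposed)
```

The two quantities agree up to floating-point error because each pair of observations is counted exactly once after the subtraction: pairs sharing one member appear in one member-specific estimator, pairs from the same dyad appear in two and are corrected by $\hat{V}_D$, and each observation's own squared score appears *N* times and is corrected by $\hat{V}_D$ and $(N-2)\hat{V}_0$.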

Limit theorems for dyadic data (Tabord-Meehan 2019) establish that DCRSEs may be used to form asymptotically valid confidence intervals and *p*-values under a normal approximation. However, DCRSEs will tend to have more sampling variability than will conventional estimators that impose more structure (e.g., standard CRSEs), meaning that they may be unreliable for inference in small samples. Although simple corrections exist (cf. Bergé 2021; Cameron *et al.* 2011), theory and further refinements for small samples (e.g., Imbens and Kolesar 2016; Pustejovsky and Tipton 2018) remain topics of ongoing inquiry for multi-way clustering problems, including the dyadic clustering problem.
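To make the normal approximation concrete, the following sketch computes a two-sided *p*-value and a 95% confidence interval from a coefficient estimate and a standard error. The coefficient and the two standard errors are hypothetical numbers chosen so that the second standard error is twice the first; they are not taken from any reanalyzed study.

```python
import math


def two_sided_p(estimate, se):
    """Two-sided p-value for H0: coefficient = 0 under a normal approximation."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


def conf_int(estimate, se, z_crit=1.959964):
    """Symmetric confidence interval; the default z_crit gives roughly 95%."""
    return estimate - z_crit * se, estimate + z_crit * se


estimate = 0.30     # hypothetical coefficient
se_original = 0.10  # hypothetical originally reported standard error
se_dcr = 0.20       # hypothetical DCRSE, twice as large

p_original = two_sided_p(estimate, se_original)
p_dcr = two_sided_p(estimate, se_dcr)
print(round(p_original, 4), conf_int(estimate, se_original))
print(round(p_dcr, 4), conf_int(estimate, se_dcr))
```

In this made-up case, doubling the standard error moves the estimate from significant to insignificant at the 5% level, and the interval widens to include zero.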

Note that the above approach and its implementation only correct for interdependence between dyads that share a country. DCRSEs are *not* sufficient (although they still improve over the naive approach or the clustering-by-repeated-dyad approach) when there are interdependencies throughout the entire system (i.e., across non-incident dyads). Two common examples of such systemic interdependencies in IR are alliance formation and multilateral trade deals. In deciding to form an alliance, friendly countries *i* and *j* may be influenced by a previous alliance formed by countries *i* and *k* (i.e., the $ij$ alliance is more attractive now that $ik$ are also allied). However, the $ij$ alliance could also be influenced by an alliance between unfriendly countries *a* and *b*. DCRSEs do not capture the $ij$–$ab$ interdependence as there are no common dyad members. Likewise, a multilateral trade deal may also be influenced by multiple pairs of relationships that do not necessarily share members.^{4} Aronow *et al.* (2018) developed conservative estimators for the variance of least squares estimators in such cases where there is further dependence, and in such cases we expect the variance of least squares estimates to be larger than those that only take into account dyadic clustering. There is therefore still a need to understand the data-generating process and underlying theory of an empirical setting prior to choosing an estimation strategy.^{5}

## 4 Reanalyzing Previous Studies

To study the consequences of failing to account for dyadic clustering *in practice*, we reanalyze recent, prominent studies from the international relations literature by applying DCRSEs to estimates in empirical contexts where DCRSEs are uniquely suited to handle dyadic clustering.^{6}

Specifically, we reanalyze all empirical articles published in *International Organization* over a period of 6 years (from January 2014 to January 2020) that feature dyadic data. Studies were discovered by performing a Google Scholar search of all publications mentioning any form of the word “dyad” in this period. Specifically, a search query specified the publication name “International Organization” and keywords “dyadic OR dyad OR dyads.” This process returned 70 candidate studies for reanalysis.

Each study was then assessed to determine its susceptibility to dyadic clustering. Studies were excluded from reanalysis for three primary reasons: (1) the study did not actually analyze dyadic data; (2) dyadic observations primarily featured nested relationships between dyad members, for which single- or multi-way clustering of standard errors is sufficient;^{7} or (3) the dyadic observations featured a common dyad member across all observations, for example, a nominally dyadic dataset consisting entirely of U.S. bilateral trade flows. There were 22 studies not excluded by these criteria (see Supplementary Table A.2 for a list of included studies). For each of these studies, replication data were either publicly available or provided by the authors.

For each eligible study, models that featured a key explanatory variable (KEV) (Lall 2016) fit to dyadic data were identified and reanalyzed. For this reanalysis, KEVs are defined as independent variables whose parameter estimates are directly referenced in the study, or otherwise clearly pertain to the study’s stated hypotheses. Control variables are not considered to be KEVs, even if discussed or directly referenced in the study. Specifically, a model was selected for reanalysis if: (1) the model was dyadic; (2) a KEV appeared in the model; and (3) the model was not relegated to an analysis explicitly denoted as a robustness check, sensitivity analysis, or supplementary analysis, unless one of these analyses was the only dyadic analysis in the study.

The final analytic sample consisted of 691 KEVs across 174 models from 22 studies.^{8} While many studies clustered standard errors on repeated dyads, none utilized a non-parametric DCR variance estimation strategy in conducting primary analyses. Only 24 models across three studies did not use any sort of robust or clustered standard error.^{9} Three studies employ at least one model that has fixed effects for both members of the dyad under consideration.^{10} Of the 22 studies, 20 perform analysis on state (country) dyads and 2 perform analysis on international organization (IO) dyads.

All models were replicated and then re-estimated using the previously discussed DCR variance estimator formulated in Aronow *et al.* (2015) and implemented using an original suite of functions and commands for R and Stata, respectively. To ensure the comparability of results, the replications and reanalyses of selected models were conducted using the statistical software of origin. For the purposes of our reanalysis, dyads are assumed to be undirected, as modeling dyadic clustering based on directed dyads would require a stronger assumption about independence across observations.^{11} Also, for the purposes of our primary reanalysis, there are no small-sample corrections made to our standard error estimates. Finite-sample corrections would inflate our standard error estimates to account for increased sampling variability, and thus might paint an overly pessimistic picture of the original literature.^{12}

### 4.1 Results

To quantify the impact of neglecting dyadic clustering in prior empirical IR findings, we compare DCR re-estimated standard errors with non-DCR standard errors for all KEVs in all models. We compute a standard error ratio (SER) for all KEVs, which is the DCRSE divided by the standard error produced using the original variance estimation strategy.^{13} We also examine the precision of reanalyzed KEVs, with special attention paid to estimates that lose statistical significance at a conventional level (i.e., 5%).

The primary results of the reanalysis are presented as aggregated by year and subfield, as well as overall, in Table 5. The empirical distribution of SERs is presented in the histogram in Figure 1, and a breakdown of average SERs by study analyzed can be found in Supplementary Table A.2. The inverse “study frequency” weighted (ISFW) average of SERs across all KEVs from all studies is 1.74, and the ISFW proportion of KEVs that go from significant to insignificant at the 5% level across all studies is 0.22, where “study frequency” is the total number of KEVs for a given study appearing in the analytic sample.^{14} Due to the sampling variability of the standard error estimators, we see a small but nonzero proportion (0.05) of KEV estimates change from statistical insignificance to significance at the 5% level.

^{a} “SER” denotes an inverse “study frequency” weighted (ISFW) average of standard error ratios for a given level of aggregation.

^{b} “Sig. $\rightarrow$ Sig.,” “Sig. $\rightarrow$ Insig.,” “Insig. $\rightarrow$ Sig.,” and “Insig. $\rightarrow$ Insig.” denote ISFW proportions of *p*-values that change significance levels in these respective ways for a given level of aggregation.
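The ISFW aggregation can be illustrated with a small sketch. It assumes that the ISFW average is a weighted mean in which each KEV receives weight equal to one over the number of KEVs contributed by its study, so that every study counts equally; the exact definition is given in the article's footnote, and the study labels and SER values below are invented for the example.

```python
from collections import Counter


def standard_error_ratio(dcrse, original_se):
    """SER: the DCRSE divided by the originally reported standard error."""
    return dcrse / original_se


def isfw_mean(values, study_ids):
    """Weighted mean with weights 1 / (number of KEVs in the KEV's study)."""
    freq = Counter(study_ids)
    weights = [1.0 / freq[s] for s in study_ids]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical data: study "A" contributes four KEVs, study "B" only one.
study_ids = ["A", "A", "A", "A", "B"]
sers = [1.0, 1.0, 1.0, 1.0, standard_error_ratio(0.6, 0.2)]

unweighted = sum(sers) / len(sers)
weighted = isfw_mean(sers, study_ids)
print(unweighted, weighted)
```

Under this weighting, a study with many KEVs does not dominate the aggregate: the unweighted mean here is pulled toward study "A", while the ISFW mean sits midway between the two study-level averages.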

We also examine which subfields suffer most from dyadic clustering. Table 5 shows that no subfield is immune to standard error inflation. All subfields in our sample have an average SER indicating standard error inflation of more than 50%, and all subfields see 19% or more of all of their estimates change from statistically significant to insignificant at the 5% level. The “IOs and International Law” average SER indicates that DCRSEs are on average more than twice the size of originally reported standard errors, and that 34% of estimates become insignificant with DCRSEs.

For individual studies, the average SER ranges from 0.90 to 4.16, as seen in Supplementary Table A.2. On average, KEVs from 18 of 22 studies are less precise after accounting for dyadic clustering. Only 7 of 22 studies do not lose statistical significance in any KEV estimates due to DCRSE re-estimation. Three studies see half or more of their KEVs become insignificant at the 5% level upon reanalysis. Figure 2 shows how the empirical distribution of KEV *p*-values from all reanalyzed studies shifts due to the application of DCRSEs. Figure 3 visualizes how the precisions of individual KEV estimates change with the application of DCRSEs across all reanalyzed studies, with the area of each plotted data point being proportional to its inverse study frequency weight.

## 5 Conclusion

Though the need to account for the complex dependencies in dyadic data has been noted by previous researchers, these recommendations have not been commonly applied in practice. We investigated the consequences of failing to account for dyadic clustering in previous empirical research by reanalyzing all quantitative dyadic analyses in *International Organization* published in a 6-year window, thereby revealing a lower bound on the severity of the consequences of inadequate clustering practices in previous quantitative dyadic research.

We find that the standard errors associated with KEVs in reanalyzed studies are approximately half of what they would have been if calculated as DCRSEs, but that two-thirds of statistically significant KEVs remain statistically significant when using DCRSEs. Failure to compute DCRSEs therefore does not appear to have led to systematically false substantive conclusions in recent empirical IR literature, but can lead to a systematically large overestimation of the precision of estimates. In short, dyadic clustering matters, yet is not so severe as to make statistical inference using dyadic data infeasible.

However, solely accounting for dyadic clustering may not go far enough. For any of the studies we reanalyze, dependencies may exist in the data that extend across non-incident dyads, for example, due to network effects, in which case the problem may be even more severe than what our reanalysis suggests. That said, DCRSEs may be of particular interest to researchers because dyadic clustering—the clustering structure associated with dependence across dyads that share a member—is a feature of many dyadic datasets of interest to social scientists. Accordingly, we offer software in R and Stata to facilitate future analyses robust to dyadic clustering. These open-source packages implement DCRSEs for all the models in the reanalysis sample (and more), and mirror syntax familiar to users of R and Stata.

## Acknowledgements

We would like to thank Austin Jang for excellent and extensive research assistance, as well as Jonathon Baron, Laurent Bergé, Forrest Crawford, Joshua Kalla, Winston Lin, Cleo O’Brien-Udry, Paul Goldsmith-Pinkham, Cyrus Samii, Beth Tipton, and three anonymous reviewers for helpful comments and conversations.

## Supplementary Material

For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2023.26.

## Data Availability Statement

Replication code for this article can be accessed via Dataverse (Carlson *et al.* 2023). The statistical programming suite for DCR estimation is also available. To access the source code for the `dcr` command for Stata (version 15 or higher), clone its GitHub repository:

To access the `dcr` package for R, run: