The validity of conclusions drawn from specific research studies must be evaluated in light of the purposes for which the research was undertaken. We distinguish four general types of research: description and point estimation, correlation and prediction, causal inference, and explanation. For causal and explanatory research, internal validity is critical – the extent to which a causal relationship can be inferred from the results of variation in the independent and dependent variables of an experiment. Random assignment is discussed as the key to avoiding threats to internal validity. Internal validity is distinguished from construct validity (the relationship between a theoretical construct and the methods used to operationalize that concept) and external validity (the extent to which the results of a research study can be generalized to other contexts). Construct validity is discussed in terms of multiple operations and discriminant and convergent validity assessment. External validity is discussed in terms of replicability, robustness, and relevance of specific research findings.
The accumulation of empirical evidence collected in multiple contexts, places, and times requires a more comprehensive understanding of empirical research than is typically needed to interpret the findings of individual studies. We advance a novel conceptual framework in which causal mechanisms are central to characterizing social phenomena that transcend context, place, or time. We distinguish various concepts of external validity, all of which characterize the relationship between the effects produced by mechanisms in different settings. Approaches to evidence accumulation require careful consideration of cross-study features, including theoretical considerations that link constituent studies and measurement considerations about how phenomena are quantified. Our main theoretical contribution is to develop uniting principles that constitute the qualitative and quantitative assumptions forming the basis for a quantitative relationship between constituent studies. We then apply our framework to three approaches to studying general social phenomena: meta-analysis, replication, and extrapolation.
This chapter examines the generalizability of the book’s main argument. It synthesizes the conclusions of other studies on the consequences of three similar episodes of forced migration in the twentieth century: the Greek-Turkish population exchange, the Partition of India, and the repatriation of Pied-Noirs to France. It then considers ways in which the argument can be extended to other cases of forced and voluntary migration.
From Part II - The Practice of Experimentation in Sociology
Davide Barrera, Università degli Studi di Torino, Italy; Klarita Gërxhani, Vrije Universiteit, Amsterdam; Bernhard Kittel, Universität Wien, Austria; Luis Miller, Institute of Public Goods and Policies, Spanish National Research Council; Tobias Wolbring, School of Business, Economics and Society at the Friedrich-Alexander-University Erlangen-Nürnberg
Field experiments have a long tradition in some areas of the social and behavioral sciences and have become increasingly popular in sociology. Field experiments are staged in "natural" research settings where individuals usually interact in everyday life and regularly complete the task under investigation. Implementation in the field is the core feature distinguishing the approach from laboratory experiments. It is also one of the major reasons why researchers use field experiments: they allow researchers to incorporate social context, investigate subjects under "natural" conditions, and collect unobtrusive measures of behavior. However, these advantages of field experiments come at the price of reduced control. In contrast to the controlled setting of the laboratory, many factors in the field can influence the outcome but are not under the experimenter's control and are often hard to measure. Field experiments on the broken windows theory illustrate the strengths and potential pitfalls of experimenting in the field. The chapter also covers the nascent area of digital field experiments, which share key features with other types of experiments but offer exciting new ways to study social behavior by enabling the collection of large-scale data with fine-grained and unobtrusive behavioral measures at relatively low variable costs.
Survey experiments on probability samples are a popular method for investigating population-level causal questions due to their strong internal validity. However, lower survey response rates and an increased reliance on online convenience samples raise questions about the generalizability of survey experiments. We examine this concern using data from a collection of 50 survey experiments which represent a wide range of social science studies. Recruitment for these studies employed a unique double sampling strategy that first obtains a sample of “eager” respondents and then employs much more aggressive recruitment methods with the goal of adding “reluctant” respondents to the sample in a second sampling wave. This approach substantially increases the number of reluctant respondents who participate and also allows for straightforward categorization of eager and reluctant survey respondents within each sample. We find no evidence that treatment effects for eager and reluctant respondents differ substantially. Within demographic categories often used for weighting surveys, there is also little evidence of response heterogeneity between eager and reluctant respondents. Our results suggest that social science findings based on survey experiments, even in the modern era of very low response rates, provide reasonable estimates of population average treatment effects among a deeper pool of survey respondents in a wide range of settings.
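The eager-versus-reluctant comparison described above can be sketched with simulated data. Everything here is a hypothetical illustration (variable names, effect sizes, and sample shares are assumptions, not the study's data): treatment effects are estimated separately within each recruitment wave and then compared.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical survey-experiment data (illustrative only): 'wave' marks
# eager (first-wave) vs. reluctant (second-wave) respondents; the treatment
# is randomly assigned across the full sample.
n = 2000
wave = rng.binomial(1, 0.3, n)           # 1 = reluctant respondent
treat = rng.binomial(1, 0.5, n)          # random treatment assignment
y = 0.5 * treat + rng.normal(0, 1, n)    # same true effect in both groups

def ate(y, treat):
    """Difference-in-means estimate of the average treatment effect."""
    return y[treat == 1].mean() - y[treat == 0].mean()

ate_eager = ate(y[wave == 0], treat[wave == 0])
ate_reluctant = ate(y[wave == 1], treat[wave == 1])
gap = ate_reluctant - ate_eager  # near zero when effects are homogeneous
```

Because the simulated effect is identical in both groups, the estimated gap differs from zero only by sampling noise, which mirrors the paper's null finding of little response heterogeneity.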
Researchers and practitioners are increasingly embracing systems approaches to deal with the complexity of public service delivery and policy evaluation. However, there is little agreement on what exactly constitutes a systems approach, conceptually or methodologically. We review and critically synthesize systems literature from the fields of health, education, and infrastructure. We argue that the common theoretical core of systems approaches is the idea that multi-dimensional complementarities between a policy and other aspects of the policy context are the first-order problem of policy design and evaluation. We distinguish between macro-systems approaches, which focus on the collective coherence of a set of policies or institutions, and micro-systems approaches, which focus on how a single policy interacts with the context in which it operates. We develop a typology of micro-systems approaches and discuss their relationship to standard impact evaluation methods as well as to work in external validity, implementation science, and complexity theory.
Peer review supports decisions related to publications, grant proposals, awards, or personnel selection. Independent of the specific occasion, we propose validity as a chief evaluation criterion for reviews. While applicable to all occasions, the principles of validity-oriented quality control are particularly suited for journal reviews. Beyond evaluating validity and the scientific potential of a given piece of research, we address how peer reviewing serves important functions and is accountable for the growth of science at a more superordinate level. We also provide guidelines and concrete recommendations for how a good peer review may serve these functions. Good peer review, thereby, fosters both the advancement of scientific research and the quality, precision, and sincerity of the scientific literature. The end of the chapter is devoted to a core set of good reviewer practices, conceived as an essential feature of academic culture.
There is growing concern about the extent to which economic games played in the laboratory generalize to social behaviors outside the lab. Here, we show that it is possible to make a game much more predictive of field behavior by bringing contextual elements from the field to the lab. We report three experiments where we present the same participants with different versions of the dictator game and with two different field situations. The games are designed to include elements that make them progressively more similar to the field. We find a dramatic increase in lab–field correlations as contextual elements are incorporated, which has wide-ranging implications for experiments on economic decision making.
Conventional models of voting behavior depict individuals who judge governments for how the world unfolds during their time in office. This phenomenon of retrospective voting requires that individuals integrate and appraise streams of performance information over time. Yet past experimental studies short-circuit this 'integration-appraisal' process. In this Element, we develop a new framework for studying retrospective voting and present eleven experiments building on that framework. Notably, when we allow integration and appraisal to unfold freely, we find little support for models of 'blind retrospection.' Although we observe clear recency bias, we find respondents who are quick to appraise and who make reasonable use of information cues. Critically, they regularly employ benchmarking strategies to manage complex, variable, and even confounded streams of performance information. The results highlight the importance of centering the integration-appraisal challenge in both theoretical models and experimental designs and begin to uncover the cognitive foundations of retrospective voting.
Oral rotavirus vaccine efficacy estimates from randomised controlled trials are highly variable across settings. Although the randomised study design increases the likelihood of internal validity of findings, results from trials may not always apply outside the context of the study due to differences between trial participants and the target population. Here, we used a weight-based method to transport results from a monovalent rotavirus vaccine clinical trial conducted in Malawi between 2005 and 2008 to a target population of all trial-eligible children in Malawi, represented by data from the 2015–2016 Malawi Demographic and Health Survey (DHS). We reweighted trial participants to reflect the population characteristics described by the Malawi DHS. Vaccine efficacy was estimated for 1008 trial participants after applying these weights such that they represented trial-eligible children in Malawi. We also conducted subgroup analyses to examine the heterogeneous treatment effects by stunting and tuberculosis vaccination status at enrolment. In the original trial, the estimates of one-year vaccine efficacy against severe rotavirus gastroenteritis and any-severity rotavirus gastroenteritis in Malawi were 49.2% (95% CI 15.6%–70.3%) and 32.1% (95% CI 2.5%–53.1%), respectively. After weighting trial participants to represent all trial-eligible children in Malawi, vaccine efficacy increased to 62.2% (95% CI 35.5%–79.0%) against severe rotavirus gastroenteritis and 38.9% (95% CI 11.4%–58.5%) against any-severity rotavirus gastroenteritis. Rotavirus vaccine efficacy may differ between trial participants and target populations when these two populations differ. Differences in tuberculosis vaccination status between the trial sample and DHS population contributed to varying trial and target population vaccine efficacy estimates.
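A minimal sketch of this kind of transport weighting, using a single binary covariate and entirely synthetic data (the covariate prevalences, risks, and outcome model below are illustrative assumptions, not the trial's actual data): trial participants are reweighted by the ratio of the covariate's distribution in the target population to its distribution in the trial, and vaccine efficacy is recomputed under those weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a binary covariate (think of something like BCG
# vaccination status) with different prevalence in the trial sample vs.
# the target population.
n_trial, n_target = 1008, 5000
x_trial = rng.binomial(1, 0.70, n_trial)    # 70% prevalence in the trial
x_target = rng.binomial(1, 0.90, n_target)  # 90% prevalence in the target

# Transport weights: w(x) = P(x | target) / P(x | trial), in closed form
# because the covariate is binary.
p_target = np.array([np.mean(x_target == v) for v in (0, 1)])
p_trial = np.array([np.mean(x_trial == v) for v in (0, 1)])
w = (p_target / p_trial)[x_trial]  # one weight per trial participant

# Toy outcomes with a heterogeneous effect: the vaccine is more protective
# in the x = 1 subgroup, so transporting to a target with more x = 1
# should raise estimated efficacy.
treat = rng.binomial(1, 0.5, n_trial)
risk_control = 0.10
risk_vaccinated = np.where(x_trial == 1, 0.03, 0.08)
y = rng.binomial(1, np.where(treat == 1, risk_vaccinated, risk_control))

def weighted_ve(y, treat, w):
    """Vaccine efficacy = 1 - weighted risk ratio (vaccine vs. control)."""
    r1 = np.average(y[treat == 1], weights=w[treat == 1])
    r0 = np.average(y[treat == 0], weights=w[treat == 0])
    return 1 - r1 / r0

ve_sample = weighted_ve(y, treat, np.ones(n_trial))  # unweighted trial VE
ve_target = weighted_ve(y, treat, w)                 # transported VE
```

The weights average exactly one over the trial sample by construction; the gap between `ve_sample` and `ve_target` is driven entirely by the covariate-shift-times-heterogeneity mechanism the abstract describes.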
Lakshmi Balachandran Nair, Libera Università Internazionale degli Studi Sociali Guido Carli, Italy; Michael Gibbert, Università della Svizzera Italiana, Switzerland; Bareerah Hafeez Hoorani, Radboud University Nijmegen, Institute for Management Research, The Netherlands
We discuss the single embedded case study design in this chapter. We deliberate how this design is different from multiple and single holistic designs in terms of the levels of analysis and the nature of replication. The selection rationale and sampling are discussed next. Afterwards, we move on to the longitudinal and/or cross-sectional single embedded designs. The strengths and the weaknesses of the design in terms of internal validity, external validity, and the number of variables are discussed subsequently. This chapter also discusses the (mis)conception regarding longitudinal designs and temporal embedded units.
Lakshmi Balachandran Nair, Libera Università Internazionale degli Studi Sociali Guido Carli, Italy; Michael Gibbert, Università della Svizzera Italiana, Switzerland; Bareerah Hafeez Hoorani, Radboud University Nijmegen, Institute for Management Research, The Netherlands
We introduce and define the single holistic case study design in this chapter. The strengths of the design are discussed in detail, with examples. In particular, we discuss the potential of single holistic design in providing a detailed explanation of processes. Single holistic case studies also explore the theorizing potential of unique cases which hold the potential to reveal new dimensions of a phenomenon or falsify/refute an existing theory. Relatively high data access, construct validity, potential to include an unlimited number of variables, etc., are some other strengths that we discuss. The weaknesses of the design (i.e. low internal and external validity) are discussed afterwards. The chapter also addresses some common (mis)conceptions regarding single holistic designs and their external validity.
Lakshmi Balachandran Nair, Libera Università Internazionale degli Studi Sociali Guido Carli, Italy; Michael Gibbert, Università della Svizzera Italiana, Switzerland; Bareerah Hafeez Hoorani, Radboud University Nijmegen, Institute for Management Research, The Netherlands
The final chapter in this book discusses some methodological considerations and debates surrounding case study research and its quality. In particular, we revisit the topic of research paradigms (i.e. positivism and interpretivism). Relatedly, we discuss different quality criteria as proposed by prior researchers from both paradigmatic camps. In particular, we focus on the rigor versus trustworthiness discussion and the internal versus external validity debate. Afterwards, we briefly discuss the iterative cycles of data collection and analysis one would encounter during a qualitative case study research process. We end the chapter (and subsequently the book) with a guiding framework which will help researchers in sequencing case study designs by acknowledging the weaknesses of individual designs and leveraging their strengths. The framework can be adopted and adapted to suit the specific research objectives of the study in hand.
Lakshmi Balachandran Nair, Libera Università Internazionale degli Studi Sociali Guido Carli, Italy; Michael Gibbert, Università della Svizzera Italiana, Switzerland; Bareerah Hafeez Hoorani, Radboud University Nijmegen, Institute for Management Research, The Netherlands
We discuss multiple case studies in this chapter. We start with a discussion of theoretical sampling and replication logic, focusing specifically on literal and theoretical replication (LR and TR) in connection with multiple case studies. The strengths and limitations of LR and TR are discussed thereafter. In particular, we deliberate upon the potential of TR to enhance the internal and external validity of a case study. We then address some common (mis)conceptions regarding replication logic, internal validity, external validity (generalizability), and reliability. We also discuss how multiple case studies might need to sacrifice depth of observation for breadth, along with other potential weaknesses, such as the smaller number of independent variables and the difficulty of controlling context.
Prototype faces, created by averaging faces from several individuals sharing a common characteristic (for example a certain personality trait), can be used for highly informative experimental designs in face research. Although the facial prototype method is both ingenious and useful, we argue that its implementation is associated with three major issues: lack of external validity and non-independence of the units of information, both aggravated by a lack of transparency regarding the methods used and their limitations. Here, we describe these limitations and illustrate our claims with a systematic review of studies creating facial stimuli using the prototypes dataset ‘Faceaurus’. We then propose some solutions that can eliminate or reduce these problems. We provide recommendations for future research employing this method on how to produce more generalisable and replicable results.
During military operations, soldiers are required to successfully complete numerous physical and cognitive tasks concurrently. Understanding the typical variance in research tools that may be used to provide insight into the interrelationship between physical and cognitive performance is therefore highly important. This study assessed the inter-day variability of two military-specific cognitive assessments: a Military-Specific Auditory N-Back Task (MSANT) and a Shoot-/Don’t-Shoot Task (SDST) in 28 participants. Limits of agreement ±95% confidence intervals, standard error of the mean, and smallest detectable change were calculated to quantify the typical variance in task performance. All parameters within the MSANT and SDST demonstrated no mean difference for trial visit in either the seated or walking condition, with equivalency demonstrated for the majority of comparisons. Collectively, these data provided an indication of the typical variance in MSANT and SDST performance, while demonstrating that both assessments can be used during seated and walking conditions.
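The agreement statistics named above can be illustrated with a small sketch on synthetic inter-day scores (the sample size matches the study's 28 participants, but the scores and noise level are invented for illustration): 95% limits of agreement from the differences between days, plus the standard error of measurement and the smallest detectable change derived from the same spread.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical inter-day scores for 28 participants on a cognitive task
# (synthetic numbers; not the study's data).
n = 28
day1 = rng.normal(100, 10, n)
day2 = day1 + rng.normal(0, 4, n)   # small day-to-day measurement noise

diff = day2 - day1
bias = diff.mean()                  # systematic inter-day difference
sd_diff = diff.std(ddof=1)

# 95% limits of agreement (Bland-Altman): bias +/- 1.96 * SD of differences.
loa_lower = bias - 1.96 * sd_diff
loa_upper = bias + 1.96 * sd_diff

# Standard error of measurement and smallest detectable change,
# using the common test-retest forms SEM = SD_diff / sqrt(2) and
# SDC = 1.96 * sqrt(2) * SEM.
sem = sd_diff / np.sqrt(2)
sdc = 1.96 * np.sqrt(2) * sem
```

Any observed change smaller than `sdc` cannot be distinguished from the typical inter-day variance, which is the sense in which these statistics bound what a military-specific assessment can detect.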
Edited by
Ruth Kircher, Mercator European Research Centre on Multilingualism and Language Learning, and Fryske Akademy, Netherlands; Lena Zipp, Universität Zürich
This chapter provides a comprehensive overview of the verbal-guise technique (henceforth VGT), a variant of the speaker evaluation paradigm in which guises representing different language varieties are produced by different speakers, each speaking in their habitual language variety. First, key features of the VGT are discussed. Second, a brief historical sketch of the technique’s introduction, development, and proliferation in language attitudes research is offered. Third, key advantages (e.g. speaker authenticity, ease of implementation) and disadvantages (e.g. lack of full experimental control) of the technique are reviewed. Fourth, various practical considerations surrounding research planning and design are described and several recommendations are offered, including: (a) matching speakers on various demographic factors (e.g. sex, age), (b) matching speakers on extraneous vocal characteristics not of interest in the study (e.g. pitch), and (c) using multiple speakers to represent each variety of interest. Fifth, the main concerns surrounding the analysis and interpretation of data obtained using the VGT are discussed. Finally, a brief sketch is provided of the methodological considerations that were involved in designing a recent study utilising the VGT, which examined Americans’ attitudes toward standard American English and nine foreign accents: Arabic, Farsi, French, German, Hindi, Hispanic, Mandarin, Russian, and Vietnamese.
Woolcock focuses on the utility of qualitative case studies for addressing the decision-maker’s perennial external validity concern: What works there may not work here. He asks how to generate the facts that are important in determining whether an intervention can be scaled and replicated in a given setting. He focuses our attention on three categories: 1) causal density, 2) implementation capability, and 3) reasoned expectations about what can be achieved by when. Experiments are helpful for sorting out causally simple outcomes like the impact of deworming, but they are less insightful when there are many causal pathways, feedback loops, and exogenous influences. Nor do they help sort out the effect of mandate, management capacity, and supply chains, or the way results will materialize – whether some will materialize before others or increase or dissipate over time. Analytic case studies, Woolcock argues, are the main method available for assessing the generalizability of any given intervention.
Achen aims to correct what he perceives as an imbalance in favor of randomized controlled trials – experiments – within contemporary social science. “The argument for experiments depends critically on emphasizing the central challenge of observational work – accounting for unobserved confounders – while ignoring entirely the central challenge of experimentation – achieving external validity,” he writes. Using the mathematics behind randomized controlled trials to make his point, he shows that once this imbalance is corrected, we are closer to Cartwright’s view (Chapter 2) than to the current belief that RCTs constitute the gold standard for good policy research. Achen concludes: “Causal inference of any kind is just plain hard. If the evidence is observational, patient consideration of plausible counterarguments, followed by the assembling of relevant evidence, can be, and often is, a painstaking process.” Well-structured qualitative case studies are one important tool; experiments, another.
Chapter 3 focuses on how to think when evaluating experiments. This includes a discussion of realism, particularly why mundane realism, or resemblance to the "real world," receives far too much attention, as well as an overview of how to design experimental treatments. The chapter then turns to validity issues, offering a new way to think about external validity in assessing experiments. This includes a detailed discussion of sampling and why the onus of justifying a sample should fall more on critics of an experimental sample than on the experimentalist.