The purpose of this chapter is to set the stage for the book and for the upcoming chapters. We first provide an overview of classical information-theoretic problems and solutions. We then discuss emerging applications of information-theoretic methods in various data-science problems and, where applicable, refer the reader to related chapters in the book. Throughout this chapter, we highlight the perspectives, tools, and methods that play important roles both in classical information-theoretic paradigms and in emerging areas of data science. Table 1.1 provides a summary of the different topics covered in this chapter and highlights the different chapters that can be read as a follow-up to these topics.
The complex interaction between visual and print culture is central to transitions in definitions and perceptions of Black personhood and mid-century African American literature. Marshaled by race science and criminology, and underwriting the emergence of the periodical as a media form in the United States through its advertisements for fugitives and enslaved Africans for sale, visuality was imbricated with print. Autumn Womack approaches visuality in the Anglo-African Magazine through its statistical essays in order to contend that the magazine was brokering a transition from a text-based articulation of Black freedom to one figured in and through visuality. The “visual grammar” that resulted, she argues, presented Black freedom as both accomplished and aspirational. Womack thus further elaborates the scholarly contention that Black American practitioners of visuality ranged beyond those using photography, such as Frederick Douglass and Sojourner Truth, to those debating the visual’s affordances in written texts, including Martin Delany, William J. Wilson, Ida B. Wells, Harriet Jacobs, Paul Laurence Dunbar, Louisa Picquet, W. E. B. Du Bois, and Booker T. Washington. She argues that the Anglo-African Magazine explored and reframed the conceptual and visual dimensions of Black freedom by asking what optic strategies and practices it both demanded and engendered.
Equipped with a vocabulary to do justice to the performative dimensions of social scientific knowledge production, this chapter revisits the controversial study of sentencing disparities introduced in Chapter 1. Here, I emphasize the performative effects of statistical forms of ordering and analysing judicial decision-making. What are the consequences of treating judicial cases as bundles of legal and social characteristics? How is the promise of “equality before the Law” operationalized? How is the law itself reconfigured as a result of this analysis? What kind of population – of cases, of individuals – does this analysis presuppose? Together, these questions bring us closer to a crucial performativity of social-scientific accounts of judicial practices, especially those assisted by quantitative measurement and statistical analyses. By disaggregating cases into case and defendant “factors” and portraying criminal law as a machine distributing justice over an internally stratified population, this approach produces a reality the judges do not recognize as their own. Hence, this chapter also comments on the methodological limitations of statistical analyses in the study of actually existing, concrete work practices, demonstrating how such a methodological approach black-boxes judicial decision-making and disaggregates cases into “factors”.
Most theories and hypotheses in psychology are verbal in nature, yet their evaluation overwhelmingly relies on inferential statistical procedures. The validity of the move from qualitative to quantitative analysis depends on the verbal and statistical expressions of a hypothesis being closely aligned—that is, on the two referring to roughly the same set of hypothetical observations. Here I argue that many applications of statistical inference in psychology fail to meet this basic condition. Focusing on the most widely used class of model in psychology—the linear mixed model—I explore the consequences of failing to statistically operationalize verbal hypotheses in a way that respects researchers' actual generalization intentions. I demonstrate that whereas the "random effect" formalism is used pervasively in psychology to model inter-subject variability, few researchers accord the same treatment to other variables they clearly intend to generalize over (e.g., stimuli, tasks, or research sites). The under-specification of random effects imposes far stronger constraints on the generalizability of results than most researchers appreciate. Ignoring these constraints can dramatically inflate false-positive rates, and often leads researchers to draw sweeping verbal generalizations that lack a meaningful connection to the statistical quantities on which they are putatively based. I argue that failure to take the alignment between verbal and statistical expressions seriously lies at the heart of many of psychology's ongoing problems (e.g., the replication crisis), and conclude with a discussion of several potential avenues for improvement.
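To illustrate the core point (this sketch is not from the paper; the design, variable names, and parameter values below are all hypothetical), the following Python simulation generates data with no true condition effect but with subject and stimulus variability, where each condition uses its own stimulus set. A by-subject analysis that ignores stimulus sampling produces a false-positive rate far above the nominal 5%, while an analysis that treats stimuli as the sampled unit stays near it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subj, n_stim_per_cond, n_sims, alpha = 30, 8, 2000, 0.05
fp_by_subject = fp_by_stimulus = 0

for _ in range(n_sims):
    subj = rng.normal(0, 1.0, n_subj)                 # subject random effects
    stim_a = rng.normal(0, 1.0, n_stim_per_cond)      # stimulus effects, condition A
    stim_b = rng.normal(0, 1.0, n_stim_per_cond)      # stimulus effects, condition B
    y_a = subj[:, None] + stim_a[None, :] + rng.normal(0, 1.0, (n_subj, n_stim_per_cond))
    y_b = subj[:, None] + stim_b[None, :] + rng.normal(0, 1.0, (n_subj, n_stim_per_cond))

    # By-subject analysis: average over stimuli, paired t-test across subjects.
    # The stimulus-set difference survives averaging as a shared offset, so p is too small.
    p_subj = stats.ttest_rel(y_a.mean(axis=1), y_b.mean(axis=1)).pvalue
    # By-stimulus analysis: average over subjects, two-sample t-test across stimuli.
    p_stim = stats.ttest_ind(y_a.mean(axis=0), y_b.mean(axis=0)).pvalue

    fp_by_subject += p_subj < alpha
    fp_by_stimulus += p_stim < alpha

print(f"false-positive rate, subjects-only analysis: {fp_by_subject / n_sims:.3f}")  # well above 0.05
print(f"false-positive rate, by-stimulus analysis:   {fp_by_stimulus / n_sims:.3f}")  # near 0.05
```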
Recent work on structure–processing relationships in polymer semiconductors has demonstrated the versatility and control of thin-film microstructure offered by meniscus-guided coating (MGC) techniques. Here, we analyze the qualitative and quantitative aspects of solution shearing, a model MGC method, using coating blades augmented with arrays of pillars. The pillars induce local regions of high strain rates—both shear and extensional—not otherwise possible with unmodified blades, and we use fluid mechanical simulations to model and study a variety of pillar spacings and densities. We then perform a statistical analysis of 130 simulation variables to find correlations with three dependent variables of interest: thin-film degree of crystallinity and transistor field-effect mobilities for charge transport parallel (μpara) and perpendicular (μperp) to the coating direction. Our study suggests that simple fluid mechanical models can reproduce substantive correlations between the induced fluid flow and important performance metrics, providing a methodology for optimizing blade design.
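As a generic illustration of the correlation-screening step only (the data here are synthetic, and the column names, number of variables, and choice of Pearson correlation are assumptions rather than details taken from the study), such an analysis might be set up as follows:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_designs = 60
# Synthetic stand-ins for simulation-derived flow variables (one row per blade design).
flow = pd.DataFrame(rng.normal(size=(n_designs, 10)),
                    columns=[f"flow_var_{i}" for i in range(10)])
# Synthetic dependent variables, loosely tied to two of the flow variables.
targets = pd.DataFrame({
    "crystallinity": 0.8 * flow["flow_var_0"] + rng.normal(0, 0.5, n_designs),
    "mu_para":       0.5 * flow["flow_var_1"] + rng.normal(0, 0.5, n_designs),
    "mu_perp":       rng.normal(0, 1.0, n_designs),
})

# Pearson correlation of every simulation variable with each dependent variable,
# ranked by absolute strength to flag candidate structure-processing relationships.
corr = pd.concat([flow, targets], axis=1).corr().loc[flow.columns, targets.columns]
for t in targets.columns:
    print(f"\nTop correlates of {t}:")
    print(corr[t].abs().sort_values(ascending=False).head(5))
```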
In this chapter, students learn about the levels of measurement that social scientists work with when collecting data. The most common system for conceptualizing quantitative data was developed by Stevens, who defined four levels of data, which are (in ascending order of complexity) nominal, ordinal, interval, and ratio-level data. Nominal data consist of mutually exclusive and exhaustive categories, which are then given an arbitrary number. Ordinal data have all of the qualities of nominal data, but the numbers in ordinal data also indicate rank order. Interval data are characterized by all the traits of nominal and ordinal data, but the spacing between numbers is equal across the entire length of the scale. Finally, ratio data are characterized by the presence of an absolute zero. Higher levels of data contain more information, and it is always possible to convert data from one level to a lower level, but never to a higher level than that at which they were collected. It is important to recognize the level of data because certain mathematical procedures require certain levels of data. Social scientists who ignore the level of their data risk producing meaningless results or distorted statistics.
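A minimal sketch of the downgrade-only rule, using hypothetical income figures (the chapter itself contains no code):

```python
import pandas as pd

# Annual income in dollars is ratio-level data: equal intervals and a true zero.
income = pd.Series([0, 18_500, 42_000, 67_250, 120_000], name="income_usd")

# Converting downward is always possible: collapse ratio data into ordered (ordinal) brackets.
brackets = pd.cut(
    income,
    bins=[-1, 25_000, 75_000, float("inf")],
    labels=["low", "middle", "high"],  # rank order is kept, exact spacing is lost
)
print(brackets)

# Converting upward is not: knowing only that a respondent falls in the "middle"
# bracket cannot recover whether the original value was 42,000 or 67,250, so
# ordinal data cannot be turned back into interval- or ratio-level data.
```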
In this chapter we present an introduction to the vast subject of non-Gaussian perturbations. We concentrate mainly on the bispectrum and the trispectrum. We define some standard shapes of the bispectrum in Fourier space and translate them to angular space. For a description of an arbitrary N-point function on the sky, we introduce a basis of rotation-invariant functions on the sphere in Appendix 4. This chapter is new to the second edition.
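For reference, the standard Fourier-space definitions of the bispectrum and the connected trispectrum take the form below; the chapter's own conventions and normalization may differ.

```latex
\langle \delta(\mathbf{k}_1)\,\delta(\mathbf{k}_2)\,\delta(\mathbf{k}_3)\rangle
  = (2\pi)^3\,\delta_{\mathrm D}(\mathbf{k}_1+\mathbf{k}_2+\mathbf{k}_3)\,B(k_1,k_2,k_3),
\qquad
\langle \delta(\mathbf{k}_1)\,\delta(\mathbf{k}_2)\,\delta(\mathbf{k}_3)\,\delta(\mathbf{k}_4)\rangle_c
  = (2\pi)^3\,\delta_{\mathrm D}(\mathbf{k}_1+\mathbf{k}_2+\mathbf{k}_3+\mathbf{k}_4)\,
    T(\mathbf{k}_1,\mathbf{k}_2,\mathbf{k}_3,\mathbf{k}_4).
```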
The second edition of Statistics for the Social Sciences prepares students from a wide range of disciplines to interpret and learn the statistical methods critical to their field of study. By using the General Linear Model (GLM), the author builds a foundation that enables students to see how statistical methods are interrelated and to build on these basic skills. The author makes statistics relevant to students' varying majors by using fascinating real-life examples from the social sciences. Students who use this edition will benefit from clear explanations, warnings against common erroneous beliefs about statistics, and the latest developments in the philosophy, reporting, and practice of statistics in the social sciences. The textbook is packed with helpful pedagogical features, including learning goals, guided practice, and reflection questions.
In this address, I examine the lexical, geographic, temporal and philosophical origins of two key concepts in modern political thought: colonies and statistics. Beginning with the Latin word colonia, I argue that the modern ideology of settler colonialism is anchored in the claim of “improvement” of both people and land via agrarian labour in John Locke's labour theory of property in seventeenth-century America, through which he sought to provide an ideological justification for both the assimilation and dispossession of Indigenous peoples. This same ideology of colonialism was turned inward a century later by Sir John Sinclair to justify domestic colonies on “waste” land in Scotland—specifically Caithness (the county within which my own grandparents were tenant farmers). Domestic colonialism, understood as the “improvement” of people (the “idle” poor and the mentally ill and disabled) through engagement in agrarian labour on waste land inside explicitly named colonies within the borders of one's own country, was first championed not only by Sinclair but also by his famous correspondent in England, Jeremy Bentham. Sinclair simultaneously coined the word statistics and was the first to use it in the English language. He defined it as the scientific gathering of mass survey data to shape state policies. Bentham embraced statistics as well. In both cases, statistics were developed and deployed to support their domestic colony schemes by creating a benchmark and roadmap for the improvement of people and land, as well as a tool to measure the colony's capacity to achieve both over time. I conclude that settler colonialism, along with the intertwined origins of domestic colonies and statistics, has important implications for the study of political science in Canada, for the history of colonialism as distinct from imperialism in modern political thought, and for the role played by intersecting colonialisms in the Canadian polity.
The basic biological principles of human development are examined, along with the history of growth studies as examples of social concerns related to health, education, politics, and human “race” biology. Important technological and statistical developments that allow growth and development to be better measured and interpreted are also covered.
Much of the literature shows that the ratings assigned by wine judges are uncertain; some authors have proposed that judges be tested, and a few wine competitions do test judges. However, no literature or competition has yet proposed a test or rating for judges based on realistic competition conditions. This article uses coefficients of multiple correlation to rate each of 54 judges who assigned ratings to 2,811 wines entered in a commercial competition. Results show that there is a strong and positive correlation between the ratings assigned by most judges to most wines. However, those correlations also show that the ratings assigned by approximately 10% of judges are indistinguishable from random assignments. Building on these correlations as a way to rate the raters, a program is underway to monitor those judges and the variations in competition protocol that may affect their ratings. (JEL Classifications: A10, C00, C10, C12, D12)
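To make the rating procedure concrete, here is a minimal sketch of a coefficient of multiple correlation computed for each judge against the other judges' scores. It assumes a complete wine-by-judge score matrix and synthetic data, so it illustrates the statistic rather than reproducing the article's exact procedure.

```python
import numpy as np

def multiple_correlation(scores: np.ndarray, judge: int) -> float:
    """Coefficient of multiple correlation between one judge's ratings and the
    ratings of the other judges on the same wines.

    `scores` is a (wines x judges) matrix; a complete matrix is assumed here for
    simplicity, whereas a real competition only has panel-level overlap."""
    y = scores[:, judge]
    X = np.delete(scores, judge, axis=1)
    X = np.column_stack([np.ones(len(y)), X])        # add intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid.var() / y.var()
    return float(np.sqrt(max(r2, 0.0)))

# Toy usage: 100 wines, 6 judges whose scores share a common "quality" signal.
rng = np.random.default_rng(0)
quality = rng.normal(0, 1, 100)
scores = quality[:, None] + rng.normal(0, 0.7, (100, 6))
scores[:, 5] = rng.normal(0, 1, 100)                 # one judge rates at random

for j in range(scores.shape[1]):
    print(f"judge {j}: R = {multiple_correlation(scores, j):.2f}")
# The random judge's R should be near zero, i.e., indistinguishable from random assignment.
```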
We describe a method to estimate background noise in atom probe tomography (APT) mass spectra and to use this information to enhance both background correction and quantification. Our approach is mathematically general for any detector exhibiting Poisson noise within a fixed data-acquisition time window, at voltages that vary through the experiment. We show that this approach accurately estimates the background observed in real experiments. The method requires, as a minimum, the z-coordinate and mass-to-charge-state data as input and can be applied retrospectively. Further improvements are obtained with additional information such as acquisition voltage. Using this method allows for improved estimation of variance in the background and more robust quantification, with quantified count limits at parts-per-million concentrations. To demonstrate applications, we show a simple peak-detection implementation, which quantitatively suppresses false positives arising from random noise sources. We additionally quantify the detectability of ¹²¹Sb in a standardized doped Si microtip as (1.5 × 10⁻⁵, 3.8 × 10⁻⁵) atomic fraction, α = 0.95. This technique is applicable to all modes of APT data acquisition and is highly general in nature, ultimately allowing for improvements in analyzing low-ionic-count species in datasets.
In this work we address the problem of estimating the probabilities of causal contacts between civilizations in the Galaxy. We make no assumptions regarding the origin and evolution of intelligent life; we simply assume a network of causally connected nodes. These nodes represent intelligent agents with the capacity to receive and emit electromagnetic signals. Here we present a three-parameter statistical Monte Carlo model of the network in a simplified sketch of the Galaxy. Our goal, using Monte Carlo simulations, is to explore the parameter space and analyse the probabilities of causal contacts. We find that the odds of making contact over decades of monitoring are low for most models, except for those of a galaxy densely populated with long-standing civilizations. We also find that the probability of causal contacts increases with the lifetime of civilizations more significantly than with the number of active civilizations. We show that the maximum probability of making a contact occurs when a civilization discovers the required communication technology.
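A minimal sketch of the causal-contact test at the heart of such a simulation (this is not the paper's three-parameter galactic model; the separation, lifetimes, and time window below are arbitrary assumptions): civilization A can be heard by civilization B if some signal emitted while A is active arrives while B is active.

```python
import numpy as np

def causal_contact(t_a, L_a, t_b, L_b, d_light_years):
    """True if a signal emitted by A during [t_a, t_a + L_a] can arrive at B,
    a light-travel distance d_light_years away, while B is active during
    [t_b, t_b + L_b] (all times in years)."""
    arrival_start = t_a + d_light_years
    arrival_end = t_a + L_a + d_light_years
    return arrival_start <= t_b + L_b and t_b <= arrival_end

# Toy Monte Carlo: random birth times and lifetimes for a pair of nodes at fixed separation.
rng = np.random.default_rng(2)
n = 100_000
births = rng.uniform(0, 1e6, size=(n, 2))         # years within an arbitrary window
lifetimes = rng.exponential(10_000, size=(n, 2))  # assumed mean civilization lifetime
d = 5_000                                         # light-years between the two nodes

hits = sum(
    causal_contact(b[0], L[0], b[1], L[1], d) or causal_contact(b[1], L[1], b[0], L[0], d)
    for b, L in zip(births, lifetimes)
)
print(f"estimated contact probability: {hits / n:.4f}")
```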
To investigate the behavior of restricted mean survival time (RMST) and designs of a two-state Markov microsimulation model through a 2 × 4 × 2 full factorial experiment.
By projecting patient-wise 15-year post-trial survival, we estimated life-years gained between an intervention and a control group using data from the Cardiovascular Outcomes for People Using Anticoagulation Strategies Study (COMPASS). Projections considered either in-trial events or post-trial medications. They were compared on three factors: (i) choice of probability of death, (ii) cycle length, and (iii) use of a half-a-cycle age correction. Three-way analysis of variance and post-hoc Tukey's Honest Significant Difference tests compared means among factors.
When both in-trial events and post-trial study medications were considered, monthly, quarterly, and semiannual cycle lengths did not differ from one another in projected life-years gained. However, the annual cycle length differed from the others: mean and 95 percent confidence interval 252.2 (190.5–313.9) days monthly, 251.8 (192.0–311.6) quarterly, 249.1 (189.7–308.5) semiannually, and 240.8 (178.5–303.1) annually. The other two factors also affected life-years gained: background probability of death (269.1 [260.3–277.9] days projected with REACH-based probabilities, 227.7 [212.6–242.8] with a USA life table); half-a-cycle age correction (245.5 [199.0–292] days with correction and 251.4 [209.1–293.7] without). When post-trial medications were not considered, only the choice of probability of death appeared to affect life-years gained.
For a large trial or cohort, to optimally project life-years gained, one should consider using (i) annual projections, (ii) life table probabilities, (iii) in-trial events, and (iv) post-trial medication use.
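To make the cycle-length and half-cycle-correction mechanics concrete, here is a minimal two-state (alive/dead) cohort-trace sketch in Python. It is a simplification of the study's microsimulation, and the probability of death used in the example is purely illustrative.

```python
import numpy as np

def life_years(annual_p_death: float, cycle_length_years: float,
               horizon_years: float, half_cycle_correction: bool) -> float:
    """Life-years from a two-state (alive/dead) Markov trace.

    The annual probability of death is converted to a per-cycle probability via
    the underlying rate; this is a cohort-trace simplification, not the study's
    patient-level microsimulation."""
    rate = -np.log(1.0 - annual_p_death)                  # annual death rate
    p_cycle = 1.0 - np.exp(-rate * cycle_length_years)    # per-cycle probability
    n_cycles = int(round(horizon_years / cycle_length_years))

    alive, ly = 1.0, 0.0
    for _ in range(n_cycles):
        nxt = alive * (1.0 - p_cycle)
        if half_cycle_correction:
            ly += 0.5 * (alive + nxt) * cycle_length_years  # average of start/end occupancy
        else:
            ly += nxt * cycle_length_years                  # occupancy counted at cycle end
        alive = nxt
    return ly

# Illustrative comparison over a 15-year horizon (annual_p_death = 0.05 is hypothetical):
for cl, label in [(1 / 12, "monthly"), (0.25, "quarterly"), (0.5, "semiannual"), (1.0, "annual")]:
    with_hc = life_years(0.05, cl, 15, half_cycle_correction=True)
    without_hc = life_years(0.05, cl, 15, half_cycle_correction=False)
    print(f"{label:>10}: {with_hc:.3f} life-years with correction, {without_hc:.3f} without")
```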
Most textbooks on regression focus on theory and the simplest of examples. Real statistical problems, however, are complex and subtle. This is not a book about the theory of regression; it is about using regression to solve real problems of comparison, estimation, prediction, and causal inference. Unlike other books, it focuses on practical issues such as sample size and missing data, and on a wide range of goals and techniques. It jumps right into methods and computer code you can use immediately. Real examples and real stories from the authors' experience demonstrate what regression can do and what its limitations are, with practical advice for understanding assumptions and implementing methods for experiments and observational studies. The authors make a smooth transition to logistic regression and GLM. The emphasis is on computation in R and Stan rather than derivations, with code available online. Graphics and presentation aid understanding of the models and model fitting.
After a heyday in the 1970s and 1980s, probability sampling became much less visible in archaeological literature as it came under assault from the post-processual critique and the widespread adoption of “full-coverage survey.” After 1990, published discussion of probability sampling rarely strayed from sample-size issues in analyses of artifacts along with plant and animal remains, and most textbooks and archaeological training limited sampling to regional survey and did little to equip new generations of archaeologists with this critical aspect of research design. A review of the last 20 years of archaeological literature indicates a need for deeper and broader archaeological training in sampling; more precise usage of terms such as “sample”; use of randomization as a control in experimental design; and more attention to cluster sampling, stratified sampling, and nonspatial sampling in both training and research.
Back-projection is an epidemiological analysis method that was developed to estimate HIV incidence using surveillance data on AIDS diagnoses. It was used extensively during the 1990s for this purpose as well as in other epidemiological contexts. Surveillance data on COVID-19 diagnoses can be analysed by the method of back-projection using information about the probability distribution of the time between infection and diagnosis, which is primarily determined by the incubation period. This paper demonstrates the value of such analyses using daily diagnoses from Australia. It is shown how back-projection can be used to assess the pattern of COVID-19 infection incidence over time and to assess the impact of control measures by investigating their temporal association with changes in incidence patterns. For Australia, these analyses reveal that peak infection incidence coincided with the introduction of border closures and social distancing restrictions, while the introduction of subsequent social distancing measures coincided with a continuing decline in incidence to very low levels. These associations were not directly discernible from the daily diagnosis counts, which continued to increase after the first stage of control measures. It is estimated that a one-week delay in peak incidence would have led to a fivefold increase in total infections. Furthermore, at the height of the outbreak, half to three-quarters of all infections remained undiagnosed. Automated data analytics applied to routinely collected surveillance data is a valuable monitoring tool for the COVID-19 pandemic and may be useful for calibrating transmission dynamics models.
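As a sketch of the general technique only (not the authors' implementation, and omitting the smoothing step usually added in practice), the classic EM back-projection redistributes observed diagnosis counts over likely infection days using the delay distribution:

```python
import numpy as np

def back_project(diagnoses, delay_pmf, n_iter=200):
    """Estimate daily infection incidence from daily diagnosis counts by EM
    back-projection, given the probability mass function of the infection-to-
    diagnosis delay (in days). Minimal sketch without smoothing."""
    T = len(diagnoses)
    lam = np.full(T, diagnoses.mean() + 1e-9)          # initial incidence guess
    cum = np.cumsum(delay_pmf)
    # F[i] = probability an infection on day i is diagnosed by the last observed day
    F = np.array([cum[min(T - 1 - i, len(cum) - 1)] for i in range(T)])
    for _ in range(n_iter):
        # Expected diagnoses on each day under the current incidence estimate
        mu = np.array([
            sum(lam[i] * delay_pmf[t - i]
                for i in range(max(0, t - len(delay_pmf) + 1), t + 1))
            for t in range(T)
        ])
        mu = np.maximum(mu, 1e-12)
        # EM update: redistribute observed counts back to their likely infection days
        lam = (lam / np.maximum(F, 1e-12)) * np.array([
            sum(diagnoses[t] * delay_pmf[t - i] / mu[t]
                for t in range(i, min(T, i + len(delay_pmf))))
            for i in range(T)
        ])
    return lam

# Example usage with an assumed flat 21-day delay distribution (illustrative only):
# delay = np.ones(21) / 21.0
# incidence = back_project(np.asarray(daily_diagnosis_counts, dtype=float), delay)
```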
What was the contribution of European integration to the economic history of Western Europe? On this issue, too, the EU often claims to have been both important and successful while, in fact, there is surprisingly little research on its economic effects. This chapter argues that the EC did indeed contribute to growing material prosperity in the member states during the Cold War. However, this contribution remained rather modest, at well below half of 1 per cent of additional GDP growth per annum. The European Community had greater weight in relative terms during the 1970s and 1980s than during the 1950s and 1960s, even if this has been generally overlooked to date. It thus played a greater role once the post-war boom was over, and, without it, the slump would have been even worse. Those aspects aside, the location of the economic within the integration process remained curiously vague during the Cold War. Economic integration was on the one hand an end in itself to promote prosperity; on the other, it was always just a means to achieve overarching political objectives.
People often find statistics confusing because anecdotes tell stories more effectively and no one’s direct experience matches the statistical realities. The younger an individual is when introduced to any drug, the higher the risk of developing dependence. This is especially true for marijuana because it affects neurodevelopment in early adolescence. However, Horwood has shown that the lifetime rate of marijuana dependence does not accurately portray the overall progression of use, because the majority of those who ever become dependent discontinue or reduce use sufficiently to no longer meet the DSM criteria for Cannabis Use Disorder (CUD). While 43% of those with onset of marijuana use at 13 years old meet criteria for CUD at some time by age 30, only 15% are dependent during the previous year at age 30. The generally accepted rate of CUD for those 12 and older who have ever used marijuana is approximately 9%, compared with a 15% dependence rate for alcohol. The more frequently individuals use marijuana, the more they use on each occasion. The increased rates of marijuana use in Conduct Disorder (CD), Antisocial Personality Disorder (ASPD), and Attention Deficit Hyperactivity Disorder (ADHD) are also discussed.
Using engaging prose, Mary E. Harrington introduces neuroscience students to the principles of scientific research, including selecting a topic, designing an experiment, analyzing data, and presenting research. This new third edition updates and clarifies the book's wealth of examples while maintaining the clear and effective practical advice of the previous editions. New and expanded topics in this edition include techniques such as optogenetics and conditional transgenes, as well as a discussion of rigor and reproducibility in neuroscience research. Extended coverage of descriptive and inferential statistics arms readers with the analytical tools needed to interpret data. Throughout, practical guidelines are provided on avoiding experimental design problems, presenting research (including creating posters and giving talks), and using a '12-step guide' to reading scientific journal articles.