In this chapter, students learn about the levels of measurement that social scientists engage in when collecting data. The most common system for conceptualizing quantitative data was developed by Stevens, who defined four levels of data, which are (in ascending order of complexity) nominal, ordinal, interval, and ratio-level data. Nominal data consist of mutually exclusive and exhaustive categories, which are then given an arbitrary number. Ordinal data have all of the qualities of nominal data, but the numbers in ordinal data also indicate rank order. Interval data are characterized by all the traits of nominal and ordinal data, but the spacing between numbers is equal across the entire length of the scale. Finally, ratio data are characterized by the presence of an absolute zero. Higher levels of data contain more information, although it is always possible to convert from one level of data to a lower level. It is not possible to convert data to a higher level than it was collected at. It is important to recognize the level of data because there are certain mathematical procedures that require certain levels of data. Social scientists who ignore the level of their data risk producing meaningless results or distorted statistics.
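As a minimal sketch of converting data to a lower level, the example below collapses ratio-level ages into ranked (ordinal) categories and then into two groups treated as unordered (nominal) labels. The ages, cut-offs, and function names are hypothetical and chosen only for illustration; they do not come from the chapter.

```python
# Hypothetical ratio-level data: ages in years (illustrative values only).
ages = [12, 35, 67, 18, 45]

def to_ordinal(age):
    """Collapse a ratio-level age into a ranked category (1 < 2 < 3)."""
    if age < 18:
        return 1  # youngest group
    if age < 65:
        return 2  # middle group
    return 3      # oldest group

def to_nominal(age):
    """Collapse further into two groups; the codes are arbitrary labels."""
    return 1 if age < 18 else 0

ordinal = [to_ordinal(a) for a in ages]
nominal = [to_nominal(a) for a in ages]

print(ordinal)  # [1, 2, 3, 2, 2]
print(nominal)  # [1, 0, 0, 0, 0]
```

The reverse conversion is not possible: nothing in the list of group codes allows the original ages to be recovered.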
The chapter on visual models discusses basic ways that scientists create visual representations of their data, including charts and graphs, in order to understand their data better. Like all models, visual models are a simplified version of reality. Two of the visual models discussed in this chapter are the frequency table and histogram. The histogram, in particular, is useful for ascertaining the shape of the distribution of data, including its skewness, kurtosis, and number of peaks. Other visual models in the social sciences include frequency polygons, bar graphs, stem-and-leaf plots, line graphs, pie charts, and scatterplots. All of these visual models help researchers understand their data in different ways, though none is perfect for all situations. Modern technology has resulted in the creation of new ways to visualize data. These methods are more complex, but they provide data analysts with new insights into their data. The incorporation of geographic data, animations, and interactive tools gives people more options than ever existed in previous eras.
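A frequency table and a rough text histogram can be sketched in a few lines. The scores below are hypothetical, and the layout is one possible choice rather than the chapter's own format.

```python
from collections import Counter

# Hypothetical scores (illustrative values only).
scores = [3, 4, 4, 5, 5, 5, 6, 6, 7]

freq = Counter(scores)
n = len(scores)

# Each row: (score, frequency, proportion of the sample).
frequency_table = [(value, freq[value], freq[value] / n) for value in sorted(freq)]

print(f"{'Score':>5} {'Freq':>5} {'Proportion':>10}")
for value, count, proportion in frequency_table:
    # The row of '#' characters acts as a sideways histogram bar.
    print(f"{value:>5} {count:>5} {proportion:>10.3f} {'#' * count}")
```

With these values the bars rise to a single peak at a score of 5 and fall off evenly on both sides, which is the kind of shape information a histogram makes visible.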
When the dependent variable consists of nominal data, it is necessary to conduct a χ2 test, of which there are two types in this chapter: the one-variable χ2 test and the two-variable χ2 test. The former procedure tests the null hypothesis that each group formed by the independent variable is equal to a hypothesized proportion. The two-variable χ2 test has the null hypothesis that the two variables are uncorrelated. Both procedures use the same eight steps as all NHSTs.
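A minimal sketch of the one-variable χ2 test follows, using the standard formula in which the statistic is the sum of (observed − expected)² / expected across groups. The counts and hypothesized proportions are hypothetical, and the critical value is taken from a standard χ2 table rather than computed.

```python
# Hypothetical observed counts for three groups and the proportions
# hypothesized under the null (illustrative values only).
observed = [30, 50, 20]
hypothesized = [0.25, 0.50, 0.25]

n = sum(observed)
expected = [p * n for p in hypothesized]

# Chi-squared statistic: sum of squared deviations scaled by expected counts.
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # degrees of freedom = number of groups - 1

# Critical value from a chi-squared table for alpha = .05 and df = 2.
critical_value = 5.991
reject_null = chi_squared > critical_value

print(f"chi-squared = {chi_squared:.3f}, df = {df}, reject null: {reject_null}")
```

Here the observed statistic falls below the critical value, so the null hypothesis would be retained for these made-up counts.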
The effect sizes for χ2 tests are the odds ratio (for both χ2 tests) and the relative risk (for the two-variable χ2 test). When these effect sizes are equal to 1.0, the outcome of interest is equally likely for both groups. When these effect sizes are greater than 1.0, the outcome of interest is more likely for the non-baseline group. When these values are less than 1.0, the outcome of interest is more likely for the baseline group. However, odds ratio and relative risk values are not interchangeable. When there are more than two groups or two outcomes, calculating an effect size requires either (1) calculating more than one odds ratio, or (2) combining groups together.
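The two effect sizes can be sketched for a 2 × 2 table as follows. The counts are hypothetical, and both ratios are assumed to place the non-baseline group in the numerator, which matches the interpretation given above (values greater than 1.0 favor the non-baseline group).

```python
# Hypothetical 2x2 counts: each group's members with and without the outcome.
baseline = {"yes": 10, "no": 90}
non_baseline = {"yes": 20, "no": 80}

def odds(group):
    """Odds of the outcome: members with it divided by members without it."""
    return group["yes"] / group["no"]

def risk(group):
    """Probability of the outcome: members with it divided by all members."""
    return group["yes"] / (group["yes"] + group["no"])

odds_ratio = odds(non_baseline) / odds(baseline)
relative_risk = risk(non_baseline) / risk(baseline)

print(f"odds ratio = {odds_ratio:.2f}")        # 2.25
print(f"relative risk = {relative_risk:.2f}")  # 2.00
```

The same table yields two different values, which illustrates why the odds ratio and the relative risk cannot be used interchangeably.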
This chapter covers fundamental information that students must know in order to correctly conduct and interpret statistical analyses. The first section discusses why students in the social sciences need to learn statistics. The second section is a primer on the basics of research design, including the nature of research hypotheses and research questions, the difference between experimental and correlational research, and how descriptive statistics and inferential statistics serve different purposes. These foundational concepts are necessary to understand the rest of the textbook.
The final section of the chapter discusses the essential characteristics of models. Every statistical procedure creates a model of the data. Models are simplified versions of the world that make reality easier to understand. Fundamentally, all models are wrong, but the goal of scientists is to create models that are useful in explaining processes, making predictions, and building understanding of phenomena. The lesson distinguishes between theories, theoretical models, statistical models, and visual models so that students are equipped to deal with these concepts in later chapters.
This chapter serves as a guide to common advanced statistical methods: multiple regression, two-way and three-way analysis of variance, logistic regression, multiple logistic regression, Spearman’s rho correlation, the Wilcoxon rank-sum test, and the Kruskal-Wallis test. Each of these explanations is accompanied by a software guide that shows how to conduct the procedure and interpret the results. There is also a brief description of common multivariate procedures.
Chapter 5 teaches how data analysts can change the scale of a distribution by performing a linear transformation, which is the process of adding, subtracting, multiplying, or dividing the data by a constant. Adding and subtracting a constant will change the mean of a variable, but not its standard deviation or variance. Multiplying and dividing by a constant will change the mean, the standard deviation, and the variance of a dataset. A table shows students how linear transformations change the values of models of central tendency and variability. One special linear transformation is the z-score. Every set of z-scores has a mean of 0 and a standard deviation of 1. Putting datasets on a common scale permits comparisons across different units. Linear transformations, like the z-score transformation, force the data to have the desired mean and standard deviation, yet they do not change the shape of the distribution – only its scale. Indeed, all scales are arbitrary, and scientists can use linear transformations to give their data any mean and standard deviation they choose.
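The effects of these transformations can be checked directly. The sketch below uses hypothetical data and the population standard deviation (n in the denominator); this is an assumption, and the same pattern holds with the n − 1 version.

```python
import statistics

# Hypothetical raw scores (illustrative values only).
data = [2, 4, 6, 8]

def describe(values):
    """Return the mean and the population standard deviation."""
    return statistics.mean(values), statistics.pstdev(values)

shifted = [x + 10 for x in data]  # adding a constant
scaled = [x * 3 for x in data]    # multiplying by a constant

mean, sd = describe(data)
z_scores = [(x - mean) / sd for x in data]  # subtract the mean, divide by the SD

for label, values in [("raw", data), ("x + 10", shifted),
                      ("x * 3", scaled), ("z-scores", z_scores)]:
    m, s = describe(values)
    print(f"{label:>8}: mean = {m:.3f}, SD = {s:.3f}")
```

Adding 10 moves the mean from 5 to 15 and leaves the standard deviation alone, multiplying by 3 triples both, and the z-scores come out with a mean of 0 and a standard deviation of 1.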
A one-sample t-test is an NHST procedure that is appropriate when a z-test cannot be performed because the population standard deviation is unknown. The one-sample t-test follows all of the eight steps of the z-test, but requires modifications to accommodate the unknown population standard deviation. First, the formulas that used σy now use the estimated population standard deviation based on sample data instead. Second, degrees of freedom must be calculated. Finally, t-tests use a new probability distribution called a t-distribution.
This chapter also explains more about p-values. First, when p is lower than α, the null hypothesis is always rejected. Second, when p is higher than α, the null hypothesis is always retained. Therefore, we can determine whether p is smaller or larger than α by determining whether the null hypothesis was rejected or retained at that α level. This chapter also discusses confidence intervals (CIs), which are ranges of plausible values for a population parameter. CIs can vary in width, depending on the confidence level the researcher chooses. The 95% CI is most common in social science research. Finally, one-sample t-tests can be used to test the hypothesis that the sample mean is equal to any value (not just the population mean).
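A minimal sketch of a one-sample t-test and a 95% CI is given below. The sample, the hypothesized value, and the variable names are hypothetical; the critical value is read from a standard t-table (two-tailed, α = .05, df = 4) rather than computed.

```python
import math
import statistics

# Hypothetical sample and hypothesized value (illustrative only).
sample = [5, 7, 8, 9, 11]
hypothesized_mean = 6

# Critical value from a t-table: two-tailed test, alpha = .05, df = 4.
t_critical = 2.776

n = len(sample)
df = n - 1
sample_mean = statistics.mean(sample)
s = statistics.stdev(sample)        # estimated population SD (n - 1 denominator)
standard_error = s / math.sqrt(n)

t_observed = (sample_mean - hypothesized_mean) / standard_error
reject_null = abs(t_observed) > t_critical

# 95% CI: sample mean plus or minus the critical value times the standard error.
ci_lower = sample_mean - t_critical * standard_error
ci_upper = sample_mean + t_critical * standard_error

print(f"t({df}) = {t_observed:.3f}, reject null: {reject_null}")
print(f"95% CI = [{ci_lower:.3f}, {ci_upper:.3f}]")
```

For these values the observed t of 2.0 does not exceed the critical value, so the null hypothesis is retained, which implies that p is larger than .05. The CI also contains the hypothesized value of 6, which is consistent with that decision.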
A correlation coefficient can be used to make predictions of dependent variable values using a procedure called linear regression. There are two equations that can be used to perform regression: the standardized regression equation and the unstandardized regression equation. Both regression equations produce a straight line that represents the predicted value on the dependent variable for a sample member with a given X variable score.
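The two regression equations can be sketched from summary statistics alone, using the standard forms: the standardized prediction is r times the z-score on X, and the unstandardized line has slope b = r(s_y / s_x) and intercept a = mean of Y − b(mean of X). The numbers and function names below are hypothetical.

```python
# Hypothetical summary statistics (illustrative values only).
r = 0.5                    # correlation between X and Y
mean_x, sd_x = 100.0, 15.0
mean_y, sd_y = 50.0, 10.0

def predict_standardized(z_x):
    """Standardized regression equation: predicted z-score on Y."""
    return r * z_x

# Unstandardized regression equation: predicted Y = b * X + a.
b = r * sd_y / sd_x        # slope
a = mean_y - b * mean_x    # intercept

def predict_unstandardized(x):
    """Predicted raw score on Y for a raw score on X."""
    return b * x + a

x = 130.0
z_x = (x - mean_x) / sd_x  # 2 standard deviations above the mean of X

y_hat = predict_unstandardized(x)
z_y_hat = predict_standardized(z_x)

print(f"predicted Y = {y_hat:.2f}")
print(f"predicted z on Y = {z_y_hat:.2f}")
print(f"z converted back to the Y scale = {mean_y + z_y_hat * sd_y:.2f}")
```

Both equations describe the same line, so converting the standardized prediction back to the raw scale gives the same predicted value as the unstandardized equation.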
One statistical phenomenon to be aware of when making predictions is regression towards the mean, which occurs when a predicted dependent variable value is closer to the mean of the dependent variable than the person’s score on the independent variable was to the mean of the independent variable. This means that outliers and rare events can be difficult or impossible to predict via the regression equations.
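Regression towards the mean follows from the standardized regression equation: because the predicted z-score is r times the z-score on the independent variable, and r lies between −1 and 1, the prediction can never be further from the mean than the original score. A small sketch with a hypothetical r of 0.5:

```python
r = 0.5  # hypothetical correlation (illustrative only)

# z-scores on the independent variable, from far below to far above the mean.
z_x_values = [-3.0, -1.0, 0.0, 1.0, 3.0]

# Standardized regression equation: predicted z on Y = r * z on X.
predicted = [r * z for z in z_x_values]

for z_x, z_y_hat in zip(z_x_values, predicted):
    print(f"z on X = {z_x:+.1f} -> predicted z on Y = {z_y_hat:+.1f}")
```

A person 3 standard deviations above the mean on X is predicted to be only 1.5 standard deviations above the mean on Y, so the equation never predicts a score as extreme as the one it started from.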
There are important assumptions of Pearson’s r and regression: (1) a linear relationship between variables, (2) homogeneity of residuals, (3) an absence of a restriction of range, (4) a lack of outliers/extreme values that distort the relationship between variables, (5) equivalent subgroups within the sample, and (6) interval- or ratio-level data for both variables. Violating any of these assumptions can distort the correlation coefficient.