Appendix C - Multilevel Regression with Poststratification and Estimating State and District Ambient Temperature

from Appendices

Published online by Cambridge University Press:  18 November 2021

Katrina F. McNally
Affiliation:
Eckerd College, Florida

Representing the Disadvantaged: Group Interests and Legislator Reputation in US Congress, pp. 246-250
Publisher: Cambridge University Press
Print publication year: 2021
This content is Open Access and distributed under the terms of the Creative Commons Attribution licence CC-BY-NC 4.0 https://creativecommons.org/cclicenses/

Multilevel regression with poststratification (MRP) is a technique that uses multilevel modeling and Bayesian statistics to generate estimates that are a function of both demographic and geographic characteristics (Park, Gelman, and Bafumi, 2004; Lax and Phillips, 2009; Warshaw and Rodden, 2012). The method combines demographic and public opinion data to create predictions for small subsets of the population, which are then weighted by subgroup population within a geographic area and summed over all subgroups within that area (in this case, a congressional district). For data with an inherently hierarchical structure (as is the case for individuals within districts that are within states), multilevel models have an advantage over classical regression models. Classical regression models use either complete pooling of the data (as when no district or state effects are taken into account) or no pooling (as when models include fixed effects for a respondent’s state or district). Multilevel regression models instead allow the data to be partially pooled, to a degree dictated by the data and based upon group sample size and variation. These models thus allow the effects of demographics to vary by geography. Estimates for states or districts with few observations or high variance are pulled toward the overall mean, while estimates for states and districts with larger samples and tighter variances are more strongly influenced by district-specific effects.
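The pull toward the mean described above can be illustrated with the precision-weighted average that a varying-intercept model with no predictors approximately produces. This is a minimal sketch, not the model estimated in this appendix: the function name `partial_pool`, the variance values, and the district means are all hypothetical, and the two standard deviations are treated as known, whereas a fitted multilevel model estimates them from the data.

```python
def partial_pool(group_mean, n, grand_mean, sigma_y, sigma_alpha):
    """Precision-weighted compromise between a group's own mean and the grand mean.

    sigma_y:     within-group standard deviation (assumed known here)
    sigma_alpha: between-group standard deviation (assumed known here)
    """
    w_group = n / sigma_y ** 2         # precision of the group's sample mean
    w_grand = 1.0 / sigma_alpha ** 2   # precision of the between-group distribution
    return (w_group * group_mean + w_grand * grand_mean) / (w_group + w_grand)


# Hypothetical numbers: two districts share the same raw mean (70)
# but differ sharply in sample size.
grand_mean = 50.0
small = partial_pool(70.0, n=5, grand_mean=grand_mean, sigma_y=20.0, sigma_alpha=5.0)
large = partial_pool(70.0, n=500, grand_mean=grand_mean, sigma_y=20.0, sigma_alpha=5.0)

print(f"n=5:   {small:.2f}")
print(f"n=500: {large:.2f}")
```

The district with five respondents is pulled most of the way toward the grand mean, while the district with five hundred stays close to its own raw mean. Setting `sigma_alpha` very small recovers complete pooling, and setting it very large recovers no pooling.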

MRP-generated estimates of public opinion outperform both disaggregated means and presidential vote share measures at the state, congressional district, and state senate district levels, producing estimates that are more strongly correlated with population means, have smaller errors, and are more reliable (Lax and Phillips, 2009; Warshaw and Rodden, 2012). These differences are even more apparent with the smaller sample sizes (2,500 for congressional districts) common to most national surveys. MRP estimates are also far less subject to bias than disaggregated means. Disaggregating from samples that are nationally representative (rather than representative of each state or district) can result in biased predictions. MRP avoids this pitfall because all estimates are weighted according to the percentage of a state or district that any particular subgroup makes up. Additionally, nonresponse bias is less likely to influence within-group estimates for MRP than for disaggregation because of the effects of partial pooling (Lax and Phillips, 2009).

Buttice and Highton (2013) find that MRP is most effective as an estimator when higher-level variables (in this case, state or district) are strongly predictive of the concept of interest, and when there is a high level of geographic variation in the quantity being estimated. To ensure the greatest level of validity and reliability in my estimates, I include a number of state- and district-level predictors with a clear theoretical tie to expected levels of warmth or hostility toward the selected disadvantaged groups. I also expect inter-district variability to be high, given geographically driven district heterogeneity and distinct state and district cultures.

Data

To model individual responses, I use the ANES aggregated time-series data from 1992 to 2016. This data set is intended to be nationally representative and has a total of 24,122 observations. Given the sampling technique and the small sample size (relative to the CCES or the NAES), MRP is the best estimator for generating unbiased and reliable measures of district opinion. To account for over-time changes in district lines and public opinion, I model each decade separately, with 9,085 observations for the 1990s; 5,006 observations for the 2000s; and 10,031 observations for the 2010s. Feeling thermometer estimates are generated for each group in each of the three decades.

In each of these models, the dependent variable is the group feeling thermometer score. The individual-level predictor variables in each of these models include a respondent’s gender (two categories: male, female), race/ethnicity (four categories: white, Black, Hispanic, other), education (five categories: less than high school completion, completed high school, some college, college graduate, graduate school), state, and congressional district. Additionally, district-level predictors (average income, percent urban, percent military, same-sex couples, percent Hispanic, and percent African American) and state-level predictors (region, percent union, and percent Evangelical or Mormon) were obtained using decennial US Census data, as well as data from the US Religion Census. Survey year is also included to account for any variation in context or questions.

Model

I generate estimates of district hostility by modeling individual responses as a function of individual-level demographic characteristics as well as district- and state-level predictors. I model this as a multilevel linear regression, using the lmer function from the lme4 package in R. The structure of the model estimating individual feelings toward the poor is given by the following:

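$$
\begin{aligned}
y_i^{\text{ft poor}} &= \gamma_0 + \alpha^{\text{race}}_{r[i]} + \alpha^{\text{female}}_{f[i]} + \alpha^{\text{educ}}_{e[i]} + \alpha^{\text{year}}_{y[i]} + \alpha^{\text{district}}_{d[i]} \\
\alpha^{\text{race}}_{r} &\sim N(0, \sigma_r^2), \quad \text{for } r = 1, 2, 3, 4 \\
\alpha^{\text{female}}_{f} &\sim N(0, \sigma_f^2) \\
\alpha^{\text{educ}}_{e} &\sim N(0, \sigma_e^2), \quad \text{for } e = 1, 2, 3, 4, 5 \\
\alpha^{\text{year}}_{p} &\sim N(0, \sigma_y^2), \quad \text{for } p = 1, 2
\end{aligned}
\tag{1}
$$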

Random effects are modeled for each level of these individual-level predictors (e.g., all five categories of education). These effects are assumed to be normally distributed with a mean of 0 and a variance determined by the data. The district and state levels include a random effect for each district and state in the dataset, respectively, as well as fixed effects for the other relevant predictors. At the region level, a random effect is modeled for each of the four region categories:

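$$
\begin{aligned}
\alpha^{\text{district}}_{d} &\sim N\left(\alpha^{\text{state}}_{s[d]} + \gamma^{\text{inc}} \cdot \text{income}_d + \gamma^{\text{urban}} \cdot \text{urban}_d + \gamma^{\text{mil}} \cdot \text{military}_d + \gamma^{\text{hisp}} \cdot \text{hispanic}_d + \gamma^{\text{black}} \cdot \text{black}_d,\ \sigma_{\text{district}}^2\right), \quad \text{for } d = 1, \ldots, 435 \\
\alpha^{\text{state}}_{s} &\sim N\left(\alpha^{\text{region}}_{z[s]} + \beta^{\text{union}} \cdot \text{union}_s + \beta^{\text{relig}} \cdot \text{religion}_s,\ \sigma_{\text{state}}^2\right), \quad \text{for } s = 1, \ldots, 50 \\
\alpha^{\text{region}}_{z} &\sim N(0, \sigma_{\text{region}}^2), \quad \text{for } z = 1, 2, 3, 4
\end{aligned}
$$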
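The nesting in these equations (regions contain states, which contain districts) may be easier to see as a generative simulation. The sketch below is only an illustration under loudly stated assumptions: it is not the lmer model fit in this appendix, every numeric value is invented, the group counts are toy-sized, and only one state-level predictor (union) and one district-level predictor (income) are included for brevity. The function name `simulate_intercepts` is hypothetical.

```python
import random


def simulate_intercepts(n_regions=4, states_per_region=2, districts_per_state=3, seed=0):
    """Draw region, state, and district intercepts from a nested normal hierarchy."""
    rng = random.Random(seed)

    # Assumed standard deviations and coefficients (illustrative values only).
    sigma_region, sigma_state, sigma_district = 3.0, 2.0, 1.5
    beta_union, gamma_income = 0.5, -0.2

    districts = []
    for z in range(n_regions):
        # Region effect: centered on zero.
        alpha_region = rng.gauss(0.0, sigma_region)
        for s in range(states_per_region):
            # State effect: centered on its region effect plus a state-level predictor.
            union = rng.uniform(0.0, 30.0)
            alpha_state = rng.gauss(alpha_region + beta_union * union, sigma_state)
            for d in range(districts_per_state):
                # District effect: centered on its state effect plus a district-level predictor.
                income = rng.uniform(30.0, 90.0)
                alpha_district = rng.gauss(alpha_state + gamma_income * income, sigma_district)
                districts.append({
                    "region": z,
                    "state": (z, s),
                    "district": (z, s, d),
                    "alpha_district": alpha_district,
                })
    return districts


draws = simulate_intercepts()
print(len(draws), "district intercepts drawn")
```

Each district intercept inherits information from its state, and each state from its region, which is what lets sparsely sampled districts borrow strength from the levels above them.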

Poststratification

This model is then used to generate district hostility estimates for the average member of each of 17,400 subgroups. Each of these subgroups represents a unique combination of the demographic categories by which the sample is weighted: race (4), gender (2), education (5), and congressional district (435). Predictions for average feeling thermometer scores are generated for each of these subgroups, from white men with less than a high school education in the first district of Alabama to graduate-educated women in the "other" race/ethnicity category (neither white, Black, nor Hispanic) in the at-large district of Wyoming. These estimates are then weighted according to the proportion of a district that is composed of members of each subgroup, and summed within each district.
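The subgroup count follows directly from crossing the four categorical dimensions. A minimal sketch that enumerates the cells, assuming districts are identified by a placeholder index from 1 to 435 (the actual district identifiers are not given here):

```python
from itertools import product

RACE = ["white", "Black", "Hispanic", "other"]
GENDER = ["male", "female"]
EDUCATION = [
    "less than high school",
    "completed high school",
    "some college",
    "college graduate",
    "graduate school",
]
N_DISTRICTS = 435  # placeholder index 1..435 stands in for real district identifiers

# One cell per (race, gender, education, district) combination.
cells = list(product(RACE, GENDER, EDUCATION, range(1, N_DISTRICTS + 1)))

types_per_district = len(RACE) * len(GENDER) * len(EDUCATION)
print(types_per_district, "demographic types per district")
print(len(cells), "poststratification cells in total")
```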

Formally, weighted district opinion estimates are obtained using this method:

$$
y_{\text{district}} = \frac{\sum_{c \in d} N_c \theta_c}{\sum_{c \in d} N_c}
\tag{2}
$$

where $c$ represents each of the forty demographic subcategories (race, gender, and education) within $d$, a given congressional district, $\theta_c$ is the prediction associated with each subcategory, and $N_c$ is the frequency of individuals within a district that belong to a demographic subcategory. To weight my estimates, I use the calculated frequency proportions for each demographic category in each state or district. A summary of the estimates generated is given in Table 4.1, and graphical illustrations of each of the estimates produced are given in Figure 4.1.
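The weighting step is a population-weighted average of the cell predictions within a district. A minimal sketch of that calculation, using a hypothetical helper `poststratify` and three invented cells rather than the forty used in the analysis:

```python
def poststratify(cells):
    """Population-weighted mean of cell predictions for one district.

    cells: iterable of (N_c, theta_c) pairs, where N_c is the number of
    district residents in the cell and theta_c is the model's prediction.
    """
    cells = list(cells)  # allow generators, since the data are read twice
    total = sum(n for n, _ in cells)
    if total == 0:
        raise ValueError("district has no population in any cell")
    return sum(n * theta for n, theta in cells) / total


# Invented example: three cells with their counts and predicted thermometer scores.
example_district = [(600, 40.0), (300, 55.0), (100, 70.0)]
estimate = poststratify(example_district)
print(estimate)
```

Because the denominator normalizes the weights, the same result is obtained whether the cell counts are raw frequencies or proportions of the district population.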
