Impact Statement
What people think about climate change and climate policy strongly depends on their politics. To better understand the role played by ideology in shaping people’s attitudes, we assess people’s ideological thinking by analyzing the ways in which they explain their stance in open-ended survey responses. We develop new ways of measuring how topics are talked about together. Independent of what respondents discussed, we find that those elaborating on opposition to the carbon tax did so in a more focused and structured way than those who support the tax. This suggests that public opposition to the tax could be more easily galvanized politically, and it can inform the design of more effective climate policy and communications.
1. Introduction
Personal values and prior beliefs can have a greater effect than immediate reward and direct evidence in shaping how individuals interpret the world and how they respond when prompted to opine about specific aspects of it. One helpful construct for understanding this cognitive and collective process is ideology. As a system of interconnected beliefs, ideology is invoked to explain the role of political partisanship and the causes of political polarization (Baldassarri and Gelman, Reference Baldassarri and Gelman2008). However, how to analyze ideology as a correlation structure on beliefs remains an unresolved methodological challenge (Kalmoe, Reference Kalmoe2020). In this study, we propose an approach to infer properties of a latent ideology from open-ended survey responses, and ask to what extent these properties reveal differences in the way opposing groups rationalize their positions on carbon tax policy. Given the well-documented anti-tax rhetoric of conservative political discourse (Lakoff, Reference Lakoff2010), we expected to observe differences between supporters and opponents of carbon tax policy.
There are many societal challenges for which methods that reveal ideology would be useful. A prime example is mitigating the worst effects of climate change. Climate change mitigation is in many countries a hotly contested political issue that is characterized by extensive ideological and partisan conflict (McCright and Dunlap, Reference McCright and Dunlap2011; McCright et al., Reference McCright, Dunlap and Marquart-Pyatt2016; Birch, Reference Birch2020). Addressing this challenge requires significant government intervention to shift energy production and consumption away from CO_{2}-emitting fossil fuels (McGlade and Ekins, Reference McGlade and Ekins2015; Welsby et al., Reference Welsby, Price, Pye and Ekins2021). In this context, a price on carbon emissions is often proposed by economists as being simple, flexible, and easy to put on a rising schedule, encouraging emissions reductions across economic sectors at a low aggregate cost (Baranzini et al., Reference Baranzini, Goldemberg and Speck2000; Aldy and Stavins, Reference Aldy and Stavins2012). Policy design has addressed public resistance to carbon tax measures via the targeted use of tax revenue in the form of lump-sum rebates to taxpayers (Klenert et al., Reference Klenert, Mattauch, Combet, Edenhofer, Hepburn, Rafaty and Stern2018). Despite these mechanisms, however, a lack of public acceptability remains a major obstacle to the adoption of substantive carbon taxes across countries (Lachapelle, Reference Lachapelle2017; Haites, Reference Haites2018; Rabe, Reference Rabe2018).
While mediated by various factors (Davidovic et al., Reference Davidovic, Harring and Jagers2020), public support for carbon pricing is typically driven by ideology, with conservatives on the right of the political spectrum generally in opposition, and those on the left generally supporting the policy (Drews and van den Bergh, Reference Drews and van den Bergh2016). This structuring of carbon tax policy attitudes by ideology is evident even in countries that have implemented tax-and-rebate schemes designed to build public support by providing rational, monetary incentives that offset the policy’s costs to consumers. For instance, at current pricing levels in Canada (one of few countries that have implemented a tax-and-rebate scheme), 8 of 10 households are estimated to receive more than they pay as a result of the policy, and yet roughly 8 out of 10 surveyed conservatives oppose the policy (Mildenberger et al., Reference Mildenberger, Lachapelle, Harrison and Stadelmann-Steffen2022). Mildenberger et al. (Reference Mildenberger, Lachapelle, Harrison and Stadelmann-Steffen2022) probe that result using interventions and find that conservatives underestimate the size of the rebate they receive. Such behavior is not necessarily irrational: strong priors can make rational decision-making insensitive to new data (Gershman, Reference Gershman2019). Even after being provided with information on the size of their rebate, individuals may continue to believe that they are net losers. One explanation for these results is that a broader system of values (an ideology) underlies subjects’ decision-making. This putative belief system would then also likely manifest when subjects justify their opposition to the tax. A better understanding of how ideology underlies carbon tax opinion would explain the effectiveness of opposition campaigns centered around it and could also inform the design of more effective carbon pricing policy in general, perhaps via more effective communication of the policy’s benefits.
This might involve first identifying, then appealing to (and in some instances directly challenging) issue frames commonly associated with carbon tax policy beliefs.
Issue frames, when written out, can be partially quantified by the semantic content of the writing. Quantitative semantic representations typically focus on the frequency of word use.Footnote ^{1} However, single-word frequency measures do not expose how the same words can be used when talking about different things. Other widely used approaches, such as sentiment analysis, classify responses into only a few affective classes (“like”/“dislike”). By formulating a rich latent topic structure, topic models address both of these limitations. Topic models are now an established approach to understanding human-generated responses in the social sciences (Grimmer et al., Reference Grimmer, Roberts and Stewart2022). The Structural Topic Model in particular has been applied to understand open-ended responses on a carbon tax in Spain (Savin et al., Reference Savin, Drews, Maestre-Andrés and van den Bergh2020), Norway (Tvinnereim et al., Reference Tvinnereim, Fløttum, Gjerstad, Johannesson and Nordø2017), and the US (Povitkina et al., Reference Povitkina, Jagers, Matti and Martinsson2021). We were motivated to make a similar application to data collected in Canada. Unlike these previous works, however, we chose to focus on the correlated statistics of the weights placed on different topics as a means to interrogate ideology.
Our contribution is a set of measures and analyses that characterize latent topic mixture structure as a means of inferring ideology from open-ended survey responses. We also contribute the results of applying our approach to carbon tax opinion in Canada. To further motivate this application, we hypothesized that opposition to carbon taxes arises from a well-worn “tax is bad” ideology, involving a handful of correlated ideas (e.g., “distrust in government,” “unfairness,” “infringement on personal freedom”) that mutually reinforce each other (Lakoff, Reference Lakoff2010). Here, we use the Canadian dataset collected by Mildenberger et al. (Reference Mildenberger, Lachapelle, Harrison and Stadelmann-Steffen2022), unique in the richness of its metadata, to fit the parameters of a generative bag-of-words model of word responses having a latent structure of topics. We fit models to different response types: the subset of responses of those who stated separately that they support the tax, of those who stated they oppose the tax, and of those two groups combined. After validating the fitted models, we use them to infer topic structure conditioned on the support and oppose response groups. We focus on the ways in which respondents mix topics when supporting or opposing the carbon tax. We not only find that responses are highly discriminable using these topic mixtures, but also that there are clear differences in the topic mixture structure of the two response types that support our hypothesis.
2. Methods
2.1. Dataset
We analyzed a dataset of $ 3313 $ open-ended survey responses from respondents living in Canada’s four largest provinces (Ontario, Quebec, Alberta, and British Columbia) as published in Mildenberger et al. (Reference Mildenberger, Lachapelle, Harrison and Stadelmann-Steffen2022). In addition to asking respondents for a categorical opinion response in support/opposition/not sure (response type), demographic survey data were collected such as age, gender, and province. Lifestyle data (car use, residence environment) and belief data (partisanship [measured using party voting] and political ideology) were also collected. Ideology was measured with a unidimensional scale asking respondents, “In politics, people sometimes talk of left and right. Where would you place yourself on the following scale?” Response options ranged from 0 to 10 with extreme values labeled “Far to the left” (0) and “Far to the right” (10). The midpoint was also labeled as “Centre” (5). In line with research documenting a general trend in partisan dealignment (Dalton and Wattenberg, Reference Dalton and Wattenberg2002), partisanship was measured with a vote choice question asking respondents, “If a federal election were held today, for which party would you vote?” Reflecting the multiparty nature of Canadian politics, response options included items for each of Canada’s main federal political parties: Liberal, Conservative, New Democratic Party, People’s Party, Greens, and Bloc Québécois (for respondents in Quebec). An option was also included for “I would not vote.” In the analysis, we recoded Ideology and Partisanship into three categories each. See Mildenberger et al. (Reference Mildenberger, Lachapelle, Harrison and Stadelmann-Steffen2022) for more details on this rich metadata and see the Supplementary Material for the specific reductions of categories that we used.
In addition to categorical responses, a central open-ended question in the survey asked respondents to elaborate on why they chose the categorical response support/oppose/not sure. For this analysis, we preprocessed these open-ended responses. French responses were first translated into English using the Google Translate API. The response corpus was then preprocessed through automated spell-checking, removal of stop words, and reduction of words to their stems. Words were tokenized: each word in the vocabulary was assigned an index and each response was transformed into a vector representation of word counts in this vocabulary.
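The tokenization step described above can be sketched as follows (a stdlib-only illustration; the translation, spell-checking, and stemming stages are omitted, and the stop-word list and example responses are hypothetical, not the study’s actual pipeline):

```python
from collections import Counter
import re

# Illustrative stop-word list (the actual pipeline used a standard list).
STOP_WORDS = {"the", "of", "is", "as", "a", "and", "to"}

def tokenize(text):
    """Lowercase, split on non-letter characters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def build_count_vectors(responses):
    """Assign each vocabulary word an index and map each response
    to a vector of word counts (a bag-of-words representation)."""
    tokenized = [tokenize(r) for r in responses]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for w, c in Counter(toks).items():
            vec[index[w]] = c
        vectors.append(vec)
    return vocab, vectors
```

The count vectors produced here are the document representation consumed by the relevance measures and models described next.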
2.2. Word relevance
We used a standard term-relevance measure that combines term frequency (as a measure of term relevance) and inverse document frequency (as a measure of term specificity):

• term frequency, $ \mathrm{tf}\left(v,d\right) $ . The frequency of a vocabulary word $ v $ in a document $ d $ is $ \mathrm{tf}\left(v,d\right):= \frac{n\left(v,d\right)}{n(d)}, $ where $ n\left(v,d\right) $ is the number of times the term $ v $ is in the document $ d $ and $ n(d)={\sum}_{v\in \mathcal{V}}\;n\left(v,d\right) $ is the total number of words in the document $ d $ . The term frequency over all the documents is then,
$$ \mathrm{tf}(v):= \frac{\sum_{d=1}^Dn(d)\mathrm{tf}\left(v,d\right)}{N_{\mathrm{total}}}, $$where the denominator $ {N}_{\mathrm{total}}={\sum}_{d=1}^D\;n(d) $ is just the total word count across all $ D $ considered documents.

• inverse document frequency, $ \mathrm{idf}(v) $ . Here,
$$ \mathrm{idf}(v)=\log \left(\frac{D+1}{n(v)+1}\right)+1, $$where $ n(v) $ is the number of documents in which the term $ v $ appears, that is, those for which $ n\left(v,d\right)>0 $ . The idf factor downweights terms that appear in many documents; the added constants smooth the measure so that terms appearing in every document still receive a nonzero weight.
Term frequency-inverse document frequency (Tfidf) for a word-document pair is then defined simply as the product, $ \mathrm{Tfidf}\left(v,d\right)\hskip0.35em := \hskip0.35em \mathrm{tf}\left(v,d\right)\mathrm{idf}(v) $ . This is then averaged over documents to create a word-only measure. We used the sklearn package implementation of Tfidf, which averages document-normalized values to remove dependence on the length of a document,
$$ \mathrm{Tfidf}(v):= \frac{1}{D}\sum_{d=1}^D\frac{\mathrm{Tfidf}\left(v,d\right)}{\left\Vert \mathbf{Tfidf}\left(\cdot, d\right)\right\Vert }, $$where $ \left\Vert \mathbf{x}\right\Vert =\sqrt{\sum_{i=1}^N{x}_i^2} $ is the Euclidean norm (throughout the paper, we denote vectors with boldface symbols). As a word relevance metric, $ \mathrm{Tfidf} $ adds discriminability beyond frequency alone by downweighting words that appear in many documents, since these common words are less discriminative. We computed word clouds and rank histograms using the $ \mathrm{Tfidf} $ values.
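As an illustration, the Tfidf computation (smoothed idf and per-document Euclidean normalization, matching sklearn’s TfidfVectorizer defaults) can be written out directly on plain count vectors (a sketch, not the study’s code):

```python
import math

def tfidf_matrix(count_vectors):
    """sklearn-style smoothed Tfidf with per-document L2 (Euclidean)
    normalization: idf(v) = log((D + 1) / (n(v) + 1)) + 1."""
    D = len(count_vectors)
    V = len(count_vectors[0])
    # n(v): number of documents containing word v at least once.
    n_v = [sum(1 for doc in count_vectors if doc[v] > 0) for v in range(V)]
    idf = [math.log((D + 1) / (n_v[v] + 1)) + 1 for v in range(V)]
    rows = []
    for doc in count_vectors:
        row = [doc[v] * idf[v] for v in range(V)]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])  # removes document-length dependence
    return rows

def word_relevance(count_vectors):
    """Average the normalized per-document Tfidf over documents to get
    a single relevance score per vocabulary word."""
    rows = tfidf_matrix(count_vectors)
    D, V = len(rows), len(rows[0])
    return [sum(rows[d][v] for d in range(D)) / D for v in range(V)]
```

Note that the L2 normalization makes the raw-count and relative-frequency versions of tf equivalent, since they differ only by a per-document scale factor.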
2.3. Responsetype classification
We aimed to predict the response-type label (oppose/support) for each response, first using the word-level representation (i.e., which words appear and how many times), then using the inferred topic mixture weights obtained from specific (e.g., $ K $ -dependent) fitted STMs on the combined dataset of support and oppose responses (see the following sections for details on the topic model). As preprocessing, we applied maximum absolute value rescaling to handle sparsity in the word-level representation, and standard rescaling in the topic mixture case. In both cases, we then performed logistic regression. We ranked features (words or topic weight vector components, respectively) by their weighted effect size and then trained a classifier on each subset of top-ranked features of increasing size, $ n $ . For each $ n $ , we ran 100 trials of a 2:1 train/test split and report the mean and variance of classification accuracy on the test set.
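The repeated train/test evaluation can be sketched as follows (a minimal illustration: a hand-rolled gradient-descent logistic regression stands in for the packaged implementation, and the feature-ranking step is omitted):

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Minimal logistic regression via batch gradient descent (a sketch;
    the paper used a standard package implementation)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient of log loss
        b -= lr * float(np.mean(p - y))
    return w, b

def accuracy_over_splits(X, y, n_trials=100, seed=0):
    """Repeat a 2:1 train/test split and report mean and std of test accuracy."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_trials):
        idx = rng.permutation(len(y))
        cut = (2 * len(y)) // 3
        tr, te = idx[:cut], idx[cut:]
        w, b = fit_logistic(X[tr], y[tr])
        pred = (X[te] @ w + b) > 0
        accs.append(float(np.mean(pred == (y[te] > 0.5))))
    return float(np.mean(accs)), float(np.std(accs))
```

In the paper’s setting, `X` would hold the rescaled word-count or topic-weight features and `y` the support/oppose labels.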
2.4. The Structural Topic Model
Topic models are generative models that generate a set of words in a response document according to some distribution. Topic models are typically bag-of-words models, which eschew grammar and syntax to focus only on the content and prevalence of words (i.e., sampling distributions do not condition on previously sampled words in a response). We select a topic model with rich latent structure: the Structural Topic Model (STM) (Roberts et al., Reference Roberts, Stewart, Tingley, Lucas, Leder-Luis, Gadarian, Albertson and Rand2014). Like the Correlated Topic Model (Blei and Lafferty, Reference Blei and Lafferty2005), the STM uses a logistic normal distribution from which to sample the topic mixture weights on a document and can thereby exhibit arbitrary topic-topic covariance via the covariance parameter matrix of the logistic normal distribution. Unlike the Correlated Topic Model, in which the mean and covariance matrix of the logistic normal distribution are learned parameters, the STM allows for parametrizing these using a linear model of the respondents’ metadata. We discuss our choice of metadata model below. In the standard implementation of STM that we use here (Figure 1), only the mean of the logistic normal is made dependent on the metadata. This adds flexibility beyond that offered by the CTM and makes the STM appropriate for datasets with nontrivial latent topic structure, as we expect here. Specifically, our use of the STM specifies the parameter tuple $ \boldsymbol{\Theta} =\left({\left\{{\boldsymbol{t}}_k\right\}}_{k=1}^K,\Sigma, \Gamma \right) $ as follows (Figure 1):

• Topic content: an underlying set of $ K $ topics indexed by $ k $ , $ {\left\{{\boldsymbol{t}}_k\right\}}_{k=1}^K $ , where each topic, $ {\boldsymbol{t}}_k=\left({\beta}_{k,1},\dots, {\beta}_{k,\mid \mathcal{V}\mid}\right) $ , is a list of word frequencies on a given vocabulary $ \mathcal{V} $ indexed by $ v $ ( $ {\sum}_{v=1}^{\mid \mathcal{V}\mid}\;{\beta}_{k,v}=1 $ for all $ k $ ), and

• Topic prevalence: a prior distribution $ P\left(\boldsymbol{\theta} \mid \boldsymbol{x}\right) $ on the topic mixture vector, $ \boldsymbol{\theta} =\left({\theta}_1,\dots, {\theta}_K\right) $ (weights are normalized, $ {\sum}_{k=1}^K{\theta}_k=1 $ ), that is conditioned on the metadata class, $ \boldsymbol{x} $ . Here, the distribution is chosen as a Logistic Normal with covariance matrix parameter $ \Sigma $ assumed to be the same across metadata classes, and mean $ \boldsymbol{\mu} =\Gamma \boldsymbol{x} $ given by a linear transformation of $ \boldsymbol{x} $ with transformation matrix parameter $ \Gamma $ .
A single response sample for a given response document of length $ N $ words that comes from a respondent in metadata class $ \boldsymbol{x} $ is generated by sampling a topic mixture weight vector $ \boldsymbol{\theta} $ once from the weight vector distribution $ P\left(\boldsymbol{\theta} \mid \boldsymbol{x}\right) $ , then iteratively sampling a topic index, $ {z}_n\in \left\{1,\dots, K\right\} $ , from $ {\mathrm{Categorical}}_K\left(\boldsymbol{\theta} \right) $ and a word, $ {w}_n\in \mathcal{V} $ , from the distribution formed from that topic’s frequencies, $ {\mathrm{Categorical}}_{\mid \mathcal{V}\mid}\left({\boldsymbol{t}}_{k={z}_n}\right) $ , $ N $ times to make the response $ \left\{{w}_1,\dots, {w}_N\right\} $ , with $ n=1,\dots, N $ . Topic frequencies are represented in log space according to the Sparse Additive Generative text model (Eisenstein et al., Reference Eisenstein, Ahmed and Xing2011). This topic content model allows for metadata dependence. However, again confronted with the challenging model selection problem, and without strong hypotheses about how demographic-biased word use affects topic correlations (our primary interest), we chose to leave out this dependence.
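The generative process just described can be simulated directly (an illustrative sketch with toy parameters; fixing the last logistic-normal coordinate at zero is one common reference-coordinate convention, not necessarily the STM package’s internal parametrization):

```python
import numpy as np

def sample_response(topics, Gamma, Sigma, x, N, rng):
    """Generate one bag-of-words response from the STM generative process:
    a logistic-normal topic mixture conditioned on metadata x, then N
    per-word (topic index, word) draws. All parameters are illustrative."""
    K = topics.shape[0]
    # Logistic normal: Gaussian in K-1 dims with mean Gamma @ x, covariance
    # Sigma, then softmax with the K-th (reference) coordinate fixed at 0.
    eta = rng.multivariate_normal(Gamma @ x, Sigma)
    eta = np.append(eta, 0.0)
    theta = np.exp(eta - eta.max())
    theta /= theta.sum()                        # topic mixture weights, sum to 1
    z = rng.choice(K, size=N, p=theta)          # one topic index per word
    words = [int(rng.choice(topics.shape[1], p=topics[k])) for k in z]
    return theta, words
```

Here `topics` is the $K \times |\mathcal{V}|$ matrix of per-topic word frequencies $\beta_{k,v}$, with each row summing to one.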
2.5. STM model fitting
We built a Python interface for a well-established and computationally efficient STM inference suite written in the R programming language (Roberts et al., Reference Roberts, Stewart and Tingley2019). This STM package has a highly optimized parameter learning approach that largely overcomes the inefficiency of not using a conjugate prior.Footnote ^{2} In particular, STM uses variational expectation maximization, with the expectation made tractable through a Laplace approximation. Accuracy and performance of the method are improved by integrating out the word-level topic assignments and through a spectral initialization procedure. For details, see Roberts et al. (Reference Roberts, Stewart, Tingley, Lucas, Leder-Luis, Gadarian, Albertson and Rand2014). The Python code that we developed to access this package is publicly accessible with the rest of the code we used for this paper in the associated GitHub repository.
The parameter fitting performed by this STM package places a prior on the covariance matrix parameter $ \Sigma $ of the topic mixture weight distribution that specifies a prior on topic correlations via a scalar $ \sigma \in \left[0,1\right] $ , giving a uniform prior for $ \sigma =0 $ and an independence (i.e., diagonal) prior for $ \sigma =1 $ (see the package documentation for more details on $ \sigma $ ). Unless otherwise stated, we used the default value of $ \sigma =0.6 $ . Note that our main analysis of statistical measures does not use $ \Sigma $ . The package also offers priors on $ \Gamma $ . We used the recommended default “Pooled” option. This uses a zero-centered Normal prior for each element, with a Half-Cauchy(1,1) prior on its variance.
We fit three model types based on the dataset they are trained on. Here, a dataset $ \mathcal{D}={\left\{\left({r}_d,{\boldsymbol{w}}_d,{\boldsymbol{x}}_d\right)\right\}}_{d=1}^D $ is a set of (categorical response, open-ended response, and respondent metadata) tuples. We fit one model to each of the support and oppose respondent data subsets, $ {\mathcal{D}}_{\mathrm{support}} $ and $ {\mathcal{D}}_{\mathrm{oppose}} $ , as well as one to a combined dataset $ {\mathcal{D}}_{\mathrm{combined}}={\mathcal{D}}_{\mathrm{support}}\cup {\mathcal{D}}_{\mathrm{oppose}} $ that joins these two subsets of tuples into one. The model parameters obtained from maximum likelihood fitting to a dataset are denoted $ \hat{\boldsymbol{\Theta}} $ . As with other topic models, an STM takes the number of topics, $ K $ , as a fixed parameter (on which most quantities depend; we omit it to simplify notation). We thus obtained three $ K $ -dependent families of model parameter settings $ {\hat{\boldsymbol{\Theta}}}_m $ , for $ m\in \left\{\mathrm{support},\mathrm{oppose},\mathrm{combined}\right\} $ . With model parameters in hand, we obtained our object of primary interest: the models’ topic mixture posterior distributions conditioned on response type $ {r}^{\prime}\in \left\{\mathrm{support},\mathrm{oppose}\right\} $ ,
$$ {P}_m\left(\boldsymbol{\theta} \mid {r}^{\prime}\right)=\frac{1}{\mid {\mathcal{D}}_{r^{\prime }}\mid }{\sum}_{d:{r}_d={r}^{\prime }}\mathrm{LogisticNormal}\left(\boldsymbol{\theta}; {\hat{\Gamma}}_m{\boldsymbol{x}}_d,{\hat{\Sigma}}_m\right),\hskip2em (1) $$
where we have substituted the Logistic Normal distribution for $ {P}_m\left(\boldsymbol{\theta} \mid \boldsymbol{x}\right) $ and use the empirical average over $ \boldsymbol{x} $ . To compare the ideologies behind support and oppose responses, we make two comparisons by evaluating equation (1): between the support- and oppose-conditioned posteriors of the combined model, and between the posteriors of the separately fitted support and oppose models.
Since we are interested in topic correlations, we do not need to derive effect sizes for particular metadata covariates, which in general depend on the particular metadata model, here given by which elements of $ \Gamma $ in $ \mu =\Gamma \boldsymbol{x} $ are nonzero and their sign. Consequently, we can sidestep the challenging metadata model selection problem (Wysocki et al., Reference Wysocki, Lawson and Rhemtulla2022) by simply including all the measured covariates that might contribute (i.e., all elements of $ \Gamma $ are specified as real-valued free parameters). We thus selected a broad subset of the metadata from Mildenberger et al. (Reference Mildenberger, Lachapelle, Harrison and Stadelmann-Steffen2022): $ \boldsymbol{x}=\left(\mathrm{age},\mathrm{sex},\mathrm{region},\mathrm{car}\;\mathrm{use},\mathrm{partisanship},\mathrm{ideology}\right) $ .
We analyzed results across a range of $ K $ to ensure validity of our conclusions. Models with larger $ K $ typically give higher model likelihood, but are at higher risk of overfitting. The likelihood on a held-out test dataset will, however, exhibit a maximum at finite $ K $ as overfitting to the remaining “train” samples becomes significant with increasing $ K $ . The value of $ K $ at which the held-out likelihood achieves its maximum will depend on the ratio of test data to train data in the train/test split. Nevertheless, a loose upper bound on $ K $ is the value at which the maximum of the split-realization-averaged held-out likelihood is obtained for a train-biased train/test split (here 9:1). For the support, oppose, and combined response-type models, maxima were observed at $ K=15,18 $ , and $ 20 $ , respectively, averaging over 100 train/test split realizations for the separate response-type models and 200 for the combined model (see Figure 6e,j). Note that the strong subadditivity (20/(15+18) = 0.61) in these data-selected topic numbers suggests a high amount of overlap in the topic content between support and oppose responses.
We can directly compare the response-type-conditioned posteriors of the combined model at any given $ K $ . In contrast, comparison of posteriors for the response-type-specific models should account for the fact that the likelihood will in general suggest a distinct $ K $ for each. With a prior over $ K $ in hand, we could simply marginalize over $ K $ in the posterior for each. We do not have such a prior, unfortunately. A natural alternative is simply to compare models at the $ K $ at which each achieves its respective maximum in the likelihood. We highlight these maximum comparisons when plotting results over $ K $ .
2.6. Topic quality
We assessed topic quality across different topic numbers using two related measures (the standard analysis in the STM literature). First, exclusivity (high when a topic’s frequent words are exclusive to that topic) was measured using the FREX score: the linearly weighted harmonic mean of the normalized rank of a word’s frequency relative to other words within a topic ( $ {\beta}_{k,v} $ ) and the same across topics,
$$ {\mathrm{FREX}}_k=\sum_{i=1}^{N_{\mathrm{top}}}{\left(\frac{\omega }{\mathrm{NormRank}{\left({\beta}_{\cdot, {v}_i}\right)}_k}+\frac{1-\omega }{\mathrm{NormRank}{\left({\beta}_{k,\cdot}\right)}_{v_i}}\right)}^{-1}, $$
where $ \mathrm{NormRank}\left(\boldsymbol{x}\right) $ returns the $ n $ -dimensional vector of $ n $ normalized ascending ranks of the $ n $ -dimensional vector $ \boldsymbol{x} $ (high rank implies high relative value) and $ \omega $ is the linear weight parameter (set by default to 0.7). In this and the following definition, the notation $ {v}_i $ is the index having descending rank $ i $ in $ {\beta}_{k,\cdot } $ (low rank implies high relative value, i.e., the more weighted topic words) and the sum is over the top $ {N}_{\mathrm{top}} $ such words for topic $ k $ (the default setting of $ {N}_{\mathrm{top}}=10 $ was used). Note that this implies that the second term simplifies using the identity $ \mathrm{NormRank}{\left({\beta}_{k,\cdot}\right)}_{v_i}=1-\frac{i}{\mid \mathcal{V}\mid } $ . Second, semantic coherence (high when a topic’s frequent words co-occur often) was defined as
$$ {C}_k=\sum_{i=2}^{N_{\mathrm{top}}}\sum_{j=1}^{i-1}\log \left(\frac{D\left({v}_i,{v}_j\right)+1}{D\left({v}_j\right)}\right), $$
where $ D(v) $ counts the documents (i.e., responses) in which the word $ v $ occurs and $ D\left(v,{v}^{\prime}\right) $ counts the documents in which both words $ v $ and $ {v}^{\prime } $ occur together. See the R package’s documentation for more details (Roberts et al., Reference Roberts, Stewart and Tingley2019). We can study topic quality by analyzing the set of $ \left({\mathrm{FREX}}_k,{C}_k\right) $ pairs over topics in the plane spanned by exclusivity and coherence, where higher quality topics are positioned further to the top right. The mean of these points over topics for each value of $ K $ is used to evaluate particular STM models in Figure 4.
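The semantic coherence measure can be computed directly from document-level (co-)occurrence counts (a sketch; documents are represented here simply as sets of words, and the word list is assumed to be the topic’s top words in descending rank order):

```python
import math

def semantic_coherence(top_words, documents):
    """Mimno-style semantic coherence for one topic:
    C_k = sum over ordered top-word pairs (i > j) of
    log((D(v_i, v_j) + 1) / D(v_j)), where D counts the documents
    containing the given word(s)."""
    def D(*words):
        return sum(1 for doc in documents if all(w in doc for w in words))
    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            score += math.log((D(top_words[i], top_words[j]) + 1) / D(top_words[j]))
    return score
```

The +1 in the numerator keeps the logarithm finite for top-word pairs that never co-occur; the denominator is nonzero whenever the top words each occur in at least one document.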
2.7. Statistical measures of topic mixture statistics
Our primary interest is in the statistical structure of the posteriors over topic mixture weight vectors, $ \boldsymbol{\theta} $ , equation (1). However, the geometry of a domain can have a strong effect on the estimation of statistical properties, and estimation using the $ \boldsymbol{\theta} $ representation is biased because of the normalization constraint on weight vectors. A now well-established approach to performing statistical analysis on variables defined on the simplex (Aitchison, Reference Aitchison1982; Pawlowsky-Glahn and Egozcue, Reference Pawlowsky-Glahn and Egozcue2001) maps these variables to the space of logarithms of relative weights (relative to the geometric mean, $ g\left(\boldsymbol{\theta} \right)={\left({\prod}_{i=1}^K{\theta}_i\right)}^{1/K} $ ). This bijective transformation maps the simplex of $ K $ variables to $ {\mathrm{\mathbb{R}}}^{K-1} $ such that Logistic-Normal distributions map back into their Normal counterparts. Applying this transformation to the computed models, we find that the average distance between all pairs from the set of estimated means $ {\hat{\boldsymbol{\mu}}}_d=\hat{\Gamma}{\boldsymbol{x}}_d $ over the three datasets is typically much larger than the size of the variation arising from the fitted covariance matrix parameter $ \hat{\Sigma} $ (more than 95% of pairwise distances were larger than the total variance/dimension for three of the four comparisons (equations (3) and (4)), with the remainder at 85%; see Supplementary Material for details). Thus, the global geometry of the distribution is well-captured by the distribution of these means. We therefore focus on the empirical distribution over $ \boldsymbol{\mu} =\left({\mu}_1,\dots, {\mu}_{K-1}\right) $ , that is, the set $ {\left\{{\hat{\boldsymbol{\mu}}}_d\right\}}_{d=1}^D $ , with one mean for each response in the response-type dataset, and on its clustering according to demographic statistics.
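The simplex-to-Euclidean mapping can be sketched via the centered log-ratio transform (note this form keeps $K$ zero-sum coordinates; an additional isometric log-ratio step would reduce to $K-1$ free coordinates):

```python
import numpy as np

def clr(theta):
    """Centered log-ratio transform: the log of each weight relative to the
    geometric mean g(theta) = (prod_i theta_i)^(1/K). The output coordinates
    sum to zero, i.e., they lie on a (K-1)-dimensional hyperplane."""
    log_t = np.log(theta)
    return log_t - log_t.mean()

def clr_inverse(z):
    """Map back to the simplex via the softmax (the inverse of clr on the
    zero-sum hyperplane)."""
    e = np.exp(z)
    return e / e.sum()
```

In this representation, the Logistic-Normal posterior means $\hat{\boldsymbol{\mu}}_d$ can be analyzed with ordinary Euclidean statistics.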
We developed and used four intuitive characteristics of a data cloud (see Table 1). The first is simply the mean position $ \overline{\boldsymbol{\mu}} $ , while the remaining three rely on the covariance, $ {\Sigma}_{\boldsymbol{\mu}} $ , estimated using the standard estimator $ {\Sigma}_{\hat{\boldsymbol{\mu}}}=\frac{1}{D-1}{\sum}_{d=1}^D\left({\hat{\boldsymbol{\mu}}}_d-\hat{\overline{\boldsymbol{\mu}}}\right){\left({\hat{\boldsymbol{\mu}}}_d-\hat{\overline{\boldsymbol{\mu}}}\right)}^T $ , where $ \hat{\overline{\boldsymbol{\mu}}}=\frac{1}{D}{\sum}_{d=1}^D{\hat{\boldsymbol{\mu}}}_d $ is the mean estimate. In order to plot results across $ K $ without pure dimension effects displaying prominently, we normalize each measure so that it has constant scaling in $ K $ . We label each measure according to its interpretation as a measure of ideology as follows. The position relative to equal usage (the origin in $ \boldsymbol{\mu} $ space) measures how specific the usage is across topics. The size measures how much variability there is in how different individuals discuss that ideology. Eccentricity measures how the correlations limit the expressivity in how the ideology mixes topics; we measure this via reductions in a natural global measure of intrinsic dimension (Recanatesi et al., Reference Recanatesi, Bradde, Balasubramanian, Steinmetz and Shea-Brown2022). Finally, orientedness measures how strongly aligned or anti-aligned pairs of topics are.
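These four data-cloud measures can be sketched as follows (raw, unnormalized forms; the participation ratio stands in for the intrinsic-dimension measure, and orientedness is illustrated here as mean absolute pairwise correlation, an assumption about the intended quantity rather than the paper’s exact definition):

```python
import numpy as np

def cloud_measures(M):
    """Four summary statistics of a data cloud M (rows = samples, columns =
    transformed topic-weight coordinates). Normalizations over K are omitted."""
    mean = M.mean(axis=0)
    cov = np.cov(M, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)
    position = float(np.linalg.norm(mean))          # distance from equal topic usage
    total_variance = float(eigvals.sum())           # overall size of the cloud
    # Participation ratio: a global intrinsic-dimension measure; values well
    # below the ambient dimension indicate an eccentric (elongated) cloud.
    participation_ratio = float(eigvals.sum() ** 2 / (eigvals ** 2).sum())
    # Orientedness: mean absolute off-diagonal correlation between coordinates,
    # high when pairs of topics are strongly aligned or anti-aligned.
    corr = np.corrcoef(M, rowvar=False)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    orientedness = float(np.abs(off_diag).mean())
    return position, total_variance, participation_ratio, orientedness
```

For an isotropic cloud centered at the origin, position and orientedness are near zero and the participation ratio approaches the ambient dimension.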
Each measure is largely independent of the others, and together the set gives a fairly comprehensive representation of unimodal, non-skewed data, since it covers global statistical features up to second order. We discuss the limitations and missing features of this set of measures in the discussion (Section 4).
3. Results
3.1. Ideology and carbon tax support/opposition
To motivate our study of open-ended responses, we first show how support for the carbon tax depends on the demographic characteristics of the study’s respondents. We find that support for the carbon tax transitions from a majority to a minority opinion consistently across the ideology-related feature dimensions of left-to-right political ideology and liberal-to-conservative partisanship (see Figure 2). These transitions are relatively strong (they extend further away from equal support and opposition) in comparison with the transition with age, shown in Figure 2a as a baseline.
3.2. Discriminability of the wordlevel representation
Moving on to the analysis of the open-ended responses, we first present the response-type-conditioned word statistics in Figure 3a. One salient feature is how much the word “tax” dominates in frequency within oppose responses, as compared with the support responses, which recruit many frequent words. We then asked to what degree word content (which words appear and how many times) could be used to distinguish support and oppose responses. We performed logistic regression on the responses to distinguish support from oppose response types, finding best test accuracy around 80% (Figure 3a; see Section 2 for details about the classification procedure). Word-level features are thus moderately informative in distinguishing support from opposition. This word-level information is spread heterogeneously over the vocabulary. In particular, by rerunning the classification on the $ n $ most contributing words (with contribution given by frequency-weighted effect size) for each of the two target labels (support/oppose), we show in Figure 3b that test accuracy is maximized at about 100 words per label (200 words overall) out of the more than 2500 stem words appearing in the data. We also show word clouds of the frequency-weighted effect size for $ n=100 $ that display the most discriminating words. In sum, many words contribute to a moderate level of discriminability.
A principal drawback of this word-level approach to assessing the discriminating power of the text responses is that individual words are assigned to one or the other response type. Yet, we know the words used by oppose and support responses overlap. For example, “carbon” and “tax” are used in high frequency by both response types. Removing this pair of words allows for higher discriminability (84% accuracy; not shown). That removing these central words increases discriminability suggests that first-order word statistics are a poor representation of the semantics of the responses, and that we should pursue approaches that employ latent representations that make use of words in different ways. In particular, it is the context in which these terms are used (what distinct ideas are evoked by the term) that ultimately distinguishes the two response types and serves as a better semantic representation.
3.3. Topic modeling
To investigate the contextual origin of response-type discriminability in the open-ended responses, we used a generative topic model with latent topic structure. We analyzed two model settings: (1) response-type-specific models fit to responses of a single type (support or oppose) and (2) a single model fit to both types of responses combined.
To illustrate the topic content, in Table 2 we show the top words associated with a given topic in a given model, for the support and oppose response-type models as well as for the combined model (all for $ K=7 $ ). A topic’s top words can evoke an unambiguous semantic label for some topics, but not others. For example, topics a and e in the combined model evoke opposition, the former through its focus on cost of living and the latter through the word “grab,” as in “tax grab.” However, subjective inspection leaves many cases largely ambiguous as to whether a topic in the combined model is weighted more by support or by oppose responses. Complementary to top words, top responses can also convey topic quality and even suggest topic labels. For example, we perceived high semantic coherence in topic d, about the unfairness of the tax at an international level (as well as, to a lesser degree, in topic f about economic benefits, topic c about effectiveness in bringing about behavior change, etc.). The latter were mostly not transparent in the list of top words, but only in the top responses, that is, in how these words were used. We list some top responses for these topics in the Supplementary Material for reference. It is hard to determine whether the ambiguity in assigning a topic label from a set of top responses arises from low topic quality reflecting a weakly clustered topic structure in the data, or whether it reflects limitations in our ability to identify salient topics. A further complicating factor is that these topics are specific to the chosen value of $ K $ insofar as it imposes a specific number of topics, independent of any natural structure in the data. For example, clusters of co-occurring words that are well-separated from each other will nevertheless likely be joined when $ K $ is too low, and single clusters will likely be split incorrectly into multiple distinct clusters when $ K $ is too high.
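The top-word inspection described above can be sketched generically as follows. The topic-word matrix `beta` and `vocab` here are toy stand-ins, not the fitted STM parameters:

```python
import numpy as np

# Toy stand-ins for a fitted topic-word matrix (K x V, rows sum to 1).
vocab = np.array(["tax", "grab", "carbon", "price", "climate", "cost"])
beta = np.array([
    [0.40, 0.30, 0.10, 0.05, 0.05, 0.10],  # a "tax grab"-like topic
    [0.05, 0.02, 0.30, 0.33, 0.25, 0.05],  # a carbon-pricing-like topic
])

# List each topic's top words, as one would for a table like Table 2.
top_n = 3
for k, row in enumerate(beta):
    top = vocab[np.argsort(row)[::-1][:top_n]]
    print(f"topic {k}: {', '.join(top)}")
```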
To gain a deeper understanding of the relative changes in the topics as $ K $ increased, without resorting to ad hoc interpretation of the topics’ top words, we developed two complementary analyses. First, we analyzed how topics break up as $ K $ is increased (we call this a depth phylogeny). Inspired by methods of representing genetic phylogenies, we drew a tree structure obtained by connecting each topic to the topic in the model learned with one fewer topic that was nearest in the space of vocabulary word frequencies. To see how support and oppose responses spread their use of topics learned on both response types, we computed a topic depth phylogeny for the combined model (see Figure 4a). Branch thickness denotes closeness.^3 We also added the relative support and oppose contributions to the mixture weight component of each topic as a pie chart at each topic node in the tree. This allowed us to observe the cleaving of coarser topics at low values of $ K $ into their finer-grained internal components as $ K $ increased. Reminiscent of phylogenetic analyses, we define a semantics-agnostic topic nomenclature from the row and column labels (e.g., topic 7f). In most cases, we see that topics are recruited by both response types, so that single topics alone are insufficient for discriminating oppose from support responses; it is the weighting across multiple topics that provides sufficient information. That said, one salient feature of this particular tree is that topic 7f is the common ancestor of many of the topics at larger values of $ K $ . It predominantly loads onto support responses, recapitulating existing results in the literature of a diverse and diffuse set of topics discussed by those who support a carbon tax. In the following section, we will see that this $ K=7 $ model is highly discriminative and that this topic in particular conferred more than 99% accuracy.
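The nearest-topic linking underlying the depth phylogeny can be sketched as below. This is the construction only, under assumptions: random toy topic-word rows stand in for fitted models at consecutive values of $ K $, and Euclidean distance stands in for whatever distance the word-frequency space is given.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_parent(beta_child, beta_parent):
    """For each topic in the (K+1)-topic model, find the index of the
    closest topic in the K-topic model (Euclidean distance assumed)."""
    d = np.linalg.norm(
        beta_child[:, None, :] - beta_parent[None, :, :], axis=2
    )
    return d.argmin(axis=1)

V = 50
beta_K = rng.dirichlet(np.ones(V), size=3)  # toy model with K=3 topics
# Toy child model: two topics near parent 0 (a "cleaving" topic),
# and one near each of parents 1 and 2.
beta_K1 = np.vstack([
    beta_K[0] + 0.001 * rng.standard_normal(V),
    beta_K[0] + 0.001 * rng.standard_normal(V),
    beta_K[1] + 0.001 * rng.standard_normal(V),
    beta_K[2] + 0.001 * rng.standard_normal(V),
])
beta_K1 = np.abs(beta_K1)
beta_K1 /= beta_K1.sum(axis=1, keepdims=True)

# Parent indices form the tree edges; here topic 0 has two children.
print(nearest_parent(beta_K1, beta_K))
```

Running this per consecutive pair of fitted models, and weighting each edge by its distance, yields the tree of Figure 4a.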
In a second analysis, we assessed the quality of the topics over different $ K $ using the standard pair of metrics of exclusivity and semantic coherence (see Section 2). Running the model on each response type separately produced topics whose values on these two quality metrics (Figure 4b) show that topic-averaged values trace out a linear tradeoff between the two, with the topic number $ K $ setting where along the tradeoff the model resides. This suggests a linear combination of semantic coherence and exclusivity (e.g., the variance-minimizing projection) as a scalar metric of topic quality. Across different response types and values of $ K $ , the fixed sum (i.e., equal weights) of exclusivity and semantic coherence is highest in oppose responses. We attribute this ordering to the singular focus that oppose responses have on the word “tax,” as compared to the much more diffuse (in word use) support responses. Note, however, that the topic variance in each of the two quality metrics is at least as large as the difference in topic means at fixed topic number, so the aforementioned ranking is apparent only after averaging over topics. Semantic coherence bottomed out at around $ K=10 $ topics. Topic quality for the combined model was qualitatively similar.
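For concreteness, hedged toy implementations of the two metrics are sketched below. Semantic coherence follows the standard document co-occurrence form; the exclusivity shown is a simplified variant (the share of each top word's probability mass held by the topic), not necessarily the exact FREX-style definition used in the study.

```python
import numpy as np

def semantic_coherence(top_words, docs):
    """Sum of log((D(w_i, w_j) + 1) / D(w_j)) over ordered top-word
    pairs, where D counts documents (sets of words). Assumes each
    top word appears in at least one document."""
    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = sum(top_words[j] in d for d in docs)
            d_ij = sum(top_words[i] in d and top_words[j] in d
                       for d in docs)
            score += np.log((d_ij + 1) / d_j)
    return score

def exclusivity(k, top_idx, beta):
    """Simplified exclusivity: mean fraction of each top word's
    probability mass owned by topic k across the K x V matrix beta."""
    return np.mean(beta[k, top_idx] / beta[:, top_idx].sum(axis=0))

# Toy documents (as word sets) and a toy 2-topic model.
docs = [{"tax", "grab", "money"}, {"tax", "grab"}, {"carbon", "price"}]
beta = np.array([[0.60, 0.30, 0.05, 0.05],   # topic 0 over the vocab
                 [0.05, 0.05, 0.50, 0.40]])  # topic 1
print(semantic_coherence(["tax", "grab"], docs))
print(exclusivity(0, np.array([0, 1]), beta))
```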
How predictive of response type is the way topics are recruited in the responses? In Figure 5a, we plot the empirical distribution of mixture weights, projected onto its first two principal components and color-coded by response type. High separability (especially for $ K=7 $ ) is observed across different topic numbers $ K $ , suggesting high discriminability. We performed a classification analysis on the support/oppose label similar to that using word use presented at the beginning of our results (see Figure 3b), but now using the inferred topic mixture weights instead. Note that this representation is able to recruit information in the correlations among words that distinguishes the two response types. In Figure 5b, we again show performance as a function of the number of most predictive features (here not words, but topic mixture weights). Consistent with the high observed separability, we find accuracies approaching and exceeding 99% across all values of $ K>2 $ , demonstrating the power of this latent structure to predict support for or opposition to the tax. Accuracies were only marginally lower, at 95% or more, for $ \sigma =0 $ (not shown). For $ K=7 $ , near 100% accuracy is possible even using only a single topic, and perfect accuracy using only two topics. As an example of the more fine-grained analysis made possible by our approach, we focus on these two topics. In Figure 5c, we show the projection of the set of best-estimate mixtures across responses, colored by response type, onto the plane in mixture space spanned by the components associated with these two topics. We see that the perfect classification results from topic 7 (topic 5) being strongly suppressed in support (oppose) responses. We further assessed this association by inspecting the top responses for each topic (one from each is shown in Figure 5c).
The responses in topic 7 describe the tax as a malicious injustice imposed on citizens, while those in topic 5 describe the effectiveness of the monetary incentive. We note that while the term “tax grab” is the core idea of topic 7 (“grab” appears in $ 8.5\% $ of oppose responses compared to only $ 1.5\% $ of support responses), the idea recruits a much wider set of terms, operationalized as the support of topic 7 on the vocabulary. It is this wider set of words, terms not used in high frequency by support responses, that confers the discriminability.
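The principal-component projection of mixture weights used in Figure 5a can be sketched as follows. The Dirichlet-distributed toy mixtures below are synthetic stand-ins for the inferred weights, with cluster structure chosen only to illustrate separability:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 7
# Toy mixtures on the K-simplex: "oppose"-like responses load on
# topic 0, "support"-like responses on topic 4 (both hypothetical).
oppose = rng.dirichlet(np.r_[10.0, np.ones(K - 1)], size=100)
support = rng.dirichlet(np.r_[np.ones(4), 10.0, np.ones(K - 5)], size=100)
theta = np.vstack([oppose, support])

# PCA via singular value decomposition of the centered data.
Xc = theta - theta.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
proj = Xc @ Vt[:2].T  # N x 2 projection onto the first two PCs

# The two clouds separate along the leading component.
print(proj[:100, 0].mean(), proj[100:, 0].mean())
```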
We now turn to the topic-topic correlations to better understand the origin of the topic representation’s higher classification performance over word frequency representations. These analyses are also appropriate for investigating the structure of the mixture statistics that putatively holds distinctive features of the underlying ideology. We used the four measures of data cloud geometry summarized in Table 1 to assess the topic-topic correlation structure using the $ \boldsymbol{\mu} $ representation, as explained in the Methods. For the combined response-type models, we can directly compare the support and oppose responses at each value of $ K $ , so we represent the results as the difference in the values of each measure over the two response types for a range of $ K $ around the maximum in the held-out likelihood (positive values indicate oppose responses exceed support responses on that metric; Figure 6a–d). For the separate response-type models, for which the best-fitting value of $ K $ differs, we simply report both sets of measures over a larger range of $ K $ (Figure 6f–i). By showing the held-out likelihood (Figure 6e,j), a comparison at the respective maxima can be made. We find consistent results for both comparisons. Namely, oppose responses are more specific, somewhat more variable, much less expressive, and more aligned across a wide range of $ K $ around where the held-out likelihood is maximized. All three of the non-weak results are consistent with our hypothesis of a rigid ideology underlying carbon tax opposition.
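To make the flavor of such geometry measures concrete, the sketch below computes hypothetical stand-ins on toy mixture clouds: mean distance from the uniform mixture (a specificity-like quantity), total variance, and a participation-ratio effective dimensionality (an expressivity-like quantity). These are illustrative operationalizations only; the study's four measures are defined in its Table 1 and are not reproduced here.

```python
import numpy as np

def cloud_geometry(theta):
    """Toy geometry summaries for a cloud of mixture vectors
    (rows of theta lie on the K-simplex)."""
    K = theta.shape[1]
    center = np.full(K, 1.0 / K)  # uniform mixture
    spec = np.linalg.norm(theta - center, axis=1).mean()
    cov = np.cov(theta.T)
    ev = np.clip(np.linalg.eigvalsh(cov), 0, None)
    total_var = ev.sum()
    pr = ev.sum() ** 2 / (ev ** 2).sum()  # participation ratio
    return spec, total_var, pr

rng = np.random.default_rng(2)
focused = rng.dirichlet([10, 1, 1, 1, 1], size=500)  # "oppose"-like cloud
diffuse = rng.dirichlet([1, 1, 1, 1, 1], size=500)   # "support"-like cloud
for name, th in [("focused", focused), ("diffuse", diffuse)]:
    print(name, cloud_geometry(th))
```

On these toy clouds, the focused cloud scores higher on the specificity-like measure and lower on the effective dimensionality, the same qualitative ordering reported for oppose versus support responses.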
4. Discussion
In this study, we presented a set of principled, quantitative measures and analyses to pin down topic structure in the learned parameters of the STM. We applied them to understand the effects of the ideology behind carbon tax opinion in Canada. We find that topic mixture weights derived from the learned models are highly predictive of opposition to or support of the carbon tax, and we presented a topic tree analysis, similar to those used in phylogenetics, to show that this performance arises when the model has a sufficient number of topics that together are discriminating on each response. Finally, we proposed a set of statistical measures on topic mixtures and evaluated them to find that the oppose responses had higher levels of specificity and orientedness, and were less expressive. This suggests that carbon tax opposition in Canada is articulated by those who hold it through a more well-worn, coherent ideology.
How might this result generalize to carbon tax opposition in other countries? There is some debate in the literature regarding the generalizability of ideology structuring climate change beliefs (Lewis et al., Reference Lewis, Palm and Feng2019). When looking at environmental taxes specifically, ideology plays an important role, but is mediated by the quality of government (or at least public perception of it), such that progressives in low quality of government contexts are not more likely to support environmental taxation (Davidovic et al., Reference Davidovic, Harring and Jagers2020). Declining public faith in government may then hinder public acceptance of fiscal policy on climate change, independent of ideology.
We anticipate that the methods we present here can be useful to social scientists interested in inferring social and political phenomena from open-ended survey responses. Most obviously, our methods could be used to further quantify the structure of collective beliefs. In a longitudinal approach, they could also reveal the dynamics of these collective beliefs, and in particular their response characteristics, for example to large-scale interventions or events (e.g., the COVID-19 pandemic, the Russia-Ukraine war). A study of the timescales over which the effects of interventions decay would inform recent work on the notion of inoculation against misinformation. To push the metaphor, the characteristics of the memory such inoculation induces have not yet been deeply probed (Roozenbeek et al., Reference Roozenbeek, van der Linden, Goldberg, Rathje and Lewandowsky2022). Our work suggests that the topic neighborhood around the focus of any given intervention may play a role via the topic-topic correlation structure. This could also impact policy design, for example in how the carbon tax is promoted. Typical communications advice is to focus on a single issue or message so as not to overload the target audience. However, neighborhood effects that are strongly restoring (Hall and Bialek, Reference Hall and Bialek2019) could rapidly erase the effects of single-issue interventions. Instead, a communications strategy aimed at a local neighborhood of target issues in some appropriately defined issue space may be more effective at loosening the grip that ideology has on opinion.
There are many potential directions in which to take this work. For one, the existing set of metrics of mixture statistics covers only first- and second-order statistical moments. Additional low-order statistics could be included, for example, how concentrated the distribution is around its center (the expectation of $ \left\Vert \boldsymbol{\mu} \right\Vert $ might provide signal about how hollow the data cloud is). In another direction, the efficiency of the topic representations suggests they could serve as an input embedding space for use in deep learning approaches to large-scale social computations (e.g., those run on large social media platforms). For example, recent work on reinforcement learning with human feedback for large language models has focused on generating consensus statements from a set of human-generated opinions (Bakker et al., Reference Bakker, Chadwick, Sheahan, Tessler, CampbellGillingham, Balaguer, McAleese, Glaese, Aslanides, Botvinick and Summerfield2022). A currently limiting constraint of this approach is how many human-generated text inputs can be used. A topic space learned from many opinions could circumvent this constraint.
We probed model behavior over a range of $ K $ around the maximum in the held-out model likelihood. We were able to draw conclusions because the sign of the difference in the values of the metrics was unchanged over these ranges. This will not be true in general, in which case one approach is to marginalize $ K $ out of the problem. In a fully Bayesian approach, given a prior over $ K $ , one would infer a posterior over $ K $ and take the posterior average of any $ K $ -dependent quantities, for example, average topic quality or the statistical metrics. Given that models with different $ K $ have different numbers of parameters, a complexity-penalty term such as the AIC could be added to the posterior to account for variable model complexity.
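A minimal sketch of such a complexity-penalized average over $ K $, using Akaike weights, is given below. All numbers, including the log-likelihoods, parameter counts, and metric values, are hypothetical stand-ins for fitted models:

```python
import numpy as np

K_vals = np.array([5, 6, 7, 8, 9])
loglik = np.array([-1050.0, -1020.0, -1000.0, -998.0, -997.0])
n_params = 10 * K_vals  # hypothetical parameter count per model
metric = np.array([0.30, 0.33, 0.35, 0.34, 0.36])  # e.g. topic quality

# Akaike weights: exp(-AIC/2), normalized (min subtracted for stability).
aic = 2 * n_params - 2 * loglik
w = np.exp(-(aic - aic.min()) / 2)
w /= w.sum()

# Complexity-penalized average of the K-dependent quantity.
print((w * metric).sum())
```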
One direction we did not pursue here was exploring how different metadata models affect the results. Are there specific factor models with fewer covariates that best explain the data? As we noted when motivating our choice of the all-in model, this is a delicate analysis requiring the disentangling of various cases. For example, is car use a confounder or a collider of the effect of residence environment? Such distinctions will influence estimated effect sizes (Wysocki et al., Reference Wysocki, Lawson and Rhemtulla2022). A first step in this direction is to further analyze the results in the case of a sparsity prior on $ \Gamma $ . Including metadata-model dependence in topic content is a related direction for future work. Note that in the absence of rich metadata, the STM reduces roughly to the Correlated Topic Model (Blei and Lafferty, Reference Blei and Lafferty2005), in which case topic correlations can be studied via the mixture covariance matrix parameter, $ {\Sigma}_{\boldsymbol{\theta}} $ .
Finally, there is a broader question of the validity of the generative model class. In particular, are topic models with latent structure suitable mathematical representations of the semantic content of networks of beliefs? How much is lost when representing language as text, without syntax and without positional encoding? How much information about beliefs is contained in language, given that a word can have one of many different meanings depending on who is speaking (Marti et al., Reference Marti, Wu, Piantadosi and Kidd2023)? These questions broach broader philosophical issues around meaning in natural human language. The rich semantics in (position-encoding) large language models that achieve in-context learning (Mollo and Millière, Reference Mollo and Millière2023) suggest that text is a sufficiently rich representation of natural language from which to extract semantics. In fact, to the extent that semantics are not position-encoded (so-called exchangeability), the autoregressive objective on which these models are trained is formally equivalent to posterior inference on a topic model (Zhang et al., Reference Zhang, McCoy, Sumers, Zhu and Griffiths2023). Stripping text of its positional encoding must result in some loss of meaning, though we note that it need not degrade, and may even improve, performance in some downstream tasks (e.g., Kazemnejad et al., Reference Kazemnejad, Padhi, Ramamurthy, Das and Reddy2023). We see topic models as complementing more powerful large language models by having more interpretable latent spaces, and we exploit that interpretability in this work. We hope it inspires the development, refinement, and application of topic models in computational data science and beyond.
Acknowledgments
The authors would like to acknowledge helpful discussions with Matto Mildenberger, Kathryn Harrison, and Dhanya Sridhar.
Author contribution
M.P.T.: Conceptualization (Equal), Formal analysis (Lead), Investigation (Lead), Methodology (Lead), Software (Lead), Writing—original draft (Lead), Writing—review and editing (Equal). E.L.: Conceptualization (Equal), Data curation (Lead), Supervision (Lead), Investigation (Supporting), Writing—review and editing (Equal).
Competing interest
The authors declare none.
Data availability statement
The raw survey data obtained for Mildenberger et al. (Reference Mildenberger, Lachapelle, Harrison and StadelmannSteffen2022) were not made publicly available for privacy reasons (the authors include some derivative data in the Harvard Dataverse replication archive at https://doi.org/10.7910/DVN/3WBCH9). For this publication, we created a privacy-preserving processed dataset containing only the fields used in our analysis. We have stored this dataset in a file that we make publicly available in a GitHub repository (https://github.com/mptouzel/PuelmaTouzel_EDS_2024_carbontax), alongside the Python scripts and computational notebooks needed to reproduce all figures of the paper.
Ethics statement
The research meets all ethical guidelines, including adherence to the legal requirements of the study country.
Funding statement
M.P.T. acknowledges the support from the Canada CIFAR AI Chair Program and the Canada Excellence Research Chairs Program via Dr. Irina Rish. Funding was also drawn from a Social Sciences and Humanities Research Council of Canada award (#43520171388) in which E.L. is the principal investigator.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/eds.2023.44.