A data-driven kinetic model for opinion dynamics with social network contacts

Giacomo Albi; Elisa Calzola; Giacomo Dimarco

doi:10.1017/S0956792524000068


Part of: Mathematical sociology; Equations of mathematical physics and other areas of application; Mathematical economics; Partial differential equations, initial value and time-dependent initial-boundary value problems

Published online by Cambridge University Press: 20 February 2024


Giacomo Albi: Department of Computer Science, University of Verona, Verona, Italy
Elisa Calzola*: Department of Mathematics and Computer Science & Center for Modeling, Computing and Statistics (CMCS), University of Ferrara, Ferrara, Italy
Giacomo Dimarco: Department of Mathematics and Computer Science & Center for Modeling, Computing and Statistics (CMCS), University of Ferrara, Ferrara, Italy
*: Corresponding author: Elisa Calzola; Email: elisa.calzola@unife.it



Abstract

Opinion dynamics is an important and very active area of research that investigates the complex processes through which individuals form and modify their opinions within a social context. Understanding the mechanisms that drive opinion formation is of great significance for predicting a wide range of social phenomena, such as political polarisation, the diffusion of misinformation, the formation of public consensus and the emergence of collective behaviours. In this paper, we contribute to this field by introducing a novel mathematical model that specifically accounts for the influence of social media networks on opinion dynamics. With the rise of platforms such as Twitter, Facebook, Instagram and many others, social networks have become significant arenas where opinions are shared, discussed and potentially altered. To this aim, after an analytical construction of the new model, we incorporate real-life data from Twitter to calibrate the model parameters so that they accurately reflect the dynamics unfolding on social media, showing in particular the role played by the so-called influencers in driving individual opinions towards predetermined directions.

Keywords

Opinion dynamics; multi-agent systems; data-driven models; kinetic and Boltzmann equations; collective behaviour

MSC classification

Primary: 35Q91: PDEs in connection with game theory, economics, social and behavioral sciences

Secondary: 91D30: Social networks; 91B74: Models of real-world systems; 65M75: Probabilistic methods, particle methods, etc.

Type: Papers
Information: European Journal of Applied Mathematics, First View, pp. 1-27

DOI: https://doi.org/10.1017/S0956792524000068
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2024. Published by Cambridge University Press

1. Introduction

In recent years, kinetic models, and specifically Boltzmann equations, have emerged as powerful tools for describing and analysing the collective behaviour of systems of interacting agents [Reference Bellomo, Dosi, Knopoff and Virgillito7, Reference Pareschi and Toscani45]. These models have found applications across a diverse range of fields, contributing to the advancement of knowledge in various disciplines. In economics, for example, kinetic models have recently been used to study the dynamics of market prices and trading outcomes ([Reference Burger, Pietschmann and Wolfram14, Reference Chatterjee and Chakrabarti17, Reference Cordier, Pareschi and Toscani19, Reference Düring, Pareschi and Toscani30, Reference Pareschi and Toscani46, Reference Toscani51]). Applying the tools of kinetic theory to the evolution of prices in financial markets makes it possible to predict the emergence of bubbles and crashes [Reference Li, Tao and Li42]. Moreover, Boltzmann-type equations make it possible to characterise the details of economic interactions and to describe the wealth distribution in a society and the appearance of inequalities [Reference Cordier, Pareschi and Toscani19, Reference Pareschi and Toscani46]. In biology, kinetic models have been extensively used to study the dynamics of populations and the spread of epidemics (see, e.g., [Reference Boscheri, Dimarco and Pareschi12, Reference Chalub, Markowich, Perthame and Schmeiser16, Reference Dimarco, Perthame, Toscani and Zanella22, Reference Dimarco, Toscani and Zanella23, Reference Zanella, Bardelli and Dimarco54]). The kinetic theory of infectious diseases has proven to be a powerful means of describing the spread of a disease, both in a homogeneous population and when spatial differences are key to an accurate description of a pandemic [Reference Bertaglia, Boscheri, Dimarco and Pareschi9, Reference Boscheri, Dimarco and Pareschi12]. 
These models can be used to predict the effectiveness of vaccination and quarantine measures in a population [Reference Dimarco, Toscani and Zanella23] and can be efficiently interfaced with data [Reference Zanella, Bardelli and Dimarco54]. Boltzmann-type equations have also been used to study the evolution of cooperation and altruism in social systems [Reference Albi, Pareschi and Zanella6, Reference Burger13] and to analyse the dynamics of genetic mutations ([Reference Toscani50]). The application of kinetic models in the social sciences also has a long and successful history [Reference Dimarco and Toscani24, Reference Dimarco and Toscani25, Reference Galam, Gefen and Shapir36, Reference Tosin52]; in this context, we also recall the recent study of information diffusion [Reference Franceschi and Pareschi34], which can be pursued with these mathematical techniques. In engineering, kinetic models have been used to study the dynamics of traffic flow [Reference Dimarco and Tosin26, Reference Dimarco, Tosin and Zanella27], notably the formation and dissipation of traffic jams and the prediction of the effects of traffic control measures (see [Reference Agnelli, Colasuonno and Knopoff2, Reference Günther, Klar, Materne and Wegener38, Reference Puppo, Semplice, Tosin and Visconti47]). Crowd behaviour [Reference Agnelli, Colasuonno and Knopoff2, Reference Albi, Bellomo and Fermo3, Reference Bellomo, Gibelli, Quaini and Reali8, Reference Festa, Tosin and Wolfram31] and network communication, with emphasis on the optimisation of communication protocols in wireless networks [Reference Düring, Markowich, Pietschmann and Wolfram29, Reference Toscani49], can be studied as well.

Within this wide framework of applications of kinetic theory to social and biological systems, opinion formation, that is, the dynamics of how opinions evolve and spread among individuals, plays a relevant role owing to its importance for society, for instance in political polarisation and consensus. To shed light on this complex process, the methods of statistical physics have proven to be highly effective tools for studying and analysing such phenomena [Reference Albi, Pareschi and Zanella4, Reference Albi, Pareschi, Toscani and Zanella5, Reference Düring, Markowich, Pietschmann and Wolfram29, Reference Toscani49]. In this regard, one of the key concepts from statistical physics that can be applied to opinion formation is the notion of emergent behaviour. Emergence refers to the phenomenon whereby collective properties and behaviours arise from the interactions and dynamics of individual components. In the context of opinion formation, emergent behaviour can manifest as opinion clusters, polarisation, consensus, or the formation of influential opinion leaders.

In this paper, we introduce a new kinetic model, built with probabilistic and statistical tools, that describes the microscopic dynamics of opinion formation and change. We then upscale our model to the level of observable quantities, providing a quantitative framework for analysing social phenomena and designing interventions that promote constructive dialogue and reduce polarisation. One key ingredient of our study is the description of the role played by individuals with strong ascendancy over the rest of the population, through a detailed analysis of the network to which the population belongs and of the high number of connections of these few influential individuals. In more detail, our model describes the evolution of opinions starting from the microscopic level. Each agent is associated with two real values: its opinion and its number of followers on the social media platform on which it usually interacts. Assuming that the number of followers is not influenced by opinion, we first construct an evolutionary model describing the connections among individuals on a fixed social platform. We then restrict ourselves to a given social network, namely Twitter, and show that the proposed model is well suited to describing Twitter networks, by matching real data, through a parameter estimation technique, with the equilibrium distribution of this new contact model.

In the second part, we assume that opinions are continuous variables lying in the bounded interval $[\!-\!1,1]$ , whose endpoints indicate, respectively, total disagreement and total agreement with a given topic, and that agents update their opinions after interacting with others through the social platform. The strength of such an interaction is supposed to depend on the number of followers of each agent and on the distance between their opinions. We also suppose that the interaction involves a certain amount of randomness, modelling external factors that can hardly be controlled, such as access to information and the knowledge of each individual. Under these hypotheses, we derive the kinetic equation describing the time evolution of the distribution of opinions in the presence of social media contacts in the population, and we study its properties using analytical and numerical methods. In the last part, we show that our kinetic model is able to capture important features of opinion dynamics, such as the emergence of consensus and polarisation, depending on the choice of the interaction kernels between agents. Furthermore, to combine data with opinion dynamics, we use a sentiment analysis (SA) method to assign a score to textual information (the tweets). We use this method to represent the agents' opinions, extracting data from Twitter on specific topics and assigning a score in the interval $[\!-\!1,1]$ . We mention that SA, also known as Opinion Mining, is a subclass of Natural Language Processing (NLP) methods for analysing textual information; in this context, we refer to [Reference Hutto and Gilbert40, Reference Medhat, Hassan and Korashy44, Reference Zhang, Yoshida and Tang55, Reference Zhou, Tao, Yong and Yang56] for further details. 
Finally, to fit the actual trend of the agents' opinion distribution, we calibrate the interaction kernels that govern the dynamics, using a parameter estimation approach based on minimising a loss function between the data extracted from Twitter and the simulation results. Our results provide insights into the mechanisms that drive opinion formation and change on a social platform.

The rest of the work is structured as follows. In Section 2, we describe how to model the formation of a network on social media platforms, starting from a microscopic approach and then upscaling the description to the mesoscopic level by deriving a kinetic equation for the time evolution of the connections. We show that different choices of a so-called value function, appearing in each single interaction between individuals, lead to different stationary distributions of contacts, and we explain how to use a given Twitter dataset to select the set of parameters that describes the current state of connections on that specific social network. Section 3 is devoted to the model of opinion formation in the presence of social media contacts. Starting from the binary interaction between two agents, which takes into account the agents' propensity to compromise and a certain amount of randomness in the process, we recover the Boltzmann-like equation describing the evolution of the joint density of opinions and contacts. In Section 4, we perform numerical simulations. In the first part, we analyse the qualitative behaviour of our new model and illustrate its capabilities in describing different artificial situations. In the second part, we focus on real data extracted from Twitter: we perform an SA to obtain an opinion distribution and, with our new model, reconstruct the interaction kernel that leads to the opinion distribution derived from that SA. Finally, Section 5 draws some conclusions and identifies some directions for future research.

2. An evolutionary model for contacts

This section is dedicated to the construction of a mathematical model describing social contacts on the web, with an emphasis on social platform networks. There is a vast literature on network modelling; see for instance [Reference Acemoglu and Ozdaglar1, Reference Albi, Pareschi, Toscani and Zanella5, Reference Das, Gollapudi and Munagala20, Reference Dolfin and Lachowicz28] and the references therein, where empirical, Bayesian and non-Bayesian methods and probabilistic approaches based on the Poisson distribution are discussed and employed. The path followed here is different: it is rooted in the interplay between the kinetic theory of gases [Reference Cercignani15, Reference Pareschi and Toscani45] and the prospect theory of Kahneman and Tversky [Reference Kahneman and Tversky41], which was first introduced in behavioural economics to characterise decision-making in a population. Our aim is to exploit this theory to build a model able to describe the time evolution of the number of contacts on a social media platform through the methods and techniques of kinetic theory. A similar study was performed in [Reference Dimarco, Perthame, Toscani and Zanella22] to characterise the contact dynamics related to the spread of an epidemic. However, for virtual contacts, which are the ones considered in this study, the results differ, as does the type of equilibrium distribution characterising the network, as shown later. Another important point is that the results achieved in this section allow us to match real data taken from a given web platform, namely Twitter, with high precision.

In the next section, building on the modelling choices discussed here for network formation, we will introduce a detailed description of the microscopic binary interactions among individuals acting on a social network as regards the formation of opinions. In particular, we will shed light on the role played by individuals with a large number of connections in driving others' opinions. In the sequel, we will often refer to individuals with a number of connections larger than the average as influencers. The choice we will make later of giving these individuals a larger weight in the opinion balance is consistent with the current functioning of most social media platforms: individuals are exposed to content created by popular users more often than to content posted by their local connections, and they are more likely to be influenced by the former. For the moment, and for the sake of clarity, we restrict ourselves to the contact dynamics alone, and we follow the construction in [Reference Dimarco and Toscani25] to derive the master equation of Boltzmann type that describes the evolution of this quantity.

We thus consider a system of agents characterised by their number of social media followers, which from now on we denote by $c\gt 0$ . We assume that our agents are indistinguishable and that, at time $t \geq 0$ , they are characterised only by their number of contacts. Here, the contacts (or connections) of an agent are the individuals following the content that this agent spreads over a given platform, so that $c$ counts its followers. The statistical distribution of contacts of the agents can then be assumed to be fully characterised by the density $h(c,t)$ of contacts, such that, given a sub-domain $D\subseteq \mathbb{R}_+$ , the integral

\begin{equation*} \int _D h(c,t)\mathrm {d}c \end{equation*}

represents the fraction of agents having $c\in D$ followers at time $t \geq 0$ . The density function $h$ is assumed to be normalised to one, so that

\begin{equation*} \int _{\mathbb {R}_+} h(c,t)\mathrm {d}c = 1. \end{equation*}

In analogy with the problem of social climbing presented in [Reference Dimarco and Toscani25], where agents aim to climb the social ladder to reach a high social status, it is reasonable to assume here that the formation of a given network starts from the will of all participants to interact and to be heard by others. Consequently, each individual (referred to as an agent in the sequel) likely wants to increase its number of contacts by interacting with other agents when it enters the platform. We assume that, for most participants, there exists a given number of social contacts, $\bar c$ , such that they consider themselves satisfied when this number is reached. The elementary interaction taking place at the microscopic level will therefore express the tendency of agents to reach at least $\bar c$ followers. To describe the evolution of social contact dynamics, we take inspiration from the prospect theory of Kahneman and Tversky [Reference Kahneman and Tversky41] and introduce a value function $\Psi _\delta$ modelling this behavioural theory. The elementary interaction then reads

(1)

\begin{equation} c' = c - \Psi _\delta (c/\bar c) c +\eta c \end{equation}

where $0 \leq \delta \leq 1$ is a parameter characterising the intensity of the individual behaviour and $\eta$ is a random variable with zero mean and finite standard deviation $\nu$ . In equation (1), we thus assume that each individual tries to increase its number of followers by getting in touch with friends, by sharing content that may be of interest to others, by voicing strong opinions, or simply by using its social status to capture interest. As a result, the number of followers of each agent can change for two reasons. The first is quantified by the deterministic value function $\Psi _\delta$ , which can take both negative and positive values, that is, through its actions an individual may gain or lose connections. The second is the intrinsic unpredictability of this complex process, which is quantified by the zero-mean random variable $\eta$ . The value function $\Psi _\delta (s)$ , $s\geq 0$ , encodes the properties of the theory presented in [Reference Kahneman and Tversky41]: it is a dimensionless increasing function that vanishes at the reference point $s=1$ , where most individuals reach satisfaction with their role in the network, and it satisfies the conditions

(2)

\begin{equation} {-}\Psi _\delta (1 - \Delta s) \gt \Psi _\delta (1 + \Delta s) \end{equation}

and

(3)

\begin{equation} \frac{\mathrm{d}}{\mathrm{d}s}\Psi _\delta (s)\big |_{{1 + \Delta s}}\lt \frac{\mathrm{d}}{\mathrm{d}s}\Psi _\delta (s)\big |_{{1 - \Delta s}} \end{equation}

for $0 \lt \Delta s \leq 1$ . Condition (3) implies that the value function is asymmetric, that is, steeper below the reference point $s=1$ than above it. This models the fact that, if two agents start at the same distance $\Delta s$ from the reference point $s=1$ , getting closer to $s=1$ is easier for the agent starting below the reference point than for the one starting above it. It is quite easy for individuals with few followers to increase their connections by contacting friends and relatives, while it is more difficult to decrease the number of followers once a certain status, that is, $c\gt \bar c$ , is reached; in general, an individual is not interested in decreasing this number. Setting $s=c/\bar c$ , we choose the following form of the value function:

(4)

\begin{equation} \Psi _\delta (s) = -\mu \frac{e^{(s^{-\delta } - 1)/\delta }-1}{(1-\mu )e^{(s^{-\delta } - 1)/\delta } + 1 + \mu }. \end{equation}

The function (4) satisfies all the properties listed above for any $0\lt \delta \leq 1$ . It has an inflection point $\bar s\lt 1$ , being convex in $[0,\bar s]$ and concave for $s\gt \bar s$ . This inflection point corresponds to a value $\hat c \lt \bar c$ below which, in principle, agents do not expect to increase their number of followers, while the satisfactory number of followers $c=\bar c$ corresponds to the reference point $s=1$ . Moreover, the value function (4) is bounded as follows:

(5)

\begin{equation} -\frac{\mu }{1-\mu }\leq \Psi _\delta \leq - \mu \frac{e^{-1/\delta }-1}{(1-\mu ) e^{-1/\delta }+1+\mu } \end{equation}

for $0\lt \delta \leq 1$ , the lower bound being the limit of $\Psi _\delta (s)$ as $s\to 0$ and the upper bound its limit as $s\to \infty$ . Concerning the role of the parameter $\delta$ , the smaller its value, the easier it is to gain followers when below $c=\bar c$ . Figure 1 shows the value function for different choices of $\delta$ , with the dashed red lines representing the bounds (5).

Figure 1. Profiles of the value function (4) for different choices of $\delta$ and $\mu =0.15$ . The red dashed lines represent the bounds (5).
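As a numerical sanity check of the properties listed above, the Python sketch below evaluates the value function (4) with $\mu =0.15$ , as in Figure 1, and tests that $\Psi _\delta (1)=0$ , that $\Psi _\delta$ is non-decreasing, that condition (2) holds and that the values stay within the bounds (5). The function names, the sampling grid and the tolerances are our own illustrative choices, and the formula is evaluated in an algebraically equivalent form (numerator and denominator multiplied by $e^{-x}$ when $x\gt 0$ ) that avoids overflow of the exponential for small $s$ .

```python
import math


def psi(s, delta, mu, eps=1.0):
    """Value function (4); eps != 1 gives the rescaled version (7).

    For x > 0, numerator and denominator are multiplied by exp(-x), an
    equivalent form that avoids overflow when s is close to 0.
    """
    x = eps * (s ** (-delta) - 1.0) / delta
    if x > 0:
        z = math.exp(-x)
        return -mu * (1.0 - z) / ((1.0 - mu) + (1.0 + mu) * z)
    z = math.exp(x)
    return -mu * (z - 1.0) / ((1.0 - mu) * z + 1.0 + mu)


def bounds(delta, mu):
    """Lower and upper bounds (5) of the value function (4)."""
    e = math.exp(-1.0 / delta)
    return -mu / (1.0 - mu), -mu * (e - 1.0) / ((1.0 - mu) * e + 1.0 + mu)


def check_properties(delta, mu=0.15):
    """Return a dict of boolean flags, one per property tested on a grid."""
    grid = [0.01 * k for k in range(1, 1001)]  # s in (0, 10]
    values = [psi(s, delta, mu) for s in grid]
    lo, hi = bounds(delta, mu)
    return {
        "zero_at_one": abs(psi(1.0, delta, mu)) < 1e-15,
        "non_decreasing": all(b >= a for a, b in zip(values, values[1:])),
        "within_bounds": all(lo - 1e-12 <= v <= hi + 1e-12 for v in values),
        # condition (2): -psi(1 - ds) > psi(1 + ds), sampled for ds in (0, 0.99]
        "asymmetry": all(
            -psi(1.0 - ds, delta, mu) > psi(1.0 + ds, delta, mu)
            for ds in (0.01 * k for k in range(1, 100))
        ),
    }


for delta in (0.25, 0.5, 1.0):
    print(delta, check_properties(delta))
```

Each flag in the printed dictionaries should be `True` if the properties hold on the sampled grid; the grid only samples $s\in (0,10]$ , so this is a check, not a proof.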

Before concluding this part, we introduce a rescaling factor in equation (1): we are interested in a process in which the formation of the social network results from small updates of the number of connections over time. The rescaled equation reads

(6)

\begin{equation} c' = c - \Psi ^\varepsilon _\delta (c/\bar c) c +\eta _\varepsilon c \end{equation}

where now

(7)

\begin{equation} \Psi ^\varepsilon _\delta (s) = -\mu \frac{e^{\varepsilon (s^{-\delta } - 1)/\delta }-1}{(1-\mu )e^{\varepsilon (s^{-\delta } - 1)/\delta } + 1 + \mu }. \end{equation}

Here $\eta _{\varepsilon }$ is the same random variable as before but with variance $\varepsilon \nu ^2$ , and $\varepsilon$ is a small parameter. The role of $\varepsilon$ in the dynamics will be clarified in the next subsection.
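A minimal sketch of the rescaled interaction (6), with the value function (7), applied to a small ensemble of agents. For illustration only, we assume $\eta _\varepsilon$ uniform on $[-\nu \sqrt{3\varepsilon },\nu \sqrt{3\varepsilon }]$ , which has zero mean and variance $\varepsilon \nu ^2$ ; the function names and all parameter values are hypothetical.

```python
import math
import random


def psi_eps(s, delta, mu, eps):
    """Rescaled value function (7), written in an overflow-safe equivalent form."""
    x = eps * (s ** (-delta) - 1.0) / delta
    if x > 0:
        z = math.exp(-x)
        return -mu * (1.0 - z) / ((1.0 - mu) + (1.0 + mu) * z)
    z = math.exp(x)
    return -mu * (z - 1.0) / ((1.0 - mu) * z + 1.0 + mu)


def update_contacts(contacts, cbar, delta, mu, nu, eps, rng):
    """Apply c' = c - psi_eps(c/cbar) c + eta_eps c to every agent once.

    Assumption (illustrative): eta_eps ~ Uniform[-a, a] with a = nu*sqrt(3*eps),
    so that it has zero mean and variance eps*nu**2.
    """
    a = nu * math.sqrt(3.0 * eps)
    new = []
    for c in contacts:
        eta = rng.uniform(-a, a)
        new.append(c - psi_eps(c / cbar, delta, mu, eps) * c + eta * c)
    return new


rng = random.Random(0)
contacts = [rng.uniform(1.0, 1000.0) for _ in range(5)]
print(update_contacts(contacts, cbar=100.0, delta=0.5, mu=0.15, nu=0.5, eps=0.01, rng=rng))
```

Setting `nu=0` removes the noise and shows the deterministic drift alone: agents below $\bar c$ gain followers, agents above it lose a few, and an agent with exactly $\bar c$ followers is left unchanged.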

2.1. The kinetic model for the evolution of social media contacts

We now aim to derive an evolutionary model for the formation of a network on a social media platform, based on the interaction rule (6) with the value function (7). To this end, observe that the variation of the density $h_\varepsilon (c,t)$ , also rescaled through the parameter $\varepsilon$ , obeys a linear Boltzmann-like equation [Reference Pareschi and Toscani45] whose weak form reads

(8)

\begin{equation} \frac{\mathrm{d}}{\mathrm{d}t}\int _{\mathbb{R}_+} \varphi (c) h_\varepsilon (c,t) \mathrm{d}c = \big \langle \int _{\mathbb{R}_+} \chi (c)\!\left ( \varphi (c_*) - \varphi (c) \right ) h_\varepsilon (c,t) \mathrm{d}c \big \rangle, \end{equation}

for all smooth test functions $\varphi (c)$ , where $c_*$ denotes the post-interaction number of followers $c'$ given by (6). The test functions are the so-called observable quantities of the underlying random process. For example, taking $\varphi (c)=1$ in (8) gives an equation for the time evolution of the number of individuals in the network, which is constant in time since the right-hand side of (8) then vanishes. The case $\varphi (c)=c$ instead gives an evolution equation for the average number of connections in the network, which, contrary to the previous case, is not conserved in time. The positive function $\chi (c)$ measures the interaction frequency of agents with $c$ followers, and the expectation $\langle \cdot \rangle$ , taken with respect to the probability space in which $\eta _{\varepsilon }$ lives, accounts for the presence of the random variable $\eta _\varepsilon$ . To preserve the positivity of the number of connections in (1) and in the rescaled equation (6), given the bounds (5) of the value function, we require the random variable $\eta _\varepsilon$ to be uniformly distributed and to take values in

(9)

\begin{equation} { \eta _\varepsilon \in \left [-\left |\frac{e^{-\varepsilon/\delta }+1}{(1-\mu )e^{-\varepsilon/\delta } + 1 + \mu }\right |,\left |\frac{e^{-\varepsilon/\delta }+1}{(1-\mu )e^{-\varepsilon/\delta } + 1 + \mu }\right |\right ].} \end{equation}
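The interval (9) is tight: its half-width $b$ equals $1-\sup _s \Psi ^\varepsilon _\delta (s)$ , where the supremum is the limit of (7) as $s\to \infty$ , so that $c'/c = 1-\Psi ^\varepsilon _\delta + \eta _\varepsilon \geq 1-\Psi ^\varepsilon _\delta - b \gt 0$ . The short check below uses illustrative parameter values; the function names are ours.

```python
import math


def psi_eps(s, delta, mu, eps):
    """Rescaled value function (7), written in an overflow-safe equivalent form."""
    x = eps * (s ** (-delta) - 1.0) / delta
    if x > 0:
        z = math.exp(-x)
        return -mu * (1.0 - z) / ((1.0 - mu) + (1.0 + mu) * z)
    z = math.exp(x)
    return -mu * (z - 1.0) / ((1.0 - mu) * z + 1.0 + mu)


def eta_half_width(delta, mu, eps):
    """Half-width b of the interval (9) for eta_eps."""
    e = math.exp(-eps / delta)
    return abs((e + 1.0) / ((1.0 - mu) * e + 1.0 + mu))


def psi_eps_sup(delta, mu, eps):
    """Limit of psi_eps(s) as s -> infinity (rescaled upper bound)."""
    e = math.exp(-eps / delta)
    return -mu * (e - 1.0) / ((1.0 - mu) * e + 1.0 + mu)


delta, mu, eps = 0.5, 0.15, 0.1
b = eta_half_width(delta, mu, eps)
# worst case eta = -b with psi at its supremum: the factor c'/c tends to 0
print(b, 1.0 - psi_eps_sup(delta, mu, eps) - b)
# smallest factor c'/c = 1 - psi + eta with eta = -b, over s in (0, 100]
print(min(1.0 - psi_eps(0.1 * k, delta, mu, eps) - b for k in range(1, 1001)))
```

The first printed difference is zero up to rounding, which shows that the lower end of (9) cannot be widened without allowing $c'\le 0$ ; the second value is positive, since $\Psi ^\varepsilon _\delta$ stays strictly below its supremum for finite $s$ .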

In the sequel, we consider collision kernels of the form [Reference Furioli, Pulvirenti, Terraneo and Toscani35]

(10)

\begin{equation} \chi (c) = c^\beta \alpha, \end{equation}

for a positive multiplicative constant $\alpha \gt 0$ and an exponent $\beta \geq 0$ . To establish reasonable values for $\chi (c)$ , observe that, for $s\gt 0$ , the individual rate of growth ${\partial \Psi _\delta ^\varepsilon (s)}/{\partial s}$ vanishes as $\varepsilon \to 0$ (cf. [Reference Dimarco and Toscani25]). Hence, to maintain a nonzero collective growth for all values of the scaling parameter $\varepsilon$ , one suitable choice is

(11)

\begin{equation} \alpha = \frac{1}{\kappa \varepsilon }, \end{equation}

which corresponds to an interaction frequency proportional to $1/\varepsilon$ , with $\kappa$ a constant of order one. Concerning the second parameter $\beta$ , we consider two different situations. The first is $\beta \gt 0$ , for which the interaction frequency grows with the number of connections. This situation occurs on social platforms where exchanges are typically one-to-one, so that each user shares content in proportion to its number of connections. The second is $\beta =0$ , for which interactions are independent of the number of connections: the activity of each individual is independent of the others, and content sharing is independent of the size of the individual's network. In the case $\beta \gt 0$ , a rational choice is $\beta =\delta$ when $\delta$ is positive. Choosing $\beta \lt \delta$ implies much larger variations in the collective growth of the number of social media connections for small $c$ than for large $c$ ; however, it is not reasonable to expect that individuals with few connections can easily reach an influential position. Conversely, $\beta \gt \delta$ implies very small variations in collective growth for small values of $c$ , thereby excluding the opposite situation, that is, the possibility for individuals with few connections to increase their number of followers. Hence, $\beta = \delta$ is a good compromise between the two scenarios. With these choices, equation (8) can be rewritten as

(12)

\begin{equation} \frac{\mathrm{d}}{\mathrm{d}t}\int _{\mathbb{R}_+} \varphi (c) h_\varepsilon (c,t) \mathrm{d}c = \frac{1}{\varepsilon \kappa } \big \langle \int _{\mathbb{R}_+}c^\delta \!\left ( \varphi (c_*) - \varphi (c) \right ) h_\varepsilon (c,t) \mathrm{d}c \big \rangle. \end{equation}

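To gain insight into the time evolution of the model above, we use a standard procedure from the kinetic theory of gases and expand $\varphi (c')$ in a Taylor series around $c$ , assuming $\varphi$ is sufficiently smooth. Since $c'-c = \left (-\Psi _\delta ^\varepsilon (c/\bar c) + \eta _\varepsilon \right )c$ , with $\langle \eta _\varepsilon \rangle = 0$ and $\langle \eta _\varepsilon ^2\rangle = \varepsilon \nu ^2$ , we have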

\begin{equation*} \langle c' -c\rangle = -\Psi _\delta ^\varepsilon (c/\bar c) c, \qquad \langle (c' - c)^2 \rangle = \left (\Psi _\delta ^\varepsilon (c/\bar c)\right )^2 c^2 + \varepsilon \nu ^2 c^2. \end{equation*}
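These two moments depend only on $\langle \eta _\varepsilon \rangle =0$ and $\langle \eta _\varepsilon ^2\rangle =\varepsilon \nu ^2$ . A quick check, assuming for illustration only a symmetric two-point law $\eta _\varepsilon = \pm \nu \sqrt{\varepsilon }$ with probability $1/2$ each (zero mean, variance $\varepsilon \nu ^2$ ), for which the expectation is an exact average over the two outcomes; function names and parameter values are hypothetical.

```python
import math


def psi_eps(s, delta, mu, eps):
    """Rescaled value function (7), written in an overflow-safe equivalent form."""
    x = eps * (s ** (-delta) - 1.0) / delta
    if x > 0:
        z = math.exp(-x)
        return -mu * (1.0 - z) / ((1.0 - mu) + (1.0 + mu) * z)
    z = math.exp(x)
    return -mu * (z - 1.0) / ((1.0 - mu) * z + 1.0 + mu)


def moments_two_point(c, cbar, delta, mu, nu, eps):
    """Exact <c'-c> and <(c'-c)^2> when eta_eps = +/- nu*sqrt(eps), prob. 1/2 each."""
    p = psi_eps(c / cbar, delta, mu, eps)
    jumps = [(-p + sign * nu * math.sqrt(eps)) * c for sign in (1.0, -1.0)]
    first = sum(jumps) / 2.0
    second = sum(j * j for j in jumps) / 2.0
    return first, second


def moments_formula(c, cbar, delta, mu, nu, eps):
    """Right-hand sides of the two moment identities in the text."""
    p = psi_eps(c / cbar, delta, mu, eps)
    return -p * c, p * p * c * c + eps * nu ** 2 * c * c


print(moments_two_point(50.0, 100.0, 0.5, 0.15, 0.5, 0.01))
print(moments_formula(50.0, 100.0, 0.5, 0.15, 0.5, 0.01))
```

Any other zero-mean law with variance $\varepsilon \nu ^2$ gives the same two moments; the two-point choice simply makes the expectation exact without sampling.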

One can also observe that for $0\lt \delta \leq 1$ , it holds that

(13)

\begin{equation} \lim _{\varepsilon \to 0} \frac{1}{\varepsilon } \Psi _\delta ^\varepsilon \!\left ( \frac{c}{\bar c} \right ) = \frac{\mu }{2\delta } \!\left (1- \!\left ( \frac{\bar c}{c}\right )^\delta \right ). \end{equation}
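The limit (13) follows from $e^{\varepsilon x}-1 = \varepsilon x + O(\varepsilon ^2)$ , with $x = ((c/\bar c)^{-\delta }-1)/\delta$ , while the denominator of (7) tends to $2$ . A numerical check with illustrative parameters (the function names are ours); the difference from the limit should shrink roughly in proportion to $\varepsilon$ .

```python
import math


def psi_eps(s, delta, mu, eps):
    """Rescaled value function (7); expm1 keeps e^{eps x} - 1 accurate for small eps."""
    x = eps * (s ** (-delta) - 1.0) / delta
    return -mu * math.expm1(x) / ((1.0 - mu) * math.exp(x) + 1.0 + mu)


def limit_13(s, delta, mu):
    """Right-hand side of (13), written in terms of s = c / cbar."""
    return mu / (2.0 * delta) * (1.0 - s ** (-delta))


s, delta, mu = 0.5, 0.5, 0.15
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    err = abs(psi_eps(s, delta, mu, eps) / eps - limit_13(s, delta, mu))
    print(f"eps={eps:.0e}  |psi_eps/eps - limit| = {err:.2e}")
```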

Using the two moments above in a second-order Taylor expansion, we obtain

\begin{equation*} \langle \varphi (c') - \varphi (c) \rangle = \varepsilon \!\left (-\varphi '(c) \frac {1}{\varepsilon }\Psi _\delta ^\varepsilon (c/\bar c) c + \frac {\nu ^2}{2} \varphi ''(c) c^2\right ) + R_\varepsilon (c), \end{equation*}

where $ R_\varepsilon (c)$ is the remainder of the Taylor expansion, which satisfies $R_\varepsilon (c) = o(\varepsilon )$ thanks to equation (13). Therefore, using the interaction frequency (11), we obtain for the evolution of the observable $\varphi (c)$ :

\begin{equation*} \frac {\mathrm {d}}{\mathrm {d}t} \int _{\mathbb {R}_+} \varphi (c) h_\varepsilon (c,t) \mathrm {d}c = \int _{\mathbb {R}_+}\frac {c^\delta }{\kappa }\!\left ( -\varphi '(c) \frac {1}{\varepsilon }\Psi _\delta ^\varepsilon (c/\bar c) c + \frac {\nu ^2}{2} \varphi ''(c) c^2 \right )h_\varepsilon (c,t) \mathrm {d}c + \frac {1}{\kappa \varepsilon }\mathcal{R}_\varepsilon (c,t) \end{equation*}

where

\begin{equation*} \mathcal{R}_\varepsilon (c,t) = \int _{\mathbb {R}_+} c^\delta R_\varepsilon (c)h_\varepsilon (c,t) \mathrm {d}c \end{equation*}

and, letting $\varepsilon \to 0$ and using equation (13), we formally obtain for the limit density $h$ :

(14)

\begin{equation} \frac{\mathrm{d}}{\mathrm{d}t} \int _{\mathbb{R}_+} \varphi (c) h (c,t) \mathrm{d}c = \int _{\mathbb{R}_+}\!\left ( -\varphi '(c)\frac{\tilde \mu }{2\delta }\!\left (1- \!\left ( \frac{\bar c}{c} \right )^\delta \right )c^{1+\delta } + \frac{\tilde \nu ^2}{2} \varphi ''(c) c^{2+\delta } \right )h(c,t) \mathrm{d}c, \end{equation}

where we have set $\tilde \mu = \mu/\kappa$ and $\tilde \nu ^2 = \nu ^2/\kappa$ . Under the additional hypothesis that the boundary terms produced by integration by parts vanish (a zero-flux condition), equation (14) is the weak form of the following Fokker–Planck equation:

(15)

\begin{equation} \frac{\partial h(c,t)}{\partial t} = \frac{\tilde \mu }{2\delta } \frac{\partial }{\partial c} \!\left ( \!\left (1- \!\left ( \frac{\bar c}{c} \right )^\delta \right )c^{1+\delta }h(c,t)\right ) + \frac{\tilde \nu ^2}{2} \frac{\partial ^2}{\partial c^{2}}\!\left (c^{2+\delta }h(c,t)\right ), \end{equation}

which describes the evolution of the density of contacts $c\in \mathbb{R}_+$ in the quasi-invariant limit of small variations in the number of followers.
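For completeness, the integration by parts linking (14) and (15) can be made explicit. Writing, as shorthand used only here, $A(c) = \frac{\tilde \mu }{2\delta }\left (1-\left (\frac{\bar c}{c}\right )^\delta \right )c^{1+\delta }$ and $B(c) = \frac{\tilde \nu ^2}{2}c^{2+\delta }$ , one has

\begin{equation*} \int _{\mathbb{R}_+} -\varphi '(c) A(c) h(c,t)\, \mathrm{d}c = \int _{\mathbb{R}_+} \varphi (c)\frac{\partial }{\partial c}\left (A h\right ) \mathrm{d}c - \Big [\varphi A h\Big ]_{0}^{\infty }, \end{equation*}

\begin{equation*} \int _{\mathbb{R}_+} \varphi ''(c) B(c) h(c,t)\, \mathrm{d}c = \int _{\mathbb{R}_+} \varphi (c)\frac{\partial ^2}{\partial c^2}\left (B h\right ) \mathrm{d}c + \Big [\varphi ' B h - \varphi \frac{\partial }{\partial c}\left (B h\right )\Big ]_{0}^{\infty }. \end{equation*}

When the bracketed boundary terms vanish, (14) states that $\int _{\mathbb{R}_+} \varphi \left (\partial _t h - \partial _c(Ah) - \partial ^2_c(Bh)\right )\mathrm{d}c = 0$ for every smooth $\varphi$ , which is (15).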

We are now interested in steady-state solutions of equation (15). Indeed, on the time scale we aim to study, namely that of opinion formation through social interactions on online platforms, one can reasonably assume the connectivity network to be stationary, since opinions on a given subject are shaped much faster than the network changes. Under the zero-flux condition, the equilibrium solution of equation (15) solves the first-order differential equation

(16)

\begin{equation} \frac{\tilde \nu ^2}{2} \frac{\mathrm{d}}{\mathrm{d}c}\!\left (c^{2+\delta }h(c)\right ) + \frac{\tilde \mu }{2\delta } \!\left (1- \!\left ( \frac{\bar c}{c} \right )^\delta \right )c^{1+\delta }h(c)= 0 \end{equation}

To solve (16), we perform the change of variable $\rho (c) = c^{2+\delta } h(c)$ ; setting $\gamma = \tilde \mu/\tilde \nu ^2 = \mu/ \nu ^2$ , one easily sees that $\rho (c)$ solves

(17)

\begin{equation} \frac{\mathrm{d}\rho (c)}{\mathrm{d}c}= -\frac{\gamma }{\delta }\!\left (\frac{1}{c} - \frac{\bar c^\delta }{c^{1+\delta }}\right ) \rho (c). \end{equation}

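Integrating (17) and returning to $h = \rho/c^{2+\delta }$ , the unique solution of (16), up to the normalisation constant $h_\infty (\bar c)$ , is then given by: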

(18)

\begin{equation} h_\infty (c) = h_\infty (\bar c)\!\left (\frac{\bar c}{c}\right )^{2+\delta +\gamma/\delta }\text{exp} \left \{ -\frac{\gamma }{\delta ^2}\!\left (\!\left (\frac{\bar c}{c} \right )^\delta -1 \right )\right \}, \end{equation}

which is known as the Amoroso distribution, a particular case of the generalised Gamma distribution.
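A quick numerical sanity check of (18) may help. The following Python sketch (NumPy; the function names, the grid and the parameter values $\bar c = 1$ , $\gamma = 2$ , $\delta = 0.5$ are illustrative assumptions) evaluates the unnormalised Amoroso profile and verifies with central differences that $\rho = c^{2+\delta }h$ satisfies (17). For $\delta = 1$ the profile reduces, up to a constant, to $c^{-3-\gamma }e^{-\gamma \bar c/c}$ , that is, an inverse Gamma shape.

```python
import numpy as np

def amoroso_profile(c, cbar, gamma, delta):
    """Stationary contact density (18), up to the constant h_inf(cbar)."""
    s = cbar / c
    return s ** (2 + delta + gamma / delta) * np.exp(-(gamma / delta**2) * (s**delta - 1.0))

def zero_flux_residual(c, cbar, gamma, delta):
    """Residual of (17): d(rho)/dc + (gamma/delta)(1/c - cbar^delta / c^(1+delta)) rho."""
    rho = c ** (2 + delta) * amoroso_profile(c, cbar, gamma, delta)  # rho = c^(2+delta) h
    drho = np.gradient(rho, c)                                        # central differences
    rhs = -(gamma / delta) * (1.0 / c - cbar**delta / c ** (1 + delta)) * rho
    return drho - rhs, drho

c = np.linspace(0.2, 10.0, 20001)
res, drho = zero_flux_residual(c, cbar=1.0, gamma=2.0, delta=0.5)
# Skip the two endpoints, where np.gradient falls back to one-sided differences
print("max relative residual:", np.abs(res[1:-1]).max() / np.abs(drho).max())
```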

We now focus on the particular case $\delta \to 0$ and, consequently, $\beta =0$ , in which the collision kernel is independent of the number of contacts. In this situation, the value function (7) degenerates to

(19)

\begin{equation} \Psi ^\varepsilon _0(s) =- \mu \frac{s^{-\varepsilon } - 1}{(1-\mu )s^{-\varepsilon } + 1 + \mu }=\!\left (\frac{\mu }{1-\mu }\right ) \frac{s^{\varepsilon }-1}{\frac{1+\mu }{1-\mu }s^{\varepsilon } +1}, \end{equation}

where now the following limit holds true

\begin{equation*} \lim _{\varepsilon \to 0} \frac {1}{\varepsilon }\Psi ^\varepsilon _0\!\left ( \frac {c}{\bar c}\right ) =\frac {\mu/(1-\mu )}{\frac {1+\mu }{1-\mu }+1} \text {ln}\!\left ( \frac {c}{\bar c}\right )=\frac {\mu }{2} \text {ln}\!\left ( \frac {c}{\bar c}\right ), \end{equation*}

so that, as $\varepsilon \to 0$ and for a collision kernel that does not depend on the number of connections, the evolution of the observables is well described by the following Fokker–Planck type equation in weak form:

(20)

\begin{equation} \frac{\mathrm{d}}{\mathrm{d}t} \int _{\mathbb{R}_+} \varphi (c) h (c,t) \mathrm{d}c = \int _{\mathbb{R}_+}\!\left ( -\varphi '(c)\frac{\tilde \mu }{2} \text{ln}\!\left ( \frac{c}{\bar c}\right )c + \frac{\tilde \nu ^2}{2} \varphi ''(c) c^2 \right )h(c,t) \mathrm{d}c, \end{equation}

where $\tilde \mu = \mu/\kappa$ and $\tilde \nu ^2 = \nu ^2/\kappa$ . Again assuming zero flux at the boundary, one obtains the strong form of a Fokker–Planck type equation:

(21)

\begin{equation} \frac{\partial h(c,t)}{\partial t} = \frac{\tilde \mu }{2} \frac{\partial }{\partial c} \!\left ( c \text{ln} \!\left (\frac{c}{\bar c}\right )h(c,t)\right ) + \frac{\tilde \nu ^2}{2} \frac{\partial ^2}{\partial c^2}\!\left (c^2h(c,t)\right ), \end{equation}

whose equilibrium state now reads

(22)

\begin{equation} h_\infty (c) = \frac{1}{\sqrt{2 \pi \sigma } c}\text{exp} \left \{ -\frac{(\text{ln} c - \lambda )^2}{2\sigma }\right \}, \end{equation}

where, as before, $\gamma = \tilde \mu/\tilde \nu ^2$ , and we set $\sigma = 1/\gamma$ and $\lambda = \text{ln}\bar c - \sigma$ . Equation (22) is a log-normal probability distribution whose mean and variance are, respectively,

\begin{equation*} m(h_\infty ) = \bar c e^{-\sigma/2}, \quad \text {Var}(h_\infty ) = \bar c^2(1-e^{-\sigma }). \end{equation*}
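Two quick numerical checks of the case $\delta \to 0$ are sketched below in Python with NumPy. The first verifies that $\frac{1}{\varepsilon }\Psi ^\varepsilon _0(s)$ approaches $\frac{\mu }{2}\ln s$ , with an error that shrinks roughly linearly in $\varepsilon$ . The second compares the mean and variance formulas above with sample moments of the log-normal (22). The values $\mu = 0.1$ , $\bar c = 2$ , $\sigma = 0.3$ and the sample size are arbitrary illustrative choices. Note that NumPy parametrises the log-normal by the mean and the standard deviation of $\ln c$ , that is, by $\lambda$ and $\sqrt{\sigma }$ .

```python
import numpy as np

def psi0(s, eps, mu):
    """Value function (19) for delta -> 0 (kernel independent of contacts)."""
    return mu * (s**eps - 1.0) / ((1.0 + mu) * s**eps + (1.0 - mu))

mu = 0.1
s = np.array([0.5, 2.0, 5.0])
for eps in (1e-1, 1e-2, 1e-3):
    gap = np.abs(psi0(s, eps, mu) / eps - 0.5 * mu * np.log(s)).max()
    print(f"eps={eps:g}: max |Psi/eps - (mu/2) ln s| = {gap:.2e}")

def lognormal_moments(cbar, sigma):
    """Mean and variance of the equilibrium (22), with lambda = ln(cbar) - sigma."""
    return cbar * np.exp(-sigma / 2), cbar**2 * (1.0 - np.exp(-sigma))

rng = np.random.default_rng(0)
cbar, sigma = 2.0, 0.3                      # illustrative values, not the fitted ones
lam = np.log(cbar) - sigma
# NumPy's lognormal takes the mean and the standard deviation of ln c
samples = rng.lognormal(mean=lam, sigma=np.sqrt(sigma), size=1_000_000)
mean_exact, var_exact = lognormal_moments(cbar, sigma)
print("mean:", samples.mean(), "vs", mean_exact)
print("variance:", samples.var(), "vs", var_exact)
```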

2.2. Contact distribution on Twitter and fitting

We now discuss the ability of our contact model to describe real networks. To this aim, we first collected data from Twitter in order to reconstruct a typical ensemble of connections. We then estimated the parameters appearing in the equilibrium states derived above, (18) and (22), so that our model is as close as possible to real observable networks. We chose Twitter among the possible online platforms because it is more focused on staying informed and up to date than, for instance, Facebook, which mainly aims at connecting friends. Thus, the former seemed better suited than the latter to the study of opinion formation and change.

At the time this research began, Twitter allowed academics to retrieve information about its users, given their IDs or usernames. Exploiting this possibility, we started from the user IDs of some of the most followed politicians around the world and, by accessing their networks, retrieved the IDs of one million followers from each of their profiles. We then merged these data, removing duplicates, and extracted one million profiles from the resulting set. For each of these profiles we gathered the number of followers, in order to obtain a statistical representation of the distribution of connections on the platform. In this process, we discarded profiles with zero followers, which we regarded as inactive. In Figure 2, we represent our dataset using a sample of $N_{\texttt{s}}=400$ accounts from the $N=10^6$ accounts extracted from Twitter. The sizes of the bubbles are proportional to the logarithm of the agents’ contacts, and the edges are reconstructed from the statistical distribution of the connections.

The contact distribution obtained from the data was fitted with the steady-state solutions (18)–(22) by solving a nonlinear least-squares problem with the Matlab function lsqcurvefit. Several choices of contact distribution were tested, namely the log-normal, Amoroso and inverse Gamma distributions, each corresponding to a different value of the parameter $\delta$ appearing in the value function. Figure 3 and Table 1 report the results obtained with the different fitting functions. We conclude that the best fit is obtained when the steady-state distribution of the number of social media connections is a log-normal density, that is, in the limit $\delta \to 0$ of Section 2.
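The fit was performed in Matlab with lsqcurvefit; a rough Python analogue is sketched below using scipy.optimize.curve_fit, which also solves a nonlinear least-squares problem. The histogram binning, the moment-based initial guess and the synthetic log-normal data standing in for the Twitter follower counts are assumptions made for illustration, and the sketch fits only the log-normal (22) rather than comparing the three candidate families.

```python
import numpy as np
from scipy.optimize import curve_fit

def lognormal_density(c, lam, sigma):
    """Equilibrium (22): lam and sigma are the mean and the variance of ln c."""
    return np.exp(-(np.log(c) - lam) ** 2 / (2 * sigma)) / (np.sqrt(2 * np.pi * sigma) * c)

def fit_lognormal(followers, n_bins=150):
    """Least-squares fit of (22) to the empirical density of follower counts."""
    followers = followers[followers > 0]             # discard inactive (zero-follower) profiles
    edges = np.linspace(0.0, np.quantile(followers, 0.999), n_bins + 1)
    density, edges = np.histogram(followers, bins=edges, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    logs = np.log(followers)
    p0 = (logs.mean(), logs.var())                    # moment-based initial guess
    params, _ = curve_fit(lognormal_density, centres, density, p0=p0,
                          bounds=([-np.inf, 1e-8], [np.inf, np.inf]))
    return params

# Synthetic stand-in for the follower counts (the Twitter data are not reproduced here)
rng = np.random.default_rng(1)
synthetic = rng.lognormal(mean=1.0, sigma=0.5, size=200_000)   # lambda = 1, sigma = 0.25
lam_fit, sigma_fit = fit_lognormal(synthetic)
print(f"lambda = {lam_fit:.3f}, sigma = {sigma_fit:.3f}")
```

The positivity bound on $\sigma$ keeps the square root in (22) well defined during the iterations.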

Figure 2. Representation of the social network using a sample of $N_{\texttt{s}}=400$ accounts from the $N=10^6$ dataset extracted from Twitter. Sizes of the bubbles are proportional to the logarithm of agents’ contacts, where edges are reconstructed based on the statistical distribution of the connections.

Figure 3. Comparison between the tails of the data distribution and the different possible equilibrium distributions of the Fokker–Planck models of Section 2.1.

We remark that the log-normal distribution has only two parameters, while both the Amoroso and the inverse Gamma distributions have three, and that the fitting process is easier when fewer parameters have to be identified. Notice also that both the mean and the variance of the steady state depend, through $\sigma = 1/\gamma = \nu ^2/\mu$ , on the ratio between $\nu ^2$ , the variance of the random percentage of variation of followers, and $\mu$ , the maximal percentage of variation of followers allowed per interaction. Referring to (22), the fitting process yields $\lambda = 7.165 \times 10^{-1}$ and $\sigma = 8.882$ . In the sequel, we therefore use a log-normal distribution to characterise the structure of the network.
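To read the fitted values, one can invert $\lambda = \ln \bar c - \sigma$ and evaluate the moment formulas of Section 2.1. The short Python sketch below performs this arithmetic; it uses only the two values reported above and the relations stated in the text.

```python
import math

lam, sigma = 7.165e-1, 8.882          # fitted parameters of (22) reported above
gamma = 1.0 / sigma                   # gamma = mu / nu^2
cbar = math.exp(lam + sigma)          # inverts lambda = ln(cbar) - sigma
mean = cbar * math.exp(-sigma / 2)    # equals exp(lambda + sigma / 2)
var = cbar**2 * (1.0 - math.exp(-sigma))
print(f"mu/nu^2 = {gamma:.4f}, cbar = {cbar:.3g}, mean = {mean:.3g}, variance = {var:.3g}")
```

The large value of $\sigma$ makes the variance many orders of magnitude larger than the squared mean, reflecting the heavy right tail of the follower distribution.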

3. Kinetic model of opinions and contacts

As for the evolution of the number of social media contacts, we again start from the microscopic interactions between individuals on a social platform to model the time evolution of the distribution of opinions, as done for instance in [Reference Toscani49]. Here, however, we take into account the possibility that the opinions of people with a large number of connections have a larger impact on the community, so that these individuals modify other opinions more easily.

Table 1. Fitting of the contact distribution from Twitter data

3.1. The binary interaction

We start by associating the opinion of each agent with a variable $v \in I = [\!-\!1,1]$ . At the microscopic scale, we then suppose that binary interactions between individuals obey the following law:

(23)

\begin{equation} \begin{split} v' & = v + \alpha P(v,v_*,c,c_*)(v_*-v) +\xi D(v,c),\\[5pt] v'_{\!\!\ast} & = v_* + \alpha P(v_*,v,c_*,c)(v-v_*)+\xi _* D(v_*,c_*), \end{split} \end{equation}

where $v$ and $v_*$ are the agents’ opinions before the interaction, while $v'$ and $v'_{\!\!\ast}$ are their opinions after interacting. In (23), the function $P$ can be seen as the compromise propensity of the agent. In other words, as a consequence of the exchange of information, the two interacting agents change their opinions, in a symmetric or, more generally, non-symmetric way, each moving towards the opinion of the other. The function $D$ instead is responsible for diffusion effects and models the unpredictable role played by the environment. It is multiplied by the random variable $\xi$ , with $\langle \xi \rangle =0$ and $\langle \xi ^2 \rangle =\sigma ^2$ . Note that in the general case (23), the post-interaction opinions depend on the numbers of connections of both participants; we detail this dependence later. For the moment, we remark that the restriction of the binary interaction to the case in which the post-interaction opinions $(v',v'_{\!\!\ast})$ are independent of the numbers of contacts $(c,c_*)$ is quite classical and is discussed for instance in [Reference Albi, Pareschi, Toscani and Zanella5].

Let now $f(v,c,t)$ be the density of agents that at time $t \gt 0$ have opinion $v$ and $c$ connections. The time evolution of the joint distribution of opinions and connections $f(v,c,t)$ , resulting from binary interactions of type (23) among individuals acting on a social platform, is described by kinetic collision-like models [Reference Pareschi and Toscani45, Reference Toscani49]. In weak form, it reads:

(24)

\begin{equation} \begin{aligned} & \displaystyle \frac{d}{dt}\int _{I \times{\mathbb R}_+}f(v,c,t)\varphi (v,c)\,dv\,dc =\displaystyle \frac 12 \Big \langle \int _{I^2\times{\mathbb R}_+^2} \bigl (\varphi (v',c')+ \varphi (v'_{\!\!\ast},c'_{\!\!\ast})\cr &\qquad \qquad \qquad \qquad \displaystyle -\varphi (v,c)-\varphi (v_*,c_*) \bigr ) f(v_*,c_*,t)f(v,c,t) \,dv\,dv_*\,dc\,dc_* \Big \rangle. \end{aligned} \end{equation}

In (24), the post-interaction opinions $v'$ and $v'_{\!\!\ast}$ are given by (23), while the post-interaction connections are given by (1). The operator $\langle \cdot \rangle$ denotes the mathematical expectation with respect to the random variables $\xi$ and $\eta$ . Note that here we do not consider an interaction kernel depending on the number of contacts, as done for instance in (8). This choice is driven by the fact that, when comparing the model with the experiments, we aim to represent a specific situation, namely the case in which the network is well described by a log-normal distribution, as shown in Section 2.2. We stress, however, that the extension to kernels depending on $c$ is possible, although it is not discussed in the present work.

The opinion variable $v$ belongs to the bounded domain $[\!-\!1,1]$ , so it is important to consider only interactions that do not produce values outside this domain. A sufficient condition to preserve the bounds is given by the following proposition.

Proposition 3.1. The binary interaction (23) preserves the bounds, that is, $v',v'_{\!\!\ast}\in [\!-\!1,1]$ if $v,v_*\in [\!-\!1,1]$ and if

(25)

\begin{equation} 0 \lt P(v,v_*,c,c_*) \leq 1, \quad 0\lt \alpha \leq 1/2, \quad |\xi | \leq (1-\gamma ^*)d \end{equation}

where

(26)

\begin{equation} \gamma ^* = \alpha \max _{\substack{v,v_* \in [\!-\!1,1], \\ c,c_*\gt 0}} P(v,v_*,c,c_*), \quad d = \min _{\substack{v\in [\!-\!1,1], \\ c\gt 0}} \left \{ \frac{1-|v|}{D(v,c)}, D(v,c) \neq 0\right \}. \end{equation}

Proof. Let us define $\gamma = \alpha P(v,v_*,c,c_*)$ . We first consider the case without diffusion, that is, $\xi = 0$ : we have that

\begin{equation*} |v'| = | v + \gamma (v_* - v)| \leq (1-\gamma )|v| + \gamma |v_*| \leq 1, \end{equation*}

since $|v|, |v_*| \leq 1$ and, under the hypothesis (25), we have that $0\lt \gamma \leq 1$ .

Let us now assume that $\xi \neq 0$ : using $|v_*|\leq 1$ , we can write

\begin{equation*} |v'| = |v + \gamma (v_* - v) + \xi D(v,c)| \leq (1-\gamma )|v| + \gamma |v_*| + |\xi | D(v,c) \leq (1-\gamma )|v| + \gamma + |\xi | D(v,c). \end{equation*}

Hence, in order to have $|v'|\leq 1$ it is sufficient to require

\begin{equation*} |\xi | \leq \frac {(1-\gamma )(1-|v|)}{D(v,c)}, \end{equation*}

for all values of $v$ and $c$ with $D(v,c)\neq 0$ . Since $\gamma \leq \gamma ^*$ implies $1-\gamma \geq 1-\gamma ^*$ , this condition holds whenever $|\xi | \leq (1-\gamma ^*)d$ , with $\gamma ^*$ and $d$ defined in (26), which proves the result.
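The proof requires $|\xi | \leq (1-\gamma )(1-|v|)/D(v,c)$ for every admissible value of $\gamma = \alpha P$ , so the noise bound has to be calibrated on the largest such value, which is the worst case. The Python sketch below checks the bound numerically for $D(v,c) = 1-v^2$ , for which $(1-|v|)/D = 1/(1+|v|)$ has infimum $d = 1/2$ ; the uniformly distributed kernel values $P$ and noise $\xi$ are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, alpha = 200_000, 0.5
v, v_star = rng.uniform(-1.0, 1.0, n), rng.uniform(-1.0, 1.0, n)
P = rng.uniform(0.05, 1.0, n)         # hypothetical kernel values in (0, 1]
D = 1.0 - v**2                        # diffusion weight used in Section 4
gamma_max = alpha * 1.0               # largest admissible value of alpha * P (sup P = 1)
d = 0.5                               # inf over v of (1 - |v|) / (1 - v^2) = 1 / (1 + |v|)
xi = rng.uniform(-1.0, 1.0, n) * (1.0 - gamma_max) * d   # |xi| <= (1 - gamma_max) d
v_new = v + alpha * P * (v_star - v) + xi * D             # binary interaction (23)
print("max |v'| =", np.abs(v_new).max())
```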

3.2. Fokker–Planck asymptotics

To model the fact that opinions form through a large number of interactions, each producing a small change in an individual's point of view until the final opinion is formed, we regularise the dynamics of the Boltzmann-like equation (24) by means of a quasi-invariant scaling. This computation gives insight into the behaviour of model (24) by yielding a Fokker–Planck equation for the asymptotic joint evolution of contacts and opinions. The quasi-invariant scaling is as follows:

(27)

\begin{equation} \alpha \to \varepsilon \alpha,\qquad \sigma ^2 \to \varepsilon \sigma ^2, \end{equation}

for $\varepsilon \ll 1$ , similarly to the relation (6) for the sole contacts dynamics.

We assume that the scaled random variables $\eta _\varepsilon$ , $\xi _{\varepsilon }$ , and $\xi _{\varepsilon *}$ are independent with zero mean and bounded moments at least of order $n=3$ . We also assume that $\xi _{\varepsilon },\xi _{\varepsilon *}$ are identically distributed, and that the following holds

(28)

\begin{equation} \langle \xi _{\varepsilon }\rangle =\langle \xi _{\varepsilon *}\rangle = 0,\quad \langle \xi _{\varepsilon }^2\rangle =\langle \xi _{\varepsilon *}^2\rangle = \varepsilon \sigma ^2,\quad \langle \xi _{\varepsilon }^3\rangle =\langle \xi _{\varepsilon *}^3\rangle = \varepsilon ^{3/2}\varrho, \end{equation}

while for $\eta _\varepsilon$ we recall that

(29)

\begin{equation} \langle \eta _{\varepsilon }\rangle = 0,\quad \langle \eta _{\varepsilon }^2\rangle = \varepsilon \nu ^2,\quad \langle \eta _{\varepsilon }^3\rangle = \varepsilon ^{3/2}\varkappa. \end{equation}

with $\varrho$ and $\varkappa$ two given constants. To simplify the notation, we now rewrite the value function (4) multiplied by the number of contacts as follows:

(30)

\begin{equation} c\Psi _\delta ^\varepsilon (c/\bar c) = \mu{L}_\varepsilon (c), \end{equation}

and the interaction function in (23) as:

\begin{equation*} E(v,v_*,c,c_*) = P(v,v_*,c,c_*)(v_*-v). \end{equation*}

Using the above properties of the random variables $\xi, \xi _*, \eta$ , together with (1) for the contacts and (23) for the opinions, we obtain

(31)

\begin{equation} \begin{aligned} & \langle c' -c\rangle = \langle -\mu{L}_\varepsilon (c) + \eta _\varepsilon c \rangle = -\mu{L}_\varepsilon (c), \\[5pt] &\langle v' -v \rangle = \langle \alpha P(v,v_*,c,c_*)(v_*-v) +\xi D(v,c) \rangle = \varepsilon \alpha E(v,v_*,c,c_*), \end{aligned} \end{equation}

and

(32)

\begin{equation} \begin{aligned} & \langle (c' - c)^2 \rangle =\mu ^2{L}_\varepsilon (c)^2 + \varepsilon \nu ^2 c^2, \\[5pt] & \langle (v' -v)^2 \rangle = \varepsilon ^2\alpha ^2 E(v,v_*,c,c_*)^2 + \varepsilon \sigma ^2 D^2(v,c), \\[5pt] & \langle (c' -c)(v'-v) \rangle = -\varepsilon \alpha \mu{L}_\varepsilon (c) E(v,v_*,c,c_*) \end{aligned} \end{equation}

while the third-order terms are

(33)

\begin{equation} \begin{aligned} & \langle (c' - c)^3 \rangle = -\mu ^3{L}_\varepsilon (c)^3 + \varepsilon ^{3/2}\varkappa c^3 - 3\varepsilon \mu \nu ^2 L_\varepsilon (c) c^2, \\[5pt] & \langle (v' -v)^3 \rangle = \varepsilon ^3\alpha ^3E(v,v_*,c,c_*)^3 + \varepsilon ^{3/2}\varrho D^3(v,c) +3\varepsilon ^2 \alpha \sigma ^2 E(v,v_*,c,c_*) D(v,c)^2, \\[5pt] & \langle (c' -c)^2(v'-v) \rangle = \varepsilon \alpha E(v,v_*,c,c_*)(\mu ^2 L_\varepsilon (c)^2 + \varepsilon \nu ^2 c^2), \\[5pt] & \langle (c' -c)(v'-v)^2 \rangle =-\mu L_\varepsilon (c)( \varepsilon ^2\alpha ^2E(v,v_*,c,c_*)^2 +\varepsilon \sigma ^2 D^2(v,c) ). \end{aligned} \end{equation}
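The first- and second-order moments (31)–(32) can be checked by sampling. The Python sketch below fixes one pair of agents, draws the scaled noises $\xi _\varepsilon$ and $\eta _\varepsilon$ from zero-mean uniform laws with variances $\varepsilon \sigma ^2$ and $\varepsilon \nu ^2$ , and compares empirical averages with the formulas. The uniform law, the choices $P = 1$ and $D = 1-v^2$ , and the value of $L_\varepsilon (c)$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
eps, alpha, sigma2, mu, nu2 = 0.01, 0.5, 0.5, 0.1, 0.0125
v, v_star, c = 0.3, -0.4, 2.0
E = 1.0 * (v_star - v)        # interaction function with P = 1 (hypothetical)
D = 1.0 - v**2                # hypothetical diffusion weight
L_eps = 0.05                  # hypothetical value of L_eps(c) at this c
n = 2_000_000
# Zero-mean uniform noises with variances eps*sigma2 and eps*nu2, as in (28)-(29)
xi = rng.uniform(-1.0, 1.0, n) * np.sqrt(3 * eps * sigma2)
eta = rng.uniform(-1.0, 1.0, n) * np.sqrt(3 * eps * nu2)
dv = eps * alpha * E + xi * D     # v' - v under the scaling (27)
dc = -mu * L_eps + eta * c        # c' - c, written as in (31)

checks = {
    "<v'-v>": (dv.mean(), eps * alpha * E),
    "<(v'-v)^2>": ((dv**2).mean(), eps**2 * alpha**2 * E**2 + eps * sigma2 * D**2),
    "<(c'-c)^2>": ((dc**2).mean(), mu**2 * L_eps**2 + eps * nu2 * c**2),
    "<(c'-c)(v'-v)>": ((dc * dv).mean(), -eps * alpha * mu * L_eps * E),
}
for name, (empirical, exact) in checks.items():
    print(f"{name:16s} empirical={empirical:.4e}  formula={exact:.4e}")
```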

By expanding the smooth test function $\varphi (v',c')$ around $(v,c)$ in a Taylor series up to second order, with a third-order remainder, we have

(34)

\begin{equation} \begin{aligned} &\langle \varphi (v',c')-\varphi (v,c) \rangle = \cr &\quad \displaystyle \varepsilon \!\left ( \alpha E(v,v_*,c,c_*)\frac{\partial \varphi }{\partial v} - \mu \frac{L_\varepsilon (c)}{\varepsilon }\frac{\partial \varphi }{\partial c} + \frac 12 \sigma ^2 D(v,c)^2 \frac{\partial ^2 \varphi }{\partial v^2} + \frac 12 \nu ^2 c^2 \frac{\partial ^2 \varphi }{\partial c^2} \right ) \cr &+ \frac{\varepsilon ^2}2 \!\left ( \alpha ^2E(v,v_*,c,c_*)^2 \frac{\partial ^2 \varphi }{\partial v^2} + \mu ^2 \frac{L_\varepsilon (c)^2}{\varepsilon ^2} \frac{\partial ^2 \varphi }{\partial c^2} - \mu \alpha \frac{L_\varepsilon (c)}{\varepsilon } E(v,v_*,c,c_*)\frac{\partial ^2 \varphi }{\partial v \partial c} \right )\cr &\qquad \qquad +R_\varepsilon (v,v_*,c,c_*), \end{aligned} \end{equation}

where the remainder of the Taylor expansion $R_\varepsilon (v,v_*,c,c_*)$ is expressed as follows:

(35)

\begin{equation} \begin{aligned} &R_\varepsilon (v,v_*,c,c_*) = \displaystyle \frac{\varepsilon ^2}{6} \frac{\partial ^3 \varphi }{\partial c^3}(\hat v, \hat c) \!\left ( -\mu ^3 \frac{{L}_\varepsilon (c)^3}{\varepsilon ^2} + \varepsilon ^{-1/2}\varkappa c^3 - 3 \mu \nu ^2 \frac{ L_\varepsilon (c)}{\varepsilon } c^2 \right ) \\[5pt] & \, + \frac{\varepsilon ^2}{6} \frac{\partial ^3 \varphi }{\partial v^3} (\hat v, \hat c) \!\left ( \varepsilon \alpha ^3E(v,v_*,c,c_*)^3 + \varepsilon ^{-1/2}\varrho D^3(v,c) +3 \alpha \sigma ^2 E(v,v_*,c,c_*) D(v,c)^2\right )\\[5pt] & \quad + \frac{\varepsilon ^2}{2}\frac{\partial ^3 \varphi }{\partial v\partial c^2}(\hat v, \hat c)\!\left ( \alpha E(v,v_*,c,c_*)\!\left (\mu ^2 \frac{L_\varepsilon (c)^2}{\varepsilon } + \nu ^2 c^2\right )\right ) \\[5pt] & \quad \quad - \frac{\varepsilon ^2}2 \frac{\partial ^3 \varphi }{\partial v^2\partial c}(\hat v, \hat c) \!\left (\mu \frac{L_\varepsilon (c)}{\varepsilon }\!\left ( \varepsilon \alpha ^2E(v,v_*,c,c_*)^2 +\sigma ^2 D^2(v,c) \right )\right ), \end{aligned} \end{equation}

for $\hat v = \theta _v v'+(1-\theta _v)v$ with $\theta _v \in [0,1]$ and $\hat c=\theta _c c'+(1-\theta _c)c$ with $\theta _c \in [0,1]$ .

Hence, scaling the time variable $\tau = \varepsilon t$ and using the expansion (34) in (24), the solution $f_\varepsilon$ satisfies the weak relation:

(36)

\begin{equation} \begin{aligned} &\displaystyle \frac{d}{d\tau }\int _{I \times{\mathbb R}_+}f_\varepsilon (v,c,\tau )\varphi (v,c)\,dv\,dc =\displaystyle \int _{I\times{\mathbb R}_+} \!\left ( \mathcal{E}[f_\varepsilon ](v,c,\tau )\frac{\partial \varphi }{\partial v} - \mu \frac{L_\varepsilon (c)}{\varepsilon } \frac{\partial \varphi }{\partial c} \right. \\[5pt] &\qquad \quad \displaystyle \left .+ \frac 12 \sigma ^2 D^2(v,c) \frac{\partial ^2 \varphi }{\partial v^2} + \frac 12 \nu ^2 c^2 \frac{\partial ^2 \varphi }{\partial c^2} \right )f_\varepsilon (v,c,\tau )\,dv\,dc+ \mathcal{R}_\varepsilon (\varphi ), \end{aligned} \end{equation}

where we introduced the following notation for the non-local operator:

\begin{equation*} \mathcal{E}[f_\varepsilon ](v,c,\tau ) =\alpha \int _{I\times {\mathbb R}_+} E(v,v_*,c,c_*) f_\varepsilon (v_*,c_*,\tau )\,dv_*\,dc_*, \end{equation*}

and where the scaled remainder is

(37)

\begin{equation} \begin{aligned} &\mathcal{R}_\varepsilon (\varphi ) = \frac{\varepsilon }2\int _{I^2\times{\mathbb R}_+^2} \!\left ( \alpha ^2E(v,v_*,c,c_*)^2 \frac{\partial ^2 \varphi }{\partial v^2} + \mu ^2 \frac{L_\varepsilon (c)^2}{\varepsilon ^2} \frac{\partial ^2 \varphi }{\partial c^2}\right .\cr &\left .\quad \quad - \mu \alpha \frac{L_\varepsilon (c)}{\varepsilon } E(v,v_*,c,c_*)\frac{\partial ^2 \varphi }{\partial v \partial c} \right )f_\varepsilon (v,c,\tau )f_\varepsilon (v_*,c_*,\tau ) \,dv\,dv_*\,dc\,dc_*\cr &\qquad \qquad + \frac{1}{\varepsilon }\int _{I^2\times{\mathbb R}_+^2} R_\varepsilon (v,v_*,c,c_*)f_\varepsilon (v,c,\tau )f_\varepsilon (v_*,c_*,\tau ) \,dv\,dv_*\,dc\,dc_*. \end{aligned} \end{equation}

As $\varepsilon \to 0$ , from (30) and (35) we have the following:

\begin{equation*} \begin {aligned} &L_\varepsilon (c)\to 0,\qquad {\mu L_\varepsilon (c)}/{\varepsilon } = {c\Psi _\delta ^\varepsilon (c/\bar c)}/{\varepsilon } \to \Phi _\delta (c),\qquad {R_\varepsilon (v,v_*,c,c_*)}/{\varepsilon } \to 0, \end {aligned} \end{equation*}

where

(38)

\begin{equation} \Phi _\delta (c)\;:\!=\; \begin{cases} \displaystyle \frac{\mu }{2\delta } \!\left (1- \!\left ( \frac{ c}{\bar c}\right )^{-\delta } \right )c,\quad 0\lt \delta \leq 1\\[5pt] \\[5pt] \displaystyle \frac{\mu }{2} \text{ln}\!\left ( \frac{c}{\bar c}\right )c,\,\qquad \delta \to 0. \end{cases} \end{equation}
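The convergence of the first branch of (38) to the logarithmic one as $\delta \to 0$ follows from $(1-x^{-\delta })/\delta \to \ln x$ . The Python sketch below, with illustrative values $\bar c = 1$ and $\mu = 0.1$ , evaluates both branches and shows the gap shrinking with $\delta$ . Note that $\Phi _\delta$ is negative for $c \lt \bar c$ and positive for $c \gt \bar c$ , so the corresponding drift in (39) pulls the number of contacts towards $\bar c$ .

```python
import numpy as np

def phi_delta(c, cbar, mu, delta):
    """Drift coefficient Phi_delta of (38); delta == 0 stands for the limit delta -> 0."""
    if delta == 0:
        return 0.5 * mu * np.log(c / cbar) * c
    return mu / (2 * delta) * (1.0 - (c / cbar) ** (-delta)) * c

c = np.linspace(0.1, 10.0, 200)
for delta in (1.0, 0.1, 0.01, 0.001):
    gap = np.abs(phi_delta(c, 1.0, 0.1, delta) - phi_delta(c, 1.0, 0.1, 0)).max()
    print(f"delta={delta:g}: max |Phi_delta - Phi_0| = {gap:.2e}")
```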

Hence, in the limit $\varepsilon \to 0$ , the remainder (37) vanishes and equation (36) reduces to the weak form of the following equation:

(39)

\begin{equation} \displaystyle \frac{\partial f}{\partial \tau }=-\frac{\partial \!\left (\mathcal{E}[f](v,c,\tau ) f \right )}{\partial v} + \frac{\partial ( \Phi _\delta (c)f)}{\partial c}+ \frac 12 \sigma ^2 \frac{\partial ^2 (D^2(v,c)f)}{\partial v^2} + \frac 12 \nu ^2 \frac{\partial ^2( c^2f)}{\partial c^2}. \end{equation}

Equation (39) is the model we use in the sequel to describe the time evolution of opinion formation over a social network. In particular, in the final part of Section 4, focusing on a specific platform, namely Twitter, we fit the parameters of our model to experimental data in order to describe a realistic phenomenon.

3.3. On the steady state solution for the opinion distribution

In the general case, the steady state of equation (39) is not known. However, it is possible to compute an explicit formula for the asymptotic solution under some particular assumptions. Let us assume that $D(v,c) = 1 - v^2$ and $P(v,v_*,c,c_*)=1$ so that

\begin{equation*}\begin {array}{rcl} \mathcal{E}[f](v,c,\tau ) &=& \displaystyle \alpha \!\left (\int _{I\times {\mathbb R}_+} v_* f(v_*,c_*,\tau )\,dv_*\,dc_* - v \right ) =\alpha \!\left ( m_v(\tau ) - v \right ). \end {array} \end{equation*}

In this situation, one can look for solutions of the form $f(v,c,t) = g(v,t)h(c,t)$ , leading to asymptotic states of the form $f_\infty (v,c)=g_\infty (v) h_\infty (c)$ , where $h_\infty$ is the asymptotic contact distribution derived in Section 2. Under the above hypotheses, and in the case $\delta \to 0$ of (38), the Fokker–Planck equation (39) can be rewritten as:

(40)

\begin{equation} \begin{array}{rcl} \displaystyle \frac{\partial g}{\partial \tau }h + \displaystyle \frac{\partial h}{\partial \tau }g &=&-\displaystyle \alpha \frac{\partial ( (m_v(\tau ) - v) g)}{\partial v}h + \frac{\mu }{2} \frac{\partial \!\left (c \text{ln} \!\left (\frac{c}{\bar c}\right )h\right )}{\partial c}g \\[5pt] & & \displaystyle + \frac 12 \sigma ^2 \frac{\partial ^2 ((1-v^2)^2g)}{\partial v^2} h + \frac 12 \nu ^2 \frac{\partial ^2( c^2h)}{\partial c^2}g, \end{array} \end{equation}

leading to

(41)

\begin{multline} \!\left (\frac{\partial g}{\partial \tau } + \alpha \frac{\partial ( (m_v(\tau ) - v) g)}{\partial v}- \frac 12 \sigma ^2 \frac{\partial ^2 ((1-v^2)^2g)}{\partial v^2} \right )h \\[5pt] +\displaystyle \!\left (\frac{\partial h}{\partial \tau } - \frac{\mu }{2} \frac{\partial \!\left (c \text{ln} \!\left (\frac{c}{\bar c}\right )h\right )}{\partial c} -\frac 12 \nu ^2 \frac{\partial ^2( c^2h)}{\partial c^2} \right )g= 0. \end{multline}

Non-trivial solutions of (41) are obtained when $g$ and $h$ satisfy, respectively,

(42)

\begin{equation} \frac{\partial g}{\partial \tau } = - \alpha \frac{\partial ( (m_v(\tau ) - v) g)}{\partial v} + \frac 12 \sigma ^2 \frac{\partial ^2 ((1-v^2)^2g)}{\partial v^2} \end{equation}

and

(43)

\begin{equation} \frac{\partial h}{\partial \tau } = \frac{\mu }{2} \frac{\partial \!\left (c \text{ln} \!\left (\frac{c}{\bar c}\right )h\right )}{\partial c} + \frac 12 \nu ^2 \frac{\partial ^2( c^2h)}{\partial c^2}. \end{equation}

Thus stationary solutions for (39) are of the form $f_\infty (v,c) = g_\infty (v)h_\infty (c)$ as claimed before, where for (43) we obtain the log-normal distribution (22), while in order to compute the stationary solution to (42), one has to solve

(44)

\begin{equation} \frac{\mathrm{d}((1-v^2)^2g_\infty )}{\mathrm{d}v} = \frac{2\alpha }{\sigma ^2}( (\bar m_v - v) g_\infty ),\qquad \bar m_v = \int _{I} v_* g_\infty (v_*)\,dv_*. \end{equation}

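Under zero-flux boundary conditions, (42) conserves the mean opinion, so that $\bar m_v$ coincides with the initial mean $m_v(0)$ . The solution to (44) is then given by: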

\begin{equation*} g_\infty (v) = K_\infty (1+v)^{-2+\frac {\alpha \bar m_v}{2\sigma ^2}}(1-v)^{-2-\frac {\alpha \bar m_v}{2\sigma ^2}}\text {exp}\left \{ -\frac {\alpha (1-\bar m_v v)}{\sigma ^2(1-v^2)}\right \}, \end{equation*}

where $K_\infty$ is a normalisation constant such that $g_\infty$ has unit mass. Figure 4 compares the analytical profile of $g_\infty (v)$ , obtained for $\alpha = 0.1$ , $\sigma ^2 = 0.1$ and for $\alpha = 0.25$ , $\sigma ^2 = 0.05$ (with $\bar c = 1$ , $\mu = 0.1$ , $\nu ^2 = 0.0125$ ), with numerical simulations of equation (24) by a Monte Carlo method whose details are given in Appendix A. In the simulations, we set the scaling parameter $\varepsilon = 0.01$ in order to recover the Fokker–Planck asymptotics from the Boltzmann-type equation (24).
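Integrating (44) over $I$ , with vanishing boundary terms, shows that a unit-mass $g_\infty$ has mean exactly $\bar m_v$ , which gives a convenient consistency check. The Python sketch below evaluates the closed-form profile on an interior grid, normalises it with the trapezoidal rule (a numerical stand-in for $K_\infty$ ), and checks the mass, the mean and the residual of (44). It uses $\alpha = 0.1$ , $\sigma ^2 = 0.1$ as in the left panel of Figure 4, together with a hypothetical mean $\bar m_v = 0.3$ .

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def g_inf(v, alpha, sigma2, m):
    """Unnormalised stationary opinion profile solving (44) (K_inf omitted)."""
    k = alpha * m / (2.0 * sigma2)
    return ((1.0 + v) ** (-2.0 + k) * (1.0 - v) ** (-2.0 - k)
            * np.exp(-alpha * (1.0 - m * v) / (sigma2 * (1.0 - v**2))))

def normalised_g_inf(alpha, sigma2, m, n=8001):
    """Profile on the interior of a uniform grid of [-1, 1], scaled to unit mass."""
    v = np.linspace(-1.0, 1.0, n)[1:-1]   # g vanishes at v = -1, 1: skip the endpoints
    g = g_inf(v, alpha, sigma2, m)
    return v, g / trapezoid(g, v)

v, g = normalised_g_inf(alpha=0.1, sigma2=0.1, m=0.3)
print("mass:", trapezoid(g, v), " mean opinion:", trapezoid(v * g, v))
```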

Figure 4. Profiles of the steady state solution $g_\infty (v)$ and its numerical approximation in the case of $\sigma ^2/\alpha = 1$ (left) and $\sigma ^2/\alpha = 0.2$ (right), both with scaling parameter $\varepsilon = 0.01$ .

4. Numerical experiments

To gain insight into the qualitative behaviour of our new model and to validate it, we now perform several numerical simulations using a Monte Carlo-type approach to approximate the Boltzmann equation (24) in the Fokker–Planck regime (39). The details of the numerical scheme are given in Appendix A. Unless otherwise stated, throughout the rest of this section we assume that the interaction kernel $P$ can be expressed as:

(45)

\begin{equation} P(v,v_*,c,c_*)=H(v,v_*,c,c_*)K(c,c_*) \end{equation}

for various choices of the functions $H$ and $K$ . Moreover, based on the experimental findings of Section 2.2, we restrict ourselves to the case $\delta \to 0$ in (38), which corresponds to a network whose equilibrium distribution of connections is log-normal.

4.1. Qualitative behaviour

We first illustrate some qualitative features of the model through numerical simulations. In the first test, we consider a bounded confidence model in which the propensity to compromise is influenced by the number of connections. In the second test, the confidence bound also depends on the number of contacts, and we compare the homogeneous and the heterogeneous cases. In a final test, we study a case in which part of the population acting on the social network is composed of almost inflexible individuals.

4.1.1. Test 1: Bounded confidence model

In this first test, we build up a bounded confidence model [Reference Deffuant, Neau, Amblard and Weisbuch21, Reference Hegselmann and Krause39] by performing the following choices:

(46)

\begin{equation} H(v,v_*,c,c_*)=\chi _{\{|v-v_*|\lt \Delta \}}(v_*), \qquad K(c,c_*)=\frac{c_*^2}{c^2+c_*^2}, \end{equation}

where $\chi (\!\cdot\!)$ is the indicator function and $\Delta$ is a positive constant. The diffusion term is weighted by $D(v,c) = 1-v^2$ . In this setting, interactions take place only if two individuals have sufficiently close opinions. Moreover, the agents with more connections, that is, the influencers, are given greater weight in driving the opinion dynamics, whereas individuals with few followers are less able to change the point of view of the other participants and more prone to change their own opinion. To stress the importance of contacts and their role in shaping the evolution of the joint density $f(v,c,t)$ , we consider two different initial data. In the first simulation, the initial data is given by:

\begin{equation*} f_0(v,c) = \frac {1}{2}h_\infty (c), \end{equation*}

meaning that opinions are uniformly distributed in the interval $I=[\!-\!1,1]$ and contacts are at the equilibrium (22) with parameters $\lambda = 5$ and $\sigma = 1.56\cdot 10^{-2}$ . In Figure 5, the first image shows the initial data, followed by the resulting density $f(v,c,t)$ at times $t = 4, 8, 12, 16, 20$ . We sample $N_s=10^5$ agents to simulate model (24), with scaling parameter $\varepsilon = 0.01$ . Figure 5 clearly shows the opinions splitting into two clusters, due to the bounded confidence interaction with $\Delta = 0.55$ ; over time, however, the two clusters merge again, consensus is reached, and a single cluster forms at $v=0$ .
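A minimal Monte Carlo sketch of Test 1 is given below in Python. It is not necessarily the scheme of Appendix A: it assumes a sweep in the spirit of Nanbu–Babovsky methods, in which agents are randomly paired and every pair interacts through (23) with the kernel (46); it uses a uniform noise $\xi$ with variance $\varepsilon \sigma ^2$ , keeps the contacts frozen at the log-normal equilibrium, and picks $\alpha = 1$ , $\sigma ^2 = 0.005$ only so that $\sigma ^2/\alpha = 0.005$ . The number of sweeps is arbitrary and is not calibrated to the times of Figure 5.

```python
import numpy as np

def kernel_test1(v, v_star, c, c_star, delta):
    """P = H K from (46): bounded confidence times contact-based influence."""
    H = np.where(np.abs(v - v_star) < delta, 1.0, 0.0)
    K = c_star**2 / (c**2 + c_star**2)
    return H * K

def interaction_step(v, c, rng, alpha, sigma2, eps, delta):
    """One sweep of random symmetric pairings applying (23) under the scaling (27).

    Contacts are kept frozen; the noise xi is uniform with variance eps*sigma2.
    """
    n = v.size
    perm = rng.permutation(n)
    i, j = perm[: n // 2], perm[n // 2 : 2 * (n // 2)]
    a = np.sqrt(3.0 * eps * sigma2)                  # uniform on [-a, a] has variance a^2/3
    xi_i, xi_j = rng.uniform(-a, a, i.size), rng.uniform(-a, a, j.size)
    vi, vj, ci, cj = v[i], v[j], c[i], c[j]
    v_new = v.copy()
    v_new[i] = vi + eps * alpha * kernel_test1(vi, vj, ci, cj, delta) * (vj - vi) + xi_i * (1 - vi**2)
    v_new[j] = vj + eps * alpha * kernel_test1(vj, vi, cj, ci, delta) * (vi - vj) + xi_j * (1 - vj**2)
    return v_new

rng = np.random.default_rng(4)
n = 10_000
v = rng.uniform(-1.0, 1.0, n)                                 # uniform opinions
c = rng.lognormal(mean=5.0, sigma=np.sqrt(1.56e-2), size=n)   # contacts at (22), lambda = 5
for _ in range(200):
    v = interaction_step(v, c, rng, alpha=1.0, sigma2=0.005, eps=0.01, delta=0.55)
print("opinion range:", v.min(), v.max())
```

With these values the noise amplitude stays well below the bound of Proposition 3.1, so every update remains in $[\!-\!1,1]$ .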

Figure 5. Test $1$ , $\sigma ^2/\alpha = 0.005$ . The pictures show the time evolution of the distribution function $f(v,c,t)$ for $t=0,4,8,12,16,20$ for a homogeneous distribution of the number of connections with respect to opinions. After the emergence of two clusters, the agents reach consensus at the final time.

In the second simulation, the contacts are again at the equilibrium (22) with the same parameters $\lambda$ and $\sigma$ as before, but the opinions are now distributed so that agents with few connections have opinions closer to $-1$ , while agents with many contacts have opinions closer to $+1$ . In Figure 6, the first image shows the initial data, followed by the resulting density $f(v,c,t)$ at $t = 4, 8, 12, 16, 20$ . Two clusters emerge, but the symmetry of the previous case is lost: agents with fewer contacts are influenced by agents with more contacts and behave like followers, whereas agents with more contacts are less influenced by lower-contact agents. We stress that the dynamics observed in these two cases differ from those of a standard bounded confidence opinion model in which connections play no role. In that case, the steady state is a bimodal distribution similar to the one shown in Figure 6 at $t=8$ .

Figure 6. Test $1$ , $\sigma ^2/\alpha = 0.005$ . The pictures show the time evolution of the distribution function $f(v,c,t)$ for $t=0,4,8,12,16,20$ in the case of a non-homogeneous distribution of the number of connections with respect to opinions. After the emergence of two clusters, the agents reach consensus at the final time in the positive opinion region.

4.1.2. Test 2: Heterogeneous confidence bound

In this second test, we modify the previous setting by letting the confidence bound $\Delta \equiv \Delta (c,c_*)$ in (46) depend on the numbers of contacts of the interacting agents. We assume that the confidence bound is above the threshold $1/2$ when the other agent has more contacts, that is, $c_*\gt c$ , whereas it falls below $1/2$ when the other agent has fewer contacts, $c_*\lt c$ . To model this behaviour, we consider the following function:

(47)

\begin{equation} \Delta (c,c_*) = \frac{c_*}{c+c_*}. \end{equation}

As initial datum $f^0(v,c)$ , we take the uniform distribution on $ [\!-\!1,1] \times [0,1]$ . Simulations of the opinion dynamics (24) are performed using $N_s= 10^5$ agents and scaling parameter $\varepsilon = 0.01$ . The social contacts, which are initially not at the stationary state, evolve according to (1) with parameters $\mu = 10^{-1}$ and $\nu ^2 = 0.0125$ .

We compare the case of a heterogeneous confidence bound as in (47) with that of a homogeneous confidence bound $\Delta (c,c_*)=1/2$ . In Figure 7, we report the resulting densities $f(v,c,t)$ for $t = 2$ on the left, $t=4$ in the centre and $t=8$ on the right; the top row refers to the homogeneous case and the bottom row to the heterogeneous case. In the top row, we observe the emergence of two clusters, one centred around $-0.5$ and the other around $0.5$ , independently of the number of contacts. In the bottom row, with the heterogeneous bound $\Delta (c,c_*)$ , agents with many connections tend to maintain their opinions over time because of the lower values of $\Delta (c,c_*)$ , whereas agents with few connections reach consensus around the central opinion $v=0$ , since $\Delta (c,c_*)$ is larger for them.
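The asymmetry introduced by (47) is easiest to see on a single pair. The Python sketch below (function names are hypothetical) evaluates $P = HK$ from (46) with the bound (47) for a low-contact agent and a high-contact agent whose opinions differ by $0.6$ : with the heterogeneous bound, only the low-contact agent moves towards the other, while with the constant bound $1/2$ neither interacts.

```python
import numpy as np

def delta_het(c, c_star):
    """Heterogeneous confidence bound (47): above 1/2 when the partner has more contacts."""
    return c_star / (c + c_star)

def kernel_test2(v, v_star, c, c_star, heterogeneous=True):
    """P = H K as in (46), with the bound (47) or the constant bound 1/2."""
    delta = delta_het(c, c_star) if heterogeneous else 0.5
    H = np.where(np.abs(v - v_star) < delta, 1.0, 0.0)
    K = c_star**2 / (c**2 + c_star**2)
    return H * K

# A low-contact agent (c = 0.1) and a high-contact agent (c = 0.9) whose opinions differ by 0.6
follower = kernel_test2(v=0.0, v_star=0.6, c=0.1, c_star=0.9)
influencer = kernel_test2(v=0.6, v_star=0.0, c=0.9, c_star=0.1)
homogeneous = kernel_test2(v=0.0, v_star=0.6, c=0.1, c_star=0.9, heterogeneous=False)
print(follower, influencer, homogeneous)
```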

Figure 7. Test $2$, $\sigma ^2/\alpha = 0.005$. The pictures show the distribution $f(v,c,t)$ for $t=4$ (left), $t=6$ (centre), and $t=8$ (right). Top row: constant confidence bound ($\Delta (c,c_*)=0.5$); two main clusters emerge for any number of contacts. Bottom row: heterogeneous confidence bound ($\Delta (c,c_*)$ as in (47)); consensus is reached by agents with a low number of contacts, whereas two main clusters emerge for agents with a higher number of contacts.

4.1.3. Test 3: Sznajd-type dynamics

We now start from a distribution of opinions given by the sum of two Gaussian distributions centred at different locations in $[\!-\!1,1]$, mimicking a clustering of opinions on a given subject. The initial condition is

\begin{equation*} f_0(v,c) = K_0\begin {cases} \frac {1}{30} \text {exp} \{-(v + \frac {3}{4})^2/(2\sigma _1^2)\},\quad & \text {if } 100 \leq c \leq 130 \\[5pt] \frac {1}{30} \text {exp} \{-(v - \frac {3}{4})^2/(2\sigma _2^2)\},& \text {if } 170 \leq c \leq 200 \\[5pt] 0 &\text {otherwise,}\end {cases} \end{equation*}

with $K_0$ a positive constant such that $f_0(v,c)$ has total mass equal to $1$. The interaction kernels are such that

\begin{equation*}H(v,v_*,c,c_*) = (1-v^2), \qquad K(c,c_*) = \frac {c_*^2}{(c+c_*)^2},\end{equation*}

and the diffusion term is proportional to $D(v,c,c_*) = 1-v^2$. The interaction function is chosen as an approximation of the Sznajd dynamics [Reference Sznajd-Weron and Sznajd48, Reference Toscani49], in such a way that individuals may interact with everyone on the social network, while alignment towards a given opinion is more frequent for people with weak opinions and less probable for individuals with strong beliefs (positive or negative). The role played by the number of connections in the function $K(c,c_*)$ is similar to the previous cases, although its impact is now more important: agents with a higher number of social media contacts tend to modify their opinion only slightly over time, while those with a low number of contacts are more easily influenced and tend to align their opinion with that of the most popular individuals. The simulation uses $N_s = 10^5$ agents in the time interval $[0,T] = [0,6]$. Figure 8 shows the initial condition on the left and the resulting density $f(v,c,t)$ for $t=3$ and $t=6$ in the centre and on the right, respectively, using $\sigma _1^2=\sigma _2^2=0.005$ and a scaling parameter $\varepsilon = 0.01$. For the evolution of the social contact dynamics (1), we use the parameters $\mu = 10^{-2}$ and $\nu = 3.54\cdot 10^{-2}$. We observe that agents with a lower number of contacts are strongly influenced by agents with a higher number of contacts and behave like followers, whereas agents with a higher number of contacts are only mildly influenced by lower-contact agents. As a result, individuals who start with a negative opinion are driven towards a positive one over time, while people with a large number of contacts move only slightly towards the centre without really modifying their beliefs.
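The initial datum of Test 3 can be sampled directly, which is one way to initialise the $N_s$ particles. The Python sketch below rests on assumptions that are ours, not the paper's: the contact variable is treated as continuous and uniform within each band, and the Gaussian tails outside $[\!-\!1,1]$ are neglected (with $\sigma_1^2=\sigma_2^2=0.005$ the boundaries lie about $3.5$ standard deviations from the means). The band weights follow from the formula itself: each band has width $30$ and height $1/30$, so its mass is proportional to $\sigma_i\sqrt{2\pi}$. The function name is hypothetical.

```python
import numpy as np

def sample_test3_initial(n, sigma1_sq=0.005, sigma2_sq=0.005, seed=0):
    """Draw n samples (v, c) from the Test 3 initial datum f_0(v, c).

    Assumptions: c is continuous and uniform inside each contact band, and
    the Gaussian tails falling outside [-1, 1] are ignored.
    """
    rng = np.random.default_rng(seed)
    s1, s2 = np.sqrt(sigma1_sq), np.sqrt(sigma2_sq)
    # Band masses are proportional to sqrt(2*pi)*sigma_i; K_0 makes them sum to 1.
    w1 = s1 / (s1 + s2)
    in_band1 = rng.random(n) < w1
    c = np.where(in_band1,
                 rng.uniform(100.0, 130.0, n),   # low-contact band, negative opinions
                 rng.uniform(170.0, 200.0, n))   # high-contact band, positive opinions
    v = np.where(in_band1,
                 rng.normal(-0.75, s1, n),
                 rng.normal(0.75, s2, n))
    return v, c

v, c = sample_test3_initial(100_000)
print(v[c <= 130].mean(), v[c >= 170].mean())  # close to -0.75 and 0.75
```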

Figure 8. Test $3$, $\sigma ^2/\alpha = 0.005$. The pictures show the time evolution of the distribution function $f(v,c,t)$ for $t=0$ (left), $t=3$ (centre), and $t=6$ (right). Agents with fewer connections are strongly influenced by agents with a large number of connections.

4.2. Quantitative analysis

In this last part, we use our model to reproduce quantitatively the opinion dynamics observed in real data extracted from Twitter. To this aim, we first describe how the data are pre-processed through sentiment analysis (SA) in order to obtain information that can be compared directly with the model outcomes.

4.2.1. Twitter sentiment analysis

SA (or opinion mining) is a subfield of natural language processing (NLP) that acts as a text classification tool: it analyses text data and extracts its intent, that is, whether the underlying sentiment is positive, negative or neutral. It is widely used for commercial purposes, for instance to monitor brand and product sentiment in customer feedback and to understand customer needs. SA methods can be divided into two main categories: statistical methods based on machine learning algorithms and knowledge-based methods. Knowledge-based approaches rely on a list of words, called a sentiment lexicon, labelled as positive or negative, but they typically do not include sentiment-bearing lexical items such as acronyms, emoticons or slang terms (which are widely used in social media texts). Moreover, they do not account for differences in the sentiment intensity of words. Since many applications would benefit from determining not only the binary polarity but also the strength of the sentiment, sentiment intensity lexicons have been developed that associate a sentiment valence with each word and help measure the intensity expressed in a sentence. This valence translates into a value, called polarity, which ranges between $-1$ and $1$ and is therefore well suited to be used in combination with our model.

One of the main problems of natural language classifiers is that manually creating and validating a comprehensive sentiment lexicon is time-consuming, so machine learning approaches are typically incorporated to improve efficiency. These approaches also have drawbacks: they require large amounts of training data, which are usually hard to acquire; their performance depends heavily on the size of the training set; they are computationally expensive; and they often act as ‘black boxes’ that are not easily interpretable and, therefore, not easily modifiable or generalisable.

In our analysis, we use VADER (Valence Aware Dictionary and sEntiment Reasoner), presented by C.J. Hutto and E. Gilbert in [Reference Hutto and Gilbert40] in 2014. It combines qualitative and quantitative methods to build a list of lexical features that allow SA to be performed. This engine is specifically constructed to give reliable results on social media texts and does not require a training dataset, since it is built on a valence-based, human-curated lexicon. VADER’s lexicon incorporates pre-existing, well-established word banks and several lexical features typical of social media, such as emoticons, acronyms, initialisms and commonly used slang terms that are sentiment-related. These features were rated on a scale from $-4$ (‘extremely negative’) to $+4$ (‘extremely positive’) using a ‘wisdom-of-the-crowd’ approach, meaning that the ratings were obtained by collecting the answers of a series of independent human raters; the resulting sentiment scores are normalised to $[\!-\!1,+1]$ in the Python implementation. VADER’s developers then carried out a detailed qualitative analysis, which isolated five generalised heuristics based on grammatical and syntactic cues to determine the sentiment intensity of short sentences. In Sections 4.2.2 and 4.2.3, we use VADER to perform SA on two different datasets of actual tweets.
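To make the valence-to-polarity step concrete, the toy Python sketch below sums word valences on VADER's $[-4,4]$ scale and maps the sum $x$ into $[-1,1]$ with $x/\sqrt{x^2+\alpha}$, $\alpha=15$, which, to our knowledge, is the normalisation applied to the compound score in VADER's reference Python implementation. The lexicon entries are invented for illustration, and none of VADER's heuristics (negation, intensifiers, capitalisation, punctuation) are applied, so this is not a substitute for the library.

```python
import math

# Hypothetical toy lexicon on VADER's [-4, 4] valence scale
# (values invented for illustration, NOT taken from the VADER lexicon).
TOY_LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "hoax": -1.8, ":)": 2.0}

def normalize(score, alpha=15.0):
    """Map an unbounded valence sum into [-1, 1] via score / sqrt(score^2 + alpha)."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))

def toy_polarity(text):
    """Sum the valences of whitespace-separated tokens and normalise the sum.

    Unlike VADER, no heuristics (negation, boosters, capitals, '!') are applied.
    """
    total = sum(TOY_LEXICON.get(tok, 0.0) for tok in text.lower().split())
    return normalize(total)

print(toy_polarity("great news :)"))      # positive polarity
print(toy_polarity("this is a hoax"))     # negative polarity
print(toy_polarity("the weather today"))  # 0.0, no lexicon entry found
```

The saturating map keeps the polarity inside $[-1,1]$ however many sentiment-bearing words a tweet contains, which is what makes it usable directly as an opinion variable $v$.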

4.2.2. Test 4: Trump re-admission on Twitter

For this simulation, we used the Twitter Application Programming Interface (API) to retrieve the content of a number of tweets on a given topic. More specifically, we used the words ‘Donald Trump’ and some related hashtags to select tweets written in English a few days after Donald Trump was readmitted to Twitter on 20 November 2022, following an almost two-year ban from the platform. We then employed VADER to analyse the texts of the tweets and obtained a rating between $-1$ and $1$ for each tweet, which we considered to be the agent’s opinion on the subject. To highlight the polarised situation, we removed the tweets that scored exactly $0$ in the VADER analysis.
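The pre-processing step can be sketched as follows: scores exactly equal to $0$ are discarded, as described above, and the remaining polarities are binned into a normalised histogram on $[\!-\!1,1]$, which is one possible reconstruction of the empirical density $\hat g(v,T)$. The bin count and the synthetic scores in the usage example are hypothetical.

```python
import numpy as np

def empirical_opinion_density(scores, n_bins=40):
    """Histogram estimate of g_hat(v) on [-1, 1] from SA polarity scores.

    Tweets scoring exactly 0 are discarded, as in the text; the number of
    bins is an illustrative choice, not taken from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    kept = scores[scores != 0.0]
    density, edges = np.histogram(kept, bins=n_bins, range=(-1.0, 1.0), density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, density, kept.size

# Hypothetical scores: 50 neutral tweets plus a bimodal set of opinions
rng = np.random.default_rng(0)
scores = np.concatenate([np.zeros(50),
                         np.clip(rng.normal(-0.5, 0.2, 500), -1.0, 1.0),
                         np.clip(rng.normal(0.6, 0.2, 300), -1.0, 1.0)])
centres, g_hat, n_kept = empirical_opinion_density(scores)
print(n_kept, np.sum(g_hat) * (centres[1] - centres[0]))  # 800 tweets, unit mass
```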

To perform the model calibration, we introduce a class of interaction functions in which we make explicit the dependence on a new set of parameters $\theta \in \Theta \subseteq \mathbb{R}^4_+$, as follows:

(48a)

\begin{equation} P(v,v_*,c,c_*;\;\theta ) = H(v,v_*,c,c_*;\;\theta )K(c,c_*;\;\theta ), \end{equation}

with

(48b)

\begin{equation} H(v,v_*,c,c_*;\;\theta ) = \chi (|v-v_*|\lt \Delta (c,c_*;\;\theta )), \end{equation}

where

(48c)

\begin{equation} \Delta (c,c_*;\;\theta ) = \theta _1\!\left (\frac{ \log\!(1+c_*)}{\log\!(1+c_*)+\log\!(1+c)}\right )^{\theta _2}, \end{equation}

and

(48d)

\begin{equation} K(c,c_*;\;\theta ) = \left (\frac{ \log\!(1+c_*)}{\log\!(1+c_*)+\log\!(1+c)}\right )^{\theta _3}, \end{equation}

while the diffusion is weighted by:

(48e)

\begin{equation} D(v,c;\;\theta ) = \theta _4\sqrt{1-|v|^2}. \end{equation}
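For reference, the parametric kernels (48a)-(48e) translate directly into vectorised Python; the function names are ours, and we assume $c+c_*\gt 0$ so that the logarithmic ratio is well defined. The usage example plugs in the Test 4 estimate $\theta^{(k)}$ reported below and compares a follower ($c=100$) meeting an influencer ($c_*=10^4$) with the reverse encounter: both pairs fall within the confidence bound, but the weight $K$ is considerably larger for the follower, consistent with the role of influencers discussed in the text.

```python
import numpy as np

def log_ratio(c, c_star):
    """log(1 + c_*) / (log(1 + c_*) + log(1 + c)), the common factor in (48c)-(48d)."""
    a, b = np.log1p(c_star), np.log1p(c)
    return a / (a + b)

def delta(c, c_star, theta):
    """Confidence bound (48c): theta_1 * ratio**theta_2."""
    return theta[0] * log_ratio(c, c_star) ** theta[1]

def H(v, v_star, c, c_star, theta):
    """Bounded-confidence kernel (48b): indicator of |v - v_*| < Delta(c, c_*; theta)."""
    return np.where(np.abs(v - v_star) < delta(c, c_star, theta), 1.0, 0.0)

def K(c, c_star, theta):
    """Contact-dependent weight (48d): ratio**theta_3."""
    return log_ratio(c, c_star) ** theta[2]

def P(v, v_star, c, c_star, theta):
    """Interaction function (48a): P = H * K."""
    return H(v, v_star, c, c_star, theta) * K(c, c_star, theta)

def D(v, c, theta):
    """Diffusion weight (48e): theta_4 * sqrt(1 - |v|^2)."""
    return theta[3] * np.sqrt(1.0 - np.abs(v) ** 2)

theta = (0.6313, 0.2047, 2.3125, 9.7510)     # Test 4 estimate
print(P(0.0, 0.5, 100.0, 1e4, theta))         # follower meets influencer
print(P(0.5, 0.0, 1e4, 100.0, theta))         # influencer meets follower
```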

The initial data are well prepared: we assume that the distribution of connections is at the stationary state (22), with the parameters estimated in Table 1, and that the joint distribution of opinions and contacts is such that

\begin{equation*} f_0(v,c) = \begin {cases} \frac {h_\infty (c)}{2}, \quad &\mbox {if } 50 \lt c \leq 7500\\[5pt] \frac {h_\infty (c)}{0.2\sqrt {2 \pi }}e^{-\frac {1}{2}\!\left (\frac {v + 0.5}{0.2} \right )^2}, &\mbox {if } c \gt 7500. \end {cases} \end{equation*}

Finally, we denote by $\hat g(v,t)$ the distribution obtained from the data and by $g(v,t)$ the one obtained by simulating the particle dynamics. We then search for the optimal parameters $(\theta _1,\theta _2,\theta _3,\theta _4)\in \Theta$, where $ \Theta =[0.5,1.5]\times [0.2,0.7]\times [1.5,2.5]\times [8,11]$, by minimising the $\ell _1$ distance, at the final time $T=20$, between the marginal distribution of the simulated opinions $g(v,T)$ and the one reconstructed from the data $\hat g(v,T)$.

The minimisation is performed with the MATLAB routine patternsearch(), starting from the initial guess $\theta ^{(0)}=(1.1,0.4, 2.0, 10)$; it reaches convergence after $k =60$ iterations, with estimated parameters $\theta ^{(k)}=(0.6313,0.2047, 2.3125, 9.7510)$. The discrepancy, measured as the $\ell _1$ distance between the marginal distribution $g(\cdot,T|\theta ^{(k)})$ and the marginal distribution reconstructed from the data $\hat g(\cdot,T)$, attains the final value $\mathcal{D}_1(g(\cdot,T|\theta ^{(k)}),\hat g(\cdot,T)) = 6.376\times 10^{-2}$. The minimisation procedure is further detailed in the Appendix.
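MATLAB's patternsearch() implements generalised pattern search with many options. Purely to illustrate the idea of a derivative-free search on a box such as $\Theta$, the Python sketch below implements a basic compass search, a simple member of the pattern-search family and not MATLAB's algorithm: it polls one step forward and backward along each coordinate, accepts the first improvement, and halves the step when no poll point improves. The quadratic objective and its minimiser in the usage example are hypothetical stand-ins for the particle-based discrepancy; the box and starting point are those of Test 4.

```python
import numpy as np

def compass_search(fun, x0, lower, upper, step=0.25, tol=1e-6, max_iter=1000):
    """Basic derivative-free compass search on the box [lower, upper].

    Polls x +/- step * (upper - lower)_i along each coordinate i, moves to the
    first point that lowers fun (projected onto the box), and halves the step
    when no poll point improves; stops when step < tol.
    """
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = np.clip(np.asarray(x0, float), lower, upper)
    fx = fun(x)
    scale = upper - lower  # mesh is relative to the width of each interval
    n_iter = 0
    while n_iter < max_iter and step >= tol:
        n_iter += 1
        improved = False
        for i in range(x.size):
            for sign in (1.0, -1.0):
                y = x.copy()
                y[i] = np.clip(x[i] + sign * step * scale[i], lower[i], upper[i])
                fy = fun(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
            if improved:
                break
        if not improved:
            step *= 0.5
    return x, fx, n_iter

# Hypothetical smooth objective on the Test 4 box Theta
target = np.array([0.7, 0.25, 2.3, 9.8])
fun = lambda th: float(np.sum((th - target) ** 2))
x, fx, n_iter = compass_search(fun, [1.1, 0.4, 2.0, 10.0],
                               [0.5, 0.2, 1.5, 8.0], [1.5, 0.7, 2.5, 11.0])
print(x, fx, n_iter)
```

Because only function values are compared, such methods tolerate the non-smooth dependence on $\theta$ introduced by the indicator in (48b), at the price of many objective evaluations.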

In Figure 9, we compare our results with the data: on the left, the marginal distribution of opinions at time $t=20$ and the data; on the right, $f(v,c,t)$ (shown as $\log\!(f(v,c,t)+0.025)$) compared with the actual data extracted from Twitter (orange dots). The model reproduces the distribution obtained from the SA with good accuracy.

Figure 9. Test $4$: marginal distribution of opinions at the final time compared with the data (left), and comparison between the reconstructed density of opinions and contacts and the real dataset (right).

4.2.3. Test 5: Climate change trends on Twitter

For this last simulation, we used a pre-existing dataset ([Reference Littman and Wrubel43]) containing the IDs of tweets discussing climate change. These data were collected from Twitter’s API between 21 September 2017 and 17 May 2019, using as tracking parameters keywords related to the subject, such as ‘climate change’ and ‘global warming’, and hashtags such as ‘#climatechangeisreal’, ‘#climatechangeisfalse’ or ‘#globalwarminghoax’. We used VADER to perform SA on the tweets collected on 13, 14, 15, 16 and 18 August 2018, removed from the dataset the tweets with an SA score exactly equal to $0$, and reconstructed the trend of the data starting from a fictitious initial distribution of opinions.

As in Test 4 (Section 4.2.2), we consider the interaction functions (48) and well-prepared initial data: we assume that the marginal distribution of the contacts is at the stationary state (22) and that the joint initial density of opinions and contacts is given by:

\begin{equation*} f_0(v,c) = \begin {cases}\frac {h_\infty (c)}{154 \sqrt {2\pi }}\!\left (55\sqrt {2\pi } + 200e^{-\frac {1}{2}\!\left (\frac {v + 0.35}{0.15} \right )^2} + 28e^{-\frac {1}{2}\!\left (\frac {v - 0.25}{0.5} \right )^2}\right ), \quad &\mbox {if } c \lt 40\\[5pt] \frac {h_\infty (c)}{6\sqrt {2\pi }}\!\left (250e^{-\frac {1}{2}\!\left (\frac {v + 0.8}{0.004} \right )^2} + 20e^{-\frac {1}{2}\!\left (\frac {v + 0.3}{0.2} \right )^2} + 5e^{-\frac {1}{2}\!\left (\frac {v - 0.3}{0.2} \right )^2} \right ), &\mbox {if } 40 \leq c \leq 400, \\[5pt] \frac {5 h_\infty (c)}{\sqrt {2 \pi }}e^{-\frac {1}{2}\!\left (\frac {v - 0.4}{0.2} \right )^2}, &\mbox {if } c \gt 400. \end {cases} \end{equation*}
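The normalising constants in $f_0$ can be checked by hand. Since $\int_{\mathbb{R}} e^{-(v-m)^2/(2s^2)}\,dv = s\sqrt{2\pi}$, in units of $\sqrt{2\pi}$ they read $55\cdot 2+200\cdot 0.15+28\cdot 0.5=154$, $250\cdot 0.004+20\cdot 0.2+5\cdot 0.2=6$ and $5\cdot 0.2=1$. Hence each conditional profile $f_0(v,c)/h_\infty(c)$ has unit mass, consistently with the contact marginal being $h_\infty$, up to the Gaussian tails cut off by $[\!-\!1,1]$. The Python sketch below verifies this numerically with a trapezoidal rule; the helper names are ours.

```python
import numpy as np

SQ2PI = np.sqrt(2.0 * np.pi)

def gauss(v, mean, sd):
    """Unnormalised Gaussian profile exp(-((v - mean) / sd)^2 / 2)."""
    return np.exp(-0.5 * ((v - mean) / sd) ** 2)

def profile_low(v):   # c < 40
    return (55 * SQ2PI + 200 * gauss(v, -0.35, 0.15) + 28 * gauss(v, 0.25, 0.5)) / (154 * SQ2PI)

def profile_mid(v):   # 40 <= c <= 400
    return (250 * gauss(v, -0.8, 0.004) + 20 * gauss(v, -0.3, 0.2)
            + 5 * gauss(v, 0.3, 0.2)) / (6 * SQ2PI)

def profile_high(v):  # c > 400
    return 5 * gauss(v, 0.4, 0.2) / SQ2PI

def trapezoid(y, x):
    """Composite trapezoidal rule (avoids version-specific NumPy helpers)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

v = np.linspace(-1.0, 1.0, 200_001)  # fine grid: the middle profile has sd 0.004
for name, prof in [("c < 40", profile_low), ("40 <= c <= 400", profile_mid),
                   ("c > 400", profile_high)]:
    print(name, round(trapezoid(prof(v), v), 4))
```

The small deficit below $1$ for $c\lt 40$ comes from the broad Gaussian centred at $0.25$ with standard deviation $0.5$, whose left tail extends beyond $v=-1$.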

We assume that the data correspond to the numerical times $t_m \in \{1, 2, 3, 4, 11\}$, $m=1,\ldots,5$, and we denote by $\hat g(v,t)$ the empirical marginal distribution of opinions obtained from Twitter, so that $\hat g(v,t_1)$ refers to 13 August, $\hat g(v,t_2)$ to 14 August, and so on.

To obtain the optimal values of the parameters $(\theta _1,\theta _2,\theta _3,\theta _4)$ in the admissible set

\begin{equation*} \Theta = [0.5, 1]\times [0.1, 1.5]\times [0.01, 2]\times [0,0.05], \end{equation*}

we use the MATLAB routine fmincon() to minimise the discrepancy between the simulated and the data-reconstructed marginal distributions of opinions, defined as the average over $t_m$, $m = 1,\ldots,5$, of the $1$-Wasserstein distances $\mathcal{W}^1_1(g(\cdot,t_m|\theta ),\hat g(\cdot,t_m))$:

\begin{equation*}\mathcal{D}_1(g(\cdot |\theta ),\hat g(\!\cdot\!)) = \frac {1}{M}\sum _{m=1}^M\mathcal{W}^1_1(g(\cdot,t_m|\theta ),\hat g(\cdot,t_m)), \qquad M=5.\end{equation*}
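In one dimension, $\mathcal{W}^1_1$ can be computed from samples as $\int |F(x)-\hat F(x)|\,dx$, where $F$ and $\hat F$ are the empirical cumulative distribution functions; this also handles a different number of simulated particles and tweets. The Python sketch below (function names are ours) uses this identity and then averages over the $M$ observation times as in $\mathcal{D}_1$.

```python
import numpy as np

def wasserstein1(u, w):
    """1-Wasserstein distance between two 1D empirical samples.

    Uses W_1 = integral of |F_u(x) - F_w(x)| dx with empirical CDFs evaluated
    on the merged, sorted sample points; sample sizes may differ.
    """
    u = np.sort(np.asarray(u, dtype=float))
    w = np.sort(np.asarray(w, dtype=float))
    grid = np.sort(np.concatenate([u, w]))
    widths = np.diff(grid)
    cdf_u = np.searchsorted(u, grid[:-1], side="right") / u.size
    cdf_w = np.searchsorted(w, grid[:-1], side="right") / w.size
    return float(np.sum(np.abs(cdf_u - cdf_w) * widths))

def averaged_discrepancy(simulated, observed):
    """D_1 = (1/M) * sum_m W_1(g(., t_m | theta), g_hat(., t_m)) over M snapshots."""
    if len(simulated) != len(observed):
        raise ValueError("one simulated sample is needed per observation time")
    return sum(wasserstein1(s, o) for s, o in zip(simulated, observed)) / len(observed)

# Usage: two snapshots whose simulated samples are shifted by 0.1 and 0.3
x = np.linspace(-0.5, 0.5, 101)
print(averaged_discrepancy([x + 0.1, x + 0.3], [x, x]))  # average of the two shifts
```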

The method reaches convergence after $k=11$ iterations from the initial guess $\theta ^{(0)}=(0.75,1.25,0.65,0.03)$; the estimated parameters are $\theta ^{(k)} = (0.7432,1.0735,0.9295,0.0306)$, yielding a discrepancy $\mathcal{D}_1(g(\cdot |\theta ^{(k)}),\hat g(\!\cdot\!)) =4.893\times 10^{-1}$. We point out that, in this case, the choice of a good initial guess is of paramount importance.

In Figure 10, we depict the evolution of the marginal distribution of opinions compared with the data, while Figure 11 shows the initial and final densities of opinions and contacts. Again, the model follows the main trend of the opinions over time; however, compared with Test 4, where the fit concerned only a single, supposedly steady, state, the discrepancies here are considerably larger at some times.

Figure 10. Test 5: comparison between the marginal distribution of opinions reconstructed with the presented model and the data at each time $t \in \{1,2,3,4,11\}$, corresponding to the data of 13, 14, 15, 16 and 18 August 2018.

Figure 11. Test $5$ : initial (left) and final (right) joint density of opinions and contacts (the images show $\log\!(f(v,c,t) + 0.025)$ ).

5. Conclusions

In this work, we have proposed a new model for opinion formation and evolution in the presence of social media connections. Starting from a set of microscopic interactions characterising the behaviour of individuals on a social media platform, we first constructed a model for network formation and then tested the validity of our hypotheses through a comparison with a real dataset extracted from Twitter. A fitting procedure allowed us to recover the best set of parameters, with which our model is able to describe a network of people exchanging ideas and information online. In the second part, we focused on the relationship between opinions and connections and, through a Boltzmann-type approach, derived an equation describing opinion dynamics over a social network. Through a grazing limit procedure, we then obtained an asymptotic Fokker–Planck equation from which the main features of the model can be understood more clearly. In the third part, we first performed numerical simulations to illustrate some qualitative features of the new model; then, using SA tools that allowed us to obtain realistic distributions of opinions on a given topic from a social media platform, we showed that our model is able to describe such dynamics. These results were obtained by minimising a discrepancy measure (the $\ell _1$ or the Wasserstein distance) between simulated and data-driven opinion distributions, which allowed us to estimate the best set of parameters of a given interaction kernel in the alignment term. Possible future developments include improving the data-driven aspects of the model, for instance, by replacing the parameter-dependent interaction kernel with its full reconstruction, that is, without assuming a priori knowledge of its mathematical expression.
A second direction worth exploring is to improve the model by letting opinions depend additionally on the knowledge or education of individuals.

Competing interests

The authors declare none.

Appendix A: Particle-based kernel calibration

In this section, we report the numerical procedure used in Section 4, in particular for the calibration of the kernel parameters (48) introduced in Sections 4.2.2 and 4.2.3. To this end, we aim at minimising the discrepancy between the opinion density obtained from numerical simulations and the scores obtained from the SA performed on data extracted from Twitter.

Thus, we formulate the calibration of the kernel as a constrained optimisation problem, which in continuous form reads

(49)

\begin{align} \min _{\theta \in \Theta } \frac{1}{M}\sum _{m=1}^M\mathcal{D}_p(g(v,t_m|\theta ),\hat g_m(v)) \end{align}

(50)

\begin{align} &\textrm{s.t.}\qquad \mathcal{F}(f) = 0, \quad f^0(v,c)= f(v,c,0),\nonumber \\[5pt] &\,\qquad g(v,t|\theta )= \int _{{\mathbb R}_+} f(v,c,t)\, dc, \end{align}

where $ \mathcal{F}(f) = 0$ is a shorthand notation for the Fokker–Planck equation (39), $g(v,t|\theta )$ represents the marginal opinion distribution of $f(v,c,t)$, $\hat g_m(v)$ represents the known distribution of opinions at time $t_m$, $m=1,\ldots,M$, and $\mathcal{D}_p$ is a discrepancy measure, such as the $p$-Wasserstein distance or the $\ell _p$ distance, with $p\geq 1$. In what follows, we propose a numerical approximation of this minimisation problem based on a particle scheme, in the two steps outlined below.

Asymptotic particle-based scheme

To simulate the evolution of the Fokker–Planck equation (39), we rely on an asymptotic stochastic particle method that solves the kinetic dynamics (24) in the quasi-invariant regime (27). To this end, we introduce the following discretisation:

(51)

\begin{equation} f^{n+1} = \left (1-\frac{\Delta t}{\varepsilon }\right )f^n + \frac{\Delta t}{\varepsilon }Q^{\theta,+}_\varepsilon (f^n,f^n), \end{equation}

where the gain operator $Q^{\theta,+}_\varepsilon$ encodes the gain of particles at position $(v,c)$ at time $t$ after the interactions (6) and (23) have occurred. The particle scheme for the simulation of (51) is reported in Algorithm 1, where we set $\varepsilon =\Delta t$ for simplicity, so that (51) reduces to $f^{n+1} = Q^{\theta,+}_\varepsilon (f^n,f^n)$, that is, every particle interacts once per time step. We refer to [Reference Pareschi and Toscani45] for details on this class of methods. This simulation scheme produces a sequence of samples $\left \{(v^n_i,c^n_i)\right \}_{i=1}^{N_s}\sim f(v,c,t_n|\theta )$ for $n=0,\ldots,N_{t}$, which we use in the calibration procedure described next.

Algorithm 1 Asymptotic particle-based algorithm (Nanbu-like algorithm)
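The following Python sketch of one Nanbu-like step for (51) with $\varepsilon=\Delta t$ does not reproduce the paper's rules (6) and (23); it rests on an assumed binary interaction, written for illustration only: $v'=v+\varepsilon P(v,v_*,c,c_*)(v_*-v)+\sqrt{\varepsilon}\,\sigma D(v,c)\eta$, with $\eta\sim\mathcal{N}(0,1)$, the contacts kept frozen, and updates that leave $[\!-\!1,1]$ rejected. Each particle interacts once per step with a partner drawn uniformly at random among the others; $P$ and $D$ are passed as callables, so kernels such as (48) could be plugged in. All names are hypothetical.

```python
import numpy as np

def nanbu_step(v, c, P, D, eps, sigma, rng):
    """One Nanbu-like step for (51) with eps = dt (every particle interacts).

    ASSUMED binary rule, for illustration only (not the paper's (6)/(23)):
        v' = v + eps * P(v, v_*, c, c_*) * (v_* - v)
               + sqrt(eps) * sigma * D(v, c) * eta,   eta ~ N(0, 1),
    with the contacts c kept frozen during the step.
    """
    n = v.size
    # Draw a partner for each particle uniformly among the other n - 1 particles.
    partners = rng.integers(0, n - 1, size=n)
    partners[partners >= np.arange(n)] += 1
    v_star, c_star = v[partners], c[partners]
    eta = rng.standard_normal(n)
    v_new = (v + eps * P(v, v_star, c, c_star) * (v_star - v)
             + np.sqrt(eps) * sigma * D(v, c) * eta)
    # Assumption: an update that leaves [-1, 1] is rejected (opinion unchanged).
    return np.where(np.abs(v_new) <= 1.0, v_new, v)

# Usage with the bounded-confidence bound (47) and D(v, c) = sqrt(1 - v^2)
rng = np.random.default_rng(42)
N = 10_000
v = rng.uniform(-1.0, 1.0, N)
c = rng.uniform(0.0, 1.0, N)
P47 = lambda v, vs, c, cs: (np.abs(v - vs) < cs / (c + cs)).astype(float)
Dsq = lambda v, c: np.sqrt(1.0 - v ** 2)
for _ in range(200):  # 200 steps of size dt = eps = 0.01, i.e. up to t = 2
    v = nanbu_step(v, c, P47, Dsq, eps=0.01, sigma=0.1, rng=rng)
```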

Minimisation of particle-based discrepancy

To minimise the discrepancy between the densities, we rely directly on the information provided by the particles. From the particle simulation of (51), we retrieve $ V^n(\theta )=\left \{v^n_i\right \}_{i=1}^{N_s}\sim g(v,t_n|\theta )$ for $n=1,\ldots,N_t$, whereas the target particles $\hat V^m=\left \{\hat v^m_i\right \}_{i=1}^{N_s}\sim \hat g_m(v)$ are obtained by sampling $N_s$ particles from the real-data distributions. We then obtain the parameter $\theta ^*$ as a minimiser of the following problem:

(52)

\begin{equation} \theta ^*\in \arg \min _{\theta \in \Theta } \frac{1}{M}\sum _{m=1}^M\mathcal{D}_p(g_m^{N_s}(\theta ),\hat g_m^{N_s}), \end{equation}

subject to the evolution of the particle scheme (51). Notice that the discrepancy $\mathcal{D}_p$ is evaluated on the empirical densities $g_m^{N_s}(\theta ),\hat g_m^{N_s}$ associated with the samples $V^m(\theta ), \hat V^m$.

To perform the minimisation of (52), we need to produce solutions in the admissible parameter space $\Theta$. Here, we rely on MATLAB routines for constrained minimisation, such as fmincon, based on an interior-point method, and patternsearch, a gradient-free optimisation method. Since the fluctuations of the particle method are reflected in the discrepancy measure, we fixed the seed of the random-number generator in Algorithm 1 at each iteration of the optimisation procedure, so as to reduce the stochasticity induced by the particle simulation.
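The effect of fixing the seed can be illustrated with a toy objective. In the Python sketch below, a hypothetical stand-in for Algorithm 1 generates opinions from $\theta$, and the discrepancy is the sorted-sample form of $\mathcal{W}^1_1$ for equal sample sizes. When a fresh generator with the same seed is created at every evaluation (common random numbers), the objective becomes a deterministic function of $\theta$, which is what both a pattern search and an interior-point method implicitly expect; without it, two evaluations at the same $\theta$ differ. All names are ours.

```python
import numpy as np

def toy_simulation(theta, n, rng):
    """Hypothetical stand-in for Algorithm 1: n opinions in (-1, 1) driven by theta."""
    return np.tanh(theta[0] + theta[1] * rng.standard_normal(n))

def make_objective(data, seed=None):
    """Return theta -> W_1 distance between simulated particles and data.

    Equal sample sizes are assumed, so W_1 is the mean gap between sorted
    samples. With a seed, a fresh generator is built at each call (common
    random numbers), so repeated evaluations at the same theta coincide.
    """
    data_sorted = np.sort(np.asarray(data, dtype=float))
    n = data_sorted.size
    shared_rng = np.random.default_rng()  # only used when seed is None

    def objective(theta):
        rng = np.random.default_rng(seed) if seed is not None else shared_rng
        sim = np.sort(toy_simulation(theta, n, rng))
        return float(np.mean(np.abs(sim - data_sorted)))

    return objective

data = toy_simulation((0.2, 0.5), 5000, np.random.default_rng(123))  # synthetic data
fixed = make_objective(data, seed=7)
noisy = make_objective(data)
print(fixed((0.2, 0.5)) == fixed((0.2, 0.5)))  # True: identical particles
print(noisy((0.2, 0.5)) == noisy((0.2, 0.5)))  # False in practice
```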

Remark 1. Computing the discrepancy measure in (52) can be challenging, for example, in the aforementioned case of the $p$-Wasserstein distance. However, since the opinion space is one-dimensional, we can compute it equivalently as

(53)

\begin{equation} \mathcal{D}_p( g_m^{N_s}(\theta ),\hat g_m^{N_s})\equiv \mathcal{W}^p_p(g_m^{N_s}(\theta ),\hat g_m^{N_s}) = \frac{1}{N_s}\sum _{i=1}^{N_s} |v^m_{\pi (i)}(\theta ) - \hat v^m_{\pi^{\prime}(i)}|^p, \end{equation}

where $\pi$ and $\pi '$ are two permutations of the indices $1, \ldots, N_s$ such that $v^m_{\pi (1)} \leq v^m_{\pi (2)}\leq \ldots \leq v^m_{\pi (N_s)}$ and $\hat v^m_{\pi '(1)} \leq \hat v^m_{\pi '(2)}\leq \ldots \leq \hat v^m_{\pi '(N_s)}$. When the $\ell _p$ distance is considered, the discrepancy measure simply reads

(54)

\begin{equation} \mathcal{D}_p( g_m^{N_s}(\theta ),\hat g_m^{N_s})\equiv \int _{[\!-\!1,1]} |g_m^{N_s}(v|\theta )-\hat g_m^{N_s}(v)|^p\, dv, \end{equation}

where $g_m^{N_s}(\theta ),\hat g_m^{N_s}$ have to be appropriately reconstructed from the samples $V^m(\theta )$ and $\hat V^m$ .
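Both discrepancies in Remark 1 are straightforward to evaluate from particles. The Python sketch below implements (53), which pairs the sorted samples, and (54) with a histogram reconstruction of $g_m^{N_s}$ and $\hat g_m^{N_s}$; the histogram and its number of bins are our illustrative choice, since the text only requires an appropriate reconstruction. Function names are hypothetical.

```python
import numpy as np

def wasserstein_pp(v_sim, v_data, p=1):
    """W_p^p between two equal-size 1D samples, as in (53): sort both, pair in order."""
    a, b = np.sort(np.asarray(v_sim, float)), np.sort(np.asarray(v_data, float))
    if a.size != b.size:
        raise ValueError("(53) assumes the same number N_s of samples")
    return float(np.mean(np.abs(a - b) ** p))

def lp_histogram(v_sim, v_data, p=1, n_bins=50):
    """l_p discrepancy (54) between histogram reconstructions on [-1, 1].

    The histogram reconstruction and n_bins are illustrative choices.
    """
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    g, _ = np.histogram(v_sim, bins=edges, density=True)
    g_hat, _ = np.histogram(v_data, bins=edges, density=True)
    return float(np.sum(np.abs(g - g_hat) ** p) * (edges[1] - edges[0]))

# Usage: a shuffled copy of the data shifted by 0.2
x = np.linspace(-0.5, 0.5, 1001)
y = np.random.default_rng(0).permutation(x) + 0.2
print(wasserstein_pp(y, x, p=1), wasserstein_pp(y, x, p=2))
```

Sorting costs $O(N_s\log N_s)$ per time instant, so (53) stays cheap even for $N_s=10^5$ particles, whereas (54) additionally depends on how the densities are reconstructed.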

Remark 2. The optimisation problem (52) is, in general, a high-dimensional non-convex problem requiring efficient optimisation methods; see, for example, [Reference Borghi, Herty and Pareschi11, Reference Fornasier, Huang, Pareschi and Sünnen33, Reference Totzeck53]. Different approaches are also possible: reformulating the calibration as a function approximation problem can give more generalisable results for kernel inference; see, for example, [Reference Bongini, Fornasier, Hansen and Maggioni10, Reference Chu, Li and Porter18, Reference Fiedler, Herty, Rom, Segala and Trimpe32, Reference Göttlich and Totzeck37].

References

Acemoglu, D. & Ozdaglar, A. (2011) Opinion dynamics and learning in social networks. Dyn. Games Appl. 1, 3–49.

Agnelli, J. P., Colasuonno, F. & Knopoff, D. (2015) A kinetic theory approach to the dynamics of crowd evacuation from bounded domains. Math. Models Methods Appl. Sci. 25(01), 109–129.

Albi, G., Bellomo, N., Fermo, L., et al. (2019) Vehicular traffic, crowds, and swarms: From kinetic theory and multiscale methods to applications and research perspectives. Math. Models Methods Appl. Sci. 29(10), 1901–2005.

Albi, G., Pareschi, L. & Zanella, M. (2014) Boltzmann-type control of opinion consensus through leaders. Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 372(2028), 18.

Albi, G., Pareschi, L., Toscani, G. & Zanella, M. (2017) Recent advances in opinion modeling: control and social influence. In Active Particles. Vol. 1. Advances in Theory, Models, and Applications, Model. Simul. Sci. Eng. Technol., Birkhäuser/Springer, Cham, pp. 49–98.

Albi, G., Pareschi, L. & Zanella, M. (2017) Opinion dynamics over complex networks: Kinetic modelling and numerical methods. Kinetic Relat. Models 10(1), 1–32.

Bellomo, N., Dosi, G., Knopoff, D. A. & Virgillito, M. E. (2020) From particles to firms: On the kinetic theory of climbing up evolutionary landscapes. Math. Models Methods Appl. Sci. 30(7), 1441–1460.

Bellomo, N., Gibelli, L., Quaini, A. & Reali, A. (2022) Towards a mathematical theory of behavioral human crowds. Math. Models Methods Appl. Sci. 32(2), 321–358.

Bertaglia, G., Boscheri, W., Dimarco, G. & Pareschi, L. (2021) Spatial spread of COVID-19 outbreak in Italy using multiscale kinetic transport equations with uncertainty. Math. Biosci. Eng. 18(5), 7028–7059.

Bongini, M., Fornasier, M., Hansen, M. & Maggioni, M. (2017) Inferring interaction rules from observations of evolutive systems i: The variational approach. Math. Models Methods Appl. Sci. 27(05), 909–951.

Borghi, G., Herty, M. & Pareschi, L. (2023) Constrained consensus-based optimization. SIAM J. Optimiz. 33(1), 211–236.

Boscheri, W., Dimarco, G. & Pareschi, L. (2021) Modeling and simulating the spatial spread of an epidemic through multiscale kinetic transport equations. Math. Models Methods Appl. Sci. 31(6), 1059–1097.

Burger, M. (2021) Network structured kinetic models of social interactions. Vietnam J. Math. 49(3), 937–956.

Burger, M., Pietschmann, J.-F. & Wolfram, M.-T. (2020) Data assimilation in price formation. Inverse Probl. 36(6).

Cercignani, C. (1988) The Boltzmann Equation and Its Applications. Applied Mathematical Sciences, 67, Springer-Verlag, New York, pp. xii+455.

Chalub, F. A., Markowich, P. A., Perthame, B. & Schmeiser, C. (2004) Kinetic Models for Chemotaxis and Their Drift-Diffusion Limits, Springer, Wien.

Chatterjee, A. & Chakrabarti, B. K. (2007) Kinetic exchange models for income and wealth distributions. Eur. Phys. J. B 60, 135–149.

Chu, W., Li, Q. & Porter, M. A. (2022) Inference of interaction kernels in mean-field models of opinion dynamics, preprint.

Cordier, S., Pareschi, L. & Toscani, G. (2005) On a kinetic model for a simple market economy. J. Stat. Phys. 120, 253–277.

Das, A., Gollapudi, S. & Munagala, K. (2014) Modeling opinion dynamics in social networks. Proceedings of the 7th ACM International Conference on Web Search and Data Mining.

Deffuant, G., Neau, D., Amblard, F. & Weisbuch, G. (2000) Mixing beliefs among interacting agents. Adv. Complex Syst. 03(01n04), 87–98.

Dimarco, G., Perthame, B., Toscani, G. & Zanella, M. (2021) Kinetic models for epidemic dynamics with social heterogeneity. J Math Biol. 83(1), 4.

Dimarco, G., Toscani, G. & Zanella, M. (2022) Optimal control of epidemic spreading in the presence of social heterogeneity. Philos. Trans. R. Soc. A 380(2224), 16.

Dimarco, G. & Toscani, G. (2019) Kinetic modeling of alcohol consumption. J. Stat. Phys. 177(5), 1022–1042.

Dimarco, G. & Toscani, G. (2020) Social climbing and Amoroso distribution. Math. Models Methods Appl. Sci. 30(11), 2229–2262.

Dimarco, G. & Tosin, A. (2020) The Aw-Rascle traffic model: Enskog-type kinetic derivation and generalisations. J. Stat. Phys. 178(1), 178–210.

Dimarco, G., Tosin, A. & Zanella, M. (2022) Kinetic derivation of Aw-Rascle-Zhang-type traffic models with driver-assist vehicles. J. Stat. Phys. 186(1), 26.

Dolfin, M. & Lachowicz, M. (2015) Modeling opinion dynamics: How the network enhances consensus. Netw. Heterog. Media 10(4), 877–896.

Düring, B., Markowich, P., Pietschmann, J.-F. & Wolfram, M.-T. (2009) Boltzmann and Fokker–Planck equations modelling opinion formation in the presence of strong leaders. Proc. R. Soc. A: Math. Phys. Eng. Sci. 465(2112), 3687–3708.

Düring, B., Pareschi, L. & Toscani, G. (2018) Kinetic models for optimal control of wealth inequalities. Eur. Phys. J. B 91(10), 12.

Festa, A., Tosin, A. & Wolfram, M.-T. (2018) Kinetic description of collision avoidance in pedestrian crowds by sidestepping. Kinet. Relat. Models 11(3), 491–520.

Fiedler, C., Herty, M., Rom, M., Segala, C. & Trimpe, S. (2023) Reproducing kernel Hilbert spaces in the mean field limit. Kinet. Relat. Models 16(6), 850–870.

Fornasier, M., Huang, H., Pareschi, L. & Sünnen, P. (2021) Consensus-based optimization on the sphere: Convergence to global minimizers and machine learning. J. Mach. Learn. Res. 22(1), 10722–10776.

Franceschi, J. & Pareschi, L. (2022) Spreading of fake news, competence and learning: Kinetic modelling and numerical approximation. Philos. Trans. Roy. Soc. A 380(2224), 18.

Furioli, G., Pulvirenti, A., Terraneo, E. & Toscani, G. (2020) Non-Maxwellian kinetic equations modeling the dynamics of wealth distribution. Math. Models Methods Appl. Sci. 30(4), 685–725.

Galam, S., Gefen, Y. & Shapir, Y. (1982) Sociophysics: A new approach of sociological collective behaviour. i. mean-behaviour description of a strike. J. Math. Sociol. 9(1), 1–13.

Göttlich, S. & Totzeck, C. (2022) Parameter calibration with stochastic gradient descent for interacting particle systems driven by neural networks. Math. Contr. Signals Syst 34, 1–30.

Günther, M., Klar, A., Materne, T. & Wegener, R. (2002) An explicitly solvable kinetic model for vehicular traffic and associated macroscopic equations. Math Comput Model 35(5-6), 591–606.

Hegselmann, R. & Krause, U. (2002) Opinion dynamics and bounded confidence models, analysis, and simulation. J. Artif. Soc. Soc. Simul. 5(3).

Hutto, C. & Gilbert, E. (2014) Vader: A parsimonious rule-based model for Sentiment Analysis of social media text. Proc. Int. AAAI Conf. Web Social Media 8(1), 216–225.

Kahneman, D. & Tversky, A. (2013) Prospect theory: An analysis of decision under risk. In: Handbook of the Fundamentals of Financial Decision Making: Part I, World Scientific, pp. 99–127.

Li, J.-C., Tao, C. & Li, H.-F. (2022) Dynamic forecasting performance and liquidity evaluation of financial market by Econophysics and Bayesian methods. Phys. A: Stat. Mechan. Appl. 588, 126546.

Littman, J. & Wrubel, L. (2019) Climate change Tweets IDs.

Medhat, W., Hassan, A. & Korashy, H. (2014) Sentiment analysis algorithms and applications: A survey. Ain Shams Eng. J. 4, 1093–1113.

Pareschi, L. & Toscani, G. (2013) Interacting Multiagent Systems: Kinetic Equations and Monte Carlo Methods, OUP Oxford, Walton Street, Oxford.

Pareschi, L. & Toscani, G. (2014) Wealth distribution and collective knowledge: A Boltzmann approach. Philos. Trans. R. Soc. A: Math. Phys. Eng. Sci. 372(2028), 20130396.

Puppo, G., Semplice, M., Tosin, A. & Visconti, G. (2016) Fundamental diagrams in traffic flow: The case of heterogeneous kinetic models. Commun. Math. Sci. 14, 643–669.

Sznajd-Weron, K. & Sznajd, J. (2000) Opinion evolution in closed community. Int. J. Mod. Phys. C 11, 1157–1165.

Toscani, G. (2006) Kinetic models of opinion formation. Commun. Math. Sci. 4, 481–496.

Toscani, G. (2013) A kinetic description of mutation processes in bacteria. Kinet. Relat. Mod. 6, 1043–1055.

Toscani, G. (2022) A multi-agent approach to the impact of epidemic spreading on commercial activities. Math. Models Methods Appl. Sci. 32(10), 1931–1948.

Tosin, A. (2014) Kinetic equations and stochastic game theory for social systems. In: Mathematical Models and Methods for Planet Earth, Springer, Cham, pp. 37–57.

Totzeck, C. (2021) Trends in consensus-based optimization. In: Active Particles, Volume 3: Advances in Theory, Models, and Applications, Springer, pp. 201–226.

Zanella, M., Bardelli, C., Dimarco, G., et al. (2021) A data-driven epidemic model with social structure for understanding the COVID-19 infection on a heavily affected Italian province. Math. Models Methods Appl. Sci. 31(12), 2533–2570.

Zhang, W., Yoshida, T. & Tang, X. (2008) Text classification based on multi-word with support vector machine. Knowl.-Based Syst. 21(8), 879–886.

Zhou, X., Tao, X., Yong, J. & Yang, Z. (2013) Sentiment analysis on tweets for social events. In: Proceedings of the 2013 IEEE 17th International Conference on Computer Supported Cooperative Work in Design (CSCWD), IEEE, pp. 557–562.

Figure 1. Profiles of the value function (4) for different choices of $\delta$ and $\mu =0.15$. The red dashed lines represent the bounds (5).

Figure 3. Comparison between the tails of the data distribution and the different possible equilibrium distributions of the Fokker–Planck models of Section 2.1.

Table 1. Fitting of the contact distribution from Twitter data.

Figure 5. Test $1$, $\sigma ^2/\alpha = 0.005$. The pictures show the time evolution of the distribution function $f(v,c,t)$ for $t=0,4,8,12,16,20$ for a homogeneous distribution of the number of connections with respect to opinions. After the emergence of two clusters, the agents reach consensus at the final time.

Figure 6. Test $1$, $\sigma ^2/\alpha = 0.005$. The pictures show the time evolution of the distribution function $f(v,c,t)$ for $t=0,4,8,12,16,20$ in the case of a non-homogeneous distribution of the number of connections with respect to opinions. After the emergence of two clusters, the agents reach consensus at the final time in the positive opinion region.

Figure 7. Test $2$, $\sigma ^2/\alpha = 0.005$. The pictures show the distribution $f(v,c,t)$ for $t=4$ (left), $t=6$ (centre), and $t=8$ (right). Top row: constant confidence bound ($\Delta (c,c_*)=0.5$); two main clusters emerge for any number of contacts. Bottom row: heterogeneous confidence bound ($\Delta (c,c_*)$ as in (47)); consensus is reached among agents with a low number of contacts, whereas two main clusters emerge for agents with a higher number of contacts.

Figure 8. Test $2$, $\sigma ^2/\alpha = 0.005$. The pictures show the time evolution of the distribution function $f(v,c,t)$ for $t=0$ (left), $t=3$ (centre), and $t=6$ (right). Agents with a low number of connections are strongly influenced by agents with a large number of connections.

Figure 9. Test $4$: marginal distribution of opinions at the final time (left) and the comparison between the reconstructed density of opinions and contacts and the real dataset.

Figure 11. Test $5$: initial (left) and final (right) joint density of opinions and contacts (the images show $\log\!(f(v,c,t) + 0.025)$).

Algorithm 1 Asymptotic particle-based algorithm (Nanbu-like algorithm)

A data-driven kinetic model for opinion dynamics with social network contacts – ADDENDUM

Giacomo Albi, Elisa Calzola and Giacomo Dimarco

European Journal of Applied Mathematics

Article contents

A data-driven kinetic model for opinion dynamics with social network contacts

Abstract

Keywords

MSC classification

1. Introduction

2. An evolutionary model for contacts

2.1. The kinetic model for the evolution of social media contacts

2.2. Contact distribution on Twitter and fitting

3. Kinetic model of opinions and contacts

3.1. The binary interaction

3.2. Fokker–Planck asymptotics

3.3. On the steady state solution for the opinion distribution

4. Numerical experiments

4.1. Qualitative behaviour

4.1.1. Test 1: Bounded confidence model

4.1.2. Test 2: Heterogeneous confidence bound

4.1.3. Test 3: Sznajd-type dynamics

4.2. Quantitative analysis

4.2.1. Twitter sentiment analysis

4.2.2. Test 4: Trump re-admission on Twitter

4.2.3. Test 5: Climate change trends on Twitter

5. Conclusions

Competing interests

Appendix A: Particle-based kernel calibration

Asymptotic particle-based scheme

Minimisation of particle-based discrepancy

References

An addendum has been issued for this article.
