1 Introduction
1.1 Multiagent interaction models in biological, social and economical dynamics
In the course of the past two decades, there has been an explosion of research on models of multiagent interactions [Reference Cucker and Dong28, Reference Cucker and Mordecki29, Reference Cucker and Smale31, Reference Grégoire and Chaté36, Reference Jadbabaie, Lin and Stephen Morse40, Reference Ke, Minett, Au and Wang42, Reference Vicsek, Czirok, BenJacob, Cohen and Shochet74], to describe phenomena beyond physics, for example, in biology, such as cell aggregation and motility [Reference Camazine, Deneubourg, Franks, Sneyd, Theraulaz and Bonabeau11, Reference Keller and Segel43, Reference Koch and White45, Reference Perthame61], coordinated animal motion [Reference Ballerini, Cabibbo, Candelier, Cavagna, Cisbani, Giardina, Lecomte, Orlandi, Parisi, Procaccini, Viale and Zdravkovic8, Reference Carrillo, D’Orsogna and Panferov16, Reference Chuang, D’Orsogna, Marthaler, Bertozzi and Chayes18, Reference Couzin and Franks21, Reference Couzin, Krause, Franks and Levin22, Reference Cristiani, Piccoli and Tosin24, Reference Cucker and Smale31, Reference Niwa54, Reference Parrish and EdelsteinKeshet57, Reference Parrish, Viscido and Gruenbaum58, Reference Romey63, Reference Toner and Tu71, Reference Yates, Erban, Escudero, Couzin, Buhl, Kevrekidis, Maini and Sumpter77], coordinated human [Reference Cristiani, Piccoli and Tosin25, Reference Cucker, Smale and Zhou30, Reference Short, D’Orsogna, Pasour, Tita, Brantingham, Bertozzi and Chayes68], and synthetic agent behaviour and interactions, such as cooperative robots [Reference Chuang, Huang, D’Orsogna and Bertozzi19, Reference Leonard and Fiorelli49, Reference Perea, Gómez and Elosegui60, Reference Sugawara and Sano70]. We refer to [Reference Carrillo, Choi and Hauray13, Reference Carrillo, Choi and Perez15, Reference Carrillo, Fornasier, Toscani and Vecil17, Reference Vicsek and Zafeiris75] for recent surveys.
Two main mechanisms are considered in such models to define the dynamics. The first takes inspiration from physical laws of motion and is based on pairwise forces encoding observed ‘first principles’ of biological, social or economical interactions, for example, repulsion/attraction, alignment, self-propulsion/friction, et cetera. Typically, this leads to Newton-like first- or second-order equations with ‘social interaction’ forces, see (1.1) below. In the second mechanism, the dynamics are driven by an evolutive game where players simultaneously optimise their cost. Examples are game-theoretic models of evolution [Reference Hofbauer and Sigmund38] or mean-field games, introduced in [Reference Lasry and Lions48] and independently under the name Nash Certainty Equivalence (NCE) in [Reference Huang, Caines and Malhamé39], later greatly popularised, for example, within consensus problems, for instance in [Reference Nuorian, Caines and Malhamé55, Reference Nuorian, Caines and Malhamé56].
More recently, the notion of spatially inhomogeneous evolutionary games has been proposed [Reference Almi, Morandotti and Solombrino5, Reference Ambrosio, Fornasier, Morandotti and Savaré6, Reference Morandotti and Solombrino53] where the transport field for the agent population is directed by an evolutionary game on their available strategies. Unlike mean-field games, the optimal dynamics are not chosen via an underlying nonlocal optimal control problem but by the agents’ local (in time and space) decisions, see (1.3) below.
One fundamental goal of these studies is to reveal the relationship between the simple pairwise forces or incentives acting at the individual level and the emergence of global patterns in the behaviour of a given population.
Newtonian models. A common simple class of models for interacting multiagent dynamics with pairwise interactions is inspired by Newtonian mechanics. The evolution of N agents with time-dependent locations $x_1(t),\ldots,x_N(t)$ in $\mathbb{R}^d$ is described by the ODE system
where f is a predetermined interaction force between pairs of agents. The system is well-defined for sufficiently regular f (e.g. for f Lipschitz continuous). In this article, we will refer to such models as Newtonian models.
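To make the structure of such systems concrete, here is a minimal explicit-Euler sketch in $d=1$, assuming the common form $\partial_t x_i = \frac{1}{N}\sum_j f(x_j - x_i)$ with f Lipschitz; the linear attraction force used below is a hypothetical example, not one taken from the paper.

```python
# Explicit-Euler simulation of a first-order Newtonian interaction model in d = 1,
# assuming the form  dx_i/dt = (1/N) * sum_j f(x_j - x_i)  with f Lipschitz.
# The linear attraction force below is a hypothetical illustration.

def simulate(x0, f, dt=0.01, steps=1000):
    """Evolve agent positions x0 (list of floats) under the pairwise force f."""
    x = list(x0)
    N = len(x)
    for _ in range(steps):
        v = [sum(f(x[j] - x[i]) for j in range(N)) / N for i in range(N)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Linear attraction: agents contract towards their (preserved) common mean.
positions = simulate([0.0, 1.0, 4.0], f=lambda r: r)
```

With $f(r)=r$ each velocity is the deviation from the ensemble mean, so the mean is conserved and all agents collapse onto it exponentially.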
First-order models of the form (1.1) are ubiquitous in the literature and have, for instance, been used to model multiagent interactions in opinion formation [Reference Hegselmann and Krause37, Reference Krause46], vehicular traffic flow [Reference Garavello and Piccoli35], pedestrian motion [Reference Cristiani, Piccoli and Tosin26], and synchronisation of chemical and biological oscillators in neuroscience [Reference Kuramoto47].
Often one is interested in studying the behaviour of a very large number of agents. We may think of the agents at time t as being distributed according to a probability measure $\mu(t)$ over $\mathbb{R}^d$ . The limit dynamics of (1.1) as $N \to \infty$ can, under suitable conditions, be expressed directly in terms of the evolution of the distribution $\mu(t)$ according to a mean-field equation. The mean-field limit of (1.1) is formally given by
Here $v(\mu(t))$ is a velocity field and intuitively $v(\mu(t))(x)$ gives the velocity of infinitesimal mass particles of $\mu$ at time t and location x [Reference Cañizo, Carrillo and Rosado12, Reference Carrillo, Choi and Hauray14].
While strikingly simple, such models only exhibit limited modelling capabilities. For instance, the resulting velocity $\partial_t x_i(t)$ is simply a linear combination of the influences of all the other agents. ‘Importance’ and ‘magnitude’ of these influences cannot be specified separately: agent j cannot tell agent i to remain motionless, regardless of what other agents suggest. Generally, agent i has no sophisticated mechanism of finding a ‘compromise’ between various potentially conflicting influences and merely uses their linear average. The applicability of such models to economics or sociology raises concerns as it is questionable whether a set of static interaction forces can describe the behaviour of rational agents who are able to anticipate and counteract undesirable situations.
Spatially inhomogeneous evolutionary games. In a game dynamic, the vector field $v(\mu(t))$ is not induced by a rigid force law but by the optimisation of an individual payoff by each agent. In mean-field games, this optimisation is global in time: each agent plans their whole trajectory in advance, and the optimisation can be thought of as taking place over repetitions of the same game (e.g. the daily commute). Alternatively, in spatially inhomogeneous evolutionary games [Reference Ambrosio, Fornasier, Morandotti and Savaré6] the agents continuously update their mixed strategies in a process that is local in time, which may be more realistic in some scenarios. This is implemented by the well-known replicator dynamics [Reference Hofbauer and Sigmund38].
As above, agents may move in $\mathbb{R}^d$ . Let U be the set of pure strategies. The map $e \,:\, \mathbb{R}^d \times U \to \mathbb{R}^d$ describes the spatial velocity e(x, u) for an agent at position $x \in \mathbb{R}^d$ with a pure strategy $u \in U$ . For example, one can pick $U \subset \mathbb{R}^d$ to be a set of admissible velocity vectors and simply set $e(x,u) = u$ . The map e acts as a dictionary between strategies and velocities and can therefore be assumed to be known. e(x, u) may be more complicated, for instance, when certain velocities are inadmissible at certain locations due to obstacles. A function $J\,:\,(\mathbb R^d\times U)^2 \to \mathbb R$ describes the individual benefit $J(x,u,x^{\prime},u^{\prime})$ for an agent at x for choosing strategy u when another agent sits at $x^{\prime}$ and intends to choose strategy $u^{\prime}$ . For example, $J(x,u,x^{\prime},u^{\prime})$ may be high when e(x, u) points in the direction that x wants to move, but could be lowered when the other agent’s velocity $e(x^{\prime},u^{\prime})$ suggests an impending collision. The state of each agent is given by their spatial position $x \in \mathbb{R}^d$ and a mixed strategy $\sigma \in \mathcal{M}_1(U)$ (where $\mathcal{M}_1(U)$ denotes probability measures over U). As hinted at, the evolution of $\sigma$ is driven by a replicator dynamic involving the payoff function J, averaged over the benefit obtained from all other agents; the spatial velocity $\partial_t x$ is obtained by the average $\int_U e(x,u)\,\textrm{d} \sigma(u)$ . The equations of motion are given by
where for $x,x^{\prime} \in \mathbb{R}^d$ and $\sigma,\sigma^{\prime} \in \mathcal{M}_1(U)$
Intuitively, the strategy $\sigma_i$ tends to concentrate on the strategies with the highest benefit for agent i via (1.4), which will then determine their movement via (1.3a).
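The replicator mechanism can be sketched numerically in a minimal single-agent setting: a finite strategy set $U=\{-1,+1\}$, $e(x,u)=u$, and a hypothetical payoff $J(x,u)=-x\cdot u$ that favours moving towards the origin. All concrete choices here are illustrative assumptions; the strategy is updated by an explicit-Euler replicator step and the position is advected by the $\sigma$-average of e.

```python
# One Euler step of the replicator dynamics over a finite strategy set U,
# coupled with spatial motion via e(x, u) = u.  The payoff J(x, u) = -x * u is a
# hypothetical single-agent illustration (strategies steer towards the origin).

def replicator_step(x, sigma, U, J, dt=0.01):
    """sigma: dict u -> probability.  Returns the updated pair (x, sigma)."""
    payoff = {u: J(x, u) for u in U}                  # benefit of each pure strategy
    mean_payoff = sum(sigma[u] * payoff[u] for u in U)
    sigma = {u: sigma[u] * (1 + dt * (payoff[u] - mean_payoff)) for u in U}
    total = sum(sigma.values())
    sigma = {u: s / total for u, s in sigma.items()}  # renormalise after Euler step
    v = sum(sigma[u] * u for u in U)                  # spatial velocity, e(x, u) = u
    return x + dt * v, sigma

x, sigma = 2.0, {-1: 0.5, +1: 0.5}
for _ in range(2000):
    x, sigma = replicator_step(x, sigma, U=(-1, +1), J=lambda x, u: -x * u)
```

Running this, the strategy concentrates on the pure strategy that points towards the origin and the agent oscillates around it, since the plain replicator dynamic reacts slowly once $\sigma$ has become nearly degenerate.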
Equation (1.3a) describes the motion of agents in $\mathbb R^d$ and resembles (1.1). The main difference is that the vector field is determined by the resulting solution of the replicator equation (1.3b), which promotes strategies that perform best with respect to the individual payoff J with rate (1.4). Despite the different nature of the variables $(x_i,\sigma_i)$ of the model, the system (1.3) can also be reinterpreted as a Newtonian-like dynamics on the space $\mathbb{R}^d \times \mathcal{M}_1(U)$ where each agent $y_i(t)=(x_i(t),\sigma_i(t))$ is characterised by its spatial location $x_i(t) \in \mathbb{R}^d$ and its mixed strategy $\sigma_i(t) \in \mathcal{M}_1(U)$ . Accordingly, equations (1.3) can be more concisely expressed as
Similarly to the mean-field limit results in [Reference Cañizo, Carrillo and Rosado12, Reference Carrillo, Choi and Hauray14], the main contribution of [Reference Ambrosio, Fornasier, Morandotti and Savaré6] is to show that in the large particle limit $N\to \infty$ solutions of (1.3) converge in the sense of probability measures to $\Sigma(t) \in \mathcal{M}_1(\mathbb{R}^d \times \mathcal{M}_1(U))$ , which is, in analogy to (1.2), a solution of a nonlinear transport equation of the type
Note that (1.6) is a partial differential equation whose domain is a (possibly infinite-dimensional) Banach space containing $\mathbb{R}^d \times \mathcal{M}_1(U)$ , and particular care must be taken to use an appropriate calculus for defining differentials, in particular for the divergence operator. In [Reference Ambrosio, Fornasier, Morandotti and Savaré6], Lagrangian and Eulerian notions of solutions to (1.6) are introduced. As the technical details underlying the well-posedness of (1.6) do not play a central role in this article, we refer to [Reference Ambrosio, Fornasier, Morandotti and Savaré6] for more insights. Nevertheless, we apply some of the general statements in [Reference Ambrosio, Fornasier, Morandotti and Savaré6] to establish well-posedness of the models presented in this paper.
1.2 Learning or inference of multiagent interaction models
An important challenge is to determine the precise form of the interaction force or payoff functions. While physical laws can often be determined with high precision through experiments, in more complex systems from biology or the social sciences such experiments are often either not possible or do not determine the models precisely. Therefore, very often the governing rules are simply chosen ad hoc to reproduce some major qualitative effects observed in reality.
Alternatively, one can employ model selection and parameter estimation methods to determine the form of the governing functions. Data-driven estimations have been applied in continuum mechanics [Reference Conti, Müller and Ortiz20, Reference Kirchdoerfer and Ortiz44], computational sociology [Reference Bongini, Fornasier, Hansen and Maggioni10, Reference Lu, Maggioni and Tang50, Reference Lu, Zhong, Tang and Maggioni51, Reference Zhong, Miller and Maggioni78], or economics [Reference Albani, Ascher, Yang and Zubelli3, Reference Crépey23, Reference Egger and Engl33]. However, even the problem of determining whether time shots of a linear dynamical system fulfil physically meaningful models, in particular have Markovian dynamics, is computationally intractable [Reference Cubitt, Eisert and Wolf27]. For nonlinear models, the intractability of learning the system corresponds to the complexity of determining the set of appropriate candidate functions to fit the data. In order to break the curse of dimensionality, one requires prior knowledge of the system and the potential structure of the governing equations. For instance, in the sequence of recent papers [Reference Schaeffer, Tran and Ward64, Reference Schaeffer, Tran and Ward65, Reference Tran and Ward72] the authors assume that the governing equations are of first order and can be written as sparse polynomials, that is, linear combinations of few monomial terms.
In this work, we present an approach for estimating the payoff function for spatially inhomogeneous evolutionary games from observed data. It is inspired by the papers [Reference Bongini, Fornasier, Hansen and Maggioni10, Reference Lu, Maggioni and Tang50, Reference Lu, Zhong, Tang and Maggioni51, Reference Zhong, Miller and Maggioni78], in particular by the groundbreaking paper [Reference Bongini, Fornasier, Hansen and Maggioni10], which is dedicated to data-driven evolutions of Newtonian models. In these references, the curse of dimensionality is remedied by assuming that the interaction function f in (1.1) is parametrised by a lower-dimensional function a, the identification of which is more computationally tractable. A typical example for such a parametrisation is given by functions $f=f^a$ of the type
The corresponding model (1.1) is used, for instance, in opinion dynamics [Reference Hegselmann and Krause37, Reference Krause46], vehicular traffic [Reference Garavello and Piccoli35], or pedestrian motion [Reference Cristiani, Piccoli and Tosin26]. The learning or inference problem then concerns the determination of a, and hence of $f=f^a$ , from observations of real-life realisations of the model (1.1). Clearly, the problem can be formulated as an optimal control problem. However, as pointed out in [Reference Bongini, Fornasier, Hansen and Maggioni10], in view of the nonlinear nature of the map sending a to the corresponding solution $x^{[a]}(t)$ of (1.1), the control cost would be nonconvex and the optimal control problem rather difficult to solve. Instead, in [Reference Bongini, Fornasier, Hansen and Maggioni10] a convex formulation is obtained by considering an empirical risk minimisation in least squares form: given an observed realisation $(x_1^{N}(t),\dots,x_N^{N}(t))$ of the dynamics between N agents, generated by (1.1) governed by $f^a$ , we aim at identifying a by minimising
That is, along the observed trajectories we aim to minimise the discrepancy between observed velocities and those predicted by the model. The functional $\mathcal{E}^{N}$ plays the role of a loss function in the learning or inference task. A time-discrete formulation in the framework of statistical learning theory has also been proposed in [Reference Lu, Maggioni and Tang50, Reference Lu, Zhong, Tang and Maggioni51, Reference Zhong, Miller and Maggioni78]. Under the assumption that a belongs to a suitable compact class X of smooth functions, the main results in [Reference Bongini, Fornasier, Hansen and Maggioni10, Reference Lu, Maggioni and Tang50, Reference Lu, Zhong, Tang and Maggioni51, Reference Zhong, Miller and Maggioni78] establish that minimisers
converge to a in suitable senses for $N\to \infty$ , where the ansatz spaces $V_N \subset X$ are suitable finite-dimensional spaces of smooth functions (such as finite element spaces).
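The least-squares principle behind (1.7) can be sketched in a deliberately simplified setting: a one-parameter ansatz space in which $f^a(x,x') = a\,(x'-x)$ for a scalar a (a hypothetical stand-in for the finite-element spaces $V_N$), $d=1$, and exact observations of positions and velocities. The empirical risk is then a quadratic in a with a closed-form minimiser.

```python
# Least-squares inference of the interaction kernel, a sketch of the empirical
# risk (1.7) for the one-parameter ansatz f^a(x, x') = a * (x' - x) in d = 1,
# i.e. the dynamics  dx_i/dt = (a/N) * sum_j (x_j - x_i).

def observed_data(a_true, x0, dt=0.01, steps=200):
    """Generate (positions, velocities) snapshots by explicit Euler."""
    data, x = [], list(x0)
    N = len(x)
    for _ in range(steps):
        v = [a_true * sum(x[j] - x[i] for j in range(N)) / N for i in range(N)]
        data.append((list(x), v))
        x = [x[i] + dt * v[i] for i in range(N)]
    return data

def infer_a(data):
    """Minimise sum_i |v_i - a * g_i|^2 with g_i = (1/N) sum_j (x_j - x_i)."""
    num = den = 0.0
    for x, v in data:
        N = len(x)
        g = [sum(x[j] - x[i] for j in range(N)) / N for i in range(N)]
        num += sum(vi * gi for vi, gi in zip(v, g))
        den += sum(gi * gi for gi in g)
    return num / den

a_hat = infer_a(observed_data(a_true=0.7, x0=[0.0, 1.0, 3.0]))
```

Since the observed velocities are noiseless, the minimiser recovers the true parameter essentially exactly; with noisy data the same normal-equation structure yields the least-squares estimate.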
1.3 Contribution and outline
The main scope of this paper is to provide a theoretical and practical framework to perform learning/inference of spatially inhomogeneous evolutionary games, so that these models can be used in real-life data-driven applications. First, we discuss some potential issues with the model (1.3) and provide some adaptations. We further propose and study in Section 3 several learning functionals for inferring the payoff function J from observations of the dynamics, that is, extending the approach of [Reference Bongini, Fornasier, Hansen and Maggioni10] to our modified version of [Reference Ambrosio, Fornasier, Morandotti and Savaré6]. Let us detail our contributions.
The proposed changes to the model (1.3) are as follows:

• In Section 2.1, we add entropic regularisation to the dynamics for the mixed strategies of the agents. This avoids degeneracy of the strategies and allows faster reactions to changes in the environment, see Figure 1. Entropic regularisation of games was also considered in [Reference Flåm and Cavazzuti34]. We show that the adapted model is well-posed and has a consistent mean-field limit (Theorem 2.6).

• For interacting agents, the assumption that the mixed strategies of other agents are fully known may often be unrealistic. Therefore, we will focus our analysis on the case where $J(x,u,x^{\prime},u^{\prime})$ does not explicitly depend on $u^{\prime}$ , the ‘other’ agent’s pure strategy. We refer to this as the undisclosed setting. (The general case is still considered to some extent.)
For this undisclosed setting, we study the quasi-static fast-reaction limit of (1.3), where the dynamics of the mixed strategies $(\sigma_i)_{i=1}^N$ run at a much faster time scale than the dynamics of the locations $(x_i)_{i=1}^N$ , that is, agents quickly adjust their strategies to changes in the environment. This will also be important for designing practical inference functionals (see below). The undisclosed fast-reaction model is introduced in Section 2.2. Well-posedness and a consistent mean-field limit are proved in Theorem 2.13; convergence to the fast-reaction limit, or quasi-static evolution, as the strategy time scale becomes faster is given by Theorem 2.14.
We claim that the resulting undisclosed fast-reaction entropic model is a useful alternative to the Newtonian model (1.1) and we support these considerations with theoretical and numerical examples. In particular, we show that any Newtonian model can be described (approximately) as an undisclosed fast-reaction entropic model, whereas the converse is not true (Examples 2.7 and 2.8).
We then discuss several inference functionals for the modified game model in Section 3. We start with a rather direct analogue of (1.7) (Section 3.2), which would require not only the observation of the spatial locations $(x_i)_{i=1}^N$ and velocities $(\partial_t x_i)_{i=1}^N$ but also of the mixed strategies $(\sigma_i)_{i=1}^N$ and their temporal derivatives $(\partial_t \sigma_i)_{i=1}^N$ . Whether the latter two can be observed in practice, in particular with sufficient accuracy, is doubtful. Therefore, as an alternative, we propose two inference functionals for the undisclosed fast-reaction setting. A functional based on observed spatial velocities is given in Section 3.3.1. A functional based on mixed strategies (but not their derivatives) is proposed in Section 3.3.2, and we discuss some options for how the required data could be obtained in practice. In Section 3.3.3, some properties of these functionals are established, such as existence of minimisers and consistency of their mean-field versions.
Numerical examples are given in Section 4. These include inference on examples where observations were generated with a true underlying undisclosed fast-reaction entropic model, as well as inference of Newtonian models and of a model for pedestrian motion adapted from [Reference Bailo, Carrillo and Degond7, Reference Degond, AppertRolland, Moussaïd, Pettré and Theraulaz32]. We close with a brief discussion in Section 5. Some longer technical proofs have been moved to Appendix A. Before presenting the new model, we collect some notation in the next subsection.
1.4 Setting and notation
General setting. Let $(Y,d_Y)$ be a complete and separable metric space. We denote by $\mathcal{M}(Y)$ the space of signed Borel measures with finite total variation and by $\mathcal{M}_1(Y)$ the set of probability measures with bounded first moment, that is,
For a continuous function $\varphi \in C(Y),$ we denote by
its Lipschitz constant. The set of bounded Lipschitz functions is then denoted by $\textrm{Lip}_b(Y) = \{ \varphi \in C(Y) \mid \|\varphi\|_\infty + \textrm{Lip}(\varphi) < \infty \}$ , with $\|\cdot\|_\infty$ the classical sup-norm.
For $\mu_1,\mu_2 \in \mathcal{M}_1(Y)$ , the 1Wasserstein distance $W_1(\mu_1,\mu_2)$ is defined by
where $\Gamma(\mu_1,\mu_2)$ is the set of admissible couplings between $\mu_1$ and $\mu_2$ , that is, $\Gamma(\mu_1,\mu_2) = \{ \gamma \in \mathcal{M}_1(Y \times Y) \mid \gamma(A \times Y) = \mu_1(A) \text{ and } \gamma(Y \times B) = \mu_2(B) \text{ for all Borel sets } A, B \subseteq Y\}$ . Due to Kantorovich duality, one can also consider the equivalent definition
The metric space $(\mathcal{M}_1(Y), W_1)$ is complete because $(Y, d_Y)$ is complete, and a sequence $(\mu_n)_{n \in \mathbb{N}} \subset \mathcal{M}_1(Y)$ converges to $\mu \in \mathcal{M}_1(Y)$ with respect to the distance $W_1$ if and only if $\mu_n$ converges to $\mu$ in duality with bounded Lipschitz functions and the first moments converge, that is, $\int_{Y} d_Y(\cdot,\bar{y})\,\textrm{d}\mu_n \to \int_{Y} d_Y(\cdot,\bar{y})\,\textrm{d}\mu$ for all $\bar{y} \in Y$ .
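In the special case of empirical measures on the real line with equally many uniformly weighted atoms, the optimal coupling is monotone and $W_1$ reduces to sorting; the following sketch illustrates this standard special case (it is not the general definition above).

```python
# 1-Wasserstein distance between two empirical measures on Y = R with equally
# many uniformly weighted atoms: on the line the optimal coupling is monotone,
# so W1 is the average distance between sorted samples (a special case only).

def w1_empirical(xs, ys):
    """W1 between (1/n) * sum_i delta_{xs[i]} and (1/n) * sum_i delta_{ys[i]}."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

d = w1_empirical([0.0, 1.0, 2.0], [0.5, 1.0, 3.0])  # (0.5 + 0.0 + 1.0) / 3 = 0.5
```

For general ground spaces or unequal weights one has to solve the coupling (or dual) problem in (the definition above) directly, for example with a linear-programming solver.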
Interaction setting. We fix the space of pure strategies U to be a compact metric space with distance $\textrm{d}_U$ . Each agent’s mixed strategy is then described by $\sigma \in \mathcal{M}_1(U)$ . Agents move in $\mathbb{R}^d$ , and we denote with $\|\cdot\|$ the usual Euclidean norm. For $R > 0$ , $B^d({R})$ is the open ball of radius R centred at the origin in $\mathbb{R}^d$ . For $N \in \mathbb{N}$ and $\mathbf{x} = \left(x_1,\dots,x_N\right) \in [\mathbb{R}^d]^N,$ we introduce the scaled norm $\|\cdot\|_N$ defined as
For $\sigma_1,\sigma_2 \in \mathcal{M}_1(U),$ we set the Kullback–Leibler divergence to be
where $\textrm{d}\sigma_1/\textrm{d}\sigma_2$ is the Radon–Nikodym derivative of $\sigma_1$ with respect to $\sigma_2$ .
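For a finite strategy set, the Kullback–Leibler divergence takes a simple discrete form; the following sketch represents measures on U as dicts of masses and assumes absolute continuity of $\sigma_1$ with respect to $\sigma_2$.

```python
# Kullback-Leibler divergence for discrete measures on a finite strategy set U,
# represented as dicts u -> mass; assumes sigma2[u] > 0 wherever sigma1[u] > 0
# (absolute continuity), with the convention 0 * log 0 = 0.

import math

def kl(sigma1, sigma2):
    return sum(p * math.log(p / sigma2[u]) for u, p in sigma1.items() if p > 0)

val = kl({-1: 0.5, +1: 0.5}, {-1: 0.5, +1: 0.5})  # zero for identical measures
```

As expected, the divergence vanishes exactly when the two measures coincide and grows as $\sigma_1$ concentrates away from $\sigma_2$.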
Throughout the paper, we assume $e \in \textrm{Lip}_b(\mathbb{R}^d \times U; \mathbb{R}^d)$ to be fixed and known. Such a function maps each pure strategy $u \in U$ into a given velocity $e(\cdot,u) \in \mathbb{R}^d$ . We may think of e as a label or dictionary for strategies. The sets of admissible interaction rules J are described by
Here, $\mathcal{X}$ consists of all possible payoff functions, modelling the case where each agent has a complete knowledge of both positions and mixed strategies of the other agents. On the other hand, in what we call the undisclosed setting, each agent only knows the physical locations of the others, and thus in this context, the payoff function J will no longer depend on the other’s strategy, making it possible to restrict the analysis to X.
2 From spatially inhomogeneous evolutionary games to undisclosed fast-reaction dynamics
We describe in this section how we adapt the spatially inhomogeneous evolutionary games model (1.3) of [Reference Ambrosio, Fornasier, Morandotti and Savaré6] to make it more suitable in practice for the task of modelling and inferring interaction rules between rational agents. In Section 2.1, we add entropic regularisation to avoid degeneracy of the agents’ mixed strategies. In Section 2.2, we focus on the more realistic undisclosed setting, where agents are not aware of other agents’ strategy choices, and derive a fast-reaction limit, describing the regime where the choice of strategies happens at a much faster time scale than the physical movement of the agents.
2.1 Entropic regularisation for spatially inhomogeneous evolutionary games
In the model (1.3), mixed strategies $\sigma_i$ are prone to converging exponentially to very singular distributions so that agents cannot react quickly to changes in the environment and exhibit strong inertia, contradicting the notion of rational agents. This behaviour can be illustrated with a simple example.
Example 2.1 (Slow reaction in spatially inhomogeneous evolutionary games). Let $d=1$ , $U=\{-1,+1\}$ , $e(x,u)=u$ , so pure strategies correspond to moving left or right with unit speed. Let now $J\!\left(x,u,x^{\prime},u^{\prime}\right)=-x \cdot u$ , that is, agents prefer moving towards the origin, independently of the other agents’ locations and actions. For simplicity, we can set $N=1$ and choose $x(0)=2$ , $\sigma(0)=(0.5,0.5)$ , where we identify measures on $U=\{-1,+1\}$ with vectors of the simplex in $\mathbb{R}^2$ . At $t=0,$ the agent perceives strategy $-1$ as more attractive. The mixed strategy $\sigma$ starts pivoting towards this pure strategy and the agent starts moving towards the origin. The numerically simulated trajectory for the original model (1.3) is shown in Figure 1 (corresponding to line $\varepsilon = 0$ , $\lambda = 1$ ). We can see how $\sigma$ rapidly concentrates on the pure strategy $-1$ . Thus, when the agent reaches the origin, it cannot react quickly and there is considerable overshoot.
As a remedy, we propose an adaptation of the model (1.3) by adding an entropic term to the dynamics of the mixed strategies $\sigma_i$ . The update rule (1.3b) modifies $\sigma_i$ towards maximising the average benefit
To this objective, we will now subtract the entropy $\varepsilon\cdot \int_U \log\left(\frac{\textrm{d} {\sigma_i}}{\textrm{d} {\eta}}\right)\!\textrm{d} \sigma_i$ , where $\eta \in \mathcal{M}_1(U)$ is a suitable ‘uniform’ reference measure and $\varepsilon>0$ is a weighting parameter. This will pull $\sigma_i$ towards $\eta$ and thus prevent exponential degeneration.
Entropic regularisation. The entropic regularisation implies that mixed strategies are no longer general probability measures over U, but they become densities with respect to a reference measure $\eta \in \mathcal{M}_1(U)$ , which we can assume without loss of generality to have full support, that is, $\textrm{spt}(\eta) = U$ . Thus, for a fixed $\varepsilon > 0$ , set
where $0 < a < 1 < b < \infty$ (the bounds a and b are required for technical reasons, see below). For $\lambda > 0$ , the modified entropic dynamics is then given by
where the function $f^J$ now formally needs to be given in terms of densities, instead of measures:
The additional term $f^\varepsilon$ , corresponding to entropy regularisation, is given by:
In (2.2b), we have explicitly added the factor $\lambda \in (0,\infty)$ to control the relative time scale of the dynamics for mixed strategies and locations.
Example 2.2 (Faster reactions with entropic regularisation). We repeat here Example 2.1 with added entropy. We set $\eta=(0.5,0.5)$ , keeping the symmetry between strategies $+1$ and $-1$ in the regularised system. Numerically simulated trajectories for this setup with different choices of $\lambda$ and $\varepsilon$ are shown in Figure 1. For $(\lambda=1,\varepsilon=0.5)$ , the strategy does not concentrate as strongly on the preferred pure strategy as for $(\lambda=1,\varepsilon=0)$ . Therefore, upon crossing the origin the agent can react more quickly, there is less overshoot and the trajectory eventually converges to 0. Finally, for $(\lambda=10,\varepsilon=0.5)$ the agent moves quickly and without overshoot to the origin, quickly adapting the mixed strategy.
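The qualitative behaviour of Example 2.2 can be reproduced with a small numerical sketch: an explicit-Euler replicator step augmented with an entropic term that pulls the strategy back towards $\eta$, consistent with the gradient-flow interpretation of the regularisation (this is an illustrative discretisation under our own choices of step size, not the paper's scheme).

```python
# Entropic replicator dynamics, a sketch of Example 2.2:  U = {-1, +1},
# e(x, u) = u, eta = (0.5, 0.5), payoff J(x, u) = -x * u, time-scale parameter
# lam and entropy weight eps.  Discretisation choices here are illustrative.

import math

def entropic_step(x, sigma, lam, eps, dt, eta=0.5):
    U = (-1, +1)
    payoff = {u: -x * u for u in U}
    logr = {u: math.log(sigma[u] / eta) for u in U}   # log density w.r.t. eta
    mean_pay = sum(sigma[u] * payoff[u] for u in U)
    mean_log = sum(sigma[u] * logr[u] for u in U)
    sigma = {u: sigma[u] * (1 + dt * lam * ((payoff[u] - mean_pay)
                                            - eps * (logr[u] - mean_log)))
             for u in U}
    total = sum(sigma.values())
    sigma = {u: s / total for u, s in sigma.items()}  # renormalise
    v = sum(sigma[u] * u for u in U)                  # spatial velocity
    return x + dt * v, sigma

x, sigma = 2.0, {-1: 0.5, +1: 0.5}
for _ in range(10000):                                # total time t = 10
    x, sigma = entropic_step(x, sigma, lam=10.0, eps=0.5, dt=0.001)
```

With $\lambda=10$ and $\varepsilon=0.5$ the agent approaches the origin essentially without overshoot and the strategy relaxes back towards the symmetric reference measure, matching the behaviour described for Figure 1.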
Remark 2.3 (Entropic gradient flow). Formally, the contribution of $f^\varepsilon$ to the dynamics (2.2b) corresponds to a gradient flow in the Hellinger–Kakutani metric over $\mathcal{S}_{a,b}$ of the (negative) entropy
of the density $\sigma$ with respect to the reference measure $\eta$ .
Remark 2.4 (Multiple agent species). In the interaction models mentioned so far ((1.1), (1.3), (2.2)), all agents are of the same species (i.e. their movement is specified by the same functions). Often one wishes to model the interaction between agents of multiple species (e.g. predators and prey). The latter case can formally be subsumed into the former by expanding the physical space from $\mathbb{R}^d$ to $\mathbb{R}^{1+d}$ and using the first spatial dimension as ‘species label’, that is, an agent $\hat{x}_i=(n_i,x_i) \in \mathbb{R}^{1+d}$ describes an agent of species $n_i \in \mathbb{Z}$ at position $x_i \in \mathbb{R}^d$ . The interaction force $\hat{f} \,:\, \mathbb{R}^{1+d} \times \mathbb{R}^{1+d} \to \mathbb{R}^{1+d}$ in the expanded version of (1.1) is then given by $\hat{f}\big((n,x),(n^{\prime},x^{\prime})\big) \;:=\; (0,f_{n,n^{\prime}}(x,x^{\prime}))$ where $f_{n,n^{\prime}} \,:\, \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ is the interaction function for species $(n,n^{\prime})$ and we set the first component of $\hat{f}$ to zero, such that the species of each agent remains unchanged. In an entirely analogous fashion, functions J and e in models (1.3) and (2.2) can be extended.
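The species-label lifting of Remark 2.4 can be sketched directly: given per-species pairwise forces $f_{n,n'}$, build a single force on the extended space whose first component is zero, so the species label is conserved along the dynamics. The predator-prey forces below are a hypothetical illustration.

```python
# Species-label lifting from Remark 2.4, as a sketch: wrap per-species pairwise
# forces f_{n,n'} into one force on R^{1+d} (here d = 1) whose first component
# is zero, so the integer species label never changes under the dynamics.

def lift(f_by_species):
    """f_by_species: dict (n, n') -> pairwise force on R^d (here d = 1)."""
    def f_hat(agent, other):
        n, x = agent
        m, xp = other
        return (0.0, f_by_species[(n, m)](x, xp))
    return f_hat

# Hypothetical predator-prey pair: prey (species 0) flees predators (species 1),
# predators chase prey, and like species attract each other.
forces = {(0, 0): lambda x, y: y - x, (0, 1): lambda x, y: x - y,
          (1, 0): lambda x, y: y - x, (1, 1): lambda x, y: y - x}
f_hat = lift(forces)
step = f_hat((0, 1.0), (1, 3.0))   # prey at 1.0 reacting to a predator at 3.0
```

Here the prey at position 1.0 is pushed away from the predator at 3.0, while the zero first component guarantees it stays of species 0.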
Well-posedness and mean-field limit of the regularised system. Following [Reference Ambrosio, Fornasier, Morandotti and Savaré6] (see (1.5)), we analyse (2.2) as an interaction system in the Banach space $Y = \mathbb{R}^d \times L^p_\eta(U)$ , $1 \leq p < \infty$ . For $y = (x,\sigma) \in Y$ we set $\|y\|_Y = \|x\| + \|\sigma\|_{L^p_\eta(U)}$ . Define
and for $y, y^{\prime} \in \mathcal{C}_{a,b}$ let us set
where
so that (2.2) takes the equivalent form
Similarly to [Reference Ambrosio, Fornasier, Morandotti and Savaré6], well-posedness of (2.9) relies on the Lipschitz continuity of f and on the following compatibility condition (for the corresponding proofs, see Appendix A, Lemmas A.1, A.2 and A.4).
Lemma 2.5 (Compatibility condition). For $J \in \mathcal{X}$ and $\varepsilon > 0$ , let $f^J$ and $f^\varepsilon$ be defined as in (2.3) and (2.4). Then, there exist a,b with $0 < a < 1 < b < \infty$ such that for any $(x,\sigma),(x^{\prime},\sigma^{\prime}) \in \mathbb{R}^d \times \mathcal{S}_{a,b}$ there exists some $\theta>0$ such that
Intuitively, for specific choices of the bounds a and b, (2.10) states that moving from $\sigma \in \mathcal{S}_{a,b}$ into the direction generated by $f^J$ and $f^\varepsilon$ , we will always remain within $\mathcal{S}_{a,b}$ for some finite time. Eventually, similar to (1.6), a meanfield limit description of (2.9) is formally given by
Like (1.6), this is a PDE whose domain is a Banach space and we refer to [Reference Ambrosio, Fornasier, Morandotti and Savaré6] for the technical details. We can then summarise the main result in the following theorem.
Theorem 2.6 (Well-posedness and mean-field limit of the entropic model). Let $J \in \mathcal{X}$ , $\lambda, \varepsilon > 0$ , $T<+\infty$ . Let $0 < a < 1 < b < \infty$ be chosen in accordance with Lemma 2.5. Then:

1. Given $\bar{\mathbf{y}}^N = \left(\bar{y}_1^N,\dots, \bar{y}_N^N\right) \in \mathcal{C}_{a,b}^N$ , there exists a unique trajectory $\mathbf{y}^N=(y_1^N,\ldots,y_N^N) \colon [0,T] \to \mathcal{C}_{a,b}^N$ of class $C^1$ solving (2.9) with $\mathbf{y}^N(0) = \bar{\mathbf{y}}^N$ . In particular, $\Sigma^N(t) := \tfrac{1}{N} \sum_{i=1}^N \delta_{y_i^N(t)}$ provides a solution of (2.11) for the initial condition $\bar{\Sigma}^N := \tfrac{1}{N} \sum_{i=1}^N \delta_{\bar{y}_i^N}$ .

2. Given $\bar{\Sigma} \in \mathcal{M}_1(\mathcal{C}_{a,b}),$ there exists a unique $\Sigma \in C([0,T]; (\mathcal{M}_1(\mathcal{C}_{a,b}), W_1) )$ satisfying in the weak sense the continuity equation (2.11) with initial condition $\Sigma(0) = \bar{\Sigma}$ .

3. For initial conditions $\bar{\Sigma}_1, \bar{\Sigma}_2 \in \mathcal{M}_1(\mathcal{C}_{a,b})$ and the respective solutions $\Sigma_1$ and $\Sigma_2$ of (2.11) one has the stability estimate
(2.12) \begin{align} W_1(\Sigma_1(t),\Sigma_2(t)) \leq \exp\big(2\textrm{Lip}(f)\,(t-s)\big) \cdot W_1(\Sigma_1(s),\Sigma_2(s)) \end{align} for every $0 \leq s \leq t \leq T$ .
Proof. The theorem follows by invoking Theorem 4.1 from [Reference Ambrosio, Fornasier, Morandotti and Savaré6]. On a more technical level, Theorem 5.3 of [Reference Ambrosio, Fornasier, Morandotti and Savaré6] provides the uniqueness of the solution to (2.11) in an Eulerian sense. We now show that the respective requirements are met.
First, the set $\mathcal{C}_{a,b}$ , (2.6), is a closed convex subset of Y with respect to $\|\cdot\|_Y$ for any $0 < a < b < +\infty$ . Likewise, for any $0 < a < b < +\infty$ the map f driving the evolution (2.9) is Lipschitz continuous: indeed, one can prove that $f^J$ and $f^\varepsilon$ are Lipschitz continuous (see Lemmas A.1 and A.2 in Appendix A) and Lipschitz continuity of $f^e$ follows from the fact that e is Lipschitz continuous and bounded. Furthermore, as a consequence of Lemma 2.5, one can choose a and b so that the extended compatibility condition
holds, cf. [Reference Ambrosio, Fornasier, Morandotti and Savaré6, Theorem B.1]. Therefore, we may invoke [Reference Ambrosio, Fornasier, Morandotti and Savaré6, Theorem 4.1]. Combining this with [Reference Ambrosio, Fornasier, Morandotti and Savaré6, Theorem 5.3], which is applicable since $L^p_\eta(U)$ is separable and thus so is Y, the result follows.
2.2 Undisclosed setting and fast reactions
In the model (2.2), the decision process of each agent potentially involves knowledge of the mixed strategies of the other agents. Often it is reasonable to assume that there is no such knowledge, which can be reflected in the model by assuming that the payoff $J(x,u,x',u')$ does not actually depend on $u'$, the other agent's strategy. Note that $J(x,u,x',u')$ may still depend on $x'$, the other agent's location. We call this the undisclosed setting.

Additionally, it is often plausible to assume that the (regularised) dynamics that govern the mixed strategies $(\sigma_i(t))_{i=1}^{N}$ of the agents run at a much faster time scale than the physical motion of the spatial locations $(x_i(t))_{i=1}^N$ . This corresponds to a large value of $\lambda$ in (2.2). Therefore, for the undisclosed setting we study the fast-reaction limit $\lambda \to \infty$ .

The main results of this section are Theorems 2.13 and 2.14, which establish that the undisclosed fast-reaction limit is in itself a well-defined model (with a consistent mean-field limit as $N \to \infty$ ) and that the undisclosed model converges to this limit as $\lambda \to \infty$ .
Undisclosed setting. In the undisclosed setting, the general formulas (2.2) for the dynamics can be simplified as follows:
where $f^\varepsilon$ is as given in (2.4) and $f^J$ simplifies to
For finite $\lambda < +\infty,$ the dynamics of this model are still covered by Theorem 2.6.
Fast reactions of the agents. Intuitively, as $\lambda \to \infty$ in (2.13), at any given time t the mixed strategies $(\sigma_i(t))_{i=1}^N$ will be in the unique steady state of the dynamics (2.13b) for fixed spatial locations $(x_i(t))_{i=1}^N$ . For given locations $\mathbf{x} = (x_1,\ldots,x_N) \in [\mathbb{R}^d]^N$ this steady state is given by
(This computation is explicitly shown in the proof of Theorem 2.14.) The spatial agent velocities associated with this steady state are given by
and system (2.13) turns into a purely spatial ODE in the form
Unlike in Newtonian models over $\mathbb{R}^d$ , (1.1), here the driving velocity field $v_i^J(x_1(t),\ldots,x_N(t))$ is not a linear superposition of the contributions by each $x_j(t)$ . This non-linearity is a consequence of the fast-reaction limit and allows for additional descriptive power of the model that cannot be captured by the Newton-type model (1.1). This is illustrated in the two subsequent Examples 2.7 and 2.8.
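The displayed formulas for the steady state and its induced velocities are not reproduced in this excerpt, but the construction can be sketched numerically under standard assumptions: on a discretised strategy set $U$, the fast-reaction steady state of agent $i$ is the Gibbs density proportional to $\exp\!\big(\tfrac{1}{\varepsilon N}\sum_j J(x_i,\cdot,x_j)\big)$ with respect to $\eta$, and the associated velocity is its $e$-moment (here $e(x,u)=u$). The quadratic payoff below is a toy choice for illustration only.

```python
import numpy as np

def steady_state_strategy(x_i, xs, J, eps, U, eta_w):
    """Entropic steady state on a discretised strategy set U:
    sigma(u) proportional to exp((1/eps) * (1/N) * sum_j J(x_i, u, x_j)),
    normalised against the reference weights eta_w (discretised eta)."""
    payoff = np.mean([J(x_i, U, x_j) for x_j in xs], axis=0)
    w = np.exp(payoff / eps)
    return w / np.sum(w * eta_w)

# toy setup: 1-d positions, U = velocities in [-1, 1], quadratic payoff
U = np.linspace(-1.0, 1.0, 201)
eta_w = np.full_like(U, (U[-1] - U[0]) / (len(U) - 1))  # uniform eta
J = lambda xi, u, xj: -(u - (xj - xi)) ** 2             # prefer u = x_j - x_i
xs = np.array([0.0, 1.0])

sigma = steady_state_strategy(xs[0], xs, J, eps=0.05, U=U, eta_w=eta_w)
v = np.sum(U * sigma * eta_w)   # e(x,u) = u, so the velocity is the mean strategy
```

With the two targets $0$ and $1$, the averaged payoff peaks at $u = 0.5$, so the agent at $x_1 = 0$ moves with velocity close to $0.5$.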
Example 2.7 (Describing Newtonian models as undisclosed fast-reaction models). Newtonian models can be approximated by undisclosed fast-reaction entropic game models. We give a sketch of this approximation procedure. For a model as in (1.1), choose $U \subset \mathbb{R}^d$ such that it contains the range of f (for simplicity we assume that it is compact), let $e(x,u) \;:=\; u$ and then set
Accordingly, the stationary mixed strategy of agent i is given by
where $\mathcal{N}\left({\cdot}\right)$ denotes the normalisation operator for a density with respect to $\eta$ . Observe now that
while
Hence, since the terms not depending on u are cancelled by the normalisation, we also have
So the maximum of the density $\sigma_{i}^J(\mathbf{x})$ is at $\tfrac{1}{N}\sum_{j=1}^N f(x_i,x_j)$ , which is the velocity imposed by the Newtonian model. For small $\varepsilon$ , $\sigma_{i}^J(\mathbf{x})$ will be increasingly concentrated around this point, and hence, the game model will impose a similar dynamic. Numerically, this is demonstrated in Figure 2 for a one-dimensional Newtonian model as in (1.1) driven by
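The concentration effect can also be checked in a few lines, assuming the stationary strategy takes the Gibbs form $\sigma_i^J \propto \exp\!\big({-}|u-m|^2/\varepsilon\big)$ with $m=\tfrac1N\sum_j f(x_i,x_j)$ (this is the form suggested by the example; the specific numbers are illustrative):

```python
import numpy as np

U = np.linspace(-1.0, 1.0, 2001)
w = np.full_like(U, 2.0 / (len(U) - 1))   # uniform reference measure on U
m = 0.4                                    # Newtonian velocity (1/N) sum_j f(x_i, x_j)

def gibbs_mean_var(eps):
    # stationary strategy for J(x_i,u,x_j) = -|u - f(x_i,x_j)|^2,
    # which concentrates at m as eps -> 0
    p = np.exp(-((U - m) ** 2) / eps)
    p /= np.sum(p * w)
    mean = np.sum(U * p * w)
    var = np.sum((U - mean) ** 2 * p * w)
    return mean, var

(m1, v1), (m2, v2) = gibbs_mean_var(0.5), gibbs_mean_var(0.01)
```

For the smaller $\varepsilon$ the mean strategy is closer to the Newtonian velocity $m$ and the variance shrinks, i.e. the strategy concentrates.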
Example 2.8 (Undisclosed fast-reaction models are strictly richer than Newtonian models). In the previous example, each agent j tried to persuade agent i to move with velocity $f(x_i,x_j)$ . Deviations from this velocity were penalised in the payoff function with a quadratic function. By minimising the sum of these functions, the compromise of the linear average of all $f(x_i,x_j)$ then yields the best payoff. By picking different penalties we may obtain other interesting behaviour that cannot be described by Newtonian systems.
As above, let U be a (sufficiently large) compact subset of $\mathbb{R}^d$ , $e(x,u) \;:=\; u$ . Let $g \,:\, \mathbb{R}^d \to \mathbb{R}$ be a Lipschitz non-negative ‘bump function’ with compact support, that is, g is maximal at 0 with $g(0)=1$ , and $g(u)=0$ for $\|u\| \geq \delta$ for some small $\delta >0$ . Then we set
where $C>0$ is a suitable, sufficiently large constant. Then, approximately, the function $u \mapsto \frac{1}{N}\sum_j J(x_i,u,x_j)$ is maximal at $u=\frac{x_j-x_i}{\|x_j-x_i\|}$ where $j$ is the agent closest to $i$ (provided it is closer than $C$ and not closer than $\delta$ ; in particular, the last term in $J$ is added to avoid that $u=0$ is the maximiser). Therefore, approximately, we have created a model where agents are attracted with unit speed by their nearest neighbours, but not by other agents. Such a non-linear trade-off between the influences of the other agents is not obtainable by a Newtonian model.
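The displayed formula for $J$ is not reproduced in this excerpt; the sketch below uses a hypothetical payoff of the announced type (a bump around the unit direction towards $x_j$, weighted by closeness up to an interaction radius $C$; the small term avoiding $u=0$ is omitted for simplicity) and verifies that the summed payoff is maximised at the direction of the nearest neighbour rather than at a weighted average:

```python
import numpy as np

def bump(z, delta=0.2):
    # Lipschitz bump: 1 at 0, vanishing for |z| >= delta
    return np.maximum(0.0, 1.0 - np.abs(z) / delta)

def payoff(xi, u, xj, C=5.0):
    # hypothetical payoff: reward moving with unit speed towards x_j,
    # weighted by closeness (capped at interaction radius C)
    d = abs(xj - xi)
    if d == 0:
        return np.zeros_like(u)
    direction = (xj - xi) / d
    return max(C - d, 0.0) * bump(u - direction)

U = np.linspace(-1.0, 1.0, 401)          # 1-d strategy set of velocities
xi, others = 0.0, [0.6, -2.0]            # nearest neighbour sits on the right
total = sum(payoff(xi, U, xj) for xj in others)
u_star = U[np.argmax(total)]             # maximiser of the averaged payoff
```

Since the bumps around the two unit directions $\pm 1$ do not overlap, the maximiser is the direction of the nearest neighbour ($u^\ast = 1$), not a convex combination of the two influences.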
Remark 2.9 (Fast-reaction limit in the fully general case). For the fully general setting (i.e. $J$ being also a function of $u'$), the study of the fast-reaction regime is a much harder problem, as the stationary state of mixed strategies (for fixed spatial locations) often depends on the initial mixed strategies $(\sigma_i)_{i=1}^N$ , and thus the fast-reaction limit is a ‘genuine’ quasi-static evolution problem. We refer to [Reference Agostiniani and Rossi2, Reference Scilla and Solombrino66, Reference Scilla and Solombrino67] for derivation and well-posedness results of quasi-static evolutions of critical points of non-convex energies on Euclidean space by means of vanishing-viscosity methods. A similar analysis is in the course of development for non-convex energies on Hilbert spaces [Reference Agostiniani, Rossi and Savaré1]. Data-driven evolutions of critical points have been considered in [Reference Almi, Fornasier and Huber4].
Well-posedness and mean-field limit of the undisclosed fast-reaction system. Similar to the full model (2.2), we are also interested in a mean-field limit of the undisclosed fast-reaction model as the number of agents tends to infinity, $N \to \infty$ . A formal limiting procedure leads to the equation
where for $x \in \mathbb{R}^d$ and $\nu \in \mathcal{M}_1(\mathbb{R}^d)$ we set
Given $\mathbf{x} = \left(x_1,\dots,x_N\right) \in [\mathbb{R}^d]^N$ and setting $\mu^N = \frac1N \sum_{j=1}^N \delta_{x_j}$ , (2.15), (2.18) and (2.19) are related through
The key ingredient for the study of (2.15) and its limiting behaviour is to establish Lipschitz continuity of the fast-reaction equilibrium strategies and velocities with respect to the payoff function and the particle locations.
Lemma 2.10. Let $J, J^{\prime} \in X$ and consider $M > 0$ such that $M \geq \|J\|_{\infty} + \textrm{Lip}(J)$ and $M \geq \|J^{\prime}\|_{\infty} + \textrm{Lip}(J^{\prime})$ . Let $\mu, \mu^{\prime} \in \mathcal{M}_1(\mathbb{R}^d)$ and $x, x^{\prime} \in \mathbb{R}^d$ . Then:

1. There exists $C = C(M, \varepsilon)$ such that, for every $u \in U$ ,
(2.20) \begin{align} \left| \sigma^{J}(\mu)(x,u) - \sigma^{J^{\prime}}(\mu^{\prime})(x^{\prime},u) \right| &\leq C\left( W_1(\mu,\mu^{\prime}) + \|J - J^{\prime}\|_\infty + \|x-x^{\prime}\|\right) \end{align}and $1/C < \sigma^{J}(\mu)(x,u) < C$ . 
2. There exists $C = C(M, e, \varepsilon)$ such that
(2.21) \begin{align} \left\| v^{J}(\mu)(x) - v^{J^{\prime}}(\mu^{\prime})(x^{\prime}) \right\| &\leq C\!\left(W_1(\mu,\mu^{\prime}) + \|J - J^{\prime}\|_\infty + \|x-x^{\prime}\|\right). \end{align}
Proof. Let us define two continuous functions $g, g^{\prime} \colon U \to \mathbb{R}$ as
For every $u \in U$ , using $M \geq \|J\|_{\infty}$ and $M \geq \|J^{\prime}\|_{\infty}$ , we immediately obtain the global bounds $-M \leq g \leq M$ and $-M \leq g^{\prime} \leq M$ . Using the triangle inequality and the dual definition of $W_1$ in (1.9), we also estimate
for $C = C(M)$ . Recall now that
Thus, (2.20) follows by combining the global boundedness of $g, g^{\prime}$ with (2.22); the uniform bounds $1/C < \sigma^{J}(\mu)(x,u) < C$ are obtained in the same way, while (2.21) follows from (2.20) combined with $e \in \textrm{Lip}_b(\mathbb{R}^d \times U; \mathbb{R}^d)$ .
For the discrete fast-reaction system, the following lemma adapts the estimate (2.21) into one in terms of discrete particle locations and their velocities.
Lemma 2.11 (Map to undisclosed fastreaction agent velocity is Lipschitz continuous). For any $N > 0$ , consider the map
associated with the fast-reaction ODE (2.15). Then, $\mathbf{v}^J$ is Lipschitz continuous with respect to the distance induced by $\|\cdot\|_N$ , with Lipschitz constant $L = L(J, e, \varepsilon)$ .
Proof. Fix any $\mathbf{x},\mathbf{x}^{\prime} \in \left[\mathbb{R}^d\right]^N$ and define $\mu = \frac{1}{N} \sum_{j=1}^N \delta_{x_j}$ and $\mu^{\prime} = \frac{1}{N} \sum_{j=1}^N \delta_{x^{\prime}_j}$ . Then, by Lemma 2.10, with $J^{\prime} = J$ , $x = x_i$ and $x^{\prime} = x^{\prime}_i$ , we have
for every $i = 1,\dots, N$ , with $C = C(J,e,\varepsilon)$ . Using the definition of $W_1$ in (1.8), we observe
so that
which is the sought-after estimate.
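The inequality used here, namely that the labelled distance $\|\cdot\|_N$ between point clouds dominates the $W_1$ distance between the associated empirical measures, can be checked numerically in one dimension, where $W_1$ between equal-size empirical measures is attained by the sorted (monotone) coupling:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
x = rng.normal(size=N)
xp = x + rng.normal(scale=0.1, size=N)

# ||x - x'||_N : normalised distance between the labelled point clouds
norm_N = np.mean(np.abs(x - xp))

# W1 between the empirical measures: in 1-d the optimal coupling sorts both clouds
w1 = np.mean(np.abs(np.sort(x) - np.sort(xp)))

assert w1 <= norm_N + 1e-12   # the labelled coupling is feasible, hence suboptimal
```

The identity coupling (matching $x_i$ to $x'_i$) is one admissible transport plan, so its cost $\|\mathbf{x}-\mathbf{x}'\|_N$ can only exceed the optimal cost $W_1$.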
After clarifying what we mean by a solution of (2.17), we summarise the mean-field result in Theorem 2.13, whose proof builds on Lipschitz continuity of the velocity field in the discrete system and follows by fairly standard arguments (see, e.g. [Reference Cañizo, Carrillo and Rosado12, Reference Carrillo, Choi and Hauray14]).
Definition 2.12. We say a curve $\mu \in C([0,T]; (\mathcal{M}_1(\mathbb{R}^d), W_1) )$ solves (2.17) if $\mu(t)$ has uniformly compact support for $t \in [0,T]$ and
Theorem 2.13 (Well-posedness and mean-field limit of the undisclosed fast-reaction model). Let $J \in X$ , $\varepsilon > 0$ , $0 < \bar{R} < +\infty$ , and $0 < T < +\infty$ . Define $R = \bar{R} + T \cdot \|e\|_\infty$ . Then:

1. Given $\bar{\mathbf{x}}^N = \left(\bar{x}_1^N,\dots, \bar{x}_N^N\right) \in [B^d({\bar{R}})]^N$ there exists a unique trajectory $\mathbf{x}^N=\left(x_1^N,\ldots,x_N^N\right) \colon [0,T] \to [B^d({R})]^N$ of class $C^1$ solving (2.15c) with $\mathbf{x}^N(0) = \bar{\mathbf{x}}^N$ . In particular, $\mu^N(t) \;:=\; \tfrac{1}{N} \sum_{i=1}^N \delta_{x_i^N(t)}$ provides a solution of (2.17) for initial condition $\bar{\mu}^N \;:=\; \tfrac{1}{N} \sum_{i=1}^N \delta_{\bar{x}_i^N}$ .

2. Given $\bar{\mu} \in \mathcal{M}_1(B^d({\bar{R}}))$ there exists a unique $\mu \in C([0,T]; (\mathcal{M}_1(B^d({R})), W_1) )$ satisfying in the weak sense the continuity equation (2.17) with initial condition $\mu(0)=\bar{\mu}$ .

3. For initial conditions $\bar{\mu}_1, \bar{\mu}_2 \in \mathcal{M}_1(B^d({\bar{R}}))$ and the respective solutions $\mu_1$ and $\mu_2$ of (2.17) one has the stability estimate
(2.23) \begin{align} W_1(\mu_1(t),\mu_2(t)) \leq \exp(C\,(t-s)) \cdot W_1(\mu_1(s),\mu_2(s)) \end{align}for every $0 \leq s \leq t \leq T$ , with $C = C(J, e, \varepsilon, \bar{R}, T)$ .
Proof outline. The proof is rather standard and builds on the Lipschitz continuity of the finite-agent system (2.15). We highlight the main steps without in-depth details and refer to Proposition 2.1 and Theorem 2.4 of [Reference Bongini, Fornasier, Hansen and Maggioni10], and references therein, for further information.
Part 1: finite-agent setting. For given $\bar{\mathbf{x}}^N = (\bar{x}_{1}^N,\dots, \bar{x}_{N}^N) \in [B^d({\bar{R}})]^N$ , using the Lipschitz continuity provided by Lemma 2.11, there exists a unique curve $\mathbf{x}^N \colon [0,T] \to [\mathbb{R}^d]^N$ of class $C^1$ solving (2.15) with $\mathbf{x}^N(0) = \bar{\mathbf{x}}^N$ . A direct estimate provides $\|\partial_t x_i^N(t)\| \leq \|e\|_\infty$ , so that $\|x_i^N(t)\| \leq \bar{R} + T \cdot \|e\|_\infty$ for every $t \in [0,T]$ , $i = 1,\dots,N$ . Furthermore, setting $\mu^N(t) \;:=\; \tfrac{1}{N} \sum_{i=1}^N \delta_{x^N_i(t)}$ one can explicitly verify that $\mu^N$ solves (2.17) with initial condition $\bar{\mu}^N \;:=\; \tfrac{1}{N} \sum_{i=1}^N \delta_{\bar{x}^N_i}$ . This establishes point 1.
Part 2: mean-field solution as limit of finite-agent solutions. Fix $\bar{\mu} \in \mathcal{M}_1(B^d({\bar{R}}))$ . For $N > 0$ , let $(\bar{\mu}^N)_N \subset \mathcal{M}_1(B^d({\bar{R}}))$ be a sequence of empirical measures such that $W_1(\bar{\mu}^N,\bar{\mu}) \to 0$ as $N \to \infty$ and let $(\mu^N)_N$ be the respective sequence of solutions of (2.17) with initial conditions $\bar{\mu}^N$ . This sequence of curves $(\mu^N)_{N \in \mathbb{N}} \subset C([0,T]; (\mathcal{M}_1(B^d({R})), W_1))$ is equicontinuous and equibounded. Therefore, an application of the Ascoli–Arzelà theorem provides a cluster point $\mu \in C([0,T]; (\mathcal{M}_1(B^d({R})), W_1))$ such that, up to a subsequence, we have
Invoking Lemma 2.10 for $x=x^{\prime}$ and $J=J^{\prime}$ , this implies $\| v^J\!\left(\mu^N(t)\right)(x) - v^J(\mu(t))(x)\| \to 0$ uniformly in $t \in [0,T]$ , $x \in B^d({R})$ as $N \to \infty$ . Consequently, the cluster point $\mu$ is a solution to (2.17) with $\bar{\mu}$ as initial condition. This establishes the existence part of point 2.
Part 3: stability estimates. For fixed N, a stability estimate of the form $\|\mathbf{x}^N_1(t)-\mathbf{x}^N_2(t)\|_N \leq \exp(C\,(t-s)) \cdot \|\mathbf{x}^N_1(s)-\mathbf{x}^N_2(s)\|_N$ for solutions to the discrete system (2.15) follows quickly from Grönwall’s lemma. Since the $\|\cdot\|_N$ norm between two point clouds provides an upper bound for the $W_1$ distance between the respective empirical measures, this provides point 3 for empirical measures. The extension to arbitrary measures can be done as in the proof of Theorem 2.4 in [Reference Bongini, Fornasier, Hansen and Maggioni10]. This then provides uniqueness of the solution $\mu$ to (2.17) for initial condition $\bar{\mu} \in \mathcal{M}_1(B^d({\bar{R}}))$ , completing point 2.
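Part 1 of the proof can be sketched in code: explicit Euler for the fast-reaction system with $e(x,u)=u$ and $U \subset [-1,1]$ (so that $\|e\|_\infty \le 1$), confirming the a priori bound $\|x_i^N(t)\| \leq \bar R + T\,\|e\|_\infty$. The quadratic payoff and the discretisation of $U$ are illustrative choices, not the paper's specific model.

```python
import numpy as np

def velocities(xs, J, eps, U, w):
    # fast-reaction velocities: v_i is the mean of u under the Gibbs strategy of agent i
    vs = []
    for xi in xs:
        payoff = np.mean([J(xi, U, xj) for xj in xs], axis=0)
        p = np.exp(payoff / eps)
        p /= np.sum(p * w)
        vs.append(np.sum(U * p * w))
    return np.array(vs)

U = np.linspace(-1.0, 1.0, 101)
w = np.full_like(U, 2.0 / (len(U) - 1))
J = lambda xi, u, xj: -(u - np.clip(xj - xi, -1, 1)) ** 2   # attraction, clipped to U
xs = np.array([-0.5, 0.0, 0.8])
T, dt = 1.0, 0.01
R_bar = np.max(np.abs(xs))
for _ in range(int(T / dt)):          # explicit Euler for dx/dt = v^J(x)
    xs = xs + dt * velocities(xs, J, eps=0.1, U=U, w=w)
# a priori bound: speeds are bounded by sup|e| = 1, so |x_i(T)| <= R_bar + T
assert np.max(np.abs(xs)) <= R_bar + T + 1e-9
```

Since each $v_i$ is an average of points of $U \subset [-1,1]$, every Euler step moves a particle by at most $dt$, which reproduces the growth bound used in the proof.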
Finally, we establish convergence of (2.13) to (2.15) (and of their respective mean-field versions) as $\lambda \to \infty$ . The proof is given in Section A.2 of the Appendix; a simple numerical example is shown in Figure 3.
Theorem 2.14 (Convergence to the fast-reaction limit in the undisclosed setting as $\lambda \to \infty$ ).

1. Discrete setting: For initial positions $\mathbf{x}(0)=(x_1(0),\ldots,x_N(0)) \in [B^d({\bar{R}})]^N$ and initial mixed strategies $\boldsymbol{\sigma}(0)=(\sigma_1(0),\ldots,\sigma_N(0)) \in \mathcal{S}_{a,b}^N$ let $\mathbf{x}(t) = (x_1(t),\ldots,x_N(t))$ and $\boldsymbol{\sigma}(t)=(\sigma_1(t),\ldots,\sigma_N(t))$ be the solution to the undisclosed model (2.13). For the same initial positions, let $\mathbf{x}^{\ast \ast}(t)=(x^{\ast \ast}_1(t),\ldots,x^{\ast \ast}_N(t))$ be the solution to the undisclosed fast-reaction model (2.15) and let $\boldsymbol{\sigma}^{\ast \ast}(t)=(\sigma_1^{\ast \ast}(t),\ldots,\sigma_N^{\ast \ast}(t))$ with $\sigma_i^{\ast \ast}(t)=\sigma^J_{i}(\mathbf{x}^{\ast \ast}(t))$ be the corresponding fast-reaction strategies. Then, there exists $C = C(a,b,\bar{R},J,e)$ (independent of N and i) such that for all $t \in [0,\infty)$ ,
(2.24) \begin{align} \|\mathbf{x}(t)-\mathbf{x}^{\ast \ast}(t)\|_N \leq \frac{C}{\sqrt{\lambda}} \cdot \exp(t \cdot C) \end{align}and(2.25) \begin{align} \|\sigma_i(t)-\sigma_i^{\ast\ast}(t)\|_{L^2_\eta(U)} \leq \frac{C}{\sqrt{\lambda}} \cdot \exp(t \cdot C) + \left[C \left[\dfrac{1}{\lambda}+ \exp\left(-\dfrac{\lambda\,t}{C}\right)\right]\right]^{1/2}. \end{align}So on any compact time horizon [0, T], we find that $\|\mathbf{x}(t) - \mathbf{x}^{\ast \ast}(t)\|_N \to 0$ uniformly in time as $\lambda \to \infty$ . $\boldsymbol{\sigma}$ converges to $\boldsymbol{\sigma}^{\ast \ast}$ uniformly in time on the interval $[\tau,T]$ for any $\tau>0$ . Near $t=0$ we cannot expect uniform convergence of $\boldsymbol{\sigma}$ , since by initialisation it can start at a finite distance from $\boldsymbol{\sigma}^{\ast \ast}(0)$ , and thus it takes a brief moment to relax to the vicinity of the fast-reaction state. 
2. Mean-field setting: For an initial configuration $\bar{\Sigma} \in \mathcal{M}_1(\mathcal{C}_{a,b})$ , let $\Sigma$ be the solution to the entropic mean-field model (2.11) for an undisclosed $J$ with $\lambda \in (0,\infty)$ . Let $P \,:\, \mathcal{C}_{a,b} \to \mathbb{R}^d$ , $(x, \sigma) \mapsto x$ , be the projection from $\mathcal{C}_{a,b}$ to the spatial component. Set $\bar{\mu} \;:=\; P_\sharp \bar{\Sigma}$ and let $\mu$ be the solution to the fast-reaction mean-field model (2.17) with initial condition $\mu(0)=\bar{\mu}$ .
Then for the same C as in (2.24), one has
(2.26) \begin{align} W_1(\mu(t),P_\sharp \Sigma(t)) \leq \frac{C}{\sqrt{\lambda}} \cdot \exp(t \cdot C). \end{align}
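The mechanism behind Theorem 2.14 can be illustrated on a generic two-timescale toy system (this is not the entropic strategy dynamics itself, only a caricature of the singular perturbation structure): a fast variable relaxes at rate $\lambda$ towards an equilibrium map $s^*(x)$, and the trajectory approaches that of the reduced equation $\dot x = s^*(x)$ as $\lambda \to \infty$.

```python
import numpy as np

def simulate(lam, T=1.0, dt=1e-3):
    # toy two-timescale system: x' = s, s' = lam * (s_star(x) - s),
    # with equilibrium map s_star(x) = -x (relaxation towards the origin)
    x, s = 1.0, 0.5          # s starts away from its equilibrium s_star(1) = -1
    for _ in range(int(T / dt)):
        x, s = x + dt * s, s + dt * lam * (-x - s)
    return x

def fast_reaction(T=1.0, dt=1e-3):
    # reduced dynamics: x' = s_star(x) = -x
    x = 1.0
    for _ in range(int(T / dt)):
        x = x + dt * (-x)
    return x

errs = [abs(simulate(lam) - fast_reaction()) for lam in (10.0, 100.0)]
```

As in (2.24), the discrepancy at a fixed time shrinks as the reaction rate $\lambda$ grows, while near $t=0$ the fast variable is still relaxing from its (inconsistent) initial value, mirroring the loss of uniform convergence of $\boldsymbol{\sigma}$ at $t=0$.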
Remark 2.15. The statement about the mean-field equations can be expanded further: From the fast-reaction solution $\mu$ on $\mathcal{M}_1(\mathbb{R}^d)$ , one can construct a ‘lifted trajectory’ $\hat{\Sigma}$ on $\mathcal{M}_1(\mathcal{C}_{a,b})$ , intuitively by attaching to each mass particle in $\mu$ at $x$ its corresponding fast-reaction mixed strategy $\sigma^J(\mu)(x,\cdot)$ , (2.19). Using the bounds (2.24) and (2.25), the continuity properties of $\sigma^J(\mu)(x,\cdot)$ (see Lemma 2.10) and arguing as in the proof of (2.26) one can then establish a $W_1$ bound between $\Sigma$ and $\hat{\Sigma}$ .
3 Inference for entropic evolutionary games
After having introduced a new class of models for interacting agents, we now turn to the question of how the payoff function $J$ that parametrises a model can be inferred from data. Some remarks are in order before we proceed.
3.1 Discussion
Our motivation is as follows: we observe a set of rational agents, for example, pedestrians in a confined space, and want to learn about their interaction rules. As argued above (e.g. in Examples 2.7 and 2.8), undisclosed fast-reaction entropic games can be used to parametrise a rich class of interaction behaviours, in particular subsuming Newtonian models. We may therefore hope that, by feeding our observations into an inference functional for $J$ and analysing and interpreting the resulting payoff function, we can learn something about the interaction rules of the agents. We focus on the inference of $J$ and assume that $U$ , $e$ and $\varepsilon$ are known. We will demonstrate in the numerical examples (Section 4) that it is usually possible to make plausible choices for $U$ , $e$ and that qualitatively equivalent choices yield qualitatively equivalent results for the inferred $J$ . One may also assume that $\varepsilon=1$ , since rescaling $\varepsilon$ can be compensated by rescaling $J$ in the same way, see (2.15).
In Section 3.2, we discuss a differential inference functional, proposed in analogy to (1.7), for the full entropic game model, that is, without the undisclosed assumption and not in the fastreaction limit. For this we assume that we are able to observe the full dynamics, that is, physical locations and velocities of the agents, as well as their mixed strategies and temporal derivatives, $(x_i,\partial_t x_i,\sigma_i,\partial_t \sigma_i)_{i=1}^N$ and propose a functional that seeks to infer J by comparing its predictions with the observed $\partial_t \sigma_i$ .
In many cases, it may not be possible to observe the mixed strategies $\sigma_i$ , let alone their temporal derivatives $\partial_t \sigma_i$ , since these may correspond to internal states of the agents that we, as external observers, cannot perceive. In Section 3.3, we turn to the undisclosed fast-reaction setting, where physical velocities and mixed strategies are functions of the current physical locations, and show that there one can still perform inference with limited observations. In Section 3.3.1, we propose an inference functional, again in analogy to (1.7), that only requires the observation of the physical locations $x_i$ and velocities $\partial_t x_i$ . Although it is non-convex, we observe in our numerical examples that it provides plausible payoff functions $J$ , indicating that non-convexity does not seem to be a practical issue.
In Section 3.3.2, we propose a convex inference functional that requires knowledge of the mixed strategies $\sigma_i$ , but not of their temporal derivatives. This may be practical in two cases: In Section 4.1, we propose a scheme to provide such data from observed physical velocities, intuitively by ‘inverting’ the approximation scheme of Example 2.7. Also, mixed strategies could represent a stochastic movement of agents and the ‘smooth’ physical velocity $\partial_t x_i$ is merely the result of an averaging over a smaller time scale. Should we indeed be able to observe the fluctuating agent dynamics at small time scales, this could be used to approximate the mixed strategies. This was one of our motivations for providing Example 4.3.
For both functionals, there are natural candidates for the mean-field limit. We analyse these two functionals in more detail in Section 3.3.3, providing an estimate of the approximation quality (Theorem 3.3), as well as existence of minimisers and consistency in the mean-field limit (Theorem 3.6).
Now, we formalise our notion of admissible observations for inference and a notion of consistency of observations in the meanfield limit.
Assumption 3.1 (Admissible observations).

1. Discrete observation: For a fixed number of agents N, a time horizon $T \in (0,\infty)$ , and some radius $R \in (0,\infty)$ we observe the agents’ physical paths $\textbf{x}^N \;:=\; \left(x^N_1,\ldots,x^N_N\right) \in C^1([0,T],B^d(R))^N$ with velocities $\textbf{v}^N \;:=\; \left(v^N_1,\ldots,v^N_N\right)$ , $v^N_i = \partial_t x^N_i$ .
Optionally, in addition we may also observe the agents’ mixed strategies $\boldsymbol{\sigma}^N \;:=\; \left(\sigma^N_1,\ldots,\sigma^N_N\right) \in C([0,T],\mathcal{S}_{a,b})^N$ for some bounds $0<a\leq b<\infty$ in the definition of $\mathcal{S}_{a,b}$ , see (2.1). The mixed strategies are consistent with the observed velocities, that is,
(3.1) \begin{equation}v_i^N(t) = \int_U e\left(x_i^N(t),u\right)\sigma_i^N(t)(u)\,\textrm{d} \eta(u).\end{equation}Note: The assumption that the agent’s velocity is exactly consistent with their mixed strategy may seem unrealistic, as the velocity may be subject to other external influences and noise. However, we will subsequently often assume that the mixed strategies are not observed directly, but are only inferred from the velocities. In this case, satisfying assumption (3.1) is quite natural. In other cases, assumption (3.1) will be used in Theorem 3.3 to bound the error on trajectories based on the error on strategies. If (3.1) only holds approximately, an additional corresponding error term will appear there. 
2. Consistent mean-field behaviour: For fixed T, R, for an increasing sequence of N we make observations as specified in part 1, and there is a limit observation $t \mapsto \mu^\infty(t) \in \mathcal{M}_1(B^d(R))$ with a velocity field $t \mapsto v^\infty(t)$ such that
(3.2) \begin{align} W_1\!\left(\mu^N(t),\mu^\infty(t)\right) \rightarrow 0 \quad \text{uniformly in $t \in [0,T]$ where}\quad \mu^N(t) \;:=\; \frac1N \sum_{i=1}^N \delta_{x^N_i(t)},\end{align}(3.3) \begin{align} \int_0^T \frac{1}{N} \sum_{i=1}^N \left \langle \phi\left(t,x^N_i(t)\right), v^N_i(t) \right \rangle \,\textrm{d} t \rightarrow \int_0^T \int_{\mathbb{R}^d} \left \langle \phi(t,x) , v^\infty(t)(x) \right \rangle \,\textrm{d} \mu^\infty(t)(x)\,\textrm{d} t \end{align}for all $\phi \in \left[C\!\left([0,T] \times \mathbb{R}^d\right)\right]^d$ , and(3.4) \begin{align} \int_0^T \frac{1}{N} \sum_{i=1}^N \|v^N_i(t)\|^2 \,\textrm{d} t \rightarrow \int_0^T \int_{\mathbb{R}^d} \|v^\infty(t)(x)\|^2 \,\textrm{d} \mu^\infty(t)(x)\,\textrm{d} t<\infty.\end{align}(3.2) implies that physical locations are consistent, (3.3) implies that observed velocities converge in a weak sense, and (3.4) implies that they are consistent. Intuitively, for (3.4) to hold, mass particles that converge to the same limit location x as $N \to \infty$ need to converge to the same velocity, otherwise Jensen’s strict inequality contradicts (3.4). If we also observe mixed strategies in part 1, then the bounds $0<a \leq b<\infty$ are uniform in N and there also is a mixed strategy mean-field $(t,x) \mapsto \sigma^\infty(t)(x) \in \mathcal{S}_{a,b}$ such that
for all $\phi \in \left[C\!\left([0,T] \times \mathbb{R}^d \times U\right)\right]^d$ and
In particular, observations are admissible when they were generated by an entropic game model with some ground truth payoff J.
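The consistency relation (3.1) amounts to a simple quadrature. A sketch with $e(x,u)=u$, a uniform reference measure $\eta$ on a discretised $U$, and an illustrative Gaussian-shaped mixed strategy:

```python
import numpy as np

# consistency check (3.1): the observed velocity equals the e-moment of the
# mixed strategy; here e(x,u) = u and sigma is a density normalised against eta
U = np.linspace(-1.0, 1.0, 201)
w = np.full_like(U, 2.0 / (len(U) - 1))          # uniform reference measure eta
sigma = np.exp(-((U - 0.3) ** 2) / 0.02)
sigma /= np.sum(sigma * w)                        # normalise against eta
v = np.sum(U * sigma * w)                         # v = int_U e(x,u) sigma(u) d eta(u)
```

For this strategy, sharply concentrated around $u = 0.3$, the induced velocity is (up to quadrature error) $0.3$.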
Lemma 3.2. Let $J \in X$ , $T > 0$ , $\bar{R}>0$ , $N \in \mathbb{N}$ .

1. For initial locations $\bar{\textbf{x}}^N=\left(\bar{x}^N_1,\ldots,\bar{x}^N_N\right) \in [B^d(\bar{R})]^N$ the induced solution to (2.15) with corresponding velocities and mixed strategies provides an admissible discrete observation in the sense of Assumption 3.1, part 1, for $R=\bar{R}+\|e\|_\infty \cdot T$ .

2. Let $\bar{\mu} \in \mathcal{M}_1(B^d(\bar{R}))$ and consider a sequence of empirical measures $(\bar{\mu}^N)_N$ in $\mathcal{M}_1(B^d(\bar{R}))$ of the form
\begin{equation*} \bar{\mu}^N = \frac1N \sum_{i=1}^N \delta_{\bar{x}_{i}^N}, \quad \bar{x}_{i}^N \in B^d(\bar{R}), \quad \text{such that } \lim_{N\to \infty}W_1\!\left(\bar{\mu}^N, \bar{\mu}\right) = 0. \end{equation*}Then, the solutions to (2.15) with initial positions $\left(\bar{x}_1^N, \dots, \bar{x}_N^N\right)$ are a suitable sequence of discrete observations in the sense of Assumption 3.1, part 2, and the solution to (2.17) provides a corresponding limit observation.
The proof is quite straightforward and provided in Section A.3.
In the following, we assume that observations are admissible in the sense of Assumption 3.1.
3.2 Differential inference functional
In this section, we discuss an inference functional for payoff function J, in close analogy to (1.7). For now, we assume that in addition to $\left(\textbf{x}^N, \textbf{v}^N, \boldsymbol{\sigma}^N\right)$ we can even observe $\partial_t \sigma_i^N$ for all $i \in \{1,\ldots,N\}$ . Our differential inference functional is therefore aimed at recovering J, by comparing the observed $\partial_t \sigma_i^N$ with the predictions by the model (2.2b). In (1.7), the discrepancy between observed $v_i^N$ and predicted velocities is penalised by the squared Euclidean distance. As metric to compare the observed $\partial_t \sigma_i^N$ and the prediction by (2.2b), we choose the (weak) Riemannian tensor of the Hellinger–Kakutani and Fisher–Rao metrics. That is, we set for a base point $\sigma \in \mathcal{S}_{a,b}$ , and two tangent directions $\delta \mu, \delta \nu \in L^p_{\eta}(U)$ attached to it,
A potential inference functional for J could then be:
Minimisation should be restricted to a sufficiently regular and compact class of $J$ . Since $d_\sigma(\cdot,\cdot)^2$ is quadratic in its arguments and $f^{{J}}$ is linear in $J$ , the mismatch term is again quadratic in $J$ ; thus, restricted to a suitable class, this is a convex optimisation problem.
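The displayed formula for this tangent-space metric is not reproduced in this excerpt; assuming the standard weak Riemannian form $d_\sigma(\delta\mu,\delta\nu) = \int_U \frac{\delta\mu\,\delta\nu}{\sigma}\,\textrm{d}\eta$, a discrete sketch reads:

```python
import numpy as np

U = np.linspace(-1.0, 1.0, 201)
w = np.full_like(U, 2.0 / (len(U) - 1))   # uniform reference measure eta

def fr_inner(sigma, dmu, dnu):
    # Fisher-Rao / Hellinger tangent inner product at base point sigma
    # (assumed standard form): <dmu, dnu>_sigma = int_U (dmu * dnu / sigma) d eta
    return np.sum(dmu * dnu / sigma * w)

sigma = np.exp(-U ** 2)
sigma /= np.sum(sigma * w)                        # base mixed strategy
# an illustrative tangent perturbation with zero total mass
dmu = U * sigma - np.sum(U * sigma * w) * sigma
d2 = fr_inner(sigma, dmu, dmu)                    # squared metric d_sigma(dmu, dmu)^2
```

The inner product is positive definite on mass-preserving perturbations, which is what makes the resulting mismatch term a genuine squared distance.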
In Section 4.3, we provide a simple example where we simulate trajectories $(\textbf{x}^N,\boldsymbol{\sigma}^N)$ according to the dynamics (2.2), based on a given $J$ , and then obtain a corresponding minimiser $\hat{J}$ of (3.8). We then show that trajectories simulated with $J$ and $\hat{J}$ are close even when the initial configurations are not drawn from the training data used to infer $\hat{J}$ . Indeed, upon selecting a suitable functional framework, one can in principle extend the analysis in [Reference Bongini, Fornasier, Hansen and Maggioni10] (and the analysis presented here below) to prove existence of minimisers, limiting behaviour as $N \to \infty$ and the ability of the inferred model to generalise.
However, application to real data seems challenging, as one usually only observes $\left(\textbf{x}^N, \textbf{v}^N\right)$ but not $\boldsymbol{\sigma}^N$ , let alone variations $\partial_t \sigma_i^N$ . To address this, we propose in the next section two new inference functionals for the undisclosed fast-reaction setting.
3.3 Undisclosed fast-reaction inference functionals
3.3.1 Penalty on velocities
The inference situation is simplified in the undisclosed fast-reaction setting since now, for fixed payoff function J, mixed strategies and physical velocities are a direct function of physical locations. We can therefore attempt to minimise the discrepancy between observed physical velocities and those predicted by J.
For a discrete admissible observation $\left(\textbf{x}^N,\textbf{v}^N\right)$ , as in Assumption 3.1, we define the inference functional on hypothetical payoff functions ${J} \in X$
The natural candidate for the limit functional is
where the couple $\left(\mu^\infty, v^\infty\right)$ is the corresponding limit of $\left(\textbf{x}^N,\textbf{v}^N\right)_N$ as introduced in Assumption 3.1, point 2.
This functional is intuitive and requires only the observations of $\left(\textbf{x}^N,\textbf{v}^N\right)$ and no observations of the mixed strategies. However, it is not convex. This could potentially lead to poor local minima, but we did not observe such problems in our numerical examples.
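A minimal numerical illustration of this velocity-mismatch inference, with a hypothetical one-parameter payoff family $J_\theta(x,u,x') = -(u - \theta\,\mathrm{clip}(x'-x))^2$ and a plain grid search in place of a proper optimiser (all specific choices below are illustrative, not the paper's setup):

```python
import numpy as np

U = np.linspace(-1.0, 1.0, 101)
w = np.full_like(U, 2.0 / (len(U) - 1))
eps = 0.1

def v_J(xs, theta):
    # fast-reaction velocities for the parametrised payoff family
    # J_theta(x, u, x') = -(u - theta * clip(x' - x, -1, 1))^2
    vs = []
    for xi in xs:
        pay = np.mean([-(U - theta * np.clip(xj - xi, -1, 1)) ** 2 for xj in xs], axis=0)
        p = np.exp(pay / eps)
        p /= np.sum(p * w)
        vs.append(np.sum(U * p * w))
    return np.array(vs)

# synthetic observations generated with ground truth theta = 0.7
snapshots = [np.array([-0.5, 0.1, 0.9]), np.array([0.2, -0.3, 0.4])]
obs = [v_J(xs, 0.7) for xs in snapshots]

def E_v(theta):
    # discrete inference functional: squared mismatch of predicted vs observed velocities
    return sum(np.sum((v_J(xs, theta) - vo) ** 2) for xs, vo in zip(snapshots, obs))

thetas = np.linspace(0.0, 1.5, 31)
best = thetas[np.argmin([E_v(t) for t in thetas])]
```

On synthetic data the ground-truth parameter is recovered; with real data the landscape is non-convex in general, which is the caveat discussed above.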
The motivation for studying mean-field inference functionals is twofold. First, it establishes asymptotic consistency of inference in the limit of many agents. Second, the mean-field equation yields an approximation for the expected behaviour of the finite-agent system under many repetitions with random initial conditions. While we do not prove this approximation property, the existence of a consistent mean-field inference functional is still an encouraging indicator that inference over the collection of many finite-agent samples will yield a reasonable result.
3.3.2 Penalty on mixed strategies
In the case where information about the mixed strategies of the agents is available (see Section 3.1 for a discussion), the discrepancy between observed mixed strategies and those predicted by J can be used for inference. We choose to measure this discrepancy with the Kullback–Leibler (KL) divergence.
Similar to above, for a discrete admissible observation $\left(\mathbf{x}^N,\mathbf{v}^N,\boldsymbol{\sigma}^N\right)$ , we define the inference functional for hypothetical payoff functions ${J} \in X$
The natural candidate for the limit functional is
where $(\mu^\infty, \sigma^\infty)$ are as introduced in Assumption 3.1, point 2. Due to the particular structure of stationary mixed strategies, (2.15), and the KL divergence, the functionals (3.10) are convex on X (Proposition A.5), which is a potential advantage over (3.9).
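The convexity can be made plausible in one line of reasoning and checked numerically: with $\sigma^J \propto \exp(\text{payoff}/\varepsilon)$, the map $\text{payoff} \mapsto \mathrm{KL}(\sigma_{\mathrm{obs}}\,\|\,\sigma^J)$ is a linear term plus a log-partition (a log-sum-exp), hence convex, and the averaged payoff is in turn linear in $J$. A discrete sketch (all specific densities are illustrative):

```python
import numpy as np

U = np.linspace(-1.0, 1.0, 101)
w = np.full_like(U, 2.0 / (len(U) - 1))
eps = 0.1

def kl_to_gibbs(sigma_obs, pay):
    # KL(sigma_obs || sigma^J) with sigma^J proportional to exp(pay / eps):
    # a term linear in pay plus the log-partition, hence convex in pay
    logZ = np.log(np.sum(np.exp(pay / eps) * w))
    return np.sum(sigma_obs * (np.log(sigma_obs) - pay / eps) * w) + logZ

rng = np.random.default_rng(1)
sigma_obs = np.exp(rng.normal(size=U.size))
sigma_obs /= np.sum(sigma_obs * w)            # an arbitrary observed strategy
p0, p1 = -(U - 0.2) ** 2, -(U + 0.4) ** 2      # two candidate averaged payoffs

for t in (0.25, 0.5, 0.75):                    # convexity along the segment p0 -> p1
    mix = kl_to_gibbs(sigma_obs, (1 - t) * p0 + t * p1)
    bound = (1 - t) * kl_to_gibbs(sigma_obs, p0) + t * kl_to_gibbs(sigma_obs, p1)
    assert mix <= bound + 1e-9
```

The same argument applied pointwise in $(x_i, x_j)$ yields convexity of (3.10) in $J$, in contrast to the velocity-based functional (3.9).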
3.3.3 Analysis of the inference functionals
We now provide some theoretical results about the inference functionals of Sections 3.3.1 and 3.3.2. The first result establishes that the obtained minimal inference functional value provides an upper bound on the accuracy of the trajectories that are simulated with the inferred $\hat{J}$ (the proof builds upon the proof of [Reference Bongini, Fornasier, Hansen and Maggioni10, Proposition 1.1] and is given in Section A.3).
Theorem 3.3. Let $\hat{J} \in X$ and $\left(\mathbf{x}^N,\mathbf{v}^N,\boldsymbol{\sigma}^N\right)$ be an admissible discrete observation for N agents (cf. Assumption 3.1, point 1). Let $\hat{\mathbf{x}}^N$ be the solution of (2.15) induced by $\hat{J}$ for the initial condition $\hat{\textbf{x}}^N(0) = \textbf{x}^N(0) = \left(\bar{x}_{1}^N,\dots,\bar{x}_{N}^N\right) \in [\mathbb{R}^d]^N$ . Then,
for all $t \in [0,T]$ , with $C = C(T, \hat{J}, e, \varepsilon)$ . Analogously, let $(\mu^\infty, \sigma^\infty)$ be as introduced in Assumption 3.1, point 2, and let $\hat{\mu}$ be the solution of (2.17) induced by $\hat{J}$ for the initial condition $\bar{\mu} = \mu^\infty(0)$ . Then,
for all $t \in [0,T]$ and the same constant C as above.
Next, we address the existence of minimising payoff functions, both in theory and in numerical approximation. At the theoretical level, we need to ensure compactness of minimising sequences, which we obtain here by restriction to a suitable compact space. At the numerical level, we are interested in finite-dimensional approximations of this space that become dense as the discretisation is refined.
Remark 3.4 (Compactness and finite-dimensional approximation). In order to obtain compactness, we restrict the variational problems to suitable subspaces of X. In particular, for $R, M > 0$ , let us define
The parameter R bounds the learning domain in space: inference where we have no data available is simply meaningless. The parameter M will ensure compactness with respect to uniform convergence.
For each $R,M > 0$ , we consider a family of closed convex subsets $\left(X_{R,M}^N\right)_{N \in \mathbb{N}} \subset X_{R,M}$ satisfying the uniform approximation property: for every $J \in X_{R,M}$ there exists a sequence $(J^N)_{N \in \mathbb{N}}$ converging uniformly to J with $J^N \in X_{R,M}^N$ for each $N \in \mathbb{N}$ . These approximating spaces can be chosen to be finite-dimensional.
An alternative way to obtain compactness would be via the introduction of additional regularising functionals. This is discussed in Section 4.1. Conceptually, both approaches serve the same purpose and we find that the former is more convenient for analysis while the latter is more attractive numerically.
The following key lemma ensures convergence of the error functionals under uniform convergence of the payoff functions and/or $W_1$ convergence of measures.
Lemma 3.5. Let $\left(\mathbf{x}^N,\mathbf{v}^N,\boldsymbol{\sigma}^N\right)_N$ be a family of admissible observations and $\mathcal{E}^{N}_v$ , $\mathcal{E}_v$ , $\mathcal{E}^{N}_\sigma$ and $\mathcal{E}_\sigma$ be the corresponding discrete and continuous inference functionals (3.9) and (3.10). For $M>0$ , let $\left({J}^N\right)_N$ be a sequence of payoff functions in $X_{R,M}$ , converging uniformly to some ${J} \in X_{R,M}$ as $N \to \infty$ . Then,
Proof. Thanks to Assumption 3.1, point , we have $W_1\left(\mu^N(t),\mu^\infty(t)\right) \to 0$ uniformly in $t \in [0,T]$ as $N \to \infty$ . Lemma 2.10 provides
uniformly in $x \in B^d({R})$ , $t \in [0,T]$ and N. Hence, $v^{{J}^N}\!\left(\mu^N(t)\right)(x)$ converges uniformly to $v^J(\mu^\infty(t))(x)$ in $C([0,T]\times B^d({R}); \mathbb{R}^d)$ . Since $\|v^J(\mu^\infty(t))(x)\| \leq \|e\|_\infty$ for every $x \in B^d({R})$ and $t \in [0,T]$ , we also have $\|v^{J^N}\left(\mu^N(t)\right)(x)\|^2 \to \|v^J(\mu^\infty(t))(x)\|^2$ uniformly in $C([0,T]\times B^d({R}))$ . Recalling that $v_i^{{J}^N}\left(x_1^N(t),\ldots,x_N^N(t)\right) = v^{{J}^N}\left(\mu^N(t)\right)\left(x_i^N(t)\right)$ by definition and using the convergence of velocities provided by Assumption 3.1, point , the result follows by passing to the limit in the definition:
An analogous derivation applies to the $\sigma$ energies (3.10) where we use the Lipschitz estimate (2.20) and the $\sigma$ part of Assumption 3.1, point .
We are now ready to state the main result concerning the inference functionals and convergence of minimisers for $\mathcal{E}_v^{N}$ and $\mathcal{E}_\sigma^{N}$ .
Theorem 3.6. Let $\left(\mathbf{x}^N,\mathbf{v}^N,\boldsymbol{\sigma}^N\right)_N$ be a family of admissible observations and fix $M > 0$ . Then:

1. The functionals (3.9a) and (3.10a) have minimisers over $X_{R,M}^N$ and (3.9b) and (3.10b) have minimisers over $X_{R,M}$ .

2. For $N > 0$ , let
\begin{equation*} {J}^N_v \in \mathop{\textrm{arg\,min}}_{{J} \in X_{R,M}^N} \mathcal{E}^{N}_v({J}) \quad \text{and} \quad {J}^N_\sigma \in \mathop{\textrm{arg\,min}}_{{J} \in X_{R,M}^N} \mathcal{E}^{N}_\sigma({J}). \end{equation*}The sequence $\left({J}^N_v\right)_N$ , resp. $\left({J}^N_\sigma\right)_N$ , has a subsequence converging uniformly to some continuous function ${J}_v \in X_{R,M}$ , resp. ${J}_\sigma \in X_{R,M}$ , such that(3.12) \begin{align} \lim_{N \to \infty} \mathcal{E}^{N}_v\!\left({J}^N_v\right) = \mathcal{E}_v({J}_v) \quad \text{and} \quad \lim_{N \to \infty} \mathcal{E}^{N}_\sigma\left({J}^N_\sigma\right) = \mathcal{E}_\sigma({J}_\sigma). \end{align}In particular, ${J}_v$ (resp. ${J}_\sigma$ ) is a minimiser of $\mathcal{E}_v$ (resp. $\mathcal{E}_\sigma$ ) over $X_{R,M}$ .
Proof. Part 1: existence of minimisers. All functionals (3.9) and (3.10) are continuous in J with respect to uniform convergence which follows from Lemma 3.5 setting the initial measure $\bar{\mu}$ to be the empirical one (i.e. just looking at converging J for fixed trajectories). The sets $X_{R,M}$ and $X_{R,M}^N$ are compact with respect to uniform convergence. Therefore, minimisers exist.
Part 2: convergence of minimisers. The proof is inspired by [Reference Bongini, Fornasier, Hansen and Maggioni10, Section 4.2]. Since $X_{R,M}^N \subset X_{R,M}$ and the latter is compact, the sequence of minimisers $\left({J}_v^N\right)_N$ with ${J}^N_v \in X_{R,M}^N$ must have a cluster point ${J}_v$ in $X_{R,M}$ . For simplicity, denote the converging subsequence again by $\left({J}^N_v\right)_N$ . Let $\tilde{J} \in X_{R,M}$ . By the uniform approximation property of $X_{R,M}^N$ , choose a sequence $\left(\tilde{J}^N\right)_N$ with $\tilde{J}^N \in X_{R,M}^N$ converging uniformly to $\tilde{J}$ . By Lemma 3.5 and optimality of each ${J}^N_v$ ,
Therefore, ${J}_v$ minimises $\mathcal{E}_v$ and the optimal values converge. The same argument covers the $\sigma$ functionals (3.10) and the corresponding sequence of minimisers $\left({J}_\sigma^N\right)_N$ .
We conclude this section with a couple of comments and remarks on the interpretation of results in Theorem 3.6.
Remark 3.7 (Realisability of data). In general, observed data may or may not be realisable by the model. This distinguishes two regimes.

• Assume there is a true regular J that generated the observed data (as, for example, in Lemma 3.2). If we pick M large enough in Theorem 3.6, so that $J \in X_{R,M}$ , one has $\mathcal{E}_v({J}_v) \leq \mathcal{E}_v({J}) = 0$ and analogously $\mathcal{E}_\sigma({J}_\sigma) \leq \mathcal{E}_\sigma({J}) = 0$ . Hence, minimisers of $\mathcal{E}_v$ and $\mathcal{E}_\sigma$ over $X_{R,M}$ reproduce exactly the trajectories of the system thanks to Theorem 3.3.

• Assume there is no true regular J that generated the observed data. This means that, no matter how large M is taken in Theorem 3.6, the minimal limit energies $\mathcal{E}_\sigma({J}_\sigma)$ and $\mathcal{E}_v({J}_v)$ may not be equal to 0. However, the bound on the trajectories generated by the inferred model, Theorem 3.3, still holds and so the remaining residual values $\mathcal{E}_\sigma({J}_\sigma)$ and $\mathcal{E}_v({J}_v)$ are then an indication for how well our parametrised family of interaction models was able to capture the structure in the observed data.
Remark 3.8 (Non-uniqueness of minimisers). We point out that the parametrisation of an entropic game model by a payoff function J is not unique. Replacing J by $\hat{J}\left(x,u,x^{\prime},u^{\prime}\right) \;:=\; J\!\left(x,u,x^{\prime},u^{\prime}\right) + g\left(x,x^{\prime},u^{\prime}\right)$ yields the same agent trajectories for arbitrary $g \,:\, \Omega \times \Omega \times U \to \mathbb{R}$ . This can be seen from the fact that $f^{J+g}=f^J$ for $f^J$ and $f^{J+g}$ as given in (2.3). Intuitively, given some $x$ , $x^{\prime}$ and $u^{\prime}$ , which strategy u is most attractive does not change if we add a constant benefit for all potential choices. The analogous observation holds for the undisclosed case. This implies that minimisers of (3.8), (3.9) and (3.10) are not unique. This has to be taken into account when we try to interpret an inferred J. In our numerical examples with the undisclosed fast-reaction model, we usually made additional structural assumptions on J (see Section 4.1) and as a consequence did not observe issues with non-unique minimisers. The phenomenon is encountered in a proof-of-concept example on the differential inference functional, Section 4.3.
4 Numerical experiments
In this section, we present some numerical examples for the entropic game models of Section 2 and their inference with the functionals of Section 3, to provide a basic proof of concept and some intuition about their behaviour.
4.1 Numerical setup
Preliminaries and discretisation. We assume for simplicity that the strategy space U is finite, that is, $U=\{u_1,\ldots,u_K\}$ for some $K>0$ and fix the reference measure $\eta = \sum_{k=1}^K \frac{1}{K}\delta_{u_k}$ as the normalised uniform measure over U.
For a system of N agents starting at positions $x_1(0), \dots, x_N(0)$ , we observe S snapshots of the evolution at times $t_s = s\cdot \frac{T}{S}$ for $s = 1,\dots,S$ , so that we are given locations $\{x_i(t_s)\}_{s,i}$ , velocities $\{\partial_t x_i(t_s)\}_{s,i}$ and (sometimes) mixed strategies $\{\sigma_i(t_s)\}_{s,i}$ where indices s and i run from 1 to S and 1 to N respectively. Here, $\sigma_{i}(t_s) = (\sigma_{i,1}(t_s),\dots,\sigma_{i,K}(t_s))$ is a discrete probability density with respect to $\eta$ .
Let $\Omega = \prod_{i=1}^d [\ell_i, r_i]$ , $\ell_i,r_i \in \mathbb{R}$ , be the smallest hypercube containing all observed locations. We discretise $\Omega$ by a regular Cartesian grid and describe the unknown payoff J by its values $J(\cdot,u_k,\cdot)$ at grid points for $u_k \in U$ (or $J(\cdot,u_k,\cdot,u_{k^{\prime}})$ in the fully general setting). Between grid points, J is extended by coordinatewise linear interpolation, that is, we consider piecewise $d$ -linear finite elements over hypercubes.
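The interpolation step can be sketched for $d=1$: values of $J(\cdot,u_k,\cdot)$ stored at the nodes of a regular grid are extended to arbitrary points by piecewise linear interpolation. A minimal sketch with a hypothetical helper (one spatial argument only):

```python
def interp_grid(values, xmin, xmax, x):
    """Piecewise-linear interpolation of values given at a regular 1D grid
    (the d = 1 case of the piecewise d-linear finite elements used for J)."""
    n = len(values)
    h = (xmax - xmin) / (n - 1)   # grid spacing
    t = (x - xmin) / h
    i = min(int(t), n - 2)        # index of the left grid node
    lam = t - i                   # local coordinate in [0, 1]
    return (1 - lam) * values[i] + lam * values[i + 1]

vals = [0.0, 1.0, 4.0]                   # J(., u_k, .) at grid nodes 0, 1, 2
print(interp_grid(vals, 0.0, 2.0, 1.5))  # -> 2.5 (halfway between 1 and 4)
```

At grid nodes, the interpolant reproduces the stored coefficients exactly; in higher dimensions, the same local-coordinate construction is applied coordinatewise.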
Within this setting, the inference functional (3.10) reduces to the discrete error functional
where for $s \in \{0,\ldots,S\}$ , $i \in \{1,\dots,N\}$ and $k \in \{1,\dots,K\}$ , the optimal mixed strategy at time $t_s$ is given by (2.15) as
Similarly, the inference functional (3.9) reduces to the discrete functional
where
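To make the reduction concrete, the following sketch (with hypothetical helper names) computes predicted velocities and the velocity-based residual for a one-dimensional undisclosed fast-reaction model. It assumes the stationary strategy (2.15) takes a softmax form in the payoff averaged over the other agents, that $e(x,u)=u$, and that mixed strategies are represented as plain probability weights on U rather than densities with respect to $\eta$:

```python
import math

def softmax(scores, eps):
    """Gibbs weights exp(score / eps), normalised to probabilities."""
    m = max(scores)
    w = [math.exp((s - m) / eps) for s in scores]
    z = sum(w)
    return [wi / z for wi in w]

def predicted_velocity(J, xs, i, U, eps):
    """Velocity of agent i under the undisclosed fast-reaction model:
    stationary strategy = softmax of the averaged payoff (assumed form
    of (2.15)), then v_i = sum_k u_k * sigma_k with e(x, u) = u."""
    N = len(xs)
    scores = [sum(J(xs[i], u, xj) for xj in xs) / N for u in U]
    sigma = softmax(scores, eps)
    return sum(u * s for u, s in zip(U, sigma))

def velocity_error(J, xs, vs, U, eps):
    """Discrete velocity-based inference functional (mean squared residual)."""
    return sum((predicted_velocity(J, xs, i, U, eps) - v) ** 2
               for i, v in enumerate(vs)) / len(vs)

# A payoff rewarding motion towards the origin yields a negative velocity at x > 0:
J = lambda x, u, xp: -u * x
print(predicted_velocity(J, [1.0], 0, [-1.0, 1.0], 1.0) < 0.0)
```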
Reducing dimensionality. When $d \geq 2$ , describing a general function $J \,:\, \Omega \times U \times \Omega \to \mathbb{R}$ where $\Omega \subset \mathbb{R}^d$ (or even $\Omega \times U \times \Omega \times U \to \mathbb{R}$ in the fully general setting) requires many degrees of freedom (one value for each node on the grid) and inferring them from data in a reliable way would require a large set of observations. After inference, interpreting such a general function to understand the interactions between agents can be a daunting task. For these two reasons, we may wish to incorporate some prior knowledge on the structure of J via a suitable ansatz. A prototypical example is
where the first term models how a single agent prefers to move by itself depending on its absolute location, and the second term models a translation invariant pairwise interaction with other agents. This reduces the dimensionality from one function over $\mathbb{R}^d \times U \times \mathbb{R}^d$ to two functions over $\mathbb{R}^d \times U$ . Further simplifications can be made, for instance, by assuming that the interaction is also rotation invariant. Such parametrisations are very common in Newtonian models, see for instance [Reference Bongini, Fornasier, Hansen and Maggioni10], where radially symmetric interaction functions are assumed. Unless noted otherwise, we will in the following use the ansatz (4.3) in our experiments.
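The reduction in degrees of freedom from the ansatz (4.3) can be made explicit. The following sketch (hypothetical names) counts grid coefficients for a general payoff versus the structured one and assembles J from its two components:

```python
def dof_general(n, d, K):
    """Coefficients of a general J on an (n^d grid) x K x (n^d grid)."""
    return (n ** d) * K * (n ** d)

def dof_ansatz(n, d, K):
    """Coefficients under ansatz (4.3): one grid for J1, one for J2."""
    return 2 * (n ** d) * K

def make_J(J1, J2):
    """Payoff in the form of ansatz (4.3): J(x, u, x') = J1(x, u) + J2(x - x', u),
    with J2 acting on the relative location (translation invariance)."""
    return lambda x, u, xp: J1(x, u) + J2(x - xp, u)

# For a 30-point-per-dimension grid in d = 2 with K = 4 strategies:
print(dof_general(30, 2, 4), dof_ansatz(30, 2, 4))  # -> 3240000 7200
```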
Regularisation and dropping the Lipschitz constraints. Even after the reduction (4.3), inferring the coefficients of J is a high-dimensional non-linear inverse problem. We may observe no (or very few) agents near some grid points and thus have no reliable data to set the corresponding coefficients of J directly. A common approach to avoiding ill-posedness is to add regularisation to the minimisation problem. Intuitively, this diffuses information about observations over the grid to some extent. For regularisation in $\Omega$ , we use the squared $L^2$ norm of the gradient of J, which boils down to weighted sums of squared finite differences between grid points. For the strategy space U, one could in principle do the same, although this was not necessary in our examples. When J is split as in (4.3), a regulariser can be applied separately to each term. The regularised version of (4.1) then becomes
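A minimal sketch of such a regulariser on a one-dimensional grid, assuming the squared $L^2$ norm of the gradient is approximated by finite differences between neighbouring nodes (hypothetical helper; the actual weights used in (4.4) may differ):

```python
def grad_sq_penalty(values, h):
    """Squared L2 norm of the gradient of a 1D grid function, approximated
    by forward finite differences between neighbouring nodes (spacing h)."""
    return sum(((values[i + 1] - values[i]) / h) ** 2
               for i in range(len(values) - 1)) * h

# A linear profile of slope 1 over an interval of length 2 has penalty 2:
print(grad_sq_penalty([0.0, 1.0, 2.0], 1.0))  # -> 2.0
```

The regularised objective is then the inference functional plus $\lambda_1, \lambda_2$ times such penalties applied to the coefficient vectors of $J_1$ and $J_2$ respectively.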
Of course, over-regularisation will lead to a loss of contrast and of high-frequency features of J. The effect of under-regularisation is illustrated in Example 4.5.
While in Theorem 3.6 we consider increasingly finer discretisations $X_{R,M}^N$ , $N \to \infty$ , of the payoff function space $X_{R,M}$ , in our numerical examples a fixed resolution is entirely sufficient (approximately 30 grid points along each spatial dimension). On a fixed grid, the Lipschitz and $L^2$ gradient seminorms are equivalent. Consequently, for simplicity, in our numerical experiments we may drop the explicit constraint on the Lipschitz constant of J and only use the regularisation term to impose regularity. This has the advantage that our finite-dimensional discrete problem is then unconstrained, and we may approach it with quasi-Newton methods such as L-BFGS, for which we use the Julia package Optim [Reference Mogensen and Riseth52].
Inferring $\sigma_i(t_s)$ from observations. In many situations, we are not able to observe the mixed strategies of agents, but merely their locations and velocities. We can then decide to use the non-convex velocity-based error functional (4.2). Alternatively, in case we want to optimise the convex inference functional $\mathcal{E}_\sigma^N$ (4.1) instead, we can try to infer some meaningful $\sigma_i(t_s)$ from the observed velocities. In both cases, we need to make suitable choices for U and e. Unless stated otherwise, we pick U as a discrete subset of $\mathbb{R}^d$ such that observed velocities lie in the (interior of the) convex hull of U and set $e(x,u) \;:=\; u$ . This means that every observed velocity can be reproduced by some mixed strategy. As a heuristic method for inferring a viable mixed strategy $\sigma$ from a velocity v, we propose to solve the following minimisation problem:
The constraint enforces that $\sigma$ reproduces the observed velocity v. The term $\|u-v\|^2\,\sigma(u)$ encourages $\sigma$ to concentrate in the vicinity of v in U, whereas the entropic term $\varepsilon \cdot \log(\sigma(u))\,\sigma(u)$ keeps $\sigma$ from collapsing onto a single point. It is easy to see that minimisers of (4.5) are of the form $\sigma(u)=A \cdot \exp\!\left(-\|u-\tilde{v}\|^2/\varepsilon\right)$ where A is a normalisation factor and $\tilde{v}$ is a velocity close to v, potentially slightly shifted to ensure that the constraint $\int_U u \cdot \sigma(u)\,\textrm{d} \eta(u)=v$ is satisfied. Problem (4.5) is readily solved by standard methods. We consider this approach a heuristic ‘inverse’ of the approximation of Newtonian models by game models, as outlined in Example 2.7.
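Assuming a scalar velocity and a discrete $U \subset \mathbb{R}$, the minimiser structure of (4.5) makes the problem one-dimensional in the shift $\tilde{v}$: the mean of the Gibbs weights is strictly increasing in the shift (an exponential-family tilt), so bisection recovers it. A hypothetical sketch, with mixed strategies represented as probability weights on U:

```python
import math

def sigma_from_velocity(v, U, eps, tol=1e-12):
    """Recover a mixed strategy from an observed scalar velocity v, using the
    minimiser structure of (4.5): sigma(u) ~ exp(-(u - vt)^2 / eps), with the
    shift vt chosen by bisection so that the mean of sigma equals v.
    Requires min(U) < v < max(U)."""
    def mean_for(vt):
        w = [math.exp(-(u - vt) ** 2 / eps) for u in U]
        z = sum(w)
        return sum(u * wi for u, wi in zip(U, w)) / z
    lo, hi = min(U) - 10.0, max(U) + 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_for(mid) < v:
            lo = mid   # mean is strictly increasing in the shift
        else:
            hi = mid
    vt = 0.5 * (lo + hi)
    w = [math.exp(-(u - vt) ** 2 / eps) for u in U]
    z = sum(w)
    return [wi / z for wi in w]

# The recovered strategy reproduces the velocity: sum_k u_k * sigma_k = 0.25.
sigma = sigma_from_velocity(0.25, [-1.0, 1.0], 1.0)
```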
Observations from forward models: Subsampling and multiple realisations. In all the examples below, observations are obtained by simulating a forward model with explicit Euler stepping with time-step size $\Delta t=0.02$ . Since the numerical complexity of our inference functionals grows linearly with the number of observations, we usually subsample the trajectories in time, keeping only every $\delta$ -th step: intuitively, the information contained in subsequent steps is largely redundant, since agents are confronted with a very similar configuration. Unless noted otherwise, we set $\delta=2$ .
The quality of the reconstructed model is closely tied to the data quality: more observations in a given region imply a better reconstruction of the payoff function in that region. If we only use data coming from a single realisation of the system, we obtain highly correlated observations, which tend to concentrate on a small subset of the reconstruction domain; larger regions of the domain are only explored as $N \to \infty$ . Another option to enhance the quality of the reconstructed model is therefore to combine data from multiple realisations with different initial conditions, that is, we fix the number of agents N and simulate the system for multiple different initial conditions. This provides good coverage of the domain. Extending the functionals of Section 3 to multiple observations is straightforward.
4.2 Learning from undisclosed fastreaction game models
We now begin with relatively simple numerical examples: an explicit game model is fixed (including U, e, J), simulations according to this model are performed, and subsequently the payoff function is inferred from the observations. By possibly rescaling J, we may w.l.o.g. set $\varepsilon=1$ for all numerical experiments, see (2.15). In this section, we focus on the undisclosed fast-reaction setting. A proof of concept of inference for the full entropic game models via the differential functional of Section 3.2 is discussed in Section 4.3.
Example 4.1 (Game model, $d=1$ ). Let $d=1$ , $U=\{-1,+1\}$ and $e(x,u)=u$ . We set
$J_1$ encourages agents to move towards the origin $x=0$ ; for example, for $x>0$ one has $J_1(x,-1)>J_1(x,+1)$ , thus preferring strategy (and velocity) $u=-1$ over $u=+1$ . $J_2$ implements repulsion between nearby agents: when $\Delta x \in (0,1)$ , that is, the ‘other agent’ is to the right of ‘me’ and relatively close, then $J_2(\Delta x,-1)>J_2(\Delta x,+1)$ , and I am encouraged to move left.
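A minimal simulation sketch of such a one-dimensional undisclosed fast-reaction model. The payoffs J1 and J2 below are hypothetical stand-ins with the qualitative behaviour just described (attraction towards the origin, short-range repulsion); the actual (4.6) may differ. Stationary strategies are assumed to take a softmax form in the averaged payoff, as suggested by (2.15), with $\varepsilon=1$ and explicit Euler stepping:

```python
import math

# Hypothetical payoffs in the spirit of Example 4.1 (not the actual (4.6)):
# J1 rewards moving towards the origin, J2 rewards moving away from a close
# neighbour, with dx = (other agent's position) - (own position).
def J1(x, u):
    return -u * math.tanh(x)

def J2(dx, u):
    return -u * dx * max(0.0, 1.0 - abs(dx))   # active only for |dx| < 1

def step(xs, dt=0.02, eps=1.0, U=(-1.0, 1.0)):
    """One explicit Euler step of the undisclosed fast-reaction dynamics."""
    N = len(xs)
    new = []
    for x in xs:
        scores = [sum(J1(x, u) + J2(xo - x, u) for xo in xs) / N for u in U]
        m = max(scores)
        w = [math.exp((s - m) / eps) for s in scores]
        z = sum(w)
        v = sum(u * wi / z for u, wi in zip(U, w))  # v = sum_k u_k sigma_k
        new.append(x + dt * v)
    return new

xs = [-0.8, 0.5, 0.9]
for _ in range(100):
    xs = step(xs)
```

With these payoffs, an isolated agent at $x>0$ drifts left towards the origin, while close pairs are pushed apart, matching the qualitative description above.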
We set $N=8$ and simulated 100 instances over a time span of $0.2$ , collecting 5 data points per instance, that is, a total of 500 observed configurations (consisting of locations and mixed strategies). In each instance, initial locations are sampled uniformly from $[-1,1]$ . Observed relative locations between two agents are thus distributed over $[-2,2]$ . Describing the discrete J required 178 coefficients. Since observed mixed strategies are available in this example, we use the energy (4.1) for inference, augmented with a regulariser as discussed in Section 4.1, with $\lambda_{1}=\lambda_2=10^{-6}$ , see (4.4). The results are illustrated in Figure 4. We find that, as intended, the model describes agents moving towards the origin while avoiding getting too close to each other. The functions $J_1$ and $J_2$ are accurately recovered from the data. For newly generated initial conditions, that is, not taken from the training data, the trajectories simulated with the inferred J are in excellent agreement with the underlying true model, demonstrating that the inferred model is able to generalise.
Example 4.2 (Towards the meanfield regime). We can also explore the meanfield regime numerically. Figure 5 shows trajectories for the same model as in the previous example for an increasing number of particles where the initial locations are sampled from some underlying distribution. As anticipated, the behaviour approaches a consistent limit.
Inference can also be performed for larger numbers of particles. We applied the same approach as in the previous example with $N=100$ agents. Since we are dealing with a large number of agents, already a small number of configurations carries enough information for the inference. Thus, we simulate 10 instances over a time span of $0.02$ with time-step size $\Delta t = 0.002$ , collecting 2 data points per instance, that is, a total of 20 observed configurations (for locations and mixed strategies). In each instance, initial locations are sampled uniformly from $[-1,1]$ . The inferred payoff function is essentially identical to the one obtained in Figure 4.
Example 4.3 (Game model, $d=1$ , noisy data). We repeat the previous example with added noise. We consider the same set of observations, but the observed mixed strategies are corrupted by resampling: for a given ‘true’ simulated $\sigma$ , we draw 20 times from U according to $\sigma$ and use the resulting empirical distribution for inference. These corrupted $\sigma$ yield corrupted velocity predictions: in our example, the overall standard deviation between exact and corrupted velocities is $\approx 0.2$ (with the range of admissible velocities being $[-1,1]$ ). This could model the situation where observation of the mixed strategies is imperfect, or where very noisy (stochastic) agent movement is observed and empirical distributions of agent velocities over short time intervals serve as a substitute for mixed strategies. The results are shown in Figure 6, for parameters $\lambda_{1}=\lambda_2=10^{-5}$ to enforce more regularity due to the noisy inputs. The inferred payoff functions are similar to those of the previous example with just a few spurious fluctuations. The trajectories simulated with the inferred J are close to the (unperturbed) true trajectories; the error is consistently smaller than the one caused by the resampling noise, indicating that inference also works well in noisy settings.
Example 4.4 (Game model, $d=2$ ). Let $d=2$ , $U = \{(1,1),(1,-1),(-1,1),(-1,-1)\} \subset \mathbb{R}^2$ and $e(x,u)=u$ . We use the same J defined in (4.6), thus combining a driving force towards the origin $x=(0, 0)$ and repulsion between nearby agents (we assume $\tanh$ to be applied componentwise to vectors).
Data collection follows the same approach as in Example 4.1: for $N=8$ agents, we simulate 100 instances over a time span of $0.2$ , collecting 5 data points per instance, for a total of 500 observed configurations. In each instance, initial locations are sampled uniformly from $[-0.75,0.75]^2$ . Observed relative locations between two agents are thus distributed over $[-1.5,1.5]^2$ , as displayed in Figure 7(b). Describing the discrete J required 10,656 coefficients, resulting from a $30\times 30$ spatial grid for $J_1$ and a $42\times 42$ spatial grid for $J_2$ . We again use the energy (4.1) for inference with a regulariser on derivatives, with $\lambda_{1}=\lambda_2=10^{-5}$ . The results are illustrated in Figure 7. The recovered functions $J_1$ and $J_2$ reproduce the structural features of the exact ones, and the trajectories simulated with the inferred J closely follow the underlying true model (also in this case, newly generated initial conditions are considered).
Example 4.5 (Under-regularisation). As is common in inverse problems, if the regularisation parameters $\lambda_1, \lambda_2$ in (4.4) are chosen too high, one obtains an over-regularised result, where the large penalty for spatial variation leads to a loss of contrast and of high-frequency features of J. Conversely, we may experience under-regularisation when the parameters are small and the observed data does not cover the learning domain sufficiently, for example, when we only observe few realisations of the system. Such a case is illustrated in Figure 8(c), where the observations are concentrated on a few one-dimensional trajectories. For weak regularisation, the reconstructed J then tends to be concentrated on these trajectories as well, see Figure 8(a). And while the inferred J may predict the agent behaviour with high accuracy on these trajectories, it will generalise very poorly to different data. With stronger regularisation, J is extended beyond the trajectories in a meaningful way, see Figure 8(b).
4.3 An example for differential inference
The ‘differential inference’ functional $\mathcal{E}_{\dot{\sigma}}^{J,N}$ introduced in Section 3.2 was constructed in close analogy to (1.7). We have argued that it may be challenging to apply to real data, as it would require the observation of mixed strategies $\{\sigma_{i}(t_s)\}_{s,i}$ and their temporal derivatives $\{\partial_t \sigma_{i}(t_s)\}_{s,i}$ . As a proof of concept, we now demonstrate that if these observations are indeed available, the numerical setup described up to now can be easily adapted to $\mathcal{E}_{\dot{\sigma}}^{J,N}$ . After discretisation, $\mathcal{E}_{\dot{\sigma}}^{J,N}$ reduces to a finite-dimensional quadratic objective. For a quadratic regulariser, such as the squared $L^2$ norm of the gradient of J, minimisation then corresponds to solving a linear system. Alternatively, we can implement the Lipschitz constraint on J, which amounts to adding linear constraints to the quadratic problem. We solve the resulting finite-dimensional optimisation problem using the operator splitting approach provided by the OSQP solver [Reference Stellato, Banjac, Goulart, Bemporad and Boyd69].
As an example, we simulate the entropic regularised model in (2.2) for $U = \{-1,1\}$ and $J \colon \mathbb{R} \times \{-1,1\} \times \mathbb{R} \times \{-1,1\} \to \mathbb{R}$ defined by
Our choice of J is not motivated by providing a meaningful model for anything, but simply by the aim to illustrate that inference works in principle. We collect data using the same subsampling approach presented above: 8 agents, 100 realisations over a time span of 0.2, initial conditions uniformly sampled from $[-1,1]$ , only every other observation kept, for a total of 500 input configurations. Observed positions $(x, x^{\prime})$ with $x \neq x^{\prime}$ are displayed in Figure 9(a). No particular structure is assumed for J.
As discussed in Remark 3.8, the optimal inferred J is not unique. As a remedy, we normalise the reconstructed $J^r$ such that $J^r(x,-1,x^{\prime},-1) = J^r(x,-1,x^{\prime},+1) = 0$ . Comparisons of trajectories are reported in Figure 9(b) for newly randomly selected initial conditions with $T = 8$ : trajectories with the reconstructed $J^r$ closely follow the trajectories generated with the true J.
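The gauge fixing can be sketched as follows: subtracting the slice $J(\cdot,u_0,\cdot,\cdot)$ changes J only by a function $g(x,x^{\prime},u^{\prime})$, which by Remark 3.8 leaves the induced dynamics unchanged. A hypothetical helper with $u_0=-1$:

```python
def normalise_gauge(J, u0=-1.0):
    """Remove the gauge freedom of Remark 3.8: since adding any g(x, x', u')
    to J leaves the dynamics unchanged, we may fix J(x, u0, x', u') = 0
    by subtracting the u = u0 slice."""
    def Jn(x, u, xp, up):
        return J(x, u, xp, up) - J(x, u0, xp, up)
    return Jn

J = lambda x, u, xp, up: u * x + x * xp + up   # an arbitrary payoff
Jn = normalise_gauge(J)
# Jn(x, -1.0, xp, up) == 0 for all x, xp, up, while payoff *differences*
# between strategies u (and hence the dynamics) are unchanged.
```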
4.4 Learning for Newtonian models
In Example 2.7, we have shown that entropic game models with fast reaction can also capture Newtonian interaction models. In this section, we perform inference with an entropic game model ansatz on data generated by a Newtonian model. As an example, we pick
where the function $\tanh$ is assumed to be applied componentwise. The first term in f encourages agents to move towards the origin; the second term implements a pairwise repulsion at close range. Trajectory data are collected for a two-dimensional system: for $N=8$ agents, we simulate 100 instances of (4.7) over a time span of $0.2$ , collecting 5 data points per instance, for a total of 500 observed configurations. In each instance, initial locations are sampled uniformly from $[-0.75,0.75]^2$ . Observed relative locations between two agents are thus distributed over $[-1.5,1.5]^2$ , as displayed in Figure 10(b).
We use the parametrisation (4.3) for the payoff function. Unlike in the previous examples, the original model does not involve a strategy set U or a map e. Thus, prior to inference, we have to make informed choices for these. As in earlier examples, we choose $U \subset \mathbb{R}^2$ and $e(x,u)=u$ (i.e. pure strategies correspond directly to velocities) and in particular we pick $U = \{u_1,\dots,u_K\}$ such that the convex hull of U in $\mathbb{R}^2$ contains (almost) all observed velocities, meaning that they can be reproduced by suitable mixed strategies. So for (almost) every i and s, we can write $\partial_t x_i(t_s) = \sum_k u_k\sigma_{k}$ for some mixed strategy $\sigma$ over U.
Example 4.6. We pick $U = \{(1,1),(1,-1),(-1,1),(-1,-1)\} \subset \mathbb{R}^2$ , optimise the velocity-based energy (4.2), because no mixed strategies are provided by the observations, and consider a regulariser on derivatives as described in Section 4.1, with $\lambda_{1}=\lambda_2=10^{-5}$ . Using the same setup as in Example 4.4, we describe the discrete J using 10,656 coefficients ( $30\times 30$ grid for $J_1$ and $42\times 42$ grid for $J_2$ ). The results are illustrated in Figure 10. The recovered functions $J_1$ and $J_2$ have the same structure as the ones in Example 4.4: the qualitative behaviour of the two systems is indeed the same. In Figure 10(c), we simulate a couple of dynamics with newly generated initial conditions and observe again how Newtonian trajectories and game trajectories compu