
Data-driven entropic spatially inhomogeneous evolutionary games

Published online by Cambridge University Press:  14 March 2022

Campus Institute Data Science, University of Göttingen, Göttingen, Germany
Department of Mathematics, Technical University of Munich, Garching, Germany


We introduce novel multi-agent interaction models of entropic spatially inhomogeneous evolutionary undisclosed games and their quasi-static limits. These evolutions vastly generalise first- and second-order dynamics. Besides the well-posedness of these novel forms of multi-agent interactions, we are concerned with the learnability of individual payoff functions from observation data. We formulate the payoff learning as a variational problem, minimising the discrepancy between the observations and the predictions by the payoff function. The inferred payoff function can then be used to simulate further evolutions, which are fully data-driven. We prove convergence of minimising solutions obtained from a finite number of observations to a mean-field limit, and the minimal value provides a quantitative error bound on the data-driven evolutions. The abstract framework is fully constructive and numerically implementable. We illustrate this on computational examples where a ground truth payoff function is known and on examples where this is not the case, including a model for pedestrian movement.

© The Author(s), 2022. Published by Cambridge University Press

1 Introduction

1.1 Multi-agent interaction models in biological, social and economical dynamics

In the course of the past two decades, there has been an explosion of research on models of multi-agent interactions [Reference Cucker and Dong28, Reference Cucker and Mordecki29, Reference Cucker and Smale31, Reference Grégoire and Chaté36, Reference Jadbabaie, Lin and Stephen Morse40, Reference Ke, Minett, Au and Wang42, Reference Vicsek, Czirok, Ben-Jacob, Cohen and Shochet74], to describe phenomena beyond physics, for example, in biology, such as cell aggregation and motility [Reference Camazine, Deneubourg, Franks, Sneyd, Theraulaz and Bonabeau11, Reference Keller and Segel43, Reference Koch and White45, Reference Perthame61], coordinated animal motion [Reference Ballerini, Cabibbo, Candelier, Cavagna, Cisbani, Giardina, Lecomte, Orlandi, Parisi, Procaccini, Viale and Zdravkovic8, Reference Carrillo, D’Orsogna and Panferov16, Reference Chuang, D’Orsogna, Marthaler, Bertozzi and Chayes18, Reference Couzin and Franks21, Reference Couzin, Krause, Franks and Levin22, Reference Cristiani, Piccoli and Tosin24, Reference Cucker and Smale31, Reference Niwa54, Reference Parrish and Edelstein-Keshet57, Reference Parrish, Viscido and Gruenbaum58, Reference Romey63, Reference Toner and Tu71, Reference Yates, Erban, Escudero, Couzin, Buhl, Kevrekidis, Maini and Sumpter77], coordinated human [Reference Cristiani, Piccoli and Tosin25, Reference Cucker, Smale and Zhou30, Reference Short, D’Orsogna, Pasour, Tita, Brantingham, Bertozzi and Chayes68], and synthetic agent behaviour and interactions, such as cooperative robots [Reference Chuang, Huang, D’Orsogna and Bertozzi19, Reference Leonard and Fiorelli49, Reference Perea, Gómez and Elosegui60, Reference Sugawara and Sano70]. We refer to [Reference Carrillo, Choi and Hauray13, Reference Carrillo, Choi and Perez15, Reference Carrillo, Fornasier, Toscani and Vecil17, Reference Vicsek and Zafeiris75] for recent surveys.

Two main mechanisms are considered in such models to define the dynamics. The first takes inspiration from physical laws of motion and is based on pairwise forces encoding observed ‘first principles’ of biological, social, or economical interactions, for example, repulsion-attraction, alignment, self-propulsion/friction et cetera. Typically, this leads to Newton-like first- or second-order equations with ‘social interaction’ forces, see (1.1) below. In the second mechanism, the dynamics are driven by an evolutive game where players simultaneously optimise their cost. Examples are game theoretic models of evolution [Reference Hofbauer and Sigmund38] or mean-field games, introduced in [Reference Lasry and Lions48] and independently under the name Nash Certainty Equivalence (NCE) in [Reference Huang, Caines and Malhamé39], later greatly popularised, for example, within consensus problems, for instance in [Reference Nuorian, Caines and Malhamé55, Reference Nuorian, Caines and Malhamé56].

More recently, the notion of spatially inhomogeneous evolutionary games has been proposed [Reference Almi, Morandotti and Solombrino5, Reference Ambrosio, Fornasier, Morandotti and Savaré6, Reference Morandotti and Solombrino53] where the transport field for the agent population is directed by an evolutionary game on their available strategies. Unlike mean-field games, the optimal dynamics are not chosen via an underlying non-local optimal control problem but by the agents’ local (in time and space) decisions, see (1.3) below.

One fundamental goal of these studies is to reveal the relationship between the simple pairwise forces or incentives acting at the individual level and the emergence of global patterns in the behaviour of a given population.

Newtonian models. A common simple class of models for interacting multi-agent dynamics with pairwise interactions is inspired by Newtonian mechanics. The evolution of N agents with time-dependent locations $x_1(t),\ldots,x_N(t)$ in $\mathbb{R}^d$ is described by the ODE system

(1.1) \begin{align} \partial_t x_i(t) = \frac{1}{N} \sum_{j=1}^N f\!\left(x_i(t),x_j(t)\right) \qquad \text{for } i=1,\ldots,N,\end{align}

where f is a pre-determined pairwise interaction force between pairs of agents. The system is well-defined for sufficiently regular f (e.g. for f Lipschitz continuous). In this article, we will refer to such models as Newtonian models.
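The structure of (1.1) is straightforward to simulate with an explicit Euler scheme. The Python sketch below uses the hypothetical attraction force $f(x,x^{\prime})=x^{\prime}-x$ (an illustrative choice, not prescribed here), under which all agents contract towards their common barycentre:

```python
import numpy as np

def simulate_newtonian(f, x0, dt=0.01, steps=1000):
    """Explicit Euler integration of the first-order system (1.1):
    each agent's velocity is the average pairwise force from all agents."""
    x = np.array(x0, dtype=float)  # shape (N, d)
    N = len(x)
    for _ in range(steps):
        # v_i = (1/N) * sum_j f(x_i, x_j)
        v = np.array([sum(f(x[i], x[j]) for j in range(N)) / N for i in range(N)])
        x = x + dt * v
    return x

# Hypothetical force f(x, x') = x' - x: each agent is attracted to every other,
# so the configuration contracts towards the (preserved) barycentre.
attract = lambda xi, xj: xj - xi
final = simulate_newtonian(attract, [[0.0], [1.0], [4.0]], dt=0.1, steps=200)
```

With this force, (1.1) reduces to $\partial_t x_i = \bar{x} - x_i$ for the barycentre $\bar{x}$, so the three agents above all converge to $5/3$.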

First-order models of the form (1.1) are ubiquitous in the literature and have, for instance, been used to model multi-agent interactions in opinion formation [Reference Hegselmann and Krause37, Reference Krause46], vehicular traffic flow [Reference Garavello and Piccoli35], pedestrian motion [Reference Cristiani, Piccoli and Tosin26], and synchronisation of chemical and biological oscillators in neuroscience [Reference Kuramoto47].

Often one is interested in studying the behaviour of a very large number of agents. We may think of the agents at time t as being distributed according to a probability measure $\mu(t)$ over $\mathbb{R}^d$. The limit dynamics of (1.1) as $N \to \infty$ can, under suitable conditions, be expressed directly in terms of the evolution of the distribution $\mu(t)$ according to a mean-field equation. The mean-field limit of (1.1) is formally given by

(1.2) \begin{align} \partial_t \mu(t) + \textrm{div}\big( v(\mu(t)) \cdot \mu(t) \big) = 0 \qquad \text{where} \qquad v(\mu(t))(x) \;:=\; \int_{\mathbb{R}^d} f(x,x^{\prime})\,\textrm{d} \mu(t)(x^{\prime}).\end{align}

Here $v(\mu(t))$ is a velocity field and intuitively $v(\mu(t))(x)$ gives the velocity of infinitesimal mass particles of $\mu$ at time t and location x [Reference Cañizo, Carrillo and Rosado12, Reference Carrillo, Choi and Hauray14].

While strikingly simple, such models only exhibit limited modelling capabilities. For instance, the resulting velocity $\partial_t x_i(t)$ is simply a linear combination of the influences of all the other agents. ‘Importance’ and ‘magnitude’ of these influences cannot be specified separately: agent j cannot tell agent i to remain motionless, regardless of what other agents suggest. Generally, agent i has no sophisticated mechanism of finding a ‘compromise’ between various potentially conflicting influences and merely uses their linear average. The applicability of such models to economics or sociology raises concerns as it is questionable whether a set of static interaction forces can describe the behaviour of rational agents who are able to anticipate and counteract undesirable situations.

Spatially inhomogeneous evolutionary games. In a game dynamic, the vector field $v(\mu(t))$ is not induced by a rigid force law but by the optimisation of an individual payoff by each agent. In mean-field games, this optimisation is global in time: each agent plans their whole trajectory in advance, and the optimisation can be thought of as taking place over repetitions of the same game (e.g. the daily commute). Alternatively, in spatially inhomogeneous evolutionary games [Reference Ambrosio, Fornasier, Morandotti and Savaré6], the agents continuously update their mixed strategies in a process that is local in time, which may be more realistic in some scenarios. This is implemented by the well-known replicator dynamics [Reference Hofbauer and Sigmund38].

As above, agents may move in $\mathbb{R}^d$. Let U be the set of pure strategies. The map $e \,:\, \mathbb{R}^d \times U \to \mathbb{R}^d$ describes the spatial velocity e(x, u) of an agent at position $x \in \mathbb{R}^d$ with pure strategy $u \in U$. For example, one can pick $U \subset \mathbb{R}^d$ to be a set of admissible velocity vectors and simply set $e(x,u) = u$. Thus, e acts as a dictionary between strategies and velocities and can therefore be assumed to be known. e(x, u) may be more complicated, for instance, when certain velocities are inadmissible at certain locations due to obstacles. A function $J\,:\,(\mathbb R^d\times U)^2 \to \mathbb R$ describes the individual benefit $J(x, u, x^{\prime}, u^{\prime})$ for an agent at x for choosing strategy u when another agent sits at $x^{\prime}$ and intends to choose strategy $u^{\prime}$. For example, $J(x, u, x^{\prime}, u^{\prime})$ may be high when e(x, u) points in the direction that x wants to move, but could be lowered when the other agent’s velocity $e(x^{\prime}, u^{\prime})$ suggests an impending collision. The state of each agent is given by their spatial position $x \in \mathbb{R}^d$ and a mixed strategy $\sigma \in \mathcal{M}_1(U)$ (where $\mathcal{M}_1(U)$ denotes probability measures over U). As hinted at, the evolution of $\sigma$ is driven by a replicator dynamic involving the payoff function J, averaged over the benefit obtained from all other agents; the spatial velocity $\partial_t x$ is obtained as the average $\int_U e(x,u)\,\textrm{d} \sigma(u)$. The equations of motion are given by

(1.3a) \begin{align} \partial_t x_i(t) = \int_U e(x_i(t),u)\,\textrm{d} \sigma_i(t)(u), \end{align}
(1.3b) \begin{align}\qquad\qquad\partial_t \sigma_i(t) = \frac{1}{N} \sum_{j=1}^N f^J\!\big(x_i(t),\sigma_i(t),x_j(t),\sigma_j(t)\big)\end{align}

where for $x,x^{\prime} \in \mathbb{R}^d$ and $\sigma,\sigma^{\prime} \in \mathcal{M}_1(U)$

(1.4) \begin{align}f^J\!\left(x,\sigma,x^{\prime},\sigma^{\prime}\right) \;:=\; \left[ \int_U J(x,\cdot,x^{\prime},u^{\prime})\,\textrm{d} \sigma^{\prime}\left(u^{\prime}\right) - \int_U \int_U J\!\left(x,v,x^{\prime},u^{\prime}\right)\!\textrm{d} \sigma^{\prime}\left(u^{\prime}\right)\textrm{d} \sigma(v) \right] \cdot \sigma.\end{align}

Intuitively, the strategy $\sigma_i$ tends to concentrate on the strategies with the highest benefit for agent i via (1.4), which will then determine their movement via (1.3a).
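For a finite strategy set, the mixed strategies $\sigma_i$ are probability vectors and (1.3)–(1.4) can be integrated directly. The Python sketch below uses the hypothetical choices $U=\{-1,+1\}$, $e(x,u)=u$ and $J(x,u,x^{\prime},u^{\prime})=-x\cdot u$ (agents prefer moving towards the origin; cf. Example 2.1 below) and is only meant to make the structure of the equations concrete:

```python
import numpy as np

U = np.array([-1.0, 1.0])        # pure strategies, interpreted as velocities
e = lambda x, u: u               # dictionary between strategies and velocities
J = lambda x, u, xp, up: -x * u  # hypothetical payoff: prefer moving towards 0

def step(xs, sigmas, dt):
    """One explicit Euler step of (1.3) for N agents with a finite strategy set;
    sigmas[i] is the probability vector of agent i's mixed strategy."""
    N = len(xs)
    new_x, new_s = xs.copy(), sigmas.copy()
    for i in range(N):
        # (1.3a): spatial velocity is the strategy-averaged velocity
        new_x[i] = xs[i] + dt * np.dot(sigmas[i], e(xs[i], U))
        # (1.3b)/(1.4): replicator growth rate averaged over all agents
        growth = np.zeros_like(U)
        for j in range(N):
            payoff = np.array([np.dot([J(xs[i], u, xs[j], up) for up in U], sigmas[j])
                               for u in U])
            growth += (payoff - np.dot(sigmas[i], payoff)) / N
        new_s[i] = sigmas[i] + dt * growth * sigmas[i]
    return new_x, new_s

xs, sigmas = np.array([2.0]), np.array([[0.5, 0.5]])
for _ in range(100):
    xs, sigmas = step(xs, sigmas, 0.05)
```

While $x_i>0$, the payoff favours the pure strategy $-1$, so $\sigma_i$ tilts towards it and the agent drifts towards the origin.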

Equation (1.3a) describes the motion of agents in $\mathbb R^d$ and resembles (1.1). The main difference is that the vector field is determined by the resulting solution of the replicator equation (1.3b), which promotes strategies that perform best with respect to the individual payoff J with rate (1.4). Despite the different nature of the variables $(x_i,\sigma_i)$ of the model, the system (1.3) can also be re-interpreted as a Newtonian-like dynamics on the space $\mathbb{R}^d \times \mathcal{M}_1(U)$ where each agent $y_i(t)=(x_i(t),\sigma_i(t))$ is characterised by its spatial location $x_i(t) \in \mathbb{R}^d$ and its mixed strategy $\sigma_i(t) \in \mathcal{M}_1(U)$ . Accordingly, equations (1.3) can be more concisely expressed as

(1.5) \begin{align} \partial_t y_i(t) = \frac{1}{N} \sum_{j=1}^N f\!\left(y_i(t),y_j(t)\right) \quad \text{where} \quad f\!\left((x,\sigma),\left(x^{\prime},\sigma^{\prime}\right)\right) \;:=\; \left( {\textstyle \int_U e(x,u)\,\textrm{d} \sigma(u)}, f^J\!\left(x,\sigma,x^{\prime},\sigma^{\prime}\right) \right).\end{align}

Similarly to mean-field limit results in [Reference Cañizo, Carrillo and Rosado12, Reference Carrillo, Choi and Hauray14], the main contribution of [Reference Ambrosio, Fornasier, Morandotti and Savaré6] is to show that the large particle limit for $N\to \infty$ of solutions of (1.3) converges in the sense of probability measures to $\Sigma(t) \in \mathcal{M}_1(\mathbb{R}^d \times \mathcal{M}_1(U))$, which is, in analogy to (1.2), a solution of a nonlinear transport equation of the type

(1.6) \begin{align} \partial_t \Sigma(t) + \textrm{Div}\big( v(\Sigma(t)) \cdot \Sigma(t) \big) = 0 \quad \text{where} \quad v(\Sigma(t))(y) \;:=\; \int f\!\left(y,y^{\prime}\right)\!\textrm{d} \Sigma(t)(y^{\prime}).\end{align}

Note that (1.6) is a partial differential equation having as domain a (possibly infinite-dimensional) Banach space containing $\mathbb{R}^d \times \mathcal{M}_1(U)$ and particular care must be applied to the use of an appropriate calculus needed to define differentials, in particular for the divergence operator. In [Reference Ambrosio, Fornasier, Morandotti and Savaré6], Lagrangian and Eulerian notions of solutions to (1.6) are introduced. As the technical details underlying the well-posedness of (1.6) do not play a central role in this article, we refer to [Reference Ambrosio, Fornasier, Morandotti and Savaré6] for more insights. Nevertheless, we apply some of the general statements in [Reference Ambrosio, Fornasier, Morandotti and Savaré6] to establish well-posedness of the models presented in this paper.

1.2 Learning or inference of multi-agent interaction models

An important challenge is to determine the precise form of the interaction force or payoff functions. While physical laws can often be determined with high precision through experiments, for more complex systems from biology or the social sciences such experiments are often either not possible or do not determine the models as precisely. Therefore, the governing rules are very often simply chosen ad hoc to reproduce some major qualitative effects observed in reality.

Alternatively, one can employ model selection and parameter estimation methods to determine the form of the governing functions. Data-driven estimations have been applied in continuum mechanics [Reference Conti, Müller and Ortiz20, Reference Kirchdoerfer and Ortiz44], computational sociology [Reference Bongini, Fornasier, Hansen and Maggioni10, Reference Lu, Maggioni and Tang50, Reference Lu, Zhong, Tang and Maggioni51, Reference Zhong, Miller and Maggioni78], or economics [Reference Albani, Ascher, Yang and Zubelli3, Reference Crépey23, Reference Egger and Engl33]. However, even the problem of determining whether time snapshots of a linear dynamical system fulfil physically meaningful models, in particular whether they have Markovian dynamics, is computationally intractable [Reference Cubitt, Eisert and Wolf27]. For nonlinear models, the intractability of learning the system corresponds to the complexity of determining the set of appropriate candidate functions to fit the data. In order to break the curse of dimensionality, one requires prior knowledge on the system and the potential structure of the governing equations. For instance, in the sequence of recent papers [Reference Schaeffer, Tran and Ward64, Reference Schaeffer, Tran and Ward65, Reference Tran and Ward72] the authors assume that the governing equations are of first order and can be written as sparse polynomials, that is, linear combinations of few monomial terms.

In this work, we present an approach for estimating the payoff function for spatially inhomogeneous evolutionary games from observed data. It is inspired by the papers [Reference Bongini, Fornasier, Hansen and Maggioni10, Reference Lu, Maggioni and Tang50, Reference Lu, Zhong, Tang and Maggioni51, Reference Zhong, Miller and Maggioni78], in particular by the groundbreaking paper [Reference Bongini, Fornasier, Hansen and Maggioni10] which is dedicated to data-driven evolutions of Newtonian models. In these references, the curse of dimensionality is remedied by assuming that the interaction function f in (1.1) is parametrised by a lower-dimensional function a, the identification of which is more computationally tractable. A typical example for such a parametrisation is given by functions $f=f^a$ of the type

\begin{align*} f^a\!\left(x,x^{\prime}\right) = a\!\left(|x-x^{\prime}|\right) \cdot \left(x^{\prime}-x\right) \qquad \text{for some} \qquad a\,:\, \mathbb{R} \to \mathbb{R}.\end{align*}

The corresponding model (1.1) is used, for instance, in opinion dynamics [Reference Hegselmann and Krause37, Reference Krause46], vehicular traffic [Reference Garavello and Piccoli35], or pedestrian motion [Reference Cristiani, Piccoli and Tosin26]. The learning or inference problem is then about the determination of a, hence of $f=f^a$, from observations of real-life realisations of the model (1.1). Clearly, the problem can be formulated as an optimal control problem. However, as pointed out in [Reference Bongini, Fornasier, Hansen and Maggioni10], in view of the nonlinear nature of the map from a to the corresponding solution $x^{[a]}(t)$ of (1.1), the control cost would be nonconvex and the optimal control problem rather difficult to solve. Instead, in [Reference Bongini, Fornasier, Hansen and Maggioni10] a convex formulation is obtained by considering an empirical risk minimisation in least squares form: given an observed realisation $(x_1^{N}(t),\dots,x_N^{N}(t))$ of the dynamics between N agents, generated by (1.1) governed by $f^a$, we aim at identifying a by minimising

(1.7) \begin{align} \mathcal{E}^{N}(\hat{a}) & \;:=\; \frac{1}{T} \int_0^T \left[ \frac{1}{N} \sum_{i=1}^N \left\|\partial_t x_i^{N}(t)- \frac{1}{N} \sum_{j=1}^N f^{\hat{a}}\left(x_i^{N}(t),x_j^{N}(t)\right) \right\|^2 \right] \textrm{d} t.\end{align}

That is, along the observed trajectories we aim to minimise the discrepancy between observed velocities and those predicted by the model. The functional $\mathcal{E}^{N}$ plays the role of a loss function in the learning or inference task. A time-discrete formulation in the framework of statistical learning theory has been proposed also in [Reference Lu, Maggioni and Tang50, Reference Lu, Zhong, Tang and Maggioni51, Reference Zhong, Miller and Maggioni78]. Under the assumption that a belongs to a suitable compact class X of smooth functions, the main results in [Reference Bongini, Fornasier, Hansen and Maggioni10, Reference Lu, Maggioni and Tang50, Reference Lu, Zhong, Tang and Maggioni51, Reference Zhong, Miller and Maggioni78] establish that minimisers

\begin{equation*}\hat a_N \in \arg\min_{\hat a \in V_N}\mathcal{E}^{N}(\hat{a}),\end{equation*}

converge to a in suitable senses for $N\to \infty$ , where the ansatz spaces $V_N \subset X$ are suitable finite-dimensional spaces of smooth functions (such as finite element spaces).
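For the radial parametrisation $f^a(x,x^{\prime}) = a(|x-x^{\prime}|)\cdot(x^{\prime}-x)$ and a piecewise-constant ansatz space $V_N$, minimising (1.7) is an ordinary linear least-squares problem in the bin values of $\hat{a}$. The following Python sketch illustrates this on synthetic one-dimensional data; the ground truth $a \equiv 1$, the number of bins and all other parameters are hypothetical choices:

```python
import numpy as np

# Hypothetical ground truth: a_true(r) = 1, i.e. f(x, x') = x' - x, for 1D agents.
a_true = lambda r: 1.0

# Synthetic observations: positions and velocities of N agents at T time steps.
rng = np.random.default_rng(0)
N, T, dt = 8, 50, 0.02
x = rng.uniform(-1, 1, N)
xs, vs = [], []
for _ in range(T):
    v = np.array([np.mean([a_true(abs(x[j] - x[i])) * (x[j] - x[i])
                           for j in range(N)]) for i in range(N)])
    xs.append(x.copy()); vs.append(v.copy())
    x = x + dt * v

# Piecewise-constant ansatz for a on M radial bins: the discrepancy (1.7)
# becomes a linear least-squares problem in the bin values.
M, rmax = 5, 2.0
A, b = [], []
for x_t, v_t in zip(xs, vs):
    for i in range(N):
        row = np.zeros(M)
        for j in range(N):
            r = abs(x_t[j] - x_t[i])
            k = min(int(r / rmax * M), M - 1)
            row[k] += (x_t[j] - x_t[i]) / N
        A.append(row); b.append(v_t[i])
a_hat, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
```

Since the constant ground truth lies in the ansatz space, the least-squares fit is exact, and the recovered bin values agree with $a \equiv 1$ on every bin populated by observed pairwise distances.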

1.3 Contribution and outline

The main scope of this paper is to provide a theoretical and practical framework to perform learning/inference of spatially inhomogeneous evolutionary games, so that these models could be used in real-life data-driven applications. First, we discuss some potential issues with model (1.3) and provide some adaptations. We further propose and study in Section 3 several learning functionals for inferring the payoff function J from the observation of the dynamics, that is, extending the approach of [Reference Bongini, Fornasier, Hansen and Maggioni10] to our modified version of [Reference Ambrosio, Fornasier, Morandotti and Savaré6]. Let us detail our contributions.

The proposed changes to the model (1.3) are as follows:

  • In Section 2.1, we add entropic regularisation to the dynamics for the mixed strategies of the agents. This avoids degeneracy of the strategies and allows faster reactions to changes in the environment, see Figure 1. Entropic regularisation of games was also considered in [Reference Flåm and Cavazzuti34]. We show that the adapted model is well-posed and has a consistent mean-field limit (Theorem 2.6).

    Figure 1. Comparison of original game dynamics and entropic regularisation (possibly with accelerated time scale for the strategy dynamics). For details, see Examples 2.1 and 2.2.

  • For interacting agents, the assumption that the mixed strategies of other agents are fully known may often be unrealistic. Therefore, we will focus our analysis on the case where $J(x, u, x^{\prime}, u^{\prime})$ does not explicitly depend on $u^{\prime}$, the ‘other’ agent’s pure strategy. We refer to this as the undisclosed setting. (The general case is still considered to some extent.)

For this undisclosed setting, we study the quasi-static fast-reaction limit of (1.3), where the dynamics of the mixed strategies $(\sigma_i)_{i=1}^N$ run at a much faster time scale than the dynamics of the locations $(x_i)_{i=1}^N$, that is, agents quickly adjust their strategies to changes in the environment. This will also be important for designing practical inference functionals (see below). The undisclosed fast-reaction model is introduced in Section 2.2. Well-posedness and a consistent mean-field limit are proved in Theorem 2.13; convergence to the fast-reaction limit or quasi-static evolution as the strategy time scale becomes faster is given by Theorem 2.14.

We claim that the resulting undisclosed fast-reaction entropic model is a useful alternative to the Newtonian model (1.1), and we support these considerations with theoretical and numerical examples. In particular, we show that any Newtonian model can be described (approximately) as an undisclosed fast-reaction entropic model, whereas the converse is not true (Examples 2.7 and 2.8).

We then discuss several inference functionals for the modified game model in Section 3. We start with a rather direct analogue of (1.7) (Section 3.2), which would require not only the observation of the spatial locations $(x_i)_{i=1}^N$ and velocities $(\partial_t x_i)_{i=1}^N$ but also of the mixed strategies $(\sigma_i)_{i=1}^N$ and their temporal derivatives $(\partial_t \sigma_i)_{i=1}^N$. It is doubtful whether the latter two can be observed in practice with sufficient accuracy. Therefore, as an alternative, we propose two inference functionals for the undisclosed fast-reaction setting. A functional based on observed spatial velocities is given in Section 3.3.1. A functional based on mixed strategies (but not their derivatives) is proposed in Section 3.3.2, and we discuss some options for how the required data could be obtained in practice. In Section 3.3.3, some properties of these functionals are established, such as existence of minimisers and consistency of their mean-field versions.

Numerical examples are given in Section 4. These include the inference on examples where observations were generated with a true underlying undisclosed fast-reaction entropic model, as well as inference of Newtonian models and of a model for pedestrian motion adapted from [Reference Bailo, Carrillo and Degond7, Reference Degond, Appert-Rolland, Moussaïd, Pettré and Theraulaz32]. We close with a brief discussion in Section 5. Some longer technical proofs have been moved to Appendix A. Before presenting the new model, we collect some notation in the next subsection.

1.4 Setting and notation

General setting. Let $(Y,d_Y)$ be a complete and separable metric space. We denote by $\mathcal{M}(Y)$ the space of signed Borel measures with finite total variation and by $\mathcal{M}_1(Y)$ the set of probability measures with bounded first moment, that is,

\begin{equation*}\mathcal{M}_1(Y) = \left\{ \mu \in \mathcal{M}(Y) \mid \mu\geq 0, \mu(Y) = 1, \int_Y d_Y(y,\bar{y})\textrm{d}\mu(y) < \infty \text{ for some }\bar{y} \in Y \right\}.\end{equation*}

For a continuous function $\varphi \in C(Y),$ we denote by

\begin{equation*}\textrm{Lip}(\varphi) \;:=\; \sup_{\substack{x,y \in Y\\x\neq y}} \frac{|\varphi(x)-\varphi(y)|}{d_Y(x,y)}\end{equation*}

its Lipschitz constant. The set of bounded Lipschitz functions is then denoted by $\textrm{Lip}_b(Y) = \{ \varphi \in C(Y) \mid \|\varphi\|_\infty + \textrm{Lip}(\varphi) < \infty \}$ , with $\|\cdot\|_\infty$ the classical sup-norm.

For $\mu_1,\mu_2 \in \mathcal{M}_1(Y)$ , the 1-Wasserstein distance $W_1(\mu_1,\mu_2)$ is defined by

(1.8) \begin{equation}W_1(\mu_1,\mu_2) \;:=\; \inf \left\{ \int_{Y\times Y} d_Y(y_1,y_2) \textrm{d}\gamma(y_1,y_2) \mid \gamma \in \Gamma(\mu_1,\mu_2) \right\},\end{equation}

where $\Gamma(\mu_1,\mu_2)$ is the set of admissible couplings between $\mu_1$ and $\mu_2$, that is, $\Gamma(\mu_1,\mu_2) = \{ \gamma \in \mathcal{M}_1(Y \times Y) \mid \gamma(A \times Y) = \mu_1(A) \text{ and } \gamma(Y \times B) = \mu_2(B) \text{ for all Borel } A, B \subseteq Y\}$. Due to Kantorovitch duality, one can also consider the equivalent definition

(1.9) \begin{equation}W_1(\mu_1,\mu_2) \;:=\; \sup \left\{ \int_{Y} \varphi\,\textrm{d}(\mu_1-\mu_2) \mid \varphi \in \textrm{Lip}_b(Y), \textrm{Lip}(\varphi) \leq 1 \right\}.\end{equation}

The metric space $(\mathcal{M}_1(Y), W_1)$ is complete because $(Y, d_Y)$ is complete, and a sequence $(\mu_n)_{n \in \mathbb{N}} \subset \mathcal{M}_1(Y)$ converges to $\mu \in \mathcal{M}_1(Y)$ with respect to the distance $W_1$ if and only if $\mu_n$ converges to $\mu$ in duality with bounded Lipschitz functions and the first moments converge, that is, $\int_{Y} d_Y(\cdot,\bar{y})\,\textrm{d}\mu_n \to \int_{Y} d_Y(\cdot,\bar{y})\,\textrm{d}\mu$ for all $\bar{y} \in Y$ .
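On the real line, the optimal coupling in (1.8) is the monotone one, so for two empirical measures with the same number of atoms $W_1$ reduces to matching sorted samples. A small illustrative helper (our own sketch of this standard reduction):

```python
import numpy as np

def w1_empirical(a, b):
    """1-Wasserstein distance between two empirical measures on the real line
    with equally many atoms: the optimal coupling in 1D matches sorted samples."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    assert a.shape == b.shape, "equal numbers of atoms assumed"
    return float(np.mean(np.abs(a - b)))

# Transporting a unit point mass from 0 to 3 costs exactly the distance 3.
print(w1_empirical([0.0], [3.0]))  # 3.0
```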

Interaction setting. We fix the space of pure strategies U to be a compact metric space with distance $\textrm{d}_U$. Each agent’s mixed strategy is then described by $\sigma \in \mathcal{M}_1(U)$. Agents move in $\mathbb{R}^d$, and we denote by $\|\cdot\|$ the usual Euclidean norm. For $R > 0$, $B^d({R})$ is the open ball of radius R centred at the origin in $\mathbb{R}^d$. For $N \in \mathbb{N}$ and $\mathbf{x} = \left(x_1,\dots,x_N\right) \in [\mathbb{R}^d]^N$, we introduce the scaled norm $\|\cdot\|_N$ defined as

\begin{equation*}\|\mathbf{x}\|_N = \frac1N \sum_{i=1}^N \|x_i\|.\end{equation*}

For $\sigma_1,\sigma_2 \in \mathcal{M}_1(U),$ we set the Kullback–Leibler divergence to be

\begin{equation*}\textrm{KL}(\sigma_1|\sigma_2) \;:=\; \begin{cases}\int_U \log \left( \frac{\textrm{d}\sigma_1}{\textrm{d}\sigma_2} \right)\!\textrm{d}\sigma_1 & \text{if } \sigma_1 \ll \sigma_2, \\[4pt]+ \infty & \text{else,}\end{cases}\end{equation*}

where $\textrm{d}\sigma_1/\textrm{d}\sigma_2$ is the Radon–Nikodym derivative of $\sigma_1$ with respect to $\sigma_2$ .
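For measures supported on finitely many points, $\textrm{d}\sigma_1/\textrm{d}\sigma_2$ is a pointwise ratio of probability vectors and the divergence can be evaluated directly. A small illustrative helper (with the usual convention $0 \log 0 = 0$):

```python
import numpy as np

def kl(sigma1, sigma2):
    """KL(sigma1 | sigma2) for discrete probability vectors; returns +inf when
    sigma1 is not absolutely continuous with respect to sigma2."""
    s1, s2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    if np.any((s2 == 0) & (s1 > 0)):
        return np.inf
    mask = s1 > 0  # 0 * log(0) = 0 by convention
    return float(np.sum(s1[mask] * np.log(s1[mask] / s2[mask])))

print(kl([0.5, 0.5], [0.5, 0.5]))  # 0.0
```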

Throughout the paper, we assume $e \in \textrm{Lip}_b(\mathbb{R}^d \times U; \mathbb{R}^d)$ to be fixed and known. Such a function maps each pure strategy $u \in U$ into a given velocity $e(\cdot,u) \in \mathbb{R}^d$ . We may think of e as a label or dictionary for strategies. The sets of admissible interaction rules J are described by

\begin{equation*}\mathcal{X} = \textrm{Lip}_b\!\left(\mathbb{R}^d \times U \times \mathbb{R}^d \times U\right) \quad \text{and} \quad X = \textrm{Lip}_b\!\left(\mathbb{R}^d \times U \times \mathbb{R}^d\right).\end{equation*}

Here, $\mathcal{X}$ consists of all possible payoff functions, modelling the case where each agent has a complete knowledge of both positions and mixed strategies of the other agents. On the other hand, in what we call the undisclosed setting, each agent only knows the physical locations of the others, and thus in this context, the payoff function J will no longer depend on the other’s strategy, making it possible to restrict the analysis to X.

2 From spatially inhomogeneous evolutionary games to undisclosed fast-reaction dynamics

We describe in this section how we adapt the spatially inhomogeneous evolutionary games model (1.3) of [Reference Ambrosio, Fornasier, Morandotti and Savaré6] to make it more suitable in practice for the task of modelling and inferring interaction rules between rational agents. In Section 2.1, we add entropic regularisation to avoid degeneracy of the agents’ mixed strategies. In Section 2.2, we focus on the more realistic undisclosed setting, where agents are not aware of other agents’ strategy choices and derive a fast-reaction limit, describing the regime where choice of the strategies happens at a much faster time scale than physical movement of the agents.

2.1 Entropic regularisation for spatially inhomogeneous evolutionary games

In the model (1.3), mixed strategies $\sigma_i$ are prone to converging exponentially to very singular distributions so that agents cannot react quickly to changes in the environment and exhibit strong inertia, contradicting the notion of rational agents. This behaviour can be illustrated with a simple example.

Example 2.1 (Slow reaction in spatially inhomogeneous evolutionary games). Let $d=1$, $U=\{-1,+1\}$, $e(x,u)=u$, so pure strategies correspond to moving left or right with unit speed. Let now $J\!\left(x,u,x^{\prime},u^{\prime}\right)=-x \cdot u$, that is, agents prefer moving towards the origin, independently of the other agents’ locations and actions. For simplicity, we can set $N=1$ and choose $x(0)=2$, $\sigma(0)=(0.5,0.5)$, where we identify measures on $U=\{-1,+1\}$ with vectors of the simplex in $\mathbb{R}^2$. At $t=0$, the agent perceives strategy $-1$ as more attractive. The mixed strategy $\sigma$ starts pivoting towards this pure strategy and the agent starts moving towards the origin. The numerically simulated trajectory for the original model (1.3) is shown in Figure 1 (corresponding to the line $\varepsilon = 0$, $\lambda = 1$). We can see how $\sigma$ rapidly converges to the state (0,1). Thus, when the agent reaches the origin, it cannot react quickly and there is considerable overshoot.

As a remedy, we propose an adaptation of the model (1.3) by adding an entropic term to the dynamics of the mixed strategies $\sigma_i$ . The update rule (1.3b) modifies $\sigma_i$ towards maximising the average benefit

\begin{equation*} \frac{1}{N} \sum_{j=1}^N \int_{U \times U} J(x_i,u,x_j,u^{\prime})\,\textrm{d} \sigma_i(u)\,\textrm{d} \sigma_j\!\left(u^{\prime}\right).\end{equation*}

To this objective, we will now subtract the relative entropy $\varepsilon\cdot \int_U \log\left(\frac{\textrm{d} {\sigma_i}}{\textrm{d} {\eta}}\right)\!\textrm{d} \sigma_i$, where $\eta \in \mathcal{M}_1(U)$ is a suitable ‘uniform’ reference measure and $\varepsilon>0$ is a weighting parameter. This will pull $\sigma_i$ towards $\eta$ and thus prevent exponential degeneration.

Entropic regularisation. The entropic regularisation implies that mixed strategies are no longer general probability measures over U, but they become densities with respect to a reference measure $\eta \in \mathcal{M}_1(U)$ , which we can assume without loss of generality to have full support, that is, $\textrm{spt}(\eta) = U$ . Thus, for a fixed $\varepsilon > 0$ , set

(2.1) \begin{align} \mathcal{S}_{a,b} \;:=\; \left\{ \sigma\colon U \to \mathbb{R}_+, \;\sigma \text{ measurable,} \;\sigma(u) \in [a,b]\text{ } \eta\text{-a.e.},\; \int_U \sigma\,\textrm{d} \eta = 1 \right\},\end{align}

where $0 < a < 1 < b < \infty$ (the bounds a and b are required for technical reasons, see below). For $\lambda > 0$ , the modified entropic dynamics is then given by

(2.2a) \begin{align} \partial_t x_i(t) & = \int_U e(x_i(t),u)\, \sigma_i(t)(u)\,\textrm{d} \eta(u),\qquad\qquad\qquad\qquad\qquad \end{align}
(2.2b) \begin{align} \partial_t \sigma_i(t) & = \lambda \cdot \left[ \frac{1}{N} \sum_{j=1}^N f^J\!\big(x_i(t),\sigma_i(t),x_j(t),\sigma_j(t)\big) +f^\varepsilon\big(\sigma_i(t)\big) \right],\end{align}

where the function $f^J$ now formally needs to be given in terms of densities, instead of measures:

(2.3) \begin{align} & f^J\!\big(x,\sigma,x^{\prime},\sigma^{\prime}\big) \;:=\;\nonumber\\[4pt]& \quad\left[\int_U J(x,\cdot,x^{\prime},u^{\prime})\,\sigma^{\prime}\left(u^{\prime}\right)\!\textrm{d} \eta\left(u^{\prime}\right) - \int_U \int_U J(x,v,x^{\prime},u^{\prime})\,\sigma^{\prime}\left(u^{\prime}\right)\,\sigma(v)\,\textrm{d} \eta\left(u^{\prime}\right)\!\textrm{d} \eta(v) \right] \cdot \sigma.\end{align}

The additional term $f^\varepsilon$ , corresponding to entropy regularisation, is given by:

(2.4) \begin{align} f^\varepsilon(\sigma) & \;:=\; \varepsilon \cdot \left[ -\log(\sigma(\cdot)) + \int_U \log(\sigma(v))\,\sigma(v)\,\textrm{d} \eta(v) \right] \cdot \sigma.\end{align}

In (2.2b), we have explicitly added the factor $\lambda \in (0,\infty)$ to control the relative time scale of the dynamics for mixed strategies and locations.

Example 2.2 (Faster reactions with entropic regularisation). We repeat here Example 2.1 with added entropy. We set $\eta=(0.5,0.5)$ , keeping the symmetry between strategies $+1$ and $-1$ in the regularised system. Numerically simulated trajectories for this setup with different choices of $\lambda$ and $\varepsilon$ are shown in Figure 1. For $(\lambda=1,\varepsilon=0.5)$ , the strategy does not get as close to (0,1) as for $(\lambda=1,\varepsilon=0)$ . Therefore, upon crossing the origin the agent can react more quickly, there is less overshoot and the trajectory eventually converges to 0. Finally, for $(\lambda=10,\varepsilon=0.5)$ the agent moves quickly and without overshoot to the origin, rapidly adapting its mixed strategy.
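The behaviour described in Examples 2.1 and 2.2 can be reproduced in a few lines. The following Python sketch (our own illustration, not part of the original model specification) integrates the single-agent dynamics with an explicit Euler scheme, writing the mixed strategy as the probability $p = \sigma(+1)$ of moving right and taking $\eta=(0.5,0.5)$ :

```python
import numpy as np

def simulate(lam, eps, T=6.0, dt=1e-3, x0=2.0, p0=0.5):
    """Single agent, U = {-1,+1}, e(x,u) = u, J(x,u) = -x*u, eta = (1/2,1/2).
    p is the probability mass on strategy +1 ('move right')."""
    x, p = x0, p0
    traj = [x]
    for _ in range(int(T / dt)):
        J_plus = -x                             # payoff of strategy +1
        J_mean = -x * p + x * (1 - p)           # mean payoff under sigma
        dp = p * (J_plus - J_mean)              # replicator part of (2.2b)
        if eps > 0:                             # entropic part (2.4); densities are 2p, 2(1-p)
            H = p * np.log(2 * p) + (1 - p) * np.log(2 * (1 - p))
            dp += eps * p * (-np.log(2 * p) + H)
        p = float(np.clip(p + dt * lam * dp, 1e-12, 1 - 1e-12))
        x += dt * (2 * p - 1)                   # velocity = p - (1 - p)
        traj.append(x)
    return np.array(traj)
```

Running `simulate(1.0, 0.0)` reproduces the considerable overshoot of the unregularised model, while `simulate(10.0, 0.5)` converges to the origin without overshoot, matching the behaviour described above.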

Remark 2.3 (Entropic gradient flow). Formally, the contribution of $f^\varepsilon$ to the dynamics (2.2b) corresponds to a gradient flow in the Hellinger–Kakutani metric over $\mathcal{S}_{a,b}$ of the (negative) entropy

(2.5) \begin{align} H(\sigma) \;:=\; \int_U \log(\sigma)\,\sigma\,\textrm{d} \eta\end{align}

of the density $\sigma$ with respect to the reference measure $\eta$ .

Remark 2.4 (Multiple agent species). In the interaction models mentioned so far ((1.1), (1.3), (2.2)), all agents are of the same species (i.e. their movement is specified by the same functions). Often one wishes to model the interaction between agents of multiple species (e.g. predators and prey). The latter case can formally be subsumed into the former by expanding the physical space from $\mathbb{R}^d$ to $\mathbb{R}^{1+d}$ and using the first spatial dimension as ‘species label’: $\hat{x}_i=(n_i,x_i) \in \mathbb{R}^{1+d}$ then describes an agent of species $n_i \in \mathbb{Z}$ at position $x_i \in \mathbb{R}^d$ . The interaction force $\hat{f} \,:\, \mathbb{R}^{1+d} \times \mathbb{R}^{1+d} \to \mathbb{R}^{1+d}$ in the expanded version of (1.1) is then given by $\hat{f}\big((n,x),(n^{\prime},x^{\prime})\big) \;:=\; \big(0,f_{n,n^{\prime}}(x,x^{\prime})\big)$ where $f_{n,n^{\prime}} \,:\, \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ is the interaction function for the species pair $(n,n^{\prime})$ , and the first component of $\hat{f}$ is set to zero, such that the species of each agent remains unchanged. In an entirely analogous fashion, the functions J and e in models (1.3) and (2.2) can be extended.
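This species-label construction is mechanical and can be sketched as follows (a minimal Python illustration of our own; the per-pair force table and its keys are hypothetical):

```python
import numpy as np

def lift(f_table):
    """Wrap per-species-pair forces f_table[(n, n_prime)] into a single
    interaction on the expanded space R^{1+d}, species label first."""
    def f_hat(y, y_prime):
        n, x = int(y[0]), y[1:]
        n_p, x_p = int(y_prime[0]), y_prime[1:]
        force = f_table[(n, n_p)](x, x_p)
        # first component is zero: the species label never changes
        return np.concatenate(([0.0], np.asarray(force, dtype=float)))
    return f_hat
```

For example, with `f_table = {(0, 1): lambda x, xp: xp - x, ...}` an agent of species 0 is attracted to agents of species 1, while the label component of the returned force is always zero.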

Well-posedness and mean-field limit of the regularised system. Following [Reference Ambrosio, Fornasier, Morandotti and Savaré6] (see (1.5)), we analyse (2.2) as an interaction system in the Banach space $Y = \mathbb{R}^d \times L^p_\eta(U)$ , $1 \leq p < \infty$ . For $y = (x,\sigma) \in Y$ we set $\|y\|_Y = \|x\| + \|\sigma\|_{L^p_\eta(U)}$ . Define

(2.6) \begin{align} \mathcal{C}_{a,b} \;:=\; \mathbb{R}^d \times \mathcal{S}_{a,b},\end{align}

and for $y, y^{\prime} \in \mathcal{C}_{a,b}$ let us set

(2.7) \begin{align}f \colon \mathcal{C}_{a,b} \times \mathcal{C}_{a,b} \to Y, \quad f\!\left(y,y^{\prime}\right) = \left[f^{e}(x,\sigma), \lambda f^{J,\varepsilon}\left(x,\sigma,x^{\prime},\sigma^{\prime}\right)\right],\end{align}


where

(2.8) \begin{align}f^{e}(x,\sigma) = \int_U e(x,u)\, \sigma(u)\,\textrm{d} \eta(u) \quad \text{and} \quad f^{J,\varepsilon}\!\!\left(x,\sigma,x^{\prime},\sigma^{\prime}\right) = f^J\!\big(x,\sigma,x^{\prime},\sigma^{\prime}\big) +f^\varepsilon(\sigma),\end{align}

so that (2.2) takes the equivalent form

(2.9) \begin{equation}\partial_t y_i(t) = \frac{1}{N} \sum_{j=1}^N f(y_i(t), y_j(t)) \quad \text{ for }i = 1,\dots, N.\end{equation}

Similar to [Reference Ambrosio, Fornasier, Morandotti and Savaré6], well-posedness of (2.9) relies on the Lipschitz continuity of f and on the following compatibility condition (for the corresponding proofs see Appendix A, Lemmas A.1, A.2 and A.4).

Lemma 2.5 (Compatibility condition). For $J \in \mathcal{X}$ and $\varepsilon > 0$ , let $f^J$ and $f^\varepsilon$ be defined as in (2.3) and (2.4). Then, there exist a,b with $0 < a < 1 < b < \infty$ such that for any $(x,\sigma),(x^{\prime},\sigma^{\prime}) \in \mathbb{R}^d \times \mathcal{S}_{a,b}$ there exists some $\theta>0$ such that

(2.10) \begin{align} \sigma + \theta \lambda \!\left[f^J\!\big(x,\sigma,x^{\prime},\sigma^{\prime}\big) +f^\varepsilon(\sigma) \right] \in \mathcal{S}_{a,b}. \end{align}

Intuitively, for specific choices of the bounds a and b, (2.10) states that moving from $\sigma \in \mathcal{S}_{a,b}$ into the direction generated by $f^J$ and $f^\varepsilon$ , we will always remain within $\mathcal{S}_{a,b}$ for some finite time. Eventually, similar to (1.6), a mean-field limit description of (2.9) is formally given by

(2.11) \begin{align} \partial_t \Sigma(t) + \textrm{Div}\big(b(\Sigma(t)) \cdot \Sigma(t)\big) = 0 \quad \text{with} \quad b(\Sigma(t))(y) \;:=\; \int_{\mathcal{C}_{a,b}}\!\! f\!\left(y,y^{\prime}\right)\!\textrm{d}\Sigma(t)(y^{\prime}).\end{align}

Like (1.6), this is a PDE whose domain is a Banach space and we refer to [Reference Ambrosio, Fornasier, Morandotti and Savaré6] for the technical details. We can then summarise the main result in the following theorem.

Theorem 2.6 (Well-posedness and mean-field limit of entropic model). Let $J \in \mathcal{X}$ , $\lambda, \varepsilon > 0$ , $T<+\infty$ . Let $0 < a < 1 < b < \infty$ in accordance with Lemma 2.5. Then:

  1. 1. Given $\bar{\mathbf{y}}^N = \left(\bar{y}_1^N,\dots, \bar{y}_N^N\right) \in \mathcal{C}_{a,b}^N$ there exists a unique trajectory $\mathbf{y}^N=(y_1^N,\ldots,y_N^N) \colon [0,T] \to \mathcal{C}_{a,b}^N$ of class $C^1$ solving (2.9) with $\mathbf{y}^N(0) = \bar{\mathbf{y}}^N$ . In particular, $\Sigma^N(t) := \tfrac{1}{N} \sum_{i=1}^N \delta_{y_i^N(t)}$ provides a solution of (2.11) for initial condition $\bar{\Sigma}^N := \tfrac{1}{N} \sum_{i=1}^N \delta_{\bar{y}_i^N}$ .

  2. 2. Given $\bar{\Sigma} \in \mathcal{M}_1(\mathcal{C}_{a,b}),$ there exists a unique $\Sigma \in C([0,T]; (\mathcal{M}_1(\mathcal{C}_{a,b}), W_1) )$ satisfying in the weak sense the continuity equation (2.11) with initial condition $\Sigma(0) = \bar{\Sigma}$ .

  3. 3. For initial conditions $\bar{\Sigma}_1, \bar{\Sigma}_2 \in \mathcal{M}_1(\mathcal{C}_{a,b})$ and the respective solutions $\Sigma_1$ and $\Sigma_2$ of (2.11) one has the stability estimate

    (2.12) \begin{align} W_1(\Sigma_1(t),\Sigma_2(t)) \leq \exp\big(2\textrm{Lip}(f)\,(t-s)\big) \cdot W_1(\Sigma_1(s),\Sigma_2(s)) \end{align}
    for every $0 \leq s \leq t \leq T$ .

Proof. The theorem follows by invoking Theorem 4.1 from [Reference Ambrosio, Fornasier, Morandotti and Savaré6]. On a more technical level, Theorem 5.3 of [Reference Ambrosio, Fornasier, Morandotti and Savaré6] provides the uniqueness of the solution to (2.11) in a Eulerian sense. We now show that the respective requirements are met.

First, the set $\mathcal{C}_{a,b}$ , (2.6), is a closed convex subset of Y with respect to $\|\cdot\|_Y$ for any $0 < a < b < +\infty$ . Likewise, for any $0 < a < b < +\infty$ the map f driving the evolution (2.9) is Lipschitz continuous: indeed, one can prove that $f^J$ and $f^\varepsilon$ are Lipschitz continuous (see Lemmas A.1 and A.2 in Appendix A) and Lipschitz continuity of $f^e$ follows from the fact that e is Lipschitz continuous and bounded. Furthermore, as a consequence of Lemma 2.5, one can choose a and b so that the extended compatibility condition

\begin{align*} \forall\,R > 0 \, \exists \, \theta > 0\,:\, \quad y,y^{\prime} \in \mathcal{C}_{a,b} \cap B_R(0) \quad \Rightarrow \quad y + \theta\,f\!\left(y,y^{\prime}\right) \in \mathcal{C}_{a,b} \end{align*}

holds, cf. [Reference Ambrosio, Fornasier, Morandotti and Savaré6, Theorem B.1]. Therefore, we may invoke [Reference Ambrosio, Fornasier, Morandotti and Savaré6, Theorem 4.1]. Combining this with [Reference Ambrosio, Fornasier, Morandotti and Savaré6, Theorem 5.3], which is applicable since $L^p_\eta(U)$ is separable and thus so is Y, the result follows.

2.2 Undisclosed setting and fast reactions

In the model (2.2), the decision process of each agent potentially involves knowledge of the mixed strategies of the other agents. Often it is reasonable to assume that there is no such knowledge, which can be reflected in the model by assuming that the payoff $J(x,u,x^{\prime},u^{\prime})$ does not actually depend on $u^{\prime}$ , the other agent’s strategy. Note that $J(x,u,x^{\prime},u^{\prime})$ may still depend on $x^{\prime}$ , the other agent’s location. We call this the undisclosed setting.

Additionally, often it is plausible to assume that the (regularised) dynamic that governs the mixed strategies $(\sigma_i(t))_{i=1}^{N}$ of the agents runs at a much faster time scale compared to the physical motion of the spatial locations $(x_i(t))_{i=1}^N$ . This corresponds to a large value of $\lambda$ in (2.2). Therefore, for the undisclosed setting we study the fast-reaction limit $\lambda \to \infty$ .

The main results of this section are Theorems 2.13 and 2.14 which establish that the undisclosed fast-reaction limit is in itself a well-defined model (with a consistent mean-field limit as $N \to \infty$ ) and that the undisclosed model converges to this limit as $\lambda \to \infty$ .

Undisclosed setting. In the undisclosed setting, the general formulas (2.2) for the dynamics can be simplified as follows:

(2.13a) \begin{align} \partial_t x_i(t) & = \int_U e(x_i(t),u)\, \sigma_i(t)(u)\,\textrm{d} \eta(u),\qquad\qquad\ \ \qquad \end{align}
(2.13b) \begin{align} \partial_t \sigma_i(t) & = \lambda \cdot \left[ \frac{1}{N} \sum_{j=1}^N f^J\!\big(x_i(t),\sigma_i(t),x_j(t)\big) +f^\varepsilon\big(\sigma_i(t)\big) \right] \end{align}

where $f^\varepsilon$ is as given in (2.4) and $f^J$ simplifies to

(2.14) \begin{align} f^J\!\big(x,\sigma,x^{\prime}\big) \;:=\; \left[ J(x,\cdot,x^{\prime}) - \int_U J(x,v,x^{\prime})\,\sigma(v)\,\textrm{d} \eta(v) \right] \cdot \sigma.\end{align}

For finite $\lambda < +\infty,$ the dynamics of this model are still covered by Theorem 2.6.

Fast reactions of the agents. Intuitively, as $\lambda \to \infty$ in (2.13), at any given time t the mixed strategies $(\sigma_i(t))_{i=1}^N$ will be in the unique steady state of the dynamics (2.13b) for fixed spatial locations $(x_i(t))_{i=1}^N$ . For given locations $\mathbf{x} = (x_1,\ldots,x_N) \in [\mathbb{R}^d]^N$ this steady state is given by

(2.15a) \begin{align} \sigma_{i}^J(\mathbf{x}) &\equiv \sigma_{i}^J(x_1,\ldots,x_N) \;:=\; \frac{\exp\left(\dfrac{1}{\varepsilon N}\sum_{j=1}^N J\!\left(x_i,\cdot,x_j\right)\right)}{ \int_U \exp\left(\dfrac{1}{\varepsilon N}\sum_{j=1}^N J\!\left(x_i,v,x_j\right)\right)\!\textrm{d} \eta(v) }. \end{align}

(This computation is explicitly shown in the proof of Theorem 2.14.) The spatial agent velocities associated with this steady state are given by

(2.15b) \begin{align} v_{i}^J(\mathbf{x}) &\equiv v_i^J\!\left(x_1,\dots,x_N\right) \;:=\; \int_U e(x_i,u)\, \sigma_{i}^J(x_1,\ldots,x_N)(u)\,\textrm{d} \eta(u)\end{align}

and system (2.13) turns into a purely spatial ODE in the form

(2.15c) \begin{align} \partial_t x_i(t) &= v_i^J(x_1(t),\dots,x_N(t)) \quad \text{for } i = 1,\dots,N.\end{align}

Unlike in Newtonian models over $\mathbb{R}^d$ , (1.1), here the driving velocity field $v_i^J(x_1(t),\ldots,x_N(t))$ is not a linear superposition of the contributions by each $x_j(t)$ . This nonlinearity is a consequence of the fast-reaction limit and allows for additional descriptive power of the model that cannot be captured by the Newton-type model (1.1). This is illustrated in the two subsequent Examples 2.7 and 2.8.
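For concreteness, the equilibrium formulas (2.15a)–(2.15b) are straightforward to evaluate numerically once the strategy set is discretised: the equilibrium strategy is a softmax (Gibbs) distribution of the averaged payoffs. The following Python sketch (our own illustration; $d=1$ , uniform reference measure $\eta$ on the grid `u_grid`) computes the strategies and velocities and takes one explicit Euler step of (2.15c):

```python
import numpy as np

def fast_reaction(x, J, e, u_grid, eps):
    """Equilibrium strategies (2.15a) and velocities (2.15b) for d = 1.
    sigma[i] holds probability weights w.r.t. the uniform measure on u_grid."""
    N = len(x)
    sigma = np.zeros((N, len(u_grid)))
    v = np.zeros(N)
    for i in range(N):
        # average payoff of each pure strategy u against all agents j
        avg = np.mean([J(x[i], u_grid, x[j]) for j in range(N)], axis=0)
        w = np.exp((avg - avg.max()) / eps)        # Gibbs weights, overflow-safe
        sigma[i] = w / w.sum()
        v[i] = np.sum(e(x[i], u_grid) * sigma[i])  # expected velocity
    return sigma, v

def euler_step(x, J, e, u_grid, eps, dt):
    _, v = fast_reaction(x, J, e, u_grid, eps)
    return x + dt * v
```

With the payoff of Example 2.1, `J = lambda x, u, xp: -x * u` and `e = lambda x, u: u` , a single agent at $x=2$ obtains a strategy concentrated on $u=-1$ and moves towards the origin.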

Example 2.7 (Describing Newtonian models as undisclosed fast-reaction models). Newtonian models can be approximated by undisclosed fast-reaction entropic game models. We give a sketch for this approximation procedure. For a model as in (1.1), choose a compact set $U \subset \mathbb{R}^d$ that contains the range of f, let $e(x,u) \;:=\; u$ and then set

\begin{equation*}J\!\left(x,u,x^{\prime}\right) \;:=\; -\|u-f\!\left(x,x^{\prime}\right)\!\|^2.\end{equation*}

Accordingly, the stationary mixed strategy of agent i is given by

\begin{align*}\sigma_{i}^J(\mathbf{x})(u) & = \mathcal{N}\left({\exp\left(-\frac{1}{\varepsilon N}\sum_{j=1}^N \|u-f\!\left(x_i,x_j\right)\!\|^2\right)}\right),\end{align*}

where $\mathcal{N}\left({\cdot}\right)$ denotes the normalisation operator for a density with respect to $\eta$ . Observe now that

\begin{align*}\frac{1}{N} \sum_{j=1}^N \|u-f\!\left(x_i,x_j\right)\!\|^2 &= \|u\|^2 - \frac2N \sum_{j=1}^N u \cdot f\!\left(x_i,x_j\right) + \frac1N \sum_{j=1}^N \|{\kern1pt}f\!\left(x_i,x_j\right)\!\|^2\end{align*}


and

\begin{align*}\left\|u-\frac{1}{N}\sum_{j=1}^N f\!\left(x_i,x_j\right)\right\|^2 &= \|u\|^2 - \frac2N \sum_{j=1}^N u \cdot f\!\left(x_i,x_j\right) + \frac{1}{N^2}\!\left\| \sum_{j=1}^N f\!\left(x_i,x_j\right)\right\|^2.\end{align*}

Hence, since the terms not depending on u are cancelled by the normalisation, we also have

\begin{align*}\sigma_{i}^J(\mathbf{x})(u) & = \mathcal{N}\left({\exp\left(-\frac{1}{\varepsilon}\left\|u-\frac{1}{N}\sum_{j=1}^N f(x_i,x_j)\right\|^2\right)}\right).\end{align*}

So the maximum of the density $\sigma_{i}^J(\mathbf{x})$ is at $\tfrac{1}{N}\sum_{j=1}^N f(x_i,x_j)$ , which is the velocity imposed by the Newtonian model. For small $\varepsilon$ , $\sigma_{i}^J(\mathbf{x})$ will be increasingly concentrated around this point, and hence, the game model will impose a similar dynamic. Numerically, this is demonstrated in Figure 2 for a 1-dimensional Newtonian model as in (1.1) driven by

(2.16) \begin{align}f(x,x^{\prime}) = - x - \frac{\tanh\left(5\left(x^{\prime}-x\right)\right)}{\left(1+\|x^{\prime}-x\|\right)^2}.\end{align}

Figure 2. Approximation of a Newtonian model by an undisclosed fast-reaction entropic game model. Solid lines: original model, dashed lines: approximation. The Newtonian model is driven by (2.16), the approximation procedure is described in Example 2.7. The approximation becomes more accurate as $\varepsilon$ decreases.
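The approximation of Example 2.7 can be checked numerically. The following Python sketch (our own illustration; one spatial dimension, strategy grid chosen wide enough to contain the range of f) integrates the Newtonian model driven by (2.16) and its undisclosed fast-reaction counterpart with $J(x,u,x^{\prime}) = -\|u-f(x,x^{\prime})\|^2$ side by side:

```python
import numpy as np

f = lambda x, xp: -x - np.tanh(5.0 * (xp - x)) / (1.0 + abs(xp - x)) ** 2  # (2.16)

def newton_velocity(x):
    # Newtonian model (1.1): velocity is the average of the pairwise forces
    return np.array([np.mean([f(xi, xj) for xj in x]) for xi in x])

def game_velocity(x, u_grid, eps):
    # fast-reaction velocity with payoff J(x,u,x') = -|u - f(x,x')|^2
    v = np.zeros(len(x))
    for i, xi in enumerate(x):
        avg = np.mean([-(u_grid - f(xi, xj)) ** 2 for xj in x], axis=0)
        w = np.exp((avg - avg.max()) / eps)
        v[i] = np.sum(u_grid * w) / w.sum()
    return v

def run(x0, steps, dt, eps, u_grid):
    xn, xg = x0.copy(), x0.copy()
    for _ in range(steps):
        xn = xn + dt * newton_velocity(xn)
        xg = xg + dt * game_velocity(xg, u_grid, eps)
    return xn, xg
```

For small $\varepsilon$ the two trajectories are nearly indistinguishable, in line with Figure 2.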

Example 2.8 (Undisclosed fast-reaction models are strictly richer than Newtonian models). In the previous example, each agent j tried to persuade agent i to move with velocity $f(x_i,x_j)$ . Deviations from this velocity were penalised in the payoff function with a quadratic function. Minimising the sum of these quadratic penalties, that is, maximising the average payoff, yields the linear average of all $f(x_i,x_j)$ as the optimal compromise. By picking different penalties we may obtain other interesting behaviour that cannot be described by Newtonian systems.

As above, let U be a (sufficiently large) compact subset of $\mathbb{R}^d$ , $e(x,u) \;:=\; u$ . Let $g \,:\, \mathbb{R}^d \to \mathbb{R}$ be a Lipschitz non-negative ‘bump function’ with compact support, that is, g is maximal at 0 with $g(0)=1$ , and $g(u)=0$ for $\|u\|\geq \delta$ for some small $\delta >0$ . Then we set

\begin{equation*}J\!\left(x,u,x^{\prime}\right) \;:=\; \left(C-\|x-x^{\prime}\|\right) \cdot g\left(u-\frac{x^{\prime}-x}{\|x^{\prime}-x\|} \right)-C \cdot g(u),\end{equation*}

where $C>0$ is a suitable, sufficiently large constant. Then, approximately, the function $u \mapsto \frac{1}{N}\sum_j J(x_i,u,x_j)$ is maximal at $u=\frac{x_j-x_i}{\|x_j-x_i\|}$ , where j is the agent closest to i (provided it is closer than C but not closer than $\delta$ ); in particular, the last term in J is added to prevent $u=0$ from being the maximiser. Therefore, approximately, we have created a model where agents are attracted with unit speed by their nearest neighbours, but not by other agents. Such a nonlinear trade-off between the influences of the other agents is not obtainable by a Newtonian model.
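A rough numerical illustration of this nearest-neighbour behaviour (our own sketch; one spatial dimension, a triangular bump g with $\delta = 0.25$ , $C = 5$ , and, since $(x^{\prime}-x)/\|x^{\prime}-x\|$ is undefined for $x^{\prime}=x$ , self-interaction omitted for simplicity):

```python
import numpy as np

delta, C, eps = 0.25, 5.0, 0.05
g = lambda u: np.maximum(0.0, 1.0 - np.abs(u) / delta)   # Lipschitz bump, g(0) = 1

def J(x, u, xp):
    s = np.sign(xp - x)                                   # (x' - x)/|x' - x| in 1D
    return (C - abs(xp - x)) * g(u - s) - C * g(u)

def velocities(x, u_grid):
    v = np.zeros(len(x))
    for i in range(len(x)):
        others = [j for j in range(len(x)) if j != i]     # omit self-interaction
        avg = np.mean([J(x[i], u_grid, x[j]) for j in others], axis=0)
        w = np.exp((avg - avg.max()) / eps)
        v[i] = np.sum(u_grid * w) / w.sum()
    return v
```

For positions `x = [0.0, 1.0, -4.0]` the first agent moves with (almost) unit speed towards its nearest neighbour at $x=1$ , essentially ignoring the pull of the agent at $x=-4$ ; a Newtonian superposition of the two unit pulls would instead produce a compromise velocity.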

Remark 2.9 (Fast-reaction limit in the fully general case). For the fully general setting (i.e. J being also a function of u), the study of the fast-reaction regime is a much harder problem as the stationary state of mixed strategies (for fixed spatial locations) often depends on the initial mixed strategies $(\sigma_i)_{i=1}^N$ , and thus, the fast-reaction limit is a ‘genuine’ quasi-static evolution problem. We refer to [Reference Agostiniani and Rossi2, Reference Scilla and Solombrino66, Reference Scilla and Solombrino67] for derivation and well-posedness results of quasi-static evolutions of critical points of nonconvex energies on the Euclidean space by means of vanishing-viscosity methods. A similar analysis is in the course of development for nonconvex energies on Hilbert spaces [Reference Agostiniani, Rossi and Savaré1]. Data-driven evolutions of critical points have been considered in [Reference Almi, Fornasier and Huber4].

Well-posedness and mean-field limit of the undisclosed fast-reaction system. Similar to the full model (2.2), we are also interested in a mean-field limit of the undisclosed fast-reaction system as the number of agents tends to infinity, $N \to \infty$ . A formal limiting procedure leads to the equation

(2.17) \begin{equation} \partial_t \mu(t) +\textrm{div} \left( v^J(\mu(t))\cdot \mu(t) \right) = 0\end{equation}

where for $x \in \mathbb{R}^d$ and $\nu \in \mathcal{M}_1(\mathbb{R}^d)$ we set

(2.18) \begin{align} v^J(\nu)(x) & \;:=\; \int_U e(x,u)\sigma^J(\nu)(x, u)\,d\eta(u) \qquad\qquad\,\,\end{align}
(2.19) \begin{align} \sigma^J(\nu)(x,\cdot) & \;:=\; \dfrac{\exp\left(\dfrac{1}{\varepsilon}\int_{\mathbb{R}^d} J(x,\cdot,x^{\prime})\,d\nu(x^{\prime})\right)}{ \int_U \exp\left(\dfrac{1}{\varepsilon}\int_{\mathbb{R}^d} J(x,v,x^{\prime})\,d\nu(x^{\prime})\right)\!\textrm{d} \eta(v)}.\end{align}

Given $\mathbf{x} = \left(x_1,\dots,x_N\right) \in [\mathbb{R}^d]^N$ and setting $\mu^N = \frac1N \sum_{j=1}^N \delta_{x_j}$ , (2.15), (2.18) and (2.19) are related through

\begin{align*}\sigma_i^J(\mathbf{x}) = \sigma_i^J\!\left(x_1,\dots,x_N\right) = \sigma^J\!\left(\mu^N\right)(x_i,\cdot) \quad \text{and} \quad v_i^J(\mathbf{x}) = v_i^J\!\left(x_1,\dots,x_N\right) = v^J\!\left(\mu^N\right)(x_i).\end{align*}

The key ingredient for the study of (2.15) and its limiting behaviour is to establish Lipschitz continuity of the fast-reaction equilibrium strategies and velocities with respect to payoff function and particle locations.

Lemma 2.10. Let $J, J^{\prime} \in X$ and consider $M > 0$ such that $M \geq \|J\|_{\infty} + \textrm{Lip}(J)$ and $M \geq \|J^{\prime}\|_{\infty} + \textrm{Lip}(J^{\prime})$ . Let $\mu, \mu^{\prime} \in \mathcal{M}_1(\mathbb{R}^d)$ and $x, x^{\prime} \in \mathbb{R}^d$ . Then:

  1. 1. There exists $C = C(M, \varepsilon)$ such that, for every $u \in U$ ,

    (2.20) \begin{align} \left| \sigma^{J}(\mu)(x,u) - \sigma^{J^{\prime}}(\mu^{\prime})(x^{\prime},u) \right| &\leq C\left( W_1(\mu,\mu^{\prime}) + \|J - J^{\prime}\|_\infty + \|x-x^{\prime}\|\right) \end{align}
    and $1/C < \sigma^{J}(\mu)(x,u) < C$ .
  2. 2. There exists $C = C(M, e, \varepsilon)$ such that

    (2.21) \begin{align} \left\| v^{J}(\mu)(x) - v^{J^{\prime}}(\mu^{\prime})(x^{\prime}) \right\| &\leq C\!\left(W_1(\mu,\mu^{\prime}) + \|J - J^{\prime}\|_\infty + \|x-x^{\prime}\|\right). \end{align}

Proof. Let us define two continuous functions $g, g^{\prime} \colon U \to \mathbb{R}$ as

\begin{equation*} \begin{aligned} g(u) = \int_{\mathbb{R}^d} J(x, u, y)\,d\mu(y) \quad \text{and} \quad g^{\prime}(u) = \int_{\mathbb{R}^d} J^{\prime}(x^{\prime}, u, y)\,d\mu^{\prime}(y) \end{aligned} \end{equation*}

For every $u \in U$ , using $M \geq \|J\|_{\infty}$ and $M \geq \|J^{\prime}\|_{\infty}$ , we immediately obtain the global bounds $-M \leq g(u) \leq M$ and $-M \leq g^{\prime}(u) \leq M$ . Using the triangle inequality and the dual definition for $W_1$ in (1.9), we also estimate

(2.22) \begin{equation} |g(u)-g^{\prime}(u)| \leq C\left( W_1(\mu,\mu^{\prime}) + \|J - J^{\prime}\|_\infty + \|x-x^{\prime}\|\right) \end{equation}

for $C = C(M)$ . Recall now that

\begin{equation*} \sigma^{J}(\mu)(x,u) = \frac{\exp( g(u)/\varepsilon )}{\int_U \exp\left( g(v)/\varepsilon \right)\,d\eta(v)} \quad \text{and} \quad \sigma^{J^{\prime}}(\mu^{\prime})(x^{\prime},u) = \frac{\exp( g^{\prime}(u)/\varepsilon )}{\int_U \exp\left( g^{\prime}(v)/\varepsilon \right)\,d\eta(v)}. \end{equation*}

Thus, (2.20) follows combining the global boundedness of g, $g^{\prime}$ with (2.22); the uniform bounds $1/C < \sigma^{J}(\mu)(x,u) < C$ are obtained in the same way, while (2.21) follows from (2.20) combined with $e \in \textrm{Lip}_b(\mathbb{R}^d \times U; \mathbb{R}^d)$ .

For the discrete fast-reaction system, the following lemma adapts the estimate (2.21) into one in terms of discrete particle locations and their velocities.

Lemma 2.11 (Map to undisclosed fast-reaction agent velocity is Lipschitz continuous). For any $N > 0$ , consider the map

\begin{equation*} \mathbf{v}^J \colon \left[\mathbb{R}^d\right]^N \to \left[\mathbb{R}^d\right]^N, \quad \left(x_1,\dots,x_N\right) \mapsto \left(v_1^J\!\left(x_1,\dots,x_N\right), \dots, v_N^J\!\left(x_1,\dots,x_N\right)\right) \end{equation*}

associated with the fast-reaction ODE (2.15). Then, $\mathbf{v}^J$ is Lipschitz continuous under the distance induced by $\|\cdot\|_N$ , with Lipschitz constant $L = L(J, e, \varepsilon)$ .

Proof. Fix any $\mathbf{x},\mathbf{x}^{\prime} \in \left[\mathbb{R}^d\right]^N$ and define $\mu = \frac{1}{N} \sum_{j=1}^N \delta_{x_j}$ and $\mu^{\prime} = \frac{1}{N} \sum_{j=1}^N \delta_{x^{\prime}_j}$ . Then, by Lemma 2.10, with $J^{\prime} = J$ , $x = x_i$ and $x^{\prime} = x^{\prime}_i$ , we have

\begin{equation*} \left\| v^{J}(\mu)(x_i) - v^{J}(\mu^{\prime})\left(x^{\prime}_i\right) \right\| \leq C\left( W_1(\mu,\mu^{\prime}) + \|x_i-x^{\prime}_i\|\right) \end{equation*}

for every $i = 1,\dots, N$ , with $C = C(J,e,\varepsilon)$ . Using the definition of $W_1$ in (1.8), we observe

\begin{equation*} W_1(\mu,\mu^{\prime}) \leq \frac{1}{N}\sum_{i=1}^N \|x_i - x^{\prime}_i\| = \|\mathbf{x} - \mathbf{x}^{\prime}\|_N \end{equation*}

so that

\begin{align} &\|\mathbf{v}^J(\mathbf{x})-\mathbf{v}^J\!\left(\mathbf{x}^{\prime}\right)\!\|_N =\frac{1}{N}\sum_{i=1}^N \| v_i^J\!\left(x_1,\dots,x_N\right) - v_i^J\!\left(x^{\prime}_1,\dots,x^{\prime}_N\right) \| \nonumber \\ &\quad = \frac{1}{N}\sum_{i=1}^N\left\| v^{J}(\mu)(x_i) - v^{J}(\mu^{\prime})\left(x^{\prime}_i\right) \right\|\leq \frac{C}{N} \sum_{i=1}^N \left( \|\mathbf{x}-\mathbf{x}^{\prime}\|_N + \|x_i-x^{\prime}_i\| \right) = 2C \|\mathbf{x}-\mathbf{x}^{\prime}\|_N \nonumber \end{align}

which is the sought-after estimate.

After clarifying what we mean by a solution of (2.17), we summarise the mean-field result in Theorem 2.13, whose proof then builds on Lipschitz continuity of the velocity field in the discrete system and follows by fairly standard arguments (see, e.g. [Reference Cañizo, Carrillo and Rosado12, Reference Carrillo, Choi and Hauray14]).

Definition 2.12. We say a curve $\mu \in C([0,T]; (\mathcal{M}_1(\mathbb{R}^d), W_1) )$ solves (2.17) if $\mu(t)$ has uniformly compact support for $t \in [0,T]$ and

\begin{align*} \frac{d}{dt} \int_{\mathbb{R}^d} \phi(x)\textrm{d}\mu(t)(x) = \int_{\mathbb{R}^d} \nabla\phi(x) \cdot v^J(\mu(t))(x)\,\textrm{d}\mu(t)(x) \quad \text{for every $\phi \in C^\infty_c(\mathbb{R}^d)$}.\end{align*}

Theorem 2.13 (Well-posedness and mean-field limit of undisclosed fast-reaction model). Let $J \in X$ , $\varepsilon > 0$ , $0 < \bar{R} < +\infty$ , and $0 < T < +\infty$ . Define $R = \bar{R} + T \cdot \|e\|_\infty$ . Then:

  1. 1. Given $\bar{\mathbf{x}}^N = \left(\bar{x}_1^N,\dots, \bar{x}_N^N\right) \in [B^d({\bar{R}})]^N$ there exists a unique trajectory $\mathbf{x}^N=\left(x_1^N,\ldots,x_N^N\right) \colon [0,T] \to [B^d({R})]^N$ of class $C^1$ solving (2.15c) with $\mathbf{x}^N(0) = \bar{\mathbf{x}}^N$ . In particular, $\mu^N(t) \;:=\; \tfrac{1}{N} \sum_{i=1}^N \delta_{x_i^N(t)}$ provides a solution of (2.17) for initial condition $\bar{\mu}^N \;:=\; \tfrac{1}{N} \sum_{i=1}^N \delta_{\bar{x}_i^N}$ .

  2. 2. Given $\bar{\mu} \in \mathcal{M}_1(B^d({\bar{R}}))$ there exists a unique $\mu \in C([0,T]; (\mathcal{M}_1(B^d({R})), W_1) )$ satisfying in the weak sense the continuity equation (2.17) with initial condition $\mu(0)=\bar{\mu}$ .

  3. 3. For initial conditions $\bar{\mu}_1, \bar{\mu}_2 \in \mathcal{M}_1(B^d({\bar{R}}))$ and the respective solutions $\mu_1$ and $\mu_2$ of (2.17) one has the stability estimate

    (2.23) \begin{align} W_1(\mu_1(t),\mu_2(t)) \leq \exp(C\,(t-s)) \cdot W_1(\mu_1(s),\mu_2(s)) \end{align}
    for every $0 \leq s \leq t \leq T$ , with $C = C(J, e, \varepsilon, \bar{R}, T)$ .

Proof outline. The proof is rather standard and builds on the Lipschitz continuity of the finite agents system (2.15). We highlight the main steps without in-depth details and refer to Proposition 2.1 and Theorem 2.4 of [Reference Bongini, Fornasier, Hansen and Maggioni10], and references therein, for further information.

Part 1: finite agents setting. For given $\bar{\mathbf{x}}^N = (\bar{x}_{1}^N,\dots, \bar{x}_{N}^N) \in [B^d({\bar{R}})]^N$ , using the Lipschitz continuity provided in Lemma 2.11, there exists a unique curve $\mathbf{x}^N \colon [0,T] \to [\mathbb{R}^d]^N$ of class $C^1$ solving (2.15) with $\mathbf{x}^N(0) = \bar{\mathbf{x}}^N$ . A direct estimate provides $\|\partial_t x_i^N(t)\| \leq \|e\|_\infty$ , so that $\|x_i^N(t)\| \leq \bar{R} + T \cdot \|e\|_\infty$ for every $t \in [0,T]$ , $i = 1,\dots,N$ . Furthermore, setting $\mu^N(t) \;:=\; \tfrac{1}{N} \sum_{i=1}^N \delta_{x^N_i(t)}$ one can explicitly verify that $\mu^N$ solves (2.17) with initial condition $\bar{\mu}^N \;:=\; \tfrac{1}{N} \sum_{i=1}^N \delta_{\bar{x}^N_i}$ . This establishes point 1.

Part 2: mean-field solution as limit of finite agents solutions. Fix $\bar{\mu} \in \mathcal{M}_1(B^d({\bar{R}}))$ . For $N > 0$ , let $(\bar{\mu}^N)_N \subset \mathcal{M}_1(B^d({\bar{R}}))$ be a sequence of empirical measures such that $W_1(\bar{\mu}^N,\bar{\mu}) \to 0$ as $N \to \infty$ and let $(\mu^N)_N$ be the respective sequence of solutions of (2.17) with initial conditions $\bar{\mu}^N$ . This sequence of curves $(\mu^N)_{N \in \mathbb{N}} \subset C([0,T]; (\mathcal{M}_1(B^d({R})), W_1))$ is equicontinuous and equibounded. Therefore, an application of the Ascoli–Arzelà theorem provides a cluster point $\mu \in C([0,T]; (\mathcal{M}_1(B^d({R})), W_1))$ such that, up to a subsequence, we have

\begin{equation*} \lim_{N\to\infty} W_1\!\left(\mu^N(t), \mu(t)\right) = 0 \quad \text{ uniformly for } t \in [0,T]. \end{equation*}

Invoking Lemma 2.10 for $x=x^{\prime}$ and $J=J^{\prime}$ , this implies $\| v^J\!\left(\mu^N(t)\right)(x) - v^J(\mu(t))(x)\| \to 0$ uniformly in $t \in [0,T]$ , $x \in B^d({R})$ as $N \to \infty$ . Consequently, the cluster point $\mu$ is a solution to (2.17) with $\bar{\mu}$ as initial condition. This establishes the existence part of point 2.

Part 3: stability estimates. For fixed N, a stability estimate of the form $\|\mathbf{x}^N_1(t)-\mathbf{x} ^N_2(t)\|_N \leq \exp(C\,(t-s)) \cdot \|\mathbf{x}^N_1(s)-\mathbf{x} ^N_2(s)\|_N$ for solutions to the discrete system (2.15) follows quickly from Grönwall’s lemma. Since the $\|\cdot\|_N$ -norm between two point clouds provides an upper bound for the $W_1$ distance between the respective empirical measures, this would provide point 3 for empirical measures. The extension to arbitrary measures can be done as in the proof of Theorem 2.4 in [Reference Bongini, Fornasier, Hansen and Maggioni10]. This then provides uniqueness of the solution $\mu$ to (2.17) for initial condition $\bar{\mu} \in \mathcal{M}_1(B^d({\bar{R}}))$ , completing point 2.

Finally, we establish convergence of (2.13) to (2.15) (and their respective mean-field versions) as $\lambda \to \infty$ . The proof is given in Section A.2 of the Appendix, a simple numerical example is shown in Figure 3.

Figure 3. Convergence of the undisclosed model to the fast-reaction limit as $\lambda \to \infty$ . The payoff function encourages agents to move to the origin, but penalises small pairwise distances (see Example 4.1 for a detailed description). With $\lambda$ small, agents cannot adjust their strategies fast enough. They overshoot the origin and completely fail to avoid each other. The situation improves as $\lambda$ increases. For $\lambda=100,$ the model closely resembles the fast-reaction limit, in accordance with (2.24).

Theorem 2.14 (Convergence to fast-reaction limit in undisclosed setting as $\lambda \to \infty$ ).

  1. 1. Discrete setting: For initial positions $\mathbf{x}(0)=(x_1(0),\ldots,x_N(0)) \in [B^d({\bar{R}})]^N$ and initial mixed strategies $\boldsymbol{\sigma}(0)=(\sigma_1(0),\ldots,\sigma_N(0)) \in \mathcal{S}_{a,b}^N$ let $\mathbf{x}(t) = (x_1(t),\ldots,x_N(t))$ and $\boldsymbol{\sigma}(t)=(\sigma_1(t),\ldots,\sigma_N(t))$ be the solution to the undisclosed model (2.13). For the same initial positions, let $\mathbf{x}^{\ast \ast}(t)=(x^{\ast \ast}_1(t),\ldots,x^{\ast \ast}_N(t))$ be the solution to the undisclosed fast-reaction model (2.15) and let $\boldsymbol{\sigma}^{\ast \ast}(t)=(\sigma_1^{\ast \ast}(t),\ldots,\sigma_N^{\ast \ast}(t))$ with $\sigma_i^{\ast \ast}(t)=\sigma^J_{i}(\mathbf{x}^{\ast \ast}(t))$ be the corresponding fast-reaction strategies. Then, there exists $C = C(a,b,\bar{R},J,e)$ (independent of N and i) such that for all $t \in [0,\infty)$ ,

    (2.24) \begin{align} \|\mathbf{x}(t)-\mathbf{x}^{\ast \ast}(t)\|_N \leq \frac{C}{\sqrt{\lambda}} \cdot \exp(t \cdot C) \end{align}
    (2.25) \begin{align} \|\sigma_i(t)-\sigma_i^{\ast\ast}(t)\|_{L^2_\eta(U)} \leq \frac{C}{\sqrt{\lambda}} \cdot \exp(t \cdot C) + \left[C \left[\dfrac{1}{\lambda}+ \exp\left(-\dfrac{\lambda\,t}{C}\right)\right]\right]^{1/2}. \end{align}
    So on any compact time horizon [0, T], we find that $\|\mathbf{x}(t) - \mathbf{x}^{\ast \ast}(t)\|_N \to 0$ uniformly in time as $\lambda \to \infty$ . $\boldsymbol{\sigma}$ converges to $\boldsymbol{\sigma}^{\ast \ast}$ uniformly in time on the interval $[\tau,T]$ for any $\tau>0$ . Near $t=0$ we cannot expect uniform convergence of $\boldsymbol{\sigma}$ , since the initialisation may place it at a finite distance from $\boldsymbol{\sigma}^{\ast \ast}(0)$ , so it takes a brief moment to relax to the vicinity of the fast-reaction state.
  2. 2. Mean-field setting: For an initial configuration $\bar{\Sigma} \in \mathcal{M}_1(\mathcal{C}_{a,b}),$ let $\Sigma$ be the solution to the entropic mean-field model (2.11) for an undisclosed J with $\lambda \in (0,\infty)$ . Let $P \,:\, \mathcal{C}_{a,b} \to \mathbb{R}^d$ , $(x, \sigma) \mapsto x$ be the projection from $\mathcal{C}_{a,b}$ to the spatial component. Set $\bar{\mu} \;:=\; P_\sharp \bar{\Sigma}$ and let $\mu$ be the solution to the fast-reaction mean-field model (2.17) with initial condition $\mu(0)=\bar{\mu}$ .

    Then for the same C as in (2.24), one has

    (2.26) \begin{align} W_1(\mu(t),P_\sharp \Sigma(t)) \leq \frac{C}{\sqrt{\lambda}} \cdot \exp(t \cdot C). \end{align}

Remark 2.15. The statement about the mean-field equations can be expanded further: From the fast-reaction solution $\mu$ on $\mathcal{M}_1(\mathbb{R}^d),$ one can construct a ‘lifted trajectory’ $\hat{\Sigma}$ on $\mathcal{M}_1(\mathcal{C}_{a,b})$ , intuitively by attaching to each mass particle in $\mu$ at x its corresponding fast-reaction mixed strategy $\sigma^J(\mu)(x,\cdot)$ , (2.19). Using the bounds (2.24) and (2.25), the continuity properties of $\sigma^J(\mu)(x,\cdot)$ (see Lemma 2.10) and arguing as in the proof of (2.26) one can then establish a $W_1$ -bound between $\Sigma$ and $\hat{\Sigma}$ .
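As a rough numerical check of the rate (2.24) (our own sketch, reusing the single-agent two-strategy setting of Example 2.1 with $\varepsilon=0.5$ , where the fast-reaction velocity reduces to $v(x)=-\tanh(x/\varepsilon)$ ): the sup-distance between the finite- $\lambda$ trajectory and the fast-reaction trajectory should shrink as $\lambda$ grows.

```python
import numpy as np

def run(lam, eps=0.5, T=4.0, dt=1e-4, x0=2.0, p0=0.5):
    # undisclosed model (2.13) with J(x,u) = -x*u; p = sigma(+1)
    x, p = x0, p0
    xs = [x]
    for _ in range(int(T / dt)):
        H = p * np.log(2 * p) + (1 - p) * np.log(2 * (1 - p))
        dp = p * (-x - (-x * p + x * (1 - p))) + eps * p * (-np.log(2 * p) + H)
        p = float(np.clip(p + dt * lam * dp, 1e-12, 1 - 1e-12))
        x += dt * (2 * p - 1)
        xs.append(x)
    return np.array(xs)

def run_fast(eps=0.5, T=4.0, dt=1e-4, x0=2.0):
    # fast-reaction limit (2.15c): equilibrium strategy gives v(x) = -tanh(x/eps)
    x, xs = x0, [x0]
    for _ in range(int(T / dt)):
        x += dt * (-np.tanh(x / eps))
        xs.append(x)
    return np.array(xs)

errs = [np.abs(run(lam) - run_fast()).max() for lam in (1.0, 10.0, 100.0)]
```

In this experiment the errors decrease monotonically in $\lambda$ , consistent with the $C/\sqrt{\lambda}$ bound (though a single run does not, of course, verify the exponent).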

3 Inference for entropic evolutionary games

After having introduced a new class of models for interacting agents, we now turn to the question of how the payoff function J, that parametrises a model, can be inferred from data. Some remarks are in order before we proceed.

3.1 Discussion

Our motivation is as follows: we observe a set of rational agents, for example, pedestrians in a confined space, and want to learn about their interaction rules. As argued above (e.g. in Examples 2.7 and 2.8), undisclosed fast-reaction entropic games can be used to parametrise a rich class of interaction behaviours, in particular subsuming Newtonian models. We may therefore hope that, by feeding our observations into an inference functional for J and then analysing and interpreting the resulting payoff function, we can learn something about the interaction rules of the agents. We focus on the inference of J and assume that U, e and $\varepsilon$ are known. We will demonstrate during the numerical examples (Section 4) that it is usually possible to make plausible choices for U, e and that qualitatively equivalent choices yield qualitatively equivalent results for the inferred J. One may also assume that $\varepsilon=1$ , since re-scaling $\varepsilon$ can be compensated by re-scaling J in the same way, see (2.15).

In Section 3.2, we discuss a differential inference functional, proposed in analogy to (1.7), for the full entropic game model, that is, without the undisclosed assumption and not in the fast-reaction limit. For this we assume that we are able to observe the full dynamics, that is, physical locations and velocities of the agents, as well as their mixed strategies and temporal derivatives, $(x_i,\partial_t x_i,\sigma_i,\partial_t \sigma_i)_{i=1}^N$ and propose a functional that seeks to infer J by comparing its predictions with the observed $\partial_t \sigma_i$ .

In many cases, it may not be possible to observe the mixed strategies $\sigma_i$ , let alone their temporal derivatives $\partial_t \sigma_i$ , since these may correspond to internal states of the agents that we, as external observers, cannot perceive. In Section 3.3, we turn to the undisclosed fast-reaction setting, where physical velocities and mixed strategies are functions of current physical locations, and show that there one can still perform inference with limited observations. In Section 3.3.1, we propose an inference functional, again in analogy to (1.7), that only requires the observation of the physical locations $x_i$ and velocities $\partial_t x_i$ . Although it is nonconvex, we observe in our numerical examples that it provides plausible payoff functions J, indicating that nonconvexity does not seem to be a practical issue.

In Section 3.3.2, we propose a convex inference functional that requires knowledge of the mixed strategies $\sigma_i$ , but not of their temporal derivatives. This may be practical in two cases: In Section 4.1, we propose a scheme to provide such data from observed physical velocities, intuitively by ‘inverting’ the approximation scheme of Example 2.7. Also, mixed strategies could represent a stochastic movement of agents and the ‘smooth’ physical velocity $\partial_t x_i$ is merely the result of an averaging over a smaller time scale. Should we indeed be able to observe the fluctuating agent dynamics at small time scales, this could be used to approximate the mixed strategies. This was one of our motivations for providing Example 4.3.

For both functionals, there are natural candidates for the mean-field limit. We analyse these two functionals in more detail in Section 3.3.3, providing an estimate of the approximation quality (Theorem 3.3), as well as existence of minimisers and consistency in the mean-field limit (Theorem 3.6).

Now, we formalise our notion of admissible observations for inference and a notion of consistency of observations in the mean-field limit.

Assumption 3.1 (Admissible observations).

  1. Discrete observation: For a fixed number of agents N, a time horizon $T \in (0,\infty)$ , and some radius $R \in (0,\infty)$ we observe the agents’ physical paths $\textbf{x}^N \;:=\; \left(x^N_1,\ldots,x^N_N\right) \in C^1([0,T],B^d(R))^N$ with velocities $\textbf{v}^N \;:=\; \left(v^N_1,\ldots,v^N_N\right)$ , $v^N_i = \partial_t x^N_i$ .

    Optionally, we may also observe the agents’ mixed strategies $\boldsymbol{\sigma}^N \;:=\; \left(\sigma^N_1,\ldots,\sigma^N_N\right) \in C([0,T],\mathcal{S}_{a,b})^N$ for some bounds $0<a\leq b<\infty$ in the definition of $\mathcal{S}_{a,b}$ , see (2.1). The mixed strategies are consistent with the observed velocities, that is,

    (3.1) \begin{equation}v_i^N(t) = \int_U e\left(x_i^N(t),u\right)\sigma_i^N(t)(u)\,\textrm{d} \eta(u).\end{equation}
    Note: The assumption that the agent’s velocity is exactly consistent with their mixed strategy may seem unrealistic, as the velocity may be subject to other external influences and noise. However, we will subsequently often assume that the mixed strategies are not observed directly, but are only inferred from the velocities. In this case, satisfying assumption (3.1) is quite natural. In other cases, assumption (3.1) will be used in Theorem 3.3 to bound the error on trajectories based on the error on strategies. If (3.1) only holds approximately, an additional corresponding error term will appear there.
  2. Consistent mean-field behaviour: For fixed T, R, for an increasing sequence of N we make observations as specified in part , and there is a limit observation $t \mapsto \mu^\infty(t) \in \mathcal{M}_1(B^d(R))$ with a velocity field $t \mapsto v^\infty(t)$ such that

    (3.2) \begin{align} W_1\!\left(\mu^N(t),\mu^\infty(t)\right) \rightarrow 0 \quad \text{uniformly in $t \in [0,T]$ where}\quad \mu^N(t) \;:=\; \frac1N \sum_{i=1}^N \delta_{x^N_i(t)},\end{align}
    (3.3) \begin{align} \int_0^T \frac{1}{N} \sum_{i=1}^N \left \langle \phi\left(t,x^N_i(t)\right), v^N_i(t) \right \rangle \,\textrm{d} t \rightarrow \int_0^T \int_{\mathbb{R}^d} \left \langle \phi(t,x) , v^\infty(t)(x) \right \rangle \,\textrm{d} \mu^\infty(t)(x)\,\textrm{d} t \end{align}
    for all $\phi \in \left[C\!\left([0,T] \times \mathbb{R}^d\right)\right]^d$ , and
    (3.4) \begin{align} \int_0^T \frac{1}{N} \sum_{i=1}^N \|v^N_i(t)\|^2 \,\textrm{d} t \rightarrow \int_0^T \int_{\mathbb{R}^d} \|v^\infty(t)(x)\|^2 \,\textrm{d} \mu^\infty(t)(x)\,\textrm{d} t<\infty.\end{align}
    (3.2) requires that the physical locations converge, (3.3) that the observed velocities converge in a weak sense, and (3.4) that their second moments converge as well, so that the velocities in fact converge in a strong sense. Intuitively, for (3.4) to hold, mass particles that converge to the same limit location x as $N \to \infty$ need to converge to the same velocity; otherwise, Jensen’s strict inequality contradicts (3.4).

    If we also observe mixed strategies in part , then the bounds $0<a \leq b<\infty$ are uniform in N and there also is a mixed strategy mean-field $(t,x) \mapsto \sigma^\infty(t)(x) \in \mathcal{S}_{a,b}$ such that

(3.5) \begin{multline} \int_0^T \frac{1}{N} \sum_{i=1}^N \int_U \phi\left(t,x^N_i(t),u\right) \cdot \sigma^N_i(t)(u)\, \textrm{d} \eta(u) \,\textrm{d} t\\ \rightarrow \int_0^T \int_{\mathbb{R}^d} \int_U \phi(t,x,u) \cdot \sigma^\infty(t)(x)(u) \textrm{d} \eta(u)\,\textrm{d} \mu^\infty(t)(x)\,\textrm{d} t \end{multline}

for all $\phi \in \left[C\!\left([0,T] \times \mathbb{R}^d \times U\right)\right]^d$ and

(3.6) \begin{align} & \int_0^T \frac{1}{N} \sum_{i=1}^N \int_U \sigma^N_i(t)(u)\log\!\left(\sigma^N_i(t)(u)\right)\, \textrm{d} \eta(u) \,\textrm{d} t\nonumber\\[4pt] & \qquad\qquad \rightarrow \int_0^T \int_{\mathbb{R}^d} \int_U \sigma^\infty(t)(x)(u)\log(\sigma^\infty(t)(x)(u)) \textrm{d} \eta(u)\,\textrm{d} \mu^\infty(t)(x)\,\textrm{d} t.\end{align}

These are direct equivalents of (3.3) and (3.4).

In particular, observations are admissible when they were generated by an entropic game model with some ground truth payoff J.
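The consistency condition (3.1) is easy to check numerically once the strategy space is finite, the reference measure $\eta$ is uniform and $e(x,u)=u$, the choices later made in Section 4.1. The following minimal sketch uses toy values and a hypothetical four-point strategy set:

```python
import numpy as np

def velocity_from_strategy(e_vals, sigma):
    # Discrete analogue of (3.1): v = (1/K) * sum_k e(x, u_k) * sigma_k,
    # where sigma is a density w.r.t. the uniform reference measure eta.
    K = len(sigma)
    return (e_vals * sigma[:, None]).sum(axis=0) / K

# Toy setup: K = 4 strategies in R^2 and e(x, u) = u (as in Section 4.1).
U_pts = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
sigma = np.array([2.0, 0.0, 1.0, 1.0])   # nonnegative, mean 1 w.r.t. eta
v = velocity_from_strategy(U_pts, sigma)  # v == [0.5, 0.0]
```

Any velocity in the convex hull of the strategy points can be realised this way, which is exactly the property exploited when mixed strategies are inferred from observed velocities in Section 4.1.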

Lemma 3.2. Let $J \in X$ , $T > 0$ , $\bar{R}>0$ , $N \in \mathbb{N}$ .

  1. For initial locations $\bar{\textbf{x}}^N=\left(\bar{x}^N_1,\ldots,\bar{x}^N_N\right) \in [B^d(\bar{R})]^N$ the induced solution to (2.15) with corresponding velocities and mixed strategies provides an admissible discrete observation in the sense of Assumption 3.1, part , for $R=\bar{R}+\|e\|_\infty \cdot T$ .

  2. Let $\bar{\mu} \in \mathcal{M}_1(B^d(\bar{R}))$ and consider a sequence of empirical measures $(\bar{\mu}^N)_N$ in $\mathcal{M}_1(B^d(\bar{R}))$ of the form

    \begin{equation*} \bar{\mu}^N = \frac1N \sum_{i=1}^N \delta_{\bar{x}_{i}^N}, \quad \bar{x}_{i}^N \in B^d(\bar{R}), \quad \text{such that} \quad \lim_{N\to \infty}W_1\!\left(\bar{\mu}^N, \bar{\mu}\right) = 0. \end{equation*}
    Then, the solutions to (2.15) with initial positions $\left(\bar{x}_1^N, \dots, \bar{x}_N^N\right)$ are a suitable sequence of discrete observations in the sense of Assumption 3.1, part , and the solution to (2.17) provides a corresponding limit observation.

The proof is quite straightforward and provided in Section A.3.

In the following, we assume that observations are admissible in the sense of Assumption 3.1.

3.2 Differential inference functional

In this section, we discuss an inference functional for payoff function J, in close analogy to (1.7). For now, we assume that in addition to $\left(\textbf{x}^N, \textbf{v}^N, \boldsymbol{\sigma}^N\right)$ we can even observe $\partial_t \sigma_i^N$ for all $i \in \{1,\ldots,N\}$ . Our differential inference functional is therefore aimed at recovering J, by comparing the observed $\partial_t \sigma_i^N$ with the predictions by the model (2.2b). In (1.7), the discrepancy between observed $v_i^N$ and predicted velocities is penalised by the squared Euclidean distance. As metric to compare the observed $\partial_t \sigma_i^N$ and the prediction by (2.2b), we choose the (weak) Riemannian tensor of the Hellinger–Kakutani and Fisher–Rao metrics. That is, we set for a base point $\sigma \in \mathcal{S}_{a,b}$ , and two tangent directions $\delta \mu, \delta \nu \in L^p_{\eta}(U)$ attached to it,

(3.7) \begin{align} d_\sigma(\delta \mu,\delta \nu)^2 \;:=\; \frac{1}{4} \int_U \frac{\left(\delta \mu- \delta \nu\right)^2}{\sigma}\,\textrm{d} \eta.\end{align}
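For a finite strategy set with uniform reference measure, the tangent metric (3.7) reduces to a weighted sum of squared differences. A minimal numerical sketch with toy values:

```python
import numpy as np

def d_sigma_sq(sigma, dmu, dnu):
    # Discrete analogue of (3.7) for K strategies with uniform eta:
    # d_sigma(dmu, dnu)^2 = (1/4) * (1/K) * sum_k (dmu_k - dnu_k)^2 / sigma_k
    K = len(sigma)
    return 0.25 * np.sum((dmu - dnu) ** 2 / sigma) / K

sigma = np.array([1.0, 1.0])    # uniform mixed strategy, K = 2
dmu = np.array([0.5, -0.5])     # tangent directions (integrate to zero)
dnu = np.array([-0.5, 0.5])
dist2 = d_sigma_sq(sigma, dmu, dnu)   # == 0.25
```

Note that the weighting by $1/\sigma$ penalises discrepancies more strongly where the base strategy carries little mass.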

A potential inference functional for J could then be:

(3.8) \begin{align} & \mathcal{E}^{N}_{\dot{\sigma}}({J}) \;:=\; \nonumber\\ & \frac{1}{T} \int_0^T \left[ \frac{1}{N} \sum_{i=1}^N d_{\sigma_i^N(t)}\left( \partial_t \sigma_i^N(t), \frac{1}{N} \sum_{j=1}^N f^{{J}}\left(x_i^N(t),\sigma_i^N(t),x_j^N(t),\sigma_j^N(t)\right) +f^\varepsilon\left(\sigma_i^N(t)\right) \right)^2 \right] \textrm{d} t.\end{align}

Minimisation should be restricted to a sufficiently regular and compact class of J. Since $d_\sigma(\cdot,\cdot)^2$ is quadratic in its arguments and $f^{{J}}$ is linear in J, the mismatch term is again quadratic in J; thus, restricted to a suitable class, this is a convex optimisation problem.

In Section 4.3, we provide a simple example where we simulate trajectories $(\textbf{x}^N,\boldsymbol{\sigma}^N)$ according to the dynamics (2.2), based on a given J, and then obtain a corresponding minimiser $\kern1pt\hat{{\kern-1pt}J}$ of (3.8). We then show that trajectories simulated with J and $\kern1pt\hat{{\kern-1pt}J}$ are close even when the initial configurations are not drawn from training data used to infer $\kern1pt\hat{{\kern-1pt}J}$ . Indeed, upon selecting a suitable functional framework, one can in principle extend the analysis in [Reference Bongini, Fornasier, Hansen and Maggioni10] (and the analysis presented here below) to prove existence of minimisers, limiting behaviour as $N \to \infty$ and ability of the inferred model to generalise.

However, application to real data seems challenging, as one usually only observes $\left(\textbf{x}^N, \textbf{v}^N\right)$ but not $\boldsymbol{\sigma}^N$ , let alone variations $\partial_t \sigma_i^N$ . To address this, we propose in the next section two new inference functionals for the undisclosed fast-reaction setting.

3.3 Undisclosed fast-reaction inference functionals

3.3.1 Penalty on velocities

The inference situation is simplified in the undisclosed fast-reaction setting since now, for fixed payoff function J, mixed strategies and physical velocities are a direct function of physical locations. We can therefore attempt to minimise the discrepancy between observed physical velocities and those predicted by J.

For a discrete admissible observation $\left(\textbf{x}^N,\textbf{v}^N\right)$ , as in Assumption 3.1, we define the inference functional on hypothetical payoff functions ${J} \in X$

(3.9a) \begin{align} \mathcal{E}^{N}_{v}({J}) & \;:=\; \frac{1}{T} \int_0^T \left[ \frac{1}{N} \sum_{i=1}^N \left\|v_i^N(t)-v_i^{{J}}\left(x_1^N(t),\ldots,x_N^N(t)\right) \right\|^2 \right] \textrm{d} t. \end{align}

The natural candidate for the limit functional is

(3.9b) \begin{align} \mathcal{E}_v({J}) & \;:=\; \frac{1}{T} \int_0^T \int_{\mathbb{R}^d} \left\| v^\infty(t)(x) - v^{{J}}\left(\mu^\infty(t)\right)(x) \right\|^2 \textrm{d} \mu^\infty(t)(x) \textrm{d} t, \end{align}

where the couple $\left(\mu^\infty, v^\infty\right)$ is the corresponding limit of $\left(\textbf{x}^N,\textbf{v}^N\right)_N$ as introduced in Assumption 3.1, point .

This functional is intuitive and requires only the observations of $\left(\textbf{x}^N,\textbf{v}^N\right)$ and no observations about the mixed strategies. However, it is not convex. This could potentially lead to poor local minima, but we did not observe such problems in our numerical examples.

The motivation for studying mean-field inference functionals is twofold. First, it establishes asymptotic consistency of inference in the limit of many agents. Second, the mean-field equation yields an approximation for the expected behaviour of the finite-agent system under many repetitions with random initial conditions. While we do not prove this approximation property, the existence of a consistent mean-field inference functional is still an encouraging indicator that inference over the collection of many finite-agent samples will yield a reasonable result.

3.3.2 Penalty on mixed strategies

In the case where information about the mixed strategies of the agents is available (see Section 3.1 for a discussion), the discrepancy between observed mixed strategies and those predicted by J can be used for inference. We choose to measure this discrepancy with the Kullback–Leibler (KL) divergence.

Similar to above, for a discrete admissible observation $\left(\mathbf{x}^N,\mathbf{v}^N,\boldsymbol{\sigma}^N\right)$ , we define the inference functional for hypothetical payoff functions ${J} \in X$

(3.10a) \begin{align} \mathcal{E}^{N}_\sigma({J}) & \;:=\; \frac{1}{T} \int_0^T \left[ \frac{1}{N} \sum_{i=1}^N \textrm{KL}\left(\sigma_i^N(t)|\sigma_i^{J}\left(x_1^N(t),\ldots,x_N^N(t)\right)\right) \right] \textrm{d} t. \end{align}

The natural candidate for the limit functional is

(3.10b) \begin{align} \mathcal{E}_\sigma({J}) & \;:=\; \frac{1}{T} \int_0^T \int_{\mathbb{R}^d} \textrm{KL}\big(\sigma^\infty(t)(x,\cdot)|\sigma^{{J}}(\mu^\infty(t))(x,\cdot)\big) \textrm{d}\mu^\infty(t)(x)\, \textrm{d} t \nonumber \\[5pt] & = \frac1T \int_0^T \int_{\mathbb{R}^d} \int_U \log\left( \frac{\sigma^\infty(t)(x,u)}{\sigma^{{J}}(\mu^\infty(t))(x,u)} \right) \sigma^\infty(t)(x,u) \textrm{d} \eta(u) \textrm{d} \mu^\infty(t)(x)\,\textrm{d} t, \end{align}

where $(\mu^\infty, \sigma^\infty)$ are as introduced in Assumption 3.1, point . Due to the particular structure of stationary mixed strategies, (2.15), and the KL divergence, functionals (3.10) are convex on X (Proposition A.5), which is a potential advantage over (3.9).

3.3.3 Analysis of the inference functionals

We now provide some theoretical results about the inference functionals of Sections 3.3.1 and 3.3.2. The first result establishes that the obtained minimal inference functional value provides an upper bound on the accuracy of the trajectories that are simulated with the inferred $\kern1pt\hat{{\kern-1.5pt}J}$ (the proof builds upon the proof of [Reference Bongini, Fornasier, Hansen and Maggioni10, Proposition 1.1] and is given in Section A.3).

Theorem 3.3. Let $\kern1pt\hat{{\kern-1pt}J} \in X$ and $\left(\mathbf{x}^N,\mathbf{v}^N,\boldsymbol{\sigma}^N\right)$ be an admissible discrete observation for N agents (cf. Assumption 3.1, point ). Let $\hat{\mathbf{x}}^N$ be the solution of (2.15) induced by $\kern1pt\hat{{\kern-1pt}J}$ for the initial condition $\hat{\textbf{x}}^N(0) = \textbf{x}^N(0) = \left(\bar{x}_{1}^N,\dots,\bar{x}_{N}^N\right) \in [\mathbb{R}^d]^N$ . Then,

\begin{equation*} \|\mathbf{x}^N(t)-\hat{\mathbf{x}}^N(t)\|_N \leq C \sqrt{\mathcal{E}^{N}_v(\kern1pt\hat{{\kern-1pt}J})} \quad \text{and} \quad \|\mathbf{x}^N(t)-\hat{\mathbf{x}}^N(t)\|_N \leq C \sqrt{\mathcal{E}^{N}_\sigma(\kern1pt\hat{{\kern-1pt}J})} \end{equation*}

for all $t \in [0,T]$ , with $C = C(T, \kern1pt\hat{{\kern-1pt}J}, e, \varepsilon)$ . Analogously, let $(\mu^\infty, \sigma^\infty)$ be as introduced in Assumption 3.1, point and let $\hat{\mu}$ be the solution of (2.17) induced by $\kern1pt\hat{{\kern-1pt}J}$ for the initial condition $\bar{\mu} = \mu^\infty(0)$ . Then,

\begin{equation*} W_1(\mu^\infty(t),\hat{\mu}(t)) \leq C \sqrt{\mathcal{E}_v(\kern1pt\hat{{\kern-1pt}J})} \quad \text{and} \quad W_1(\mu^\infty(t),\hat{\mu}(t)) \leq C \sqrt{\mathcal{E}_\sigma(\kern1pt\hat{{\kern-1pt}J})} \end{equation*}

for all $t \in [0,T]$ and the same constant C as above.

Next, we address the existence of minimising payoff functions, both in theory and numerical approximation. At the theoretical level, we need to ensure compactness of minimising sequences, which we obtain here by restriction to a suitable compact space. At the numerical level, we are interested in finite-dimensional approximations of this space that asymptotically are dense as the discretisation is refined.

Remark 3.4 (Compactness and finite-dimensional approximation). In order to obtain compactness, we restrict the variational problems to suitable subspaces of X. In particular, for $R, M > 0$ , let us define

\begin{equation*}X_{R,M} = \left\{ J \in \textrm{Lip}_b\!\left(B^d({R}) \times U \times B^d({R})\right) \mid \|J\|_{\infty} + \textrm{Lip}(J) \leq M \right\}.\end{equation*}

The parameter R bounds the learning domain in space: inference where we have no data available is simply meaningless. The parameter M will ensure compactness with respect to uniform convergence.

For each $R,M > 0,$ we consider a family of closed convex subsets $\left(X_{R,M}^N\right)_{N \in \mathbb{N}} \subset X_{R,M}$ satisfying the uniform approximation property: for every $J \in X_{R,M}$ there exists a sequence $(J^N)_{N \in \mathbb{N}} $ uniformly converging to J with $J^N \in X_{R,M}^N$ for each $N \in \mathbb{N}$ . These approximating spaces can be selected to be finite dimensional.

An alternative way to obtain compactness would be via the introduction of additional regularising functionals. This is discussed in Section 4.1. Conceptually, both approaches serve the same purpose and we find that the former is more convenient for analysis while the latter is more attractive numerically.

The following key lemma ensures convergence of the error functionals under uniform convergence of the payoff functions and/or $W_1$ convergence of measures.

Lemma 3.5. Let $\left(\mathbf{x}^N,\mathbf{v}^N,\boldsymbol{\sigma}^N\right)_N$ be a family of admissible observations and $\mathcal{E}^{N}_v$ , $\mathcal{E}_v$ , $\mathcal{E}^{N}_\sigma$ and $\mathcal{E}_\sigma$ be the corresponding discrete and continuous inference functionals (3.9) and (3.10). For $M>0$ , let $\left({J}^N\right)_N$ be a sequence of payoff functions in $X_{R,M}$ , converging uniformly to some ${J} \in X_{R,M}$ as $N \to \infty$ . Then,

(3.11) \begin{align} \lim_{N\to\infty} \mathcal{E}^{N}_v\!\!\left({J}^N\right) = \mathcal{E}_v({J}) \quad \text{ and } \quad \lim_{N\to\infty} \mathcal{E}^{N}_\sigma\!\!\left({J}^N\right) = \mathcal{E}_\sigma({J}). \end{align}

Proof. Thanks to Assumption 3.1, point , we have $W_1\left(\mu^N(t),\mu^\infty(t)\right) \to 0$ uniformly in $t \in [0,T]$ as $N \to \infty$ . Lemma 2.10 provides

\begin{align*} \left\| v^{{J}^N}\left(\mu^N(t)\right)(x) - v^{{J}}(\mu^\infty(t))(x) \right\| & \leq C \cdot \left( W_1\left(\mu^N(t),\mu^\infty(t)\right) + \left\|{J}^N-{J}\right\|_\infty \right) \end{align*}

uniformly in $x \in B^d({R}), t \in [0,T]$ and N. Hence, $v^{{J}^N}\!\left(\mu^N(t)\right)(x)$ converges uniformly to $v^J(\mu^\infty(t))(x)$ in $C([0,T]\times B^d({R}); \mathbb{R}^d)$ . Since $\|v^J(\mu^\infty(t))(x)\| \leq \|e\|_\infty$ for every $x \in B^d({R})$ and $t \in [0,T]$ , we also have $\|v^{J^N}\left(\mu^N(t)\right)(x) \|^2 \to \|v^J(\mu^\infty(t))(x)\|^2$ uniformly in $C([0,T]\times B^d({R}))$ . Recalling that $v_i^{{J}^N}\left(x_1^N(t),\ldots,x_N^N(t)\right) = v^{{J}^N}\left(\mu^N(t)\right)\left(x_i^N(t)\right)$ by definition and using the convergence of velocities provided by Assumption 3.1, point , the result follows by passing to the limit in the definition:

\begin{align*} \lim_{N \to \infty} \mathcal{E}^{N}_v\!\left({J}^N\right) &= \lim_{N \to \infty} \frac{1}{T} \int_0^T \left[ \frac{1}{N} \sum_{i=1}^N \left\|v_i^N(t)-v_i^{{J}^N}\left(x_1^N(t),\ldots,x_N^N(t)\right) \right\|^2 \right] \textrm{d} t \\[4pt] &= \frac{1}{T} \int_0^T \int_{\mathbb{R}^d} \left\|v^\infty(t)(x) - v^{{J}}\left(\mu^\infty(t)\right)(x)\right\|^2 \textrm{d} \mu^\infty(t)(x) \textrm{d} t = \mathcal{E}_v({J}). \end{align*}

An analogous derivation applies to the $\sigma$ -energies (3.10) where we use the Lipschitz estimate (2.20) and the $\sigma$ -part of Assumption 3.1, point .

We are now ready to state the main result concerning the inference functionals and convergence of minimisers for $\mathcal{E}_v^{N}$ and $\mathcal{E}_\sigma^{N}$ .

Theorem 3.6. Let $\left(\mathbf{x}^N,\mathbf{v}^N,\boldsymbol{\sigma}^N\right)_N$ be a family of admissible observations and fix $M > 0$ . Then:

  1. The functionals (3.9a) and (3.10a) have minimisers over $X_{R,M}^N$ and (3.9b) and (3.10b) have minimisers over $X_{R,M}$ .

  2. For $N > 0$ , let

    \begin{equation*} {J}^N_v \in \mathop{\mathrm{arg\,min}}_{{J} \in X_{R,M}^N} \mathcal{E}^{N}_v({J}) \quad \text{and} \quad {J}^N_\sigma \in \mathop{\mathrm{arg\,min}}_{{J} \in X_{R,M}^N} \mathcal{E}^{N}_\sigma({J}). \end{equation*}
    The sequence $\left({J}^N_v\right)_N$ , resp. $\left({J}^N_\sigma\right)_N$ , has a subsequence converging uniformly to some continuous function ${J}_v \in X_{R,M}$ , resp. ${J}_\sigma \in X_{R,M}$ , such that
    (3.12) \begin{align} \lim_{N \to \infty} \mathcal{E}^{N}_v\!\left({J}^N_v\right) = \mathcal{E}_v({J}_v) \quad \text{and} \quad \lim_{N \to \infty} \mathcal{E}^{N}_\sigma\left({J}^N_\sigma\right) = \mathcal{E}_\sigma({J}_\sigma). \end{align}
    In particular, ${J}_v$ (resp. ${J}_\sigma$ ) is a minimiser of $\mathcal{E}_v$ (resp. $\mathcal{E}_\sigma$ ) over $X_{R,M}$ .

Proof. Part 1: existence of minimisers. All functionals (3.9) and (3.10) are continuous in J with respect to uniform convergence, which follows from Lemma 3.5 by setting the initial measure $\bar{\mu}$ to be the empirical one (i.e. just looking at converging J for fixed trajectories). The sets $X_{R,M}$ and $X_{R,M}^N$ are compact with respect to uniform convergence. Therefore, minimisers exist.

Part 2: convergence of minimisers. The proof is inspired by [Reference Bongini, Fornasier, Hansen and Maggioni10, Section 4.2]. Since $X_{R,M}^N \subset X_{R,M}$ and the latter is compact, the sequence of minimisers $\left({J}_v^N\right)_N$ with ${J}^N_v \in X_{R,M}^N$ must have a cluster point ${J}_v$ in $X_{R,M}$ . For simplicity, denote the converging subsequence again by $\left({J}^N_v\right)_N$ . Let $\tilde{J} \in X_{R,M}$ . By the uniform approximation property of $X_{R,M}^N$ let $\left(\tilde{J}^N\right)_N$ be a sequence converging uniformly to $\tilde{J}$ and $\tilde{J}^N \in X_{R,M}^N$ . By Lemma 3.5 and optimality of each ${J}^N_v$ ,

\begin{align*} \mathcal{E}_v({J}_v) = \lim_{N\to\infty} \mathcal{E}^{N}_v\!\left({J}^N_v\right) \leq \lim_{N\to\infty} \mathcal{E}^{N}_v\!\left(\tilde{J}^N\right) = \mathcal{E}_v(\tilde{J}). \end{align*}

Therefore, ${J}_v$ minimises $\mathcal{E}_v$ and the optimal values converge. The same argument covers the $\sigma$ -functionals (3.10) and the corresponding sequence of minimisers $\left({J}_\sigma^N\right)_N$ .

We conclude this section with a couple of comments and remarks on the interpretation of results in Theorem 3.6.

Remark 3.7 (Realisability of data). In general, observed data may or may not be realisable by the model. This distinguishes two regimes.

  • Assume there is a true regular J that generated the observed data (as for example in Lemma 3.2). If we pick M large enough in Theorem 3.6, so that $J \in X_{R,M}$ , one has $\mathcal{E}_v({J}_v) \leq \mathcal{E}_v({J}) = 0$ and analogously $\mathcal{E}_\sigma({J}_\sigma) \leq \mathcal{E}_\sigma({J}) = 0$ . Hence, minimisers of $\mathcal{E}_v$ and $\mathcal{E}_\sigma$ over $X_{R,M}$ reproduce exactly the trajectories of the system thanks to Theorem 3.3.

  • Assume there is no true regular J that generated the observed data. This means that, no matter how large M is taken in Theorem 3.6, the minimal limit energies $\mathcal{E}_\sigma({J}_\sigma)$ and $\mathcal{E}_v({J}_v)$ may not be equal to 0. However, the bound on the trajectories generated by the inferred model, Theorem 3.3, still holds and so the remaining residual values $\mathcal{E}_\sigma({J}_\sigma)$ and $\mathcal{E}_v({J}_v)$ are then an indication for how well our parametrised family of interaction models was able to capture the structure in the observed data.

Remark 3.8 (Non-uniqueness of minimisers). We point out that the parametrisation of an entropic game model by a payoff function J is not unique. Replacing J by $\kern1pt\hat{{\kern-1pt}J}\left(x,u,x^{\prime},u^{\prime}\right) \;:=\; J\!\left(x,u,x^{\prime},u^{\prime}\right) + g\left(x,x^{\prime},u^{\prime}\right)$ yields the same agent trajectories for an arbitrary $g \,:\, \Omega \times \Omega \times U \to \mathbb{R}$ . This can be seen from the fact that $f^{J+g}=f^J$ for $f^J$ and $f^{J+g}$ as given in (2.3). Intuitively, given some x, x and u, which strategy u is most attractive does not change, if we add a constant benefit for all potential choices. The analogous observation holds for the undisclosed case. This implies that minimisers of (3.8), (3.9) and (3.10) are not unique. This has to be taken into account when we try to interpret an inferred J. In our numerical examples with the undisclosed fast-reaction model, we usually made additional structural assumptions on J (see Section 4.1) and as a consequence did not observe issues with non-unique minimisers. The phenomenon is encountered in a proof-of-concept example on the differential inference functional, Section 4.3.
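The gauge invariance described in this remark can be seen concretely in the undisclosed fast-reaction case: adding a u-independent offset to the per-strategy payoffs leaves the softmax strategies of (2.15) unchanged. A small sketch with toy values:

```python
import numpy as np

def fast_reaction_sigma(payoff, eps=1.0):
    # Softmax strategies as in (2.15); payoff[k] = (1/N) * sum_j J(x_i, u_k, x_j).
    w = np.exp(payoff / eps)
    return w / w.mean()   # density w.r.t. the uniform reference measure eta

payoff = np.array([0.3, -0.1, 0.7])   # toy per-strategy payoffs
shift = 5.0                           # a u-independent offset g(x, x')
s1 = fast_reaction_sigma(payoff)
s2 = fast_reaction_sigma(payoff + shift)  # identical to s1
```

The common factor $\exp(\text{shift}/\varepsilon)$ cancels in the normalisation, which is precisely why the inferred J is only determined up to such offsets.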

4 Numerical experiments

In this section, we present some numerical examples for the entropic game models of Section 2 and their inference with the functionals of Section 3 to provide a basic proof-of-concept and some intuition about their behaviour.

4.1 Numerical setup

Preliminaries and discretisation. We assume for simplicity that the strategy space U is finite, that is, $U=\{u_1,\ldots,u_K\}$ for some $K>0$ and fix the reference measure $\eta = \sum_{k=1}^K \frac{1}{K}\delta_{u_k}$ as the normalised uniform measure over U.

For a system of N agents starting at positions $x_1(0), \dots, x_N(0)$ , we observe $S+1$ snapshots of the evolution at times $t_s = s\cdot \frac{T}{S}$ for $s = 0,\dots,S$ , so that we are given locations $\{x_i(t_s)\}_{s,i}$ , velocities $\{\partial_t x_i(t_s)\}_{s,i}$ and (sometimes) mixed strategies $\{\sigma_i(t_s)\}_{s,i}$ where the indices s and i run from 0 to S and from 1 to N respectively. Here, $\sigma_{i}(t_s) = (\sigma_{i,1}(t_s),\dots,\sigma_{i,K}(t_s))$ is a discrete probability density with respect to $\eta$ .

Let $\Omega = \prod_{i=1}^d [\ell_i, r_i]$ , $\ell_i,r_i \in \mathbb{R}$ , be the smallest hypercube containing all observed locations. We discretise $\Omega$ by a regular Cartesian grid and describe the unknown payoff J by its values $J(\cdot,u_k,\cdot)$ at grid points for $u_k \in U$ (or $J(\cdot,u_k,\cdot,u_{k^{\prime}})$ in the fully general setting). Between grid points, J is extended by coordinate-wise linear interpolation, that is, we consider piecewise d-linear finite elements over hypercubes.

Within this setting, the inference functional (3.10) reduces to the discrete error functional

(4.1) \begin{align}\mathcal{E}_\sigma^N(J) = \frac{1}{SNK} \sum_{s=0}^S \sum_{i=1}^N \sum_{k=1}^K\log\left( \frac{\sigma_{i,k}(t_s)}{\sigma^{J}_{i,k}(t_s)} \right)\sigma_{i,k}(t_s),\end{align}

where for $s \in \{0,\ldots,S\}$ , $i \in \{1,\dots,N\}$ and $k \in \{1,\dots,K\}$ , the optimal mixed strategy at time $t_s$ is given by (2.15) as

\begin{equation*}\sigma^{J}_{i,k}(t_s) = \sigma^{J}_{i}(x_1(t_s),\dots,x_N(t_s))(u_k) = \frac{\exp\left(\dfrac{1}{\varepsilon N}\sum_{j=1}^N J(x_i(t_s),u_k,x_j(t_s))\right)}{\frac{1}{K}\sum_{k^{\prime}=1}^K \exp\left(\dfrac{1}{\varepsilon N}\sum_{j=1}^N J(x_i(t_s),u_{k^{\prime}},x_j(t_s))\right)}.\end{equation*}
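For intuition, the softmax strategies above and the discrete functional (4.1) can be sketched in a few lines. The payoff `J_toy`, the two-point strategy set and all values below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

K, eps = 2, 1.0            # two strategies u_0 = -1, u_1 = +1 (toy assumption)

def J_toy(xi, k, xj):
    # Hypothetical payoff: prefer moving towards the other agent.
    u = (-1.0, 1.0)[k]
    return u * np.sign(xj - xi)

def sigma_J(x, J):
    # Fast-reaction strategies (2.15): sigma[i, k] proportional to
    # exp((1/(eps*N)) * sum_j J(x_i, u_k, x_j)), each row normalised to have
    # mean 1, i.e. a density w.r.t. the uniform reference measure eta.
    payoff = np.array([[np.mean([J(xi, k, xj) for xj in x]) for k in range(K)]
                       for xi in x])
    w = np.exp(payoff / eps)
    return w / w.mean(axis=1, keepdims=True)

def E_sigma(sigma_obs, sigma_pred):
    # Discrete KL functional (4.1) for a single snapshot (S = 1).
    N_, K_ = sigma_obs.shape
    return np.sum(sigma_obs * np.log(sigma_obs / sigma_pred)) / (N_ * K_)

x = np.array([0.0, 1.0])
sig = sigma_J(x, J_toy)    # each agent leans towards the other agent
```

By construction `E_sigma(sig, sig)` vanishes, so a payoff that reproduces the observed strategies attains the minimal value zero, matching the realisable regime of Remark 3.7.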

Similarly, the inference functional (3.9) reduces to the discrete functional

(4.2) \begin{align}\mathcal{E}_v^N(J) = \frac{1}{SN} \sum_{s=0}^S \sum_{i=1}^N \left\| v_i^J(t_s) - \partial_t x_i(t_s)\right\|^2,\end{align}

where

\begin{equation*}v_i^J(t_s) = v^{J}_{i}\left(x_1(t_s),\dots,x_N(t_s)\right) = \frac{1}{K}\sum_{k=1}^K e\left(x_i(t_s), u_k\right) \sigma_{i,k}^J(t_s).\end{equation*}
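A matching sketch of the discrete velocity functional (4.2), again with $e(x,u)=u$ and a hypothetical two-point strategy set in $\mathbb{R}^2$:

```python
import numpy as np

def v_from_sigma(sigma, U_pts):
    # v_i = (1/K) * sum_k e(x_i, u_k) * sigma_{i,k}, with e(x, u) = u.
    K = sigma.shape[1]
    return sigma @ U_pts / K

def E_v(v_obs, v_pred):
    # Discrete velocity functional (4.2) for a single snapshot (S = 1).
    return np.mean(np.sum((v_pred - v_obs) ** 2, axis=1))

U_pts = np.array([[1.0, 0.0], [-1.0, 0.0]])    # K = 2 strategies in R^2
sigma = np.array([[1.5, 0.5], [0.5, 1.5]])     # densities w.r.t. eta
v_pred = v_from_sigma(sigma, U_pts)            # [[0.5, 0.0], [-0.5, 0.0]]
```

The nonconvexity of (4.2) in J enters through the softmax dependence of `sigma` on J; the map from strategies to velocities itself is linear.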

Reducing dimensionality. When $d \geq 2$ , describing a general function $J \,:\, \Omega \times U \times \Omega \to \mathbb{R}$ where $\Omega \subset \mathbb{R}^d$ (or even $\Omega \times U \times \Omega \times U \to \mathbb{R}$ in the fully general setting) requires many degrees of freedom (one value for each node on the grid) and inferring them from data in a reliable way would require a large set of observations. After inference, interpreting such a general function to understand the interactions between agents can be a daunting task. For these two reasons, we may wish to incorporate some prior knowledge on the structure of J via a suitable ansatz. A prototypical example is

(4.3) \begin{align}J\!\left(x,u,x^{\prime}\right)=J_1(x,u) + J_2(x^{\prime}-x,u),\end{align}

where the first term models how a single agent prefers to move by itself depending on its absolute location, and the second term models a translation invariant pairwise interaction with other agents. This reduces the dimensionality from one function over $\mathbb{R}^d \times U \times \mathbb{R}^d$ to two functions over $\mathbb{R}^d \times U$ . Further simplifications can be made, for instance, by assuming that the interaction is also rotation invariant. Such parametrisations are very common in Newtonian models, see for instance [Reference Bongini, Fornasier, Hansen and Maggioni10], where radially symmetric interaction functions are assumed. Unless noted otherwise, we will in the following use the ansatz (4.3) in our experiments.
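The ansatz (4.3) can be encoded by composing two components. The concrete choices of `J1` and `J2` below are hypothetical, for illustration only (a drift term plus a translation-invariant short-range repulsion on the line):

```python
import numpy as np

def make_J(J1, J2):
    # Payoff with the structure (4.3): J(x, u, x') = J1(x, u) + J2(x' - x, u).
    return lambda x, u, xp: J1(x, u) + J2(xp - x, u)

# Hypothetical 1-d components: J1 rewards moving right, J2 penalises moving
# towards a nearby agent (short-range repulsion); u is a scalar velocity.
J1 = lambda x, u: 0.5 * u
J2 = lambda dx, u: -u * np.sign(dx) * np.exp(-abs(dx))
J = make_J(J1, J2)
val = J(0.0, 1.0, 0.0)   # == 0.5, the interaction term vanishes at dx = 0
```

The interaction part depends only on the relative position $x'-x$, so shifting both agents by the same amount leaves it unchanged, which is the translation invariance assumed in the text.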

Regularisation and dropping the Lipschitz constraints. Even after the reduction (4.3), inferring the coefficients of J is a high-dimensional nonlinear inverse problem. We may observe no (or very few) agents near some grid points and thus have no reliable data to set the corresponding coefficients of J directly. A common approach to avoiding ill-posedness is to add regularisation to the minimisation problem. Intuitively, this will diffuse information about observations over the grid to some extent. For regularisation in $\Omega,$ we use the squared $L^2$ -norm of the gradient of J, which boils down to weighted sums of the finite differences between grid points. For the strategy space U, one can in principle do the same, although this was not necessary in our examples. When J is split as in (4.3), a regulariser can be applied separately to each term. The regularised version of (4.1) then becomes

(4.4) \begin{align}\mathcal{E}_\sigma^N(J) + \lambda_1 \cdot \mathcal{R}_1(J_1) + \lambda_2 \cdot \mathcal{R}_2(J_2).\end{align}

Of course, over-regularisation will lead to loss of contrast and high-frequency features of J. The effect of under-regularisation is illustrated in Example 4.5.
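As a minimal sketch (array shapes and grid spacing are illustrative assumptions), the gradient regulariser on a grid amounts to a weighted sum of squared finite differences along the spatial axes:

```python
import numpy as np

def grad_penalty(J, hx=1.0):
    """Squared L2-norm of the discrete gradient of a grid function.

    J has shape (n1, ..., nd, K): spatial grid axes plus a strategy axis.
    Finite differences are taken along the spatial axes only.
    """
    total = 0.0
    for ax in range(J.ndim - 1):           # skip the strategy axis
        diff = np.diff(J, axis=ax) / hx    # forward differences
        total += np.sum(diff**2) * hx      # weighted sum over grid points
    return total

# A linear grid function has constant gradient: with slope 0.1 per step,
# 10 differences per column and 2 strategy columns, the penalty is ~0.2.
J1 = np.tile(np.linspace(0.0, 1.0, 11)[:, None], (1, 2))
print(grad_penalty(J1))
```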

While in Theorem 3.6 we consider increasingly finer discretisations $X_{R,M}^N$ , $N \to \infty$ , of the payoff function space $X_{R,M}$ , in our numerical examples a fixed resolution is entirely sufficient (approximately 30 grid points along any spatial dimension). On a fixed grid, the Lipschitz and $L^2$ -gradient semi-norms are equivalent. Consequently, for simplicity, in our numerical experiments we may drop the explicit constraint on the Lipschitz constant of J and only use the regularisation term to impose regularity. This has the advantage that our finite-dimensional discrete problem is then unconstrained, and we may approach it with quasi-Newton methods such as L-BFGS, for which we use the Julia package Optim [Reference Mogensen and Riseth52].

Inferring ${\boldsymbol\sigma}_{\kern-1pt\textbf{i}}(\textbf{t}_{\textbf{s}})$ from observations. In many situations, we are not able to observe the mixed strategies of agents, but merely their locations and velocities. We can then decide to use the nonconvex velocity-based error functional (4.2). Alternatively, in case we want to optimise the convex inference functional $\mathcal{E}_\sigma^N$ (4.1) instead, we can try to infer some meaningful $\sigma_i(t_s)$ from the observed velocities. In both cases, we need to make suitable choices for U and e. Unless stated otherwise, we pick U as a discrete subset of $\mathbb{R}^d$ such that observed velocities lie in the (interior of the) convex hull of U and pick $e(x,u) \;:=\; u$ . This means that every observed velocity can be reproduced by some mixed strategy. As a heuristic method for inferring a viable mixed strategy $\sigma$ from a velocity v, we propose to solve the following minimisation problem:

(4.5) \begin{align}\arg \min\left\{\int_U \left[ \varepsilon \cdot \log(\sigma(u)) + \|u-v\|^2 \right]\,\sigma(u)\,\textrm{d} \eta(u)\middle|\sigma \in \mathcal{S}_{a,b}, \,\int_U u \cdot \sigma(u)\,\textrm{d} \eta(u)=v\right\}\end{align}

The constraint enforces that $\sigma$ reproduces the observed velocity v. The term $\|u-v\|^2\,\sigma(u)$ encourages $\sigma$ to be concentrated in the vicinity of v in U, whereas the entropic term $\varepsilon \cdot \log(\sigma(u))\,\sigma(u)$ keeps $\sigma$ from collapsing onto a single point. It is easy to see that minimisers of (4.5) are of the form $\sigma(u)=A \cdot \exp(-\|u-\tilde{v}\|^2/\varepsilon)$, where A is a normalisation factor and $\tilde{v}$ is a velocity close to v, potentially slightly shifted to ensure that the constraint $\int_U u \cdot \sigma(u)\,\textrm{d} \eta(u)=v$ is satisfied. Problem (4.5) is readily solved by standard methods. We consider this approach a heuristic ‘inverse’ of the approximation of Newtonian models by game models outlined in Example 2.7.
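Because the minimiser has the Gibbs form above, solving (4.5) for a finite strategy set reduces (in $d=1$) to finding a scalar dual variable that matches the mean constraint; bisection suffices, since the mean is monotone in the dual variable. A minimal sketch under the assumptions $U \subset [-1,1] \subset \mathbb{R}$, $e(x,u)=u$ and $\varepsilon=1$ (the particular U and v below are illustrative):

```python
import numpy as np

def infer_sigma(v, U, eps=1.0, tol=1e-10):
    """Heuristic mixed strategy from an observed velocity v (1d sketch).

    The minimiser of (4.5) has the Gibbs form
    sigma(u) ∝ exp((lam*u - (u - v)**2)/eps); the scalar dual variable
    lam is found by bisection so that the mean of sigma equals v.
    Assumes v lies in the interior of the convex hull of U.
    """
    U = np.asarray(U, dtype=float)

    def weights(lam):
        w = np.exp((lam * U - (U - v) ** 2) / eps)
        return w / w.sum()

    lo, hi = -50.0, 50.0                  # bracket for the dual variable
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if weights(mid) @ U < v:
            lo = mid                      # the mean is increasing in lam
        else:
            hi = mid
    return weights(0.5 * (lo + hi))

sigma = infer_sigma(0.3, U=[-1.0, -1/3, 1/3, 1.0])
```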

Observations from forward models: Subsampling and multiple realisations. In all the examples below, observations are obtained by simulating a forward model with explicit Euler stepping with time-step size $\Delta t=0.02$ . Since the numerical complexity of our inference functionals grows linearly with the number of observations, we usually sub-sample the trajectories in time, keeping only every $\delta$ -th step: intuitively, the information contained in subsequent steps is largely redundant, as agents are confronted with a very similar configuration. Unless noted otherwise, we set $\delta=2$ .
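In code, the sub-sampling step is a simple stride over the stored Euler trajectory (array shapes here are illustrative):

```python
import numpy as np

def subsample(trajectory, delta=2):
    """Keep every delta-th configuration of a stored trajectory.

    trajectory: array of shape (S, N, d) -- S time steps, N agents in R^d.
    """
    return trajectory[::delta]

# Hypothetical toy trajectory: 11 recorded Euler steps, 8 agents, d = 2.
traj = np.zeros((11, 8, 2))
print(subsample(traj).shape)  # (6, 8, 2)
```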

The quality of the reconstructed model is closely tied to the quality of the data: more observations in a given region imply a better reconstruction of the payoff function in that region. If we use as inputs only data coming from a single realisation of the system, we have highly correlated observations, which tend to concentrate on a small subset of the reconstruction domain; larger regions of the domain are only explored as $N \to \infty$ . Another option to enhance the quality of the reconstructed model is to combine data from multiple realisations with different initial conditions, that is, we fix the number of agents N and simulate the system for several different initial conditions. This provides good coverage of the domain. Extending the functionals in Section 3 to multiple observations is straightforward.

4.2 Learning from undisclosed fast-reaction game models

We now begin with relatively simple numerical examples: an explicit game model is fixed (including U, e, J), simulations according to this model are performed and subsequently the payoff function is inferred from the observations. By possibly re-scaling J, we may w.l.o.g. set $\varepsilon=1$ for all numerical experiments, see (2.15). In this section we focus on the undisclosed fast-reaction setting. A proof of concept of inference for the full entropic game models via the differential functional of Section 3.2 is discussed in Section 4.3.

Example 4.1 (Game model, $d=1$ ). Let $d=1$ , $U=\{-1,+1\}$ and $e(x,u)=u$ . We set

(4.6) \begin{align} J\!\left(x,u,x^{\prime}\right) & \;:=\; J_1(x,u) + J_2(x^{\prime}-x,u), \qquad J_1(x,u) \;:=\; -u \cdot x, \nonumber \\ J_2(\Delta x,u) & \;:=\; -u \cdot \tanh(5\Delta x) \cdot \left(\max\left\{1-|\Delta x|^2,0\right\}\right)^2. \end{align}

$J_1$ encourages agents to move towards the origin $x=0$ ; for example, for $x>0$ one has $J_1(x,-1)>J_1(x,+1)$ , thus preferring strategy (and velocity) $u=-1$ over $u=+1$ . $J_2$ implements repulsion between nearby agents: when $\Delta x \in (0,1)$ , that is, the ‘other agent’ is to the right of ‘me’ and relatively close, then $J_2(\Delta x,-1)>J_2(\Delta x,+1)$ , and I am encouraged to move left.
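The payoff (4.6) and the sign relations just discussed can be transcribed directly (a sketch for scalar inputs):

```python
import numpy as np

def J1(x, u):
    # drive towards the origin: for x > 0, u = -1 pays more than u = +1
    return -u * x

def J2(dx, u):
    # short-range repulsion between agents, vanishing for |dx| >= 1
    return -u * np.tanh(5 * dx) * max(1 - abs(dx) ** 2, 0.0) ** 2

# Sanity checks matching the discussion in the text:
x = 0.5
assert J1(x, -1) > J1(x, +1)      # for x > 0, moving left pays more

dx = 0.4                          # neighbour to the right, fairly close
assert J2(dx, -1) > J2(dx, +1)    # so moving left is encouraged
```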

We set $N=8$ and simulate 100 instances over a time span of $0.2$ , collecting 5 data points per instance, that is, a total of 500 observed configurations (consisting of locations and mixed strategies). In each instance, initial locations are sampled uniformly from $[-1,1]$ . Observed relative locations between two agents are thus distributed over $[-2,2]$ . Describing the discrete J required 178 coefficients. Since observed mixed strategies are available in this example, we use the energy (4.1) for inference, augmented with a regulariser as discussed in Section 4.1 with $\lambda_{1}=\lambda_2=10^{-6}$ , see (4.4). The results are illustrated in Figure 4. We find that, as intended, the model describes agents moving towards the origin while avoiding getting too close to each other. The functions $J_1$ and $J_2$ are accurately recovered from the data. For newly generated initial conditions, that is, not taken from the training data, the trajectories simulated with the inferred J are in excellent agreement with the underlying true model, demonstrating that the inferred model is able to generalise.

Example 4.2 (Towards the mean-field regime). We can also explore the mean-field regime numerically. Figure 5 shows trajectories for the same model as in the previous example for an increasing number of particles where the initial locations are sampled from some underlying distribution. As anticipated, the behaviour approaches a consistent limit.

Inference can also be performed for larger numbers of particles. We applied the same approach as in the previous example with $N=100$ agents. With this many agents, a small number of configurations already carries enough information for the inference. Thus, we simulate 10 instances over a time span of $0.02$ with time-step size $\Delta t = 0.002$ , collecting 2 data points per instance, that is, a total of 20 observed configurations (locations and mixed strategies). In each instance, initial locations are sampled uniformly from $[-1,1]$ . The inferred payoff function is essentially identical to the one obtained in Figure 4.

Figure 4. Inference from 1d observations of an undisclosed fast-reaction system. (a,b): reconstruction of the payoff function. (c): distribution of the data in the training set. (d): comparison between true trajectories (solid lines) and trajectories generated by the inferred model (dashed lines) for two new realisations (not part of the training data).

Figure 5. Numerical approximation of the mean-field limit. The forward model of Example 4.1 is simulated for an increasing number of agents. Each panel shows a discrete histogram of the particle distribution over time. As N increases, the histograms approach a consistent limit.

Example 4.3 (Game model, $d=1$ , noisy data). We repeat the previous example with added noise. We consider the same set of observations, but the observed mixed strategies are corrupted by re-sampling: for a given ‘true’ simulated $\sigma$ , we draw 20 times from U according to $\sigma$ and use the resulting empirical distribution for inference. These corrupted $\sigma$ yield corrupted velocity predictions: in our example, the overall standard deviation between exact and corrupted velocities is $\approx 0.2$ (with the range of admissible velocities being $[-1,1]$ ). This could model a situation where observation of the mixed strategies is imperfect, or where very noisy (stochastic) agent movement is observed and empirical distributions of agent velocities over short time intervals serve as a substitute for mixed strategies. The results are shown in Figure 6, with parameters $\lambda_{1}=\lambda_2=10^{-5}$ to enforce more regularity due to the noisy inputs. The inferred payoff functions are similar to those of the previous example with just a few spurious fluctuations. The trajectories simulated with the inferred J are close to the (unperturbed) true trajectories; the error is consistently smaller than the one caused by the re-sampling noise, indicating that inference also works well in noisy settings.

Figure 6. Inference from noisy 1d observations of an undisclosed fast-reaction system. (a): reconstruction of the payoff function for $u=-1$ . (b): comparison between true trajectories (solid lines) and trajectories generated by the inferred model (dashed lines) for multiple new realisations (not part of the training data).

Example 4.4 (Game model, $d=2$ ). Let $d=2$ , $U = \{(1,1),(-1,1),(-1,-1),(1,-1)\} \subset \mathbb{R}^2$ and $e(x,u)=u$ . We use the same J defined in (4.6), thus combining a driving force towards the origin $x=(0, 0)$ and repulsion between nearby agents (we assume $\tanh$ to be applied componentwise to vectors).

Data collection follows the same approach as in Example 4.1: for $N=8$ agents we simulate 100 instances over a time span of $0.2$ , collecting 5 data points per instance, for a total of 500 observed configurations. In each instance, initial locations are sampled uniformly from $[-0.75,0.75]^2$ . Observed relative locations between two agents are thus distributed over $[-1.5,1.5]^2$ , as displayed in Figure 7(b). Describing the discrete J required 10,656 coefficients, resulting from a $30\times 30$ spatial grid for $J_1$ and a $42\times 42$ spatial grid for $J_2$ . We again use the energy (4.1) for inference with a regulariser on derivatives, with $\lambda_{1}=\lambda_2=10^{-5}$ . The results are illustrated in Figure 7. The recovered functions $J_1$ and $J_2$ reproduce the structural features of the exact ones, and the trajectories simulated with the inferred J closely follow the underlying true model (again for newly generated initial conditions).

Figure 7. Inference from 2d observations of an undisclosed fast-reaction system. (a): inferred payoff function and ground truth for pure strategies (1,1) and $(-1,1)$ . (b): distribution of the data in the training set (agent locations and relative locations), dark blue: low, yellow: high. (c): comparison between exact trajectories (solid lines) and trajectories generated by the inferred model (dashed lines) for two new realisations (not part of the training data).

Example 4.5 (Under-regularisation). As is common in inverse problems, if the regularisation parameters $\lambda_1, \lambda_2$ in (4.4) are chosen too high, one obtains an over-regularised result, where the large penalty for spatial variation leads to a loss of contrast and high-frequency features of J. Conversely, we may experience under-regularisation when the parameters are small and the observed data does not cover the learning domain sufficiently, for example, when we observe only a few realisations of the system. Such a case is illustrated in Figure 8(c), where the observations are concentrated on a few one-dimensional trajectories. For weak regularisation, the reconstructed J then tends to be concentrated on these trajectories as well, see Figure 8(a). While the inferred J may predict the agent behaviour with high accuracy on these trajectories, it generalises very poorly to different data. With stronger regularisation, J is extended beyond the trajectories in a meaningful way, see Figure 8(b).

Figure 8. Influence of the regularisation parameter. Analogous to Figure 7 but with fewer observations, as displayed in (c). (a): reconstruction for low regularisation parameters, $\lambda_1=\lambda_2=10^{-15}$ . (b): reconstruction for higher regularisation parameters, $\lambda_1=\lambda_2=10^{-5}$ . See Example 4.5 for more details.

4.3 An example for differential inference

The ‘differential inference’ functional $\mathcal{E}_{\dot{\sigma}}^{J,N}$ introduced in Section 3.2 was constructed in close analogy to (1.7). We have argued that it may be challenging to apply to real data, as it would require the observation of mixed strategies $\{\sigma_{i}(t_s)\}_{s,i}$ and their temporal derivatives $\{\partial_t \sigma_{i}(t_s)\}_{s,i}$ . As a proof of concept, we now demonstrate that if these observations are indeed available, the numerical setup described up to now can be easily adapted to $\mathcal{E}_{\dot{\sigma}}^{J,N}$ . After discretisation, $\mathcal{E}_{\dot{\sigma}}^{J,N}$ reduces to a finite-dimensional quadratic objective. For a quadratic regulariser, such as the squared $L^2$ -norm of the gradient of J, minimisation then corresponds to solving a linear system. Alternatively, we can implement the Lipschitz constraint on J which amounts to adding linear constraints to the quadratic problem. We solve the resulting finite-dimensional optimisation problem using the operator splitting approach provided by the OSQP solver [Reference Stellato, Banjac, Goulart, Bemporad and Boyd69].
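Schematically, with a quadratic regulariser the discretised problem is an unconstrained least-squares fit: if A maps the coefficient vector c of the discretised J to the predicted strategy derivatives, b stacks the observed ones, and R is a discrete-gradient operator, the minimiser of $\|Ac-b\|^2 + \lambda\|Rc\|^2$ solves the normal equations. The matrices below are random stand-ins for illustration, not the actual discretisation:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))     # observation operator (stand-in)
b = rng.standard_normal(50)           # observed strategy derivatives
R = np.diff(np.eye(10), axis=0)       # finite-difference regulariser
lam = 1e-2

# Normal equations of ||A c - b||^2 + lam * ||R c||^2:
c = np.linalg.solve(A.T @ A + lam * R.T @ R, A.T @ b)

# Optimality check: the gradient of the objective vanishes at c.
residual = A.T @ A @ c + lam * R.T @ R @ c - A.T @ b
```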

As an example, we simulate the entropic regularised model in (2.2) for $U = \{-1,1\}$ and $J \colon \mathbb{R} \times \{-1,1\} \times \mathbb{R} \times \{-1,1\} \to \mathbb{R}$ defined by

\begin{align*}J\!\left(x,u,x^{\prime},u^{\prime}\right) &= -\frac{1}{2}\left(u+u^{\prime}\right)\left((u+1)x^5+(u-1)(x+x^{\prime})^3\right).\end{align*}

Our choice of J is not meant to provide a meaningful model of anything, but simply to illustrate that inference works in principle. We collect data using the same sub-sampling approach presented above: 8 agents, 100 realisations over a time span of 0.2, initial conditions uniformly sampled from $[-1,1]$ , only every other observation kept, for a total of 500 input configurations. Observed positions $(x, x^{\prime})$ with $x \neq x^{\prime}$ are displayed in Figure 9(a). No particular structure is assumed for J.

Figure 9. Inference from 1d observations generated with (2.2). (a): distribution of observed pairs $(x, x^{\prime})$ , $x \neq x^{\prime}$ . (b): comparison between exact trajectories (solid lines) and inferred model (dashed lines) on new realisations (not part of the training data).

As discussed in Remark 3.8, the optimal inferred J is not unique. As a remedy, we normalise the reconstructed $J^r$ such that $J^r(x,1,x^{\prime},-1) = J^r(x,-1,x^{\prime},1) = 0$ . Comparisons for trajectories are reported in Figure 9(b) for new randomly selected initial conditions with $T = 8$ : trajectories with the reconstructed $J^r$ follow closely the trajectories generated with the true J.

4.4 Learning for Newtonian models

In Example 2.7, we have shown that entropic game models with fast reaction can also capture Newtonian interaction models. In this section, we perform inference with an entropic game model ansatz on data generated by a Newtonian model. As an example, we pick

(4.7) \begin{align}\partial_t x_i(t) = \frac{1}{N} \sum_{j=1}^N f\!\left(x_i(t),x_j(t)\right)\qquad \text{with} \qquad f(x,x^{\prime}) = - x - \frac{\tanh\left(5\left(x^{\prime}-x\right)\right)}{\left(1+\|x^{\prime}-x\|\right)^2},\end{align}

where the function $\tanh$ is applied componentwise. The first term in f encourages agents to move towards the origin; the second term implements a pairwise repulsion at close range. Trajectory data are collected for a two-dimensional system: for $N=8$ agents, we simulate 100 instances of (4.7) over a time span of $0.2$ , collecting 5 data points per instance, for a total of 500 observed configurations. In each instance, initial locations are sampled uniformly from $[-0.75,0.75]^2$ . Observed relative locations between two agents are thus distributed over $[-1.5,1.5]^2$ , as displayed in Figure 10(b).
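For reference, a minimal sketch of the data-generating forward model (4.7) with explicit Euler stepping, using $\Delta t = 0.02$ , $N = 8$ and a time span of $0.2$ as stated above:

```python
import numpy as np

def f(x, xp):
    """Pairwise interaction of (4.7): attraction towards the origin plus
    short-range repulsion (tanh applied componentwise)."""
    diff = xp - x
    return -x - np.tanh(5 * diff) / (1 + np.linalg.norm(diff)) ** 2

def euler_step(X, dt=0.02):
    """One explicit Euler step for all N agents; X has shape (N, d)."""
    N = X.shape[0]
    V = np.array([sum(f(X[i], X[j]) for j in range(N)) / N for i in range(N)])
    return X + dt * V

rng = np.random.default_rng(1)
X = rng.uniform(-0.75, 0.75, size=(8, 2))  # initial locations as in the text
for _ in range(10):                        # time span 0.2 with dt = 0.02
    X = euler_step(X)
```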

Figure 10. Inference from 2d observations of a Newtonian system. Top rows: inferred payoff function for pure strategies (1,1) and $(-1,1)$ ; bottom row: distribution of the data in the training set (agent locations and pairwise distances) and comparison between exact Newtonian trajectories (solid lines) and trajectories generated by the inferred game model (dashed lines).

We use the parametrisation (4.3) for the payoff function. Unlike in the previous examples, the original model does not involve a strategy set U or a map e. Thus, prior to inference, we have to make informed choices for these. As in earlier examples, we choose $U \subset \mathbb{R}^2$ and $e(x,u)=u$ (i.e. pure strategies correspond directly to velocities) and in particular we pick $U = \{u_1,\dots,u_K\}$ such that the convex hull of U in $\mathbb{R}^2$ contains (almost) all observed velocities, meaning that they can be reproduced by suitable mixed strategies. So for (almost) every i and s, we can write $\partial_t x_i(t_s) = \sum_k u_k\sigma_{k}$ for some mixed strategy $\sigma$ over U.

Example 4.6. We pick $U = \{(1,1),(-1,1),(-1,-1),(1,-1)\} \subset \mathbb{R}^2$ , optimise the velocity-based energy (4.2), since no mixed strategies are provided by the observations, and apply a regulariser on derivatives as described in Section 4.1, with $\lambda_{1}=\lambda_2=10^{-5}$ . Using the same setup as in Example 4.4, we describe the discrete J using 10,656 coefficients ( $30\times 30$ grid for $J_1$ and $42\times 42$ grid for $J_2$ ). The results are illustrated in Figure 10. The recovered functions $J_1$ and $J_2$ have the same structure as those in Example 4.4: the qualitative behaviour of the two systems is indeed the same. In Figure 10(c), we simulate a couple of dynamics with newly generated initial conditions and observe again how Newtonian trajectories and game trajectories compu