Geometric properties of disintegration of measures

RENATA POSSOBON; CHRISTIAN S. RODRIGUES

doi:10.1017/etds.2024.74

Part of: Smooth dynamical systems: general theory; Miscellaneous topics in calculus of variations and optimal control; Optimality conditions; Random dynamical systems

Published online by Cambridge University Press: 11 October 2024

RENATA POSSOBON: Affiliation:
Institute of Mathematics, Department of Applied Mathematics, Universidade Estadual de Campinas, 13.083-859 Campinas, SP, Brazil (e-mail: re.possobon@gmail.com)
CHRISTIAN S. RODRIGUES: Affiliation:
Institute of Mathematics, Department of Applied Mathematics, Universidade Estadual de Campinas, 13.083-859 Campinas, SP, Brazil (e-mail: rodrigues@ime.unicamp.br)

Abstract

In this paper, we study a connection between disintegration of measures and geometric properties of probability spaces. We prove a disintegration theorem, addressing disintegration from the perspective of an optimal transport problem. We look at the disintegration of transport plans, which are used to define and study disintegration maps. Using these objects, we study the regularity and absolute continuity of disintegration of measures. In particular, we exhibit conditions for which the disintegration map is weakly continuous and one can obtain a path of measures given by this map. We show a rigidity condition for the disintegration of measures to be given into absolutely continuous measures.

Keywords

disintegration theorem; probability; optimal transport; ergodic theory; dynamical systems

MSC classification

Primary: 37C40: Smooth ergodic theory, invariant measures; 49K45: Problems involving randomness; 49N60: Regularity of solutions

Secondary: 37H10: Generation, random and stochastic difference and differential equations; 37C05: Smooth mappings and diffeomorphisms

Type: Original Article
Information: Ergodic Theory and Dynamical Systems, First View, pp. 1-30

DOI: https://doi.org/10.1017/etds.2024.74
Copyright: © The Author(s), 2024. Published by Cambridge University Press

1. Introduction

The disintegration of a measure over a partition of the space on which it is defined is a way to rewrite this measure as a combination of probability measures, which are concentrated on the elements of the partition. As an example, consider the probability space $(X, \mathcal {F}, \mu )$ and its partition into a finite number of measurable subsets $P_{1}, \ldots , P_{n}$ with positive measure. A disintegration of $\mu $ with respect to (w.r.t.) this partition is a family of probabilities $\{\mu _1, \ldots , \mu _n \}$ on X such that for $i=1, \ldots , n$ , we have $\mu _{i} (P_i) = 1$ and, for every measurable set $E \subset X$ , the conditional measures are given by $\mu _i (E) = {\mu (E \cap P_i)}/ {\mu (P_i)}$ . It is possible to write the original measure as a combination of the conditional ones:

$$ \begin{align*} \mu(E) = \sum_{i=1}^{n} \mu (P_{i}) \mu_{i}(E) = \sum_{i=1}^{n} \mu (P_{i}) \frac{\mu(E \cap P_i)}{\mu(P_i)}. \end{align*} $$
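The finite case above can be checked numerically. The following is a minimal sketch on a hypothetical six-point space; the weights `mu`, the partition and the event `E` are illustrative assumptions, not taken from the paper. Exact rational arithmetic is used so that both sides of the identity can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical finite probability space X = {0, ..., 5}; the weights sum to 1.
mu = {0: Fraction(1, 12), 1: Fraction(1, 6), 2: Fraction(1, 4),
      3: Fraction(1, 4), 4: Fraction(1, 6), 5: Fraction(1, 12)}
# Hypothetical partition of X into blocks of positive measure.
partition = [{0, 1}, {2, 3, 4}, {5}]


def measure(weights, event):
    """Measure of a subset of X under a dictionary of point weights."""
    return sum((weights[x] for x in event), Fraction(0))


def conditional(weights, block):
    """Point weights of the conditional measure mu_i(E) = mu(E ∩ P_i) / mu(P_i)."""
    mass = measure(weights, block)
    return {x: (weights[x] / mass if x in block else Fraction(0)) for x in weights}


conditionals = [conditional(mu, block) for block in partition]


def recombine(event):
    """Right-hand side of the identity: sum_i mu(P_i) * mu_i(E)."""
    return sum((measure(mu, block) * measure(cond, event)
                for block, cond in zip(partition, conditionals)), Fraction(0))


E = {1, 2, 5}
print(measure(mu, E), recombine(E))
```

Each conditional measure gives full mass to its own block, and the weighted sum recovers $\mu (E)$ for the chosen event.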

More generally, consider a probability space $(X, \mathcal {F}, \mu )$ and a partition $\mathcal {P}$ of X into measurable subsets. Let $\varrho $ be the natural projection that associates each point $x \in X$ to the element $P \in \mathcal {P}$ which contains x. The measurable function $\varrho $ can be used to induce a probability $\hat {\mu }$ on $\mathcal {P}$ . A subset $B \subset \mathcal {P}$ is measurable if and only if $\varrho ^{-1}(B)$ is a measurable subset of X. Then, the family $\hat {\mathcal {B}}(\mathcal {P})$ of measurable subsets is a $\sigma $ -algebra on $\mathcal {P}$ . Let $\hat {\mu }$ denote the measure given by

$$ \begin{align*} \hat{\mu}(B) = \varrho_{*}\mu (B) := \mu \circ \varrho^{-1} (B) = \mu(\{x \in X : \varrho(x) \in B\}) \end{align*} $$

for every $B \in \hat {\mathcal {B}}(\mathcal {P})$ . In this case, $\hat {\mu }$ is called the law of $\varrho $ , denoted by $\mathrm {law}(\varrho )$ . A disintegration of $\mu $ w.r.t. $\mathcal {P}$ into conditional measures is a family $\{{\mu }_{P}:P\in \mathcal {P} \}$ of probability measures on X such that for every $E \in \mathcal {F}$ :

(1) $\mu _{P}(P)=1$ for $\hat {\mu }$ -almost every (a.e.) $P \in \mathcal P$ ;
(2) $P \mapsto \mu _{P}(E)$ is measurable;
(3) $\mu (E)=\int \mu _{P}(E) ~d \hat {\mu }(P)$ .

There are several reasons why one may wish to study such possible combinations of measures. In ergodic theory, for example, the disintegration of a measure is directly related to the ergodic decomposition of invariant measures, which are crucial objects encoding the asymptotic behaviour of dynamical systems [OV14]. The concept of disintegration, however, appears in a much broader context, in areas such as probability [CP97, Par67] and geometry [Stu06], among others [BM17, GL20, Var16, Vil03].

The idea of disintegrating a measure was devised by von Neumann in 1932 [Von32]. Since then, different versions of disintegration theorems have been presented and used, for example, in [AGS05, AP03, DM78, Tue75], to name a few. In particular, the well-known Rokhlin disintegration theorem shows that there exists a disintegration of $\mu $ relative to $\mathcal {P}$ if X is a complete separable metric space and $\mathcal {P}$ is a measurable partition. By measurable partition, we mean that there exists some measurable set $X_{0} \subset X$ such that $\mu (X)=\mu (X_{0})$ and $\mathcal {P} = \bigvee _{n=1}^{\infty } \mathcal P_{n} = \{ P_1 \cap P_2 \cap \cdots : P_n \in \mathcal {P}_n ~\text {for all}~ n \geq 1 \}$ restricted to $X_{0}$ , for an increasing sequence $\mathcal P_{1} \prec \mathcal P_{2} \prec \cdots \prec \mathcal P_{n} \prec \cdots $ of countable partitions [Rok52].

Recently, Simmons has proposed a more general and subtle formulation of Rokhlin’s disintegration theorem, where he has considered any universally measurable space $(X, \mathcal {B}, \mu )$ and a measure space Y for which there exists an injective map $Y \to \{0, 1\}^{\mathbb N}$ . That is, Y is any subspace of the standard Borel space. He has shown that there is a (unique) system of conditional measures $(\mu _{y})_{y \in Y}$ , a disintegration of $\mu $ [Sim12]. Then, his formulation is further developed to address $\sigma $ -finite measure spaces with absolutely continuous morphisms. One of the facts standing out in Simmons’ formulation is a fibre-wise perspective, which we wish to further explore.

Even though geometric properties of the space where a measure is defined and statistical properties obtained via disintegration theorems seem to be strictly connected, for instance, in foliated manifolds, very little geometric information is taken into account while studying disintegration of measures. In particular, intrinsic geometric properties of probability spaces are very often neglected. The purpose of this paper is to advertise the viewpoint of tackling disintegration of measures taking into consideration intrinsic structures of probability spaces obtained from optimal transport theory. To this end, we will formulate disintegration of probability measures in terms of a transportation problem to explore a fibre-wise formulation of a disintegration and its consequences.

1.1. Main results

As a first result in this paper, we prove a disintegration theorem, Theorem A, and we introduce a fibre-wise perspective on disintegration. Using this disintegration theorem, we study conditions in which one can obtain a path of conditional measures in the space of probability measures. In particular, in Propositions 5.2, 5.4 and 5.8, we investigate the weak continuity of the disintegration map, which parametrizes the conditional measures. The last result in this paper is Theorem B. Its first part shows how to construct a path of conditional measures in the space of probabilities. Its second part gives us a sort of rigidity result for disintegration of measures. Namely, we show that if one of the measures in this path is absolutely continuous, then all measures in the associated path must also be absolutely continuous. The third part is a particular case of absolute continuity in which the disintegration map is an isometry.

The paper is organized as follows. Section 2 contains the main concepts from optimal transport theory to be used throughout the paper. In §3, a disintegration theorem, stated as Theorem A, is proved. In §4, Theorem A is used to define and study what we call disintegration maps. These crucial objects are used in §5. There, we prove a series of propositions about a disintegration map: in Proposition 5.2, we show that this map is nearly weakly continuous, under some assumptions on the reference measure $\nu $ . In Propositions 5.4 and 5.8, we study hypotheses about the disintegration for which this map is weakly continuous. Afterwards, we prove our main result in this section, Theorem B, about paths of measures given via disintegration maps and a rigidity condition establishing absolute continuity of measures in these paths.

2. Spaces of probability measures, Wasserstein spaces and optimal transport

In this section, we set up some notation to be used throughout the text. We also introduce some basic terminology from optimal transport theory, which is meant for readers not familiar with this area. Those who are familiar with the topic may wish to skip this section. Our main references for this section are [AGS05, Amb00, Vil09].

The cornerstone for the theory of optimal transportation is considered to be a logistics problem addressed by Gaspard Monge in 1781. The main idea is to transport masses from a given location to another one at minimal cost. To state it in a modern formulation, let $\mathscr {P}(X)$ be the set of all Borel probability measures on X. The problem amounts to the following.

Monge transport problem. Let X, Y be Radon spaces. Given measures $\mu \in \mathscr {P}(X)$ , $\nu \in \mathscr {P}(Y)$ and a fixed Borel cost function $c: X \times Y \to [0, \infty ]$ , minimize

$$ \begin{align*} T \mapsto \int_{X} c(x, T(x)) ~d\mu \end{align*} $$

among all maps T such that $T_{*}\mu = \nu $ .

The maps T fulfilling $T_{*}\mu = \nu $ are called transport maps. The Monge transport problem actually may be ill-posed and such a map does not need to exist. That is the case, for example, when one of the measures is a Dirac mass and the other one is not. A way around is given by a different formulation as proposed by Kantorovich.
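For finitely supported measures, the constraint $T_{*}\mu = \nu $ can be explored by brute force. The sketch below is only an illustration under assumed data: the measures `dirac`, `two_points`, `mu`, `nu` and the cost $|x-y|$ are hypothetical choices. It enumerates all maps between the supports, keeps those that are transport maps, and shows that none exists when a Dirac mass has to be sent to a measure with two atoms.

```python
from fractions import Fraction
from itertools import product


def pushforward(T, mu):
    """Push mu forward by the map T (a dict x -> y): (T_* mu)(y) = mu(T^{-1}(y))."""
    out = {}
    for x, mass in mu.items():
        out[T[x]] = out.get(T[x], Fraction(0)) + mass
    return out


def transport_maps(mu, nu):
    """All maps T from the atoms of mu to the atoms of nu with T_* mu = nu."""
    xs, ys = list(mu), list(nu)
    maps = []
    for images in product(ys, repeat=len(xs)):
        T = dict(zip(xs, images))
        if pushforward(T, mu) == nu:
            maps.append(T)
    return maps


def monge_cost(T, mu, cost):
    """Monge functional: integral of c(x, T(x)) with respect to mu."""
    return sum((cost(x, T[x]) * m for x, m in mu.items()), Fraction(0))


half = Fraction(1, 2)
cost = lambda x, y: abs(x - y)  # hypothetical cost on the real line

dirac = {0: Fraction(1)}
two_points = {0: half, 1: half}
no_maps = transport_maps(dirac, two_points)  # a map cannot split a Dirac mass

mu = {0: half, 2: half}
nu = {1: half, 5: half}
maps = transport_maps(mu, nu)
best = min(maps, key=lambda T: monge_cost(T, mu, cost))
print(no_maps, best, monge_cost(best, mu, cost))
```

In the opposite direction, from `two_points` to `dirac`, the constant map is a transport map, so the obstruction is not symmetric.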

Monge–Kantorovich transport problem. Let X, Y be Radon spaces. Given measures $\mu \in \mathscr {P}(X)$ , $\nu \in \mathscr {P}(Y)$ and a fixed Borel cost function $c: X \times Y \to [0, \infty ]$ , minimize

$$ \begin{align} \gamma \mapsto \int_{X \times Y} c(x, y) ~d\gamma(x, y) \tag{1} \end{align} $$

among all measures $\gamma \in \mathscr {P}(X \times Y)$ with marginals $\mu $ and $\nu $ , i.e. $\gamma $ satisfying $(\mathrm {proj}_{X})_{*}\gamma =\mu $ and $(\mathrm {proj}_{Y})_{*}\gamma =\nu $ , where $\mathrm {proj}_{X}$ and $\mathrm {proj}_{Y}$ are the canonical projections $(x, y) \mapsto x$ and $(x, y) \mapsto y$ , respectively.

The measures $\gamma \in \mathscr {P}(X \times Y)$ are called transport plans. We denote the set of all transport plans with marginals $\mu $ and $\nu $ by $\Pi (\mu , \nu )$ . The value $C(\mu , \nu ) = \inf _{\gamma \in \Pi (\mu , \nu )} \int _{X \times Y} c(x, y) ~d\gamma (x, y)$ is called optimal cost. Note that $(\mathrm {proj}_{X})_{*}\gamma =\mu $ is equivalent to $\gamma [A \times Y]=\mu [A]$ for every $A \in \mathcal {B}(X)$ , and $(\mathrm {proj}_{Y})_{*}\gamma =\nu $ is equivalent to $\gamma [X \times B]=\nu [B]$ for every $B \in \mathcal {B}(Y)$ . Moreover, it is possible to describe this problem in terms of the coupling of measures, in the following sense.

Definition 2.1. Let $(X, \mu )$ and $(Y, \nu )$ be probability spaces. A coupling of $(\mu , \nu )$ is a pair $(\mathcal {X}, \mathcal {Y})$ of measurable functions in a probability space $(\Omega , \mathbb {P})$ such that law $(\mathcal {X})=\mu $ and law $(\mathcal {Y})=\nu $ .

Considering $\Omega = X \times Y$ , coupling $\mu $ and $\nu $ means to construct $\gamma \in \mathscr {P}(X \times Y)$ with marginals $\mu $ and $\nu $ . Then, the Monge–Kantorovich transport problem can be understood as the minimization of the total cost, over all possible couplings of $(\mu , \nu )$ .
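As a small illustration of the marginal constraints, the sketch below builds two transport plans between the same pair of two-point measures and compares their costs. The measures, the cost $|x-y|$ and the helper names are assumptions made for this example only.

```python
from fractions import Fraction


def marginals(gamma):
    """Marginals of a plan gamma, stored as a dict {(x, y): mass}."""
    first, second = {}, {}
    for (x, y), mass in gamma.items():
        first[x] = first.get(x, Fraction(0)) + mass
        second[y] = second.get(y, Fraction(0)) + mass
    return first, second


def total_cost(gamma, cost):
    """Kantorovich functional: integral of c(x, y) with respect to gamma."""
    return sum((cost(x, y) * mass for (x, y), mass in gamma.items()), Fraction(0))


# Hypothetical data: the same uniform two-point measure on both sides.
mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}
nu = {0: Fraction(1, 2), 1: Fraction(1, 2)}
cost = lambda x, y: abs(x - y)

# Independent coupling: gamma = mu x nu.
product_plan = {(x, y): mu[x] * nu[y] for x in mu for y in nu}
# Plan concentrated on the diagonal, induced by the identity map.
diagonal_plan = {(x, x): mu[x] for x in mu}

print(total_cost(product_plan, cost), total_cost(diagonal_plan, cost))
```

Both plans belong to $\Pi (\mu , \nu )$ , but only the diagonal one attains zero cost, which is why the infimum over all plans is the relevant quantity.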

Whenever these problems are stated in metric spaces, we may choose the cost function to be the distance function itself, which in turn allows us to introduce a distance function between measures.

Definition 2.2. (Wasserstein distance)

Let $(X, d)$ be a separable complete metric space. Consider probability measures $\mu $ and $\nu $ on X and $p \in [1, \infty )$ . The Wasserstein distance of order p between $\mu $ and $\nu $ is given by

$$ \begin{align*} W_{p}(\mu, \nu):= \bigg( \inf\limits_{\gamma \in \Pi(\mu, \nu)} \int d(x_{1}, x_{2})^{p} ~d\gamma (x_{1}, x_{2})\bigg)^{{1}/{p}}. \end{align*} $$
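On the real line, and for two uniform empirical measures with the same number of atoms, the infimum in this definition is attained by pairing the sorted samples. The sketch below relies on that standard fact and cross-checks it against a brute-force search over permutations; the sample points and function names are hypothetical.

```python
from itertools import permutations


def wasserstein_sorted(xs, ys, p=1):
    """W_p between uniform empirical measures on R, via monotone pairing."""
    n = len(xs)
    total = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys)))
    return (total / n) ** (1 / p)


def wasserstein_bruteforce(xs, ys, p=1):
    """Same quantity, minimising over all one-to-one pairings of the atoms."""
    n = len(xs)
    best = min(sum(abs(a - b) ** p for a, b in zip(xs, perm))
               for perm in permutations(ys))
    return (best / n) ** (1 / p)


# Hypothetical samples with three atoms each.
xs = [0.0, 1.0, 3.0]
ys = [2.0, 5.0, 4.0]

print(wasserstein_sorted(xs, ys, p=1), wasserstein_sorted(xs, ys, p=2))
```

The brute-force version is only feasible for very small samples; it is included here purely as a consistency check for the sorted pairing.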

In general, $W_{p}$ is not a distance in the strict sense, because it can take the value $+\infty $ . To rule this situation out, it is natural to constrain $W_{p}$ to a subset in which it takes finite values.

Definition 2.3. (Wasserstein space)

The Wasserstein space of order p is defined by

$$ \begin{align*} \mathscr{P}_{p}(X):= \bigg\{ \mu \in \mathscr{P}(X) : \int d(x, \tilde{x})^{p} \mu(dx) < +\infty \bigg\} \end{align*} $$

for $\tilde {x} \in X$ arbitrary.

Therefore, $W_{p}$ sets a (finite) distance on $\mathscr {P}_{p}(X)$ . It turns out that if $(X, d)$ is a complete separable metric space and d is bounded, then the p-Wasserstein distance metrizes the weak topology over $\mathscr {P}(X)$ [Vil09, Corollary 6.13]. Furthermore, if X is a complete metric space, then so is $\mathscr {P}(X)$ with the p-Wasserstein distance [Vil09, Theorem 6.18].

Although the Wasserstein distance is defined for every $p \geq 1$ , in this paper, we will choose either $p=1$ or $p=2$ , as stated later on. The reason is that for $W_{1}$ , we can explicitly compute distance bounds, while for $W_{2}$ , the space $\mathscr {P}_{2}(X)$ inherits geometric properties of the space X. The choice is always indicated throughout the text.

We finish this section recalling two well-known results for later use. The first one is regarding measurable and continuous functions, the Lusin theorem in a specific form. The other one gives us a ‘recipe’ to glue different couplings.

Theorem 2.4. [Fed69, Theorem 2.3.5]

Let M be a locally compact metric space and N a separable metric space. Consider $\mu $ a Borel measure on M, $A \subset M$ a measurable set with finite measure and $f : M \to N$ a measurable map. Then, for each $\delta> 0$ , there is a closed set $K \subset A$ , with $\mu (A \backslash K) < \delta $ , such that the restriction of f to K is continuous.

Lemma 2.5. (Gluing lemma) [Vil09, Ch. 1]

Let $(X_{i}, \mu _{i})$ , $i = 1, 2, 3$ , be complete separable metric probability spaces. If $(\mathcal {X}_1,\mathcal {X}_2)$ is a coupling of $(\mu _{1}, \mu _{2})$ and $(\mathcal {Y}_2, \mathcal {Y}_3)$ is a coupling of $(\mu _{2}, \mu _{3})$ , then one can construct a triple of random variables $(\mathcal {Z}_{1}, \mathcal {Z}_{2}, \mathcal {Z}_{3})$ such that $(\mathcal {Z}_{1}, \mathcal {Z}_{2})$ has the same law as $(\mathcal {X}_1,\mathcal {X}_2)$ , and $(\mathcal {Z}_{2}, \mathcal {Z}_{3})$ has the same law as $(\mathcal {Y}_2, \mathcal {Y}_3)$ . If $\mu _{12}$ stands for the law of $(\mathcal {X}_1,\mathcal {X}_2)$ on $X_1 \times X_2$ and $\mu _{23}$ stands for the law of $(\mathcal {Y}_2,\mathcal {Y}_3)$ on $X_2 \times X_3$ , then to construct the joint law $\mu _{123}$ of $(\mathcal {Z}_{1}, \mathcal {Z}_{2}, \mathcal {Z}_{3})$ , one just has to glue $\mu _{12}$ and $\mu _{23}$ along their common marginal $\mu _{2}$ .
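In the discrete setting, gluing along the common marginal can be written explicitly as $\mu _{123}(x_1, x_2, x_3) = \mu _{12}(x_1, x_2)\, \mu _{23}(x_2, x_3) / \mu _{2}(x_2)$ , which makes $x_1$ and $x_3$ conditionally independent given $x_2$ . The sketch below implements this one possible gluing on hypothetical finite data; it assumes that the second marginal of `mu12` coincides with the first marginal of `mu23`.

```python
from fractions import Fraction


def glue(mu12, mu23):
    """Glue two discrete couplings along their common marginal mu2.

    Uses mu123(x1, x2, x3) = mu12(x1, x2) * mu23(x2, x3) / mu2(x2).
    """
    mu2 = {}
    for (_, x2), m in mu12.items():
        mu2[x2] = mu2.get(x2, Fraction(0)) + m
    return {(x1, x2, x3): m12 * m23 / mu2[x2]
            for (x1, x2), m12 in mu12.items()
            for (y2, x3), m23 in mu23.items()
            if x2 == y2 and mu2[x2] > 0}


def marginal(mu123, keep):
    """Marginal of mu123 on the coordinates listed in keep, e.g. (0, 1)."""
    out = {}
    for point, m in mu123.items():
        key = tuple(point[i] for i in keep)
        out[key] = out.get(key, Fraction(0)) + m
    return out


q = Fraction(1, 4)
# Hypothetical couplings; both give mass 3/4 to x2 = 0 and 1/4 to x2 = 1.
mu12 = {("a", 0): q, ("a", 1): q, ("b", 0): Fraction(1, 2)}
mu23 = {(0, "u"): Fraction(1, 2), (0, "v"): q, (1, "v"): q}

mu123 = glue(mu12, mu23)
print(marginal(mu123, (0, 1)) == mu12, marginal(mu123, (1, 2)) == mu23)
```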

3. Disintegration of measures

To grasp some of the properties of the probability spaces while studying disintegration of measures, we would like to associate the latter with the optimal transport theory. Before doing so, we prove the following disintegration theorem. Our proof is based on the idea of choosing a dense subset of a vector space using separability. Then, we extend a tailor-made linear functional to the whole space, which is implicitly used in the proof of [DM78, III-70], although our conditions are different.

Theorem A. Let X and Y be locally compact and separable metric spaces. Let $\pi : X \to Y$ be a Borel map and take $\mu \in \mathcal {M}_{+}(X)$ , where $\mathcal {M}_{+}(X)$ is the set of all positive and finite Radon measures on X. Define $\nu = \pi _{*}\mu $ in $\mathcal {M}_{+}(Y)$ . Then, there exist measures $\mu _{y} \in \mathcal {M}_{+}(X)$ such that:

(1) $y \mapsto \mu _{y}$ is a Borel map and $\mu _{y} \in \mathscr {P}(X)$ for $\nu $ -a.e. $y \in Y$ ;
(2) $\mu = \nu \otimes \mu _{y}$ , that is, $\mu (A)= \int _{Y} \mu _{y} (A) ~d\nu (y)$ for every $A \in \mathcal {B}(X)$ ;
(3) $\mu _{y}$ is concentrated on $\pi ^{-1}(y)$ for $\nu $ -a.e. $y \in Y$ .

Proof. We shall first consider the disintegration of measures on compact metric spaces. Then, we tackle the general case as it is stated.

Step 1: To get started, consider X to be a compact metric space with its Borel $\sigma $ -algebra $\mathcal {B}(X)$ and let $\mu $ be a Radon measure on X. If $(Y, \mathcal {E})$ is a measurable space, we define a measurable map $q: \mathcal {B}(X) \to \mathcal {E}$ so that we set $\nu =q_{*}\mu $ . Let $C(X)$ be the set of all continuous real functions $\omega :X \to \mathbb {R}$ . Then, for each $\omega \in C(X)$ , we associate a measure $\lambda $ given by

$$ \begin{align*} \lambda(A)= \int\limits_{q^{-1}(A)} \omega (x) ~d\mu (x) \end{align*} $$

for every $A \in \mathcal {E}$ . The measure $\lambda $ is absolutely continuous w.r.t. $\nu $ . Indeed, for every $A \in \mathcal {E}$ , we have that $\nu (A) = \mu (q^{-1}(A)) = 0$ implies $\lambda (A)=0$ . Therefore, since $\nu $ and $\lambda $ are positive measures and $\lambda \ll \nu $ , by the Radon–Nikodym theorem, there exists $h: Y \to [0, \infty ]$ , the density of $\lambda $ w.r.t. $\nu $ , such that $\lambda (A) = \int _{A} h ~d\nu $ for $A \in \mathcal {E}$ . Thus,

$$ \begin{align} \int\limits_{A} h(y) ~d\nu(y) = \int\limits_{q^{-1}(A)} \omega(x) ~d\mu(x). \tag{2} \end{align} $$
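When X and Y are finite, the density in equation (2) is simply the $\mu $ -weighted average of $\omega $ over each fibre of q. The following sketch computes it on hypothetical data (the weights `mu`, the map `q` and the function `omega` are illustrative assumptions) and checks equation (2) with $A = Y$ .

```python
from fractions import Fraction

# Hypothetical finite data: mu on X = {0, 1, 2, 3}, a map q: X -> {"a", "b"},
# and a non-negative function omega on X.
mu = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 3), 3: Fraction(1, 6)}
q = {0: "a", 1: "a", 2: "b", 3: "b"}
omega = {0: 2, 1: 4, 2: 1, 3: 4}


def pushforward(q, mu):
    """nu = q_* mu, i.e. nu(y) = mu(q^{-1}(y))."""
    nu = {}
    for x, m in mu.items():
        nu[q[x]] = nu.get(q[x], Fraction(0)) + m
    return nu


def density(omega, q, mu):
    """Discrete Radon-Nikodym density h = d(lambda)/d(nu), fibre by fibre."""
    nu = pushforward(q, mu)
    lam = {}
    for x, m in mu.items():
        # lambda(y) accumulates the integral of omega over the fibre q^{-1}(y).
        lam[q[x]] = lam.get(q[x], Fraction(0)) + omega[x] * m
    return {y: lam[y] / nu[y] for y in nu if nu[y] > 0}


nu = pushforward(q, mu)
h = density(omega, q, mu)
print(h)
```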

Recall that $C(X)$ is a separable space with the supremum norm. Let $\mathcal {H} = \{ \omega _1 \equiv 1, \omega _2, \omega _3, \ldots \}$ be a dense subset of $C(X)$ . Suppose, without loss of generality, that $\mathcal {H}$ is a vector space over $\mathbb {Q}$ . Then, for each $n \in \mathbb {N}$ , we consider the Radon–Nikodym density $h_{n}$ associated with $\omega _n$ given by equation (2) so that for each $A \in \mathcal {E}$ ,

$$ \begin{align} \int\limits_{A} h_{n}(y) ~d\nu(y) = \int\limits_{q^{-1}(A)} \omega_n(x) ~d\mu(x). \tag{3} \end{align} $$

Note that $h_{n} \geq 0$ almost always (a.a.), if $\omega _n \geq 0$ , since $\nu =q_{*}\mu $ .

Step 2: We will use the associated densities to construct a linear functional which will be extended using the Hahn–Banach theorem. To do so, we denote by $\mathbb {A}$ the set of all $y \in Y$ , such that, if $\omega _i= \alpha \omega _j + \beta \omega _k$ , then we have for their associated densities that the relation $h_{i}(y) = \alpha h_{j}(y) + \beta h_{k}(y)$ holds true, where $\alpha , \beta \in \mathbb {Q}$ and the associated density to $\omega _{1}$ is set to $h_{1}(y) = 1$ . The set $\mathbb {A}$ is measurable and $\nu (\mathbb {A})=1$ . Indeed,

$$ \begin{align*} \int_{q^{-1}(Y)} \omega_i(x) ~d\mu &= \int_{q^{-1}(Y)} (\alpha \omega_j + \beta \omega_k)(x)\,d\mu \\&= \alpha \bigg(\! \int_{q^{-1}(Y)}\! \omega_j(x) \,d\mu \!\bigg) + \beta \bigg(\! \int_{q^{-1}(Y)} \!\omega_k(x) ~d\mu \!\bigg),\! \text{ which by equation }(3),\\&= \alpha \bigg( \int_Y h_{j}(y) ~d\nu \bigg) + \beta \bigg( \int_Y h_{k}(y) ~d\nu \bigg) \\&= \int_Y \alpha h_{j}(y) + \beta h_{k}(y) ~d\nu, \end{align*} $$

so $ \int h_{i}(y)\,d\nu = \int \alpha h_{j}(y) + \beta h_{k}(y)\,d\nu $ . Consequently, $h_{i}(y) = \alpha h_{j}(y) + \beta h_{k}(y) \nu $ -a.a. Therefore, whenever $\omega _{n}$ is a linear combination of elements of $\mathcal {H}$ , then the associated densities defined by equation (3) can also be written point-wise as linear combinations of Radon–Nikodym densities.

Step 3: For each $y \in \mathbb {A}$ , we define the functional $\tilde {\varphi _y}: \mathcal {H} \to \mathbb {R}$ , given by $\tilde {\varphi _y}(\omega _n):=h_{n}(y)$ . Note that $\tilde {\varphi _y}: \mathcal {H} \to \mathbb {R}$ is $\mathbb {Q}$ -linear with $\| \tilde {\varphi _y} \| \leq 1$ and, since $\tilde {\varphi _y} (1)=1$ , we have actually that $\| \tilde {\varphi _y} \| = 1$ . Therefore, by the Hahn–Banach theorem, $\tilde {\varphi _y}$ can be extended to a continuous positive linear functional $\varphi _y: C(X) \to \mathbb {R}$ , with $\| \varphi _y \| = 1$ . Furthermore, the Riesz–Markov–Kakutani representation theorem assures that there exists a unique Radon measure $\mu _y$ on X such that $\varphi _y(\omega )= \int \omega ~d \mu _y$ for every $\omega \in C(X)$ and $\varphi _y (1)=1$ . Note that $\int \omega ~d \mu _y \leq 1$ , since $\| \varphi _y \| = 1$ . Thus, we conclude that $\mu _y$ is a probability measure. Note also that $\mu _y$ is supported on $q^{-1} \{ y \in \mathbb {A}\}$ . For $y \notin \mathbb {A}$ , consider $\mu _y = 0$ .

Step 4: Observe that $y \mapsto \int _{X} \omega _n\,d\mu _y$ is $\mathcal {E}$ -measurable for every $\omega _n \in \mathcal {H}$ and

$$ \begin{align*} \int_{Y} \int_{q^{-1}(y)} \omega_n(x) \,d\mu_y ~d\nu = \int_{Y} \varphi_y(\omega_n) \,d\nu = \int_{Y} h_{n}(y)\,d\nu = \int_{X} \omega_n(x)\,d \mu. \end{align*} $$

By the definition of $\mathcal {H}$ , we have that for each $\omega \in C(X)$ , there exists a sequence $(\omega _{i})_{i}$ , with $\omega _i \to \omega $ uniformly. So, by uniform convergence, we have that $y \mapsto \int _{X} \omega ~d\mu _y$ is $\mathcal {E}$ -measurable for every $\omega \in C(X)$ . Furthermore,

$$ \begin{align*} \int_{Y} \int_{q^{-1}(y)} \omega(x)\,d\mu_y \,d\nu = \int_{X} \omega(x)\,d \mu. \end{align*} $$

The same holds true for any bounded and $\mathcal {B}(X)$ -measurable function $\omega $ . Indeed, denote by $\mathscr {C}$ the class of functions such that $y \mapsto \int _{X} \omega \,d\mu _y$ is $\mathcal {E}$ -measurable and $\int _{Y} \int _{q^{-1}(y)} \omega (x)\,d\mu _y \,d\nu = \int _{X} \omega (x)\,d \mu $ . Note that $C(X) \subset \mathscr {C}$ , from what was shown before. If $A \subset X$ is an open set, then $\mathbb {1}_{A} \in \mathscr {C}$ . So, let $\mathcal {D}$ be the set of all characteristic functions in $\mathscr {C}$ . If $\mathbb {1}_{A_n} \in \mathcal {D}$ for $n \in \mathbb {N}$ , with $A_1 \subset A_2 \subset \cdots $ , we have that $\mathbb {1}_{\bigcup _{n} A_n} \in \mathcal {D}$ . If $\mathbb {1}_{A}, \mathbb {1}_{B} \in \mathcal {D}$ with $A \subset B$ , we have that $\mathbb {1}_{B \setminus A} \in \mathcal {D}$ . Thus, the class of measurable sets whose characteristic functions are in $\mathcal {D}$ is $\mathcal {B}(X)$ . Therefore, the result follows by monotone convergence. This completes the proof of Theorem A for compact spaces.

Step 5: Let X be a locally compact and separable metric space. Then, the set of continuous real-valued functions with compact support on X, denoted by $C_c(X)$ , is a vector space. Such a vector space can be seen as the union of the spaces $C_c(K_{i})$ of continuous functions with support on compact sets $K_{i}$ . Since $\mu $ is a Radon measure on X, the map $\varphi : C_c(X) \to \mathbb {R}$ , such that, $\omega \mapsto \int _{X} \omega (x)\,d\mu $ , is a continuous positive linear map. Note also that $\mu $ is supported on a set $\tilde {\mathcal {K}}$ , which is a countable union of compact subsets $K_{i} \subset X$ . So, we can imbed X into a compact metric space $\mathcal {K}$ and identify $\mu $ with a measure on $\mathcal {K}$ with support on $\tilde {\mathcal {K}}$ , and construct the measures $\mu _y$ as above. Consider $\mu _{y}=0$ for y such that $\pi ^{-1}(y) \notin \tilde {\mathcal {K}}$ . Thus, there exist probability measures $\mu _{y}$ on X such that each $\mu _{y}$ is supported on $\pi ^{-1}(y)$ and, for every $\omega \in C_c(X)$ ,

$$ \begin{align} \int_{Y} \int_{\pi^{-1}(y)} \omega(x)\,d\mu_y \,d\nu = \int_{X} \omega(x)\,d \mu. \tag{4} \end{align} $$

Since Y is a locally compact and separable metric space and $\pi : X \to Y$ is a Borel map, the map $y \mapsto \mu _y$ is a Borel map for $\nu $ -a.e. $y \in Y$ . Furthermore, note that equation (4) is equivalent to saying that $\mu (A)= \int _{Y} \mu _{y} (A)\,d\nu (y)$ for every $A \in \mathcal {B}(X)$ and $\mu _{y}$ is concentrated on $\pi ^{-1}(y)$ for $\nu $ -a.e. $y \in Y$ . This concludes the proof.

Many interesting examples arise when we consider Theorem A for the case of product spaces with the Borel map $\pi $ as the canonical projection on the first component, as follows.

Corollary 3.1. Let X and Y be locally compact and separable metric spaces. Let $\mathrm {proj}_{X}: X \times Y \to X $ be the canonical projection on the first component, take $\gamma \in \mathcal {M}_{+}(X \times Y)$ and set $\mu = {\mathrm {proj}_{X}}_{*}\gamma \in \mathcal {M}_{+}(X)$ . Then, there exist measures $\gamma _{x} \in \mathcal {M}_{+}(X \times Y)$ such that:

(1) $x \mapsto \gamma _{x}$ is a Borel map and $\gamma _{x} \in \mathscr {P}(X \times Y)$ for $\mu $ -a.e. $x \in X$ ;
(2) $\gamma = \mu \otimes \gamma _{x}$ , i.e. $\gamma (A)= \int _{X} \gamma _{x} (A) ~d\mu (x)$ for every $A \in \mathcal {B}(X \times Y)$ ;
(3) $\gamma _{x}$ is concentrated on $\mathrm {proj}_{X}^{-1} (x)$ for $\mu $ -a.e. $x \in X$ .

In fact, since $\gamma _{x}$ is concentrated on $\mathrm {proj}_{X}^{-1}(x)=\{x\} \times Y$ , we can consider each $\gamma _{x}$ as a measure on Y, writing $\gamma (B)=\int _{X} \gamma _{x}(\{y : (x, y) \in B \}) ~d\mu $ for every $B \in \mathcal {B}(X \times Y)$ , adding a different point of view to disintegration. The following example illustrates this case of disintegration of measures.
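For a finitely supported plan, the conditional measures of Corollary 3.1, viewed as measures on Y, are obtained by normalising each row of $\gamma $ . The sketch below does this for a hypothetical plan `gamma` and a hypothetical Borel set `B`, and then reassembles $\gamma (B)$ from the first marginal and the conditional measures.

```python
from fractions import Fraction


def disintegrate(gamma):
    """Return (mu, kernels) with mu = (proj_X)_* gamma and
    kernels[x][y] = gamma(x, y) / mu(x), i.e. gamma_x seen as a measure on Y."""
    mu = {}
    for (x, _), m in gamma.items():
        mu[x] = mu.get(x, Fraction(0)) + m
    kernels = {x: {} for x in mu}
    for (x, y), m in gamma.items():
        if mu[x] > 0:
            kernels[x][y] = m / mu[x]
    return mu, kernels


def reassemble(mu, kernels, B):
    """Integral over X of gamma_x({y : (x, y) in B}) with respect to mu."""
    return sum((mu[x] * sum((k for y, k in kernels[x].items() if (x, y) in B),
                            Fraction(0))
                for x in mu), Fraction(0))


# Hypothetical plan on X x Y with X = {0, 1} and Y = {"u", "v"}.
gamma = {(0, "u"): Fraction(1, 8), (0, "v"): Fraction(3, 8), (1, "u"): Fraction(1, 2)}
mu, kernels = disintegrate(gamma)

B = {(0, "v"), (1, "u"), (1, "v")}
print(kernels, reassemble(mu, kernels, B))
```

Each `kernels[x]` is a probability on Y, and the reassembled value equals the mass that `gamma` gives to `B`.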

Example 3.2. Consider a solid torus $S^{1} \times D^{2}$ . Let $\mathcal {F}^{s} = \{\{x\} \times D^{2} \}_{x \in S^{1}}$ be a foliation of $S^{1} \times D^{2}$ , as represented in Figure 1. Given a measure $\gamma \in \mathscr {P}(S^{1} \times D^{2})$ , let $\mathrm {proj}_{S^1}: S^{1} \times D^{2} \to S^{1}$ be the canonical projection on the first component and set $\mu ={\mathrm {proj}_{S^1}}_{*}\gamma $ . Theorem A, with $\pi := \mathrm {proj}_{S^1}$ , gives us a disintegration $\{ \gamma _{x} : x \in S^{1} \}$ of $\gamma $ along the leaves. Since the measures $\gamma _{x}$ are concentrated on ${\mathrm {proj}}_{S^1}^{-1}(x)=\{x\} \times D^2$ for $\mu $ -a.e. $x \in S^{1}$ , we can consider each $\gamma _{x}$ as a measure on $D^{2}$ . That is, we can define a probability on $D^2$ for each $x \in S^1$ .

Figure 1 Representation of $\mathcal {F}^{s} = \{\{x\} \times D^{2} \}_{x \in S^{1}}$ .

The point of view from Corollary 3.1 is, somehow, a generalization of cases of disintegration of a probability measure along leaves in a foliated compact Riemannian manifold, as in Example 3.2. In this sense, we remark that different versions of disintegration theorems can be related by taking suitable hypotheses, as we do in the following example.

Example 3.3. Let $M_1$ and $M_2$ be compact Riemannian manifolds and set the product space $\Sigma = M_{1} \times M_{2}$ . Let $\mathcal {F}^{s}=\{ \{x\} \times M_{2} \}_{x \in M_{1}}$ be a foliation of $\Sigma $ . Given $\gamma \in \mathscr {P}(M_1 \times M_2)$ , let ${\mathrm {proj}}_{M_{1}} : \Sigma \to M_{1}$ be the canonical projection on $M_{1}$ and set $\mu ={{\mathrm {proj}}_{M_{1}}}_{*}\gamma $ . By Theorem A, there exists a family $\{ \gamma _{x} : x \in M_{1} \} \subset \mathscr {P}(M_{1} \times M_{2})$ such that:

(1) $x \mapsto \gamma _{x}$ is a Borel map and $\gamma _{x} \in \mathscr {P}(M_{1} \times M_{2})$ for $\mu $ -a.e. $x \in M_{1}$ ;
(2) $\gamma (A)= \int _{M_{1}} \gamma _{x} (A)\,d\mu (x)$ for every $A \in \mathcal {B}(M_{1} \times M_{2})$ ;
(3) $\gamma _{x}$ is concentrated on $\mathrm {proj}_{M_{1}}^{-1} (x)$ for $\mu $ -a.e. $x \in M_{1}$ .

Furthermore, we can consider each $\gamma _{x}$ as a probability on $M_{2}$ , as stated above. Note that this result agrees with Rokhlin’s disintegration theorem. In fact, let $\varrho : \Sigma \to \mathcal {F}^{s}$ be a map that associates each point $(x, y) \in \Sigma $ to the element $\zeta $ of $\mathcal {F}^{s}$ that contains $(x, y)$ . Consider $\hat {\gamma }=\varrho _{*}\gamma $ . Note that $\mathcal {F}^{s}$ is a measurable partition of $\Sigma $ and $\Sigma $ is a complete separable metric space. So, Rokhlin’s disintegration theorem describes a disintegration of $\gamma $ relative to $\mathcal {F}^{s}$ by a family $\{ \gamma _{\zeta } : \zeta \in \mathcal {F}^{s} \}$ , such that, for every measurable set $E \subset \Sigma $ :

(1) $\gamma _{\zeta }(\zeta )=1$ for $\hat {\gamma }$ -a.e. $\zeta \in \mathcal {F}^{s}$ ;
(2) $\zeta \mapsto \gamma _{\zeta }(E)$ is measurable;
(3) $\gamma (E)=\int \gamma _{\zeta }(E) \,d\hat {\gamma }(\zeta )$ .

Rewrite $\mathcal {F}^{s}$ by $\{ \zeta _{x} \}_{x \in M_{1}}$ , where $\zeta _{x}=\{ x \} \times M_{2}$ for each $x \in M_{1}$ , and consider $\gamma _{x}'={{\mathrm {proj}}_{M_1}}_{*}\gamma $ . For each $x \in M_{1}$ , let $\gamma _{\zeta _{x}}$ be the restriction of $\gamma _{\zeta }$ to $\zeta = \{ x \} \times M_{2}$ . Note that $\varrho ^{-1}(\zeta )= \{ x \} \times M_{2} = \zeta _{x} = {{\mathrm {proj}}_{M_1}}^{-1}(x)$ , so

$$ \begin{align*} \int \gamma_{\zeta}(E) d\hat{\gamma}(\zeta) &= \int \gamma_{\zeta}(E) \gamma(\varrho^{-1}\,(d\zeta)) \\ &= \int_{M_{1}} \gamma_{\zeta_{x}}(E \cap \zeta_{x}) \gamma({{\mathrm {proj}}_{M_1}}^{-1}\,(dx)) \\ &= \int_{M_{1}} \gamma_{\zeta_{x}}(E \cap \zeta_{x})\,d \gamma_{x}' \end{align*} $$

and then

$$ \begin{align*} \gamma(E)=\int_{M_{1}} \gamma_{\zeta_{x}}(E \cap \zeta_{x})\,d \gamma_{x}'. \end{align*} $$

Moreover, note that $\gamma _{\zeta _{x}}$ is supported in ${{\mathrm {proj}}}_{M_1}^{-1}(x) = \zeta _{x}$ . Hence, we have a disintegration $\{\gamma _{\zeta _{x}} : \zeta _{x} \in \mathcal {F}^{s} \}$ along the leaves of $\Sigma $ associated to $\gamma _{x}'$ . In addition, it is possible to identify ${{\mathrm {proj}}}_{M_1}^{-1}(x)$ with $M_{2}$ , and write this disintegration by $\{\gamma _{x} : x \in M_1 \} \subset \mathscr {P}(M_{2})$ , as desired.

Such an example is one of the possible roles of disintegration of measures. In dynamics, for instance, this kind of disintegration appears in several contexts. To name a few, in [BM17], the regularity of this kind of disintegration is investigated considering invariant measures for hyperbolic skew products. Specifically, for this purpose, a function that associates each x in X with a probability measure $\gamma _x$ obtained via a disintegration of $\gamma $ is analysed. In the next section, we will see that this type of application can be thought of in a more general framework and it has important properties. We can also cite [Gal17, GL20], where the disintegration of Example 3.2 is used to study the behaviour of the transfer operator in a solenoidal map.

We can actually obtain uniqueness and absolute continuity of the disintegration in the context of Theorem A. If $\unicode{x3bb} \in \mathcal {M}_{+}(Y)$ is another measure, such that there exists a Borel map $y \mapsto \eta _{y}$ for which $\mu = \unicode{x3bb} \otimes \eta _{y}$ , with $\eta _{y}$ concentrated on $\pi ^{-1}(y)$ for $\unicode{x3bb} $ -a.e. $y \in Y$ , then $\unicode{x3bb} |_{C} \ll \nu $ , where $C=\{ y \in Y : \eta _{y}(X)>0 \}$ , and $\eta _{y} \ll \mu _{y}$ for $\nu $ -a.e. $y \in Y$ . See the details in [AP03]. In the following proposition, we focus on the case of product spaces. Taking $\gamma \in \mathcal {M}_{+}(X \times Y)$ with $\mu ={\mathrm {proj}_{X}}_{*}\gamma $ , we obtain uniqueness of $\gamma _{x}$ and $\mu $ in $\gamma =\mu \otimes \gamma _{x}$ .

Proposition 3.4. Let $X \times Y$ , X be locally compact and separable metric spaces. Let $\mathrm {proj}_{X}: X \times Y \to X$ be the projection on the first component and consider $\gamma \in \mathcal {M}_{+}(X \times Y)$ , $\nu \in \mathcal {M}_{+}(X)$ and $\mu ={\mathrm {proj}_{X}}_{*}\gamma $ . Let $x \mapsto \eta _x$ be a Borel $\mathcal {M}_{+}$ -valued map defined on X such that:

(1) $\gamma = \nu \otimes \eta _x$ ;
(2) $\eta _x$ is concentrated on $\mathrm {proj}_{X}^{-1}(x)$ for $\nu $ -a.e. $x \in X$ .

Then, the $\eta _{x}$ are uniquely defined for $\nu $ -a.e. $x \in X$ by conditions (1) and (2). Moreover, for $C=\{ x \in X: \eta _x(X \times Y)>0 \}$ , $\nu |_C$ is absolutely continuous w.r.t. $\mu $ . In particular, $({\nu |_C}/{\mu }) \eta _x = \gamma _{x}$ for $\mu $ -a.e. $x \in X$ , where $\gamma _{x}$ are the conditional probabilities as in Corollary 3.1.

Proof. Let $\eta _{x}$ and $\eta _{x}'$ be measures satisfying conditions (1) and (2). Let $(A_{n})_{n}$ be a sequence of open sets, closed under finite intersections, which generates $\mathcal {B}(X \times Y)$ , the Borel $\sigma $ -algebra of $X \times Y$ . Consider $B \in \mathcal {B}(X)$ and $A=A_n \cap {\mathrm {proj}}_{X}^{-1}(B)$ for any $n \in \mathbb {N}$ . By conditions (1) and (2), we have that

$$ \begin{align*} \gamma(A)=\int_{X} \eta_{x}(A)\,d\nu = \int_{B} \eta_{x}(A_n)\,d \nu \end{align*} $$

and

$$ \begin{align*} \gamma(A)=\int_{X} \eta_{x}' (A)\,d\nu = \int_{B} \eta_{x}' (A_n)\,d \nu. \end{align*} $$

Therefore,

$$ \begin{align*} \int_{B} \eta_{x}(A_n)\,d \nu = \int_{B} \eta_{x}' (A_n)\,d \nu. \end{align*} $$

Given that B is arbitrary, then $\eta _{x}(A_n) = \eta _{x}'(A_n)$ for $\nu $ -a.e. $x \in X$ and for any $n \in \mathbb {N}$ . So, there exists a set $N \subset X$ , with $\nu (N)=0$ such that $\eta _{x}(A_n) = \eta _{x}'(A_n)$ for any $n \in \mathbb {N}$ , $x \in (X-N)$ . Hence, for $\nu $ -a.e. $x \in X$ , $\eta _{x} = \eta _{x}'$ .

Let us denote $C=\{ x \in X : \eta _x(X \times Y)>0 \}$ . Let $\mathcal {G} \subset C$ be such that $\mu (\mathcal {G})=0$ . So, ${\mathrm {proj}}_{X}^{-1}(\mathcal {G})$ is such that $\gamma ({\mathrm {proj}}_{X}^{-1}(\mathcal {G}))=0$ . Therefore, condition (2) implies

$$ \begin{align*} 0=\int_{X} \eta_{x}({\mathrm {proj}}_{X}^{-1}(\mathcal{G}))\,d\nu = \int_{\mathcal{G}} \eta_{x} (X \times Y)\,d\nu. \end{align*} $$

Since $\eta _{x} (X \times Y)>0$ in $C \supset \mathcal {G}$ , we have that $\nu (\mathcal {G})=0$ . That is, $\nu |_{C} \ll \mu $ . Moreover, writing $\nu |_{C}=f \mu $ implies that $\gamma = f \mu \otimes \eta _{x}$ . However, by Corollary 3.1, $\gamma = \mu \otimes \gamma _{x}$ and then $({\nu |_{C}}/{\mu }) \eta _{x} = \gamma _{x}$ .
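On finite spaces, the statement of Proposition 3.4 can be checked by direct computation. The sketch below is only an illustration under that assumption: the measures `nu` and `eta` are hypothetical data, with `eta[2]` chosen as the zero measure so that $\nu $ itself is not absolutely continuous w.r.t. $\mu $ while $\nu |_{C}$ is.

```python
from fractions import Fraction as F

# Hypothetical data: nu on X = {0, 1, 2} and non-normalised fibre measures eta[x] on Y.
nu = {0: F(1, 2), 1: F(1, 4), 2: F(1, 4)}
eta = {0: {"a": F(1), "b": F(1)}, 1: {"a": F(2)}, 2: {}}  # eta[2] is the zero measure

# gamma = nu (x) eta_x and its first marginal mu.
gamma = {(x, y): nu[x] * m for x in nu for y, m in eta[x].items()}
mu = {x: sum(m for (x2, _), m in gamma.items() if x2 == x) for x in nu}

# C = {x : eta_x has positive mass}; on C, the density of nu with respect to mu.
C = [x for x in nu if sum(eta[x].values()) > 0]
density = {x: nu[x] / mu[x] for x in C}

# Rescaled fibres (nu|_C / mu) * eta_x versus the conditional probabilities gamma_x.
rescaled = {x: {y: density[x] * m for y, m in eta[x].items()} for x in C}
gamma_x = {x: {y: gamma[(x, y)] / mu[x] for y in eta[x]} for x in C}
```

In this example, `rescaled` and `gamma_x` coincide, and the point `2` carries $\nu $ -mass but no $\mu $ -mass, which is why the restriction to $C$ is needed.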

4. Disintegration maps

From the optimal transport perspective, Theorem A in fact deals with the disintegration of transport plans. In this sense, it is possible to define a function from Y to $\mathscr {P}(X)$ with certain properties which actually establishes the link between disintegration of measures and the geometric properties of the measure spaces. We will call such an object the disintegration map.

Definition 4.1. (Disintegration map)

Let X and Y be locally compact and separable metric spaces. Consider a measure $\mu \in \mathcal {M}_{+}(X)$ , a Borel map $\pi : X \to Y$ and a disintegration of $\mu $ given by Theorem A, so that $\mu = \nu \otimes \mu _y$ . We define the disintegration map:

$$ \begin{align*} f: Y &\to (\mathscr{P}(X), W_{p}) \\ y & \mapsto \mu_{y}, \end{align*} $$

such that $\mu = \nu \otimes f(y)$ , where $W_{p}$ is the p-Wasserstein distance.

Remark 4.2. To clarify which measures are associated with the disintegration map, we will say that ‘f is a disintegration map of $\mu $ w.r.t. $\nu $ ’.

Although we may define such a map for a general $W_{p}$ , our main interest will be when either $p=1$ or $p=2$ . On the one hand, for $p=1$ , there is an explicit formula for the Wasserstein distance which we can apply to study the disintegration of measures in product spaces. On the other hand, when $p=2$ , the theory of optimal transport allows for a geometric characterization of $\mathscr {P}_{2}(X)$ in terms of the geometric properties of X. More precisely, the study of geodesics defined in $\mathscr {P}_{2}(X)$ and the convexity properties of certain functionals along these geodesics play a crucial role in the metric theory of gradient flows, which allows us to infer geometric properties of X itself. We shall address the case when $p=2$ in the next section. Before that, we use the disintegration maps to further characterize the disintegration of measures in product spaces. In this section, consider the following definition of a disintegration map.

Definition 4.3. (Disintegration map—product spaces)

Let X and Y be locally compact and separable metric spaces. Consider $\gamma \in \mathscr {P}(X \times Y)$ and a disintegration of $\gamma $ given by Corollary 3.1, so that $\gamma = \mu \otimes \gamma _{x}$ . In this case, the disintegration map reads as

$$ \begin{align*} f: X &\to (\mathscr{P}(Y), W_{1}) \\ x & \mapsto \gamma_{x} \end{align*} $$

such that $\gamma = \mu \otimes f(x)$ .

Along the lines of [AP03, GM13], we start by showing the following.

Proposition 4.4. A map $f: X \to (\mathscr {P}(Y), W_{1})$ is a disintegration map if and only if it is Borel.

Proof. Denote by $\text {Lip}_{1}(Y)$ the set of Lipschitz functions whose Lipschitz constants are less than or equal to $1$ . By the Arzelà–Ascoli theorem, the space $\text {Lip}_{1}(Y)$ is compact with respect to the uniform convergence [AGS05, Proposition 3.3.1]. Let $D \subset \text {Lip}_{1}(Y)$ be a countable dense subset and take $\varphi \in \text {Lip}_{1}(Y)$ . If the measures $\nu _{1}$ , $\nu _{2} \in \mathscr {P}(Y)$ have bounded support, we can use the duality formula to obtain

$$ \begin{align*} W_{1}(\nu_{1}, \nu_{2}) = \sup\limits_{\varphi \in \text{Lip}_{1}(Y)} \bigg\{\! \int_{Y} \varphi \,d(\nu_{1} -\nu_{2}) \bigg\} = \sup\limits_{\varphi \in D} \int_{Y} \varphi \,d(\nu_{1}-\nu_{2}); \end{align*} $$

see [AGS05, §7.1] for details. For all $\varphi \in \text {Lip}_{1}(Y)$ ,

$$ \begin{align*} \psi_{\varphi}: X &\to \mathbb{R} \\ x &\mapsto \int_{Y} \varphi \,d(\nu-f(x)) \end{align*} $$

is Borel. For f to be a Borel map, it suffices that

$$ \begin{align*} f^{-1}(B(\nu, r))= \bigcap\limits_{\varphi \in D} \psi_{\varphi}^{-1}((-r, r)):=A, \end{align*} $$

where $B(\nu , r)$ is an open ball of radius r centred at $\nu $ in $(\mathscr {P}(Y), W_{1})$ . In fact, if $x \in A$ , then $|\psi _{\varphi }(x)|<r$ for all $\varphi \in D$ and $W_{1}(\nu , f{(x)})<r$ (by the definition of $\psi _{\varphi }$ ), so that $f{(x)} \in B(\nu , r)$ . In the same way, if $f{(x)} \in B(\nu , r)$ , then $W_{1}(\nu , f{(x)})<r$ and, by the duality formula,

$$ \begin{align*} |\psi_{\varphi}(x)|:=\Big|\int_{Y} \varphi\,d(\nu-f{(x)})\Big|<r \end{align*} $$

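for every $\varphi \in D$ , so that $x \in A$ . Thus, one way is proven. Conversely, let $A \subset Y$ be an open subset. Note that

$$ \begin{align*} f{(x)}(A) = \int_{Y} \chi_{A}(y)\,df{(x)}, \end{align*} $$

where the characteristic function $\chi _{A}$ of the open set A is lower semicontinuous.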

Let $I_{\varphi }$ be a function given by

(5)

$$ \begin{align} I_{\varphi}: (\mathscr{P}(Y), W_{1}) & \to \mathbb{R} \nonumber \\ \unicode{x3bb} & \mapsto \int_{Y} \varphi(y) \,d\unicode{x3bb}, \end{align} $$

where $\varphi $ is a lower semicontinuous function over Y. Since $W_{1}$ metrizes the weak* topology of $\mathscr {P}(Y)$ [AGS05, Ch. 7], the function $I_{\varphi }$ is lower semicontinuous. For every $x \in X$ , one obtains $\int _{Y} \varphi (y) \,df{(x)}=I_{\varphi }(f(x))$ . By assumption, f is Borel and then $f(\cdot )(A):X \to \mathbb {R}$ is a composition of a lower semicontinuous function and a Borel map, so it is a Borel map. Therefore, f is a disintegration map, which concludes the proof.
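The duality formula used in the proof can be illustrated numerically when $Y=\mathbb {R}$ and the measures are finitely supported. The sketch below assumes the standard one-dimensional identity $W_{1}(\nu _{1}, \nu _{2}) = \int |F_{1}-F_{2}|$ , with $F_i$ the distribution functions, which is not taken from the text above; the measures and the list of $1$ -Lipschitz test functions are hypothetical.

```python
def w1_line(nu1, nu2):
    """W_1 between finitely supported probability measures on the real line.

    Uses the one-dimensional identity W_1 = integral of |F_1 - F_2|, where F_i
    are the cumulative distribution functions. nu1, nu2: dict point -> mass.
    """
    points = sorted(set(nu1) | set(nu2))
    total, cdf_gap = 0.0, 0.0
    for left, right in zip(points, points[1:]):
        cdf_gap += nu1.get(left, 0.0) - nu2.get(left, 0.0)
        total += abs(cdf_gap) * (right - left)
    return total

def dual_value(phi, nu1, nu2):
    """Integral of phi against the signed measure nu1 - nu2."""
    return (sum(phi(y) * m for y, m in nu1.items())
            - sum(phi(y) * m for y, m in nu2.items()))

# Hypothetical measures on Y = R.
nu1 = {0.0: 0.5, 1.0: 0.5}
nu2 = {0.0: 0.25, 2.0: 0.75}

# A few 1-Lipschitz test functions; each gives a lower bound for W_1.
lipschitz_tests = [lambda y: y, lambda y: -y, lambda y: abs(y - 1.0), lambda y: min(y, 1.0)]
best = max(dual_value(phi, nu1, nu2) for phi in lipschitz_tests)
```

Every test function yields a value not exceeding `w1_line(nu1, nu2)`, and in this particular example the function $y \mapsto -y$ attains the supremum.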

The disintegration map in fact can be written in terms of the Monge problem. Given a transport map $T:X \to Y$ and the measures $\mu \in \mathscr {P}(X)$ and $\nu = T_{*}\mu \in \mathscr {P}(Y)$ , the disintegration map is given by $x \mapsto \delta _{T(x)}$ , where $\delta _{T(x)}$ is a Dirac measure at $T(x)$ . It is possible to show that there exists a relationship among measures in $(\mathscr {P}(Y), W_{1})$ , via push forward of $\mu $ by disintegration maps, and the second marginal of transport plans induced by distinct transport maps.

Lemma 4.5. Let $T:X \to Y$ and $S: X \to Y$ be transport maps. Consider a measure $\mu \in \mathscr {P}(X)$ , and the applications f and g given by

$$ \begin{align*} f: X &\to (\mathscr{P}(Y), W_{1})\\[-2pt]x &\mapsto \delta_{T(x)}, \end{align*} $$

$$ \begin{align*} g: X &\to (\mathscr{P}(Y), W_{1}) \\[-2pt] x &\mapsto \delta_{S(x)}. \end{align*} $$

Then, $T_{*}\mu =S_{*}\mu $ if and only if $f_{*}\mu =g_{*}\mu $ .

Proof. Suppose that $T_{*}\mu =S_{*}\mu $ . Define $\varphi (y)=\psi (\delta _{y})$ , where $\psi \in C((\mathscr {P}(Y), W_{1}))$ is arbitrarily chosen. Then,

$$ \begin{align*} \int_{\mathscr{P}(Y)} \psi \,d(f_{*}\mu) &= \int_{X} \psi(f(x)) \,d\mu \\ &= \int_{X} \psi (\delta_{T(x)})\,d\mu \\ &= \int_{X} \varphi(T(x))\,d\mu \\ &= \int_{X} \varphi(S(x))\,d\mu \\ &= \int_{\mathscr{P}(Y)} \psi\,d(g_{*}\mu). \end{align*} $$

Given the arbitrary choice of $\psi $ , it follows that $f_{*}\mu =g_{*}\mu $ . Conversely, consider $\varphi \in C(Y)$ and the application $I_{\varphi }$ defined by equation (5). Note that

$$ \begin{align*} \int_{\mathscr{P}(Y)} I_{\varphi}(\unicode{x3bb}) \,d(f_{*}\mu)=\int_{\mathscr{P}(Y)} I_{\varphi}(\unicode{x3bb})\,d(g_{*}\mu) \end{align*} $$

if and only if

$$ \begin{align*} \int_{X} I_{\varphi}(f{(x)})\,d\mu = \int_{X} I_{\varphi}(g{(x)}) \,d\mu, \end{align*} $$

which in turn occurs if and only if

$$ \begin{align*} \int_{X} \bigg( \int_{Y} \varphi (y)\,df{(x)} \bigg)\,d\mu = \int_{X} \bigg( \int_{Y} \varphi (y) \,dg{(x)} \bigg) \,d\mu. \end{align*} $$

Since

$$ \begin{align*} \int_{Y} \varphi (y)\,df{(x)}=\varphi (T(x)) \end{align*} $$

and

$$ \begin{align*} \int_{Y} \varphi (y)\,dg{(x)}=\varphi (S(x)), \end{align*} $$

the last equation can be written as

$$ \begin{align*} \int_{X} \varphi (T(x))\,d\mu = \int_{X} \varphi (S(x))\,d\mu. \end{align*} $$

Due to the arbitrary choice of $\varphi $ , it follows that $T_{*}\mu =S_{*}\mu $ .
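For finitely supported $\mu $ , the equivalence in Lemma 4.5 can be verified by direct enumeration. The following sketch is only illustrative: the maps `T` and `S`, the helper `push_forward` and the encoding of a Dirac measure as a `frozenset` are hypothetical choices, not part of the argument above.

```python
from fractions import Fraction as F

def push_forward(mu, h):
    """Push a finitely supported measure mu (dict point -> mass) forward by h."""
    out = {}
    for x, m in mu.items():
        out[h(x)] = out.get(h(x), 0) + m
    return out

def dirac(y):
    """Represent the Dirac measure at y as a hashable point of P(Y)."""
    return frozenset({(y, F(1))})

# Hypothetical data: X = {0, 1, 2, 3} with uniform mu, two maps into Y = {"a", "b"}.
mu = {x: F(1, 4) for x in range(4)}
T = {0: "a", 1: "a", 2: "b", 3: "b"}.get
S = {0: "b", 1: "a", 2: "a", 3: "b"}.get

def f(x):
    """Disintegration map x -> delta_{T(x)}."""
    return dirac(T(x))

def g(x):
    """Disintegration map x -> delta_{S(x)}."""
    return dirac(S(x))

same_marginal = push_forward(mu, T) == push_forward(mu, S)
same_class = push_forward(mu, f) == push_forward(mu, g)
```

Here both `same_marginal` and `same_class` hold, although $T \neq S$ pointwise.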

Corollary 4.6. Let $T:X \to Y$ and $S: X \to Y$ be transport maps. Consider a measure $\mu \in \mathscr {P}(X)$ , the disintegration maps f and g defined as in Lemma 4.5, and the measures $\gamma =\mu \otimes f(x)$ and $\eta =\mu \otimes g(x)$ . Then, $f_{*}\mu =g_{*}\mu $ if and only if ${\mathrm {proj}_{Y}}_{*}\gamma = {\mathrm {proj}_{Y}}_{*} \eta $ .

Given $\gamma , \eta \in \Pi (\mu , \nu )$ , as defined in §2, with $\gamma = \mu \otimes f{(x)}$ and $\eta =\mu \otimes g{(x)}$ , we say that $\gamma $ is equivalent by disintegration to $\eta $ (and we denote $\gamma \approx \eta $ ) if $f_{*}\mu =g_{*}\mu $ . With this equivalence in mind, it is possible to define an equivalence class among the transport plans as follows.

Definition 4.7. Given $\gamma \in \Pi (\mu , \nu )$ with $\gamma =\mu \otimes f(x)$ , the transport class of $\gamma $ is defined as the equivalence class $[\gamma ]=\{ \eta =\mu \otimes g(x):g_{*}\mu =f_{*}\mu \}$ .

Thus, we have that all transport plans induced by transport maps belong to the same transport class. Moreover, in the next proposition, we prove that it is possible to use equivalence by disintegration to assure the existence of a transport map.

Proposition 4.8. Consider a transport map $T:X \to Y$ such that $T_{*}\mu =\nu $ and $\gamma =\mu \otimes \delta _{T(x)}$ for a non-atomic measure $\mu \in \mathscr {P}(X)$ . If $\eta \in [\gamma ]$ , then there exists a transport map $S: X \to Y$ such that $\eta = \mu \otimes \delta _{S(x)}$ .

Proof. By the approximation theorem [Amb00, Theorem 9.3] and by the definition of equivalence by disintegration, there exists a sequence of Borel functions $S_{n}: X \to Y$ such that $\eta = \lim \limits _{n \to \infty } \mu \otimes \delta _{S_{n}(x)}$ and $(S_{n})_{*}\mu =\nu $ for every $n \in \mathbb {N}$ . So $\mu \otimes \delta _{S_{n}(x)} \in [\gamma ]$ for every $n \in \mathbb {N}$ . Consider $\varphi (y)=|y|^2$ and observe that

$$ \begin{align*} \int_{X} |S_{n}(x)|^2\,d\mu = \int_{X} \varphi (S_{n}(x))\,d\mu = \int_{Y} \varphi(y)\,d\nu = \int_{Y} |y|^2\,d\nu < \infty. \end{align*} $$

Let $\psi \in C((\mathscr {P}(Y), W_{1}))$ be a function given by $\psi (\delta _{y})=|y|^2$ and let $\Delta \subset \mathscr {P}(Y)$ be the set of Dirac measures. Observe that $\psi $ is Lipschitz over $\Delta $ w.r.t. $W_1$ . So, take $\psi $ any Lipschitz extension over $\mathscr {P}(Y)$ . Since $(\delta _{S_{n}})_{*}\mu =(\delta _{T})_{*}\mu $ , we have that

$$ \begin{align*} \int_{X}|S_{n}(x)|^2\,d\mu = \int_{X} \psi(\delta_{S_{n}(x)})\,d\mu = \int_{X} \psi (\delta_{T(x)})\,d\mu = \int_{X} |T(x)|^{2} \,d\mu \end{align*} $$

for every $n \in \mathbb {N}$ . Therefore, passing to a subsequence, we can assume that $S_{n}$ is weakly convergent to S. By [Amb00, Lemma 9.1], $\mu \otimes \delta _{S_{n}(x)}$ converges weakly to $\mu \otimes \delta _{S(x)}$ . This concludes the proof.

In this context, the Monge problem can be interpreted as: minimize $\int _{X \times Y} c(x, y) \,d \gamma $ in a fixed transport class of $\Pi (\mu , \nu )$ , that is, obtain

$$ \begin{align*} \min \bigg\{\! \int_{X \times Y} c(x, y)\,d\gamma : \gamma \in [\mu \otimes \delta_{T}] \bigg\} \end{align*} $$

for a given transport map T. Regarding the Monge–Kantorovich problem, note that the second part of the proof of Lemma 4.5 applies to general transport plans, so that the second marginal can be fixed by the disintegration maps, as follows.

Lemma 4.9. Consider $\mu \in \mathscr {P}(X)$ , the disintegration maps $f: X \to \mathscr {P}(Y)$ and $g: X \to \mathscr {P}(Y)$ , and the transport plans $\gamma = \mu \otimes f(x)$ and $\eta =\mu \otimes g(x)$ . Then, $f_{*}\mu =g_{*}\mu $ implies ${\mathrm {proj}_{Y}}_{*}\gamma = {\mathrm {proj}_{Y}}_{*} \eta $ .

The converse of Lemma 4.9, however, is not true. Consider, for instance, the following transport problem. Three factories, $x_{1}$ , $x_{2}$ and $x_{3}$ , with the same production, supply 100% of their products to two stores, $y_{1}$ and $y_{2}$ . Suppose store $y_{2}$ has a demand five times greater than store $y_{1}$ . Let $\mu $ be the measure related to the production of the factories and $\nu $ be the measure related to the amount of delivered products to the stores. These measures are given by

$$ \begin{align*} \mu=\tfrac{1}{3} \delta_{x_{1}}+\tfrac{1}{3} \delta_{x_{2}}+\tfrac{1}{3} \delta_{x_{3}}, \end{align*} $$

$$ \begin{align*} \nu=\tfrac{1}{6}\delta_{y_{1}}+\tfrac{5}{6}\delta_{y_{2}}. \end{align*} $$

Consider the transport plan illustrated in Figure 2, which divides the products which are produced in the factory $x_{1}$ between the two stores. The corresponding disintegration map f is given by $f(x_1)=\tfrac 12 \delta _{y_{1}} + \tfrac 12 \delta _{y_{2}}$ , $f(x_2)=\delta _{y_{2}}$ and $f(x_3)=\delta _{y_{2}}$ .

Figure 2 Transport plan 1.

Note that since $\mu $ is of the type $\sum \limits _{i} \alpha _{i} \delta _{x_{i}}$ , we have that $f_{*}\mu = \sum \limits _{i} \alpha _{i} \delta _{f(x_{i})}$ . Suppose, however, that there is a change in logistics and in the new transport plan, Figure 3, the factory $x_{1}$ delivers its production only to the store $y_{2}$ and the factory $x_{2}$ divides its production between the two stores. In this new situation, the disintegration map is given by $g(x_1)= \delta _{y_{2}}$ , $g(x_2)=\tfrac 12 \delta _{y_{1}} + \tfrac 12 \delta _{y_{2}}$ and $g(x_3)=\delta _{y_{2}}$ . Nevertheless, $f_{*}\mu =g_{*}\mu $ , i.e., the transport class does not change. On the other hand, if in this new transport plan the factories $x_{1}$ and $x_{2}$ deliver to both stores, Figure 4, in such a way that $x_{1}$ sends 30% of its production to $y_{1}$ and 70% to $y_{2}$ , and $x_{2}$ sends 20% of its production to $y_{1}$ and 80% to $y_{2}$ , the disintegration map is given by $h(x_1) = {3}/{10} \delta _{y_{1}} + {7}/{10} \delta _{y_{2}}$ , $h(x_2) = {2}/{10} \delta _{y_{1}} + {8}/{10} \delta _{y_{2}}$ and $h(x_3) = \delta _{y_{2}}$ . In this case, $f_{*}\mu \neq h_{*}\mu $ . So, the transport plans $1$ and $3$ do not belong to the same transport class.

Figure 3 Transport plan 2.

Figure 4 Transport plan 3.

Furthermore, if there is a small change in transport plan 3, so that $x_{1}$ sends 10% of its production to $y_{1}$ and 90% to $y_{2}$ , and $x_ {2}$ sends 40% of its production to $y_{1}$ and 60% to $y_{2}$ , the new disintegration map is given by $k(x_1) = {1}/{10} \delta _{y_{1}} + {9}/{10} \delta _{y_{2}}$ , $k(x_2) = {4}/{10} \delta _{y_{1}} + {6}/{10} \delta _{y_{2}}$ and $k(x_3) = \delta _{y_{2}}$ . Therefore, $h_{*}\mu \neq k_{*}\mu $ . Thus, with the previous definition for transport class, the transport class changes when either the number of factories that deliver products to more than one store is changed or even if the fraction of delivered production is changed.
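The four transport plans of this example can be compared mechanically. The sketch below encodes the disintegration maps $f$ , $g$ , $h$ and $k$ exactly as given above; the helper names `transport_class` and `second_marginal` are hypothetical and only used for illustration.

```python
from fractions import Fraction as F

mu = {"x1": F(1, 3), "x2": F(1, 3), "x3": F(1, 3)}

half = {"y1": F(1, 2), "y2": F(1, 2)}
only_y2 = {"y2": F(1)}

# Disintegration maps of transport plans 1, 2, 3 and of the perturbed plan 3.
f = {"x1": half, "x2": only_y2, "x3": only_y2}
g = {"x1": only_y2, "x2": half, "x3": only_y2}
h = {"x1": {"y1": F(3, 10), "y2": F(7, 10)},
     "x2": {"y1": F(2, 10), "y2": F(8, 10)}, "x3": only_y2}
k = {"x1": {"y1": F(1, 10), "y2": F(9, 10)},
     "x2": {"y1": F(4, 10), "y2": F(6, 10)}, "x3": only_y2}

def transport_class(mu, plan):
    """Push-forward of mu by the disintegration map: a measure on P(Y)."""
    out = {}
    for x, m in mu.items():
        key = frozenset(plan[x].items())  # hashable encoding of the measure plan[x]
        out[key] = out.get(key, 0) + m
    return out

def second_marginal(mu, plan):
    """Second marginal of mu (x) plan, i.e. the integral of plan[x] against mu."""
    out = {}
    for x, m in mu.items():
        for y, share in plan[x].items():
            out[y] = out.get(y, 0) + m * share
    return out
```

All four plans have the same second marginal $\nu $ , while only $f$ and $g$ share the same push-forward of $\mu $ .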

Therefore, to be compatible with the Monge–Kantorovich problem, we need another definition of transport class. Take $\mu \otimes f \in \Pi (\mu , \nu )$ and $\Lambda =f_{*}\mu $ , and recall that $(\mathrm {proj}_{Y})_{*}(\mu \otimes f)=\nu $ . Then, for every $\varphi \in C(Y)$ ,

$$ \begin{align*} \int_{Y}\varphi(y)\,d\nu &=\int_{X} \bigg( \int_{Y} \varphi(y)\,df{(x)} \bigg)\,d\mu \\ &= \int_{X} I_{\varphi}(f{(x)})\,d\mu \\ &= \int_{\mathscr{P}(Y)}I_{\varphi}(\unicode{x3bb})\,d\Lambda(\unicode{x3bb}) \\ &= \int_{\mathscr{P}(Y)} \bigg( \int_{Y} \varphi (y)\,d\unicode{x3bb} \bigg)\,d\Lambda, \end{align*} $$

where $I_{\varphi }$ is given by equation (5) and $\unicode{x3bb} \in \mathscr {P}(Y)$ . Hence, every probability $\Lambda $ in $(\mathscr {P}(Y), W_{1})$ satisfying

(6)

$$ \begin{align} \int_{\mathscr{P}(Y)} \unicode{x3bb} \,d\Lambda=\nu \end{align} $$

defines a transport class $[\gamma ]=\{\mu \otimes f : f_{*}\mu =\Lambda \}$ . As an example, the transport class $[\mu \times \nu ]$ corresponds to the measure $\Lambda =\delta _{\nu }$ . From this point of view, the Monge–Kantorovich problem in the class $\Lambda $ can be thought of as

$$ \begin{align*} MK_{\Lambda}(c, \mu, \nu)=\inf\limits_{\gamma} \bigg\{\! \int_{X \times Y} c(x, y)\,d\gamma : \gamma = \mu \otimes f, f_{*}\mu=\Lambda \bigg\}. \end{align*} $$

This notion of transport class also allows us to see the Monge–Kantorovich problem as an abstract Monge problem between the spaces X and $\mathscr {P}(Y)$ , considering the cost

$$ \begin{align*} \tilde{c}(x, \unicode{x3bb})=\int_{Y}c(x,y)\,d\unicode{x3bb}. \end{align*} $$

In fact, for every disintegration map $f: X \to \mathscr {P}(Y)$ , such that $f_{*}\mu =\Lambda $ ,

$$ \begin{align*} \int_{X} \tilde{c}(x, f{(x)})\,d\mu &= \int_{X} \bigg( \int_{Y}c(x,y)\,df{(x)} \bigg)\,d \mu \\ &= \int_{X \times Y} c(x, y)\,d(\mu \otimes f) \end{align*} $$

and the problem $MK_{\Lambda }(c, \mu , \nu )$ is equivalent to the Monge problem with parameters $(\tilde {c}, \mu , \Lambda )$ . Thus, the disintegration of transport plans introduces another perspective for the optimization problems proposed by Monge and Kantorovich.
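The identity between the cost of $f$ for the lifted cost $\tilde {c}$ and the cost of the plan $\mu \otimes f$ can be checked on the factory example. In the sketch below, the unit costs `c` are hypothetical numbers chosen only for illustration; the measure `mu` and the map `f` are those of transport plan 1.

```python
from fractions import Fraction as F

mu = {"x1": F(1, 3), "x2": F(1, 3), "x3": F(1, 3)}
f = {"x1": {"y1": F(1, 2), "y2": F(1, 2)}, "x2": {"y2": F(1)}, "x3": {"y2": F(1)}}

# Hypothetical unit costs c(x, y); these numbers are not from the text.
c = {("x1", "y1"): 1, ("x1", "y2"): 4, ("x2", "y1"): 2,
     ("x2", "y2"): 3, ("x3", "y1"): 5, ("x3", "y2"): 1}

def lifted_cost(x, lam):
    """Lifted cost: the integral of c(x, y) against the measure lam on Y."""
    return sum(c[(x, y)] * w for y, w in lam.items())

# Monge-type cost of the map f : X -> P(Y) for the lifted cost.
monge_cost = sum(lifted_cost(x, f[x]) * m for x, m in mu.items())

# Kantorovich cost of the plan gamma = mu (x) f on X x Y.
gamma = {(x, y): m * w for x, m in mu.items() for y, w in f[x].items()}
kantorovich_cost = sum(c[xy] * mass for xy, mass in gamma.items())
```

Both quantities agree, as the computation above predicts.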

5. Absolute continuity from disintegration maps

To connect what we just developed to the geometric properties of the measure spaces and the space of measures, we shall study how paths of measures are given by disintegration maps. Our interest hereafter is to use the $2$ -Wasserstein distance.

To construct such paths in $2$ -Wasserstein space, it is essential to analyse the weak continuity of the disintegration map. In the Wasserstein space, one can study a characterization of convergence. We say that $\{ \mu _k \}$ converges weakly to $\mu $ when

$$ \begin{align*} \int \varphi \,d\mu_k \longrightarrow \int \varphi \,d\mu \end{align*} $$

for any bounded continuous function $\varphi $ . Moreover, Wasserstein distances metrize weak convergence, that is, if $\{ \mu _k \}$ is a sequence in $\mathscr {P}_p (X)$ and $\mu \in \mathscr {P}_p(X)$ , then the weak convergence of $\mu _k$ to $\mu $ is equivalent to $W_p (\mu _k, \mu ) \longrightarrow 0$ . When referring to continuity of the disintegration map, we use weak continuity to mean continuity with respect to weak convergence on $\mathscr {P}_p (X)$ .

A map f on a metric space Y endowed with a measure $\nu $ is called nearly continuous if, for each $\varepsilon> 0$ , there exists a closed set $\mathcal {K} \subset Y$ with $\nu (Y \backslash \mathcal {K}) < \varepsilon $ such that f restricted to $\mathcal {K}$ is continuous. Let f be a disintegration map of $\mu $ w.r.t. $\nu $ . If $\nu $ is absolutely continuous with respect to the volume measure of Y, we can apply Lusin’s Theorem 2.4 to show that f is nearly weakly continuous. We also need the following lemma, which is actually a known fact within the optimal transport community. For completeness, we add it here in our context.

Lemma 5.1. Let $(X, d)$ be a separable metric space. The $2$ -Wasserstein space $(\mathscr {P}(X), W_{2})$ is a separable metric space.

Proof. Let $\mathcal {D} \subset X$ be a countable and dense subset. Consider the space $\mathcal {M}$ defined by $\mathcal {M} := \{ \nu \in \mathscr {P}(X) : \nu = \sum _j a_j \delta _{x_j}, ~\text {with}~ a_j \in \mathbb {Q} ~\text {and}~ x_j \in \mathcal {D} \}.$ We want to show that $\mathcal {M}$ is dense in $(\mathscr {P}(X), W_2)$ .

Given $\mu \in (\mathscr {P}(X), W_{2})$ , then for any $\varepsilon> 0$ and $x_0 \in \mathcal {D}$ , there exists a compact set $K \subset X$ such that

$$ \begin{align*} \int_{X \backslash K}\,d(x_0, x)^{2} \,d\mu \leq \varepsilon^{2}. \end{align*} $$

Since K is compact, we may cover it with a family $\{ B(x_k, {\varepsilon }/{2}):1 \leq k \leq N, x_k \in \mathcal {D} \}$ and define

$$ \begin{align*} B_{k}':= B(x_k, \varepsilon) \backslash \bigcup_{j<k} B(x_j, \varepsilon) \end{align*} $$

so that $\{ B_{k}' \}$ are disjoint and still cover K. Define $\varphi $ on X by

$$ \begin{align*} \begin{cases} \varphi(B_{k}' \cap K) = \{ x_k \}, \\ \varphi(X-K)=\{ x_0 \}. \end{cases} \end{align*} $$

So $d(x, \varphi (x)) \leq \varepsilon $ for every $x \in K$ . Therefore,

$$ \begin{align*} \int_{X}\,d(x, \varphi(x))^{2} \,d\mu \leq \varepsilon^2 \int_{K}\,d\mu + \int_{X \backslash K} d(x, x_0)^2 \,d\mu \leq 2 \varepsilon^2 \end{align*} $$

and then $W_2 (\mu , \varphi _{*}\mu )^2 \leq 2 \varepsilon ^2$ , because $(\mathrm {Id}, \varphi )_{*}\mu $ is a transport plan between $\mu $ and $\varphi _{*}\mu $ . Note that $\varphi _{*}\mu $ can be written as $\sum _{0 \leq j \leq N} \alpha _j \delta _{x_j}$ , that is, $\mu $ might be approximated by a finite combination of Dirac masses. Moreover, the coefficients $\alpha _j$ might be replaced by rational coefficients (up to a small error in Wasserstein distance): since Wasserstein distances are controlled by weighted total variation [Vil09, Theorem 6.15],

$$ \begin{align*} W_{2} \bigg( \sum_{0 \leq j \leq N} \alpha_j \delta_{x_j}, \sum_{0 \leq j \leq N} \beta_j \delta_{x_j} \bigg) \leq 2^{{1}/{2}} \bigg[ \max_{k, l}\,d(x_k, x_l) \bigg] \sum_{0 \leq j \leq N} |\alpha_j - \beta_j|^{{1}/{2}}, \end{align*} $$

which can become small, taking suitable coefficients $\beta _j \in \mathbb {Q}$ . Thus, it follows that $\mathcal {M}$ is dense in $\mathscr {P}(X)$ . Consequently, $(\mathscr {P}(X), W_2)$ is separable.
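The last step of the proof, replacing the weights $\alpha _j$ by rational weights $\beta _j$ , can be sketched numerically for $X=\mathbb {R}$ . The code below assumes that the monotone (quantile) coupling is optimal on the real line, which is a standard fact not stated in the text; the points, the weights and the helper names `w2_line` and `rationalise` are hypothetical.

```python
from fractions import Fraction
from math import sqrt

def w2_line(a, b):
    """W_2 between finitely supported probability measures on the real line,
    computed with the monotone (quantile) coupling. Masses must be Fractions."""
    xs, ys = sorted(a.items()), sorted(b.items())
    i = j = 0
    ra, rb = xs[0][1], ys[0][1]  # remaining mass at the current atoms
    cost = 0.0
    while True:
        m = min(ra, rb)
        cost += float(m) * (xs[i][0] - ys[j][0]) ** 2
        ra -= m
        rb -= m
        if ra == 0:
            i += 1
            if i == len(xs):
                break
            ra = xs[i][1]
        if rb == 0:
            j += 1
            if j == len(ys):
                break
            rb = ys[j][1]
    return sqrt(cost)

def rationalise(weights, max_den):
    """Round all weights but the last to small denominators; the last fixes the sum."""
    head = [w.limit_denominator(max_den) for w in weights[:-1]]
    return head + [1 - sum(head)]

# Hypothetical atoms and weights (exact binary expansions of non-rational inputs).
points = [0.0, 1.0, 3.0]
a0, a1 = Fraction(sqrt(2) - 1), Fraction(1 / 3)
alpha = [a0, a1, 1 - a0 - a1]
beta = rationalise(alpha, 100)

distance = w2_line(dict(zip(points, alpha)), dict(zip(points, beta)))
max_dist = max(abs(p - q) for p in points for q in points)
bound = sqrt(2) * max_dist * sum(sqrt(abs(float(x - y))) for x, y in zip(alpha, beta))
```

In this example, `distance` stays below `bound`, and both decrease as `max_den` increases.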

Proposition 5.2. Let $X, ~Y$ be locally compact, complete and separable metric spaces. Consider $\pi : X \to Y$ a Borel map, $\mu \in \mathcal {M}_{+}(X)$ , $\operatorname {\mathrm {vol}}_Y$ the volume measure on Y and $\nu := \pi _{*}\mu $ . If $\nu \ll \operatorname {\mathrm {vol}}_Y$ , then the disintegration map of $\mu $ w.r.t. $\nu $ is nearly weakly continuous.

Proof. Consider the $2$ -Wasserstein distance $W_{2}$ on $\mathscr {P}(X)$ , with d a complete bounded metric for X. Here, $W_{2}$ metrizes the weak convergence of $\mathscr {P}(X)$ [Vil09, Theorem 6.9]. Furthermore, by Lemma 5.1, $(\mathscr {P}(X), W_{2})$ is a separable space. Moreover, from Theorem A, the map $y \mapsto \mu _{y} \in \mathscr {P}(X)$ is measurable and $\nu $ is a Borel measure on Y, since $\nu \ll \operatorname {\mathrm {vol}}_Y$ . Then, by Lusin’s theorem for Y and $\varepsilon> 0$ , there exists a closed set $\mathcal {K} \subset Y$ with $\nu (Y\backslash \mathcal {K}) < \varepsilon $ such that the disintegration map restricted to $\mathcal {K}$ is weakly continuous.

Although Proposition 5.2 is relevant by itself, we would like to have conditions for which the disintegration map is weakly continuous at every point, so that, given any two points $y, y'$ in Y, we can construct a path of conditional measures connecting $\mu _{y}$ and $\mu _{y'}$ . Note that the 2-Wasserstein space is actually connected, and therefore we can always find a path connecting two measures. Nevertheless, we require this path to be specifically given by the disintegration map. This is not trivial indeed and we can easily construct examples in which this map is not weakly continuous.

Example 5.3. Consider $X = Y = [0,1]$ . Let $\mu $ be the Lebesgue measure on X and take the map $\pi : X \to Y$ given by

$$ \begin{align*} \pi(x) = \begin{cases} 2x & \text{if } x< \frac{1}{2}, \\ 1 & \text{if } x \geq \frac{1}{2}. \end{cases} \end{align*} $$

Note that $\pi $ is Borel measurable, since it is continuous. Define

$$ \begin{align*} \nu := \pi_{*} \mu = \tfrac{1}{2} \unicode{x3bb} + \tfrac{1}{2} \delta_1, \end{align*} $$

where $\unicode{x3bb} $ is the Lebesgue measure on Y and $\delta _1$ is the Dirac measure at $y=1$ . A disintegration $\{ \mu _y \}_{y \in Y}$ of $\mu $ with respect to $\pi $ is given by

$$ \begin{align*} \mu_y = \begin{cases} \delta_{\pi^{-1}(y)} & \text{if } y<1, \\ 2\unicode{x3bb}|_{[{1}/{2}, 1]} & \text{if } y=1. \end{cases} \end{align*} $$

In this case, the disintegration map is not weakly continuous at $y=1$ . In fact, consider a sequence $(y_n)_n$ in Y such that $y_n \longrightarrow y=1$ . Note that

$$ \begin{align*} W_2^2(\mu_{y_n}, \mu_y) =& \inf_{\gamma \in \Pi(\mu_{y_n}, \mu_y)} \int_{X \times X} \,d(x_1, x_2)^2 \,d \gamma \\ =& \int_{X \times X}\,d(x_1, x_2)^2 \,d (\delta_{\pi^{-1}(y_n)} \times \mu_y) \\ =& \int_{X}\,d(\pi^{-1}(y_n), x_2)^2 \,d\mu_{y}. \end{align*} $$

Then, at the limit $y_n \longrightarrow 1$ , we have

$$ \begin{align*} W_2^2(\mu_{y_n}, \mu_y) \longrightarrow \int_{X} d \bigg(\frac{1}{2}, x_2 \bigg)^2 \,d\mu_{y} \neq 0. \end{align*} $$

Therefore, f is not weakly continuous at $y=1$ . Furthermore, note that $\nu (\{1\}) = \tfrac 12$ and $\nu $ is not absolutely continuous with respect to $\unicode{x3bb} $ . However, considering $Y=[0, 1)$ and guaranteeing the absolute continuity of $\nu $ , we can obtain a good approximation $\mathcal {K}$ in which f is weakly continuous.
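The limit in Example 5.3 can be evaluated explicitly. Since $\mu _{y_n}$ is a Dirac mass at $y_n/2$ , the quantity $W_2^2(\mu _{y_n}, \mu _1)$ equals $\int _{1/2}^{1} 2 (x - y_n/2)^2 \,dx$ , which has a closed form. The sketch below only evaluates this formula along a hypothetical sequence $y_n = 1 - 10^{-n}$ .

```python
def w2_squared_to_limit(y):
    """W_2^2 between mu_y = delta_{y/2} (for y < 1) and mu_1 = 2 * Lebesgue on [1/2, 1].

    With one marginal a Dirac mass, the only coupling is the product measure, so the
    value is the integral of 2 * (x - y/2)^2 over [1/2, 1], evaluated in closed form.
    """
    a = y / 2.0  # the single point of the fibre over y, for y < 1
    return (2.0 / 3.0) * ((1.0 - a) ** 3 - (0.5 - a) ** 3)

# Hypothetical sequence y_n = 1 - 10^{-n} converging to 1.
values = [w2_squared_to_limit(1.0 - 10.0 ** (-n)) for n in range(1, 7)]
limit = 1.0 / 12.0
```

The values decrease towards $1/12$ and stay bounded away from zero, in agreement with the failure of weak continuity at $y=1$ .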

We can actually ask for some additional hypotheses so that the disintegration map is weakly continuous. One possibility is to take $\pi $ as a bijective and continuous map.

Proposition 5.4. Let X and Y be locally compact and separable metric spaces. Consider $\mu \in \mathcal {M}_{+}(X)$ , $\pi : X \to Y$ a Borel map, $\nu := \pi _*\mu $ , $\{ \mu _{y} \}_{y \in Y}$ a disintegration of $\mu $ given by Theorem A, and f the disintegration map of $\mu $ w.r.t. $\nu $ . If $\pi $ is bijective and continuous, then f is weakly continuous.

Proof. On the one hand,

$$ \begin{align*} \mu(B) =& \int_{Y} \mu_y (B) \,d\nu \\ =& \int_{Y} \mu_y (B \cap \pi^{-1}(y)) \,d\nu. \end{align*} $$

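On the other hand, since $\pi $ is bijective, $\pi ^{-1}(\pi (B)) = B$ and $y \in \pi (B)$ if and only if $\pi ^{-1}(y) \in B$ , so that

$$ \begin{align*} \mu(B) = \mu(\pi^{-1}(\pi(B))) = \nu(\pi(B)) = \int_{Y} \delta_{\pi^{-1}(y)} (B \cap \pi^{-1}(y)) \,d\nu. \end{align*} $$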

Then,

$$ \begin{align*} \int_{Y} \mu_y (B \cap \pi^{-1}(y)) \,d\nu = \int_{Y} \delta_{\pi^{-1}(y)} (B \cap \pi^{-1}(y)) \,d\nu. \end{align*} $$

Moreover, since $\mu _y$ is a probability and $B \cap \pi ^{-1}(y)$ is either a singleton or an empty set, we have that

$$ \begin{align*} \delta_{\pi^{-1}(y)} (B \cap \pi^{-1}(y)) \geq \mu_y (B \cap \pi^{-1}(y)). \end{align*} $$

Then, $\mu _y (B \cap \pi ^{-1}(y)) = \delta _{\pi ^{-1}(y)} (B \cap \pi ^{-1}(y))$ and it follows that $\mu _y = \delta _{\pi ^{-1}(y)}$ .

Suppose $y_n \longrightarrow y$ . Since $\pi $ is continuous, $\pi ^{-1}$ is continuous and then

$$ \begin{align*} \int g\,d\mu_{y_n} \longrightarrow \int g \,d\mu_y \end{align*} $$

for every bounded uniformly continuous function g. Therefore, f is weakly continuous.

Asking for the bijectivity of $\pi $ is quite strong. We want to explore ways to ease this restriction and to obtain continuity at least for almost every point. In some examples for which $\pi $ is a quotient map, we have the weak continuity of f.

Example 5.5. $\ell _q$ -product space from metric measure spaces.

Let $(Y, d_Y, m_Y)$ , $(Z, d_Z, m_Z)$ denote the metric spaces $(Y, d_Y)$ and $(Z, d_Z)$ which are endowed with probability measures $m_Y$ and $m_Z$ , respectively. We shall call these metric measure spaces. For $q \in [1, \infty ]$ , define the $\ell _q$ -product space $X := Y \times _{\ell _q} Z$ as the product space $Y \times Z$ equipped with the measure $\mu = m_Y \times m_Z$ and the distance $d_{\ell _q}$ given by

$$ \begin{align*} d_{\ell_q} ((y, z), (y', z')) = \begin{cases}[ d_Y (y, y')^q + d_Z (z, z')^q]^{{1}/{q}} & \text{if } 1 \leq q < \infty, \\ \max \{ d_Y(y, y') , d_Z (z, z') \} & \text{if } q=\infty. \end{cases} \end{align*} $$

Consider the projection $\pi : X \to Y$ and $\{ \mu _{y} \}_{y \in Y}$ the disintegration of $\mu $ with respect to $\nu := \pi _* \mu $ . Note that

$$ \begin{align*} d_{\ell_q} (\pi^{-1}(y), \pi^{-1}(y')) = d_Y (y, y') \end{align*} $$

for every $q \in [1, \infty ]$ and, for every conditional measure $\mu _{y}, \mu _{y'}$ ,

$$ \begin{align*} W_2^2(\mu_{y}, \mu_{y'}) =& \inf_{\gamma \in \Pi (\mu_{y}, \mu_{y'})} ~ \int d_{\ell_q} ((y_1, z_1), (y_2, z_2))^2 \, d\gamma \\ =& \inf_{\gamma \in \Pi(\delta_{y}, \delta_{y'})} \int d_Y(y_1, y_2)^2\,d\gamma \\ =& ~ d_{Y}(y, y')^2. \end{align*} $$

Then, $W_2 (\mu _{y}, \mu _{y'}) = d_{Y}(y, y')$ and, therefore, the disintegration map is weakly continuous.
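The computation in Example 5.5 can be illustrated with a discrete fibre. In the sketch below, both factors are taken to be $\mathbb {R}$ with the usual distance and $m_Z$ is uniform on three points; these choices, as well as the names `d_lq` and `cost`, are hypothetical. The coupling that pairs each $z$ with itself has cost exactly $d_Y(y, y')^2$ , and no coupling can do better since $d_{\ell _q} \geq d_Y$ .

```python
from math import inf

def d_lq(p1, p2, q):
    """l_q product distance on R x R (both factors with the usual distance)."""
    dy, dz = abs(p1[0] - p2[0]), abs(p1[1] - p2[1])
    return max(dy, dz) if q == inf else (dy ** q + dz ** q) ** (1.0 / q)

# Hypothetical fibre data: m_Z uniform on three points of Z = R.
m_Z = {0.0: 1 / 3, 1.0: 1 / 3, 2.0: 1 / 3}
y, y_prime = 0.0, 0.5

def cost(coupling, q):
    """Integral of d_lq^2 against a coupling of mu_y and mu_y', given as (z, z') -> mass."""
    return sum(m * d_lq((y, z), (y_prime, zp), q) ** 2
               for (z, zp), m in coupling.items())

# Coupling that pairs each z with itself, and the independent (product) coupling.
identity_coupling = {(z, z): m for z, m in m_Z.items()}
product_coupling = {(z, zp): m * mp for z, m in m_Z.items() for zp, mp in m_Z.items()}
```

For each admissible $q$ , `cost(identity_coupling, q)` equals $d_Y(y, y')^2$ up to rounding, whereas the product coupling is strictly more expensive.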

In fact, this is one example of a disintegration of measures associated with a foliation, called metric measure foliation, covered in [GKMS18]. For a precise definition, we need to introduce some concepts. Let $(X, d)$ be a metric space. A foliation $\mathcal {F}$ of X is a partition of X into closed subsets. The elements of this partition are called leaves. Here, $\mathcal {F}$ is called a metric foliation if for every $F, F' \in \mathcal {F}$ and every $x \in F$ ,

$$ \begin{align*} d(F, F') = d(x, F'), \end{align*} $$

where $d(F, F') = \inf \{ d(x, x') : x \in F, x' \in F' \}$ and $d(x, F') = d (\{ x \}, F')$ . In the case where each leaf is bounded, we say that $\mathcal {F}$ is bounded. Given a metric foliation $\mathcal {F}$ of X, define the equivalence relation:

(7)

$$ \begin{align} x \sim x' \!\iff \text{there exists}~ F \in \mathcal{F} ~\text{such that}~ x, x' \in F. \end{align} $$

Consider $X^{*} :=X/\sim $ the set of equivalence classes under equation (7) and the projection $p: X \to X^{*}$ onto $X^{*}$ . We call $X^{*}$ the quotient space and p the quotient map. Define a distance function $d^{*}$ on $X^{*}$ as

(8)

$$ \begin{align} d^{*}(y, y') := d(p^{-1}(y), p^{-1}(y')) \end{align} $$

for $y, y' \in X^{*}$ . Note that p is a submetry: $p(B(x, r)) = B(p(x), r)$ , where $B(x, r)$ is a ball centred at x with radius r. In fact,

$$ \begin{align*} B(p(x), r) &= \{ y \in X^{ *} : d^{*} (y, p(x)) < r\} \\ &= \{ y \in X^{ *} : d (p^{-1} (y), p^{-1}(p(x))) < r\} \\ &= p(B(x, r)). \end{align*} $$

Therefore, p is 1-Lipschitz. In this notation, we define the following.

Definition 5.6. Let $\mathcal{F}$ be a metric foliation of $(X, d, \mu)$. We say that $\mathcal{F}$ is a metric measure foliation if $p_*\mu$ is a locally finite Borel measure on $X^*$ and there exists a Borel subset $\Omega \subset X^*$ with $p_*\mu (X^* \backslash \Omega) = 0$ such that

(9)

$$ \begin{align} W_2(\mu_y, \mu_{y'}) = d^* (y, y') \end{align} $$

for any $y, y' \in \Omega$, where $\{ \mu_y \}_{y \in X^*}$ is a disintegration of $\mu$ with respect to $p_{*}\mu$.

Note that, in this case, the disintegration map is an isometry. This is the most important characteristic of a metric measure foliation for us. A very important example of a metric measure foliation is related to the action of an isometry group.

Example 5.7. Let $(X, d, \mu )$ be a metric measure space and G a compact topological group. Let

$$ \begin{align*} G \times X \ni (g, x) \mapsto gx \in X \end{align*} $$

be an isometric action of G on X. Suppose this action is metric measure isomorphic, that is, for every $g \in G$ , the map $X \ni x \mapsto gx \in X$ is an isometry preserving the measure $\mu $ . Consider $[x]$ the G-orbit of a point $x \in X$ and the quotient space $X/G$ endowed with the distance

$$ \begin{align*} d_{X/G}([x], [x']) = \inf_{g, g' \in G} d(gx, g'x'). \end{align*} $$

Consider $p: X \to X/G$ the projection map, that is, p is given by $x \mapsto [x]$ . The family $\mathcal {F} := \{ p^{-1}(y) : y \in X/G \}$ is a metric measure foliation on X.
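As a sanity check of the metric foliation condition $d(F, F') = d(x, F')$ for orbits, one can experiment with a finite group. The sketch below assumes, purely for illustration, that $X$ is the Euclidean plane and that $G = \mathbb{Z}_4$ acts by rotations of multiples of $90^{\circ}$; the points and all function names are hypothetical. It compares $d_{X/G}([x], [x'])$ with the distance from each point of the orbit $[x]$ to the orbit $[x']$.

```python
from itertools import product
from math import cos, sin, pi, hypot, isclose

# Hypothetical choice: G = Z_4 acting on the plane by rotations of k * 90 degrees.
GROUP = [k * pi / 2 for k in range(4)]


def act(angle, x):
    """Rotate the point x = (x1, x2) by `angle` about the origin."""
    c, s = cos(angle), sin(angle)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])


def dist(x, z):
    """Euclidean distance in the plane."""
    return hypot(x[0] - z[0], x[1] - z[1])


def orbit(x):
    """The G-orbit [x] as a finite list of points."""
    return [act(g, x) for g in GROUP]


def quotient_distance(x, z):
    """d_{X/G}([x], [z]) = inf over g, g' of d(gx, g'z)."""
    return min(dist(a, b) for a, b in product(orbit(x), orbit(z)))


def point_to_orbit(x, z):
    """d(x, [z]) = inf over g of d(x, gz)."""
    return min(dist(x, b) for b in orbit(z))


x, z = (1.0, 0.2), (-0.4, 2.0)   # hypothetical points
print(quotient_distance(x, z))
print([point_to_orbit(a, z) for a in orbit(x)])   # all entries should agree
```

Since each group element acts by an isometry, the distance from a point of one orbit to the other orbit does not depend on the chosen point, which is exactly the metric foliation condition for the partition into orbits.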

Other interesting examples arise from Riemannian submersions of weighted Riemannian manifolds [GKMS18].

From Definition 5.6, we have a direct association between the $2$-Wasserstein distance between conditional measures (given by Theorem A) and the distance between the points by which these measures are indexed. In this case, the disintegration map is clearly weakly continuous, since it is an isometry. Note that the $2$-Wasserstein distance between conditional measures equals the distance between the corresponding points of the quotient space, which is measured in the direction perpendicular to the leaves. Roughly speaking, we can think of a kind of space fibration, where one of the directions has been ‘collapsed’, and it becomes a parameter for the disintegration family, so that each conditional measure is supported on the underlying fibre (see Figure 5). That is, the measures $\mu_y$ are of the type $\delta_{y} \times \lambda$, where $\lambda$ is a measure on the fibre, similar to Example 3.2.

Figure 5 Idea of a metric measure foliation as a space fibration.

With what we have seen so far, we can identify some situations in which the disintegration map is weakly continuous.

Proposition 5.8. Let X be a locally compact and separable metric space. Consider $\mu \in \mathcal{M}_{+}(X)$, a metric foliation $\mathcal{F}$ of X, the quotient space $X^*$ and the quotient map $p:X \to X^*$. If there exists a metric measure foliation of X, with $\Omega = X^*$, then the disintegration map of $\mu$ w.r.t. $\nu =p_{*}\mu$ is weakly continuous.

Proof. Given a metric measure foliation of X, we have $W_2(\mu_y, \mu_{y'}) = d^* (y, y')$ for every $y, y' \in X^*$, where $\mu_{y} = f(y)$ and $\mu_{y'} = f(y')$, and the disintegration map is denoted by f. Consider a sequence $(y_n)_n$ in $X^*$ such that $y_n \longrightarrow y$. Note that $W_2(\mu_{y_n}, \mu_{y}) \longrightarrow 0$, since $d^*(y_n, y) \longrightarrow 0$. Therefore, we have the weak continuity of f.

Remark 5.9. If in Proposition 5.8 we do not ask for $\Omega = X^*$, the weak continuity of the disintegration map is given for $p_*\mu$-a.e. $y \in X^*$ due to the definition of a metric measure foliation. Although it seems to be a strong condition, it is satisfied in general cases of interest, as in Examples 5.5 and 5.7, for instance.

Remark 5.10. One can associate hypotheses about the map $\pi $ and the type of disintegration obtained. In light of what we have seen, we may obtain the following.

(1) Under the hypotheses of Proposition 5.4, that is, when the map $\pi $ is bijective and continuous, the conditional measures given by Theorem A are Dirac deltas.
(2) Under the hypotheses of Proposition 5.8, that is, in the metric measure foliation case, the conditional measures given by Theorem A are of the type $\delta_{y} \times \lambda$, where $\lambda$ is a measure on the fibre.

The entire study carried out on the disintegration map makes clear a fundamental condition for it to be weakly continuous: the supports of the conditional measures $\{ \mu_y \}$ must be disjoint. However, if we want some kind of absolute continuity of $\{ \mu_y \}$ with respect to a reference measure, the supports must have $\mu$-positive measure. In the cases of Propositions 5.4 and 5.8, we do not obtain supports with $\mu$-positive measure. Another way is to think about the absolute continuity of measures with respect to a reference measure on the fibres. This discussion is summarized in Theorem B. In the statement, we call f minimizing invariant if it maps a minimizing curve on Y to a minimizing curve on $(\mathscr{P}(X), W_2)$. Such a condition is fulfilled when $\pi$ is a Riemannian submersion, for example.

Theorem B. Let X and Y be locally compact, complete, separable metric spaces. Consider $\pi : X \to Y$ a Borel map, $\mu $ in $\mathcal {M}_{+}(X)$ and $\nu := \pi _{*}\mu $ . If the disintegration map of $\mu $ w.r.t. $\nu $ is weakly continuous and Y is path connected, then given two points $y, y' \in Y$ :

(i) there exists a path on $(\mathscr {P}(X), W_{2})$ , given by the disintegration map, connecting $\mu _{y}$ and $\mu _{y'}$ , the respective conditional measures given by Theorem A;
(ii) if X is a smooth compact Riemannian manifold equipped with a volume measure $\operatorname {\mathrm {vol}}$ , $\mu \ll \operatorname {\mathrm {vol}}$ , $\pi $ is such that $\pi ^{-1}(y)$ has $\mu $ -positive measure for $\nu $ -almost every y, the disintegration map is minimizing invariant, and either $\mu _{y}$ or $\mu _{y'}$ is absolutely continuous w.r.t. $\operatorname {\mathrm {vol}}$ , then all the measures $\mu _{y_{t}}$ on the path given by item (i) are absolutely continuous w.r.t. $\operatorname {\mathrm {vol}}$ ;
(iii) if $\pi $ is such that $\{ \pi ^{-1}(y) \}_{y \in Y}$ is a metric measure foliation of X, X a smooth compact Riemannian manifold, there exists a path given by the disintegration map connecting $\mu _{y}$ and $\mu _{y'}$ , and if either $\mu _{y}$ or $\mu _{y'}$ is absolutely continuous with respect to the volume measure on the respective support fibre, then all the measures $\mu _{y_{t}}$ on this path are absolutely continuous with respect to the volume measure on the fibre.

Proof. (i) This is a direct consequence of the weak continuity of the disintegration map, which will be denoted by f throughout the proof. Consider $y, y' \in Y$. Let $\psi$ be a continuous curve in Y connecting y and $y'$, which exists since Y is path connected, that is, $\psi = \{ y_{t} : t \in [0, 1], \psi(0)=y, \psi(1)=y' \}$. Taking $y_t \longrightarrow \bar{y} \in Y$, we have $f(y_t) \stackrel{w}{\longrightarrow} f(\bar{y})$, that is, $W_2(\mu_{y_{t}}, \mu_{\bar{y}}) \longrightarrow 0$. Then, $\zeta = \{ \mu_{y_{t}} : t \in [0, 1] \}$, where $\mu_{y_{t}}$ is the conditional measure associated with $y_t$ via f, for every $t \in [0, 1]$, is a weakly continuous curve in $(\mathscr{P}(X), W_2)$ connecting $\mu_{y}$ and $\mu_{y'}$. This proves the first part of our theorem.

The proof of part (ii) will be done in a few steps. The idea is to use a sort of ‘time-dependent’ version of optimal transport; see [Vil09, Ch. 7], for example. In short, we will consider the curve $\zeta$ given by the disintegration map as an interpolation between probability measures, called displacement interpolation. To this end, we consider a transport problem, and we associate $\zeta$ with a random curve $\xi$ in X.

Step 1: A minimizing curve in the space of measures. Consider the path $\zeta = \{ \mu_{y_{t}} : t \in [0, 1] \}$ as constructed in part (i). Taking $\psi$ as a minimizing curve in Y, $\zeta$ is a minimizing curve in $\mathscr{P}(X)$, since f is minimizing invariant. By abuse of notation, we write $\zeta : [0,1] \to (\mathscr{P}(X), W_{2})$ for the weakly continuous path in $(\mathscr{P}(X), W_{2})$ given by the disintegration map f evaluated along the minimizing curve $\psi$ joining y and $y'$ in Y. More precisely, we consider the composition $f \circ \psi$ to describe $\zeta$.

Step 2: A random curve in X. We want to associate $\zeta$ with a random curve $\xi : [0, 1] \to X$. To set some notation, let $e_t$ be the evaluation map, given by $e_{t}(\xi) = \xi(t) := \xi_{t}$, meaning the evaluation of $\xi$ at t. We will also use some usual concepts related to geodesics in Riemannian manifolds throughout the proof. We suggest [Jost11] for a comprehensive reading.

Consider the curve $\zeta : [0,1] \to \mathscr {P}_2(X)$ joining $\mu _{0}$ and $\mu _{1}$ (from Step 1), denoting $\mu _0 = \mu _y$ and $\mu _1 = \mu _{y'}$ . We also denote $\mu _{t} = \mu _{y_t}$ . Suppose that there is a transport problem associated with this curve, whose respective spatial distributions are modelled by these probability measures. Assume that the cost function for the transport between the initial point $x_0 \in X$ (at time $0$ ) and the final point $x_1 \in X$ (at time $1$ ), denoted by $c^{0, 1} (x_0, x_1)$ , is associated with a family of functionals parametrized by the initial and the final times. Denote by $\mathcal {A}^{0, 1}$ the functional on the set of curves $[0, 1] \to X$ , such that,

$$ \begin{align*} c^{0, 1} (x_0, x_1)= \inf \{ \mathcal{A}^{0, 1} (\xi) : \xi_{0} = x_0,~\xi_{1} = x_1, ~\xi \in \mathcal{C}([0, 1]; X) \}. \end{align*} $$

In other words, $c^{0, 1}(x_0, x_1)$ is the minimal cost needed to go from point $x_0$ at initial time $0$ , to point $x_1$ at final time $1$ . Moreover, let $C^{0,1}(\mu _{0}, \mu _{1})$ be the optimal transport cost between $\mu _{0}$ and $\mu _{1}$ for the cost $c^{0, 1} (x_0, x_1)$ . For $t_1, t_2 \in [0, 1]$ , define

$$ \begin{align*} \mathcal{A}^{t_1, t_2}(\xi) = \frac{L(\xi)^{2}}{t_2 - t_1}, \end{align*} $$

where $L(\xi )$ is the length of $\xi $ , so that

$$ \begin{align*} c^{t_1, t_2} (x_0, x_1)=\frac{d(x_0, x_1)^2}{t_2 - t_1} \end{align*} $$

and

(10)

$$ \begin{align} C^{t_1, t_2} (\mu_{0}, \mu_{1}) = \frac{W_2 (\mu_{0}, \mu_{1})^2}{t_2 - t_1}. \end{align} $$

We want to show that there exists a random minimizer $\xi : [0,1] \to X$ , such that, law $(\xi _t) = \mu _t$ for every $t \in [0,1]$ . In other words, we want to show that $\zeta $ is a curve in the space of measures which interpolates all possible measures along the minimizing path joining $\mu _{0}$ and $\mu _{1}$ . Such a curve is called displacement interpolation.

Step 3: $\zeta$ is a displacement interpolation. In this step, $\xi$ will be constructed by dyadic approximation, according to couplings of measures in $\zeta$ associated with dyadic times $t = {i}/{2^k}$. To achieve this, we will use the iterative construction in line with [Vil09, Theorem 7.21].

Let $\Gamma $ be the set of minimizing curves in X. It will be necessary throughout the text to consider subsets of $\Gamma $ in which the geodesics are defined for certain time intervals and endpoints (or endpoints regions). So, for $s,t \in [0,1]$ , $x_s, x_t \in X$ , let $\Gamma _{x_s \to x_t}^{s, t}$ be the set of minimizing curves in X starting at $x_s$ at time s and ending at $x_t$ at time t. Similarly, for any two compact sets $K_s, K_t \subset X$ , let $\Gamma _{K_s \to K_t}^{s, t}$ be the set of minimizing curves starting in $K_s$ at time s and ending in $K_t$ at time t.

Considering the measures along $\zeta $ , for $t_1, t_2, t_3 \in [0, 1]$ , let $\gamma _{t_1 \to t_2}$ be an optimal transference plan between $\mu _{t_1}$ and $\mu _{t_2}$ for $c^{t_1, t_2}(x_{t_1}, x_{t_2})$ , and let $\gamma _{t_2 \to t_3}$ be an optimal transference plan between $\mu _{t_2}$ and $\mu _{t_3}$ for $c^{t_2, t_3}(x_{t_2}, x_{t_3})$ . By Lemma 2.5, it is possible to take random variables $(\xi _{t_1}, \xi _{t_2}, \xi _{t_3})$ such that law $(\xi _{t_1}, \xi _{t_2}) = \gamma _{t_1 \to t_2}$ , law $(\xi _{t_2}, \xi _{t_3}) = \gamma _{t_2 \to t_3}$ and law $(\xi _{t_i})=\mu _{t_i}$ for $i=1, 2 ,3$ . Since $\zeta $ is minimizing in $\mathscr {P}_2(X)$ (see Step 1) and $\mathscr {P}_2(X)$ is a geodesic space, it follows from equation (10) that

$$ \begin{align*} C^{t_1, t_2}(\mu_{t_1}, \mu_{t_2}) + C^{t_2, t_3}(\mu_{t_2}, \mu_{t_3}) = C^{t_1, t_3}(\mu_{t_1}, \mu_{t_3}). \end{align*} $$

This, in particular, implies:

(a) $(\xi _{t_1}, \xi _{t_3})$ is an optimal coupling of $(\mu _{t_1}, \mu _{t_3})$ for $c^{t_1, t_3}(\xi _{t_1}, \xi _{t_3})$ ;
(b) $c^{t_1, t_3} (\xi _{t_1}, \xi _{t_3}) =c^{t_1, t_2} (\xi _{t_1}, \xi _{t_2}) + c^{t_2, t_3} (\xi _{t_2}, \xi _{t_3})$ almost surely.
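The additivity in item (b) can be illustrated for the cost $c^{t_1, t_2}(x_0, x_1) = d(x_0, x_1)^2/(t_2 - t_1)$ in a simple setting. The sketch below assumes, for illustration only, that $X$ is the Euclidean plane, so that minimizing curves are straight segments; the endpoints, the times and the perturbation are hypothetical. The sum of the two partial costs equals the total cost when the intermediate point lies on the constant-speed segment at the right time, and is strictly larger for a perturbed intermediate point.

```python
from math import dist, isclose


def cost(x_a, x_b, t_a, t_b):
    """c^{t_a, t_b}(x_a, x_b) = d(x_a, x_b)^2 / (t_b - t_a), Euclidean d."""
    return dist(x_a, x_b) ** 2 / (t_b - t_a)


def geodesic(x0, x1, t):
    """Constant-speed segment from x0 (at t = 0) to x1 (at t = 1)."""
    return tuple((1 - t) * a + t * b for a, b in zip(x0, x1))


x0, x1 = (0.0, 0.0), (3.0, 4.0)          # hypothetical endpoints
t1, t2, t3 = 0.0, 0.25, 1.0              # hypothetical times

on_curve = geodesic(x0, x1, t2)
off_curve = (on_curve[0] + 0.5, on_curve[1] - 0.5)   # perturbed intermediate point

total = cost(x0, x1, t1, t3)
split_on = cost(x0, on_curve, t1, t2) + cost(on_curve, x1, t2, t3)
split_off = cost(x0, off_curve, t1, t2) + cost(off_curve, x1, t2, t3)
print(total, split_on, split_off)
```

The strict inequality for the perturbed point reflects the fact that, for this cost, equality in the splitting forces the intermediate point to lie on a minimizing curve and to be reached at the prescribed time.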

Let $(\xi _{0}, \xi _{1})$ be an optimal coupling of $(\mu _{0}, \mu _{1})$ . Consider optimal transference plans $\gamma _{0 \to 1/2}$ , $\gamma _{1/2 \to 1}$ , as above, and construct random variables $(\xi _{0}^{(1)}, \xi _{1/2}^{(1)}, \xi _{1}^{(1)})$ such that $(\xi _{0}^{(1)}, \xi _{1/2}^{(1)})$ is an optimal coupling of $(\mu _{0}, \mu _{1/2})$ for $c^{0, 1/2}(\xi _{0}^{(1)}, \xi _{1/2}^{(1)})$ ; $(\xi _{1/2}^{(1)}, \xi _{1}^{(1)})$ is an optimal coupling of $(\mu _{1/2}, \mu _{1})$ for $c^{1/2, 1}(\xi _{1/2}^{(1)}, \xi _{1}^{(1)})$ , and law $(\xi _{i}^{(1)}) = \mu _{i}$ for $i=0, \tfrac 12 ,1$ . Moreover, item (a) implies that $(\xi _{0}^{(1)}, \xi _{1}^{(1)})$ is an optimal coupling of $(\mu _{0}, \mu _{1})$ , and item (b) implies

$$ \begin{align*} c^{0, 1}(\xi_{0}^{(1)}, \xi_{1}^{(1)}) = c^{0, {1}/{2}} \big(\xi_{0}^{(1)}, \xi_{{1}/{2}}^{(1)} \big) + c^{{1}/{2}, 1} \big(\xi_{{1}/{2}}^{(1)}, \xi_{1}^{(1)}\big) \end{align*} $$

almost surely. Iterating this process, at the step k, we have random variables $(\xi _{0}^{(k)}, \xi _{{1}/{2^{k}}}^{(k)}, \xi _{{2}/{2^{k}}}^{(k)}, \ldots , \xi _{1}^{(k)})$ , so that for any two $i, j \leq 2^{k}$ , $(\xi _{{i}/{2^{k}}}^{(k)}, \xi _{{j}/{2^{k}}}^{(k)})$ is an optimal coupling of $(\mu _{{i}/{2^{k}}}, \mu _{{j}/{2^{k}}})$ . Furthermore, for $i_1, i_2, i_3 \leq 2^{k}$ ,

$$ \begin{align*} c^{{i_1}/{2^k}, {i_3}/{2^k}} \big(\xi_{{i_1}/{2^k}}^{(k)}, \xi_{{i_3}/{2^k}}^{(k)}\big) = c^{{i_1}/{2^k}, {i_2}/{2^k}}\big(\xi_{{i_1}/{2^k}}^{(k)}, \xi_{{i_2}/{2^k}}^{(k)}\big) + c^{{i_2}/{2^k}, {i_3}/{2^k}}\big(\xi_{{i_2}/{2^k}}^{(k)}, \xi_{{i_3}/{2^k}}^{(k)}\big) \end{align*} $$

almost surely.

We want to extend the random variables $\xi^{(k)}$, defined for times ${i}/{2^{k}}$, $i \leq 2^k$, to continuous curves $(\xi^{(k)}_{t})_{0 \leq t \leq 1}$. For this, note that for all times $s, t \in [0, 1]$, $s < t$, there exists a Borel map $S_{s \to t}: X \times X \to \mathcal{C}([s, t];X)$ such that for all $x, z \in X$, $S_{s \to t}(x, z)$ belongs to $\Gamma_{x \to z}^{s, t}$ [Vil09, Proposition 7.16]. Indeed, let $E_{s,t}$ be the function given by $E_{s,t}(\xi) := (\xi_{s}, \xi_{t})$ for $\xi : [s, t] \to X$ a minimizing curve. Since X is a geodesic space, any two points of X can be joined by at least one minimizing curve, so $E_{s, t}$ is onto $X \times X$. Moreover, $E_{s, t}$ is a continuous map between complete separable metric spaces, and $E_{s,t}^{-1}(x, z)$ is compact for every $x, z$. Therefore, $E_{s,t}$ admits a measurable right-inverse $S_{s \to t}$ [Del75], that is, $E_{s,t} \circ S_{s \to t} = \mathrm{Id}$. Thus, $S_{s \to t}$ is a measurable recipe to join two points $x,z$ by a minimizing curve, which was to be proved.

For $t \in ({i}/{2^k}, ({i+1})/{2^k})$, define $\xi_{t}^{(k)}$ by $e_t (S_{{i}/{2^k} \to ({i+1})/{2^k}} ( \xi^{(k)}_{{i}/{2^{k}}}, \xi^{(k)}_{({i+1})/{2^{k}}} ) )$. Then, the law of $(\xi_{t}^{(k)})_{0 \leq t \leq 1}$ is a probability measure on $\mathcal{C}([0, 1]; X)$. Let us denote it by $\Theta^{(k)}$. Note that $(e_t)_{*}\Theta^{(k)}=\mu_{t}$ for every $t={i}/{2^k}$, $i \leq 2^k$, and $\Theta^{(k)}$ is concentrated on $\Gamma$.

In short, from $\zeta$, we iteratively constructed probability measures $\Theta^{(k)}$ on $\Gamma$ up to a step k. We want to pass to the limit as $k \longrightarrow \infty$. Given $\varepsilon>0$, since $\mu_{0}$ and $\mu_{1}$ are Radon measures, there exist compact sets $K_0$ and $K_1$ such that $\mu_{0}(X \backslash K_0) \leq \varepsilon$, $\mu_{1} (X \backslash K_1) \leq \varepsilon$. Also, the set $\Gamma_{K_0 \to K_1}^{0, 1}$ is compact [Vil09, Definition 7.13 and Example 7.15] and

$$ \begin{align*} \Theta^{(k)}(\Gamma ~\backslash~ \Gamma_{K_0 \to K_1}^{0, 1}) &= \mathbb{P}((\xi_{0}, \xi_{1}) \notin K_0 \times K_1) \\ & \leq \mathbb{P}(\xi_{0} \notin K_0) + \mathbb{P}(\xi_{1} \notin K_1) \\ & = \mu_{0}(X \backslash K_0) + \mu_{1} (X \backslash K_1) \\ & \leq 2 \varepsilon. \end{align*} $$

Hence, the sequence $(\Theta^{(k)})_{k}$ is tight and, by Prokhorov's theorem, we can take a subsequence that converges weakly to some $\Theta$. Since $\Gamma$ is closed in the topology of uniform convergence [Vil09, Theorem 7.16(v)], $\Theta$ is supported in $\Gamma$. Moreover, given a positive integer a and a dyadic time $t={i}/{2^{a}} \in [0, 1]$, if $k> a$, we have $(e_t)_{*}\Theta^{(k)} = \mu_{t}$ and, passing to the limit $k \longrightarrow \infty$, $(e_t)_{*}\Theta = \mu_{t}$. Finally, since $\mu_{t}$ depends continuously on t and the dyadic times are dense in $[0, 1]$, to show that $(e_t)_{*}\Theta = \mu_{t}$ for every $t \in [0, 1]$, it suffices to show that $(e_t)_{*}\Theta$ is continuous as a function of t. In other words, we need to show that, given u a bounded continuous function on X, $U(t) = \mathbb{E} u (\xi_{t})$ is a continuous function of t if $\xi$ is a random geodesic with law $\Theta$. In fact, since $t \mapsto \xi_{t}$ is continuous and the composition of continuous functions is also continuous, $t \mapsto u(\xi_{t})$ is continuous. Moreover, for any sequence $t_n \longrightarrow t$, the functions $u(\xi_{t_n})$ converge pointwise to $u(\xi_{t})$ and are bounded by $\sup |u|$; by Lebesgue’s dominated convergence theorem, $\mathbb{E}u(\xi_{t_n}) \longrightarrow \mathbb{E}u(\xi_{t})$. From these results, the continuity of $U(t)$ follows, as we wanted.

Accordingly, we constructed $\xi$ such that for each $t \in [0, 1]$, $\mu_{t}$ is the law of $\xi_{t}$, where $(\xi_{t})_{0 \leq t \leq 1}$ is a dynamical optimal coupling of $(\mu_{0}, \mu_{1})$. In other terms, we say that $\zeta$ is a displacement interpolation.
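To make the notion of displacement interpolation concrete, the following sketch works on the real line, which is an assumption made only for illustration. There, for the quadratic cost, the monotone (sorted) coupling of two uniform empirical measures is optimal, so $\mu_t$ can be taken as the law of $(1-t)\xi_0 + t\xi_1$ under that coupling. The atoms are hypothetical. The code checks that $W_2(\mu_s, \mu_t) = |t - s|\, W_2(\mu_0, \mu_1)$, that is, that the interpolation has constant speed.

```python
from math import isclose, sqrt


def w2_line(atoms_a, atoms_b):
    """W_2 between uniform empirical measures on the real line.

    In one dimension with quadratic cost, the monotone (sorted) coupling
    is optimal, so W_2^2 is the mean squared gap between order statistics.
    """
    a, b = sorted(atoms_a), sorted(atoms_b)
    return sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def interpolate(atoms_0, atoms_1, t):
    """Atoms of mu_t = law((1 - t) xi_0 + t xi_1) under the sorted coupling."""
    a, b = sorted(atoms_0), sorted(atoms_1)
    return [(1 - t) * u + t * v for u, v in zip(a, b)]


mu0 = [-1.0, 0.0, 0.5, 2.0]      # hypothetical atoms of mu_0
mu1 = [3.0, 3.5, 5.0, 9.0]       # hypothetical atoms of mu_1
total = w2_line(mu0, mu1)
for s, t in [(0.0, 0.5), (0.25, 0.75), (0.1, 1.0)]:
    gap = w2_line(interpolate(mu0, mu1, s), interpolate(mu0, mu1, t))
    print(s, t, gap, (t - s) * total)   # the last two columns should agree
```

This one-dimensional picture is much simpler than the dyadic construction of Step 3, but it shows the same structure: each particle travels along a minimizing curve, and the laws of the particle positions trace the curve $\zeta$.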

Step 4: An important observation about displacement interpolation. By [Vil09, Theorem 8.5], if $\{ \mu_t \}$ is a displacement interpolation between two compactly supported probability measures on X, and $t_0 \in (0, 1)$ is given, then, for every $t \in [0, 1]$, the transport map $T_{t_0 \to t}$ between the points $\xi(t_0)$ and $\xi(t)$ is well defined $\mu_{t_0}$-almost everywhere and it is Lipschitz continuous. In other words, $T_{t_0 \to t}$ is a solution of the Monge problem between $\mu_{t_0}$ and $\mu_t$. For the completeness of the text, we will comment briefly on the proof.

Note that $(e_0, e_1, e_0, e_1)_{*}(\Theta \otimes \Theta) = \gamma_{0 \to 1} \otimes \gamma_{0 \to 1}$. So, if a property holds $\gamma_{0 \to 1} \otimes \gamma_{0 \to 1}$-almost surely for quadruples of points, then this property holds $\Theta \otimes \Theta$-almost surely for the endpoints of pairs of curves. Since $\gamma_{0 \to 1}$ is optimal, it has a property named c-cyclical monotonicity [Vil09, Theorem 5.10], so that $c(x, y) + c(\tilde{x}, \tilde{y}) \leq c(x, \tilde{y}) + c(\tilde{x}, y)$, $\gamma_{0 \to 1} \otimes \gamma_{0 \to 1} (dx,\,dy,\,d\tilde{x},\,d\tilde{y})$-almost surely. Thus, $c(\xi(0), \xi(1)) + c(\tilde{\xi}(0), \tilde{\xi}(1)) \leq c(\xi(0), \tilde{\xi}(1)) + c(\tilde{\xi}(0), \xi(1))$, $\Theta \otimes \Theta (d\xi, d\tilde{\xi})$-almost surely.

Moreover, by Mather’s shortening lemma [Vil09, Theorem 8.1 and Corollary 8.2],

(11)

$$ \begin{align} \sup_{0 \leq t \leq 1} d(\xi_t, \tilde{\xi_t}) \leq C_{K} d(\xi_{t_0}, \tilde{\xi_{t_0}}), \end{align} $$

where $C_K$ is a constant. Suppose that $\Theta$ is supported on a compact set S. Equation (11) defines a closed set of pairs of curves $(\xi, \tilde{\xi}) \in S \times S$. Let $e_{t_0}(S)$ be the union of all $\xi(t_0)$, when $\xi$ varies over S, and define $T_{t_0 \to t}$ on $e_{t_0}(S)$ by $T_{t_0 \to t}(\xi(t_0)) = \xi(t)$. This map is well defined: if $\xi$, $\tilde{\xi}$ in S are such that $\xi(t_0) = \tilde{\xi}(t_0)$, then equation (11) implies $\xi = \tilde{\xi}$. Moreover, again by equation (11), $T_{t_0 \to t}$ is Lipschitz continuous. So, $(\xi(t_0), T_{t_0 \to t}(\xi(t_0)))$ is a Monge coupling of $(\mu_{t_0}, \mu_{t})$.
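The role of the interior time $t_0 \in (0, 1)$ can be seen in a one-dimensional toy case, which is our own illustration and not part of the argument above. Assume particles on the real line move on straight lines from sorted positions $a_i$ to sorted positions $b_i$ (the monotone coupling). If two particles start at the same point and then split, no map sends time-$0$ positions to time-$t$ positions, whereas from any interior time $t_0$ the map $x_{t_0} \mapsto x_t$ is well defined with difference quotients bounded by $\max \{ t/t_0, (1-t)/(1-t_0) \}$. This bound is an elementary computation specific to this setting and should not be confused with the constant $C_K$ in equation (11). All data and names are hypothetical.

```python
from itertools import combinations


def interpolate(a_sorted, b_sorted, t):
    """Positions at time t of particles moving on straight lines a_i -> b_i."""
    return [(1 - t) * u + t * v for u, v in zip(a_sorted, b_sorted)]


def transport_map_ratio(src, dst):
    """Largest difference quotient of the map src_i -> dst_i.

    Returns None when two particles share a source position but have
    different images, i.e. when no map src -> dst exists.
    """
    worst = 0.0
    for i, j in combinations(range(len(src)), 2):
        gap_src, gap_dst = abs(src[i] - src[j]), abs(dst[i] - dst[j])
        if gap_src == 0.0:
            if gap_dst > 0.0:
                return None
            continue
        worst = max(worst, gap_dst / gap_src)
    return worst


# Hypothetical data: two particles start at the same point and then split.
a = [0.0, 0.0, 1.0]
b = [2.0, 3.0, 5.0]
t0, t = 0.5, 1.0

from_start = transport_map_ratio(interpolate(a, b, 0.0), interpolate(a, b, t))
from_interior = transport_map_ratio(interpolate(a, b, t0), interpolate(a, b, t))
bound = max(t / t0, (1 - t) / (1 - t0))
print(from_start, from_interior, bound)
```

Here `from_start` is `None` because the two particles starting at the same point have different endpoints, while `from_interior` is a finite number not exceeding `bound`.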

Step 5: Absolute continuity of measures $\mu _t$ on $\zeta $ with respect to the volume measure on X, when $\mu _0$ and $\mu _1$ are compactly supported. Without loss of generality, let us suppose that $\mu _{1}$ is absolutely continuous with respect to the volume measure on X, vol. If $\mu _0$ and $\mu _1$ are compactly supported, then $\zeta $ has a compact support. Indeed, let $A_0, A_1 \subset X$ be the compact supports of $\mu _0, \mu _1$ and $\gamma _{0 \to 1}$ be the transference plan with marginals $\mu _0$ and $\mu _1$ . Consider the canonical projections $(\mathrm {proj}_1)$ , $(\mathrm {proj}_2)$ on the first and second components, respectively. Since $(\mathrm {proj}_1)_{*}\gamma _{0 \to 1} =\mu _0$ and $(\mathrm {proj}_2)_{*}\gamma _{0 \to 1} =\mu _1$ ,

$$ \begin{align*} (\mathrm {proj}_1)_{*}\gamma_{0 \to 1}(X \times X) = \mu_{0}(X) = \mu_{0}(A_0) = \gamma_{0 \to 1}(A_0 \times X), \end{align*} $$

$$ \begin{align*} (\mathrm {proj}_2)_{*}\gamma_{0 \to 1}(X \times X) = \mu_{1}(X) = \mu_{1}(A_1) = \gamma_{0 \to 1}(X \times A_1). \end{align*} $$

Therefore, $\gamma_{0 \to 1}$ is concentrated in the compact set $A_0 \times A_1$. Moreover, since $\gamma_{0 \to 1} = (e_0, e_1)_{*}\Theta$ and the evaluation map is continuous, $\Theta$ is concentrated in a compact set. The compactness of the support of $\zeta$ follows from $(e_t)_{*} \Theta = \mu_{t}$.

We can use Step 4: there is a Lipschitz map T solving the Monge problem between $\mu_{t}$ and $\mu_{1}$, $t \in (0, 1)$. Let N be a Borel set such that $\text{vol}(N) = 0$ and consider $T(N)$. If $T(N)$ is not Borel measurable, consider a negligible Borel set that contains $T(N)$ (which by abuse of notation, we will continue denoting $T(N)$). Note that $N \subset T^{-1}(T(N))$, so

$$ \begin{align*} \mu_{t}(N) \leq \mu_{t}(T^{-1}(T(N))) = (T_{*} \mu_{t}) (T(N)) = \mu_{1} (T(N)) \end{align*} $$

and then $\mu_{t}(N)=0$, since $\text{vol}(T(N)) \leq \| T \|_{\text{Lip}}^{n}\, \text{vol}(N) = 0$, where $n = \dim X$ (this inequality holds since T is Lipschitz, and $\| T \|_{\text{Lip}}$ stands for its Lipschitz constant), and $\mu_1 \ll \text{vol}$ by hypothesis. So, $\mu_{t}(N)=0$ for every Borel set N such that $\text{vol}(N)=0$, that is, $\mu_{t} \ll \text{vol}$.

Step 6: Absolute continuity of measures $\mu_t$ of $\zeta$ with respect to the volume measure on X: the general case. Without loss of generality, suppose $\mu_{1} \ll \text{vol}$. Let us assume that there is some case in which neither $\mu_0$ nor $\mu_1$ is compactly supported. We will prove our statement (ii) by contradiction. Suppose that $\mu_{\tau}$, for some $\tau \in (0,1)$, is not absolutely continuous with respect to the volume measure on X. Then, there exists a set $Z_{\tau} \subset X$ such that $\text{vol}(Z_{\tau})=0$ and $\mu_{\tau}(Z_{\tau})> 0$. Consider $\mathcal{Z}:=\{ \xi \in \Gamma : \xi_{\tau} \in Z_{\tau} \}$, so that $\Theta(\mathcal{Z}) = \mathbb{P}(\xi_{\tau} \in Z_{\tau}) = \mu_{\tau}(Z_{\tau})> 0$. Since $\Theta$ is a regular measure, there exists a compact set $\mathcal{K} \subset \mathcal{Z}$ such that $\Theta(\mathcal{K})> 0$. So, if we set

$$ \begin{align*} \Theta' := \frac{\mathbf{1}_{\mathcal{K}}\, \Theta}{\Theta (\mathcal{K})} \end{align*} $$

and consider $\gamma_{0 \to 1}':= (e_0, e_1)_{*}\Theta'$ and $\mu_{t}'=(e_t)_{*}\Theta'$, we have

$$ \begin{align*} \mu_t' \leq \frac{(e_t)_{*}\Theta}{\Theta (\mathcal{K})} =\frac{\mu_{t}}{\Theta (\mathcal{K})}. \end{align*} $$

In this way, $(\mu _{t}')$ is a displacement interpolation and, considering the previous equation for $t=1$ , $\mu _{1}' \ll \mu _1 \ll \text {vol}$ . Note that now, $\mu _{\tau }'$ is concentrated on $e_{\tau }(\mathcal {K}) \subset e_{\tau }(\mathcal {Z}) \subset Z_{\tau }$ and then $\mu _{\tau }'$ is singular. However, $\mu _0'$ is supported in $e_0(\mathcal {K})$ and $\mu _{1}'$ is supported in $e_1(\mathcal {K})$ , which are compact. This is the case of Step 5. Then, $\mu _{\tau }' \ll \text {vol}$ , which is a contradiction.

(iii) In this item, we denote $Y:=\Omega $ , where $\Omega $ is the subset of the quotient space $X^*$ on which the metric measure foliation is defined (see Definition 5.6), and $\pi := p$ , where p is the quotient map. The weak continuity of f in this case was proved in Proposition 5.8. Let $\psi $ be a minimizing curve on Y connecting y and $y'$ . Consider the path $\zeta = \{ \mu _{y_{t}} : t \in [0, 1] \}$ as constructed in item (i). Since $\psi $ was taken as a minimizing curve and f is an isometry, $\zeta $ is minimizing.

Observe that every optimal transference plan between $\mu _y$ and $\mu _{y'}$ is supported on $\{ (x, x') \in \pi ^{-1}(y) \times \pi ^{-1}(y') ~:~ d^*(y, y') = d(x, x') \}$ . In fact, let $\gamma _{y \to y'}$ be an optimal transference plan for $\mu _y$ , $\mu _{y'}$ . Since supp $(\mu _y) \subset \pi ^{-1}(y)$ , supp $(\mu _{y'}) \subset \pi ^{-1}(y')$ and $\pi $ is 1-Lipschitz, we have $\gamma _{y \to y'}$ supported on $\{ (x, x') \in \pi ^{-1}(y) \times \pi ^{-1}(y') ~:~ d^*(y, y') \leq d(x, x') \}$ . Consider the set $\Upsilon := \{ (x, x') \in \pi ^{-1}(y) \times \pi ^{-1}(y') ~:~ d^*(y, y') < d(x, x') \}$ . If $\gamma _{y \to y'} (\Upsilon )>0$ , then

$$ \begin{align*} \gamma_{y \to y'} (\Upsilon)\,d^{*} (y, y')^2 \leq & ~\gamma_{y \to y'} (\Upsilon) \,d(x, x')^2 \\ < & \int_{\Upsilon} d(x, x')^2 \,d\gamma_{y \to y'}. \end{align*} $$

So,

$$ \begin{align*} d^{*} (y, y')^2 < & \int_{\Upsilon}\,d(x, x')^2 \,d\gamma_{y \to y'} + \gamma_{y \to y'} (X \times X \backslash \Upsilon) \,d^{*} (y, y')^2 \\ = & \int_{\Upsilon}\,d(x, x')^2 \,d\gamma_{y \to y'} + \int_{X \times X \backslash \Upsilon} d^*(y, y')^2 \,d\gamma_{y \to y'} \\ \leq & \int_{\Upsilon}\,d(x, x')^2 \,d\gamma_{y \to y'} + \int_{X \times X \backslash \Upsilon}\,d(x, x')^2 \,d\gamma_{y \to y'} \\ =& \int_{X \times X}\,d(x, x')^2\,d\gamma_{y \to y'}. \end{align*} $$

Then, $d^{*} (y, y')^2 < W_2 (\mu _{y}, \mu _{y'})^2$ , which is a contradiction.

Therefore, we have the transport between $\mu _y$ and $\mu _{y'}$ orthogonal to the leaves. Furthermore, since $\zeta $ is minimizing, the existence of the optimal transport plan is guaranteed.
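The fact that optimal plans move mass orthogonally to the leaves can be observed in a discretized toy model, given here only as a sketch under our own assumptions. We take the foliation of the punctured plane by circles centred at the origin, and we replace the conditional measures by uniform measures on $n$ equally spaced points of each circle. The radii, the value of $n$ and the function names are hypothetical. A brute-force search over permutations then recovers the radial matching, for which every transported pair satisfies $d(x, x') = |r - r'|$, and $W_2$ equals the distance between the leaves.

```python
from itertools import permutations
from math import cos, sin, pi, dist, isclose, sqrt


def circle_atoms(radius, n):
    """n equally spaced points on the circle of given radius about the origin."""
    return [(radius * cos(2 * pi * k / n), radius * sin(2 * pi * k / n))
            for k in range(n)]


def optimal_matching(atoms_a, atoms_b):
    """Brute-force optimal permutation for the quadratic cost (uniform weights)."""
    n = len(atoms_a)
    return min(
        permutations(range(n)),
        key=lambda s: sum(dist(atoms_a[i], atoms_b[s[i]]) ** 2 for i in range(n)),
    )


r, r_prime, n = 1.0, 2.5, 6     # hypothetical radii and number of atoms
leaf, leaf_prime = circle_atoms(r, n), circle_atoms(r_prime, n)
sigma = optimal_matching(leaf, leaf_prime)
w2 = sqrt(sum(dist(leaf[i], leaf_prime[sigma[i]]) ** 2 for i in range(n)) / n)
pair_gaps = [dist(leaf[i], leaf_prime[sigma[i]]) for i in range(n)]
print(sigma, w2, abs(r - r_prime))
print(pair_gaps)   # every transported pair should be at distance |r - r'|
```

In this model any non-radial matching transports at least one pair over a distance strictly larger than $|r - r'|$, which mirrors the role of the set $\Upsilon$ in the argument above.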

Suppose without loss of generality that $\mu_y$ is absolutely continuous with respect to the volume measure of the leaf $\pi^{-1}(y)$. Then, the support of $\mu_y$ contains more than one point. Considering the transport problem described above, for each x in the support of $\mu_y$ such that $(x, x')$ is in the support of $\gamma_{y \to y'}$ for some $x'$, each one of the intermediate leaves $\pi^{-1}(y_t)$ must contain a corresponding point $x_{y_t}$. Thus, each leaf will have the distribution $\mu_{y_t}$ absolutely continuous with respect to the volume measure of the respective leaf.

Remark 5.11. Since $\zeta$ is a displacement interpolation, it is a constant speed geodesic, by [AG13, Theorem 2.10]. That is, the path in $(\mathscr{P}(X), W_{2})$ given by the disintegration map f evaluated at the minimal curve $\psi$ in Y is a constant speed geodesic in $\mathscr{P}_2(X)$.

Remark 5.12. One could, for instance, take the disintegration map in the domain of weak continuity given by Proposition 5.2.

Remark 5.13. Theorem B(iii) holds, for instance, in the case of the disintegration of the volume measure in the solid torus (Example 3.2), or in the context of Examples 3.3, 5.5 and 5.7.

Acknowledgements

This work was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001. R.P. has been partially supported by São Paulo Research Foundation (FAPESP)—grant #2018/05309-3 and grant #2019/14724-7. C.S.R. has been partially supported by São Paulo Research Foundation (FAPESP)—grant #2016/00332-1, grant #2018/13481-0 and grant #2020/04426-6. The opinions, hypotheses and conclusions or recommendations expressed in this work are the responsibility of the authors and do not necessarily reflect the views of FAPESP. C.S.R. would also like to acknowledge support from the Max Planck Society, Germany, through the award of a Max Planck Partner Group for Geometry and Probability in Dynamical Systems. The authors are thankful to Pedro Catuogno, Rostislav Matveev, Florentin Münch and Ali Tahzibi for helpful discussions, and to the anonymous referee for valuable suggestions to improve the manuscript.

References

Ambrosio, L. and Gigli, N.. A User’s Guide to Optimal Transport (Lecture Notes in Mathematics, 2062). Springer-Verlag, Berlin, 2013.10.1007/978-3-642-32160-3_1CrossRef Google Scholar

Ambrosio, L., Gigli, N. and Savaré, G.. Gradient Flows in Metric Spaces and in the Space of Probability Measures. Birkhäuser Verlag, Basel, 2005.Google Scholar

Ambrosio, L.. Lecture notes on optimal transport problems. Mathematical Aspects of Evolving Interfaces (Lecture Notes in Mathematics, 1812). Springer, Berlin, 2000.Google Scholar

Ambrosio, L. and Pratelli, A.. Existence and Stability Results in the L1 Theory of Optimal Transportation (Lecture Notes in Mathematics, 1813). Springer, Berlin, 2003.10.1007/978-3-540-44857-0_5CrossRef Google Scholar

Butterley, O. and Melbourne, I.. Disintegration of invariant measures for hyperbolic skew products. Israel J. Math. 219 (2017), 171–188.10.1007/s11856-017-1477-zCrossRef Google Scholar

Chang, J. T. and Pollard, D.. Conditioning as disintegration. Stat. Neerl. 51(3) (1997), 287–317.10.1111/1467-9574.00056CrossRef Google Scholar

Dellacherie, C.. Ensembles analytiques. Théorèmes de séparation et applications. Séminaire de Probabilités (Lecture Notes in Mathematics, 465). Ed. P. A. Meyer. Springer, Berlin, 1975.Google Scholar

Dellacherie, C. and Meyer, P. A.. Probabilities and Potential (North-Holland Mathematics Studies, 29). North-Holland Publishing Company, Amsterdam, 1978.Google Scholar

Federer, H.. Geometric Measure Theory (Grundlehren der Mathematischen Wissenschaften, 153). Springer, New York, 1969.Google Scholar

Galatolo, S.. Statistical properties of dynamics: introduction to the functional analytic approach. Preprint, 2017, arXiv:1510.02615v2.Google Scholar

Galaz-García, F., Kell, M., Mondino, A. and Sosa, G.. On quotients of spaces with Ricci curvature bounded below. J. Funct. Anal. 275 (2018), 1368–1446.10.1016/j.jfa.2018.06.002CrossRef Google Scholar

Galatolo, S. and Lucena, R.. Spectral gap and quantitative statistical stability for systems with contracting fibers and Lorenz-like maps. Discrete Contin. Dyn. Syst. 40 (2020), 1309–1360.10.3934/dcds.2020079CrossRef Google Scholar

Granieri, L. and Maddalena, F.. Transport problems and disintegration maps. ESAIM Control Optim. Calc. Var. 19(3) (2013), 888–905.10.1051/cocv/2012037CrossRef Google Scholar

Jost, J. Riemannian Geometry and Geometric Analysis. Springer-Verlag, Berlin, 2011. doi:10.1007/978-3-642-21298-7

Oliveira, K. and Viana, M. Fundamentos da Teoria Ergódica. Sociedade Brasileira de Matemática, Rio de Janeiro, 2014.

Parthasarathy, K. R. Probability Measures on Metric Spaces. Academic Press, New York, 1967. doi:10.1016/B978-1-4832-0022-4.50006-5

Rokhlin, V. A. On the Fundamental Ideas of Measure Theory (American Mathematical Society Translation, 71). American Mathematical Society, Providence, RI, 1952.

Simmons, D. Conditional measures and conditional expectation; Rohlin’s disintegration theorem. Discrete Contin. Dyn. Syst. 32(7) (2012), 2565–2582. doi:10.3934/dcds.2012.32.2565

Sturm, K. T. On the geometry of metric measure spaces. Acta Math. 196 (2006), 65–131. doi:10.1007/s11511-006-0002-8

Tjur, T. A Constructive Definition of Conditional Distributions (Institute of Mathematical Statistics, 13). University of Copenhagen, Copenhagen, 1975.

Varão, R. Center foliation: absolute continuity, disintegration and rigidity. Ergod. Th. & Dynam. Sys. 36 (2016), 256–275. doi:10.1017/etds.2014.53

Villani, C. Topics in Optimal Transportation (Graduate Studies in Mathematics, 58). American Mathematical Society, Providence, RI, 2003. doi:10.1090/gsm/058

Villani, C. Optimal Transport: Old and New (Grundlehren der Mathematischen Wissenschaften, 338). Springer-Verlag, Berlin, 2009. doi:10.1007/978-3-540-71050-9

Von Neumann, J. Zur Operatorenmethode in der klassischen Mechanik. Ann. of Math. (2) 33 (1932), 587–642. doi:10.2307/1968537

Figure 1 Representation of $\mathcal {F}^{s} = \{\{x\} \times D^{2} \}_{x \in S^{1}}$.

Figure 2 Transport plan 1.

Figure 3 Transport plan 2.

Figure 4 Transport plan 3.

Figure 5 Idea of a metric measure foliation as a space fibration.

