
Geometric properties of disintegration of measures

Published online by Cambridge University Press:  11 October 2024

RENATA POSSOBON
Affiliation:
Institute of Mathematics, Department of Applied Mathematics, Universidade Estadual de Campinas, 13.083-859 Campinas, SP, Brazil (e-mail: re.possobon@gmail.com)
CHRISTIAN S. RODRIGUES*
Affiliation:
Institute of Mathematics, Department of Applied Mathematics, Universidade Estadual de Campinas, 13.083-859 Campinas, SP, Brazil (e-mail: re.possobon@gmail.com)

Abstract

In this paper, we study a connection between disintegration of measures and geometric properties of probability spaces. We prove a disintegration theorem, addressing disintegration from the perspective of an optimal transport problem. We look at the disintegration of transport plans, which are used to define and study disintegration maps. Using these objects, we study the regularity and absolute continuity of disintegration of measures. In particular, we exhibit conditions for which the disintegration map is weakly continuous and one can obtain a path of measures given by this map. We show a rigidity condition for the disintegration of measures to be given into absolutely continuous measures.

Type
Original Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press

1. Introduction

The disintegration of a measure over a partition of the space on which it is defined is a way to rewrite this measure as a combination of probability measures, which are concentrated on the elements of the partition. As an example, consider the probability space $(X, \mathcal {F}, \mu )$ and its partition into a finite number of measurable subsets $P_{1}, \ldots , P_{n}$ with positive measure. A disintegration of $\mu $ with respect to (w.r.t.) this partition is a family of probabilities $\{\mu _1, \ldots , \mu _n \}$ on X such that for $i=1, \ldots , n$ , we have $\mu _{i} (P_i) = 1$ and, for every measurable set $E \subset X$ , the conditional measures are given by $\mu _i (E) = {\mu (E \cap P_i)}/ {\mu (P_i)}$ . It is possible to write the original measure as a combination of the conditional ones

$$ \begin{align*} \mu(E) = \sum_{i=1}^{n} \mu (P_{i}) \mu_{i}(E) = \sum_{i=1}^{n} \mu (P_{i}) \frac{\mu(E \cap P_i)}{\mu(P_i)}. \end{align*} $$
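The finite case above can be checked numerically. The following is a minimal sketch, assuming a four-point space with hypothetical point masses; the names `disintegrate` and `measure` are illustrative and do not come from the paper. Exact rational arithmetic is used so that the reconstruction identity holds without rounding.

```python
from fractions import Fraction

def disintegrate(mu, partition):
    """Return (weights, conditionals) for a finitely supported probability mu.

    mu: dict mapping points to point masses (hypothetical toy data).
    partition: list of disjoint sets covering the space, each of positive measure.
    """
    weights, conditionals = [], []
    for block in partition:
        mass = sum(mu[x] for x in block)            # mu(P_i)
        weights.append(mass)
        # mu_i(E) = mu(E intersected with P_i) / mu(P_i), stored via point masses
        conditionals.append({x: mu[x] / mass for x in block})
    return weights, conditionals

def measure(point_masses, event):
    """Measure of a set `event` under a measure given by its point masses."""
    return sum(p for x, p in point_masses.items() if x in event)

mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}
partition = [{"a", "b"}, {"c", "d"}]
weights, conditionals = disintegrate(mu, partition)

E = {"b", "c"}
# mu(E) rebuilt as the weighted combination of the conditional measures
reconstructed = sum(w * measure(m, E) for w, m in zip(weights, conditionals))
```

Here `reconstructed` coincides with `measure(mu, E)`, and each conditional measure gives full mass to its own block of the partition.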

More generally, consider a probability space $(X, \mathcal {F}, \mu )$ and a partition $\mathcal {P}$ of X into measurable subsets. Let $\varrho $ be the natural projection that associates each point $x \in X$ to the element $P \in \mathcal {P}$ which contains x. The measurable function $\varrho $ can be used to induce a probability $\hat {\mu }$ on $\mathcal {P}$ . A subset $B \subset \mathcal {P}$ is measurable if and only if $\varrho ^{-1}(B)$ is a measurable subset of X. Then, the family $\hat {\mathcal {B}}(\mathcal {P})$ of measurable subsets is a $\sigma $ -algebra on $\mathcal {P}$ . Let $\hat {\mu }$ denote the measure given by

$$ \begin{align*} \hat{\mu}(B) = \varrho_{*}\mu (B) := \mu \circ \varrho^{-1} (B) = \mu(\{x \in X : \varrho(x) \in B\}) \end{align*} $$

for every $B \in \hat {\mathcal {B}}(\mathcal {P})$ . In this case, $\hat {\mu }$ is called the law of $\varrho $ , denoted by $\mathrm {law}(\varrho )$ . A disintegration of $\mu $ w.r.t. $\mathcal {P}$ into conditional measures is a family $\{{\mu }_{P}:P\in \mathcal {P} \}$ of probability measures on X such that for every $E \in \mathcal {F}$ :

  1. (1) $\mu _{P}(P)=1$ for $\hat {\mu }$ -almost every (a.e.) $P \in \mathcal P$ ;

  2. (2) $P \mapsto \mu _{P}(E)$ is measurable;

  3. (3) $\mu (E)=\int \mu _{P}(E) ~d \hat {\mu }(P)$ .

There are several reasons why one may wish to study such possible combinations of measures. In ergodic theory, for example, the disintegration of a measure is directly related to the ergodic decomposition of invariant measures, which are crucial objects encoding the asymptotic behaviour of dynamical systems [OV14]. The concept of disintegration, however, appears in a much broader context, in areas such as probability [CP97, Par67] and geometry [Stu06], among others [BM17, GL20, Var16, Vil03].

The idea of disintegrating a measure was devised by von Neumann in 1932 [Von32]. Since then, different versions of disintegration theorems have been presented and used, for example, in [AGS05, AP03, DM78, Tue75], to name a few. In particular, the well-known Rokhlin disintegration theorem shows that there exists a disintegration of $\mu $ relative to $\mathcal {P}$ if X is a complete separable metric space and $\mathcal {P}$ is a measurable partition. By measurable partition, we mean that there exists some measurable set $X_{0} \subset X$ such that $\mu (X)=\mu (X_{0})$ and $\mathcal {P} = \bigvee _{n=1}^{\infty } \mathcal P_{n} = \{ P_1 \cap P_2 \cap \cdots : P_n \in \mathcal {P}_n ~\text {for all}~ n \geq 1 \}$ restricted to $X_{0}$ , for an increasing sequence $\mathcal P_{1} \prec \mathcal P_{2} \prec \cdots \prec \mathcal P_{n} \prec \cdots $ of countable partitions [Rok52].

Recently, Simmons has proposed a more general and subtle formulation of Rokhlin’s disintegration theorem, where he has considered any universally measurable space $(X, \mathcal {B}, \mu )$ and a measure space Y for which there exists an injective map $Y \to \{0, 1\}^{\mathbb N}$ . That is, Y is any subspace of the standard Borel space. He has shown that there is a (unique) system of conditional measures $(\mu _{y})_{y \in Y}$ , a disintegration of $\mu $ [Sim12]. Then, his formulation is further developed to address $\sigma $ -finite measure spaces with absolutely continuous morphisms. One of the facts standing out in Simmons’ formulation is a fibre-wise perspective, which we wish to further explore.

Even though geometric properties of the space where a measure is defined and statistical properties obtained via disintegration theorems seem to be strictly connected, for instance, in foliated manifolds, very little geometric information is taken into account while studying disintegration of measures. In particular, intrinsic geometric properties of probability spaces are very often neglected. The purpose of this paper is to advertise the viewpoint of tackling disintegration of measures taking into consideration intrinsic structures of probability spaces obtained from optimal transport theory. To this end, we will formulate disintegration of probability measures in terms of a transportation problem to explore a fibre-wise formulation of a disintegration and its consequences.

1.1. Main results

As a first result in this paper, we prove a disintegration theorem, Theorem A, and we introduce a fibre-wise perspective on disintegration. Using this disintegration theorem, we study conditions in which one can obtain a path of conditional measures in the space of probability measures. In particular, in Propositions 5.2, 5.4 and 5.8, we investigate the weak continuity of the disintegration map, which parametrizes the conditional measures. The last result in this paper is Theorem B. Its first part shows how to construct a path of conditional measures in the space of probabilities. Its second part gives us a sort of rigidity result for disintegration of measures. Namely, we show that if one of the measures in this path is absolutely continuous, then all measures in the associated path must also be absolutely continuous. The third part is a particular case of absolute continuity in which the disintegration map is an isometry.

The paper is organized as follows. Section 2 contains the main concepts from optimal transport theory to be used throughout the paper. In §3, a disintegration theorem, stated as Theorem A, is proved. In §4, Theorem A is used to define and study what we call disintegration maps. These crucial objects are used in §5. There, we prove a series of propositions about a disintegration map: in Proposition 5.2, we show that this map is nearly weakly continuous, under some assumptions on the reference measure $\nu $ . In Propositions 5.4 and 5.8, we study hypotheses about the disintegration for which this map is weakly continuous. Afterwards, we prove our main result in this section, Theorem B, about paths of measures given via disintegration maps and a rigidity condition establishing absolute continuity of measures in these paths.

2. Spaces of probability measures, Wasserstein spaces and optimal transport

In this section, we set up some notation to be used throughout the text. We also introduce some basic terminology from optimal transport theory, which is meant for readers not familiar with this area. Readers already familiar with the topic may wish to skip this section. Our main references for this section are [AGS05, Amb00, Vil09].

The cornerstone for the theory of optimal transportation is considered to be a logistics problem addressed by Gaspard Monge in 1781. The main idea is to transport masses from a given location to another one at minimal cost. To state it in a modern formulation, let $\mathscr {P}(X)$ be the set of all Borel probability measures on X. The problem amounts to the following.

Monge transport problem. Let X, Y be Radon spaces. Given measures $\mu \in \mathscr {P}(X)$ , $\nu \in \mathscr {P}(Y)$ and a fixed Borel cost function $c: X \times Y \to [0, \infty ]$ , minimize

$$ \begin{align*} T \mapsto \int_{X} c(x, T(x)) ~d\mu \end{align*} $$

among all maps T such that $T_{*}\mu = \nu $ .

The maps T fulfilling $T_{*}\mu = \nu $ are called transport maps. The Monge transport problem actually may be ill-posed and such a map does not need to exist. That is the case, for example, when one of the measures is a Dirac mass and the other one is not. A way around is given by a different formulation as proposed by Kantorovich.
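The obstruction mentioned above can be seen by brute force on finite spaces. The sketch below is only an illustration under our own assumptions: measures are finitely supported dictionaries of point masses, and the helper names `pushforward` and `transport_maps` are hypothetical. It enumerates every map between the supports and keeps those satisfying $T_{*}\mu = \nu $ .

```python
from fractions import Fraction
from itertools import product

def pushforward(mu, T):
    """Image measure T_* mu of a finitely supported measure mu under a map T (a dict)."""
    image = {}
    for x, mass in mu.items():
        image[T[x]] = image.get(T[x], 0) + mass
    return image

def transport_maps(mu, nu):
    """All maps T from supp(mu) to supp(nu) with T_* mu = nu, found by enumeration."""
    xs, ys = list(mu), list(nu)
    maps = []
    for values in product(ys, repeat=len(xs)):
        T = dict(zip(xs, values))
        if pushforward(mu, T) == nu:
            maps.append(T)
    return maps

dirac = {0: Fraction(1)}                                  # Dirac mass at 0
two_atoms = {0: Fraction(1, 2), 1: Fraction(1, 2)}        # not a Dirac mass

no_maps = transport_maps(dirac, two_atoms)    # a map cannot split the single atom
one_map = transport_maps(two_atoms, dirac)    # collapsing both atoms is allowed
```

In this toy setting, `no_maps` is empty while `one_map` contains the constant map, which reflects the asymmetry of the Monge formulation.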

Monge–Kantorovich transport problem. Let X, Y be Radon spaces. Given measures $\mu \in \mathscr {P}(X)$ , $\nu \in \mathscr {P}(Y)$ and a fixed Borel cost function $c: X \times Y \to [0, \infty ]$ , minimize

(1) $$ \begin{align} \gamma \mapsto \int_{X \times Y} c(x, y) ~d\gamma(x, y) \end{align} $$

among all measures $\gamma \in \mathscr {P}(X \times Y)$ with marginals $\mu $ and $\nu $ , i.e. $\gamma $ satisfying $(\mathrm {proj}_{X})_{*}\gamma =\mu $ and $(\mathrm {proj}_{Y})_{*}\gamma =\nu $ , where $\mathrm {proj}_{X}$ and $\mathrm {proj}_{Y}$ are the canonical projections $(x, y) \mapsto x$ and $(x, y) \mapsto y$ , respectively.

The measures $\gamma \in \mathscr {P}(X \times Y)$ with marginals $\mu $ and $\nu $ are called transport plans. We denote the set of all such transport plans by $\Pi (\mu , \nu )$ . The value $C(\mu , \nu ) = \inf _{\gamma \in \Pi (\mu , \nu )} \int _{X \times Y} c(x, y) ~d\gamma (x, y)$ is called the optimal cost. Note that $(\mathrm {proj}_{X})_{*}\gamma =\mu $ is equivalent to $\gamma [A \times Y]=\mu [A]$ for every $A \in \mathcal {B}(X)$ , and $(\mathrm {proj}_{Y})_{*}\gamma =\nu $ is equivalent to $\gamma [X \times B]=\nu [B]$ for every $B \in \mathcal {B}(Y)$ . Moreover, it is possible to describe this problem in terms of the coupling of measures, in the following sense.

Definition 2.1. Let $(X, \mu )$ and $(Y, \nu )$ be probability spaces. A coupling of $(\mu , \nu )$ is a pair $(\mathcal {X}, \mathcal {Y})$ of measurable functions on a probability space $(\Omega , \mathbb {P})$ such that $\mathrm {law}(\mathcal {X})=\mu $ and $\mathrm {law}(\mathcal {Y})=\nu $ .

Considering $\Omega = X \times Y$ , coupling $\mu $ and $\nu $ means to construct $\gamma \in \mathscr {P}(X \times Y)$ with marginals $\mu $ and $\nu $ . Then, the Monge–Kantorovich transport problem can be understood as the minimization of the total cost, over all possible couplings of $(\mu , \nu )$ .
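To make the notion of a transport plan concrete, here is a small sketch under our own assumptions: plans are dictionaries indexed by pairs $(x, y)$ , and the function names `marginals`, `total_cost` and `product_plan` are hypothetical. It builds the independent coupling of a Dirac mass with a two-atom measure, checks its marginals and evaluates the functional in equation (1) for the cost $c(x, y) = |x - y|$ . When $\mu $ is a Dirac mass, the independent coupling is the only element of $\Pi (\mu , \nu )$ , so the computed value is the optimal cost in this toy case.

```python
from fractions import Fraction

def marginals(gamma):
    """Return the two marginals of a plan gamma given as {(x, y): mass}."""
    mu, nu = {}, {}
    for (x, y), mass in gamma.items():
        mu[x] = mu.get(x, 0) + mass
        nu[y] = nu.get(y, 0) + mass
    return mu, nu

def total_cost(gamma, c):
    """Kantorovich functional: the integral of the cost c against the plan gamma."""
    return sum(mass * c(x, y) for (x, y), mass in gamma.items())

def product_plan(mu, nu):
    """The independent coupling, which always has marginals mu and nu."""
    return {(x, y): p * q for x, p in mu.items() for y, q in nu.items()}

mu = {0: Fraction(1)}                               # Dirac mass at 0
nu = {-1: Fraction(1, 2), 1: Fraction(1, 2)}        # two atoms of equal mass
gamma = product_plan(mu, nu)
cost = total_cost(gamma, lambda x, y: abs(x - y))
```

Although no transport map sends this $\mu $ to this $\nu $ , the plan `gamma` is admissible, which is the sense in which the Kantorovich formulation relaxes the Monge problem.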

Whenever these problems are stated in metric spaces, we may choose the cost function to be the distance function itself, which in turn allows us to introduce a distance function between measures.

Definition 2.2. (Wasserstein distance)

Let $(X, d)$ be a separable complete metric space. Consider probability measures $\mu $ and $\nu $ on X and $p \in [1, \infty )$ . The Wasserstein distance of order p between $\mu $ and $\nu $ is given by

$$ \begin{align*} W_{p}(\mu, \nu):= \bigg( \inf\limits_{\gamma \in \Pi(\mu, \nu)} \int d(x_{1}, x_{2})^{p} ~d\gamma (x_{1}, x_{2})\bigg)^{{1}/{p}}. \end{align*} $$

In general, $W_{p}$ is not a distance in the strict sense, because it can take the value $+\infty $ . To rule this situation out, it is natural to constrain $W_{p}$ to a subset in which it takes finite values.
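For intuition on Definition 2.2, the sketch below computes $W_{p}^{p}$ between two uniform measures on the real line with the same number of integer atoms. It rests on two standard facts that we assume here rather than prove: for uniform measures with equally many atoms, the infimum over plans is attained at a permutation, and on the real line the monotone (sorted) matching is optimal for $p \geq 1$ . The function names and the sample atoms are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def cost_of_matching(xs, ys, perm, p):
    """p-th power cost of the plan sending xs[i] to ys[perm[i]] with uniform weights.

    Atoms and p are assumed to be integers so that the arithmetic stays exact.
    """
    n = len(xs)
    return sum(Fraction(abs(xs[i] - ys[perm[i]]) ** p, n) for i in range(n))

def wasserstein_power_bruteforce(xs, ys, p):
    """Minimise the cost over all permutation plans (feasible only for small n)."""
    return min(cost_of_matching(xs, ys, perm, p)
               for perm in permutations(range(len(xs))))

def wasserstein_power_sorted(xs, ys, p):
    """Cost of the monotone matching obtained by sorting both sets of atoms."""
    xs, ys = sorted(xs), sorted(ys)
    return cost_of_matching(xs, ys, range(len(xs)), p)

xs = [0, 3, 1]
ys = [2, 5, 4]
w1 = wasserstein_power_sorted(xs, ys, 1)          # W_1 itself, since p = 1
w2_squared = wasserstein_power_sorted(xs, ys, 2)  # W_2 raised to the power 2
```

The brute-force search and the sorted matching agree on this example, which is a quick consistency check rather than a general-purpose solver.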

Definition 2.3. (Wasserstein space)

The Wasserstein space of order p is defined by

$$ \begin{align*} \mathscr{P}_{p}(X):= \bigg\{ \mu \in \mathscr{P}(X) : \int d(x, \tilde{x})^{p} \mu(dx) < +\infty \bigg\} \end{align*} $$

for $\tilde {x} \in X$ arbitrary.

Therefore, $W_{p}$ sets a (finite) distance on $\mathscr {P}_{p}(X)$ . It turns out that if $(X, d)$ is a complete separable metric space and d is bounded, then the p-Wasserstein distance metrizes the weak topology over $\mathscr {P}(X)$ [Vil09, Corollary 6.13]. Furthermore, if X is a complete metric space, then so is $\mathscr {P}(X)$ with the p-Wasserstein distance [Vil09, Theorem 6.18].

Although the Wasserstein distance is defined for every $p \geq 1$ , in this paper, we will choose either $p=1$ or $p=2$ , as stated later on. The reason is that for $W_{1}$ , we can explicitly compute distance bounds, while for $W_{2}$ , the space $\mathscr {P}_{2}(X)$ inherits geometric properties of the space X. The choice is always indicated throughout the text.

We finish this section recalling two well-known results for later use. The first one is regarding measurable and continuous functions, the Lusin theorem in a specific form. The other one gives us a ‘recipe’ to glue different couplings.

Theorem 2.4. [Fed69, Theorem 2.3.5]

Let M be a locally compact metric space and N a separable metric space. Consider $\mu $ a Borel measure on M, $A \subset M$ a measurable set with finite measure and $f : M \to N$ a measurable map. Then, for each $\delta> 0$ , there is a closed set $K \subset A$ , with $\mu (A \backslash K) < \delta $ , such that the restriction of f to K is continuous.

Lemma 2.5. (Gluing lemma) [Vil09, Ch. 1]

Let $(X_{i}, \mu _{i})$ , $i = 1, 2, 3$ , be complete separable metric probability spaces. If $(\mathcal {X}_1,\mathcal {X}_2)$ is a coupling of $(\mu _{1}, \mu _{2})$ and $(\mathcal {Y}_2, \mathcal {Y}_3)$ is a coupling of $(\mu _{2}, \mu _{3})$ , then one can construct a triple of random variables $(\mathcal {Z}_{1}, \mathcal {Z}_{2}, \mathcal {Z}_{3})$ such that $(\mathcal {Z}_{1}, \mathcal {Z}_{2})$ has the same law as $(\mathcal {X}_1,\mathcal {X}_2)$ , and $(\mathcal {Z}_{2}, \mathcal {Z}_{3})$ has the same law as $(\mathcal {Y}_2, \mathcal {Y}_3)$ . If $\mu _{12}$ stands for the law of $(\mathcal {X}_1,\mathcal {X}_2)$ on $X_1 \times X_2$ and $\mu _{23}$ stands for the law of $(\mathcal {Y}_2,\mathcal {Y}_3)$ on $X_2 \times X_3$ , then to construct the joint law $\mu _{123}$ of $(\mathcal {Z}_{1}, \mathcal {Z}_{2}, \mathcal {Z}_{3})$ , one just has to glue $\mu _{12}$ and $\mu _{23}$ along their common marginal $\mu _{2}$ .
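On finite spaces, one standard way to glue two plans along their common marginal is to make the first and third coordinates conditionally independent given the middle one, that is, to set $\mu _{123}(x_1, x_2, x_3) = \mu _{12}(x_1, x_2)\, \mu _{23}(x_2, x_3) / \mu _{2}(x_2)$ whenever $\mu _{2}(x_2)> 0$ . The sketch below implements this formula under our own assumptions: the data are hypothetical and the names `glue` and `marginal` are illustrative only.

```python
from fractions import Fraction

def glue(mu12, mu23):
    """Glue two plans along their common marginal on the middle space.

    mu12: {(x1, x2): mass} and mu23: {(x2, x3): mass}, assumed to share
    the same marginal mu2 on the middle coordinate.
    """
    mu2 = {}
    for (_, x2), mass in mu12.items():
        mu2[x2] = mu2.get(x2, 0) + mass
    glued = {}
    for (x1, x2), m12 in mu12.items():
        for (y2, x3), m23 in mu23.items():
            if x2 == y2 and mu2[x2] > 0:
                # x1 and x3 are made conditionally independent given x2
                glued[(x1, x2, x3)] = m12 * m23 / mu2[x2]
    return glued

def marginal(measure, coords):
    """Marginal of a measure on tuples onto the coordinates listed in coords."""
    out = {}
    for point, mass in measure.items():
        key = tuple(point[i] for i in coords)
        out[key] = out.get(key, 0) + mass
    return out

mu12 = {("a", "u"): Fraction(1, 2), ("b", "u"): Fraction(1, 4), ("b", "v"): Fraction(1, 4)}
mu23 = {("u", "s"): Fraction(1, 4), ("u", "t"): Fraction(1, 2), ("v", "t"): Fraction(1, 4)}
mu123 = glue(mu12, mu23)
```

Projecting `mu123` onto the first two and the last two coordinates recovers `mu12` and `mu23`, respectively. The conditional measures used in this formula are precisely a disintegration of each plan with respect to the middle coordinate.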

3. Disintegration of measures

To grasp some of the properties of the probability spaces while studying disintegration of measures, we would like to associate the latter with the optimal transport theory. Before doing so, we prove the following disintegration theorem. Our proof is based on the idea of choosing a dense subset of a vector space using separability. Then, we extend a tailor-made linear functional to the whole space, which is implicitly used in the proof of [DM78, III-70], although our conditions are different.

Theorem A. Let X and Y be locally compact and separable metric spaces. Let $\pi : X \to Y$ be a Borel map and take $\mu \in \mathcal {M}_{+}(X)$ , where $\mathcal {M}_{+}(X)$ is the set of all positive and finite Radon measures on X. Define $\nu = \pi _{*}\mu $ in $\mathcal {M}_{+}(Y)$ . Then, there exist measures $\mu _{y} \in \mathcal {M}_{+}(X)$ such that:

  1. (1) $y \mapsto \mu _{y}$ is a Borel map and $\mu _{y} \in \mathscr {P}(X)$ for $\nu $ -a.e. $y \in Y$ ;

  2. (2) $\mu = \nu \otimes \mu _{y}$ , that is, $\mu (A)= \int _{Y} \mu _{y} (A) ~d\nu (y)$ for every $A \in \mathcal {B}(X)$ ;

  3. (3) $\mu _{y}$ is concentrated on $\pi ^{-1}(y)$ for $\nu $ -a.e. $y \in Y$ .

Proof. We shall first consider the disintegration of measures on compact metric spaces. Then, we tackle the general case as it is stated.

Step 1: To get started, consider X to be a compact metric space with its Borel $\sigma $ -algebra $\mathcal {B}(X)$ and let $\mu $ be a Radon measure on X. If $(Y, \mathcal {E})$ is a measurable space, we take a measurable map $q: X \to Y$ and set $\nu =q_{*}\mu $ . Let $C(X)$ be the set of all continuous real functions $\omega :X \to \mathbb {R}$ . Then, for each $\omega \in C(X)$ , we associate a measure $\lambda $ given by

$$ \begin{align*} \lambda(A)= \int\limits_{q^{-1}(A)} \omega (x) ~d\mu (x) \end{align*} $$

for every $A \in \mathcal {E}$ . The measure $\lambda $ is absolutely continuous w.r.t. $\nu $ . Indeed, for every $A \in \mathcal {E}$ , we have that $\nu (A) = \mu (q^{-1}(A)) = 0$ implies $\lambda (A)=0$ . Therefore, since $\nu $ and $\lambda $ are positive measures and $\lambda \ll \nu $ , by the Radon–Nikodym theorem, there exists $h: Y \to [0, \infty ]$ , the density of $\lambda $ w.r.t. $\nu $ , such that $\lambda (A) = \int _{A} h ~d\nu $ for $A \in \mathcal {E}$ . Thus,

(2) $$ \begin{align} \int\limits_{A} h(y) ~d\nu(y) = \int\limits_{q^{-1}(A)} \omega(x) ~d\mu(x). \end{align} $$

Recall that $C(X)$ is a separable space with the supremum norm. Let $\mathcal {H} = \{ \omega _1 \equiv 1, \omega _2, \omega _3, \ldots \}$ be a dense subset of $C(X)$ . Suppose, without loss of generality, that $\mathcal {H}$ is a vector space over $\mathbb {Q}$ . Then, for each $n \in \mathbb {N}$ , we consider the Radon–Nikodym density $h_{n}$ associated with $\omega _n$ given by equation (2) so that for each $A \in \mathcal {E}$ ,

(3) $$ \begin{align} \int\limits_{A} h_{n}(y) ~d\nu(y) = \int\limits_{q^{-1}(A)} \omega_n(x) ~d\mu(x). \end{align} $$

Note that $h_{n} \geq 0$ almost always (a.a.), if $\omega _n \geq 0$ , since $\nu =q_{*}\mu $ .
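When X is finite and every point has positive mass, the density in equation (3) can be written down explicitly: $h(y)$ is the $\mu $ -weighted average of $\omega $ over the fibre $q^{-1}(y)$ . The following sketch illustrates this under our own assumptions, with a hypothetical six-point space and illustrative names `pushforward` and `density`.

```python
from fractions import Fraction

def pushforward(mu, q):
    """Image measure nu = q_* mu of a finitely supported measure mu."""
    nu = {}
    for x, mass in mu.items():
        nu[q(x)] = nu.get(q(x), 0) + mass
    return nu

def density(mu, q, omega):
    """h(y) = (1 / nu({y})) * sum of omega(x) mu({x}) over x in the fibre of y.

    All point masses of mu are assumed positive, so nu({y}) > 0 for every y listed.
    """
    nu = pushforward(mu, q)
    numerator = {}
    for x, mass in mu.items():
        numerator[q(x)] = numerator.get(q(x), 0) + omega(x) * mass
    return {y: numerator[y] / nu[y] for y in nu}

mu = {x: Fraction(1, 6) for x in range(6)}   # uniform measure on {0, ..., 5}
q = lambda x: x % 2                          # fibres: even points and odd points
nu = pushforward(mu, q)
h = density(mu, q, lambda x: x)              # omega(x) = x
h_one = density(mu, q, lambda x: 1)          # omega identically 1

# both sides of equation (3) for the set A = {0}
lhs = h[0] * nu[0]
rhs = sum(x * mu[x] for x in mu if q(x) == 0)
```

In this example, the density associated with the constant function 1 is identically 1, consistent with the normalization $h_{1} = 1$ used in the proof.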

Step 2: We will use the associated densities to construct a linear functional which will be extended using the Hahn–Banach theorem. To do so, we denote by $\mathbb {A}$ the set of all $y \in Y$ , such that, if $\omega _i= \alpha \omega _j + \beta \omega _k$ , then we have for their associated densities that the relation $h_{i}(y) = \alpha h_{j}(y) + \beta h_{k}(y)$ holds true, where $\alpha , \beta \in \mathbb {Q}$ and the associated density to $\omega _{1}$ is set to $h_{1}(y) = 1$ . The set $\mathbb {A}$ is measurable and $\nu (\mathbb {A})=1$ . Indeed,

$$ \begin{align*} \int_{q^{-1}(Y)} \omega_i(x) ~d\mu &= \int_{q^{-1}(Y)} (\alpha \omega_j + \beta \omega_k)(x)\,d\mu \\&= \alpha \bigg(\! \int_{q^{-1}(Y)}\! \omega_j(x) \,d\mu \!\bigg) + \beta \bigg(\! \int_{q^{-1}(Y)} \!\omega_k(x) ~d\mu \!\bigg),\! \text{ which by equation }(3),\\&= \alpha \bigg( \int_Y h_{j}(y) ~d\nu \bigg) + \beta \bigg( \int_Y h_{k}(y) ~d\nu \bigg) \\&= \int_Y \alpha h_{j}(y) + \beta h_{k}(y) ~d\nu, \end{align*} $$

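so $ \int h_{i}(y)\,d\nu = \int \alpha h_{j}(y) + \beta h_{k}(y)\,d\nu $ . The same computation applies with Y replaced by any $A \in \mathcal {E}$ . Consequently, $h_{i}(y) = \alpha h_{j}(y) + \beta h_{k}(y)$ $\nu $ -a.a. Since there are only countably many such relations among the elements of $\mathcal {H}$ , they hold simultaneously outside a $\nu $ -null set. Therefore, whenever $\omega _{n}$ is a linear combination of elements of $\mathcal {H}$ , then the associated densities defined by equation (3) can also be written point-wise as linear combinations of Radon–Nikodym densities.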

Step 3: For each $y \in \mathbb {A}$ , we define the functional $\tilde {\varphi _y}: \mathcal {H} \to \mathbb {R}$ , given by $\tilde {\varphi _y}(\omega _n):=h_{n}(y)$ . Note that $\tilde {\varphi _y}: \mathcal {H} \to \mathbb {R}$ is $\mathbb {Q}$ -linear with $\| \tilde {\varphi _y} \| \leq 1$ and, since $\tilde {\varphi _y} (1)=1$ , we have actually that $\| \tilde {\varphi _y} \| = 1$ . Therefore, by the Hahn–Banach theorem, $\tilde {\varphi _y}$ can be extended to a continuous positive linear functional $\varphi _y: C(X) \to \mathbb {R}$ , with $\| \varphi _y \| = 1$ . Furthermore, the Riesz–Markov–Kakutani representation theorem assures that there exists a unique Radon measure $\mu _y$ on X such that $\varphi _y(\omega )= \int \omega ~d \mu _y$ for every $\omega \in C(X)$ and $\varphi _y (1)=1$ . Note that $\int \omega ~d \mu _y \leq 1$ , since $\| \varphi _y \| = 1$ . Thus, we conclude that $\mu _y$ is a probability measure. Note also that $\mu _y$ is supported on $q^{-1} \{ y \in \mathbb {A}\}$ . For $y \notin \mathbb {A}$ , consider $\mu _y = 0$ .

Step 4: Observe that $y \mapsto \int _{X} \omega _n\,d\mu _y$ is $\mathcal {E}$ -measurable for every $\omega _n \in \mathcal {H}$ and

$$ \begin{align*} \int_{Y} \int_{q^{-1}(y)} \omega_n(x) \,d\mu_y ~d\nu = \int_{Y} \varphi_y(\omega_n) \,d\nu = \int_{Y} h_{n}(y)\,d\nu = \int_{X} \omega_n(x)\,d \mu. \end{align*} $$

By the definition of $\mathcal {H}$ , we have that for each $\omega \in C(X)$ , there exists a sequence $(\omega _{i})_{i}$ , with $\omega _i \to \omega $ uniformly. So, by uniform convergence, we have that $y \mapsto \int _{X} \omega ~d\mu _y$ is $\mathcal {E}$ -measurable for every $\omega \in C(X)$ . Furthermore,

$$ \begin{align*} \int_{Y} \int_{q^{-1}(y)} \omega(x)\,d\mu_y \,d\nu = \int_{X} \omega(x)\,d \mu. \end{align*} $$

The same holds true for any bounded and $\mathcal {B}(X)$ -measurable function $\omega $ . Indeed, denote by $\mathscr {C}$ the class of functions such that $y \mapsto \int _{X} \omega \,d\mu _y$ is $\mathcal {E}$ -measurable and $\int _{Y} \int _{q^{-1}(y)} \omega (x)\,d\mu _y \,d\nu = \int _{X} \omega (x)\,d \mu $ . Note that $C(X) \subset \mathscr {C}$ , from what was shown before. If $A \subset X$ is an open set, then $\mathbb {1}_{A} \in \mathscr {C}$ , since $\mathbb {1}_{A}$ is the pointwise limit of an increasing sequence of continuous functions. So, let $\mathcal {D}$ be the set of all characteristic functions in $\mathscr {C}$ . If $\mathbb {1}_{A_n} \in \mathcal {D}$ for $n \in \mathbb {N}$ , with $A_n \subset A_{n+1}$ , we have that $\mathbb {1}_{\bigcup _n A_n} \in \mathcal {D}$ . If $\mathbb {1}_{A}, \mathbb {1}_{B} \in \mathcal {D}$ with $A \subset B$ , we have that $\mathbb {1}_{B \setminus A} \in \mathcal {D}$ . Thus, the class of measurable sets whose characteristic functions are in $\mathcal {D}$ is $\mathcal {B}(X)$ . Therefore, the result follows by monotone convergence. This completes the proof of Theorem A for compact spaces.

Step 5: Let X be a locally compact and separable metric space. Then, the set of continuous real-valued functions with compact support on X, denoted by $C_c(X)$ , is a vector space. Such a vector space can be seen as the union of the spaces $C_c(K_{i})$ of continuous functions with support on compact sets $K_{i}$ . Since $\mu $ is a Radon measure on X, the map $\varphi : C_c(X) \to \mathbb {R}$ given by $\omega \mapsto \int _{X} \omega (x)\,d\mu $ is a continuous positive linear map. Note also that $\mu $ is supported on a set $\tilde {\mathcal {K}}$ , which is a countable union of compact subsets $K_{i} \subset X$ . So, we can embed X into a compact metric space $\mathcal {K}$ and identify $\mu $ with a measure on $\mathcal {K}$ with support on $\tilde {\mathcal {K}}$ , and construct the measures $\mu _y$ as above. Consider $\mu _{y}=0$ for y such that $\pi ^{-1}(y) \notin \tilde {\mathcal {K}}$ . Thus, there exist probability measures $\mu _{y}$ on X such that each $\mu _{y}$ is supported on $\pi ^{-1}(y)$ and, for every $\omega \in C_c(X)$ ,

(4) $$ \begin{align} \int_{Y} \int_{\pi^{-1}(y)} \omega(x)\,d\mu_y \,d\nu = \int_{X} \omega(x)\,d \mu. \end{align} $$

Since Y is a locally compact and separable metric space and $\pi : X \to Y$ is a Borel map, $y \mapsto \mu _y$ is a Borel map for $\nu $ -a.e. $y \in Y$ . Furthermore, note that equation (4) is equivalent to saying that $\mu (A)= \int _{Y} \mu _{y} (A)\,d\nu (y)$ for every $A \in \mathcal {B}(X)$ and $\mu _{y}$ is concentrated on $\pi ^{-1}(y)$ for $\nu $ -a.e. $y \in Y$ . This concludes the proof.

Many interesting examples arise when we consider Theorem A for the case of product spaces with the Borel map $\pi $ as the canonical projection on the first component, as follows.

Corollary 3.1. Let X and Y be locally compact and separable metric spaces. Let $\mathrm {proj}_{X}: X \times Y \to X $ be the canonical projection on the first component, take $\gamma \in \mathcal {M}_{+}(X \times Y)$ and set $\mu = {\mathrm {proj}_{X}}_{*}\gamma \in \mathcal {M}_{+}(X)$ . Then, there exist measures $\gamma _{x} \in \mathcal {M}_{+}(X \times Y)$ such that:

  1. (1) $x \mapsto \gamma _{x}$ is a Borel map and $\gamma _{x} \in \mathscr {P}(X \times Y)$ for $\mu $ -a.e. $x \in X$ ;

  2. (2) $\gamma = \mu \otimes \gamma _{x}$ , i.e. $\gamma (A)= \int _{X} \gamma _{x} (A) ~d\mu (x)$ for every $A \in \mathcal {B}(X \times Y)$ ;

  3. (3) $\gamma _{x}$ is concentrated on $\mathrm {proj}_{X}^{-1} (x)$ for $\mu $ -a.e. $x \in X$ .
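On a finite product space, the statement of Corollary 3.1 reduces to conditioning a joint distribution on its first coordinate. The sketch below is an illustration under our own assumptions: the plan `gamma` is hypothetical toy data, the helper names are not from the paper, and each conditional measure is stored directly as a measure on Y, which is harmless since it is concentrated on the fibre $\{x\} \times Y$ .

```python
from fractions import Fraction

def disintegrate_plan(gamma):
    """Split gamma on X x Y into its first marginal mu and conditionals gamma_x on Y."""
    mu = {}
    for (x, _), mass in gamma.items():
        mu[x] = mu.get(x, 0) + mass
    conditionals = {x: {} for x in mu}
    for (x, y), mass in gamma.items():
        if mu[x] > 0:
            conditionals[x][y] = mass / mu[x]
    return mu, conditionals

def reassemble(mu, conditionals, B):
    """Integrate x -> gamma_x({y : (x, y) in B}) against mu to recover gamma(B)."""
    return sum(
        mu[x] * sum(p for y, p in conditionals[x].items() if (x, y) in B)
        for x in mu
    )

gamma = {(0, "a"): Fraction(1, 6), (0, "b"): Fraction(1, 3), (1, "a"): Fraction(1, 2)}
mu, conditionals = disintegrate_plan(gamma)

B = {(0, "b"), (1, "a")}
gamma_of_B = sum(mass for point, mass in gamma.items() if point in B)
rebuilt = reassemble(mu, conditionals, B)
```

Each `conditionals[x]` is a probability on Y, and `rebuilt` agrees with `gamma_of_B`, which is condition (2) of the corollary in this finite setting.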

In fact, since $\gamma _{x}$ is concentrated on $\mathrm {proj}_{X}^{-1}(x)=\{x\} \times Y$ , we can consider each $\gamma _{x}$ as a measure on Y, writing $\gamma (B)=\int _{X} \gamma _{x}(\{y : (x, y) \in B \}) ~d\mu $ for every $B \in \mathcal {B}(X \times Y)$ , adding a different point of view to disintegration. The following example illustrates this case of disintegration of measures.

Example 3.2. Consider a solid torus $S^{1} \times D^{2}$ . Let $\mathcal {F}^{s} = \{\{x\} \times D^{2} \}_{x \in S^{1}}$ be a foliation of $S^{1} \times D^{2}$ , as represented in Figure 1. Given a measure $\gamma \in \mathscr {P}(S^{1} \times D^{2})$ , let $\mathrm {proj}_{S^1}: S^{1} \times D^{2} \to S^{1}$ be the canonical projection on the first component and set $\mu ={\mathrm {proj}_{S^1}}_{*}\gamma $ . Theorem A, with $\pi := \mathrm {proj}_{S^1}$ , gives us a disintegration $\{ \gamma _{x} : x \in S^{1} \}$ of $\gamma $ along the leaves. Since the measures $\gamma _{x}$ are concentrated on ${\mathrm {proj}}_{S^1}^{-1}(x)=\{x\} \times D^2$ for $\mu $ -a.e. $x \in S^{1}$ , we can consider each $\gamma _{x}$ as a measure on $D^{2}$ . That is, we can define a probability on $D^2$ for each $x \in S^1$ .

Figure 1 Representation of $\mathcal {F}^{s} = \{\{x\} \times D^{2} \}_{x \in S^{1}}$ .

The point of view from Corollary 3.1 is, somehow, a generalization of cases of disintegration of a probability measure along leaves in a foliated compact Riemannian manifold, as in Example 3.2. In this sense, we remark that different versions of disintegration theorems can be related by taking suitable hypotheses, as we do in the following example.

Example 3.3. Let $M_1$ and $M_2$ be compact Riemannian manifolds and set the product space $\Sigma = M_{1} \times M_{2}$ . Let $\mathcal {F}^{s}=\{ \{x\} \times M_{2} \}_{x \in M_{1}}$ be a foliation of $\Sigma $ . Given $\gamma \in \mathscr {P}(M_1 \times M_2)$ , let ${\mathrm {proj}}_{M_{1}} : \Sigma \to M_{1}$ be the canonical projection on $M_{1}$ and set $\mu ={{\mathrm {proj}}_{M_{1}}}_{*}\gamma $ . By Theorem A, there exists a family $\{ \gamma _{x} : x \in M_{1} \} \subset \mathscr {P}(M_{1} \times M_{2})$ such that:

  1. (1) $x \mapsto \gamma _{x}$ is a Borel map and $\gamma _{x} \in \mathscr {P}(M_{1} \times M_{2})$ for $\mu $ -a.e. $x \in M_{1}$ ;

  2. (2) $\gamma (A)= \int _{M_{1}} \gamma _{x} (A)\,d\mu (x)$ for every $A \in \mathcal {B}(M_{1} \times M_{2})$ ;

  3. (3) $\gamma _{x}$ is concentrated on $\mathrm {proj}_{M_{1}}^{-1} (x)$ for $\mu $ -a.e. $x \in M_{1}$ .

Furthermore, we can consider each $\gamma _{x}$ as a probability on $M_{2}$ , as stated above. Note that this result agrees with Rokhlin’s disintegration theorem. In fact, let $\varrho : \Sigma \to \mathcal {F}^{s}$ be a map that associates each point $(x, y) \in \Sigma $ to the element $\zeta $ of $\mathcal {F}^{s}$ that contains $(x, y)$ . Consider $\hat {\gamma }=\varrho _{*}\gamma $ . Note that $\mathcal {F}^{s}$ is a measurable partition of $\Sigma $ and $\Sigma $ is a complete separable metric space. So, Rokhlin’s disintegration theorem describes a disintegration of $\gamma $ relative to $\mathcal {F}^{s}$ by a family $\{ \gamma _{\zeta } : \zeta \in \mathcal {F}^{s} \}$ , such that, for every measurable set $E \subset \Sigma $ :

  1. (1) $\gamma _{\zeta }(\zeta )=1$ for $\hat {\gamma }$ -a.e. $\zeta \in \mathcal {F}^{s}$ ;

  2. (2) $\zeta \mapsto \gamma _{\zeta }(E)$ is measurable;

  3. (3) $\gamma (E)=\int \gamma _{\zeta }(E) \,d\hat {\gamma }(\zeta )$ .

Rewrite $\mathcal {F}^{s}$ by $\{ \zeta _{x} \}_{x \in M_{1}}$ , where $\zeta _{x}=\{ x \} \times M_{2}$ for each $x \in M_{1}$ , and consider $\gamma _{x}'={{\mathrm {proj}}_{M_1}}_{*}\gamma $ . For each $x \in M_{1}$ , let $\gamma _{\zeta _{x}}$ be the restriction of $\gamma _{\zeta }$ to $\zeta = \{ x \} \times M_{2}$ . Note that $\varrho ^{-1}(\zeta )= \{ x \} \times M_{2} = \zeta _{x} = {{\mathrm {proj}}_{M_1}}^{-1}(x)$ , so

$$ \begin{align*} \int \gamma_{\zeta}(E) d\hat{\gamma}(\zeta) &= \int \gamma_{\zeta}(E) \gamma(\varrho^{-1}\,(d\zeta)) \\ &= \int_{M_{1}} \gamma_{\zeta_{x}}(E \cap \zeta_{x}) \gamma({{\mathrm {proj}}_{M_1}}^{-1}\,(dx)) \\ &= \int_{M_{1}} \gamma_{\zeta_{x}}(E \cap \zeta_{x})\,d \gamma_{x}' \end{align*} $$

and then

$$ \begin{align*} \gamma(E)=\int_{M_{1}} \gamma_{\zeta_{x}}(E \cap \zeta_{x})\,d \gamma_{x}'. \end{align*} $$

Moreover, note that $\gamma _{\zeta _{x}}$ is supported in ${{\mathrm {proj}}}_{M_1}^{-1}(x) = \zeta _{x}$ . Hence, we have a disintegration $\{\gamma _{\zeta _{x}} : \zeta _{x} \in \mathcal {F}^{s} \}$ along the leaves of $\Sigma $ associated to $\gamma _{x}'$ . In addition, it is possible to identify ${{\mathrm {proj}}}_{M_1}^{-1}(x)$ with $M_{2}$ , and write this disintegration by $\{\gamma _{x} : x \in M_1 \} \subset \mathscr {P}(M_{2})$ , as desired.

Such an example illustrates one of the possible roles of disintegration of measures. In dynamics, for instance, this kind of disintegration appears in several contexts. To name a few, in [BM17], the regularity of this kind of disintegration is investigated considering invariant measures for hyperbolic skew products. Specifically, for this purpose, a function that associates each x in X with a probability measure $\gamma _x$ obtained via a disintegration of $\gamma $ is analysed. In the next section, we will see that this type of application can be thought of in a more general framework and it has important properties. We can also cite [Gal17, GL20], where the disintegration of Example 3.2 is used to study the behaviour of the transfer operator in a solenoidal map.

We can actually obtain uniqueness and absolute continuity of the disintegration in the context of Theorem A. If $\lambda \in \mathcal {M}_{+}(Y)$ is another measure, such that there exists a Borel map $y \mapsto \eta _{y}$ for which $\mu = \lambda \otimes \eta _{y}$ , with $\eta _{y}$ concentrated on $\pi ^{-1}(y)$ for $\lambda $ -a.e. $y \in Y$ , then $\lambda |_{C} \ll \nu $ , where $C=\{ y \in Y : \eta _{y}(X)>0 \}$ and $\eta _{y} \ll \mu _{y}$ for $\nu $ -a.e. $y \in Y$ . See the details in [AP03]. In the following proposition, we focus on the case of product spaces. Taking $\gamma \in \mathcal {M}_{+}(X \times Y)$ with $\mu ={\mathrm {proj}_{X}}_{*}\gamma $ , we obtain uniqueness of $\gamma _{x}$ and $\mu $ in $\gamma =\mu \otimes \gamma _{x}$ .

Proposition 3.4. Let $X \times Y$ , X be locally compact and separable metric spaces. Let $\mathrm {proj}_{X}: X \times Y \to X$ be the projection on the first component and consider $\gamma \in \mathcal {M}_{+}(X \times Y)$ , $\nu \in \mathcal {M}_{+}(X)$ and $\mu ={\mathrm {proj}_{X}}_{*}\gamma $ . Let $x \mapsto \eta _x$ be a Borel $\mathcal {M}_{+}$ -valued map defined on X such that:

  1. (1) $\gamma = \nu \otimes \eta _x$ ;

  2. (2) $\eta _x$ is concentrated on $\mathrm {proj}_{X}^{-1}(x)$ for $\nu $ -a.e. $x \in X$ .

Then, the $\eta _{x}$ are uniquely defined $\nu $ -a.a. by conditions (1) and (2). Moreover, for $C=\{ x \in X: \eta _x(X \times Y)>0 \}$ , $\nu |_C$ is absolutely continuous w.r.t. $\mu $ . In particular, $({\nu |_C}/{\mu }) \eta _x = \gamma _{x}$ for $\mu $ -a.e. $x \in X$ , where $\gamma _{x}$ are the conditional probabilities as in Corollary 3.1.

Proof. Let $\eta _{x}$ and $\eta _{x}'$ be measures satisfying conditions (1) and (2). Let $(A_{n})_{n}$ be a sequence of open sets, closed under finite intersections, which generates $\mathcal {B}(X \times Y)$ , the Borel $\sigma $ -algebra of $X \times Y$ . Consider $B \in \mathcal {B}(X)$ and $A=A_n \cap {\mathrm {proj}}_{X}^{-1}(B)$ for any $n \in \mathbb {N}$ . By conditions (1) and (2), we have that

$$ \begin{align*} \gamma(A)=\int_{X} \eta_{x}(A)\,d\nu = \int_{B} \eta_{x}(A_n)\,d \nu \end{align*} $$

and

$$ \begin{align*} \gamma(A)=\int_{X} \eta_{x}' (A)\,d\nu = \int_{B} \eta_{x}' (A_n)\,d \nu. \end{align*} $$

Therefore,

$$ \begin{align*} \int_{B} \eta_{x}(A_n)\,d \nu = \int_{B} \eta_{x}' (A_n)\,d \nu. \end{align*} $$

Given that B is arbitrary, then $\eta _{x}(A_n) = \eta _{x}'(A_n)$ for $\nu $ -a.e. $x \in X$ and for any $n \in \mathbb {N}$ . So, there exists a set $N \subset X$ , with $\nu (N)=0$ such that $\eta _{x}(A_n) = \eta _{x}'(A_n)$ for any $n \in \mathbb {N}$ , $x \in (X-N)$ . Hence, for $\nu $ -a.e. $x \in X$ , $\eta _{x} = \eta _{x}'$ .

Let us denote $C=\{ x \in X : \eta _x(X \times Y)>0 \}$ . Let $\mathcal {G} \subset C$ be such that $\mu (\mathcal {G})=0$ . So, ${\mathrm {proj}}_{X}^{-1}(\mathcal {G})$ is such that $\gamma ({\mathrm {proj}}_{X}^{-1}(\mathcal {G}))=0$ . Therefore, condition (2) implies

$$ \begin{align*} 0=\int_{X} \eta_{x}({\mathrm {proj}}_{X}^{-1}(\mathcal{G}))\,d\nu = \int_{\mathcal{G}} \eta_{x} (X \times Y)\,d\nu. \end{align*} $$

Since $\eta _{x} (X \times Y)>0$ in $C \supset \mathcal {G}$ , we have that $\nu (\mathcal {G})=0$ . That is, $\nu |_{C} \ll \mu $ . Moreover, writing $\nu |_{C}=f \mu $ implies that $\gamma = f \mu \otimes \eta _{x}$ . However, by Corollary 3.1, $\gamma = \mu \otimes \gamma _{x}$ and then $({\nu |_{C}}/{\mu }) \eta _{x} = \gamma _{x}$ .
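
The rescaling in Proposition 3.4 can be checked by hand on finite sets. The following is a minimal sketch, assuming hypothetical two-point spaces, arbitrary weights and illustrative helper names (`glue`, `first_marginal`), none of which come from the paper. Each $\eta _x$ is identified with a measure on Y, and exact rational arithmetic is used so that equalities can be tested directly.

```python
from fractions import Fraction as F

# Hypothetical finite spaces X = {"a", "b"} and Y = {0, 1} (illustration only).
nu = {"a": F(2), "b": F(1, 2)}            # a measure on X, not a probability
eta = {                                   # eta_x, identified with a measure on Y
    "a": {0: F(1, 8), 1: F(1, 8)},
    "b": {0: F(1), 1: F(1, 2)},
}

def glue(base, kernel):
    """Return gamma = base (x) kernel, i.e. gamma({(x, y)}) = base(x) * kernel_x(y)."""
    return {(x, y): base[x] * w for x in base for y, w in kernel[x].items()}

def first_marginal(gamma):
    """Push gamma forward by the projection onto the first coordinate."""
    out = {}
    for (x, _), w in gamma.items():
        out[x] = out.get(x, F(0)) + w
    return out

gamma = glue(nu, eta)
mu = first_marginal(gamma)

# Conditional probabilities gamma_x, so that gamma = mu (x) gamma_x.
gamma_x = {x: {y: gamma[(x, y)] / mu[x] for y in eta[x]} for x in mu}

# Density of nu w.r.t. mu (here C = X, since every eta_x has positive mass).
density = {x: nu[x] / mu[x] for x in mu}
rescaled = {x: {y: density[x] * w for y, w in eta[x].items()} for x in mu}

print(rescaled == gamma_x)
```

In this toy case, $\nu $ and $\eta _x$ are far from normalized, yet multiplying $\eta _x$ by the density of $\nu $ w.r.t. $\mu $ recovers the conditional probabilities, as the proposition states.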

4. Disintegration maps

From the optimal transport perspective, Theorem A in fact deals with the disintegration of transport plans. In this sense, it is possible to define a function from Y to $\mathscr {P}(X)$ with certain properties which actually establishes the link between disintegration of measures and the geometric properties of the measure spaces. We will call such an object the disintegration map.

Definition 4.1. (Disintegration map)

Let X and Y be locally compact and separable metric spaces. Consider a measure $\mu \in \mathcal {M}_{+}(X)$ , a Borel map $\pi : X \to Y$ and a disintegration of $\mu $ given by Theorem A, so that $\mu = \nu \otimes \mu _y$ . We define the disintegration map:

$$ \begin{align*} f: Y &\to (\mathscr{P}(X), W_{p}) \\ y & \mapsto \mu_{y}, \end{align*} $$

such that $\mu = \nu \otimes f(y)$ , where $W_{p}$ is the p-Wasserstein distance.

Remark 4.2. To clarify which measures are associated with the disintegration map, we will say that ‘f is a disintegration map of $\mu $ w.r.t. $\nu $ ’.

Although we may define such a map for a general $W_{p}$ , our main interest will be when either $p=1$ or $p=2$ . On the one hand, for $p=1$ , there is an explicit formula for the Wasserstein distance which we can apply to study the disintegration of measures in product spaces. On the other hand, when $p=2$ , the theory of optimal transport allows for a geometric characterization of $\mathscr {P}_{2}(X)$ in terms of the geometric properties of X. More precisely, the study of geodesics defined in $\mathscr {P}_{2}(X)$ and the convexity properties of certain functionals along these geodesics play a crucial role in the metric theory of gradient flows, which allows us to infer geometric properties of X itself. We shall address the case when $p=2$ in the next section. Before that, we use the disintegration maps to further characterize the disintegration of measures in product spaces. In this section, we consider the following definition of a disintegration map.

Definition 4.3. (Disintegration map—product spaces)

Let X and Y be locally compact and separable metric spaces. Consider $\gamma \in \mathscr {P}(X \times Y)$ and a disintegration of $\gamma $ given by Corollary 3.1, so that $\gamma = \mu \otimes \gamma _{x}$ . In this case, the disintegration map reads as

$$ \begin{align*} f: X &\to (\mathscr{P}(Y), W_{1}) \\ x & \mapsto \gamma_{x} \end{align*} $$

such that $\gamma = \mu \otimes f(x)$ .

Along the lines of [AP03, GM13], we start by showing the following.

Proposition 4.4. A map $f: X \to (\mathscr {P}(Y), W_{1})$ is a disintegration map if and only if it is Borel.

Proof. Denote by $\text {Lip}_{1}(Y)$ the set of Lipschitz functions whose Lipschitz constants are less than or equal to $1$ . By the Arzelà–Ascoli theorem, the space $\text {Lip}_{1}(Y)$ is compact with respect to the uniform convergence [AGS05, Proposition 3.3.1]. Let $D \subset \text {Lip}_{1}(Y)$ be a countable dense subset and take $\varphi \in \text {Lip}_{1}(Y)$ . If the measures $\nu _{1}$ , $\nu _{2} \in \mathscr {P}(Y)$ have bounded support, we can use the duality formula to obtain

$$ \begin{align*} W_{1}(\nu_{1}, \nu_{2}) = \sup\limits_{\varphi \in \text{Lip}_{1}(Y)} \bigg\{\! \int_{Y} \varphi \,d(\nu_{1} -\nu_{2}) \bigg\} = \sup\limits_{\varphi \in D} \int_{Y} \varphi \,d(\nu_{1}-\nu_{2}); \end{align*} $$

see [AGS05, §7.1] for details. For all $\varphi \in \text {Lip}_{1}(Y)$ ,

$$ \begin{align*} \psi_{\varphi}: X &\to \mathbb{R} \\ x &\mapsto \int_{Y} \varphi \,d(\nu-f(x)) \end{align*} $$

is Borel. For f to be a Borel map, it suffices that

$$ \begin{align*} f^{-1}(B(\nu, r))= \bigcap\limits_{\varphi \in D} \psi_{\varphi}^{-1}((-r, r)):=A, \end{align*} $$

where $B(\nu , r)$ is an open ball of radius r centred at $\nu $ in $(\mathscr {P}(Y), W_{1})$ . In fact, if $x \in A$ , then $|\psi _{\varphi }(x)|<r$ for all $\varphi \in D$ and $W_{1}(\nu , f{(x)})<r$ (by the definition of $\psi _{\varphi }$ ), so that $f{(x)} \in B(\nu , r)$ . In the same way, if $f{(x)} \in B(\nu , r)$ , then $W_{1}(\nu , f{(x)})<r$ and, by the duality formula,

$$ \begin{align*} |\psi_{\varphi}(x)|:=\Big|\int_{Y} \varphi\,d(\nu-f{(x)})\Big|<r \end{align*} $$

for every $\varphi \in D$ , so that $x \in A$ . Thus, one way is proven. Conversely, let $A \subset Y$ be an open subset. Note that

Let $I_{\varphi }$ be a function given by

(5) $$ \begin{align} I_{\varphi}: (\mathscr{P}(Y), W_{1}) & \to \mathbb{R} \nonumber \\ \unicode{x3bb} & \mapsto \int_{Y} \varphi(y) \,d\unicode{x3bb}, \end{align} $$

where $\varphi $ is a lower semicontinuous function over Y. Since $W_{1}$ metrizes the weak* topology of $\mathscr {P}(Y)$ [AGS05, Ch. 7], the function $I_{\varphi }$ is lower semicontinuous. For every $x \in X$ , one obtains $\int _{Y} \varphi (y) \,df{(x)}=I_{\varphi }(f(x))$ . By assumption, f is Borel and then $f(\cdot )(A):X \to \mathbb {R}$ is a composition of a lower semicontinuous function and a Borel map, so it is a Borel map. Therefore, f is a disintegration map, which concludes the proof.

The disintegration map in fact can be written in terms of the Monge problem. Given a transport map $T:X \to Y$ and the measures $\mu \in \mathscr {P}(X)$ and $\nu = T_{*}\mu \in \mathscr {P}(Y)$ , the disintegration map is given by $x \mapsto \delta _{T(x)}$ , where $\delta _{T(x)}$ is a Dirac measure at $T(x)$ . It is possible to show that there exists a relationship among measures in $(\mathscr {P}(Y), W_{1})$ , via push forward of $\mu $ by disintegration maps, and the second marginal of transport plans induced by distinct transport maps.

Lemma 4.5. Let $T:X \to Y$ and $S: X \to Y$ be transport maps. Consider a measure $\mu \in \mathscr {P}(X)$ , and the applications f and g given by

$$ \begin{align*} f: X &\to (\mathscr{P}(Y), W_{1})\\[-2pt]x &\mapsto \delta_{T(x)}, \end{align*} $$
$$ \begin{align*} g: X &\to (\mathscr{P}(Y), W_{1}) \\[-2pt] x &\mapsto \delta_{S(x)}. \end{align*} $$

Then, $T_{*}\mu =S_{*}\mu $ if and only if $f_{*}\mu =g_{*}\mu $ .

Proof. Suppose first that $T_{*}\mu =S_{*}\mu $ . Define $\varphi (y)=\psi (\delta _{y})$ , where $\psi \in C((\mathscr {P}(Y), W_{1}))$ is arbitrarily chosen. Then,

$$ \begin{align*} \int_{\mathscr{P}(Y)} \psi \,d(f_{*}\mu) &= \int_{X} \psi(f(x)) \,d\mu \\ &= \int_{X} \psi (\delta_{T(x)})\,d\mu \\ &= \int_{X} \varphi(T(x))\,d\mu \\ &= \int_{X} \varphi(S(x))\,d\mu \\ &= \int_{\mathscr{P}(Y)} \psi\,d(g_{*}\mu). \end{align*} $$

Given the arbitrary choice of $\psi $ , it follows that $f_{*}\mu =g_{*}\mu $ . Conversely, consider $\varphi \in C(Y)$ and the application $I_{\varphi }$ defined by equation (5). Note that

$$ \begin{align*} \int_{\mathscr{P}(Y)} I_{\varphi}(\unicode{x3bb}) \,d(f_{*}\mu)=\int_{\mathscr{P}(Y)} I_{\varphi}(\unicode{x3bb})\,d(g_{*}\mu) \end{align*} $$

if and only if

$$ \begin{align*} \int_{X} I_{\varphi}(f{(x)})\,d\mu = \int_{X} I_{\varphi}(g{(x)}) \,d\mu, \end{align*} $$

which in turn occurs if and only if

$$ \begin{align*} \int_{X} \bigg( \int_{Y} \varphi (y)\,df{(x)} \bigg)\,d\mu = \int_{X} \bigg( \int_{Y} \varphi (y) \,dg{(x)} \bigg) \,d\mu. \end{align*} $$

Since

$$ \begin{align*} \int_{Y} \varphi (y)\,df{(x)}=\varphi (T(x)) \end{align*} $$

and

$$ \begin{align*} \int_{Y} \varphi (y)\,dg{(x)}=\varphi (S(x)), \end{align*} $$

the last equation can be written as

$$ \begin{align*} \int_{X} \varphi (T(x))\,d\mu = \int_{X} \varphi (S(x))\,d\mu. \end{align*} $$

Due to the arbitrary choice of $\varphi $ , it follows that $T_{*}\mu =S_{*}\mu $ .
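
For finitely supported measures, the equivalence in Lemma 4.5 can be verified directly. The sketch below is only an illustration: the three-point space, the weights, the maps `T`, `S`, `R` and the encoding of a Dirac measure as a tuple are all assumptions made here, not objects from the paper.

```python
from fractions import Fraction as F

def pushforward(mu, phi):
    """Return phi_* mu for a finitely supported mu given as {point: weight}."""
    out = {}
    for x, w in mu.items():
        out[phi(x)] = out.get(phi(x), F(0)) + w
    return out

def dirac(y):
    """Encode the Dirac measure at y as a hashable tuple of (point, weight) pairs."""
    return ((y, F(1)),)

# Hypothetical measure on X = {1, 2, 3} and hypothetical maps X -> Y = {"p", "q"}.
mu = {1: F(1, 4), 2: F(1, 4), 3: F(1, 2)}
T = {1: "p", 2: "q", 3: "q"}.get
S = {1: "q", 2: "p", 3: "q"}.get
R = {1: "p", 2: "p", 3: "q"}.get

# Disintegration maps x -> delta_{T(x)}, x -> delta_{S(x)}, x -> delta_{R(x)}.
f = lambda x: dirac(T(x))
g = lambda x: dirac(S(x))
h = lambda x: dirac(R(x))

print(pushforward(mu, T) == pushforward(mu, S), pushforward(mu, f) == pushforward(mu, g))
print(pushforward(mu, T) == pushforward(mu, R), pushforward(mu, f) == pushforward(mu, h))
```

Here T and S differ pointwise but share the same image measure, and the same holds for f and g. The map R changes the image measure on Y, and the measure on $\mathscr {P}(Y)$ induced by h changes with it.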

Corollary 4.6. Let $T:X \to Y$ and $S: X \to Y$ be transport maps. Consider a measure $\mu \in \mathscr {P}(X)$ , the disintegration maps f and g defined as in Lemma 4.5, and the measures $\gamma =\mu \otimes f(x)$ and $\eta =\mu \otimes g(x)$ . Then, $f_{*}\mu =g_{*}\mu $ if and only if ${\mathrm {proj}_{Y}}_{*}\gamma = {\mathrm {proj}_{Y}}_{*} \eta $ .

Given $\gamma , \eta \in \Pi (\mu , \nu )$ , as defined in §2, with $\gamma = \mu \otimes f{(x)}$ and $\eta =\mu \otimes g{(x)}$ , we say that $\gamma $ is equivalent by disintegration to $\eta $ (and we write $\gamma \approx \eta $ ) if $f_{*}\mu =g_{*}\mu $ . With this equivalence in mind, it is possible to define an equivalence class among the transport plans as follows.

Definition 4.7. Given $\gamma \in \Pi (\mu , \nu )$ with $\gamma =\mu \otimes f(x)$ , the transport class of $\gamma $ is defined as the equivalence class $[\gamma ]=\{ \eta =\mu \otimes g(x):g_{*}\mu =f_{*}\mu \}$ .

Thus, we have that all transport plans induced by transport maps belong to the same transport class. Moreover, in the next proposition, we prove that it is possible to use equivalence by disintegration to assure the existence of a transport map.

Proposition 4.8. Consider a transport map $T:X \to Y$ such that $T_{*}\mu =\nu $ and $\gamma =\mu \otimes \delta _{T(x)}$ for a non-atomic measure $\mu \in \mathscr {P}(X)$ . If $\eta \in [\gamma ]$ , then there exists a transport map $S: X \to Y$ such that $\eta = \mu \otimes \delta _{S(x)}$ .

Proof. By the approximation theorem [Amb00, Theorem 9.3] and by the definition of equivalence by disintegration, there exists a sequence of Borel functions $S_{n}: X \to Y$ such that $\eta = \lim \limits _{n \to \infty } \mu \otimes \delta _{S_{n}(x)}$ and $(S_{n})_{*}\mu =\nu $ for every $n \in \mathbb {N}$ . So $\mu \otimes \delta _{S_{n}(x)} \in [\gamma ]$ for every $n \in \mathbb {N}$ . Consider $\varphi (y)=|y|^2$ and observe that

$$ \begin{align*} \int_{X} |S_{n}(x)|^2\,d\mu = \int_{X} \varphi (S_{n}(x))\,d\mu = \int_{Y} \varphi(y)\,d\nu = \int_{Y} |y|^2\,d\nu < \infty. \end{align*} $$

Let $\psi \in C((\mathscr {P}(Y), W_{1}))$ be a function given by $\psi (\delta _{y})=|y|^2$ and let $\Delta \subset \mathscr {P}(Y)$ be the set of Dirac measures. Observe that $\psi $ is Lipschitz over $\Delta $ w.r.t. $W_1$ . So, take $\psi $ any Lipschitz extension over $\mathscr {P}(Y)$ . Since $(\delta _{S_{n}})_{*}\mu =(\delta _{T})_{*}\mu $ , we have that

$$ \begin{align*} \int_{X}|S_{n}(x)|^2\,d\mu = \int_{X} \psi(\delta_{S_{n}(x)})\,d\mu = \int_{X} \psi (\delta_{T(x)})\,d\mu = \int_{X} |T(x)|^{2} \,d\mu \end{align*} $$

for every $n \in \mathbb {N}$ . Therefore, passing to a subsequence, we can assume that $S_{n}$ is weakly convergent to S. By [Amb00, Lemma 9.1], $\mu \otimes \delta _{S_{n}(x)}$ converges weakly to $\mu \otimes \delta _{S(x)}$ . This concludes the proof.

In this context, the Monge problem can be interpreted as: minimize $\int _{X \times Y} c(x, y) \,d \gamma $ in a fixed transport class of $\Pi (\mu , \nu )$ , that is, obtain

$$ \begin{align*} \min \bigg\{\! \int_{X \times Y} c(x, y)\,d\gamma : \gamma \in [\mu \otimes \delta_{T}] \bigg\} \end{align*} $$

for a given transport map T. Regarding the Monge–Kantorovich problem, note that the second part of the proof of Lemma 4.5 applies to general transport plans, so that the second marginal can be fixed by the disintegration maps, as follows.

Lemma 4.9. Consider $\mu \in \mathscr {P}(X)$ , the disintegration maps $f: X \to \mathscr {P}(Y)$ and $g: X \to \mathscr {P}(Y)$ , and the transport plans $\gamma = \mu \otimes f(x)$ and $\eta =\mu \otimes g(x)$ . Then, $f_{*}\mu =g_{*}\mu $ implies ${\mathrm {proj}_{Y}}_{*}\gamma = {\mathrm {proj}_{Y}}_{*} \eta $ .

The converse of Lemma 4.9, however, is not true. Consider, for instance, the following transport problem. Three factories, $x_{1}$ , $x_{2}$ and $x_{3}$ , with the same production, supply 100% of their products to two stores, $y_{1}$ and $y_{2}$ . Suppose store $y_{2}$ has a demand five times that of store $y_{1}$ . Let $\mu $ be the measure related to the production of the factories and $\nu $ be the measure related to the amount of delivered products to the stores. These measures are given by

$$ \begin{align*} \mu=\tfrac{1}{3} \delta_{x_{1}}+\tfrac{1}{3} \delta_{x_{2}}+\tfrac{1}{3} \delta_{x_{3}}, \end{align*} $$
$$ \begin{align*} \nu=\tfrac{1}{6}\delta_{y_{1}}+\tfrac{5}{6}\delta_{y_{2}}. \end{align*} $$

Consider the transport plan illustrated in Figure 2, which divides the products which are produced in the factory $x_{1}$ between the two stores. The corresponding disintegration map f is given by $f(x_1)=\tfrac 12 \delta _{y_{1}} + \tfrac 12 \delta _{y_{2}}$ , $f(x_2)=\delta _{y_{2}}$ and $f(x_3)=\delta _{y_{2}}$ .

Figure 2 Transport plan 1.

Note that since $\mu $ is of the type $\sum \limits _{i} \alpha _{i} \delta _{x_{i}}$ , we have that $f_{*}\mu = \sum \limits _{i} \alpha _{i} \delta _{f(x_{i})}$ . Suppose, however, that there is a change in logistics and in the new transport plan, Figure 3, the factory $x_{1}$ delivers its production only to the store $y_{2}$ and the factory $x_{2}$ divides its production between the two stores. In this new situation, the disintegration map is given by $g(x_1)= \delta _{y_{2}}$ , $g(x_2)=\tfrac 12 \delta _{y_{1}} + \tfrac 12 \delta _{y_{2}}$ and $g(x_3)=\delta _{y_{2}}$ . Nevertheless, on the one hand, $f_{*}\mu =g_{*}\mu $ , i.e., the transport class does not change. On the other hand, if in this new transport plan the factories $x_{1}$ and $x_{2}$ deliver to both stores, Figure 4, in such a way that $x_{1}$ sends 30% of its production to $y_{1}$ and 70% to $y_{2}$ , and $x_{2}$ sends 20% of its production to $y_{1}$ and 80% to $y_{2}$ , the disintegration map is given by $h(x_1) = {3}/{10} \delta _{y_{1}} + {7}/{10} \delta _{y_{2}}$ , $h(x_2) = {2}/{10} \delta _{y_{1}} + {8}/{10} \delta _{y_{2}}$ and $h(x_3) = \delta _{y_{2}}$ . In this case, $f_{*}\mu \neq h_{*}\mu $ . So, the transport plans $1$ and $3$ do not belong to the same transport class.

Figure 3 Transport plan 2.

Figure 4 Transport plan 3.

Furthermore, if there is a small change in transport plan 3, so that $x_{1}$ sends 10% of its production to $y_{1}$ and 90% to $y_{2}$ , and $x_ {2}$ sends 40% of its production to $y_{1}$ and 60% to $y_{2}$ , the new disintegration map is given by $k(x_1) = {1}/{10} \delta _{y_{1}} + {9}/{10} \delta _{y_{2}}$ , $k(x_2) = {4}/{10} \delta _{y_{1}} + {6}/{10} \delta _{y_{2}}$ and $k(x_3) = \delta _{y_{2}}$ . Therefore, $h_{*}\mu \neq k_{*}\mu $ . Thus, with the previous definition for transport class, the transport class changes when either the number of factories that deliver products to more than one store is changed or even if the fraction of delivered production is changed.
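
The four transport plans of this example can be compared mechanically. The following sketch uses the weights given in the text. The encoding of a measure $a \delta _{y_1} + b \delta _{y_2}$ as the pair `(a, b)` and the helper names `push` and `second_marginal` are assumptions made for illustration.

```python
from fractions import Fraction as F

mu = {"x1": F(1, 3), "x2": F(1, 3), "x3": F(1, 3)}

def meas(w1, w2):
    """Encode w1*delta_{y1} + w2*delta_{y2} as a hashable pair of exact weights."""
    return (F(w1), F(w2))

only_y2 = meas(0, 1)
half = meas(F(1, 2), F(1, 2))
f = {"x1": half, "x2": only_y2, "x3": only_y2}                                     # plan 1
g = {"x1": only_y2, "x2": half, "x3": only_y2}                                     # plan 2
h = {"x1": meas(F(3, 10), F(7, 10)), "x2": meas(F(2, 10), F(8, 10)), "x3": only_y2}  # plan 3
k = {"x1": meas(F(1, 10), F(9, 10)), "x2": meas(F(4, 10), F(6, 10)), "x3": only_y2}  # modified plan 3

def push(mu, dmap):
    """f_* mu: a finitely supported measure on P(Y), stored as {measure on Y: weight}."""
    out = {}
    for x, w in mu.items():
        out[dmap[x]] = out.get(dmap[x], F(0)) + w
    return out

def second_marginal(mu, dmap):
    """(proj_Y)_* (mu (x) f) = sum over x of mu(x) * f(x), as weights on (y1, y2)."""
    return tuple(sum(mu[x] * dmap[x][i] for x in mu) for i in range(2))

for name, dmap in [("f", f), ("g", g), ("h", h), ("k", k)]:
    print(name, second_marginal(mu, dmap), push(mu, dmap) == push(mu, f))
```

All four maps produce the second marginal $\nu $ , while only f and g produce the same measure on $\mathscr {P}(Y)$ . This is exactly the failure of the converse of Lemma 4.9 described above.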

Therefore, to be compatible with the Monge–Kantorovich problem, we need another definition of transport class. Take $\mu \otimes f \in \Pi (\mu , \nu )$ and $\Lambda =f_{*}\mu $ , and recall that $(\mathrm {proj}_{Y})_{*}(\mu \otimes f)=\nu $ , then for every $\varphi \in C(Y)$ ,

$$ \begin{align*} \int_{Y}\varphi(y)\,d\nu &=\int_{X} \bigg( \int_{Y} \varphi(y)\,df{(x)} \bigg)\,d\mu \\ &= \int_{X} I_{\varphi}(f{(x)})\,d\mu \\ &= \int_{\mathscr{P}(Y)}I_{\varphi}(\unicode{x3bb})\,d\Lambda(\unicode{x3bb}) \\ &= \int_{\mathscr{P}(Y)} \bigg( \int_{Y} \varphi (y)\,d\unicode{x3bb} \bigg)\,d\Lambda, \end{align*} $$

where $I_{\varphi }$ is given by equation (5) and $\unicode{x3bb} \in \mathscr {P}(Y)$ . Hence, every probability $\Lambda $ in $(\mathscr {P}(Y), W_{1})$ satisfying

(6) $$ \begin{align} \int_{\mathscr{P}(Y)} \unicode{x3bb} \,d\Lambda=\nu \end{align} $$

defines a transport class $[\gamma ]=\{\mu \otimes f : f_{*}\mu =\Lambda \}$ . As an example, the transport class $[\mu \times \nu ]$ corresponds to the measure $\Lambda =\delta _{\nu }$ . From this point of view, the Monge–Kantorovich problem in the class $\Lambda $ can be thought of as

$$ \begin{align*} MK_{\Lambda}(c, \mu, \nu)=\inf\limits_{\gamma} \bigg\{\! \int_{X \times Y} c(x, y)\,d\gamma : \gamma = \mu \otimes f, f_{*}\mu=\Lambda \bigg\}. \end{align*} $$

This notion of transport class also allows us to see the Monge–Kantorovich problem as an abstract Monge problem between the spaces X and $\mathscr {P}(Y)$ , considering the cost

$$ \begin{align*} \tilde{c}(x, \unicode{x3bb})=\int_{Y}c(x,y)\,d\unicode{x3bb}. \end{align*} $$

In fact, for every disintegration map $f: X \to \mathscr {P}(Y)$ , such that $f_{*}\mu =\Lambda $ ,

$$ \begin{align*} \int_{X} \tilde{c}(x, f{(x)})\,d\mu &= \int_{X} \bigg( \int_{Y}c(x,y)\,df{(x)} \bigg)\,d \mu \\ &= \int_{X \times Y} c(x, y)\,d(\mu \otimes f) \end{align*} $$

and the problem $MK_{\Lambda }(c, \mu , \nu )$ is equivalent to the Monge problem with parameters $(\tilde {c}, \mu , \Lambda )$ . Thus, the disintegration of transport plans introduces another perspective for the optimization problems proposed by Monge and Kantorovich.
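
The identity between the lifted cost and the cost of the induced plan can be illustrated on the factory example above. In this sketch the cost table `c` is a hypothetical choice made only for the illustration, and the function names are likewise assumptions. The maps f and g are transport plans 1 and 2, which lie in the same transport class.

```python
from fractions import Fraction as F

X, Y = ["x1", "x2", "x3"], ["y1", "y2"]
mu = {x: F(1, 3) for x in X}

# Hypothetical cost c(x, y), chosen only for illustration.
c = {("x1", "y1"): 1, ("x1", "y2"): 4,
     ("x2", "y1"): 2, ("x2", "y2"): 1,
     ("x3", "y1"): 5, ("x3", "y2"): 2}

# Disintegration maps of transport plans 1 and 2 (same class: f_* mu = g_* mu).
f = {"x1": {"y1": F(1, 2), "y2": F(1, 2)}, "x2": {"y2": F(1)}, "x3": {"y2": F(1)}}
g = {"x1": {"y2": F(1)}, "x2": {"y1": F(1, 2), "y2": F(1, 2)}, "x3": {"y2": F(1)}}

def lifted_cost(x, lam):
    """c~(x, lam): integral of c(x, .) against the measure lam on Y."""
    return sum(c[(x, y)] * w for y, w in lam.items())

def monge_cost(dmap):
    """Integral of c~(x, f(x)) d mu, with f seen as a map from X to P(Y)."""
    return sum(mu[x] * lifted_cost(x, dmap[x]) for x in X)

def plan_cost(dmap):
    """Integral of c d(mu (x) f), the cost of the transport plan induced by f."""
    gamma = {(x, y): mu[x] * w for x in X for y, w in dmap[x].items()}
    return sum(c[xy] * w for xy, w in gamma.items())

print(monge_cost(f), plan_cost(f))
print(monge_cost(g), plan_cost(g))
```

With this particular cost, the two plans have different total costs although they belong to the same class, so minimizing within a fixed class $\Lambda $ is a genuine optimization problem.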

5. Absolute continuity from disintegration maps

To connect what we just developed to the geometric properties of the measure spaces and the space of measures, we shall study how paths of measures are given by disintegration maps. Our interest hereafter is to use the $2$ -Wasserstein distance.

To construct such paths in $2$ -Wasserstein space, it is essential to analyse the weak continuity of the disintegration map. In the Wasserstein space, one can study a characterization of convergence. We say that $\{ \mu _k \}$ converges weakly to $\mu $ when

$$ \begin{align*} \int \varphi \,d\mu_k \longrightarrow \int \varphi \,d\mu \end{align*} $$

for any bounded continuous function $\varphi $ . Moreover, Wasserstein distances metrize weak convergence, that is, if $\{ \mu _k \}$ is a sequence in $\mathscr {P}_p (X)$ and $\mu \in \mathscr {P}_p(X)$ , then the weak convergence of $\mu _k$ to $\mu $ is equivalent to $W_p (\mu _k, \mu ) \longrightarrow 0$ . When referring to continuity of the disintegration map, we use weak continuity meaning continuity with respect to weak convergence on $\mathscr {P}_p (X)$ .

A map f on a metric space $(Y, \nu )$ is called nearly continuous if, for each $\varepsilon> 0$ , there exists a closed set $\mathcal {K} \subset Y$ with $\nu (Y \backslash \mathcal {K}) < \varepsilon $ such that f restricted to $\mathcal {K}$ is continuous. Let f be a disintegration map of $\mu $ w.r.t. $\nu $ . If $\nu $ is absolutely continuous with respect to the volume measure of Y, we can apply Lusin’s Theorem 2.4 to show that f is nearly weakly continuous. We also need the following lemma, which is actually a known fact within the optimal transport community. For completeness, we add it here in our context.

Lemma 5.1. Let $(X, d)$ be a separable metric space. The $2$ -Wasserstein space $(\mathscr {P}(X), W_{2})$ is a separable metric space.

Proof. Let $\mathcal {D} \subset X$ be a countable and dense subset. Consider the space $\mathcal {M}$ defined by $\mathcal {M} := \{ \nu \in \mathscr {P}(X) : \nu = \sum _j a_j \delta _{x_j}, ~\text {with}~ a_j \in \mathbb {Q} ~\text {and}~ x_j \in \mathcal {D} \}.$ We want to show that $\mathcal {M}$ is dense in $(\mathscr {P}(X), W_2)$ .

Given $\mu \in (\mathscr {P}(X), W_{2})$ , then for any $\varepsilon> 0$ and $x_0 \in \mathcal {D}$ , there exists a compact set $K \subset X$ such that

$$ \begin{align*} \int_{X \backslash K}\,d(x_0, x)^{2} \,d\mu \leq \varepsilon^{2}. \end{align*} $$

Since K is compact, we may cover it with a family $\{ B(x_k, {\varepsilon }/{2}):1 \leq k \leq N, x_k \in \mathcal {D} \}$ and define

$$ \begin{align*} B_{k}':= B(x_k, \varepsilon) \backslash \bigcup_{j<k} B(x_j, \varepsilon) \end{align*} $$

so that $\{ B_{k}' \}$ are disjoint and still cover K. Define $\varphi $ on X by

$$ \begin{align*} \begin{cases} \varphi(B_{k}' \cap K) = \{ x_k \}, \\ \varphi(X-K)=\{ x_0 \}. \end{cases} \end{align*} $$

So $d(x, \varphi (x)) \leq \varepsilon $ for every $x \in K$ . Therefore,

$$ \begin{align*} \int_{X}\,d(x, \varphi(x))^{2} \,d\mu \leq \varepsilon^2 \int_{K}\,d\mu + \int_{X \backslash K} d(x, x_0)^2 \,d\mu \leq 2 \varepsilon^2 \end{align*} $$

and then $W_2 (\mu , \varphi _{*}\mu )^{2} \leq 2 \varepsilon ^2$ . Note that $\varphi _{*}\mu $ can be written as $\sum _{0 \leq j \leq N} \alpha _j \delta _{x_j}$ , that is, $\mu $ might be approximated by a finite combination of Dirac masses. Moreover, the coefficients $\alpha _j$ might be replaced by rational coefficients (up to a small error in Wasserstein distance): since Wasserstein distances are controlled by weighted total variation [Vil09, Theorem 6.15],

$$ \begin{align*} W_{2} \bigg( \sum_{0 \leq j \leq N} \alpha_j \delta_{x_j}, \sum_{0 \leq j \leq N} \beta_j \delta_{x_j} \bigg) \leq 2^{{1}/{2}} \bigg[ \max_{k, l}\,d(x_k, x_l) \bigg] \sum_{0 \leq j \leq N} |\alpha_j - \beta_j|^{{1}/{2}}, \end{align*} $$

which can become small, taking suitable coefficients $\beta _j \in \mathbb {Q}$ . Thus, it follows that $\mathcal {M}$ is dense in $\mathscr {P}(X)$ . Consequently, $(\mathscr {P}(X), W_2)$ is separable.
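
The first step of the proof, replacing $\mu $ by $\varphi _{*}\mu $ , can be illustrated on the unit interval. The sketch below assumes a hypothetical finitely supported $\mu $ on $K=[0,1]$ (so that the complement of K carries no mass), the value `eps = 1/10` and a grid of rational centres. It only checks that the coupling $(\mathrm {id}, \varphi )_{*}\mu $ has cost at most $\varepsilon ^2$ , which bounds $W_2(\mu , \varphi _{*}\mu )^2$ from above.

```python
from fractions import Fraction as F

eps = F(1, 10)

# Hypothetical mu: equal masses on 98 points of K = [0, 1].
points = [F(i, 97) for i in range(98)]
mu = {x: F(1, len(points)) for x in points}

# Centres x_k taken from the rationals; the balls B(x_k, eps/2) cover [0, 1].
centres = [k * eps / 2 for k in range(21)]

def phi(x):
    """Send x to the centre of the first ball B(x_k, eps) containing it, i.e. x in B'_k."""
    return next(c for c in centres if abs(x - c) < eps)

# Cost of the coupling (id, phi)_* mu, an upper bound for W_2(mu, phi_* mu)^2.
cost = sum(w * (x - phi(x)) ** 2 for x, w in mu.items())

# phi_* mu is a finite combination of Dirac masses located at the centres.
quantized = {}
for x, w in mu.items():
    quantized[phi(x)] = quantized.get(phi(x), F(0)) + w

print(cost <= eps ** 2, len(quantized))
```

The weights of `quantized` are already rational here because $\mu $ was chosen with rational masses. In general, the second step of the proof replaces them by rational approximations.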

Proposition 5.2. Let $X, ~Y$ be locally compact, complete and separable metric spaces. Consider $\pi : X \to Y$ a Borel map, $\mu \in \mathcal {M}_{+}(X)$ , $\operatorname {\mathrm {vol}}_Y$ the volume measure on Y and $\nu := \pi _{*}\mu $ . If $\nu \ll \operatorname {\mathrm {vol}}_Y$ , then the disintegration map of $\mu $ w.r.t. $\nu $ is nearly weakly continuous.

Proof. Consider the $2$ -Wasserstein distance $W_{2}$ on $\mathscr {P}(X)$ , with d a complete bounded metric for X. Here, $W_{2}$ metrizes the weak convergence of $\mathscr {P}(X)$ [Vil09, Theorem 6.9]. Furthermore, by Lemma 5.1, $(\mathscr {P}(X), W_{2})$ is a separable space. Moreover, from Theorem A, the map $y \mapsto \mu _{y} \in \mathscr {P}(X)$ is measurable and $\nu $ is a Borel measure on Y, since $\nu \ll \operatorname {\mathrm {vol}}_Y$ . Then, by Lusin’s theorem, for each $\varepsilon> 0$ , there exists a closed set $\mathcal {K} \subset Y$ with $\nu (Y\backslash \mathcal {K}) < \varepsilon $ such that the disintegration map restricted to $\mathcal {K}$ is weakly continuous.

Although Proposition 5.2 is relevant by itself, we would like to have conditions for which the disintegration map is weakly continuous at every point, so that, given any two points $y, y'$ in Y, we can construct a path of conditional measures connecting $\mu _{y}$ and $\mu _{y'}$ . Note that the 2-Wasserstein space is actually connected, and therefore we can always find a path connecting two measures. Nevertheless, we require this path to be specifically given by the disintegration map. This is not trivial indeed and we can easily construct examples in which this map is not weakly continuous.

Example 5.3. Consider $X = Y = [0,1]$ . Let $\mu $ be the Lebesgue measure on X and take the map $\pi : X \to Y$ given by

$$ \begin{align*} \pi(x) = \begin{cases} 2x & \text{if } x< \frac{1}{2}, \\ 1 & \text{if } x \geq \frac{1}{2}. \end{cases} \end{align*} $$

Note that $\pi $ is Borel measurable, since it is continuous. Define

$$ \begin{align*} \nu := \pi_{*} \mu = \tfrac{1}{2} \unicode{x3bb} + \tfrac{1}{2} \delta_1, \end{align*} $$

where $\unicode{x3bb} $ is the Lebesgue measure on Y and $\delta _1$ is the Dirac measure at $y=1$ . A disintegration $\{ \mu _y \}_{y \in Y}$ of $\mu $ with respect to $\pi $ is given by

$$ \begin{align*} \mu_y = \begin{cases} \delta_{\pi^{-1}(y)} & \text{if } y<1, \\ 2\unicode{x3bb}|_{[{1}/{2}, 1]} & \text{if } y=1. \end{cases} \end{align*} $$

In this case, the disintegration map is not weakly continuous at $y=1$ . In fact, consider a sequence $(y_n)_n$ in Y such that $y_n \longrightarrow y=1$ . Note that

$$ \begin{align*} W_2^2(\mu_{y_n}, \mu_y) =& \inf_{\gamma \in \Pi(\mu_{y_n}, \mu_y)} \int_{X \times X} \,d(x_1, x_2)^2 \,d \gamma \\ =& \int_{X \times X}\,d(x_1, x_2)^2 \,d (\delta_{\pi^{-1}(y_n)} \times \mu_y) \\ =& \int_{X}\,d(\pi^{-1}(y_n), x_2)^2 \,d\mu_{y}. \end{align*} $$

Then, at the limit $y_n \longrightarrow 1$ , we have

$$ \begin{align*} W_2^2(\mu_{y_n}, \mu_y) \longrightarrow \int_{X} d \bigg(\frac{1}{2}, x_2 \bigg)^2 \,d\mu_{y} \neq 0. \end{align*} $$
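
The value of this limit can be computed explicitly. Since $\pi ^{-1}(y)=y/2$ for $y<1$ and the only coupling with a Dirac mass is the product measure, $W_2^2(\mu _{y}, \mu _1) = 2\int _{1/2}^{1} (x - y/2)^2 \,dx$ . The sketch below evaluates this integral in closed form. The function name `w2_squared` and the particular sequence $y_n = 1 - 10^{-n}$ are assumptions made for illustration.

```python
from fractions import Fraction as F

def w2_squared(y):
    """W_2^2(mu_y, mu_1) for y < 1, with mu_y = delta_{y/2} and mu_1 = 2*Lebesgue on [1/2, 1].

    The integral of 2*(x - a)^2 over [1/2, 1], with a = y/2, equals
    (2/3) * ((1 - a)^3 - (1/2 - a)^3).
    """
    a = F(y) / 2
    return F(2, 3) * ((1 - a) ** 3 - (F(1, 2) - a) ** 3)

# Squared distances along the hypothetical sequence y_n = 1 - 10^(-n).
values = [w2_squared(1 - F(1, 10 ** n)) for n in range(1, 6)]

# Formal value of the same expression at y = 1, i.e. the integral of d(1/2, x)^2 d mu_1.
limit = w2_squared(1)

print([float(v) for v in values], float(limit))
```

Along this sequence the squared distances decrease towards $1/12$ and never approach $0$ .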

Therefore, f is not weakly continuous at