
Unified signature cumulants and generalized Magnus expansions

Published online by Cambridge University Press:  09 June 2022

Peter K. Friz
Affiliation:
Institut für Mathematik, Technische Universität Berlin, Str. des 17. Juni 136, Berlin 10586, Germany, and Weierstraß-Institut für Angewandte Analysis und Stochastik, Mohrenstr. 39, Berlin 10117, Germany; E-mail: friz@math.tu-berlin.de
Paul P. Hager
Affiliation:
Institut für Mathematik, Humboldt Universität zu Berlin, Unter den Linden 6, Berlin 10099, Germany; E-mail: paul.hager@hu-berlin.de
Nikolas Tapia
Affiliation:
Institut für Mathematik, Technische Universität Berlin, Str. des 17. Juni 136, Berlin 10586, Germany, and Weierstraß-Institut für Angewandte Analysis und Stochastik, Mohrenstr. 39, Berlin 10117, Germany; E-mail: tapia@wias-berlin.de

Abstract

The signature of a path can be described as its full non-commutative exponential. Following T. Lyons, we regard its expectation, the expected signature, as a path space analogue of the classical moment generating function. The logarithm thereof, taken in the tensor algebra, defines the signature cumulant. We establish a universal functional relation in a general semimartingale context. Our work exhibits the importance of Magnus expansions in the algorithmic problem of computing expected signature cumulants and further offers a far-reaching generalization of recent results on characteristic exponents dubbed diamond and cumulant expansions with motivations ranging from financial mathematics to statistical physics. From an affine semimartingale perspective, the functional relation may be interpreted as a type of generalized Riccati equation.

Type
Computational Mathematics
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press

1 Introduction and main results

Write $\mathcal {T} := T\mathopen {(\mkern -3mu(}\mathbb {R}^d\mathclose {)\mkern -3mu)} = \prod _{k \ge 0} (\mathbb {R}^d)^{\otimes k}$ for the tensor series over $\mathbb {R}^d$ , equipped with the concatenation product, elements of which are written indifferently as

$$ \begin{align*}\mathbf{x} = (\mathbf{x}^{(0)},\mathbf{x}^{(1)},\mathbf{x}^{(2)},\dotsc) \equiv \mathbf{x}^{(0)}+ \mathbf{x}^{(1)}+ \mathbf{x}^{(2)}+\dotsb. \end{align*} $$

The affine subspace $\mathcal {T}_0$ (respectively, $\mathcal {T}_1$ ) with scalar component $\mathbf {x}^{(0)} = 0$ (respectively, $ = 1$ ) has a natural Lie algebra (respectively, formal Lie group) structure, with the commutator Lie bracket $[\mathbf {y}, \mathbf {x}] = \mathbf {y}\mathbf {x} - \mathbf {x}\mathbf {y}$ for $\mathbf {x}, \mathbf {y} \in \mathcal {T}_0$ , where $\mathbf {x}\mathbf {y}$ stands for the concatenation product.Footnote 1

Let further $\mathscr {S} = \mathscr {S} (\mathbb {R}^d)$ , respectively, $\mathscr {S}^c= \mathscr {S}^c(\mathbb {R}^d)$ , denote the class of càdlàg,Footnote 2 respectively, continuous, d-dimensional semimartingales on some filtered probability space $(\Omega , (\mathcal {F}_t)_{t \ge 0}, \mathbb {P})$ . We recall that this is essentially the largest class of stochastic processes that allows for a reasonable stochastic integration theory. Classic texts include [Reference Revuz and Yor59, Reference Le Gall43] (continuous semimartingales) and [Reference Jacod and Shiryaev36, Reference Protter57] (càdlàg semimartingales); a concise introduction for readers with no background in stochastic analysis is [Reference Aït-Sahalia and Jacod2], Chapter 1.Footnote 3 (Readers with no background in probability may also, on a first reading, focus on deterministic semimartingales; these are precisely càdlàg paths of finite variation on compacts.) Following Lyons [Reference Lyons45], for a continuous semimartingale $X\in \mathscr {S}^c$ , the signature, given as the formal sum of iterated Stratonovich integrals,

$$\begin{align*}\mathrm{Sig} (X)_{s, t} = 1 + X_{s, t} + \int_s^t X_{s, u}\,{\circ\mathrm d}X_u + \int_s^t \left( \int_s^{u_1} X_{s, u_2}\,{\circ \mathrm d}X_{u_2} \right)\,{\circ\mathrm d} X_{u_1} + \cdots \end{align*}$$

for $0 \le s \le t$ defines a random element in $\mathcal {T}_1$ and, as a process, a formal $\mathcal {T}_1$ -valued semimartingale. By regarding the d-dimensional semimartingale X as a $\mathcal {T}_0$ -valued semimartingale ($X \leftrightarrow \mathbf {X} = (0,X,0,\dots )$ ), we see that the signature of $X\in \mathscr {S}^c$ satisfies the Stratonovich stochastic differential equation

(1.1) $$ \begin{align} \mathrm d S = S\,{\circ\mathrm d} \mathbf{X}. \end{align} $$

In the general case of $\mathbf {X}\in \mathscr {S}(\mathcal {T}_0)$ with possibly nontrivial higher-order semimartingale components $\mathbf {X} = (0, \mathbf {X}^{(1)}, \mathbf {X}^{(2)}, \dots )$ , the solution to equation (1.1) is also known as the Lie group valued stochastic exponential (or development), with classical references [Reference McKean51, Reference Hakim-Dowek and Lépingle31]; the càdlàg case [Reference Estrade20] is consistent with the geometric or Marcus [Reference Marcus49, Reference Marcus50, Reference Kurtz, Pardoux and Protter41, Reference Applebaum4, Reference Friz and Shekhar25] interpretation of equation (1.1)Footnote 4 with jump behavior $S_t = S_{t-} e^{\Delta \mathbf {X}_t}$ . From a stochastic differential geometry point of view, one aims for an intrinsic understanding of equation (1.1) valid for arbitrary Lie groups. For instance, if $\mathbf {X}$ takes values in any Lie subalgebra $\mathcal {L} \subset \mathcal {T}_0$ , then S takes values in the group $\mathcal {G} = \exp \mathcal {L}$ . In the case of a d-dimensional semimartingale X, the minimal choice is the free Lie algebra $\mathrm {Lie}\mathopen {(\mkern -3mu(}\mathbb {R}^d\mathclose {)\mkern -3mu)}$ generated by $\mathbb {R}^d$ (see, for example, [Reference Reutenauer58]), and the resulting Lie algebra structure of iterated integrals (both in the smooth and in the Stratonovich semimartingale case) is well known. The extrinsic linear ambient space $\mathcal {T} \supset \exp {\mathcal {L}}$ will be important to us. Indeed, writing $S_t=\mathrm {Sig}(\mathbf {X})_{0,t}$ for the (unique, global) $\mathcal {T}_1$ -valued solution of equation (1.1) driven by $\mathcal {T}_0$ -valued $\mathbf {X}$ , started at $S_0 = 1$ , we define, whenever $\mathrm {Sig} (\mathbf {X})_{0, T}$ is (componentwise) integrable, the expected signature and signature cumulants (SigCum)

$$\begin{align*}\boldsymbol{\mu} (T) := \mathbb{E} (\mathrm{Sig} (\mathbf{X})_{0, T})\in\mathcal{T}_1, \quad \boldsymbol{\kappa} (T) := \log \boldsymbol{\mu} (T) \in \mathcal{T}_0. \end{align*}$$

Already when $\mathbf {X}$ is deterministic, and sufficiently regular to make equation (1.1) meaningful, this leads to an interesting (ordinary differential) equation for $\boldsymbol {\kappa }$ with an accompanying (Magnus) expansion, well understood as an effective computational tool [Reference Iserles, Munthe-Kaas, Nørsett and Zanna34, Reference Blanes, Casas, Oteo and Ros6]. The importance of the stochastic case $\mathbf {X} = \mathbf {X}(\omega )$ , with expectation and logarithm thereof, was developed by Lyons and coauthors; see [Reference Lyons45] and references therein, with a variety of applications ranging from machine learning to numerical algorithms on Wiener space known as cubature [Reference Lyons and Victoir47]; signature cumulants were named and first studied in their own right in [Reference Bonnier and Oberhauser7]. The joint appearance of cumulants and Magnus-type expansions is also seen in non-commutative probability [Reference Celestino, Ebrahimi-Fard, Patras and Perales10], although the methods and aims appear quite different.Footnote 5

In the special case of $d=1$ and $\mathbf {X}=(0,X,0,\dots )$ , where X is a scalar semimartingale, $\boldsymbol {\mu } (T)$ and $\boldsymbol {\kappa } (T)$ are nothing but the sequences of moments and cumulants of the real-valued random variable $X_T-X_0$ . When $d> 1$ , the expected signature and signature cumulants provide an effective way to describe the process X on $[0,T]$ ; see [Reference LeJan and Qian44, Reference Lyons45, Reference Chevyrev and Lyons13]. The question arises of how to compute them. If one takes $\mathbf {X}$ to be d-dimensional Brownian motion, the signature cumulant $\boldsymbol {\kappa }(T)$ equals $(T/2) \mathbf {I}_d$ , where $\mathbf {I}_d$ is the identity $2$ -tensor over $\mathbb {R}^d$ . This is known as Fawcett’s formula [Reference Lyons and Victoir47, Reference Friz and Hairer24]. Loosely speaking, and postponing precise definitions, our main result is a vast generalization of Fawcett’s formula.
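As a quick illustration of Fawcett’s formula, here is a small Monte Carlo sanity check at tensor level 2. It is a minimal sketch with illustrative names and parameters, assuming only that the level-$\le 2$ Stratonovich signature of Brownian motion can be approximated by the signature of its piecewise-linear interpolation, updated multiplicatively via Chen’s relation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, n_steps, n_paths = 3, 1.0, 400, 5000
dx = rng.normal(0.0, np.sqrt(T / n_steps), size=(n_steps, n_paths, d))

S1 = np.zeros((n_paths, d))      # level-1 signature component
S2 = np.zeros((n_paths, d, d))   # level-2 signature component
for k in range(n_steps):
    step = dx[k]
    # Chen's relation for appending a linear piece with increment `step`:
    # the level-2 component picks up S1 (x) step + step (x) step / 2.
    S2 += S1[:, :, None] * step[:, None, :] + 0.5 * step[:, :, None] * step[:, None, :]
    S1 += step

print(np.round(S2.mean(axis=0), 2))  # approximately (T/2) * I_3, as Fawcett predicts
```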

Theorem 1.1 (FunctEqu $\mathscr {S}$ -SigCum).

For sufficiently integrable $\mathbf {X}\in \mathscr {S}(\mathcal {T}_0)$ , the (time-t) conditional signature cumulant $ \boldsymbol {\kappa }_t (T) \equiv \boldsymbol {\kappa }_t := \log \mathbb {E}_t (\mathrm {Sig} (\mathbf {X})_{t, T})$ is the unique solution of the functional equation

(1.2)

where all integrals are understood in an Itô and Riemann–Stieltjes sense, respectively,Footnote 6 and $\operatorname {\mathrm {ad}}{\mathbf {x}}=[\mathbf {x}, \cdot ]:\mathcal {T}_0\to \mathcal {T}_0$ denotes the adjoint operator associated to $\mathbf {x} \in \mathcal {T}_0$ . The functions $H,G,Q$ are defined in equation (4.1) below; see also Section 2 for further notation.

As displayed in Figures 1 and 2, this theorem has an avalanche of consequences on which we now comment.

  • Equation (1.2) allows us to compute $\boldsymbol {\kappa }^{(n)} \in (\mathbb {R}^d)^{\otimes n}$ as a function of $\boldsymbol {\kappa }^{(1)},\dotsc ,\boldsymbol {\kappa }^{(n-1)}$ . (This remark applies mutatis mutandis to all special cases seen as vertices in Figure 1.) The resulting expansions, displayed in Figure 2, are of computational interest. In particular, our approach allows us, in some special cases, either to derive closed expressions for the conditional cumulant series $\boldsymbol {\kappa }_t$ or to characterize it as the unique solution of a certain parabolic PDE (see Section 6). In any case, this provides a means of computation that can in principle be more efficient than the naïve Monte Carlo approach. Even when such a concrete form of $\boldsymbol {\kappa }_t$ is not available, the recursive nature of the expressions for each homogeneous component can be useful in some numerical scenarios.

    Figure 1 FunctEqu $\mathscr {S}$ -SigCum (Theorem 4.1) and implications. $\mathscr {S}$ (respectively, $\mathscr {S}^{c}$ ) stands for general (respectively, continuous) semimartingales and $\mathscr {V}$ (respectively, $\mathscr {V}^{c}$ ) stands for finite variation (respectively, finite variation and continuous) processes.

    Figure 2 Computational consequence: accompanying recursions.

  • The most classical consequence of equation (1.2) appears when $\mathbf {X}$ is a deterministic continuous semimartingale: that is, in particular, the components of $\mathbf {X}$ are continuous paths of finite variation, which also covers the absolutely continuous case with integrable componentwise derivative $\dot {\mathbf {X}}$ . In this case all bracket terms and the final jump-sum disappear. What remains is a classical differential equation due to [Reference Hausdorff32], here in backward form:

    (1.3) $$ \begin{align} - \mathrm{d} \boldsymbol{\kappa}_t (T) = H(\operatorname{\mathrm{ad}}{\boldsymbol{\kappa}_{t}})\mathrm{d}\mathbf{X}_t. \end{align} $$
    The accompanying expansion is then precisely the Magnus expansion [Reference Magnus48, Reference Iserles and Nørsett33, Reference Iserles, Munthe-Kaas, Nørsett and Zanna34, Reference Blanes, Casas, Oteo and Ros6]. By taking $\mathbf {X}$ continuous and piecewise linear on two adjacent intervals, say $[0,1)\cup [1,2)$ , one obtains the Baker–Campbell–Hausdorff formula (see, e.g., [Reference Miller52, Theorem 5.5])
    (1.4) $$ \begin{align} \begin{aligned} \boldsymbol{\kappa}_0(2)=\log\bigl(\exp(\mathbf{x}_1)\exp(\mathbf{x}_2) \bigr) &=: \operatorname{BCH}(\mathbf{x}_1,\mathbf{x}_2)\\ &=\mathbf{x}_2+\int_0^1\Psi(\exp(\operatorname{\mathrm{ad}} t\mathbf{x}_1)\circ\exp(\operatorname{\mathrm{ad}}\mathbf{x}_2))(\mathbf{x}_1)\,\mathrm dt, \end{aligned} \end{align} $$
    with
    $$\begin{align*}\Psi(z):=\frac{\ln(z)}{z-1}=\sum_{n\ge 0}\frac{(-1)^n}{n+1}(z-1)^n. \end{align*}$$
    It is also instructive to let $\mathbf {X}$ be piecewise constant on these intervals, with $\Delta \mathbf {X}_1 =\mathbf {x}_1, \Delta \mathbf {X}_2 = \mathbf {x}_2$ , in which case equation (1.2) reduces to the first equality in equation (1.4). Such jump variations of the Magnus expansion are discussed in Section 5.1. (A numerical check of the BCH identity (1.4) follows this list.)
  • Write $\pi _{\mathrm {Sym}}: \mathcal {T} \to \mathcal {S}$ for the canonical projection to the extended symmetric algebra $\mathcal {S}$ , the linear space identified with symmetric tensor series, and define the $\mathcal {S}$ -valued semimartingale $\hat {\mathbf {X}} := \pi _{\mathrm {Sym}}(\mathbf {X})$ and symmetric signature cumulants $\hat {\boldsymbol {\kappa }}(T) := \log (\mathbb {E}_{\cdot }(\mathrm {Sig}(\hat {\mathbf {X}})_{\cdot , T})) = \pi _{\mathrm {Sym}}(\boldsymbol {\kappa }(T))$ (see Section 2.3.1 for more detail). Then equation (1.2), in its projected and commutative form, becomes (see also Section 5.2)

    (1.5) $$ \begin{align} \begin{aligned} \qquad \text{FunctEqu }\mathscr{S}\text{-Cum:} \quad \hat{\boldsymbol{\kappa}}_t (T) & = \mathbb{E}_t\bigg\{ \hat{\mathbf{X}}_{t,T} + \frac{1}{2} \left\langle (\hat{\mathbf{X}}+ \hat{\boldsymbol{\kappa}})^{c} \right\rangle_{t,T}\\ &\qquad + \sum_{t < u \le T}\bigg(\exp \Big( \Delta \hat{\mathbf{X}}_u + \Delta \hat{\boldsymbol{\kappa}}_u \Big) - 1 - (\Delta \hat{\mathbf{X}}_u + \Delta \hat{\boldsymbol{\kappa}}_u ) \bigg) \bigg\}, \end{aligned} \end{align} $$
    where $\exp \colon \mathcal {S}_0 \to \mathcal {S}_1$ is defined by the usual power series. First-level tensors are trivially symmetric, and therefore equation (1.5) applies to an $\mathbb {R}^d$ -valued semimartingale X via the canonical embedding $\hat {\mathbf {X}} = (0, X, 0, \dots ) \in \mathscr {S}(\mathcal {S}_0)$ . More interestingly, the case $\hat {\mathbf {X}} = (0,aX,b\langle X \rangle , 0, \dotsc )$ for a d-dimensional continuous martingale X can be seen to underlie the expansions of [Reference Friz, Gatheral and Radoičić23], which improve and unify previous results [Reference Lacoin, Rhodes and Vargas42, Reference Alos, Gatheral and Radoičić3] treating $(a,b)=(1,0)$ and $(a,b)=(1,-1/2)$ , respectively. Following Gatheral and coworkers, equation (1.5) and subsequent expansions involve ‘diamond’ products of semimartingales, given, whenever well-defined, by
    $$ \begin{align*}(A \diamond B)_t(T) := \mathbb{E}_t \big( \left\langle A^c, B^c \right\rangle_{t,T} \big). \end{align*} $$
    We note that equation (1.5) induces recursive formulae for cumulants, dubbed diamond expansions in Figure 2 and previously discussed in [Reference Lacoin, Rhodes and Vargas42, Reference Alos, Gatheral and Radoičić3, Reference Friz, Gatheral and Radoičić23, Reference Fukasawa and Matsushita28], together with a range of applications, from quantitative finance (including rough volatility models [Reference Abi Jaber, Larsson and Pulido1, Reference Gatheral and Keller-Ressel29]) to statistical physics: in [Reference Lacoin, Rhodes and Vargas42], the authors rely on such formulae to compute the cumulant function of log-correlated Gaussian fields (more precisely, of approximations thereof) underlying the Sine-Gordon model, a key ingredient in their renormalization procedure.

    With regard to the existing (‘commutative’) literature, our algebraic setup is ideally suited to work under finite moment assumptions; we are able to deal with jumps, not treated in [Reference Lacoin, Rhodes and Vargas42, Reference Alos, Gatheral and Radoičić3]. Equation (1.5), the commutative shadow of equation (1.2), should be compared with Riccati’s ordinary differential equation from affine process theory [Reference Duffie, Filipović and Schachermayer19, Reference Cuchiero, Filipović, Mayerhofer and Teichmann16, Reference Keller-Ressel, Schachermayer and Teichmann40]. A systematic comparison would lead us too far astray from our main object of study; nevertheless, we illustrate the connection in Remark 6.9. Of course, our results, in particular equations (1.2) and (1.5), are not restricted to affine semimartingales. In turn, expected signatures and cumulants, and subsequently all our statements about them, require moments, which are not needed for the Riccati evolution of the characteristic function of affine processes. Of recent interest, explicit diamond expansions have been obtained for ‘rough affine’ processes, non-Markov by nature, with a cumulant generating function characterized by Riccati–Volterra equations; see [Reference Abi Jaber, Larsson and Pulido1, Reference Gatheral and Keller-Ressel29, Reference Friz, Gatheral and Radoičić23]. It is remarkable that analytic tractability remains intact when one passes to path space and considers signature cumulants, as we illustrate in Section 6.3.

  • Finally, we mention Signature-SDEs [Reference Arribas, Salvi and Szpruch5], tractable classes of stochastic differential equations that can be studied from an infinite dimensional affine and polynomial perspective [Reference Cuchiero, Svaluto-Ferro and Teichmann18]. Calibration of such models hinges on the efficient computation of expected signatures, which is the very purpose of this paper.
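As a numerical footnote to the Baker–Campbell–Hausdorff formula in equation (1.4), the identity can be checked in the simplest non-commutative setting. The following standalone sketch (illustrative, not from the paper) uses $3\times 3$ strictly upper-triangular matrices, which are nilpotent of order three, so that all exponential and logarithm series terminate and $\operatorname{BCH}(\mathbf{x}_1,\mathbf{x}_2)$ truncates exactly to $\mathbf{x}_1+\mathbf{x}_2+\frac{1}{2}[\mathbf{x}_1,\mathbf{x}_2]$.

```python
import numpy as np

def nil_exp(a):
    """exp(a) for a 3x3 matrix with a^3 = 0: the series terminates."""
    return np.eye(3) + a + a @ a / 2.0

def nil_log(m):
    """log(1 + z) for z = m - I with z^3 = 0: the series terminates."""
    z = m - np.eye(3)
    return z - z @ z / 2.0

x1 = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
x2 = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])

lhs = nil_log(nil_exp(x1) @ nil_exp(x2))     # log(exp(x1) exp(x2))
rhs = x1 + x2 + 0.5 * (x1 @ x2 - x2 @ x1)    # BCH, exact in this nilpotent case
assert np.allclose(lhs, rhs)
```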

We conclude this introduction with some remarks on convergence. As explained, this work contains generalizations of cumulant-type recursions, previously studied in [Reference Alos, Gatheral and Radoičić3, Reference Lacoin, Rhodes and Vargas42, Reference Friz, Gatheral and Radoičić23], the interest therein being the algorithmic computation of cumulants. Basic facts about analytic functions show that classical moment- and cumulant-generating functions, for random variables with finite moments of all orders, have a radius of convergence $\rho \ge 0$ , directly related to the growth of the corresponding sequence. Convergence, in the sense $\rho> 0$ , implies that the moment problem is well-posed: that is, the moments (equivalently, the cumulants) determine the law of the underlying random variable. (See also [Reference Friz, Gatheral and Radoičić23] for a related discussion in the context of diamond expansions.) The point of view taken here is to work directly on this space of sequences, which is even more natural in the non-commutative setting, as already seen in the deterministic setting of [Reference Magnus48]. While convergence of expected signature or signature cumulant series is not in itself an interesting question,Footnote 7 understanding their growth most certainly is: in a celebrated paper [Reference Chevyrev and Lyons13], it was shown that under a growth condition on the expected signature, the ‘expected signature problem’ is well-posed; that is, the expected signature (equivalently, the signature cumulants) determines the law of the random signature. With this in mind, it is conceivable that Theorem 1.1 will prove useful toward controlling the growth of signature cumulants (and hence expected signatures).

2 Preliminaries

2.1 The tensor algebra and tensor series

Denote by $T({\mathbb {R}^d})$ the tensor algebra over ${\mathbb {R}^d}$ : that is,

$$ \begin{align*} T({\mathbb{R}^d}):= \bigoplus_{k=0}^\infty ({\mathbb{R}^d})^{\otimes k}, \end{align*} $$

elements of which are finite sums (also known as tensor polynomials) of the form

(2.1) $$ \begin{align} \mathbf{x} = \sum_{k \ge 0} \mathbf{x}^{(k)} = \sum_{w \in \mathcal{W}_d} \mathbf{x}^w e_w \end{align} $$

with $\mathbf {x}^{(k)} \in ({\mathbb {R}^d})^{\otimes k}, \mathbf {x}^w \in \mathbb {R}$ and linear basis vectors $e_w := e_{i_1}\dotsm e_{i_k}\in ({\mathbb {R}^d})^{\otimes k}$ , where w ranges over all words $w=i_1\dotsm i_k\in \mathcal {W}_d$ over the alphabet $\{1,\dots ,d\}$ . Note $\mathbf {x}^{(k)} = \sum _{|w|=k} \mathbf {x}^w e_w$ , where $|w|$ denotes the length of a word w. The element $e_\emptyset = 1 \in ({\mathbb {R}^d})^{\otimes 0} \cong \mathbb {R}$ is the neutral element of the concatenation (also known as a tensor) product, which is obtained by linear extension of $e_we_{w'}=e_{ww'}$ , where $ww' \in \mathcal {W}_d$ denotes concatenation of two words. We thus have, for $\mathbf {x},\mathbf {y} \in T({\mathbb {R}^d})$ ,

$$ \begin{align*}\mathbf{x}\mathbf{y} = \sum_{k \ge 0} \sum_{\ell =0}^k \mathbf{x}^{(\ell)} \mathbf{y}^{(k-\ell)} = \sum_{w \in \mathcal{W}_d} \left( \sum_{w_1w_2 = w} \mathbf{x}^{w_1}\mathbf{y}^{w_2} \right) e_w \in T({\mathbb{R}^d}). \end{align*} $$
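To make the bookkeeping concrete, the following minimal sketch (illustrative, not from the paper) encodes tensor polynomials as Python dictionaries mapping words, represented as tuples of letters, to real coefficients; `concat` implements the product above, truncated beyond a fixed tensor level.

```python
def concat(x, y, N=4):
    """Concatenation product: (xy)^w = sum of x^{w1} y^{w2} over splittings w = w1 w2."""
    out = {}
    for w1, c1 in x.items():
        for w2, c2 in y.items():
            w = w1 + w2
            if len(w) <= N:   # discard words beyond tensor level N
                out[w] = out.get(w, 0.0) + c1 * c2
    return out

# Non-commutativity over R^2: the words (1, 2) and (2, 1) are distinct.
e1, e2 = {(1,): 1.0}, {(2,): 1.0}
assert concat(e1, e2) == {(1, 2): 1.0}
assert concat(e2, e1) == {(2, 1): 1.0}
```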

This extends naturally to infinite sums, also known as tensor series, elements of the ‘completed’ tensor algebra

$$ \begin{align*} \mathcal{T} := T\mathopen{(\mkern-3mu(}\mathbb{R}^d\mathclose{)\mkern-3mu)}:= \prod_{k=0}^\infty ({\mathbb{R}^d})^{\otimes k}, \end{align*} $$

which are written as in equation (2.1), but now as formal infinite sums with identical notation and multiplication rules; the resulting algebra $\mathcal {T}$ obviously extends $T(\mathbb {R}^d)$ . For any $n\in {\mathbb {N}_{\ge 1}}$ , define the projection to tensor levels by

$$ \begin{align*}\pi_n: \mathcal{T} \to ({\mathbb{R}^d})^{\otimes n}, \quad \mathbf{x} \mapsto \mathbf{x}^{(n)}.\end{align*} $$

Denote by $\mathcal {T}_0$ and $\mathcal {T}_1$ the subspaces of tensor series starting with $0$ and $1$ , respectively; that is, $\mathbf {x} \in \mathcal {T}_0$ (respectively, $\mathcal {T}_1$ ) if and only if $\mathbf {x}^\emptyset =0$ (respectively, $\mathbf {x}^\emptyset =1$ ). Restricted to $\mathcal {T}_0$ and $\mathcal {T}_1$ , respectively, the exponential and logarithm in $\mathcal {T}$ , defined by the usual series,

$$ \begin{align*} \exp\colon\mathcal{T}_0 \to \mathcal{T}_1,& \quad \mathbf{x} \mapsto \exp(\mathbf{x}) := 1 + \sum_{k=1}^\infty \frac{1}{k!}(\mathbf{x})^k, \\ \log\colon\mathcal{T}_1 \to \mathcal{T}_0,& \quad 1 + \mathbf{x} \mapsto \log( 1 + \mathbf{x}) := \sum_{k=1}^\infty \frac{(-1)^{k+1}}{k}(\mathbf{x})^k, \end{align*} $$

are globally defined and inverse to each other. The vector space $\mathcal {T}_0$ becomes a Lie algebra with the commutator bracket

$$ \begin{align*}\left[ \mathbf{x}, \mathbf{y} \right] := \mathbf{x}\mathbf{y}-\mathbf{y}\mathbf{x}, \quad \mathbf{x}, \mathbf{y} \in \mathcal{T}_0.\end{align*} $$

Define the adjoint operator associated to a Lie algebra element $\mathbf {y} \in \mathcal {T}_0$ by

$$ \begin{align*} \operatorname{\mathrm{ad}}{\mathbf{y}}\colon \mathcal{T}_0 \to \mathcal{T}_0, \ \mathbf{x} \mapsto \left[ \mathbf{y}, \mathbf{x} \right]. \end{align*} $$

The exponential image $\mathcal {T}_1=\exp (\mathcal {T}_0)$ is a Lie group, at least formally so. We refrain from equipping the infinite-dimensional $\mathcal {T}_1$ with a differentiable structure, which is unnecessary in view of the ‘locally finite’ nature of the group law $(\mathbf {x},\mathbf {y}) \mapsto \mathbf {x} \mathbf {y}$ .
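Continuing the dict-based sketch above (the helper names are again illustrative), the exponential, logarithm and commutator bracket can be implemented directly from these series; since elements of $\mathcal{T}_0$ have no scalar component, only finitely many terms survive at each truncated tensor level.

```python
def add(x, y, a=1.0):
    """Componentwise x + a*y."""
    out = dict(x)
    for w, c in y.items():
        out[w] = out.get(w, 0.0) + a * c
    return out

def t_exp(x, N=4):
    """exp(x) = 1 + x + x^2/2! + ... for x in T_0; finite after truncation."""
    out, term, fact = {(): 1.0}, {(): 1.0}, 1.0
    for k in range(1, N + 1):
        term = concat(term, x, N)
        fact *= k
        out = add(out, term, 1.0 / fact)
    return out

def t_log(y, N=4):
    """log(1 + z) = z - z^2/2 + z^3/3 - ... for y = 1 + z in T_1."""
    z = {w: c for w, c in y.items() if w != ()}
    out, term = {}, {(): 1.0}
    for k in range(1, N + 1):
        term = concat(term, z, N)
        out = add(out, term, (-1.0) ** (k + 1) / k)
    return out

def bracket(x, y, N=4):
    """Commutator bracket [x, y] = xy - yx."""
    return add(concat(x, y, N), concat(y, x, N), -1.0)

# exp and log are inverse to each other (up to floating-point error):
x = {(1,): 0.5, (2,): -1.0}
lx = t_log(t_exp(x))
assert all(abs(lx.get(w, 0.0) - x.get(w, 0.0)) < 1e-12 for w in set(lx) | set(x))
```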

Let $(a_k)_{k\ge 0}$ be a sequence of real numbers and $\mathbf {x}\in \mathcal {T}_0$ . Then we can always define a linear operator on $\mathcal {T}_0$ by

$$ \begin{align*} \left[ \sum_{k \ge 0}a_k(\operatorname{\mathrm{ad}}\mathbf{x})^{k} \right]: \mathcal{T}_0 \to \mathcal{T}_0, \quad \mathbf{y}\mapsto \sum_{k \ge 0}a_k(\operatorname{\mathrm{ad}}\mathbf{x})^{k}(\mathbf{y}), \end{align*} $$

where $(\operatorname {\mathrm {ad}}\mathbf {x})^{0} = \mathrm {Id}$ is the identity operator and $(\operatorname {\mathrm {ad}} \mathbf {x})^{n} = \operatorname {\mathrm {ad}}\mathbf {x} \circ (\operatorname {\mathrm {ad}} \mathbf {x})^{n-1}$ for any $n\in {\mathbb {N}_{\ge 1}}$ . Indeed, there is no convergence issue due to the graded structure, as can be seen by projecting to some tensor level $n\in {\mathbb {N}_{\ge 1}}$

(2.2) $$ \begin{align} \pi_n\left( \left[ \sum_{k \ge 0} a_k(\operatorname{\mathrm{ad}}\mathbf{x})^{k} \right](\mathbf{y}) \right) = \sum_{k=0}^{n-1} a_k \sum_{\ell}\, \big(\operatorname{\mathrm{ad}} \mathbf{x}^{(l_2)} \cdots \operatorname{\mathrm{ad}} \mathbf{x}^{(l_{k+1})}\big)\big(\mathbf{y}^{(l_1)}\big), \end{align} $$

where the inner summation on the right-hand side is over the finite set of multi-indices $\ell = (l_1, \dotsc , l_{k+1})\in ({\mathbb {N}_{\ge 1}})^{k+1}$ with $l_1 + \dotsb + l_{k+1} = n$ . In the following we will simply write $(\operatorname {\mathrm {ad}}\mathbf {x}\operatorname {\mathrm {ad}}\mathbf {y}) \equiv (\operatorname {\mathrm {ad}}\mathbf {x} \circ \operatorname {\mathrm {ad}}\mathbf {y})$ for the composition of adjoint operators. Further, when $\ell = (l_1)$ is a multi-index of length one, we will use the convention $(\operatorname {\mathrm {ad}}\mathbf {x}^{(l_2)} \cdots \operatorname {\mathrm {ad}}\mathbf {x}^{(l_{k+1})}) \equiv \mathrm {Id}$ . Note also that the iteration of adjoint operations can be explicitly expanded in terms of left- and right-multiplication, as follows:

(2.3) $$ \begin{align} (\operatorname{\mathrm{ad}} \mathbf{x})^{n}(\mathbf{y}) = \sum_{k=0}^{n} \binom{n}{k} (-1)^{n-k}\, \mathbf{x}^{k}\, \mathbf{y}\, \mathbf{x}^{n-k}, \quad n\in{\mathbb{N}}. \end{align} $$
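A quick numerical check of this expansion, in the notation of the sketches above (assuming `concat`, `add` and `bracket` from Section 2.1):

```python
from math import comb

def ad_pow(x, y, n, N=4):
    """(ad x)^n (y) by iterated brackets."""
    for _ in range(n):
        y = bracket(x, y, N)
    return y

def ad_pow_expanded(x, y, n, N=4):
    """(ad x)^n (y) via the left/right-multiplication expansion (2.3)."""
    powers = [{(): 1.0}]                  # x^0, x^1, ..., x^n
    for _ in range(n):
        powers.append(concat(powers[-1], x, N))
    out = {}
    for k in range(n + 1):
        term = concat(concat(powers[k], y, N), powers[n - k], N)
        out = add(out, term, comb(n, k) * (-1.0) ** (n - k))
    return out

x, y = {(1,): 1.0}, {(2,): 1.0}
assert ad_pow(x, y, 2) == ad_pow_expanded(x, y, 2)   # e_112 - 2 e_121 + e_211
```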

For a word $w \in \mathcal {W}_d$ with $|w|>0$ , we define the directional derivative for a function $f\colon \mathcal {T} \to \mathbb {R}$ by

$$ \begin{align*} (\partial_w f)(\mathbf{x}):=\partial_t (f(\mathbf{x} + t e_w))\big\vert_{t=0}, \end{align*} $$

for any $\mathbf {x} \in \mathcal {T}$ such that the right-hand derivative exists.

2.2 The outer tensor algebra

Denote by $\mathrm {m}: T({\mathbb {R}^d})\otimes T({\mathbb {R}^d}) \to T({\mathbb {R}^d})$ the multiplication (concatenation) map of the tensor algebra. Note that $\mathrm {m}$ is linear and, due to the non-commutativity of the tensor product, not symmetric. The map can naturally be extended to a linear map $\mathrm {m}: T({\mathbb {R}^d}) \overline {\otimes } T({\mathbb {R}^d}) \to \mathcal {T}$ , where $T({\mathbb {R}^d}) \overline {\otimes } T({\mathbb {R}^d})$ is the following graded algebra:

$$ \begin{align*} T({\mathbb{R}^d}) \overline{\otimes} T({\mathbb{R}^d}) := \prod_{n=0}^{\infty} \left(\bigoplus_{i = 0}^{n} ({\mathbb{R}^d})^{\otimes i} \otimes ({\mathbb{R}^d})^{\otimes (n-i)}\right). \end{align*} $$

Note that there is the following natural linear embedding

$$ \begin{align*} \mathcal{T}\otimes\mathcal{T} \hookrightarrow T({\mathbb{R}^d}) \overline{\otimes} T({\mathbb{R}^d}), \quad \mathbf{x}\otimes \mathbf{y} \mapsto \sum_{n=0}^{\infty} \left(\sum_{i=0}^{n} \mathbf{x}^{(i)}\otimes \mathbf{y}^{(n-i)}\right). \end{align*} $$

We will of course refrain from explicitly denoting the embedding and simply regard $\mathbf {x} \otimes \mathbf {y}$ as an element in $T({\mathbb {R}^d}) \overline {\otimes } T({\mathbb {R}^d})$ . We emphasize that here $\otimes $ does not denote the (inner) tensor product in $\mathcal {T}$ , for which we did not reserve a symbol, but it denotes another (outer) tensor product. We can lift two linear maps $g, f\colon \mathcal {T}\to \mathcal {T}$ to a linear map $g \odot f: T({\mathbb {R}^d}) \overline {\otimes } T({\mathbb {R}^d}) \to \mathcal {T}$ defined by

$$ \begin{align*} g \odot f := \mathrm{m}\circ(g \otimes f). \end{align*} $$

In particular, for all $\mathbf {x},\mathbf {y}\in \mathcal {T}$ , it holds that

$$ \begin{align*} (g \odot f) (\mathbf{x} \otimes \mathbf{y}) = g(\mathbf{x})f(\mathbf{y})\in \mathcal{T}. \end{align*} $$

2.3 Some quotients of the tensor algebra

2.3.1 The symmetric tensor algebra

The symmetric algebra over ${\mathbb {R}^d}$ , denoted by $S({\mathbb {R}^d})$ , is the quotient of $T({\mathbb {R}^d})$ by the two-sided ideal I generated by $\{xy-yx:x,y\in {\mathbb {R}^d}\}$ . A linear basis of $S({\mathbb {R}^d})$ is then given by $\{ \hat {e}_v \}$ over non-decreasing words, $v=(i_1,\dotsc ,i_n) \in \widehat {\mathcal {W}}_d$ , with $1 \le i_1 \le \dots \le i_n \le d, n \ge 0$ . Every $\hat {\mathbf {x}} \in S(\mathbb {R}^d)$ can be written as a finite sum,

$$ \begin{align*}\hat{\mathbf{x}} = \sum_{v \in \widehat{\mathcal{W}}_d} \hat{\mathbf{x}}^v \hat{e}_v , \end{align*} $$

and we have an immediate identification with polynomials in d commuting indeterminates. The canonical projection

(2.4) $$ \begin{align} \pi_{\mathrm{Sym}}:T({\mathbb{R}^d})\twoheadrightarrow S({\mathbb{R}^d}), \quad\mathbf{x} \mapsto \sum_{w\in\mathcal{W}_d} \mathbf{x}^{w}\hat{e}_{\hat{w}}, \end{align} $$

where $\hat {w}\in \widehat {\mathcal {W}}_d$ denotes the non-decreasing reordering of the letters of the word $w\in \mathcal {W}_d$ , is an algebra epimorphism, which extends to an epimorphism $\pi _{\mathrm {Sym}}: \mathcal {T}\twoheadrightarrow \mathcal {S}$ , where $\mathcal {S} = S \mathopen {(\mkern -3mu(} \mathbb {R}^d\mathclose {)\mkern -3mu)}$ is the algebra completion; elements of $\mathcal {T}$ (respectively, $\mathcal {S}$ ) are identifiable with formal series in d non-commuting (respectively, commuting) indeterminates. As a vector space, $\mathcal {S}$ can be identified with symmetric formal tensor series. Denote by $ \mathcal {S}_0$ and $\mathcal {S}_1$ the affine subspaces given by those $\hat {\mathbf {x}}\in \mathcal {S}$ with $ \hat {\mathbf {x}}^\emptyset =0$ and $ \hat {\mathbf {x}}^\emptyset =1$ , respectively. The usual power series in $\mathcal {S}$ define $\widehat {\exp {}}\colon \mathcal {S}_0 \to \mathcal {S}_1$ with inverse $\widehat {\log {}}\colon \mathcal {S}_1 \to \mathcal {S}_0$ , and we have

$$ \begin{align*} \pi_{\mathrm{Sym}}\exp{(\mathbf{x} + \mathbf{y})} &= \widehat{\exp{}}(\hat{\mathbf{x}})\widehat{\exp{}}(\hat{\mathbf{y}}), \quad \mathbf{x}, \mathbf{y} \in \mathcal{T}_0\\ \pi_{\mathrm{Sym}}\log{(\mathbf{x} \mathbf{y})} &= \widehat{\log{}}(\hat{\mathbf{x}}) + \widehat{\log{}}(\hat{\mathbf{y}}), \quad \mathbf{x},\mathbf{y} \in \mathcal{T}_1. \end{align*} $$

We shall abuse notation in what follows and write $\exp $ (respectively, $\log $ ), instead of $\widehat {\exp }$ (respectively, $ \widehat {\log }$ ).
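In the dict-based sketch of Section 2.1 (assuming `concat`, `add`, `t_exp` and `t_log` from there; the name `pi_sym` is illustrative), the projection $\pi_{\mathrm{Sym}}$ simply sorts the letters of each word, and the second displayed identity, additivity of the logarithm after projection, can be checked numerically.

```python
def pi_sym(x):
    """Project onto the symmetric algebra by sorting each word's letters."""
    out = {}
    for w, c in x.items():
        v = tuple(sorted(w))
        out[v] = out.get(v, 0.0) + c
    return out

x, y = {(1,): 1.0}, {(2,): 0.5}
lhs = pi_sym(t_log(concat(t_exp(x), t_exp(y))))   # pi_Sym log(exp(x) exp(y))
rhs = pi_sym(add(x, y))                           # all commutators project to zero
assert all(abs(lhs.get(w, 0.0) - rhs.get(w, 0.0)) < 1e-12
           for w in set(lhs) | set(rhs))
```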

2.3.2 The (step-n) truncated tensor algebra

For $n\in {\mathbb {N}}$ , the subspace

$$\begin{align*}\mathcal{I}_n:=\prod_{k=n+1}^\infty({\mathbb{R}^d})^{\otimes k} \end{align*}$$

is a two-sided ideal of $\mathcal {T}$ . Therefore, the quotient space $\mathcal {T}/\mathcal {I}_n$ has a natural algebra structure. We denote the projection map by $\pi _{(0,n)}$ . We can identify $\mathcal {T}/\mathcal {I}_n$ with

$$\begin{align*}\mathcal{T\,}^n := \bigoplus_{k=0}^n({\mathbb{R}^d})^{\otimes k}, \end{align*}$$

equipped with truncated tensor product

$$ \begin{align*}\mathbf{x}\mathbf{y} = \sum_{k=0}^{n} \sum_{\ell_1 + \ell_2=k} \mathbf{x}^{(\ell_1)} \mathbf{y}^{(\ell_2)} = \sum_{w \in \mathcal{W}_d,|w|\le n} \left( \sum_{w_1w_2 = w} \mathbf{x}^{w_1}\mathbf{y}^{w_2} \right) e_w \in \mathcal{T\,}^n. \end{align*} $$

The sequence of algebras $(\mathcal {T}^n:n\ge 0)$ forms an inverse system with limit $\mathcal {T}$ . There are also canonical inclusions $\mathcal {T}^k\hookrightarrow \mathcal {T}^{n}$ for $k\le n$ ; in fact, this forms a direct system with limit $T({\mathbb {R}^d})$ . The usual power series in $\mathcal {T}^{n}$ define $\exp _n\colon \mathcal {T}^n_0 \to \mathcal {T}^n_1$ with inverse $\log _n\colon \mathcal {T}^n_1 \to \mathcal {T}^n_0$ ; we may again abuse notation and write $\exp $ and $\log $ when no confusion arises. (In Section 7.2, we will stick to the $\exp _n$ notation to emphasize the presence of truncation.) As before, $\mathcal {T}^n_0$ has a natural Lie algebra structure, and $\mathcal {T}^n_1$ (now finite dimensional) is a bona fide Lie group.

We equip $T(\mathbb {R}^d)$ with the norm

$$ \begin{align*} |a|_{T(\mathbb{R}^d)} := \max_{k\in{\mathbb{N}}}|a^{(k)}|_{({\mathbb{R}^d})^{\otimes k}}, \end{align*} $$

where $|\cdot |_{({\mathbb {R}^d})^{\otimes k}}$ is the Euclidean norm on $({\mathbb {R}^d})^{\otimes k}\cong \mathbb {R}^{d^k}$ ; this makes $T(\mathbb {R}^d)$ a Banach space. The same norm makes sense on $\mathcal {T}^n$ , and the definition is consistent in the sense that $|a|_{\mathcal {T}^k} = |a|_{\mathcal {T}^n}$ for any $a \in \mathcal {T}^{n}$ and $k \ge n$ , and $|a|_{\mathcal {T}^n} = |a|_{({\mathbb {R}^d})^{\otimes n}}$ for any $a \in ({\mathbb {R}^d})^{\otimes n}$ . We will drop the index whenever possible and simply write $|a|$ .

2.4 Semimartingales

Let $\mathscr {D}$ be the space of adapted càdlàg processes $X\colon \Omega \times [0,T) \to \mathbb {R}$ with $T\in (0,\infty ]$ defined on some filtered probability space $(\Omega , (\mathcal {F}_t)_{0 \le t \le T}, \mathbb {P})$ . The space of semimartingales $\mathscr {S}$ is given by the processes $X\in \mathscr {D}$ that can be decomposed as

$$ \begin{align*}X_{t}=X_0+M_{t}+A_{t}, \end{align*} $$

where $M \in \mathscr {M}_{\mathrm {loc}}$ is a càdlàg local martingale and $A\in \mathscr {V}$ is a càdlàg adapted process of locally bounded variation, both started at zero. Recall that every $X \in \mathscr {S}$ has a well-defined continuous local martingale part denoted by $X^c\in \mathscr {M}^c_{\mathrm {loc}}$ . The quadratic variation process of X is then given by

$$ \begin{align*} [X]_t = \left\langle X^{c} \right\rangle_t + \sum_{0 < u \le t} (\Delta X_u)^{2}, \quad 0 \le t \le T, \end{align*} $$

where $\left \langle \cdot \right \rangle $ denotes the (predictable) quadratic variation of a continuous semimartingale. Square, respectively angle, covariation brackets $[X,Y]$ and $\left \langle X^{c}, Y^{c} \right \rangle $ , for another real-valued semimartingale Y, are defined by polarization. For $q \in [1, \infty )$ , write $\mathcal {L}^{q} = L^{q}(\Omega , \mathcal {F}, \mathbb {P})$ ; then a Banach space $\mathscr {H}^q \subset \mathscr {S}$ is given by those $X\in \mathscr {S}$ with $X_0 = 0$ and

$$ \begin{align*}\| X \|_{\mathscr{H}^q} := \inf_{X = M + A} \bigg\Vert \left[ M \right]^{1 / 2}_{T} + \int_0^{T} |\mathrm{d} A_s | \bigg\Vert_{\mathcal{L}^{q}}< \infty. \end{align*} $$

Note that for a local martingale $M \in \mathscr {M}_{\mathrm {loc}}$ , it holds that (see [Reference Protter57, Ch. V, p. 245])

$$ \begin{align*} \| M \|_{\mathscr{H}^q} = \big\Vert \left[ M \right]^{1/2}_{T} \big\Vert_{\mathcal{L}^{q}}. \end{align*} $$

For a process $X\in \mathscr {D}$ , we define

$$ \begin{align*} \Vert X \Vert_{\mathscr{S}^q} := \Big\Vert \sup_{0 \le t \le T} |X_t| \Big\Vert_{\mathcal{L}^{q}}, \end{align*} $$

and define the space $\mathscr {S}^q\subset \mathscr {S}$ of semimartingales $X\in \mathscr {S}$ such that $\Vert {X}\Vert _{\mathscr {S}^q}< \infty $ . Note that there exists a constant $c_q>0$ depending on q such that (see [Reference Protter57, Ch. V, Theorem 2])

(2.5) $$ \begin{align} \Vert{X}\Vert_{\mathscr{S}^q} \le c_q \| X \|_{\mathscr{H}^q}. \end{align} $$

We view d-dimensional semimartingales, $X= \sum _{i=1}^d X^i e_i \in \mathscr {S} (\mathbb {R}^d)$ , as special cases of tensor series valued semimartingales $\mathscr {S} (\mathcal {T}\,)$ of the form

$$ \begin{align*}\mathbf{X} = \sum_{w \in \mathcal{W}_d} \mathbf{X}^w e_w\end{align*} $$

with each component $\mathbf {X}^w$ a real-valued semimartingale. This extends mutatis mutandis to the spaces of $\mathcal {T}$ -valued adapted càdlàg processes $\mathscr {D}(\mathcal {T}\,)$ , martingales $\mathscr {M}(\mathcal {T}\,)$ and adapted càdlàg processes with finite variation $\mathscr {V}(\mathcal {T}\,)$ . Note also that we typically deal with $\mathcal {T}_0$ -valued semimartingales, which amounts to having only words with length $|w| \ge 1$ . Standard notions such as continuous local martingale $\mathbf {X}^{c}$ and jump process $\Delta \mathbf {X}_t = \mathbf {X}_t - \mathbf {X}_{t^-}$ are defined componentwise.

Brackets: Now let $\mathbf {X}$ and $\mathbf {Y}$ be $\mathcal {T}$ -valued semimartingales. We define the (non-commutative) outer quadratic covariation bracket of $\mathbf {X}$ and $\mathbf {Y}$ by

$$ \begin{align*} \big[ \mathbf{X} \mathbin{\overline{\otimes}} \mathbf{Y} \big]_t := \sum_{w_1, w_2 \in \mathcal{W}_d} \left[ \mathbf{X}^{w_1}, \mathbf{Y}^{w_2} \right]_t e_{w_1} \otimes e_{w_2} \in T({\mathbb{R}^d})\, \overline{\otimes}\, T({\mathbb{R}^d}). \end{align*} $$

Similarly, define the (non-commutative) inner quadratic covariation bracket by

$$ \begin{align*} \left[ \mathbf{X}, \mathbf{Y} \right]_t := \sum_{w\in\mathcal{W}_d }\left(\sum_{w_1 w_2 = w} \left[ \mathbf{X}^{w_1}, \mathbf{Y}^{w_2} \right]_t \right)e_w \in \mathcal{T}. \end{align*} $$

For continuous $\mathcal {T}$ -valued semimartingales $\mathbf {X}, \mathbf {Y}$ , this coincides with the predictable quadratic covariation

$$\begin{align*}\left\langle \mathbf{X}^{}, \mathbf{Y}^{} \right\rangle_t := \sum_{w\in\mathcal{W}_d }\left(\sum_{w_1 w_2 = w} \left\langle \mathbf{X}^{w_1}, \mathbf{Y}^{w_2} \right\rangle_t \right)e_w \in \mathcal{T}. \end{align*}$$

As usual, we may write $[\mathbf {X}] \equiv [\mathbf {X}, \mathbf {X}]$ and $\left \langle \mathbf {X} \right \rangle \equiv \left \langle \mathbf {X}, \mathbf {X} \right \rangle $ .

$\mathscr {H}$ -spaces: The definition of the $\mathscr {H}^{q}$ -norm naturally extends to tensor valued semimartingales. More precisely, for $\mathbf {X}^{(n)} \in \mathscr {S}(({\mathbb {R}^d})^{\otimes n})$ with $n\in {\mathbb {N}_{\ge 1}}$ and $q\in [1,\infty )$ , we define

$$ \begin{align*} \big\Vert \mathbf{X}^{(n)} \big\Vert_{\mathscr{H}^{q}} := \inf_{\mathbf{X}^{(n)} = \mathbf{M} + \mathbf{A}} \bigg\Vert \left[ \mathbf{M} \right]^{1/2}_{T} + \int_0^{T} |\mathrm{d} \mathbf{A}_s| \bigg\Vert_{\mathcal{L}^{q}}, \end{align*} $$

where the infimum is taken over all possible decompositions $\mathbf {X}^{(n)} = \mathbf {M} + \mathbf {A}$ with $\mathbf {M}\in \mathscr {M}_{\mathrm {loc}}(({\mathbb {R}^d})^{\otimes n})$ and $\mathbf {A}\in \mathscr {V}(({\mathbb {R}^d})^{\otimes n})$ , where

$$ \begin{align*} \int_0^{T} |\mathrm{d} \mathbf{A}_s| := \sup \sum_{i} \big| \mathbf{A}_{t_{i+1}} - \mathbf{A}_{t_i} \big|, \end{align*} $$

with the supremum taken over all partitions $0 = t_0 < t_1 < \dotsb < t_m = T$ of the interval $[0,T]$ . One may readily check that

$$ \begin{align*} \max_{|w| = n} \left\Vert \mathbf{X}^{w} \right\Vert_{\mathscr{H}^{q}} \le \big\Vert \mathbf{X}^{(n)} \big\Vert_{\mathscr{H}^{q}} \le \sum_{|w| = n} \left\Vert \mathbf{X}^{w} \right\Vert_{\mathscr{H}^{q}}. \end{align*} $$

Further define the following subspace $\mathscr {H}^{q,N} \subset \mathscr {S}(\mathcal {T}_0^{N})$ of homogeneously q-integrable semimartingales

$$ \begin{align*} \mathscr{H}^{q,N} :=\left\{ \mathbf{X} \in \mathscr{S}(\mathcal{T}_0^{N}) \;\Big\vert\; \mathbf{X}_0 = 0,\; \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{X}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} < \infty \right\}, \end{align*} $$

where for any $\mathbf {X} \in \mathscr {S}(\mathcal {T\,}^{N})$ , we define

$$ \begin{align*} \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{X}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} := \sum_{n=1}^{N} \big( \left\Vert {\mathbf{X}^{(n)}} \right\Vert _{\mathscr{H}^{qN/n}}\big)^{1/n}. \end{align*} $$

Note that $\vert \mkern -2.5mu\vert \mkern -2.5mu\vert {\cdot }\vert \mkern -2.5mu\vert \mkern -2.5mu\vert _{\mathscr {H}^{q,N}}$ is sub-additive and positive definite on $\mathscr {H}^{q, N}$ , and it is homogeneous under dilation in the sense that

$$ \begin{align*} \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\delta_{\lambda}\mathbf{X}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} = \lambda\, \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{X}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}}, \quad \lambda > 0, \end{align*} $$

where $\delta _\lambda $ denotes the dilation $e_w \mapsto \lambda ^{|w|} e_w$ .

We also introduce the following subspace of $\mathscr {S}(\mathcal {T}\,)$ :

$$ \begin{align*} \mathscr{H}^{\infty-}(\mathcal{T}) := \left\{ \mathbf{X} \in \mathscr{S}(\mathcal{T}):\; \mathbf{X}^w\in\mathscr{H}^q, \;\forall\, 1 \le q < \infty,\;w\in\mathcal{W}_d\right\}. \end{align*} $$

Note that if $\mathbf {X}\in \mathscr {S}(\mathcal {T}\,)$ such that $\vert \mkern -2.5mu\vert \mkern -2.5mu\vert {\mathbf {X}^{(0,N)}}\vert \mkern -2.5mu\vert \mkern -2.5mu\vert _{\mathscr {H}^{1,N}} < \infty $ for all $N\in {\mathbb {N}_{\ge 1}}$ , then it also holds that $\mathbf {X}\in \mathscr {H}^{\infty -}(\mathcal {T}\,)$ .

Stochastic integrals: We are now going to introduce notation for stochastic integration with respect to tensor valued semimartingales. Denote by $\mathcal {L}(\mathcal {T}; \mathcal {T}) = \{ f: \mathcal {T} \to \mathcal {T}\;|\; f \text { is linear}\}$ the space of endomorphisms of $\mathcal {T}$ , and let $\mathbf {F}: \Omega \times [0,T] \to \mathcal {L}(\mathcal {T}; \mathcal {T})$ with $(t, \omega ) \mapsto \mathbf {F}_t(\omega; \cdot )$ be such that

(2.6) $$ \begin{align} &(\mathbf{F}_t(\mathbf{x}))_{0 \le t \le T} \in \mathscr{D}(\mathcal{T}), \quad \text{for all } \mathbf{x}\in\mathcal{T} \end{align} $$
(2.7) $$ \begin{align} \text{and}\quad &\mathbf{F}_t(\omega; \mathcal{I}_n) \subset \mathcal{I}_n, \quad \text{for all } n\in{\mathbb{N}}, \; (\omega, t)\in\Omega\times[0,T], \end{align} $$

where $\mathcal {I}_n\subset \mathcal {T}$ was introduced in Section 2.3.2, consisting of series with tensors of level $n+1$ and higher. In this case, we can define the stochastic Itô integral (and then analogously the Stratonovich/Marcus integral) of $\mathbf {F}$ with respect to $\mathbf {X}\in \mathscr {S}(\mathcal {T}\,)$ by

(2.8) $$ \begin{align} \int_{(0, \cdot]} \mathbf{F}_{t-}(\mathrm{d} \mathbf{X}_t) := \sum_{n\in{\mathbb{N}}} \pi_n\left( \sum_{w\in\mathcal{W}_d,\, |w| \le n} \int_{(0, \cdot]} \mathbf{F}_{t-}(e_w)\, \mathrm{d} \mathbf{X}^{w}_t \right). \end{align} $$

For example, let $\mathbf {Y}, \mathbf {Z} \in \mathscr {D}(\mathcal {T}\,)$ , and define $\mathbf {F} := \mathbf {Y}\,\mathrm {Id}\,\mathbf {Z}$ : that is, $\mathbf {F}_t(\mathbf {x}) = \mathbf {Y}_t \, \mathbf {x} \, \mathbf {Z}_t$ , the concatenation product from the left and right, for all $\mathbf {x}\in \mathcal {T}$ . Then we see that $\mathbf {F}$ indeed satisfies the conditions in equations (2.6) and (2.7), and we have

(2.9) $$ \begin{align} \int_{(0, \cdot]} (\mathbf{Y}_{t-}\,\mathrm{Id}\,\mathbf{Z}_{t-})(\mathrm{d} \mathbf{X}_t)= \int_{(0, \cdot]} \mathbf{Y}_{t-}\mathrm{d} \mathbf{X}_t \mathbf{Z}_{t-}= \sum_{w\in\mathcal{W}_d}\left( \sum_{w_1 w_2w_3 = w} \int_{(0, \cdot]} \mathbf{Y}^{w_1}_{t-} \mathbf{Z}^{w_3}_{t-}\,\mathrm d \mathbf{X}_t^{w_2} \right)e_w. \end{align} $$

Another important example is given by $\mathbf {F} = (\operatorname {\mathrm {ad}} \mathbf {Y})^{k}$ for any $\mathbf {Y}\in \mathscr {D}(\mathcal {T}_0)$ and $k\in {\mathbb {N}}$ . Indeed, we immediately see that $\mathbf {F}$ satisfies the condition in equation (2.7); and recalling from equation (2.3) that the iteration of adjoint operations can be expanded in terms of left- and right-multiplication, we also see that $\mathbf {F}$ satisfies equation (2.6). More generally, let $(a_k)_{k=0}^{\infty }\subset \mathbb {R}$ , and let $\mathbf {X}\in \mathscr {S}(\mathcal {T}_0)$ ; then the following integral

(2.10) $$ \begin{align} \int_{(0, \cdot]} \left[ \sum_{k \ge 0} a_k (\operatorname{\mathrm{ad}} \mathbf{Y}_{t-})^{k} \right] (\mathrm{d} \mathbf{X}_t) \end{align} $$

is well defined in the sense of equation (2.9). The definition of the integral with integrands of the form $\mathbf {F}: \Omega \times [0,T] \to \mathcal {L}(T({\mathbb {R}^d}) \overline {\otimes } T({\mathbb {R}^d}); \mathcal {T})$ with respect to processes $\mathbf {X} \in \mathscr {S}(T({\mathbb {R}^d}) \overline {\otimes } T({\mathbb {R}^d}))$ is completely analogous.
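In discrete time, the integral in equation (2.9) is a plain word-wise left-point Riemann sum; the following minimal sketch (assuming `concat` and `add` from the Section 2.1 sketches, with time-indexed lists of dicts standing in for the processes) makes this explicit.

```python
def ito_integral(Y, X, Z, N=4):
    """Left-point (Ito-type) sums approximating the integral of Y_- dX Z_-."""
    out = {}
    for k in range(len(X) - 1):
        dX = add(X[k + 1], X[k], -1.0)   # increment X_{t_{k+1}} - X_{t_k}
        out = add(out, concat(concat(Y[k], dX, N), Z[k], N))
    return out

# With Y = Z = 1 the sum telescopes to the increment of X itself.
one = {(): 1.0}
X_path = [{(1,): 0.0}, {(1,): 0.5}, {(1,): 1.0}]
assert ito_integral([one] * 3, X_path, [one] * 3) == {(1,): 1.0}
```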

Quotient algebras: All of this extends in a straightforward way to semimartingales in the quotient algebras of Section 2.3, that is, the symmetric and the truncated tensor algebras. In particular, $\mathbf {X}$ and $\mathbf {Y}$ in $\mathscr {S}(\mathcal {S})$ have well-defined continuous local martingale parts, denoted by $\mathbf {X}^c,\mathbf {Y}^c$ , respectively, with inner (predictable) quadratic covariation given by

$$\begin{align*}\langle\mathbf{X}^c,\mathbf{Y}^c\rangle =\sum_{w_1,w_2\in \widehat{\mathcal{W}}_d}\langle \mathbf{X}^{w_1,c},\mathbf{Y}^{w_2,c}\rangle\hat{e}_{w_1}\hat{e}_{w_2}. \end{align*}$$

Write $\mathcal {S}^N$ for the truncated symmetric algebra, linearly spanned by $\{ \hat {e}_{w}: w \in \widehat {\mathcal {W}}_d, |w| \le N\}$ and $\mathcal {S}^N_0$ for those elements with zero scalar entry. In complete analogy with non-commutative setting discussed above, we then write $\widehat {\mathscr {H}}^{q,N} \subset \mathscr {S}(\mathcal {S}^N_0)$ for the corresponding space of homogeneously q-integrable semimartingales.

2.5 Diamond products

We extend the notion of the diamond product introduced in [Reference Alos, Gatheral and Radoičić3] for continuous scalar semimartingales to our setting. Denote by $\mathbb {E}_t$ the conditional expectation with respect to the sigma algebra $\mathcal {F}_t$ . Then we have the following:

Definition 2.1. For $\mathbf {X}$ and $\mathbf {Y}$ in $\mathscr {S}(\mathcal {T}\,)$ , define

$$ \begin{align*}(\mathbf{X} \diamond \mathbf{Y})_t(T) := \mathbb{E}_t \big( \left\langle \mathbf{X}^c, \mathbf{Y}^c \right\rangle_{t,T} \big)=\sum_{w\in\mathcal{W}_d}\left( \sum_{w_1w_2=w}(\mathbf{X}^{w_1}\diamond\mathbf{Y}^{w_2})_t(T) \right)e_w \in \mathcal{T} \end{align*} $$

whenever the $\mathcal {T}$ -valued quadratic covariation that appears on the right-hand side is integrable. Similar to the previous section, we also define an outer diamond, for $\mathbf {X},\mathbf {Y}\in \mathscr {S}(\mathcal {T}\,)$ , by

$$ \begin{align*} (\mathbf{X} \mathbin{\overline{\diamond}} \mathbf{Y})_t(T) := \mathbb{E}_t \big( \langle \mathbf{X}^{c} \mathbin{\overline{\otimes}} \mathbf{Y}^{c} \rangle_{t,T} \big) \in T({\mathbb{R}^d})\, \overline{\otimes}\, T({\mathbb{R}^d}). \end{align*} $$

This definition extends immediately to semimartingales with values in the quotient algebras of Section 2.3. In particular, given $\hat {\mathbf {X}}$ and $\hat {\mathbf {Y}}$ in $\mathscr {S}(\mathcal {S})$ , we have

$$ \begin{align*}( \hat{\mathbf{X}} \diamond \hat{\mathbf{Y}})_t(T) := \mathbb{E}_t \big( \langle \hat{\mathbf{X}}^c, \hat{\mathbf{Y}}^c \rangle_{t,T} \big) = \sum_{w_1,w_2\in \widehat{\mathcal{W}}_d} (\hat{\mathbf{X}}^{w_1} \diamond \hat{\mathbf{Y}}^{w_2})_t(T) \hat{e}_{w_1}\hat{e}_{w_2} \in \mathcal{S}, \end{align*} $$

where the last expression is given in terms of diamond products of scalar semimartingales.
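As a standalone numerical illustration (parameters illustrative): for a scalar Brownian motion, $\langle X \rangle_{t,T} = T - t$ is deterministic, so the diamond product $(X \diamond X)_t(T)$ must equal $T - t$; a realized-variance Monte Carlo estimate confirms this.

```python
import numpy as np

rng = np.random.default_rng(1)
T, t, n_steps, n_paths = 1.0, 0.4, 500, 10000
dt = T / n_steps
k0 = int(t / dt)                          # grid index of time t
dX = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
qv = (dX[:, k0:] ** 2).sum(axis=1)        # realized quadratic variation [X]_{t,T}

print(qv.mean(), "vs", T - t)             # (X <> X)_t(T) = E_t <X>_{t,T} = T - t
```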

Lemma 2.2. Let $p,q,r\in [1,\infty )$ such that $1/p + 1/q + 1/r < 1$ , and let $X\in \mathscr {M}_{\mathrm {loc}}^{c}(({\mathbb {R}^d})^{\otimes l})$ , $Y\in \mathscr {M}_{\mathrm {loc}}^{c}(({\mathbb {R}^d})^{\otimes m})$ , and $Z\in \mathscr {D}(({\mathbb {R}^d})^{\otimes n})$ with $l,m,n\in {\mathbb {N}}$ , such that $\Vert X \Vert _{\mathscr {H}^{p}}, \Vert Y \Vert _{\mathscr {H}^{q}}, \Vert Z \Vert _{\mathscr {S}^{r}} < \infty $ . Then it holds that for all $0 \le t \le T$

$$ \begin{align*} \mathbb{E}_t\left(\int_t^{T}Z_{u-}\mathrm{d}(X \diamond Y)_u(T)\right) = -\mathbb{E}_t\left(\int_{t}^T Z_{u-}\mathrm{d}\left\langle X, Y \right\rangle_u\right). \end{align*} $$

Proof. Using the Kunita-Watanabe inequality (Lemma 7.1) we see that the expectation on the right-hand side is well defined. Further note that it follows from Emery’s inequality (Lemma 7.3) and Doob’s maximal inequality that the local martingale

$$ \begin{align*} \int_0^{\cdot}Z_{u-}\mathrm{d}(\mathbb{E}_u\left\langle X, Y \right\rangle_T) \end{align*} $$

is a true martingale. Recall the definition of the diamond product, and observe that the difference of the left- and right-hand sides of the above equation is a conditional expectation of a martingale increment and is hence zero.

2.6 Generalized signatures

We now give the precise meaning of equation (1.1): that is, $\mathrm d\mathbf {S}=\mathbf {S}\,{\circ \mathrm d}\mathbf {X}$ , or component-wise, for every word $w\in \mathcal {W}_d$ ,

$$\begin{align*}\mathrm d\mathbf{S}^w=\sum_{w_1w_2=w}\mathbf{S}^{w_1}\,{\circ\mathrm d}\mathbf{X}^{w_2}, \end{align*}$$

where the driving noise $\mathbf {X}$ is a $\mathcal {T}_0$ -valued semimartingale, so that $\mathbf {X}^{\emptyset } \equiv 0$ . Following [Reference Marcus49, Reference Marcus50, Reference Estrade20, Reference Kurtz, Pardoux and Protter41, Reference Friz and Shekhar25, Reference Bruned, Curry and Ebrahimi-Fard8], the integral meaning of this equation, started at time s from $\mathbf {s} \in \mathcal {T}_1$ , for times $t \ge s$ , is given by

(2.11) $$ \begin{align} \mathbf{S}_t = \mathbf{s} + \int_{(s,t]} \mathbf{S}_{u-}\,\mathrm d \mathbf{X}_u + \frac{1}{2}\int_s^{t} \mathbf{S}_{u-}\,\mathrm d\left\langle \mathbf{X}^{c} \right\rangle_u + \sum_{s< u \le t} \mathbf{S}_{u-}\big(\exp(\Delta \mathbf{X}_u)-1-\Delta \mathbf{X}_u\big), \end{align} $$

leaving the component-wise version to the reader. We have

Proposition 2.3. Suppose $\mathbf {X}$ takes values in $\mathcal {T}_0$ . For every $s \ge 0$ and $\mathbf {s} \in \mathcal {T}_1$ , equation (2.11) has a unique global solution in $\mathcal {T}_1$ starting from $\mathbf {S}_s=\mathbf {s}$ .

Proof. Note that $\mathbf {S}$ solves equation (2.11) iff $\mathbf {s}^{-1} \mathbf {S}$ solves the same equation started from $1 \in \mathcal {T}_1$ . We may thus take $\mathbf {s} = 1$ without loss of generality. The graded structure of our problem, and more precisely that $\mathbf {X} = (0,X,\mathbb {X},\dots )$ in equation (2.11) has no scalar component, shows that the (necessarily) unique solution is given explicitly by iterated integration, as may be seen explicitly when writing out $\mathbf {S}^{(0)} \equiv 1$ , $\mathbf {S}^{(1)}_t = \int _s^t \mathrm d X = X_{s,t} \in \mathbb {R}^d$ ,

$$ \begin{align*}\mathbf{S}^{(2)}_t = \int_{(s,t]} \mathbf{S}^{(1)}_{u-}\,\mathrm d X_u +\mathbb{X}_{t} -\mathbb{X}_{s} + \frac{1}{2} \left\langle X^{c} \right\rangle_{s,t} + \frac{1}{2} \sum_{s< u \le t} (\Delta X_u)^2 \in (\mathbb{R}^d)^{\otimes 2}, \end{align*} $$

and so on. (In particular, we do not need to rely on abstract existence, uniqueness results for Marcus SDEs [Reference Kurtz, Pardoux and Protter41] or Lie group stochastic exponentials [Reference Hakim-Dowek and Lépingle31].)

Definition 2.4. Let $\mathbf {X}$ be a $\mathcal {T}_0$ -valued semimartingale defined on some interval $[s,t]$ . Then

$$ \begin{align*}\mathrm{Sig} (\mathbf{X} \vert_{[s,t]}) \equiv \mathrm{Sig}(\mathbf{X})_{s,t}\end{align*} $$

is defined to be the unique solution to equation (2.11) on $[s,t]$ , such that $\mathrm {Sig}(\mathbf {X})_{s,s}=1$ .

The following can be seen as a (generalized) Chen relation.

Lemma 2.5. Let $\mathbf {X}$ be a $\mathcal {T}_0$ -valued semimartingale on $[0,T]$ and $0 \le s \le t \le u \le T$ . Then the following identity holds with probability one, for all such $s,t,u$ :

(2.12) $$ \begin{align} \mathrm{Sig}(\mathbf{X})_{s,t}\mathrm{Sig}(\mathbf{X})_{t,u}=\mathrm{Sig}(\mathbf{X})_{s,u}. \end{align} $$

Proof. Call $\Phi _{t \leftarrow s} \mathbf {s} := \mathbf {S}_t$ the solution to equation (2.11) at time $t \ge s$ , started from $\mathbf {S}_s = \mathbf {s}$ . By uniqueness of the solution flow, we have $ \Phi _{u \leftarrow t} \circ \Phi _{t \leftarrow s} = \Phi _{u \leftarrow s}. $ It now suffices to remark that, thanks to the multiplicative structure of equation (2.11), we have $ \Phi _{t \leftarrow s} \mathbf {s} = \mathbf {s} \mathrm {Sig}(\mathbf {X})_{s,t}$ .
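For a deterministic piecewise-linear driver, the finite-variation case from the introduction, each linear piece with increment $\mathrm{d}\mathbf{x}$ contributes a factor $\exp(\mathrm{d}\mathbf{x})$, and the Chen relation (2.12) is just associativity of these products; a sketch assuming `concat` and `t_exp` from the Section 2.1 sketches:

```python
def sig_pw_linear(increments, N=4):
    """Signature of a piecewise-linear path as a product of exponentials."""
    out = {(): 1.0}
    for dx in increments:
        out = concat(out, t_exp(dx, N), N)
    return out

segs = [{(1,): 0.3, (2,): -0.1}, {(1,): 0.2}, {(2,): 0.7}]
left = concat(sig_pw_linear(segs[:2]), sig_pw_linear(segs[2:]))  # Sig_{s,t} Sig_{t,u}
right = sig_pw_linear(segs)                                      # Sig_{s,u}
assert all(abs(left.get(w, 0.0) - right.get(w, 0.0)) < 1e-12
           for w in set(left) | set(right))
```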

3 Expected signatures and signature cumulants

3.1 Definitions and existence

Throughout this section, let $\mathbf {X} \in \mathscr {S}(\mathcal {T}_0)$ be defined on a filtered probability space $(\Omega , \mathcal {F}, (\mathcal {F}_t)_{0 \le t \le T}, \mathbb {P})$ . Recall that $\mathbb {E}_t$ denotes the conditional expectation with respect to the sigma algebra $\mathcal {F}_t$ . When $\mathbb {E}(|\mathrm {Sig}(\mathbf {X})^w_{0,t}|)<\infty $ for all $0 \le t \le T$ and all words $w\in \mathcal {W}_d$ , then the (conditional) expected signature

$$ \begin{align*} \boldsymbol{\mu}_t(T) := \mathbb{E}_t\left(\mathrm{Sig}(\mathbf{X})_{t,T}\right) = \sum_{w\in\mathcal{W}_d}\mathbb{E}_t(\mathrm{Sig}(\mathbf{X})^w_{t,T})e_w \in \mathcal{T}_1, \quad 0 \le t \le T, \end{align*} $$

is well defined. In this case, we can also define the (conditional) signature cumulant of $\mathbf {X}$ by

$$ \begin{align*} \boldsymbol{\kappa}_{t}(T):=\log\left(\boldsymbol{\mu}_t(T)\right) \in \mathcal{T}_0, \quad 0 \le t \le T. \end{align*} $$

An important observation is the following:

Lemma 3.1. If $\mathbb {E}(|\mathrm {Sig}(\mathbf {X})^w_{0,t}|)<\infty $ for all $0 \le t \le T$ and all words $w\in \mathcal {W}_d$ , then $\boldsymbol {\mu }(T) \in \mathscr {S}(\mathcal {T}_1)$ and $\boldsymbol {\kappa }(T)\in \mathscr {S}(\mathcal {T}_0)$ .

Proof. It follows from the relation in equation (2.12) that

$$ \begin{align*} \boldsymbol{\mu}_t(T) = \mathbb{E}_t\left(\mathrm{Sig}(\mathbf{X})_{t,T}\right) = \mathbb{E}_t\left(\mathrm{Sig}(\mathbf{X})_{0,t}^{-1}\mathrm{Sig}(\mathbf{X})_{0,T}\right) = \mathrm{Sig}(\mathbf{X})_{0,t}^{-1}\mathbb{E}_t\left(\mathrm{Sig}(\mathbf{X})_{0,T}\right). \end{align*} $$

Therefore, projecting to the tensor components, we have

$$ \begin{align*} \boldsymbol{\mu}_t(T)^w = \sum_{w_1w_2 = w}\big(\mathrm{Sig}(\mathbf{X})_{0,t}^{-1}\big)^{w_1}\,\mathbb{E}_t\left(\mathrm{Sig}(\mathbf{X})^{w_2}_{0,T}\right), \quad 0 \le t \le T, \quad w \in \mathcal{W}_d. \end{align*} $$

Since the components of $(\mathrm {Sig}(\mathbf {X})_{0, t})_{0 \le t \le T}$ , and hence of its inverse (whose components are polynomials in those of the signature), as well as $(\mathbb {E}_t(\mathrm {Sig}(\mathbf {X})^w_{0,T}))_{0 \le t \le T}$ are semimartingales (the latter in fact martingales), it follows from Itô’s product rule that $\boldsymbol {\mu }^w(T)$ is also a semimartingale for all words $w\in \mathcal {W}_d$ ; hence $\boldsymbol {\mu }(T)\in \mathscr {S}(\mathcal {T}_1)$ . Further recall that $\boldsymbol {\kappa }(T) = \log (\boldsymbol {\mu }(T))$ , and therefore it follows from the definition of the logarithm on $\mathcal {T}_1$ that each component $\boldsymbol {\kappa }(T)^w$ with $w\in \mathcal {W}_d$ is a polynomial in $(\boldsymbol {\mu }(T)^{v})_{v\in \mathcal {W}_d, |v|\le |w|}$ . Hence it follows, again by Itô’s product rule, that $\boldsymbol {\kappa }(T)\in \mathscr {S}(\mathcal {T}_0)$ .

It is of strong interest to have a more explicit sufficient condition for the existence of the expected signature. The following theorem, the proof of which can be found in Section 7.1, yields such a criterion.

Theorem 3.2. Let $q\in [1, \infty )$ and $N\in {\mathbb {N}_{\ge 1}}$ ; then there exist two constants $c,C>0$ depending only on d, N and q, such that for all $\mathbf {X} \in \mathscr {H}^{q,N}$

$$ \begin{align*} c\vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{X}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} \le \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathrm{Sig}(\mathbf{X})_{0,\cdot}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} \le C\vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{X}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}}. \end{align*} $$

In particular, if $\mathbf {X}\in \mathscr {H}^{\infty -}(\mathcal {T}_0)$ , then $\mathrm {Sig}(\mathbf {X})_{0,\cdot }\in \mathscr {H}^{\infty -}(\mathcal {T}_1)$ , and the expected signature exists.

Remark 3.3. Let $\mathbf {X} = (0, M, 0, \dotsc , 0)$ , where $M \in \mathscr {M}({\mathbb {R}^d})$ is a martingale; then

$$ \begin{align*} \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{X}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} = \left\Vert M \right\Vert_{\mathscr{H}^{qN}}, \end{align*} $$

and we see that the above estimate implies that

$$ \begin{align*} \max_{n=1, \dotsc, N} \left\Vert {\mathrm{Sig}(\mathbf{X})^{(n)}_{0, \cdot}} \right\Vert _{\mathscr{S}^{qN/n}}^{1/n} \le C \left\Vert {M} \right\Vert _{\mathscr{H}^{qN}}. \end{align*} $$

This estimate is already known and follows from the Burkholder-Davis-Gundy inequality for enhanced martingales, which was first proved in the continuous case in [Reference Friz and Victoir26] and for the general case in [Reference Chevyrev and Friz12].

Remark 3.4. When $q>1$ , the above estimate also holds true when the signature $\mathrm {Sig}(\mathbf {X})_{0,\cdot }$ is replaced by the conditional expected signature $\boldsymbol {\mu }(T)$ or the conditional signature cumulant $\boldsymbol {\kappa }(T)$ . This will be seen in the proof of Theorem 4.1 below (more precisely in Claim 7.12).

3.2 Moments and cumulants

We quickly discuss the development of a symmetric algebra valued semimartingale, more precisely, $\hat {\mathbf {X}} \in \mathscr {S}(\mathcal {S}_0)$ , in the group $\mathcal {S}_1$ . That is, we consider

(3.1) $$ \begin{align} \mathrm{d} \hat{\mathbf{S}} = \hat{\mathbf{S}}\,\circ \mathrm{d} \hat{\mathbf{X}}. \end{align} $$

It is immediate (validity of chain rule) that the unique solution to this equation, at time $t \ge s$ , started at $\hat {\mathbf {S}}_s = \hat {\mathbf {s}} \in \mathcal {S}_1$ is given by

$$\begin{align*}\hat{\mathbf{S}}_{t}:= \exp\left( \hat{\mathbf{X}}_t- \hat{\mathbf{X}}_s \right)\hat{\mathbf{s}} \in \mathcal{S}_1, \end{align*}$$

and we also write $\hat {\mathbf {S}}_{s,t} = \exp ( \hat {\mathbf {X}}_t-\hat {\mathbf {X}}_s )$ for this solution started at time s from $1\in \mathcal {S}_1$ . The relation to signatures is as follows. Recall that $\pi _{\mathrm {Sym}}$ denotes the canonical projection from $\mathcal {T}$ to $\mathcal {S}$ .

Proposition 3.5. Let $\mathbf {X},\mathbf Y\in \mathscr {S}(\mathcal {T}\,)$ , and define $\hat {\mathbf {X}} := \pi _{\mathrm {Sym}}(\mathbf {X})$ and $\hat {\mathbf {Y}} := \pi _{\mathrm {Sym}}(\mathbf {Y})$ . Then it holds that

  1. (i) $\hat {\mathbf {X}}, \hat {\mathbf {Y}}\in \mathscr {S}(\mathcal {S})$ , and for the indefinite Itô integral, we have in the sense of indistinguishable processes

    (3.2) $$ \begin{align} \pi_{\mathrm{Sym}} \int \mathbf{X} \mathrm d\mathbf Y= \int \hat{\mathbf{X}}\,\mathrm d\hat{\mathbf Y}, \end{align} $$
  2. (ii) $\hat {\mathbf {S}} := \pi _{\mathrm {Sym}}{\mathrm {Sig}(\mathbf {X})_{s,\cdot }}$ solves equation (3.1) started at time s from $1\in \mathcal {S}_1$ and driven by $\hat {\mathbf {X}}$ . In particular,

    $$ \begin{align*}\hat{\mathbf{S}}_{s,t}=\exp(\hat{\mathbf{X}}_t-\hat{\mathbf{X}}_s)\end{align*} $$
    for all $0 \le s \le t \le T$ .

Proof. (i) That the projections $\hat {\mathbf {X}},\hat {\mathbf Y}$ define $\mathcal {S}$ -valued semimartingales follows from the componentwise definition and the fact that the canonical projection is linear. In particular, the right-hand side of equation (3.2) is well defined. To show equation (3.2), we apply the canonical projection $\pi _{\mathrm {Sym}}$ to both sides of equation (2.9) after choosing $Z_t\equiv \mathbf 1$ ; and using the explicit action of $\pi _{\mathrm {Sym}}$ on basis tensors, we obtain the identity

$$\begin{align*}\pi_{\mathrm{Sym}}\int\mathbf{X}\,\mathrm d\mathbf{Y}=\sum_{w\in\mathcal{W}_d}\left( \sum_{uv=w}\int \mathbf{X}^u\,\mathrm d\mathbf{Y}^v \right)\hat{e}_{\hat{w}}=\int\hat{\mathbf{X}}\,\mathrm d\hat{\mathbf{Y}} \end{align*}$$

by equation (2.4). Part (ii) is then immediate.

Assuming componentwise integrability, we then define symmetric moments and cumulants of the $\mathcal {S}$ -valued semimartingale $\hat {\mathbf {X}}$ by

$$ \begin{align*} \hat{\boldsymbol{\mu}}_t(T) & := \mathbb{E}_t\left(\exp\left( \hat{\mathbf{X}}_T- \hat{\mathbf{X}}_t\right)\right) = \sum_{v\in\widehat{\mathcal{W}}_d} \mathbb{E}_t \left( \exp\left( \hat{\mathbf{X}}_T-\hat{\mathbf{X}}_t\right)^v\right )\hat{e}_v \in \mathcal{S}_1,\\ \hat{\boldsymbol{\kappa}}_t(T) & := \log\left( \hat{\boldsymbol{\mu}}_t(T) \right)\in \mathcal{S}_0, \end{align*} $$

for $0\le t \le T$ . If $\hat {\mathbf {X}} = \pi _{\mathrm {Sym}}(\mathbf {X})$ , for $ \mathbf {X} \in \mathscr {S}(\mathcal {T}\,)$ , with expected signature and signature cumulants $\boldsymbol {\mu }(T)$ and $\boldsymbol {\kappa }(T)$ , it is then clear that the symmetric moments and cumulants of $\hat {\mathbf {X}}$ are obtained by projection,

$$ \begin{align*}\hat{\boldsymbol{\mu}}(T) = \pi_{\mathrm{Sym}}( \boldsymbol{\mu}(T)), \quad \hat{\boldsymbol{\kappa}}(T) = \pi_{\mathrm{Sym}}(\boldsymbol{\kappa}(T)).\end{align*} $$

Example 3.6. Let $ X$ be an $\mathbb {R}^d$ -valued martingale in $\mathscr H^{\infty -}$ , and $\hat {\mathbf {X}}_t:=\sum _{i=1}^dX^i_t\hat {e}_i$ . Then

$$\begin{align*}\hat{\boldsymbol{\mu}}_t(T)=\sum_{n=0}^\infty\frac{1}{n!}\mathbb{E}_t(X_T-X_t)^n=1+\sum_{n=1}^\infty\frac{1}{n!}\sum_{i_1,\dotsc,i_n=1}^d\mathbb{E}_t\left[ (X_T^{i_1}-X_t^{i_1})\dotsm(X_T^{i_n}-X_t^{i_n}) \right]\hat{e}_{\widehat{i_1\dotsm i_n}} \end{align*}$$

consists of the (time-t conditional) multivariate moments of $X_T-X_t \in \mathbb {R}^d$ . Here, the series on the right-hand side is understood in the formal sense. It readily follows, also noted in [Reference Bonnier and Oberhauser7, Example 3.3], that $\hat {\boldsymbol {\kappa }}_t (T) = \log (\hat {\boldsymbol {\mu }}_t(T))$ consists precisely of the multivariate cumulants of $X_T-X_t$ . Note that the symmetric moments and cumulants of the scaled process $a X$ , $a \in \mathbb {R}$ , are precisely given by $\delta _a \hat {\boldsymbol {\mu }}$ and $\delta _a \hat {\boldsymbol {\kappa }}$ , where the linear dilation map is defined by $\delta _a\colon \hat {e}_w \mapsto a^{|w|} \hat {e}_w$ . The situation is similar for $a \cdot X=(a_1X^1,\dotsc ,a_dX^d)$ , $a \in \mathbb {R}^d$ , but now with $\delta _a\colon \hat {e}_w \mapsto a^w \hat {e}_w$ with $a^w = a_1^{n_1} \cdots a_d^{n_d}$ , where $n_i$ denotes the multiplicity of the letter $i \in \{1,\dots , d\}$ in the word w.

We next consider linear combinations, $\hat {\mathbf {X}} = a X + b \langle X \rangle $ , for general pairs $a,b \in \mathbb {R}$ , having already dealt with $b=0$ . The special case $b = - a^2/2$ (by scaling, there is no loss of generality in taking $(a,b) = (1,-1/2)$ ) yields a familiar, at least formal, exponential martingale identity.

Example 3.7. Let $ X$ be an $\mathbb {R}^d$ -valued martingale in $\mathscr H^{\infty -}$ , and define

$$\begin{align*}\hat{\mathbf{X}}_t:=\sum_{i=1}^dX^i_t\hat{e}_i-\frac12\sum_{1\le i\le j\le d}\langle X^i,X^j\rangle_t\hat{e}_{ij}. \end{align*}$$

In this case, we have trivial symmetric cumulants $\hat {\boldsymbol {\kappa }}_t(T)=0$ for all $0\le t\le T$ . Indeed, Itô’s formula shows that $t\mapsto \exp (\hat {\mathbf {X}}_t)$ is an $\mathcal {S}_1$ -valued martingale, so that

$$\begin{align*}\hat{\boldsymbol{\mu}}_t(T)=\mathbb{E}_t\exp(\hat{\mathbf{X}}_T-\hat{\mathbf{X}}_t)=\exp(-\hat{\mathbf{X}}_t)\mathbb{E}_t\exp(\hat{\mathbf{X}}_T)=1. \end{align*}$$

While the symmetric cumulants of the last example carry no information, it suffices to work with

$$ \begin{align*}\hat{\mathbf{X}} = \sum_{i=1}^d a^i X^{i}\hat{e}_i + \sum_{1\le i\le j\le d} b_{ij} \langle X^i,X^j \rangle \hat{e}_{ij}, \end{align*} $$

in which case $\hat {\boldsymbol {\mu }} = \hat {\boldsymbol {\mu }} (a,b)$ and $\hat {\boldsymbol {\kappa }} = \hat {\boldsymbol {\kappa }}(a,b)$ contain full information about the joint moments of X and its quadratic variation process. A recursion for these was constructed as a diamond expansion in [Reference Friz, Gatheral and Radoičić23].

4 Main results

4.1 Functional equation for signature cumulants

Let $\mathbf {X}\in \mathscr {S}(\mathcal {T}_0)$ be defined on a filtered probability space $(\Omega , \mathcal {F}, (\mathcal {F}_t)_{0\le t\le T<\infty },\mathbb {P})$ satisfying the usual conditions. For all $\mathbf {x}\in \mathcal {T}_0$ (or $\mathcal {T}_0^N$ ), define the following operators, with Bernoulli numbers $(B_k)_{k\ge 0} = (1, -\frac {1}{2}, \frac {1}{6}, \dotsc )$ ,

(4.1) $$ \begin{align} \begin{aligned} G(\operatorname{\mathrm{ad}}{\mathbf{x}}) = \sum_{k = 0}^{\infty} \frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^k}{(k + 1) !}, \quad &Q(\operatorname{\mathrm{ad}}{\mathbf{x}}) = \sum_{m, n = 0}^{\infty}2\frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^n\odot (\operatorname{\mathrm{ad}}{\mathbf{x}})^m}{(n + 1) !\, m! \,(n + m + 2)},\\ H(\operatorname{\mathrm{ad}}{\mathbf{x}}) &:= \sum_{k=0}^{\infty}\frac{B_k}{k!}(\operatorname{\mathrm{ad}}{\mathbf{x}})^{k}, \end{aligned} \end{align} $$

noting $G(z) = (\exp (z)-1)/z$ and $H(z) = 1/G(z) = z/(\exp (z)-1)$ .

Theorem 4.1. Let $\mathbf {X} \in \mathscr {H}^{\infty -}(\mathcal {T}_0)$ ; then the signature cumulant $\boldsymbol {\kappa } =\boldsymbol {\kappa } (T) = (\log \mathbb {E}_t(\mathrm {Sig}(\mathbf {X})_{t,T}))_{0 \le t \le T}$ is the unique solution (up to indistinguishability) of the following functional equation: for all $0 \le t \le T$

(4.2)

Equivalently, $\boldsymbol {\kappa } =\boldsymbol {\kappa } (T) $ is the unique solution to

(4.3)

Furthermore, if $\mathbf {X}\in \mathscr {H}^{1,N}$ for some $N\in {\mathbb {N}_{\ge 1}}$ , then the identities in equations (4.2) and (4.3) still hold true for the truncated signature cumulant $\boldsymbol {\kappa } := (\log \mathbb {E}_t(\mathrm {Sig}(\mathbf {X}^{(0,N)})_{t,T}))_{0\le t \le T}$ .

Proof. We postpone the proof of the fact that $\boldsymbol {\kappa }$ satisfies equations (4.2) and (4.3) to Section 7.2. The uniqueness part of the statement can easily be seen as follows. Regarding equation (4.2), we first note that

$$ \begin{align*} \mathbb{E}_t \left\{ \int_{(t,T]} G(\operatorname{\mathrm{ad}} \boldsymbol{\kappa}_{u-})(\mathrm{d} \boldsymbol{\kappa}_u)\right\} = \mathbb{E}_t \left\{ \int_{(t,T]} (G(\operatorname{\mathrm{ad}} \boldsymbol{\kappa}_{u-})-\mathrm{Id})(\mathrm{d} \boldsymbol{\kappa}_u) \right\} - \boldsymbol{\kappa}_t, \quad 0 \le t \le T, \end{align*} $$

where we have used that $\boldsymbol {\kappa }_T \equiv 0$ (and the fact that the conditional expectation is well defined, which is shown in the first part of the proof). Hence, after subtracting the identity from G, we can bring $\boldsymbol {\kappa }_t$ to the left-hand side in equation (4.2). This identity is an equality of tensor series in $\mathcal {T}_0$ and can be projected to yield an equality for each tensor level of the series. As presented in more detail in the following subsection, we see that projecting the latter equation to tensor level, say $n\in {\mathbb {N}_{\ge 1}}$ , the right-hand side only depends on $\boldsymbol {\kappa }^{(k)}$ for $k < n$ , hence giving an explicit representation of $\boldsymbol {\kappa }^{(n)}$ in terms of $\mathbf {X}$ and strictly lower tensor levels of $\boldsymbol {\kappa }$ . Therefore equation (4.2) characterizes $\boldsymbol {\kappa }$ up to a modification and then, by right-continuity, up to indistinguishability. The same argument applies to equation (4.3); we refer to the following subsections for details on the recursion.

Diamond formulation: The functional equations given in Theorem 4.1 above can be phrased in terms of the diamond product between $\mathcal {T}_0$ -valued semimartingales. Writing $\mathbf {J}_t(T) = \sum _{t < u \le T} (\dots )$ for the last (jump) sum in equation (4.2), this equation can be written, thanks to Lemma 2.2, which applies just the same with outer diamonds,

and a similar form may be given for equation (4.3). While one may or may not prefer this equation to equation (4.2), diamonds become very natural in $d=1$ (or upon projection to the symmetric algebra; see also Section 5.2). In this case, $G = \mathrm {Id}$ , $Q = \mathrm {Id} \odot \mathrm {Id}$ ; and with identities of the form

some simple rearrangement, using bilinearity of the diamond product, gives

(4.4) $$ \begin{align} \boldsymbol{\kappa}_t (T) = \mathbb{E}_t \{ \mathbf{X}_{t,T} \} + \frac{1}{2} ((\mathbf{X} + \boldsymbol{\kappa}) \diamond (\mathbf{X} + \boldsymbol{\kappa}))_t(T) + \mathbb{E}_t \{ \mathbf{J}_t(T) \}. \end{align} $$

If we further impose martingality and continuity, we arrive at

$$ \begin{align*}\boldsymbol{\kappa}_t (T) = \frac{1}{2} ((\mathbf{X} + \boldsymbol{\kappa}) \diamond (\mathbf{X} + \boldsymbol{\kappa}))_t(T). \end{align*} $$

4.2 Recursive formulas for signature cumulants

Theorem 4.1 allows for an iterative computation of signature cumulants, trivially started from

$$ \begin{align*} \boldsymbol{\kappa}^{(1)}_t & = \boldsymbol{\mu}^{(1)}_t = \mathbb{E}_t\left(\mathbf{X}^{(1)}_{t, T}\right). \end{align*} $$

The second signature cumulant, obtained from Theorem 4.1 or first principles, reads

$$ \begin{align*} \boldsymbol{\kappa}^{(2)}_t & = \mathbb{E}_t \bigg\{\mathbf{X}^{(2)}_{t,T} + \frac{1}{2}\left\langle \mathbf{X}^{(1)c} \right\rangle_{t,T} +\frac12 \int_{(t,T]}\left[ \boldsymbol{\kappa}^{(1)}_{u-}, \mathrm{d}\boldsymbol{\kappa}^{(1)}_u \right] + \frac{1}{2}\left\langle \boldsymbol{\kappa}^{(1)c} \right\rangle_{t,T} + \left\langle \mathbf{X}^{(1)c}, \boldsymbol{\kappa}^{(1)c} \right\rangle_{t,T} \\ &\hspace{3em}+ \sum_{t<u\le T}\bigg(\frac{1}{2}\left(\Delta \mathbf{X}^{(1)}_u\right)^{2} + \Delta \mathbf{X}^{(1)}_u\Delta\boldsymbol{\kappa}^{(1)}_u + \frac{1}{2}\left(\Delta\boldsymbol{\kappa}^{(1)}_u\right)^{2} \bigg)\bigg\}. \end{align*} $$

For instance, consider the special case with vanishing higher-order components, $\mathbf {X}^{(i)} \equiv 0$ , for $i \ne 1$ , and $\mathbf {X} = \mathbf {X}^{(1)} \equiv M$ , a d-dimensional continuous square-integrable martingale. In this case, $\boldsymbol {\kappa }^{(1)} = \boldsymbol {\mu }^{(1)} \equiv 0$ , and from the very definition of the logarithm relating $\boldsymbol {\kappa }$ and $\boldsymbol {\mu }$ , we have $\boldsymbol {\kappa }^{(2)} = \boldsymbol {\mu }^{(2)} - \frac 12 \boldsymbol {\mu }^{(1)} \boldsymbol {\mu }^{(1)} = \boldsymbol {\mu }^{(2)}$ . It then follows from the Stratonovich–Itô correction that

$$ \begin{align*}\boldsymbol{\kappa}^{(2)}_t = \mathbb{E}_t \int_t^T (M_u - M_t) \circ \mathrm{d} M_u = \frac12 \mathbb{E}_t \left\langle M \right\rangle_{t,T} = \frac12 \mathbb{E}_t \left\langle \mathbf{X}^{(1)} \right\rangle_{t,T}, \end{align*} $$

which is indeed a (very) special case of the general expression for $\boldsymbol {\kappa }^{(2)}$ . We now treat general higher-order signature cumulants.

Corollary 4.2. Let $\mathbf {X}\in \mathscr {H}^{1,N}$ for some $N\in \mathbb {N}_{\ge 1}$ ; then we have

$$ \begin{align*} \boldsymbol{\kappa}^{(1)}_t & = \mathbb{E}_t\left(\mathbf{X}^{(1)}_{t, T}\right), \end{align*} $$

for all $0 \le t \le T$ and for $n \in \{2, \dotsc , N\}$ , we have recursively (the right-hand side only depends on $\boldsymbol {\kappa }^{(j)},j<n$ )

(4.5) $$ \begin{align} \boldsymbol{\kappa}^{(n)}_t &= \mathbb{E}_t\left(\mathbf{X}^{(n)}_{t,T }\right) + \frac12\sum_{k = 1}^{n-1}\mathbb{E}_t\left( \left\langle \mathbf{X}^{(k)c}, \mathbf{X}^{(n-k)c} \right\rangle_{t,T}\right) \nonumber\\ & \quad +\sum_{|\ell|\ge 2, \; \|\ell\|=n}\mathbb{E}_t\Big(\mathrm{Mag}(\boldsymbol{\kappa}; \ell)_{t,T} + \mathrm{Qua}(\boldsymbol{\kappa}; \ell)_{t,T} + \mathrm{Cov}(\mathbf{X}, \boldsymbol{\kappa}; \ell)_{t,T} + \mathrm{Jmp}(\mathbf{X},\boldsymbol{\kappa}; \ell)_{t,T}\Big) \end{align} $$

with $\ell = (l_1, \dotsc , l_k)$ , $l_i \in {\mathbb {N}_{\ge 1}}$ , $|\ell |:= k\in {\mathbb {N}_{\ge 1}}$ , $\|\ell \|:= l_1 + \dotsb + l_k$ and

Proof. As in the proof of Theorem 4.1 above, in equation (4.2), we can separate the identity from G and bring the resulting $\boldsymbol {\kappa }_t$ to the left-hand side. Projecting to the tensor level n, and using that the projection can be interchanged with taking the expectation, we obtain the following equation

$$ \begin{align*} \boldsymbol{\kappa}^{(n)}_t = \mathbb{E}_t\left\{\mathbf{X}^{(n)}_{t,T} + \frac{1}{2} \left\langle \mathbf{X}^{c} \right\rangle^{(n)}_{t,T} + \mathbf{Y}^{(n)}_{t,T} + \mathbf{V}^{(n)}_{t,T} + \mathbf{C}^{(n)}_{t,T} + \mathbf{J}^{(n)}_{t,T}\right\}, \end{align*} $$

for all $0 \le t \le T$ , where $\mathbf {Y}\in \mathscr {S}(\mathcal {T}_0^{N})$ and $\mathbf {V}, \mathbf {C}, \mathbf {J} \in \mathscr {V}(\mathcal {T}_0^{N})$ are defined in equation (7.9). Note that we can take the conditional expectation of each term separately, as $\mathbf {X}^{(n)}$ has sufficient integrability by the assumption, and the integrability of the remaining terms is shown in the proof of Theorem 4.1 below, more precisely in equation (7.27). The recursion then follows by spelling out the explicit composition for each of the terms appearing in the above equation. For the quadratic variation term, we may easily verify from the bilinearity of the covariation bracket that

$$ \begin{align*} \left\langle \mathbf{X}^{c} \right\rangle^{(n)}_{t,T} = \pi_n\sum_{k,j = 1}^{n}\left\langle \mathbf{X}^{(k)c}, \mathbf{X}^{(j)c} \right\rangle_{t,T} = \sum_{k=1}^{n-1}\left\langle \mathbf{X}^{(k)c}, \mathbf{X}^{(n-k)c} \right\rangle_{t,T}. \end{align*} $$

We can also take the conditional expectation of each term in the above right-hand side separately, as the integrability of these terms follows from the assumptions on $\mathbf {X}$ and Lemma 7.1. The composition of the terms $\mathbf {Y}^{(n)}, \mathbf {V}^{(n)}$ and $\mathbf {C}^{(n)}$ follows from the explicit form of the stochastic Itô integral of a power series of adjoint operations with respect to a tensor valued semimartingale in Section 2.4, more specifically in equation (2.10). We will demonstrate this for the term $\mathbf {Y}^{(n)}$ in more detail:

for all $0 \le t \le T$ . The composition of the terms $\mathbf {V}^{(n)}$ and $\mathbf {C}^{(n)}$ , respectively, in terms of $\mathrm {Qua}(\mathbf {X}, \boldsymbol {\kappa })$ and $\mathrm {Cov}(\mathbf {X}, \boldsymbol {\kappa })$ follows analogously. It then remains to show that the term $\mathbf {J}^{(n)}$ can be composed in terms of $\mathrm {Jmp}(\mathbf {X}, \boldsymbol {\kappa })$ , which is, however, a simple combinatorial exercise.

We obtain another recursion for the signature cumulants from projecting the functional equation (4.3). Note that, apart from the first two levels, it is far from trivial to see that the following recursion is equivalent to the recursion in Corollary 4.2.

Corollary 4.3. Let $\mathbf {X}\in \mathscr {H}^{1,N}$ for some $N\in \mathbb {N}_{\ge 1}$ ; then we have

(4.6) $$ \begin{align} \boldsymbol{\kappa}^{(n)}_t = \mathbb{E}_t\left(\mathbf{X}^{(n)}_{t,T}\right) + \sum_{ |\ell|\ge 2,\; ||\ell||=n} \mathbb{E}_t\bigg( \mathrm{HMag}^{1}(\mathbf{X}, \boldsymbol{\kappa}; \ell)_{t,T} + \frac{1}{2}\mathrm{HMag}^{2}(\mathbf{X}, \boldsymbol{\kappa}; \ell)_{t,T} + \mathrm{HQua}(\boldsymbol{\kappa}; \ell)_{t,T}\nonumber\\ {} + \mathrm{HCov}(\mathbf{X}, \boldsymbol{\kappa}; \ell)_{t,T}+ \mathrm{HJmp}(\mathbf{X}, \boldsymbol{\kappa}; \ell)_{t,T}\bigg) \end{align} $$

with $\ell = (l_1, \dotsc , l_k)$ , $l_i \ge 1$ , $|\ell |=k$ , $||\ell ||=l_1 + \dotsb + l_k$ and

Proof. The recursion follows from projecting equation (4.3) to each tensor level, analogously to the way that the recursion of Corollary 4.2 follows from equation (4.2) (see the proof of Corollary 4.2).

Diamonds. All recursions here can be rewritten in terms of diamonds. In a first step, by definition, the second term in Corollary 4.2 can be rewritten as

$$\begin{align*}\frac12\sum_{k=1}^{n-1}(\mathbf{X}^{(k)}\diamond\mathbf{X}^{(n-k)})_t(T). \end{align*}$$

Thanks to Lemma 2.2, we may also write

Similarly,

Inserting these expressions into equation (4.6), we may obtain a ‘diamond’ form of the recursions in H form.

When $d=1$ (or in the projection onto the symmetric algebra; see also Section 5.2), the recursions take a particularly simple form: for $d=1$ the algebra is commutative, so that $\operatorname {\mathrm {ad}}\mathbf {x}\equiv 0$ for all $\mathbf {x}\in \mathcal {T}_0$ . Equation (4.5) then becomes

$$ \begin{align*}\boldsymbol{\kappa}^{(n)}_t (T) = \mathbb{E}_t\left(\mathbf{X}_{t,T}^{(n)}\right)+\frac12\sum_{k=1}^{n-1} ( (\mathbf{X}^{(k)} + \boldsymbol{\kappa}^{(k)} ) \diamond (\mathbf{X}^{(n-k)} + \boldsymbol{\kappa}^{(n-k)}))_t(T) + \mathbb{E}_t\left( \mathbf{J}^{(n)}_t (T) \right), \end{align*} $$

where $\mathbf {J}^{(n)}_t(T) = \sum _{|\ell |\ge 2, \; \|\ell \|=n} \mathrm {Jmp}(\mathbf{X},\boldsymbol {\kappa }; \ell )_{t,T}$ contains the nth tensor component of the jump contribution. The above diamond recursion can also be obtained by projecting the functional relation (4.4) to the nth tensor level. We shall revisit this in a multivariate setting and comment on related works in Section 5.2.

5 Two special cases

5.1 Variations on Hausdorff, Magnus and Baker–Campbell–Hausdorff

We now consider a deterministic driver $\mathbf {X}$ of finite variation. This includes the case when $\mathbf {X}$ is absolutely continuous, in which case we recover, up to a harmless time reversal, $t \leftrightarrow T-t$ , Hausdorff’s ODE and the classical Magnus expansion for the solution to a linear ODE in a Lie group [Reference Hausdorff32, Reference Magnus48, Reference Chen11, Reference Iserles and Nørsett33]. Our extension with regard to discontinuities seems to be new and somewhat unifies Hausdorff’s equation with multivariate Baker–Campbell–Hausdorff integral formulas.

Theorem 5.1. Let $\mathbf {X} \in \mathscr {V} (\mathcal {T}_0)$ , and more specifically $\mathbf {X}\colon [0,T] \to \mathcal {T}_0$ deterministic, càdlàg of bounded variation. The log-signature $\Omega _t=\Omega _{t}(T):= \log (\mathrm {Sig}(\mathbf {X})_{t,T})$ satisfies the integral equation

(5.1) $$ \begin{align} \Omega_{t}(T) &= \int_t^{T}H(\operatorname{\mathrm{ad}}{\Omega_{u-}})(\mathrm{d}\mathbf{X}^c_u) + \sum_{t<u\le T} \int_0^1\Psi(\exp(\theta\operatorname{\mathrm{ad}} \Delta \mathbf{X}_u)\circ\exp(\operatorname{\mathrm{ad}}\Omega_u))(\Delta \mathbf{X}_u)\,\mathrm d\theta, \end{align} $$

with $\Psi (z):= H(\log z)={\log z}/{(z-1)}$ as in the introduction. The sum in equation (5.1) is absolutely convergent, over (at most countably many) jump times of $\mathbf {X}$ , and vanishes when $\mathbf {X} \equiv \mathbf {X}^c$ , in which case equation (1.4) reduces to Hausdorff’s ODE.

(i) The accompanying Jump Magnus expansion becomes $\Omega ^{(1)}_{t}(T) = \mathbf {X}^{(1)}_{t,T}$ followed by

$$ \begin{align*}\Omega^{(n)}_{t}(T) = \mathbf{X}^{(n)}_{t,T} + \sum_{|\ell|\ge 2, \Vert\ell\Vert=n}\left(\mathrm{HMag}^{1}(\mathbf{X}, \Omega; \ell)_{t,T} + \mathrm{HJmp}(\mathbf{X}, \Omega; \ell)_{t,T}\right) \end{align*} $$

where the right-hand side only depends on $\Omega ^{(k)}, k<n$ .

(ii) If $\mathbf {X} \in \mathscr {V} (V)$ for some linear subspace $V \subset \mathcal {T}_0 = T_0\mathopen {(\mkern -3mu(}\mathbb {R}^d\mathclose {)\mkern -3mu)}$ , it follows that, for all $t\in [0,T]$ ,

$$ \begin{align*}\Omega_{t}(T) \in \mathcal{L} := \mathrm{Lie}\mathopen{(\mkern-3mu(} V\mathclose{)\mkern-3mu)} \subset \mathcal{T}_0, \qquad \mathrm{Sig}(\mathbf{X})_{t,T} \in \exp (\mathcal{L}) \subset \mathcal{T}_1, \end{align*} $$

and we say that $\Omega _{t}(T)$ is Lie in V. In case $V=\mathbb {R}^d$ , one speaks of (free) Lie series; see also [Reference Lyons45, Def. 6.2].

Proof. Since we are in a purely deterministic setting, the signature cumulant coincides with the log-signature $\boldsymbol {\kappa }_t(T) = \Omega _{t}(T)$ , and Theorem 4.1 applies without any expectation and angle brackets.

Using $\Delta \Omega _u = \Omega _{u} - \Omega _{u-} = \Omega _u - \log (\mathrm e^{\Delta \mathbf {X}_u}\mathrm e^{\Omega _u})$ , we see that

$$ \begin{align*} \Omega_{t}(T) &= \int_t^{T}H(\operatorname{\mathrm{ad}}{\Omega_{u-}})(\mathrm{d}\mathbf{X}^c_u) - \sum_{t<u\le T} \Delta\Omega_u \\ &= \int_t^{T}H(\operatorname{\mathrm{ad}}{\Omega_{u-}})(\mathrm{d}\mathbf{X}^c_u) - \sum_{t<u\le T} \left( \Omega_u - \operatorname{BCH}(\Delta \mathbf{X}_u,\Omega_u) \right) \\ & = \int_t^{T}H(\operatorname{\mathrm{ad}}{\Omega_{u-}})(\mathrm{d}\mathbf{X}^c_u) + \sum_{t<u\le T} \int_0^1\Psi(\exp(\theta \operatorname{\mathrm{ad}}\Delta \mathbf{X}_u)\circ\exp(\operatorname{\mathrm{ad}}\Omega_u))(\Delta\mathbf{X}_u)\,\mathrm d\theta, \end{align*} $$

where we used the identity

(5.2) $$ \begin{align} \operatorname{BCH}(\mathbf{x}_1,\mathbf{x}_2) \!-\! \mathbf{x}_2 \!=\! \log(\exp(\mathbf{x}_1)\exp(\mathbf{x}_2))\!-\! \mathbf{x}_2 \!=\! \int_0^1\Psi(\exp(\theta\operatorname{\mathrm{ad}}\mathbf{x}_1)\circ\exp(\operatorname{\mathrm{ad}}\mathbf{x}_2))(\mathbf{x}_1)\,\mathrm d\theta. \end{align} $$

Remark 5.2 (Baker–Campbell–Hausdorff)

The identity equation (5.2) is well-known but also easy to obtain en passant, thereby rendering the above proof self-contained. We treat directly the n-fold case. Given $\mathbf {x}_1,\dotsc ,\mathbf {x}_n\in \mathcal {T}_0$ , one defines a continuous piecewise affine linear path $(\mathbf {X}_t: 0 \le t \le n)$ with $\mathbf {X}_i - \mathbf {X}_{i-1} = \mathbf {x}_i$ . Then $\mathrm {Sig} ( \mathbf {X} |_{[i-1,i]})=\mathrm {Sig} ( \mathbf {X})_{i-1,i}=\exp (\mathbf {x}_i)$ and by Lemma 2.5 we have

$$\begin{align*}\Omega_{0}=\log\left( \exp(\mathbf{x}_1)\dotsm\exp(\mathbf{x}_n) \right)=:\operatorname{BCH}(\mathbf{x}_1,\dotsc,\mathbf{x}_n). \end{align*}$$

A computation based on Theorem 5.1, but now applied without jumps, reveals the general form

$$ \begin{align*} \operatorname{BCH}(\mathbf{x}_1,\dotsc,\mathbf{x}_n) &= \mathbf{x}_n + \sum_{k=1}^{n-1}\int_0^1\Psi(\exp(\theta\operatorname{\mathrm{ad}}\mathbf{x}_k)\circ\exp(\operatorname{\mathrm{ad}}\mathbf{x}_{k+1})\circ\dotsm\circ\exp(\operatorname{\mathrm{ad}}\mathbf{x}_n))(\mathbf{x}_k)\,\mathrm d\theta \\ &=\sum_i\mathbf{x}_i+\frac12\sum_{i<j}[\mathbf{x}_i,\mathbf{x}_j]+\frac{1}{12}\sum_{i<j}([\mathbf{x}_i,[\mathbf{x}_i,\mathbf{x}_j]]+[\mathbf{x}_j,[\mathbf{x}_j,\mathbf{x}_i]])\\&\quad+\frac16\sum_{i<j<k}([\mathbf{x}_i,[\mathbf{x}_j,\mathbf{x}_k]]-[\mathbf{x}_k,[\mathbf{x}_i,\mathbf{x}_j]])-\frac1{24}\sum_{i<j}[\mathbf{x}_i,[\mathbf{x}_j,[\mathbf{x}_i,\mathbf{x}_j]]]\dotsb \end{align*} $$

The flexibility of our Theorem 5.1 is then nicely illustrated by the fact that this n-fold BCH formula is an immediate consequence of equation (5.1), applied to a piecewise constant càdlàg path $(\mathbf {X}_t: 0 \le t \le n)$ with $\mathbf {X}_\cdot - \mathbf {X}_{i-1} \equiv \mathbf {x}_i$ on $[i-1,i)$ .

5.2 Diamond relations for multivariate cumulants

As in Section 2.3, we write $\mathcal {S}$ for the symmetric algebra over $\mathbb {R}^d$ , and $\mathcal {S}_0,\mathcal {S}_1$ for those elements with scalar component $0,1$ , respectively. Recall the exponential map $\exp : \mathcal {S}_0\to \mathcal {S}_1$ with globally defined inverse $\log $ . Following Definition 2.1, the diamond product for $\mathcal {S}_0$ -valued semimartingales $\hat {\mathbf {X}}, \hat {\mathbf {Y}}$ is another $\mathcal {S}_0$ -valued semimartingale given by

$$ \begin{align*}(\hat{\mathbf{X}} \diamond \hat{\mathbf{Y}})_t(T) = \mathbb{E}_t \big( \langle \hat{\mathbf{X}}^c, \hat{\mathbf{Y}}^c \rangle_{t,T} \big) = \sum ( \mathbb{E}_t \langle \hat{\mathbf{X}}^{w_1}, \hat{\mathbf{Y}}^{w_2} \rangle_{t,T}) \hat{e}_{w_1} \hat{e}_{w_2}, \end{align*} $$

with summation over all $w_1,w_2 \in \widehat {\mathcal {W}}_d$ , provided all brackets are integrable. This trivially adapts to $\mathcal {S}^N$ -valued semimartingales, $N\in \mathbb {N}_{\ge 1}$ , in which case all words have length at most N; the summation is restricted accordingly to $|w_1|+|w_2| \le N$ .

Theorem 5.3. (i) Let $\Xi = (0, \Xi ^{(1)},\Xi ^{(2)},\ldots )$ be an $\mathcal {F}_T$ -measurable random variable with values in $\mathcal {S}_0 (\mathbb {R}^d)$ , componentwise in $\mathcal {L}^{\infty -}$ . Then

$$\begin{align*}\mathbb{K}_t(T) := \log \mathbb{E}_t \exp (\Xi) \end{align*}$$

satisfies the following functional equation, for all $0 \le t \le T$ ,

(5.3) $$ \begin{align} \mathbb{K}_t(T) = \mathbb{E}_t \Xi + \frac{1}{2} (\mathbb{K} \diamond \mathbb{K})_t(T) + \mathbb{J}_t(T) \end{align} $$

with jump component,

$$ \begin{align*} \mathbb{J}_t(T) = \mathbb{E}_t \left( \sum_{t < u \le T} \left( e^{\Delta \mathbb{K}_u} - 1 - \Delta \mathbb{K}_u \right)\right) =\mathbb{E}_t\left( \sum_{t < u \le T} \left( \frac{1}{2!}(\Delta \mathbb{K}_u)^2 + \frac{1}{3!} (\Delta\mathbb{K}_u)^3 + \dotsb \right)\right). \end{align*} $$

Furthermore, if $N\in \mathbb {N}_{\ge 1}$ , and $\Xi =(\Xi ^{(1)},\ldots ,\Xi ^{(N)})$ is $\mathcal {F}_T$ -measurable with graded integrability condition

(5.4) $$ \begin{align} \left\Vert {\Xi^{(n)}} \right\Vert _{\mathcal{L}^{N/n}} < \infty, \qquad n=1,\ldots,N, \end{align} $$

then the identity equation (5.3) holds for the cumulants up to level N: that is, for $\mathbb {K}^{(0,N)} := \log (\mathbb {E}_t\exp (\Xi ^{(0,N)}))$ with values in $\mathcal {S}^{(N)}_0 (\mathbb {R}^d)$ .

Remark 5.4. Identity equation (5.3) is reminiscent of the quadratic form of the generalized Riccati equations for affine jump diffusions. The relation will be presented more explicitly in Remark 6.9 of Section 6.2.2, when the involved processes are assumed to have a Markov structure and the functional signature cumulant equation reduces to a PIDE system. The framework described here, however, requires neither Markov nor affine structure. We will show in Section 6.3 that such computations are also possible in the fully non-commutative setting: that is, to obtain signature cumulants of affine Volterra processes.

Proof. We first observe that since $\Xi \in \mathcal L^{\infty -}$ , by Doob’s maximal inequality and the BDG inequality, we have that $\hat {\mathbf {X}}_t:=\mathbb {E}_t\Xi $ is a martingale in $\mathscr {H}^{\infty -}(\mathcal {S}_0)$ . In particular, thanks to Theorem 3.2, the signature moments are well defined. According to Section 3.2, the signature is then given by

$$\begin{align*}\mathrm{Sig}(\hat{\mathbf{X}})_{t,T}=\exp(\Xi-\mathbb{E}_t\Xi), \end{align*}$$

hence $\hat {\boldsymbol {\kappa }}_t(T)=\mathbb {K}_t(T)-\hat {\mathbf {X}}_t$ .

Projecting equation (4.3) onto the symmetric algebra yields

$$ \begin{align*} \hat{\boldsymbol{\kappa}}_t(T)&=\mathbb{E}_t\Bigg\{\hat{\mathbf{X}}_{t,T}+\frac12\langle\hat{\mathbf{X}}^c\rangle_{t,T}+\frac12\langle\hat{\boldsymbol{\kappa}}(T)^c\rangle_{t,T}+\langle\hat{\mathbf{X}}^c,\hat{\boldsymbol{\kappa}}(T)^c\rangle_{t,T}\Bigg.\\ \Bigg.&\quad+\sum_{t<u\le T}\left( e^{\Delta\hat{\mathbf{X}}_u+\Delta\hat{\boldsymbol{\kappa}}_u(T)}-1-\Delta\hat{\mathbf{X}}_u-\Delta\hat{\boldsymbol{\kappa}}_u(T) \right)\Bigg\} \\&= \mathbb{E}_t\left\{\Xi+ \frac12\langle\mathbb{K}(T)^c\rangle_{t,T} +\sum_{t<u\le T}\left( e^{\Delta\mathbb{K}_u(T)}-1-\Delta\mathbb{K}_u(T) \right) \right\}-\hat{\mathbf{X}}_t, \end{align*} $$

and equation (5.3) follows upon recalling that $(\mathbb {K}\diamond \mathbb {K})_t(T)=\mathbb {E}_t\langle \mathbb {K}(T)^c\rangle _{t,T}$ . The proof of the truncated version is left to the reader.

As a corollary, we provide a general view on recent results of [Reference Alos, Gatheral and Radoičić3, Reference Lacoin, Rhodes and Vargas42, Reference Friz, Gatheral and Radoičić23]. Note that we also include jump terms in our recursion.

Corollary 5.5. The conditional multivariate cumulants $(\mathbb {K}_t)_{0\le t\le T}$ of a random variable $\Xi $ with values in $\mathcal {S}_0(\mathbb {R}^d)$ , componentwise in $\mathcal L^{\infty -}$ , satisfy the recursion

(5.5) $$ \begin{align} \mathbb{K}^{(1)}_t = \mathbb{E}_t(\Xi^{(1)}) \quad \text{and} \quad \mathbb{K}^{(n)}_t = \mathbb{E}_t(\Xi^{(n)})+\frac{1}{2}\sum_{k=1}^{n-1}\left( \mathbb{K}^{(k)} \diamond \mathbb{K}^{(n-k)}\right)_t(T)+\mathbb J^{(n)}_t(T) \quad \text{ for } \quad n \ge 2, \end{align} $$

with

$$\begin{align*}\mathbb J^{(n)}_t(T)=\mathbb{E}_t\left( \sum_{t<u\le T}\sum_{k=2}^n\frac{1}{k!}\sum_{\|\ell\|=n,|\ell|=k}\Delta\mathbb{K}^{(\ell_1)}_u(T)\dotsm\Delta\mathbb{K}_u^{(\ell_k)}(T) \right). \end{align*}$$

The analogous statement holds true in the N-truncated setting: that is, as a recursion for $n=1,\dotsc ,N$ under the condition in equation (5.4).

Example 5.6 (Continuous setting)

In the absence of jumps and of higher-order information (i.e., $\mathbb {J} \equiv 0$, $\Xi ^{(2)} = \Xi ^{(3)} = \ldots \equiv 0$ ), this type of cumulant recursion appears in [Reference Lacoin, Rhodes and Vargas42], and, under the optimal integrability condition that $\Xi ^{(1)}$ has finite Nth moments, in [Reference Friz, Gatheral and Radoičić23]. (The latter requires a localization argument that is avoided here by directly working in the correct algebraic structure.)

Example 5.7 (Discrete filtration)

In contrast to the previous continuous example, we consider a purely discrete situation, starting from a discretely filtered probability space with filtration $(\mathcal {F}_t\colon t = 0,1,\dotsc ,T \in {\mathbb {N}})$ . For $\Xi $ as in Corollary 5.5, a discrete martingale is defined by $\mathbb {E}_t \exp (\Xi )$ , which we may regard as a càdlàg semimartingale with respect to $\mathcal {F}_t := \mathcal {F}_{[t]}$ , and similarly for $\mathbb {K}_t(T) = {\log \mathbb {E}_t \exp (\Xi ) \in \mathcal {S}_0}$ : that is, the conditional cumulants of $\Xi $ . Clearly, the continuous martingale part of $\mathbb {K}(T)$ vanishes, as does any diamond product with $\mathbb {K}(T)$ . What remains is the functional equation

$$ \begin{align*}\mathbb{K}_t(T) = \mathbb{E}_t (\Xi) + \mathbb{J}_t(T) = \mathbb{E}_t (\Xi) + \mathbb{E}_t \bigg( \sum_{u=t+1}^T \big( \exp(\Delta \mathbb{K}_u) - 1 - \Delta \mathbb{K}_u \big)\bigg). \end{align*} $$

As before, the resulting expansions are of interest. On the first level, trivially, $\mathbb {K}^{(1)}_t = \mathbb {E}_t(\Xi ^{(1)})$ , whereas on the second level we see

$$ \begin{align*}\mathbb{K}_t^{(2)}(T) = \mathbb{E}_t(\Xi^{(2)})+ \mathbb{E}_t \bigg( \sum_{u=t+1}^T (\mathbb{E}_u (\Xi^{(1)})-\mathbb{E}_{u-1} (\Xi^{(1)}))^2 \bigg), \end{align*} $$

which one can recognize, in case $\Xi ^{(2)} = 0$ , as an energy identity for the discrete square-integrable martingale $\ell _u := \mathbb {E}_u \Xi ^{(1)}$ . Going further in the recursion yields increasingly non-obvious relations. Taking $\Xi ^{(2)} = \Xi ^{(3)} = \ldots \equiv 0$ for notational simplicity gives

$$ \begin{align*}\mathbb{K}_t^{(3)}(T) = \mathbb{E}_t \left( \sum_{u = t + 1}^T (\ell_u - \ell_{u - 1})^3 + 3 (\ell_u - \ell_{u - 1}) \{ \mathbb{E}_u \kappa (\ell, \ell)_{u, T} -\mathbb{E}_{u - 1} \kappa (\ell, \ell)_{u - 1, T} \} \right). \end{align*} $$

It is interesting to note that related identities have appeared in the statistics literature under the name Bartlett identities; see also Mykland [Reference Mykland53] and the references therein.

5.3 Remark on tree representation

As illustrated in the previous section, in the case where $d=1$ , or when projecting onto the symmetric algebra, our functional equation takes a particularly simple form (see Theorem 5.3). If one further specializes the situation, in particular discards all jumps, we are from an algebraic perspective in the setting of Friz, Gatheral and Radoičić [Reference Friz, Gatheral and Radoičić23], who give a tree series expansion of cumulants using binary trees. This representation follows from the fact that the diamond product of semimartingales is commutative but not associative. As an example (with notations taken from Section 5.2), in case of a one-dimensional continuous martingale, the first terms are

This expansion is organized (graded) in terms of the number of leaves in each tree, and each leaf represents the underlying martingale.

In the deterministic case, tree expansions are also known for the Magnus expansion [Reference Iserles and Nørsett33] and the BCH formula [Reference Casas and Murua9]. These expansions are likewise in terms of binary trees, but different from the ones above: the trees are required to be non-planar to account for the non-commutativity of the Lie algebra. As an example (with the notations of Section 5.1), we have

In this expansion, the nodes represent the underlying vector field and edges represent integration and application of the Lie bracket, coming from the $\operatorname {\mathrm {ad}}$ operator.

Our functional equation and the associated recursion thus put both contexts into a single common framework. We suspect that our general recursion, Corollary 4.2 and thereafter, allows for a sophisticated tree representation, at least in the absence of jumps, and propose to return to this question in future work.

6 Applications

6.1 Brownian and stopped Brownian signature cumulants

6.1.1 Time dependent Brownian motion

Let B be an m-dimensional standard Brownian motion defined on a probability space $(\Omega , \mathcal {F}, \mathbb {P})$ with the canonical filtration $(\mathcal {F}_t)_{t\ge 0}$ , and define the continuous (Gaussian) martingale $X = (X_t)_{0\le t\le T}$ by

$$ \begin{align*} X_t = \int_0^{t} \sigma(u)\,\mathrm{d} B_u, \quad 0 \le t \le T, \end{align*} $$

with $\sigma \in L^2 ([0, T],\mathbb {R}^{d\times m})$ . The quadratic variation of X is finite and deterministic, and therefore we immediately see that the integrability condition $\mathbf {X} = (0, X, 0, \dots )\in \mathscr {H}^{\infty -}$ is trivially satisfied, and thus Theorem 4.1 applies. The Brownian signature cumulant $\boldsymbol {\kappa }_t(T) = \log (\mathbb {E}_t(\mathrm {Sig}(\mathbf {X})_{t,T}))$ satisfies the functional equation, with $\mathbf {a}(t) := \sigma (t)\sigma (t)^T \in \mathrm {Sym}({\mathbb {R}^d} \otimes {\mathbb {R}^d}),$

(6.1) $$ \begin{align} \boldsymbol{\kappa}_t(T) = \frac{1}{2}\int_t^{T} H(\operatorname{\mathrm{ad}}{\boldsymbol{\kappa}_u(T)})(\mathbf{a}(u)) \mathrm{d} u, \quad 0 \le t \le T. \end{align} $$

Therefore the tensor levels are precisely given by the Magnus expansion, starting with

$$ \begin{align*}\boldsymbol{\kappa}^{(1)}_t(T) = 0,\quad \boldsymbol{\kappa}^{(2)}_t(T) = \frac{1}{2}\int_t^T \mathbf{a}(u) \mathrm{d} u,\end{align*} $$

and the general term

$$ \begin{align*} \boldsymbol{\kappa}^{(2n-1)}_t(T) \equiv 0, \quad \boldsymbol{\kappa}^{(2n)}_t(T) &= \frac{1}{2} \sum_{|\ell|\ge2, \Vert\ell\Vert=2n} \mathrm{HMag}^{2}(\mathbf{X}, \boldsymbol{\kappa}; \ell)_{t,T} \\ &= \frac{1}{2}\sum_{\Vert\ell\Vert=n-1} \frac{B_{k}}{k!} \int_t^{T} \operatorname{\mathrm{ad}}{\boldsymbol{\kappa}^{(2 l_1)}_{u}} \cdots \operatorname{\mathrm{ad}}{\boldsymbol{\kappa}^{(2 l_{k})}_{u}} \left(\mathbf{a}(u)\right)\mathrm{d} u, \end{align*} $$

where $k = |\ell|$ denotes the length of $\ell = (l_1, \dotsc, l_k)$.

Note that $\boldsymbol {\kappa }_t(T)$ is Lie in $\mathrm {Sym}({\mathbb {R}^d} \otimes {\mathbb {R}^d}) \subset \mathcal {T}_0$ , but, in general, not a Lie series. In the special case $X=B$ , that is, $m=d$ and identity matrix $\sigma = \mathbf {I}_d= \sum _{i=1}^{d}e_{ii}\in \mathrm {Sym}({\mathbb {R}^d}\otimes {\mathbb {R}^d})$ , all commutators vanish, and we obtain what is known as Fawcett’s formula [Reference Fawcett21, Reference Friz and Hairer24]:

$$ \begin{align*} \boldsymbol{\kappa}_t(T) = \tfrac{1}{2} (T-t)\mathbf{I}_d \,. \end{align*} $$

Example 6.1. Consider $B^1,B^2$ two Brownian motions on the filtered space $(\Omega ,\mathcal F,\mathbb P)$ , with correlation $\mathrm {d}\langle B^1,B^2\rangle _t=\rho \,\mathrm {d} t$ for some fixed constant $\rho \in [-1,1]$ . Suppose that $K^1,K^2\colon [0,\infty )^2\to \mathbb {R}$ are two kernels such that $K^i(t,\cdot )\in L^2([0,t])$ for all $t\in [0,T]$ , and set

$$\begin{align*}X^i_t:= X_0^i+\int_0^tK^i(t,s)\,\mathrm{d} B^i_s,\quad i=1,2 \end{align*}$$

for some fixed initial values $X^1_0,X^2_0$ . Note that neither process is a semimartingale in general. However, for each $T>0$ , the process $\xi ^i_t(T):=\mathbb {E}_t[X^i_T]$ is a martingale, and we have

$$\begin{align*}\xi^i_t(T)=X^i_0+\int_0^tK^i(T,s)\,\mathrm{d} B^i_s, \end{align*}$$

that is, $(\xi ^1,\xi ^2)$ is a time-dependent Brownian motion as defined above. In particular, one sees that

$$\begin{align*}\mathbf{a}(t)=\begin{pmatrix}\int_0^tK^1(T,u)^2\,\mathrm{d} u&\rho\int_0^tK^1(T,u)K^2(T,u)\,\mathrm{d} u\\\rho\int_0^tK^1(T,u)K^2(T,u)\,\mathrm{d} u&\int_0^tK^2(T,u)^2\,\mathrm{d} u\end{pmatrix}. \end{align*}$$

Equation (6.1) and the paragraph below it then give an explicit recursive formula for the signature cumulants, the first of which are given by

$$ \begin{align*} \boldsymbol{\kappa}_t^{(1)}(T)&= 0,\\ \boldsymbol{\kappa}_t^{(2)}(T)&= \frac12\begin{pmatrix}\int_t^T\int_0^uK^1(T,r)^2\,\mathrm{d} r\mathrm{d} u&\rho\int_t^T\int_0^uK^1(T,r)K^2(T,r)\,\mathrm{d} r\mathrm{d} u\\[1ex]\rho\int_t^T\int_0^uK^1(T,r)K^2(T,r)\,\mathrm{d} r\mathrm{d} u&\int_t^T\int_0^uK^2(T,r)^2\,\mathrm{d} r\mathrm{d} u\end{pmatrix},\\ \boldsymbol{\kappa}_t^{(3)}(T)&= 0,\\ \boldsymbol{\kappa}_t^{(4)}(T)&= \frac1{8}\sum_{i,j,i',j'=1}^2\left[\int_t^T\int_u^T\left( \mathbf{a}^{ij}(u)\mathbf{a}^{i'j'}(r)-\mathbf{a}^{i'j'}(u)\mathbf{a}^{ij}(r)\right)\,\mathrm{d} r\mathrm{d} u\right]e_{iji'j'}. \end{align*} $$

We notice that in the particular case when $K^1=K^2\equiv K$ , the matrix $\mathbf {a}$ has the form

$$\begin{align*}\mathbf{a}(t)=\int_0^tK(T,u)^2\,\mathrm{d} u\times\begin{pmatrix}1&\rho\\\rho&1\end{pmatrix}. \end{align*}$$

Therefore, we have $ \mathbf {a}(t)\otimes \mathbf {a}(t')- \mathbf {a}(t')\otimes \mathbf {a}(t)=0$ for any $t,t'\in [0,T]$ . Hence, in this case, our recursion shows that for any $\rho \in [-1,1]$ ,

$$\begin{align*}\boldsymbol{\kappa}_t^{(1)}(T)=0,\quad\boldsymbol{\kappa}_t^{(2)}(T)=\frac12\int_t^T\int_0^uK(T,r)^2\,\mathrm{d} r\,\mathrm{d} u\times\begin{pmatrix}1&\rho\\\rho&1\end{pmatrix}, \end{align*}$$

and $\boldsymbol {\kappa }_t^{(n)}(T)=0$ for all $0\le t\le T$ and $n\ge 3$ .

6.1.2 Brownian motion up to the first exit time from a domain

Let $B=(B_t)_{t\ge 0}$ be a d-dimensional Brownian motion defined on a probability space $(\Omega , \mathcal {F}, \mathbb {P})$ with a possibly random starting value $B_0$ . Assume also that there is a family of probability measures $\{\mathbb {P}^x\}_{x\in {\mathbb {R}^d}}$ on $(\Omega , \mathcal {F})$ such that $\mathbb {P}^x(B_0 = x) = 1$ , and denote by $\mathbb {E}^x$ the expectation with respect to $\mathbb {P}^x$ . We define the canonical Brownian filtrationFootnote 8 by $(\mathcal {F}_t)_{t\ge 0} = (\mathcal {F}^{B}_t)_{t\ge 0}$ . Further let $\Gamma \subset {\mathbb {R}^d}$ be a bounded domain, and define the stopping time $\tau _\Gamma $ of the first exit of B from the domain $\Gamma $ : that is,

$$ \begin{align*} {\tau_\Gamma} = \inf\{t\ge0 \;\vert\; B_t \in \Gamma^{c}\}. \end{align*} $$

In [Reference Lyons and Ni46], Lyons–Ni exhibit an infinite system of partial differential equations for the expected signature of the Brownian motion until the exit time as a functional of the starting point. The following result can be seen as the corresponding result for the signature cumulant, which follows directly from the expansion in Theorem 1.1. Recall that a boundary point $x \in \partial \Gamma $ is called regular if and only if

(6.2) $$ \begin{align} \mathbb{P}^x\big( \inf\{t> 0 \;\vert\; B_t \in \Gamma^{c}\} = 0\big) = 1. \end{align} $$

The domain $\Gamma $ is called regular if all points on the boundary are regular. For example, domains with smooth boundary are regular; see [Reference Karatzas and Shreve38, Section 4.2.C] for a further characterization of regularity.

Corollary 6.2. Let $\Gamma \subset {\mathbb {R}^d}$ be a regular domain, such that

(6.3) $$ \begin{align} \sup_{x\in\Gamma}\mathbb{E}^x(\tau_\Gamma^n)<\infty, \quad n\in{\mathbb{N}_{\ge1}}. \end{align} $$

The signature cumulant $\boldsymbol {\kappa }_t = \log (\mathbb {E}(\mathrm {Sig}(B)_{t\wedge {\tau _\Gamma }, {\tau _\Gamma }}))$ of the Brownian motion B up to the first exit from the domain $\Gamma $ has the following form

$$ \begin{align*} \boldsymbol{\kappa}_t = \mathbf{1}_{\{t<\tau_\Gamma\}} \mathbf{F}(B_t), \quad t\ge0, \end{align*} $$

where $\mathbf {F} = \sum _{|w|\ge 2} e_w F^{w}$ with $F^{w}\in C^{0}(\overline {\Gamma },\mathbb {R})\cap C^{2}(\Gamma ,\mathbb {R})$ is the unique bounded classical solution to the elliptic PDE

(6.4) $$ \begin{align} -\Delta \mathbf{F} (x) &= \sum_{i=1}^{d}H(\operatorname{\mathrm{ad}}{\mathbf{F} (x)})\Big(e_{ii} + Q(\operatorname{\mathrm{ad}}{\mathbf{F} (x)})(\partial_i \mathbf{F} (x)^{\otimes 2}) + 2e_i G(\operatorname{\mathrm{ad}}{\mathbf{F} (x)})(\partial_i \mathbf{F} (x)) \Big), \end{align} $$

for all $x\in \Gamma $ with the boundary condition $\mathbf {F}\vert _{\partial \Gamma } \equiv 0$ .

Proof. Define the martingale $\mathbf {X} = ((0, B_{t\wedge {\tau _\Gamma }}, 0, \dotsc ) )_{t\ge 0}\in \mathscr {S}(\mathcal {T}_0)$ . It then follows from the integrability of ${\tau _\Gamma }$ that $\mathbf {X} \in \mathscr {H}^{\infty -}(\mathcal {T}_0)$ and thus by Theorem 3.2 that $(\mathrm {Sig}(\mathbf {X})_{0,t})_{t\ge 0} \in \mathscr {H}^{\infty -}(\mathcal {T}_1)$ . This implies that the signature cumulant $\boldsymbol {\kappa }_t(T):= \log (\mathbb {E}_t(\mathrm {Sig}(\mathbf {X})_{t,T}))$ is well defined for all $0 \le t \le T < \infty $ and, furthermore, by (componentwise) application of the dominated convergence theorem, that

$$ \begin{align*} \boldsymbol{\kappa}_t = \lim_{T\to\infty} \boldsymbol{\kappa}_t(T) = \lim_{T\to\infty}\log(\mathbb{E}_t(\mathrm{Sig}(\mathbf{X})_{t,T})) = \log(\mathbb{E}_t (\mathrm{Sig}(B)_{t\wedge {\tau_\Gamma}, {\tau_\Gamma}})), \quad t\ge 0. \end{align*} $$

Again by $\mathbf {X} \in \mathscr {H}^{\infty -}(\mathcal {T}_0)$ , it follows that Theorem 1.1 applies to the martingale $(\mathbf {X}_t)_{0\le t \le T}$ for any $T>0$ , and therefore $\boldsymbol {\kappa }(T)$ satisfies the functional equation (4.3). It follows from Itô’s representation theorem [Reference Revuz and Yor59, Theorem 3.4] that all local martingales with respect to the Brownian filtration $(\mathcal {F}_t)_{0 \le t \le T}$ are continuous, and therefore it is easy to see that $\boldsymbol {\kappa }(T)\in \mathscr {S}^c(\mathcal {T}_0)$ . Therefore equation (4.3) simplifies to the following equation

(6.5)

where we have already used the martingality of $\mathbf {X}$ and the explicit form of the quadratic variation $\left \langle \mathbf {X} \right \rangle _t = \mathbf {I}_d(t \wedge {\tau _\Gamma })$ with $\mathbf {I}_d = \sum _{i=1}^{d}e_{ii} \in ({\mathbb {R}^d})^{\otimes 2}$ . It follows that $\boldsymbol {\kappa }^{(1)} \equiv \boldsymbol {\kappa }(T)^{(1)} \equiv 0$ , and for the second level, we have from the integrability of ${\tau _\Gamma }$ and the strong Markov property of Brownian motion that

$$ \begin{align*} \boldsymbol{\kappa}_t^{(2)} =\frac{1}{2}\mathbf{I}_d \lim_{T\to\infty}\mathbb{E}_t\left( \mathbf{1}_{\{t<\tau_\Gamma\}}({\tau_\Gamma}\wedge T -t) \right)= \frac{1}{2}\mathbf{I}_d \mathbf{1}_{\{t<\tau_\Gamma\}}\left.\mathbb{E}^{x}({\tau_\Gamma})\right\vert_{x=B_t}, \quad t\ge0. \end{align*} $$

Now note that the function $u(x) := \mathbb {E}^x({\tau _\Gamma })$ for $x \in \Gamma $ is in $C^{0}(\overline {\Gamma }, \mathbb {R})\cap C^{2}(\Gamma , \mathbb {R})$ and solves the Poisson equation $ -(1/2)\Delta u = g$ with boundary condition $u\vert _{\partial \Gamma } = 0$ and data $g \equiv 1$ . Indeed, since $\Gamma $ is regular and g is bounded and differentiable, this follows from Theorem 9.3.3 (and the remark thereafter) in [Reference Øksendal55]. Moreover, from the assumption in equation (6.3), we immediately see that u is bounded on $\overline {\Gamma }$ , and it follows from Theorem 9.3.2 in [Reference Øksendal55] that u is the unique bounded classical solution. Thus we have shown that the statement holds true up to the second tensor level with $\mathbf {F}^{(1)}\equiv 0$ and $\mathbf {F}^{(2)} =\frac {1}{2} \mathbf {I}_d\, u$ under the usual notation $\mathbf {F}^{(n)} = \sum _{|w|=n}e_wF^{w}$ .

Now assume that the statement of the corollary holds true up to the tensor level $(N-1)$ for some $N \ge 3$ . Then, for any $n, k < N$ , we have by applying Itô’s formula

and

Further define the function $\mathbf {G}^{(N)}$ by the projection under $\pi _N$ of the right-hand side of equation (6.4) multiplied by the factor $1/2$ . Then applying Theorem 4.1 to $\mathbf {X}^{(0,N)}$ on the probability space $(\Omega , \mathcal {F}, (\mathcal {F}_t),\mathbb {P}^{x})$ , we see that it follows from the estimate in equation (7.30) that there exists a constant $c>0$ such that

$$ \begin{align*} \sup_{x\in\Gamma}\mathbb{E}^x \left\{ \int_0^{\tau_\Gamma} \big\vert \mathbf{G}^{(N)}(B_u) \big\vert\,\mathrm{d} u\right\} \le c \sup_{x\in\Gamma} \vert\mkern-2.4mu\vert\mkern-2.5mu\vert {\mathbf{X}^{(0,N)}} \vert\mkern-2.4mu\vert\mkern-2.5mu\vert _{\mathscr{H}^{1,N}(\mathbb{P}^{x})} = c \sup_{x\in\Gamma} \mathbb{E}^x(\tau_\Gamma^{N}) <\infty. \end{align*} $$

Therefore it follows, from projecting equation (6.5) to level N and using the dominated convergence theorem to pass to the $T\to \infty $ limit, that $\boldsymbol {\kappa }^{(N)}$ is of the form

$$ \begin{align*} \boldsymbol{\kappa}_t^{(N)} = \mathbf{1}_{\{t<\tau_\Gamma\}} \mathbf{F}^{(N)}(B_t) \quad\text{with}\quad \mathbf{F}^{(N)}(x):= \mathbb{E}^x \left\{ \int_0^{{\tau_\Gamma}} \mathbf{G}^{(N)}(B_u)\,\mathrm{d} u\right\},\quad x\in\overline{\Gamma}. \end{align*} $$

Furthermore, by the assumption, it also holds that $G^{w} \in C^1(\Gamma )$ for all $w\in \mathcal {W}_d$ , $|w|=N$ . Therefore we can conclude again with Theorem 9.3.3 in [Reference Øksendal55] that $F^{w} \in C^{0}(\overline {\Gamma }, \mathbb {R})\cap C^{2}(\Gamma , \mathbb {R})$ solves the Poisson equation with data $g=G^{w}$ for all words w with $|w|=N$ . The statement then follows by induction.

Example 6.3. For $n \in \{1,\dotsc ,d\}$ , let $\mathbb {D}^n $ be the open unit ball in $\mathbb {R}^n$ , and define the (regular) domain $\Gamma = \mathbb {D}^n \times \mathbb {R}^{d-n}\subset \mathbb {R}^d$ . Further note that it holds

$$ \begin{align*} {\tau_\Gamma} = \inf\{t \ge 0\;\vert\; B_t \notin \Gamma\} = \inf\{t \ge 0\;\vert\; \vert(B^1_t, \dots, B^n_t)\vert \ge 1\}. \end{align*} $$

Hence we readily see that ${\tau _\Gamma }$ satisfies the condition in equation (6.3). Applying Corollary 6.2, it follows that the signature cumulant of the Brownian motion B up to the exit of the domain $\Gamma $ is of the form $\boldsymbol {\kappa }_t = \mathbf {1}_{\{t<\tau _\Gamma \}} \mathbf {F}(B_t)$ , where $\mathbf {F}$ satisfies the PDE (6.4). Recall that $\mathbf {F}^{(1)}\equiv 0$ ; projecting to the second level, we see that

$$ \begin{align*} -\Delta\mathbf{F}^{(2)}(x)= \mathbf{I}_d, \quad x\in\Gamma; \qquad \mathbf{F}^{(2)}\vert_{\partial\Gamma} \equiv 0. \end{align*} $$

The unique bounded solution of the above Poisson equation is given by

$$ \begin{align*}\mathbf{F}^{(2)}(x) = \frac{1}{2n}\mathbf{I}_d \left(1-\sum_{i=1}^n x_i^2\right), \quad x \in \Gamma.\end{align*} $$

More generally, we see that the Poisson equation $\Delta u = -g$ on $\Gamma $ with zero boundary condition, where $g\colon \Gamma \to \mathbb {R}$ is a polynomial in the first n variables, has a unique bounded solution u, which is also a polynomial in the first n variables of degree $\mathrm {deg}(u) = \mathrm {deg}(g)+2$ and has the factor $(1-\sum _{i=1}^n x_i^2)$ (see Lemma 3.10 in [Reference Lyons and Ni46]). Hence it follows inductively that each component of $\mathbf {F}^{(m)}$ is a polynomial of degree m with the factor $(1-\sum _{i=1}^n x_i^2)$ . The precise coefficients of the polynomial can be obtained as the solution to a system of linear equations recursively derived from the forcing term in equation (6.4). This is similar to [Reference Lyons and Ni46, Theorem 3.5]; however, we note that a direct conversion of the latter result for the expected signature to signature cumulants is not trivially seen to yield the same recursion and requires combinatorial relations as studied in [Reference Bonnier and Oberhauser7].

6.2 Lévy and diffusion processes

Let $X\in \mathscr {S}(\mathbb {R}^d)$ , and throughout this section assume that the filtration $(\mathcal {F}_t)_{0 \le t \le T}$ is generated by X. Denote by $\varepsilon _a$ the Dirac measure at point $a\in {\mathbb {R}^d}$ ; the random measure $\mu ^{X}$ associated to the jumps of X is an integer-valued random measure of the form

$$ \begin{align*}\mu^{X}(\omega; \mathrm{d} t, \mathrm{d} x) := \sum_{s\geq0} \mathbf 1_{\{\Delta X_{s}(\omega) \neq 0\}} \varepsilon_{(s, \Delta X_{s}(\omega))} (\mathrm{d} t, \mathrm{d} x). \end{align*} $$

There is a version of the predictable compensator of $\mu ^X$ , denoted by $\nu $ , such that the $\mathbb {R}^d$ -valued semimartingale X is quasi-left continuous if and only if $\nu (\omega , \{t\} \times \mathbb {R}^d) = 0$ for all $\omega \in \Omega $ ; see [Reference Jacod and Shiryaev36, Corollary II.1.19]. In general, $\nu $ satisfies $(|x|^{2} \wedge 1) \ast \nu \in \mathscr {A}_{\mathrm {loc}}$ : that is, locally of integrable variation. The semimartingale X admits a canonical representation (using the usual notation for stochastic integrals with respect to random measures as introduced, for example, in [Reference Jacod and Shiryaev36, II.1])

(6.6) $$ \begin{align} X=X_{0}+B(h)+X^{c}+(x-h(x)) \ast \mu^{X} + h(x) \ast (\mu^{X}-\nu), \end{align} $$

where $h(x) = x 1_{|x| \le 1}$ is a truncation function (other choices are possible). Here $B(h)$ is a predictable $\mathbb {R}^d$ -valued process with components in $\mathscr {V}$ , and $X^{c}$ is the continuous martingale part of X.

Denote by C the predictable $\mathbb {R}^{d} \otimes \mathbb {R}^d$ -valued covariation process defined as $C^{ij}:=\langle X^{i,c}, X^{j,c} \rangle $ . Then the triplet $(B(h), C, \nu )$ is called the triplet of predictable characteristics of X (or simply the characteristics of X). In many cases of interest, including the case of Lévy and diffusion processes discussed in the subsection below, we have differential characteristics $(b,c,K)$ such that

$$ \begin{align*}\mathrm{d} B_t = b_t (\omega) \mathrm{d} t, \ \mathrm{d} C_t = c_t (\omega) \mathrm{d} t,\ \nu(\mathrm{d} t,\mathrm{d} x) = K_{t}(\mathrm{d} x;\omega) \mathrm{d} t,\end{align*} $$

where b is a d-dimensional predictable process, c is a predictable process taking values in the set of symmetric non-negative definite $d\times d$ -matrices and K is a transition kernel from $(\Omega \times \mathbb {R}_{+}, \mathcal {B}^d)$ into $(\mathbb {R}^d, \mathcal {B}^d)$ . We call such a process an Itô semimartingale and the triplet $(b,c,K)$ its differential (or local) characteristics. This extends mutatis mutandis to a $\mathcal {T}_0^N$ (and then $\mathcal {T}_0$ )-valued semimartingale $\mathbf {X}$ , with local characteristics $(\mathbf {b},\mathbf {c},\mathbf {K})$ .

While every Itô semimartingale is quasi-left continuous, it is in general not true that $\boldsymbol {\kappa }$ is continuous (with the notable exception of time-inhomogeneous Lévy processes discussed below), and therefore there is no significant simplification of the functional equation (4.3) in these general terms. The following example illustrates this point in more detail.

Example 6.4. Take $X \in \mathscr {S}(\mathbb {R}^d)$ with $d=1$ , so that we are effectively in the symmetric setting. In this case, $\exp (\boldsymbol {\kappa }_t (T)) = \mathbb {E}_t(\exp ({X_T -X_t}))$ , in the power series sense of enlisting all moments with factorial factors. These can also be obtained by taking higher-order derivatives at $u=0$ of $\mathbb {E}_t (\exp ({u (X_T -X_t)}))$ , now with the classical calculus interpretation of the exponential. The important class of affine models satisfies

$$ \begin{align*}\mathbb{E}_t (\exp({u X_T-u X_t})) = \exp ( \phi (T-t,u) +( \Psi (T-t,u)- u)X_t ) \end{align*} $$

In the Lévy case, we have the trivial situation $\Psi (\cdot ,u) \equiv u$ , but otherwise $(\phi ,\Psi )$ solve (generalized) Riccati equations and are in particular continuous in $T-t$ . We see that, in non-trivial situations, the log of $ \mathbb {E}_t(\exp ({u X_T-u X_t}))$ and any of its derivatives will jump when X jumps. In particular, $\boldsymbol {\kappa }_t(T)$ will not be continuous in t, even if X is quasi-left continuous: that is, when $\Delta X_\tau = 0$ a.s. for all predictable times $\tau $ (see [Reference Jacod and Shiryaev36, p. 22]). Let us note in this context that, in the general non-commutative setting and directly from the definition of $\boldsymbol {\kappa }$ ,

$$ \begin{align*}\exp(\boldsymbol{\kappa}_{t-}) = \mathbb{E}_{t-} ( \exp ( \Delta X_t) \exp(\boldsymbol{\kappa}_{t}) ) = \mathbb{E}_{t-} ( \exp \boldsymbol{\kappa}_{t} ), \end{align*} $$

where the second equality holds true under the assumption of quasi-left continuity of X. If we assume for a moment $\mathcal {F}_{t-} = \mathcal {F}_t$ , then we could conclude that $\boldsymbol {\kappa }_{t-} = \boldsymbol {\kappa }_t$ and hence (right-continuity is clear) that $\boldsymbol {\kappa }_t$ is continuous in t. Since we know that this fails beyond Lévy processes, such left-continuity of filtrations is not a good assumption beyond the Lévy case.

6.2.1 The case of time-inhomogeneous Lévy processes

We consider now a d-dimensional time-inhomogeneous Lévy process of the form

(6.7) $$ \begin{align} X_t = \int_0^{t}\mathbf{b}(u)\,\mathrm{d} u + \int_0^{t}\sigma(u)\,\mathrm{d} B_u + \int_{(0,t]} \int_{|x|\le 1} x \; (\mu^X-\nu)(\mathrm{d} s, \mathrm{d} x) + \int_{(0,t]} \int_{|x|> 1} x \; \mu^X(\mathrm{d} s, \mathrm{d} x), \end{align} $$

for all $0\le t\le T$ , with $\mathbf{b} \in L^{1}([0,T],\mathbb {R}^d)$ , $\sigma \in L^2 ([0, T],\mathbb {R}^{d\times m})$ , B an m-dimensional Brownian motion and $\mu ^X$ an independent inhomogeneous Poisson random measure with the intensity measure $\nu $ on $[0,T] \times \mathbb {R}^d$ , such that $\nu (\mathrm {d} t, \mathrm {d} x) = K_t(\mathrm {d} x)\mathrm {d} t$ with Lévy measures $K_t$ : that is, $K_t(\{0\})=0$ , and

$$ \begin{align*} \int_0^T\int_{{\mathbb{R}^d}} ({|x|}^2 \wedge 1) K_t(\mathrm{d} x)\mathrm{d} t < \infty, \end{align*} $$

and measurability of $t \mapsto K_t (A) \in [0,\infty ]$ for any measurable $A \subset \mathbb {R}^d$ . Consider further the condition

(6.8)

for some integer $N\in {\mathbb {N}_{\ge 1}}$ . The Brownian case in equation (6.1) then generalizes as follows.

Corollary 6.5. Let X be an inhomogeneous Lévy process of the form in equation (6.7), such that the family of Lévy measures $\{K_t\}_{t>0}$ satisfies the moment condition in equation (6.8) for all $N\in {\mathbb {N}_{\ge 1}}$ . Then $X\in \mathscr {H}^{\infty -}({\mathbb {R}^d})$ , and the signature cumulant $\boldsymbol {\kappa }_t := \log ( \mathbb {E}_t(\mathrm {Sig}(X)_{t,T}))$ satisfies the following integral equation

(6.9) $$ \begin{align} \boldsymbol{\kappa}_t = \int_t^{T}H(\operatorname{\mathrm{ad}}{\boldsymbol{\kappa}_{u}})(\mathfrak{y}(u))\,\mathrm{d} u,\quad 0 < t \le T, \end{align} $$

with $\mathbf {a}(t) = \sigma (t) \sigma (t)^{T}\in {\mathbb {R}^d}\otimes {\mathbb {R}^d} \subset \mathcal {T}_0$ and

(6.10) $$ \begin{align} \mathfrak{y}(t):= \mathbf{b}(t) + \frac{1}{2}\mathbf{a}(t) + \int_{{\mathbb{R}^d}}(\exp(\mathbf{x})-1-\mathbf{x}\,\mathbf{1}_{|x|\le1}) K_t(\mathrm{d} x) \in \mathcal{T}_0, \end{align} $$

where $\mathbf {x} = (0, x, 0, \dots )\in \mathcal {T}_0$ for $x\in {\mathbb {R}^d}$ . In case the Lévy measures $\{K_t\}_{t>0}$ satisfy the condition in equation (6.8) only up to some finite level $N\in {\mathbb {N}_{\ge 1}}$ , we have $X\in \mathscr {H}^{N}$ , and the identity equation (6.9) holds for the truncated signature cumulant in $\mathcal {T}_0^N$ .

Remark 6.6. Corollary 6.5 extends a main result of [Reference Friz and Shekhar25], where a Lévy–Khintchine-type formula was obtained for the expected signature of time-homogeneous Lévy processes with constant triplet $(\mathbf {b},\mathbf {a},K)$ . Now this is an immediate consequence of equation (6.9), with all commutators vanishing in the time-homogeneous case and the explicit solution

$$ \begin{align*} \boldsymbol{\kappa}_t(T) = (T-t)\left(\mathbf{b}+\frac{1}{2}\mathbf{a} + \int_{{\mathbb{R}^d}}(\exp(\mathbf{x})-1-\mathbf{x}\,\mathbf{1}_{|x|\le1}) K(\mathrm{d} x)\right). \end{align*} $$

Proof. Assume that the Lévy measures $\{ K_t \}_{t>0}$ satisfy the condition in equation (6.8) for some $N\in {\mathbb {N}_{\ge 1}}$ . We will first show that $X\in \mathscr {H}^{N}({\mathbb {R}^d})$ . Note that the decomposition in equation (6.7) naturally yields a semimartingale decomposition $X = M + A$ , where the local martingale M and the adapted bounded variation process A are defined by

$$ \begin{align*} M_t := \int_0^{t}\sigma(u)\,\mathrm{d} B_u + \int_{(0,t]} \int_{|x|\le 1} x \; (\mu^X-\nu)(\mathrm{d} s, \mathrm{d} x), \qquad A_t := \int_0^{t}\mathbf{b}(u)\,\mathrm{d} u + \int_{(0,t]} \int_{|x|> 1} x \; \mu^X(\mathrm{d} s, \mathrm{d} x). \end{align*} $$

Regarding the integrability of the $1$ -variation of A, we note that

$$ \begin{align*} \int_0^T |\mathrm{d} A_u| \le \int_0^T |\mathbf{b}(u)|\,\mathrm{d} u + \int_{(0,T]} \int_{|x|> 1} |x| \; \mu^X(\mathrm{d} s, \mathrm{d} x). \end{align*} $$

Define the increasing, piecewise constant process $V_t := \int_{(0,t]} \int_{|x|> 1} |x| \; \mu^X(\mathrm{d} s, \mathrm{d} x)$ , $0 \le t \le T$ . Since $\mathbf{b}$ is deterministic and integrable over the interval $[0,T]$ , it suffices to show that $V_T$ has a finite Nth moment.

Further, for any $n \in \{1, \dotsc ,N\}$ , it holds that

$$ \begin{align*} V_T^n = \sum_{0 < s \le T}\left(V_s^n - V_{s-}^n\right) = \sum_{0 < s \le T}\sum_{k=0}^{n-1} \binom{n}{k} V_{s-}^k(\Delta V_{s})^{n-k} \end{align*} $$

and by definition

. Now let $n \ge 2$ and $k \in \{0, \dotsc , n-1\}$; then we have

It then follows inductively that $\mathbb {E}(V_T^n)$ is finite for all $n = 1, \dotsc , N$ and hence that the 1-variation of A has finite Nth moment.

Concerning the integrability of the quadratic variation of M, let $w\in \{1, \dots , d\}$ ; then it is well known that (see, e.g., [Reference Jacod and Shiryaev36, Ch. II Theorem 1.33])

where $\left \langle M \right \rangle $ denotes the dual predictable projection (or compensator) of $\left [ M \right ]$ . Further, using that the compensated martingale

is orthogonal to continuous martingales, we have

Now let $q \in [1, \infty )$; then from Theorem 8.2.20 in [Reference Cohen and Elliott14], we have the following estimate

$$ \begin{align*} \mathbb{E}\left(\left[ M^w \right]^q_T\right) \le c\, \mathbb{E}\left(\left\langle M^w \right\rangle_T^q + \sup_{0 \le t \le T}(\Delta M^w_t)^{2q}\right) \le c \left( \left\langle M^w \right\rangle^q_T + 1 \right) <\infty, \end{align*} $$

where $c>0$ is a constant depending on q.

We have shown that $\mathbf {X} = (0, X, 0, \dotsc , 0) \in \mathscr {H}^{1,N}$ , and it follows from Theorem 4.1 that the signature cumulant $\boldsymbol {\kappa }_t = \log (\mathbb {E}_t(\mathrm {Sig}(\mathbf {X})_{t,T}))$ satisfies the functional equation (4.3). On the other hand, it follows from the condition in equation (6.8) that $\mathfrak {y}$ in equation (6.10) is well defined. Now define $\widetilde {\boldsymbol {\kappa }} = (\widetilde {\boldsymbol {\kappa }}_t)_{0 \le t \le T}$ by the identity equation (6.9). Noting that $\widetilde {\boldsymbol {\kappa }}$ is deterministic and has absolutely continuous components, it is easy to see that $\widetilde {\boldsymbol {\kappa }}$ also satisfies the functional equation (4.3) for the semimartingale X. It thus follows that $\boldsymbol {\kappa }$ and $\widetilde {\boldsymbol {\kappa }}$ are identical.

6.2.2 Markov jump diffusions

The generator of a general Markov jump diffusion is of the form

(6.11) $$ \begin{align} \mathcal{L} f(x) = \sum_{i} \mathbf{b}^{i}(x)\,\partial_{i} f(x) + \frac{1}{2}\sum_{i,j} \mathbf{a}^{ij}(x)\,\partial_{ij} f(x) + \int_{{\mathbb{R}^d}}\Big(f(x+y) - f(x) - \sum_{i} h^{i}(y)\,\partial_{i} f(x)\Big) K(x, \mathrm{d} y), \end{align} $$

where the summations are over $i,j\in \{1, \dots , d\}$ , $\mathbf {b}\colon {\mathbb {R}^d} \to {\mathbb {R}^d}$ , $\mathbf {a}\colon {\mathbb {R}^d} \to {\mathbb {R}^d} \otimes {\mathbb {R}^d}$ (symmetric, positive definite) and K is a Borel transition kernel from ${\mathbb {R}^d}$ into ${\mathbb {R}^d}$ with $K(\cdot , \{0\}) \equiv 0$ . Denote by $\mathrm {Lip}(E)$ the space of real valued, bounded and globally Lipschitz continuous functions over a Banach space E. We pose the following further assumptions on the generator:

  • (A1) $\mathbf {b}^{i}, \mathbf {a}^{ij} \in \mathrm {Lip}({\mathbb {R}^d})$ for all $i,j = 1,\dots , d$ ,

  • (A2) There exists a constant $c>0$ such that it holds

  • (A3) There exists a bounded measure m on ${\mathbb {R}^d}$ and a measurable function $\delta :{\mathbb {R}^d}\times {\mathbb {R}^d}\to {\mathbb {R}^d}$ , such that the kernel K is of the form

    $$ \begin{align*} K(x, A) = \int_{{\mathbb{R}^d}}\mathbf 1_{A\setminus\{0\}}(\delta(x,z))m(\mathrm{d} z), \qquad\text{for all measurable }A\subset{\mathbb{R}^d} \text{ and } x\in{\mathbb{R}^d}. \end{align*} $$
    Further, we assume that there exists a function $\rho :{\mathbb {R}^d}\to \mathbb {R}_+$ , such that
    $$ \begin{align*} \int_{{\mathbb{R}^d}} (\rho(z)^{2}\vee\rho(z)^{n}) m(\mathrm{d} z) < \infty,\qquad \text{for all}\quad n\in{\mathbb{N}}, \end{align*} $$
    and for all $x, x^{\prime }, z \in {\mathbb {R}^d}$ , it holds that
where $h(y) = y\,\mathbf{1}_{\{\left\vert y \right\vert \le 1\}}$ is the usual truncation function and $h^{\prime }(y) = y - h(y)$.

Note that under the above assumptions,Footnote 9 the martingale problem associated to the generator $\mathcal {L}$ admits a unique solution (see [Reference Jacod35, Theorem 13.58]). More precisely, for any probability measure $\eta $ on ${\mathbb {R}^d}$ , there exists a unique measure $\mathbb {P}$ on the canonical Skorohod space $(\Omega , \mathcal {F}, (\mathcal {F}_t))$ , such that the canonical process X satisfies $\mathbb {P}(X_0 \in A ) = \eta (A)$ for all measurable $A\subset {\mathbb {R}^d}$ , and the process

$$ \begin{align*} M_t:= f(X_t)-f(X_0) - \int_{0}^{t} \mathcal{L}f(X_{s})\mathrm{d} s \qquad 0 \le t \le T, \end{align*} $$

is a local martingale under $\mathbb {P}$ for all $f \in C^{2}_b({\mathbb {R}^d})$, the space of real-valued, bounded and twice continuously differentiable functions with bounded first and second-order partial derivatives. In general, we may call a semimartingale a Markov jump diffusion associated to the generator $\mathcal {L}$ if its law coincides with $\mathbb {P}$: that is, it solves the martingale problem associated to $\mathcal {L}$. The martingale problem can be equivalently formulated in terms of semimartingale characteristics (see [Reference Jacod35, XIII.3] and [Reference Jacod and Shiryaev36, III.2.c]). In particular, under the given assumptions, it holds that $\mathbb {P}$ is a solution to the martingale problem if and only if the canonical process has the following semimartingale characteristics (see [Reference Jacod35, Theorem 13.58])

$$ \begin{align*} \mathrm{d} \mathbf{B}_t = \mathbf{b}(X_{t-}) \mathrm{d} t, \quad \mathrm{d} \mathbf{C}_t =\mathbf{a}(X_{t-})\mathrm{d} t, \quad \nu(\mathrm{d} t, \mathrm{d} x) = \mathrm{d} t K(X_{t-}, \mathrm{d} x). \end{align*} $$

The extension to Markov processes with characteristics of the form $(\mathbf {b}(t,x),\mathbf {a}(t,x),K(t,x,\mathrm {d} y))$ and associated local Lévy generators [Reference Stroock60] is mostly notational. Apart from the already presented references, the reader may also consult the abundant literature in the continuous case (e.g., [Reference Stroock and Varadhan61], [Reference Revuz and Yor59, VII.2] or [Reference Karatzas and Shreve38, 5.4]).

The expected signature of a Markov jump diffusion X was seen in [Reference Friz and Shekhar25] to satisfy a system of linear partial integro-differential equations (PIDEs); the continuous case was already presented in [Reference Ni54] with a corresponding system of PDEs. The passage to signature cumulants amounts to taking the logarithm, a non-commutative Cole–Hopf transform: viewed as a $\mathcal {T}_1$-valued PIDE, the resulting quadratic non-linearity is resolved, thanks to the graded structure, into a system of linear PIDEs. In Corollary 6.7 below, we show how this PIDE can be derived directly from our Theorem 4.1.

Since we are mainly interested in exposing the algebraic structure of this PIDE system and we want to avoid dwelling much further on the solution theory of PIDEs associated to the operator $\mathcal {L}$ , we pose one further assumption:

  • (A4) For any $f\in \mathrm {Lip}([0,T]\times {\mathbb {R}^d})$ the Cauchy problem with zero terminal condition

    (6.12) $$ \begin{align} \begin{aligned} [\partial_t + \mathcal{L}]u(t,x) &= f(t,x), \qquad (t,x) \in [0, T)\times{\mathbb{R}^d}, \\ u(T, x) &= 0, \qquad x\in{\mathbb{R}^d}, \end{aligned} \end{align} $$
    has a unique solution u in the spaceFootnote 10
    $$ \begin{align*}C_{b}^{1,2}([0,T]\times {\mathbb{R}^d}) := \left\{ u\in C^{1,2}([0,T]\times{\mathbb{R}^d}):\; u,\,\partial_{x_1}u, \,\dots, \,\partial_{x_d}u \in \mathrm{Lip}([0,T]\times{\mathbb{R}^d})\right\},\end{align*} $$
    where $C^{1,2}([0,T]\times {\mathbb {R}^d})$ is the space of real valued functions on $[0,T]\times {\mathbb {R}^d}$ that are once continuously differentiable in t and twice continuously differentiable in x.

A result similar to the above assumption (A4), based on conditions similar to (A1)-(A3), can be found in [Reference Pham56, Proposition 5.3]. The main difference is that in the latter result only a linear growth condition on the forcing f is assumed, and hence only a growth property of the solution is obtained. Note that the proof in [Reference Pham56] is based on a viscosity solution approach, which then allows one to recast the PIDE as a PDE with modified drift and forcing terms. Hence the bounds on the solution of the Cauchy problem and its derivatives should follow as a consequence of the stronger assumptions on the forcing and classical estimates as they can be found in [Reference Friedman22, Sec. 9.4].

We now present the main result of this section.

Corollary 6.7. Assume that $\mathcal {L}$ in equation (6.11) satisfies (A1)–(A4), and let X be a d-dimensional Markov jump diffusion with this generator. Then $\mathbf {X} = (0, X, 0, \dots )\in \mathscr {H}^{\infty -}$ , and the signature cumulant is of the form

$$ \begin{align*} \boldsymbol{\kappa}_t(T) = \mathbf{v}(t, X_t; T) = \mathbf{v}(t,X_t), \end{align*} $$

where $\mathbf {v}=\sum _{w} \mathbf {v}^{w} e_w$ is the unique solution with $\mathbf {v}^{w} \in C^{1,2}_b([0,T] \times {\mathbb {R}^d})$ for all $w\in \mathcal {W}_d$ of the following partial integro-differential equation

(6.13)

on $[0,T]\times {\mathbb {R}^d}$ with terminal condition $\mathbf {v}(T, \cdot ) \equiv 0$ , where $\mathbf {y} := (0, y, 0 \dots )\in \mathcal {T}_0$ and $\tau _y(t, x) = (t, x + y)$ .

Remark 6.8. It is instructive to look at the one-dimensional case. As seen several times before, all adjoint operators vanish due to commutativity in this case. With the more classical notation $\partial _x = \partial _1$, $\partial _{xx} = \partial _{11}$, $\varepsilon = e_1$, $\mathbf {b} = b\varepsilon $, $\mathbf {a} = a \varepsilon ^2$ and with $\mathbf {v}_\varepsilon (t,x) := \varepsilon x + \mathbf {v}(t,x)$, we see that the PIDE (6.13) then simplifies to

(6.14) $$ \begin{align} -[\partial_t + \mathcal{L}]\mathbf{v}_\varepsilon = \frac{1}{2} a\, \left(\partial_x \mathbf{v}_\varepsilon\right)^{2} + \int_{\mathbb{R}} \big\{\exp(\mathbf{v}_\varepsilon\circ \tau_y -\mathbf{v}_\varepsilon) - 1 - (\mathbf{v}_\varepsilon\circ \tau_y -\mathbf{v}_\varepsilon) \big\}K(\cdot, \mathrm{d} y). \end{align} $$

Introducing the Cole-Hopf transformation $\mathbf {u}_\varepsilon (t,x) = \exp (\mathbf {v}_\varepsilon (t,x))$ , multiplying equation (6.14) with $\mathbf {u}_\varepsilon $ and considering the differential relations

$$ \begin{align*}\partial_t\mathbf{u}_\varepsilon = \mathbf{u}_\varepsilon\partial_t \mathbf{v}_\varepsilon, \quad \partial_{x}\mathbf{u}_\varepsilon = \mathbf{u}_\varepsilon\, \partial_x\mathbf{v}_\varepsilon, \quad \partial_{xx}\mathbf{u}_\varepsilon = \mathbf{u}_\varepsilon\,(\partial_x\mathbf{v}_\varepsilon)^{2} + \mathbf{u}_\varepsilon\partial_{xx} \mathbf{v}_\varepsilon,\end{align*} $$

we obtain the following PIDE for $\mathbf {u}_\varepsilon $ :

$$ \begin{align*} -\mathbf{u}_\varepsilon\left(\mathcal{L}\mathbf{u}_\varepsilon\right) = 0 \quad \Leftrightarrow \quad \mathcal{L}\mathbf{u}_\varepsilon = 0, \end{align*} $$

where the equivalence follows from the invertibility of $\mathbf {u}_\varepsilon $ in $\mathbb {R}[[\varepsilon ]]\cong T((\mathbb {R}))$. Conversely, the signature simplifies in the one-dimensional case to the exponential of the path increment, and therefore we have

$$ \begin{align*} \mathbf{u}_\varepsilon(t,X_t) = \exp(\mathbf{v}_\varepsilon(t,X_t)) = \exp(\varepsilon X_t)\mathbb{E}_t(\mathrm{Sig}(\mathbf{X})_{t,T}) = \mathbb{E}_t(\exp(\varepsilon X_T)). \end{align*} $$

Hence, the above PIDE for $\mathbf {u}_\varepsilon $ is already expected from the Feynman-Kac representation (see, e.g., [Reference Cont and Tankov15, Proposition 12.5]). The expansion $\mathbf {v}_\varepsilon = \varepsilon v^{1} + \varepsilon ^{2} v^{2} + \dots $, with $\mathbf {v}_\varepsilon ^{(i)} = \varepsilon ^{i}v^{i}$, is related to the Wild expansion used in Hairer’s approach for solving the KPZ equation [Reference Hairer30]. Indeed, as explained in [Reference Friz, Gatheral and Radoičić23, Section 4.5] for the continuous case, each term $v^{i}$ can be expressed as a sum of solutions to linear PDEs, indexed by binary trees with i leaves. The jump case follows similarly; however, the root joining operation corresponds to the creation of a different forcing term, which, apart from the carré du champ (compare with [Reference Friz, Gatheral and Radoičić23, Remark 4.1]), includes terms derived from the integral in equation (6.14) (the combinatorial relations from breaking apart the exponential are analogous to those in Corollary 5.5).
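Returning to the Feynman-Kac identity above, here is a quick symbolic check in the simplest possible setting (our own illustration): for X a standard Brownian motion, so $b = 0$, $a = 1$ and $K = 0$, one has $\mathbf {u}_\varepsilon (t,x) = \exp (\varepsilon x + \varepsilon ^{2}(T-t)/2)$, and the residual $[\partial _t + \mathcal {L}]\mathbf {u}_\varepsilon $ indeed vanishes.

```python
import sympy as sp

t, x, T, eps = sp.symbols('t x T epsilon')

# u_eps for X a standard Brownian motion: exp(eps*x + eps^2*(T - t)/2)
u = sp.exp(eps * x + eps**2 * (T - t) / 2)

# [partial_t + L] u with L = (1/2) * d^2/dx^2 must vanish by Feynman-Kac
residual = sp.diff(u, t) + sp.diff(u, x, 2) / 2
print(sp.simplify(residual))  # prints 0
```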

Remark 6.9. We are still in the one-dimensional case and use notation from the previous remark. Assume now that the underlying Markov jump diffusion X takes values in the domain $D=\mathbb {R}$ or $D=\mathbb {R}_{\ge 0}$ and has affine differential characteristics of the form

$$ \begin{align*} b(x) = \beta_0 + \beta_1 x, \qquad a(x) = \alpha_0 + \alpha_1 x, \qquad K(x,\mathrm{d} y) = (\lambda_0 + \lambda_1 x)m(\mathrm{d} y), \end{align*} $$

for all $x\in D$ with $\beta _0\in D$, $\beta _1\in \mathbb {R}$, $\alpha _0, \alpha _1, \lambda _0, \lambda _1 \ge 0$, and m is a suitably integrable measure with $\mathrm {supp}(m)\subset D$, where $\alpha _0=0$ if $D=\mathbb {R}_{\ge 0}$ and $\lambda _1 = \alpha _1=0$ if $D=\mathbb {R}$. In this case, $\mathbf {b}$ and $\mathbf {a}$ clearly do not satisfy the conditions (A1) and (A2) in general. Nevertheless, we can start by considering the PIDE (6.14) and use the following affine ansatz

$$ \begin{align*} \mathbf{v}_\varepsilon(t,x) = \Phi(T-t) + \Psi(T-t)x, \qquad \Phi = \sum_{i=1}^{\infty} \phi^{i}\varepsilon^{i}, \quad \Psi = \sum_{i=1}^{\infty} \psi^{i}\varepsilon^{i} \end{align*} $$

for differentiable functions $\phi ^i, \psi ^i:[0,T]\to \mathbb {R}$ with $\psi ^1(0) = 1$ and $\phi ^i(0) =\psi ^{i+1}(0) = 0$ for all $i \in {\mathbb {N}_{\ge 1}}$, which, after equating coefficients in x, yields the following formal Riccati ODE

$$ \begin{align*} \dot\Psi &= \beta_1\Psi + \frac{1}{2}\alpha_1 (\Psi)^{2} + \lambda_1\int_{\mathbb{R}}\exp(\Psi y)m(\mathrm{d} y)\\ \dot\Phi &= \beta_0\Psi + \frac{1}{2}\alpha_0 (\Psi)^{2} + \lambda_0\int_{\mathbb{R}}\exp(\Psi y)m(\mathrm{d} y). \end{align*} $$

Equating coefficients in $\varepsilon $, this formal quadratic ODE turns into a system of linear ODEs,

$$ \begin{align*} \dot\psi^{n} &= \beta_1\psi^{n} + \frac{\alpha_1}{2}\sum_{i+j=n}\psi^{i}\psi^{j} + \lambda_1\sum_{k=1}^{n}\frac{m_k}{k!}\sum_{i_1+\dots+i_k=n}\psi^{i_1}\cdots\psi^{i_k},\\ \dot\phi^{n} &= \beta_0\psi^{n} + \frac{\alpha_0}{2}\sum_{i+j=n}\psi^{i}\psi^{j} + \lambda_0\sum_{k=1}^{n}\frac{m_k}{k!}\sum_{i_1+\dots+i_k=n}\psi^{i_1}\cdots\psi^{i_k}, \qquad n\in{\mathbb{N}_{\ge1}}, \end{align*} $$

where $m_k$ is the kth moment of m and all inner summation indices range over ${\mathbb {N}_{\ge 1}}$; since the nth pair of equations contains $\psi ^{n}$ only linearly and otherwise involves coefficients of order lower than n, the system can easily be solved explicitly. The resulting formulas relate to the cumulant formulas obtained in the more general setting of polynomial processes in [Reference Cuchiero, Keller-Ressel and Teichmann17]. We do not claim to provide any new results on the topic of affine processes, and therefore we also do not relate the above formal steps back to the cumulants of the process, which would require a more careful analysis. An interesting open question from this perspective is how to fully employ the affine structure in the non-commutative setting. In the next section, we present a result in this direction applied to affine Volterra processes.
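Before moving on, we note that the triangular system above is also straightforward to integrate numerically. The following minimal Python sketch (our own; all parameter values are hypothetical) evolves the coefficients $\psi ^{1},\dotsc ,\psi ^{N}$ jointly by truncating the Riccati equation itself, reading the jump integral as the moment series $\sum _{k\ge 1} m_k\Psi ^{k}/k!$ (the constant $k=0$ term cannot contribute at any positive order in $\varepsilon $); equating coefficients of $\varepsilon ^{n}$ then reproduces exactly the linear system above.

```python
import numpy as np
from math import factorial
from scipy.integrate import solve_ivp

# hypothetical affine parameters; m[k] is the k-th moment of the jump measure m
beta1, alpha1, lam1 = -0.5, 0.8, 0.3
N = 4                                    # truncation level in eps
m = [0.0, 0.2, 0.1, 0.05, 0.02]          # m[0] is never used below

def trunc_mul(p, q):
    # product of two eps-series, coefficient arrays indexed by the power of eps
    r = np.zeros(N + 1)
    for i in range(N + 1):
        for j in range(N + 1 - i):
            r[i + j] += p[i] * q[j]
    return r

def rhs(t, psi):
    # truncated Riccati right-hand side; psi[n-1] is the coefficient of eps^n
    P = np.concatenate(([0.0], psi))     # Psi has no eps^0 term
    jump, Pk = np.zeros(N + 1), P.copy()
    for k in range(1, N + 1):            # sum_{k >= 1} m_k / k! * Psi^k
        jump += m[k] / factorial(k) * Pk
        Pk = trunc_mul(Pk, P)
    dP = beta1 * P + 0.5 * alpha1 * trunc_mul(P, P) + lam1 * jump
    return dP[1:]

psi0 = np.zeros(N)
psi0[0] = 1.0                            # psi^1(0) = 1, higher coefficients vanish
sol = solve_ivp(rhs, (0.0, 1.0), psi0, rtol=1e-10, atol=1e-12)
print(sol.y[:, -1])                      # psi^1(T), ..., psi^N(T) at T = 1
```

The $\phi ^{n}$ equations can then be integrated by plain quadrature, since their right-hand sides involve only the $\psi $ coefficients.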

Proof of Corollary 6.7

We can easily verify the boundedness of $\mathbf {b}$ and $\mathbf {a}$, and the moment condition in assumption (A3) implies that $\mathbf {X} = (0, X, 0, \dots ) \in \mathscr {H}^{\infty -}$ (compare also with the proof of Corollary 6.5). It then follows from Theorem 4.1 that $\boldsymbol {\kappa }(T) = (\log \mathbb {E}_t(\mathrm {Sig}(\mathbf {X})_{t,T}))_{0 \le t \le T}$ is the unique solution to the functional equation (4.3).

Next we are going to discuss the existence and uniqueness of a solution to the PIDE. To this end, note that projecting equation (6.13) to any tensor component $w \in \mathcal {W}_d$, we obtain a Cauchy problem of the form in equation (6.12), where the forcing term f is given by the projection of the right-hand side in equation (6.13) to the tensor component w. Hence the existence of a unique solution $\mathbf {v}^{w} \in C^{1,2}_b([0,T]\times {\mathbb {R}^d})$ follows by assumption (A4), provided we can show that the corresponding forcing term is in $\mathrm {Lip}([0,T]\times {\mathbb {R}^d})$. We prove the latter assertion inductively: projecting the right-hand side of equation (6.13) to the tensor component $i \in \{1, \dots , d\}$, we obtain the following forcing term

Recall that $\mathbf {b}^{i}\in \mathrm {Lip}([0,T]\times {\mathbb {R}^d})$ by assumption (A1). Regarding the integral term, we see that boundedness and global Lipschitz continuity are a consequence of assumption (A3). Hence the above forcing term is indeed in $\mathrm {Lip}([0,T]\times {\mathbb {R}^d})$, which establishes the base case of the induction.

As the induction hypothesis, assume that the projection of the right-hand side of equation (6.13) to all tensor components ${w^{\prime }}\in \mathcal {W}_d$ with $ \left \vert {{w^{\prime }}} \right \vert \le n$ for some $n\in {\mathbb {N}_{\ge 1}}$ yields a function in $\mathrm {Lip}([0,T]\times {\mathbb {R}^d})$ and hence, by assumption (A4), the corresponding Cauchy problem with generator $\mathcal {L}$ has unique solutions $\mathbf {v}^{{w^{\prime }}}\in C^{1,2}_b([0,T]\times {\mathbb {R}^d})$.

For the induction step, let $w\in \mathcal {W}_d$ with $\left \vert w \right \vert = n+1$ be arbitrary. Note that the right-hand side of equation (6.13) consists of two main terms, the second of which is given by an integral. Projecting the first term to the tensor component w yields a linear combination of (finite) products of the functions $\mathbf {a}^{ij},\mathbf {v}^{{w^{\prime }}}$ and $\partial _i \mathbf {v}^{{w^{\prime }}}$ with $ \left \vert {{w^{\prime }}} \right \vert \le n$, $i\in \{1, \dots , d\}$. By assumption (A1) and the induction hypothesis, each of these functions is in $\mathrm {Lip}([0,T]\times {\mathbb {R}^d})$, and therefore so are their products and linear combinations (clearly, $\mathrm {Lip}([0,T]\times {\mathbb {R}^d})$ forms an algebra). Hence we are left with proving that the projection of the integral term on the right-hand side of equation (6.13) is also in $\mathrm {Lip}([0,T]\times {\mathbb {R}^d})$. Using a Taylor expansion and the derivatives of the exponential map from Lemma 7.5 (compare also with Lemma 7.9), we obtain the following identity:

Hence, we see from the above right-hand side and the induction hypothesis that the projection of the integral term in the forcing of equation (6.13) to the tensor component w is a linear combination of functions of the form

$$ \begin{align*} (t,x)\mapsto f(t,x):= \int_{{\mathbb{R}^d}}g(t,x)p(y)\left(\int_0^{1}q(y\theta)h(t, x + \theta y)(1-\theta)\mathrm{d}\theta\right)K(x, \mathrm{d} y), \end{align*} $$

where $g, h \in \mathrm {Lip}([0,T]\times {\mathbb {R}^d})$ , and where $p(y)$ and $q(y)$ are homogeneous polynomials in $(y^{1}, \dots , y^{d})$ with $\deg (p) = 2$ , $0 \le \deg (q)\le n-1$ and $q\neq 0$ . It follows from assumption (A3) that the above function and hence any linear combination of functions of the same form are in $\mathrm {Lip}([0,T]\times {\mathbb {R}^d})$ . Indeed, let $(t,x),(t^{\prime }, x^{\prime }) \in [0,T]\times {\mathbb {R}^d}$ ; then we have

where $c,c^{\prime }> 0$ are constants (which will change from line to line in what follows). Regarding the first integral term in the last line above, note that for all $y\in {\mathbb {R}^d}$ , it holds that

It then follows by assumption (A3) that for all $x\in {\mathbb {R}^d}$ , we have

Regarding the second integral term, we further note that for all $y,y^{\prime }\in {\mathbb {R}^d}$ , it holds

Therefore, again by assumption (A3), we have

where $c^{\prime \prime }>0$ is another constant. This finishes the proof of the assertion that $f\in \mathrm {Lip}([0,T]\times {\mathbb {R}^d})$ and hence also the induction step. This concludes the proof of existence and uniqueness of a solution to the PIDE.

Now define $\tilde {\boldsymbol {\kappa }}\in \mathscr {S}(\mathcal {T}_0)$ by $\tilde {\boldsymbol {\kappa }}_t := \mathbf {v}(t,X_t)$ for all $0 \le t \le T$, and note that $\tilde {\boldsymbol {\kappa }}_{t-} = \mathbf {v}(t, X_{t-})$. We are going to show that $\tilde {\boldsymbol {\kappa }}$ also satisfies the functional equation (4.3). Since X solves the martingale problem with generator $\mathcal {L}$ and $\mathbf {v}$ is sufficiently regular, an application of Itô’s formula yields that the process

$$ \begin{align*} \mathbf{v}(t, X_t) - \mathbf{v}(0, X_0) -\int_0^{t}[\partial_t + \mathcal{L}] \mathbf{v} (u,X_{u-}) \mathrm{d} u, \qquad 0 \le t \le T, \end{align*} $$

is a local martingale: that is, in $\mathscr {M}_{\mathrm {loc}}(\mathcal {T}_0)$ . However, as $\mathbf {v}^{w} \in C^{1,2}_b([0,T]\times {\mathbb {R}^d})$ , it follows that the above process is bounded and therefore also a true martingale: that is, in $\mathscr {M}(\mathcal {T}_0)$ . Hence, using further the terminal condition $\mathbf {v}(T,\cdot )\equiv 0$ , we obtain the identity

(6.15) $$ \begin{align} \tilde{\boldsymbol{\kappa}}_t = -\mathbb{E}_t\big(\mathbf{v}(T, X_T)- \mathbf{v}(t, X_t) \big) = \mathbb{E}_t\bigg(-\int_t^{T}[\partial_t + \mathcal{L}] \mathbf{v} (u,X_{u-}) \mathrm{d} u \bigg). \end{align} $$

On the other hand, we can plug $\tilde {\boldsymbol {\kappa }}$ into the right-hand side of equation (4.3). We then obtain for the first integral inside the conditional expectation

$$ \begin{align*} \int_{(0,t]}H(\operatorname{\mathrm{ad}} \tilde{\boldsymbol{\kappa}}_{u-})(\mathrm{d}\mathbf{X}_u) = \int_{(0,t]}H(\operatorname{\mathrm{ad}} \tilde{\boldsymbol{\kappa}}_{u-})(\mathrm{d} \mathbf{B}_u + \mathrm{d} \mathbf{X}^{c}_u) + \mathbf{W}\ast(\mu^{X} - \nu)_t + \overline{\mathbf{W}}\ast\mu^{X}_t, \end{align*} $$

where $\mu ^{X}$ denotes the random measure associated with the jumps of X and

$$ \begin{align*} \mathbf{W}_t(y) := H(\operatorname{\mathrm{ad}} \tilde{\boldsymbol{\kappa}}_{t-})(h(y)), \quad\text{and}\quad \overline{\mathbf{W}}_t(y) := H(\operatorname{\mathrm{ad}} \tilde{\boldsymbol{\kappa}}_{t-})(y - h(y)), \end{align*} $$

for all $0 \le t \le T$, $y\in {\mathbb {R}^d}$, and where $h(y) = y\,\mathbf{1}_{\{\left\vert y \right\vert \le 1\}}$ is the usual truncation function. Similarly, we have

$$ \begin{align*} \sum_{0 < u \le t}\bigg\{H(\operatorname{\mathrm{ad}}{\tilde{\boldsymbol{\kappa}}_{u-}})\Big(\exp(\Delta \mathbf{X}_u)\exp(\tilde{\boldsymbol{\kappa}}_u)\exp(-\tilde{\boldsymbol{\kappa}}_{u-}) - 1 -\Delta \mathbf{X}_u\Big) - \Delta \tilde{\boldsymbol{\kappa}}_u \bigg\} = \mathbf{J} \ast \mu^{X}_t, \end{align*} $$

where $0 \le t \le T$ and for $y\in {\mathbb {R}^d}$ with $\mathbf {y} = (0, y, 0, \dots )\in \mathcal {T}_0$

$$ \begin{align*} \mathbf{J}_t(y) := \bigg\{H(\operatorname{\mathrm{ad}}\mathbf{v})\Big(\exp(\mathbf{y})\exp(\mathbf{v}\circ\tau_y)\exp(-\mathbf{v}) - 1 - \mathbf{y}\Big) - (\mathbf{v}\circ\tau_y - \mathbf{v}) \bigg\}(t, X_{t-}). \end{align*} $$

Finally, for the quadratic variation terms with respect to continuous parts, we have

Provided that we can show the following integrability property holds for all words $w\in \mathcal {W}_d$

(6.16)

it follows that

$$ \begin{align*} &\mathbb{E}_t\bigg\{\int_{(t,T]}H(\operatorname{\mathrm{ad}} \tilde{\boldsymbol{\kappa}}_{u-})(\mathrm{d}\mathbf{X}_u) + \mathbf{U}_{t,T} + \mathbf{J} \ast \mu^{X}_{t,T}\bigg\} \\ &\quad= \mathbb{E}_t\bigg\{ \int_t^{T} H(\operatorname{\mathrm{ad}} \tilde{\boldsymbol{\kappa}}_{u-})(\mathrm{d} \mathbf{B}_u) + \mathbf{U}_{t,T} + (\mathbf{J} + \overline{\mathbf{W}}) \ast \nu_{t,T}\bigg\} \\ &\quad= \mathbb{E}_t\bigg\{ \int_{(t,T]}\bigg( H(\operatorname{\mathrm{ad}} \mathbf{v})\big(\mathbf{b}(X_{u-}) + \mathbf{u}(u, X_{u-})\big) + \int_{{\mathbb{R}^d}}(\mathbf{J}_{u}(y) + \overline{\mathbf{W}}_{u}(y)) K(X_{u-}, \mathrm{d} y)\bigg) \mathrm{d} u\bigg\} \\ &\quad= \mathbb{E}_t\bigg(-\int_t^{T}[\partial_t + \mathcal{L}] \mathbf{v} (u,X_{u-}) \mathrm{d} u \bigg), \end{align*} $$

where in the last line we have used $\mathbf {v}$ , which satisfies the PIDE. Since the above left-hand side is precisely the right-hand side of the functional equation (4.3), it follows together with equation (6.15) that $\tilde {\boldsymbol {\kappa }}$ satisfies the functional equation (4.3).

Note that in case the integrability condition in equation (6.16) is satisfied for all words $w\in \mathcal {W}_d$ with $\left \vert w \right \vert \le n$ for some length $n\in {\mathbb {N}_{\ge 1}}$, it follows that the above equality holds up to the projection with $\pi _{(0,n)}$. For words with $\left \vert w \right \vert = 1$, the condition in equation (6.16) is an immediate consequence of $\mathbf {X}\in \mathscr {H}^{\infty -}$. It then follows inductively by the same arguments as in the proof of Claim 7.12 that equation (6.16) is indeed satisfied for all words $w\in \mathcal {W}_d$.

Since $\boldsymbol {\kappa }(T)$ is the unique solution to equation (4.3), it then follows that $\tilde {\boldsymbol {\kappa }} \equiv \boldsymbol {\kappa }(T)$ .

6.3 Affine Volterra processes

For $i =1, 2$ , let $K^{i}$ be an integration kernel such that $K^{i}(t, \cdot ) \in L^{2}([0, t])$ for all $0 \le t \le T$ , and let $V^{i}$ be the solution to the Volterra integral equation

$$ \begin{align*} V^{i}_t = V^{i}_0 + \int_0^{t} K^{i}(t,s)\sqrt{V^{i}_s}\mathrm{d} W^{i}_s, \quad 0 \le t\le T, \end{align*} $$

with $V^{i}_0> 0$ , where $W^{1}$ and $W^{2}$ are uncorrelated standard Brownian motions which generate the filtration $(\mathcal {F}_t)_{0 \le t \le T}$ . Note that in general, $V^{i}$ is not a semimartingale. In particular, this is not the case when $K^{i}$ is a power-law kernel of the form $K(t,s) \sim (t-s)^{H-1/2}$ for some $H\in (0, 1/2)$ , which is the prototype of a rough affine volatility model (see, e.g., [Reference Keller-Ressel, Larsson and Pulido39]). However, a martingale $\xi ^{i}(T)$ is naturally associated to $V^{i}$ by

$$ \begin{align*}\xi^{i}_t(T) = \mathbb{E}_t(V^{i}_T), \quad 0 \le t \le T.\end{align*} $$

In the financial context, $\xi ^{i}(T)$ is the central object of a forward variance model (see, e.g., [Reference Gatheral and Keller-Ressel29]). It was seen in [Reference Friz, Gatheral and Radoičić23] that the iterated diamond products of $\xi ^{1}(T)$ are of a particularly simple form and easily translated to a system of convolutional Riccati equations of the type studied in [Reference Abi Jaber, Larsson and Pulido1] for the cumulant generating function. We are interested in the signature cumulant of the two-dimensional martingale $X = (\xi ^{1}(T), \xi ^{2}(T))$ .

Corollary 6.10. It holds that $\mathbf {X} = (0,\, \xi ^{1}(T)e_1 + \xi ^{2}(T)e_2,\, 0,\, \dots )\in \mathscr {H}^{\infty -}$ and the signature cumulant $\boldsymbol {\kappa }_t(T) = \log \mathbb {E}_t(\mathrm {Sig}(\mathbf {X})_{t,T})$ is the unique solution to the functional equation: for all $0 \le t \le T$

Proof. Regarding the integrability statement, it suffices to check that $V^{i}_T$ has moments of all orders for $i=1, 2$. This is indeed the case, and we refer to [Reference Abi Jaber, Larsson and Pulido1, Lemma 3.1] for a proof. Hence we can apply Theorem 4.1, and we see that $\boldsymbol {\kappa }$ satisfies the functional equation (4.3). As described in Section 4.1, this equation can be reformulated with brackets replaced by diamonds. Further note that, due to the continuity, jump terms vanish and, due to the martingality, the Itô integrals with respect to $\mathbf {X}$ have zero expectation. The final step to arrive at the above form of the functional equation is to calculate the brackets $\langle \xi ^{i}(T), \xi ^{j}(T) \rangle $. From the definitions of $\xi ^{i}(T)$ and $V^{i}$, we have for all $0 \le t \le T$

$$ \begin{align*} \xi^{i}_t(T) &= \mathbb{E}_t \left( V^{i}_0 +\int_0^{t}K^{i}(T,s)\sqrt{V^{i}_s} \mathrm{d} W^{i}_s+ \int_t^{T}K^{i}(T,s)\sqrt{V^{i}_s} \mathrm{d} W^{i}_s\right) = V^{i}_0 + \int_0^{t}K^{i}(T,s)\sqrt{V^{i}_s} \mathrm{d} W^{i}_s. \end{align*} $$

Therefore, due to the independence, we have $\langle \xi ^{1}(T), \xi ^{2}(T) \rangle = 0$ ; and for the square bracket, we have $\mathrm {d} \langle \xi ^{i}(T), \xi ^{i}(T)\rangle _t = K^{i}(T,t)^{2}V^{i}_t \mathrm {d} t$ .

The recursion for the signature cumulants from Corollary 4.3 is easily simplified in analogy to the above corollary. In the rest of this section, we are going to demonstrate explicit calculations for the first four levels. Clearly, due to the martingality, the first-level signature cumulants are identically zero: $\boldsymbol {\kappa }^{(1)}(T) \equiv 0$. At the second level, we start to observe the type of simplifications that appear due to the affine structure

$$ \begin{align*} \boldsymbol{\kappa}^{(2)}_t(T) &= \frac{1}{2}\sum_{i=1,2}e_{ii}(\xi^{i}(T) \diamond \xi^{i}(T))_t(T) = \frac{1}{2}\sum_{i=1,2}e_{ii}\mathbb{E}_t\left( \int_t^{T}K^{i}(T,u)^{2}V_u^{i}\mathrm{d} u\right) \\ &=\frac{1}{2}\sum_{i=1,2}e_{ii} \int_t^{T}K^{i}(T,u)^{2} \xi^{i}_t(u) \mathrm{d} u, \end{align*} $$

where $\xi ^{i}_t(u) = \mathbb {E}_t(V^{i}_u)$ for all $0 \le t \le u \le T$ . The third level is of the same form

$$ \begin{align*} \boldsymbol{\kappa}^{(3)}_t(T) &= \frac{1}{2}\sum_{i=1,2}e_{i}(\xi^{i}(T) \diamond \boldsymbol{\kappa}^{(2)}(T))_t(T) \\ &=\frac{1}{2}\sum_{i=1,2}e_{iii}\int_t^{T}\left(\int_u^{T}K^{i}(T,s)^{2}K^{i}(T, u)K^{i}(s,u) \mathrm{d} s\right)\xi^{i}_t(u) \mathrm{d} u, \end{align*} $$

where we have used that for any suitable $h:[0, T]\to \mathbb {R}$ , it holds that for all $0 \le t \le T$

$$ \begin{align*} \int_t^{T}h(u)\xi^{i}_t(u)\mathrm{d} u = \int_0^{T}h(u) V^{i}_0 \mathrm{d} u - \int_0^{t}h(u)V^{i}_u\mathrm{d} u + \int_0^{t}\left(\int_u^{T} h(s)K^{i}(s, u)\mathrm{d} s \right)\sqrt{V_u^{i}}\mathrm{d} W^{i}_u. \end{align*} $$

The fourth level starts to reveal some of the structure that is not visible in the commutative setting

$$ \begin{align*} \boldsymbol{\kappa}^{(4)}_t(T) =& \sum_{i=1,2}\left\{\frac{1}{8}[e_{\bar i \bar i}, e_{ii}]\int_t^{T}\left(\int_u^{T}K^{\bar i}(T,s)\xi_t^{\bar i}(s)\mathrm{d} s\right) K^{i}(T, u)^{2} \xi_t^{i}(u) \mathrm{d} u + e_{iiii}\int_t^{T}h^{i}(T,u) \xi^{i}_t(u)\mathrm{d} u\right\}, \end{align*} $$

where $\{i, \bar {i}\} = \{1,2\}$ and $h^{i}$ is defined by

$$ \begin{align*} h^{i}(T, u) =& \frac{1}{8}\left(\int_u^{T}K^{i}(T,s)^{2}K^{i}(s,u)\mathrm{d} s\right)^{2} \\ &+ \frac{1}{2}\int_u^{T}\left(\int_s^{T}K^{i}(T,r)^{2}K^{i}(T, s)K^{i}(r,s) \mathrm{d} r\right)K^{i}(T,s)K^{i}(s,u)\,\mathrm{d} s, \quad 0 \le u \le T. \end{align*} $$
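As a quick numerical sanity check of the second-level formula above (our own illustration, for a single component with a hypothetical power-law kernel $K(t,s) = (t-s)^{H-1/2}$): since the Volterra equation for V has no drift term, $\xi _0(u) = \mathbb {E}(V_u) = V_0$, so $\boldsymbol {\kappa }^{(2)}_0(T)$ reduces to the explicit integral $\tfrac {1}{2}V_0\int _0^T (T-u)^{2H-1}\,\mathrm {d} u = V_0 T^{2H}/(4H)$.

```python
import numpy as np
from scipy.integrate import quad

H, V0, T = 0.1, 0.04, 1.0                  # hypothetical rough-kernel parameters

def K(t, s):
    return (t - s) ** (H - 0.5)            # power-law kernel K(t, s)

# kappa^(2)_0(T) = (1/2) * int_0^T K(T, u)^2 * xi_0(u) du with xi_0(u) = V0
numeric, _ = quad(lambda u: 0.5 * K(T, u) ** 2 * V0, 0.0, T)
closed = V0 * T ** (2 * H) / (4 * H)       # closed form of the same integral
print(numeric, closed)                     # the two values agree
```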

7 Proofs

For ease of notation, we introduce a norm on the space of tensor valued finite variation processes, which could have been introduced in Section 2.4 but was not needed until now. Let $q\in [1,\infty )$ and $\mathbf {A}\in \mathscr {V}(({\mathbb {R}^d})^{\otimes n})$ for some $n\in {\mathbb {N}_{\ge 1}}$; then we define

It is easy to see that it holds $ \left \Vert {\mathbf {A}} \right \Vert _{\mathscr {H}^{q}} \le \left \Vert {\mathbf {A}} \right \Vert _{\mathscr {V}^{q}}$ , and this inequality can be strict.

Further, for an element $\mathbb {A} \in T({\mathbb {R}^d}) \overline {\otimes } T({\mathbb {R}^d})$ , we introduce the following notation

$$\begin{align*}\mathbb{A}= \sum_{w_1, w_2 \in \mathcal{W}_d} \mathbb{A}^{w_1, w_2} \, e_{w_1}\!\otimes e_{w_2}, \quad \mathbb{A}^{w_1, w_2}\in\mathbb{R}, \end{align*}$$

and for $l_1, l_2 \in {\mathbb {N}_{\ge 1}}$

$$ \begin{align*} \mathbb{A}^{(l_1, l_2)} = \sum_{|w_1|=l_1, |w_2|=l_2} e_{w_1w_2}\otimes \mathbb{A}^{w_1,w_2} \in ({\mathbb{R}^d})^{\otimes l_1} \otimes ({\mathbb{R}^d})^{\otimes l_2} \subset T({\mathbb{R}^d}) \overline{\otimes} T({\mathbb{R}^d}). \end{align*} $$

Next we will prove two well-known lemmas translated to the setting of tensor valued semimartingales.

Lemma 7.1 (Kunita-Watanabe inequality)

Let $\mathbf {X} \in \mathscr {S}(({\mathbb {R}^d})^{\otimes n})$ and $\mathbf {Y} \in \mathscr {S}(({\mathbb {R}^d})^{\otimes m})$ ; then the following estimate holds a.s.

where $c>0$ is a constant that only depends on d, m and n.

Proof. From the definition of the quadratic variation of tensor valued semimartingales in Section 2.4, we have

where the first estimate follows from the triangle inequality, the second estimate from the (scalar) Kunita-Watanabe inequality [Reference Protter57, Ch. II, Theorem 25] and the last two estimates follow from the standard estimate between the $1$ -norm and the $2$ -norm on $({\mathbb {R}^d})^{\otimes m}\cong \mathbb {R}^{d^m}$ .

In order to prove the next well-known lemma (Emery’s inequality), we need the following technical lemma.

Lemma 7.2. Let $\mathbf {A}\in \mathscr {V}(({\mathbb {R}^d})^{\otimes n})$ , $\mathbf {Y}\in \mathscr {D}(({\mathbb {R}^d})^{\otimes l})$ , $\mathbf {Z}\in \mathscr {D}(({\mathbb {R}^d})^{\otimes m})$ ; then it holds that

where the integration with respect to

denotes the integration with respect to the increasing one-dimensional path

. Further, let $\mathbf {Y}^{\prime }\in \mathscr {D}(({\mathbb {R}^d})^{\otimes l^{\prime }})$ and $\mathbf {Z}^{\prime }\in \mathscr {D}(({\mathbb {R}^d})^{\otimes m^{\prime }})$ , and let $(\mathbb {A}_t)_{0 \le t \le T}$ be a process taking values in $({\mathbb {R}^d})^{\otimes n}\otimes ({\mathbb {R}^d})^{\otimes n^{\prime }}$ such that $A^{w_1, w_2} \in \mathscr {V}$ for all $w_1, w_2\in \mathcal {W}_d$ with $|w_1|= n$ and $|w_2| = n^{\prime }$ . Then it holds that

where $(\mathbf {Y} \mathrm {Id} \mathbf {Y}^{\prime })(\mathbf {A}) = \mathbf {Y}\mathbf {A} \mathbf {Y}^{\prime }$ is the left- respectively right-multiplication by $\mathbf {Y}$ , respectively $\mathbf {Y}^{\prime }$ .

Proof. Let $0 \le s \le t \le T$ ; then it holds that

Indeed, as follows, for example, from [Reference Young62, Theorem on Stieltjes integrability], we can approximate the integral on the left-hand side by Riemann sums. Then for a partition $(t_i)_{i=1, \dotsc , k}$ of the interval $[s,t]$, we have

where the last inequality follows from the fact that for homogeneous tensors $\mathbf {x}\in ({\mathbb {R}^d})^{\otimes m}$ and $\mathbf {y}\in ({\mathbb {R}^d})^{\otimes n}$, it holds that $\left \vert \mathbf {x}\mathbf {y} \right \vert \le \left \vert \mathbf {x} \right \vert \left \vert \mathbf {y} \right \vert $. Regarding the $1$-variation, we then have

Regarding the second statement, we see that for any $0 \le s \le t \le T$ , we have

Indeed, we approximate the integral on the right-hand side again by a Riemann sum. Then for a partition $(t_i)_{i=1, \dotsc , k}$ of the interval $[s,t]$, we have

where the last inequality follows from the definition of the norm on (homogeneous) tensors and the definition of the multiplication map m. We conclude analogously to the proof of the first statement.

Lemma 7.3 (Emery’s inequality)

Let $\mathbf {X}\in \mathscr {S}(({\mathbb {R}^d})^{\otimes n})$ , $\mathbf {Y}\in \mathscr {D}(({\mathbb {R}^d})^{\otimes l})$ and $\mathbf {Z}\in \mathscr {D}(({\mathbb {R}^d})^{\otimes m})$ ; then for $p, q\in [1,\infty )$ and $1/r = 1/p + 1/q$ , it holds that

where $c>0$ is a constant that only depends on d and m.

Proof. Let $\mathbf {X} = \mathbf {X}_0 + \mathbf {M} + \mathbf {A}$ be a semimartingale decomposition with $\mathbf {M}_0 = \mathbf {A}_0 = 0$ . Then it follows by definition of the $\mathscr {H}^{r}$ -norm and the above Lemma 7.2

where we have used the generalized Hölder inequality and the Kunita-Watanabe inequality (Lemma 7.1) to get to the last line. Taking the infimum over all semimartingale decompositions $\mathbf {M} + \mathbf {A}$ yields the statement.

The following technical lemma will be used in the proof of both Theorem 3.2 and Theorem 4.1.

Lemma 7.4. Let $\mathbf {X}, \mathbf {Y}\in \mathscr {S}(\mathcal {T\,}^{N})$ , $N\in {\mathbb {N}_{\ge 1}}$ , $q\in [1,\infty )$ , and assume that there exists a constant $c>0$ such that

$$ \begin{align*} \Vert {\mathbf{Y}^{(n)}} \Vert _{\mathscr{H}^{qN/n}} \le c \sum_{\Vert\ell\Vert = n} \Vert {\mathbf{X}^{(l_1)}} \Vert _{\mathscr{H}^{qN/l_1}}\cdots \Vert {\mathbf{X}^{(l_j)}} \Vert _{\mathscr{H}^{qN/l_j}}, \quad n=1, \dotsc, N, \end{align*} $$

where the summation is over $\ell = (l_1, \dots , l_j)\in ({\mathbb {N}_{\ge 1}})^{j}$ , $j\in {\mathbb {N}_{\ge 1}}$ , $\Vert \ell \Vert = l_1 + \dots + l_j$ . Then there exists a constant $C>0$ , depending only on c and N, such that

$$ \begin{align*} \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{Y}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} \le C \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{X}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}}. \end{align*} $$

Proof. Note that for any $n\in \{1, \dotsc , N\}$ , it holds that

$$ \begin{align*} \bigg(\sum_{\Vert\ell\Vert = n} \Vert {\mathbf{X}^{(l_1)}} \Vert _{\mathscr{H}^{qN/l_1}}\cdots \Vert {\mathbf{X}^{(l_j)}} \Vert _{\mathscr{H}^{qN/l_j}}\bigg)^{1/n} \le& \sum_{\Vert\ell\Vert = n} ( \Vert {\mathbf{X}^{(l_1)}} \Vert _{\mathscr{H}^{qN/l_1}}^{1/l_1})^{l_1/n}\cdots( \Vert {\mathbf{X}^{(l_j)}} \Vert _{\mathscr{H}^{qN/l_j}}^{1/l_j})^{l_j/n} \\ \le& \sum_{\Vert\ell\Vert = n} \left(\frac{l_1}{n} \Vert {\mathbf{X}^{(l_1)}} \Vert _{\mathscr{H}^{qN/l_1}}^{1/l_1} + \dots + \frac{l_j}{n} \Vert {\mathbf{X}^{(l_j)}} \Vert _{\mathscr{H}^{qN/l_j}}^{1/l_j} \right) \\ \le& c_n \sum^{n}_{i=1} \Vert {\mathbf{X}^{(i)}} \Vert _{\mathscr{H}^{qN/i}}^{1/i}, \end{align*} $$

where $c_n>0$ is a constant depending only on n, and the second inequality follows from Young’s inequality for products. Hence by the above estimate and the assumption we have

$$ \begin{align*} \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{Y}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} = \sum_{n=1}^{N} \left\Vert {\mathbf{Y}^{(n)}} \right\Vert _{\mathscr{H}^{qN/n}}^{1/n} \le c^{1/n} c_n \sum_{n=1}^{N} \sum_{i=1}^{n} \left\Vert {\mathbf{X}^{(i)}} \right\Vert _{\mathscr{H}^{qN/i}}^{1/i} \le C \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{X}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}}, \end{align*} $$

where $C>0$ is a constant depending only on c and N.

7.1 Proof of Theorem 3.2

Proof. Denote by $\mathbf {S} = (\mathrm {Sig}(\mathbf {X})_{0,t})_{0\le t \le T}$ the signature process. We will first prove the upper inequality: that is, that there exists a constant $C>0$ depending only on d, N and q such that

(7.1) $$ \begin{align} \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{S}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} \le C\vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{X}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}}. \end{align} $$

According to Lemma 7.4, it is sufficient to show that for all $n \in \{1, \dots , N\}$ , it holds that

(7.2) $$ \begin{align} c_n \left\Vert {\mathbf{S}^{(n)}} \right\Vert _{\mathscr{H}^{qN/n}} \le \sum_{\Vert\ell\Vert = n} \left\Vert {\mathbf{X}^{(l_1)}} \right\Vert _{\mathscr{H}^{qN/l_1}}\cdots \left\Vert {\mathbf{X}^{(l_j)}} \right\Vert _{\mathscr{H}^{qN/l_j}} =: \rho_{\mathbf{X}}^{n}, \end{align} $$

where $c_n>0$ is a constant (depending only on q, d and n). In the inequality above (and the rest of the proof), the summation is over $\ell = (l_1, \dots , l_j) \in ({\mathbb {N}_{\ge 1}})^{j}$, $j\in {\mathbb {N}_{\ge 1}}$ with $\Vert \ell \Vert = n$; in particular, the variable j is reserved for the length $|\ell |$ of the multiindex $\ell $ inside such summations. Note that $\Vert \ell \Vert = n$ implies that $l_i \le n$ for all $i = 1, \ldots , j$. It can easily be seen from the definition of the quantities $(\rho _{\mathbf {X}}^{1}, \dots , \rho _{\mathbf {X}}^{N})$ in equation (7.2) that they satisfy the following (‘cascading’) property for all $n = 1, \dots , N$

(7.3) $$ \begin{align} \rho_{\mathbf{X}}^{n} \le \sum_{\Vert\ell\Vert = n} \rho_{\mathbf{X}}^{l_j} \left\Vert {\mathbf{X}^{(l_{j-1})}} \right\Vert _{\mathscr{H}^{qN/l_{j-1}}} \cdots \left\Vert {\mathbf{X}^{(l_1)}} \right\Vert _{\mathscr{H}^{qN/l_1}} \le c^{\prime\prime}\sum_{\Vert\ell\Vert = n} \rho_{\mathbf{X}}^{l_j}\cdots \rho_{\mathbf{X}}^{l_1} \le c^{\prime}\rho_{\mathbf{X}}^{n}, \end{align} $$

where $c^{\prime }$ and $c^{\prime \prime }$ are constants depending only on n. We are going to prove equation (7.2) inductively.

For $n=1$ , we have $\mathbf {S}^{(1)} = \mathbf {X}^{(1)} - \mathbf {X}^{(1)}_0 = \mathbf {X}^{(1)} \in \mathscr {H}^{qN}$ , and therefore the estimate follows immediately. Now, assume that equation (7.2) holds for all tensor levels up to some level $n-1$ with $n \in \{2, \dotsc , N\}$ . We will denote by $c^{\prime }, c^{\prime \prime }> 0$ constants that only depend on n, d and q. Then we have from equation (2.11)

For the first term in the above right-hand side, we have by Emery’s inequality (Lemma 7.3) the following estimate

where the last inequality follows from the induction hypothesis and equation (7.3). Further, from the Kunita-Watanabe inequality (Lemma 7.1) and the generalized Hölder inequality, it follows that for all $l_1, l_2\in {\mathbb {N}_{\ge 1}}$ with $l_1 + l_2 \le n$, we have

Then we have again by Emery’s inequality, the induction hypothesis and equation (7.3) that it holds

Finally, we have for the summation term

where the last inequality follows again by the Kunita-Watanabe inequality, the induction hypothesis and equation (7.3). Thus we have shown that equation (7.2) holds for all $n\in \{1, \dots , N\}$.

Now we will prove the lower inequality: that is, that there exists a constant $c>0$ depending only on d, N and q such that

(7.4) $$ \begin{align} c \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{X}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} \le \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{S}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}}. \end{align} $$

Therefore define ${\bar {\mathbf {X}}}^{n} := (0, \mathbf {X}^{(1)}, \dots , \mathbf {X}^{(n)}, 0, \dots , 0)\in \mathscr {H}^{q,N}$ , and note that it holds

$$ \begin{align*} \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{{\bar{\mathbf{X}}}^{1}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} = \Vert {\mathbf{X}^{(1)}} \Vert _{\mathscr{H}^{qN}} = \Vert {\mathbf{S}^{(1)}} \Vert _{\mathscr{H}^{qN}} \le \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{S}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}}. \end{align*} $$

Now assume that it holds

$$ \begin{align*}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert{{\bar{\mathbf{X}}}^{n-1}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} \le c^{\prime} \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{S}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}}\end{align*} $$

for some $n \in \{1, \dotsc ,N\}$ . It follows from the definition of the signature that

(7.5) $$ \begin{align} \mathbf{S}^{(n)}_t = \mathbf{X}^{(n)}_{0,t} + \mathrm{Sig}({\bar{\mathbf{X}}}^{n-1})^{(n)}_{0,t}, \end{align} $$

and further, we have from the upper bound in equation (7.1), which was already proven above, that

(7.6) $$ \begin{align} \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathrm{Sig}({\bar{\mathbf{X}}}^{n-1})_{0,\cdot}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} \le C \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{{\bar{\mathbf{X}}}^{n-1}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} \le C c^{\prime} \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{S}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}}. \end{align} $$

Then we have

$$ \begin{align*} \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{{\bar{\mathbf{X}}}^{n}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} &= \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{{\bar{\mathbf{X}}}^{n-1}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} + \left\Vert {\mathbf{X}^{(n)}} \right\Vert _{\mathscr{H}^{qN/n}}^{1/n} \\ &\le c^{\prime}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{S}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} + \left\Vert {\mathbf{S}^{(n)}} \right\Vert _{\mathscr{H}^{qN/n}}^{1/n} + \left\Vert {\mathrm{Sig}({\bar{\mathbf{X}}}^{n-1})^{(n)}_{0,\cdot}} \right\Vert _{\mathscr{H}^{qN/n}}^{1/n} \\ &\le c^{\prime}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{S}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} + \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{S}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}}+ \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathrm{Sig}({\bar{\mathbf{X}}}^{n-1})_{0,\cdot}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}} \\ &\le c^{\prime \prime}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{S}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{q,N}}, \end{align*} $$

where we have used equation (7.5) in the second line and equation (7.6) in the last line. Therefore, noting that ${\bar {\mathbf {X}}}^{N}=\mathbf {X}$ , the inequality in equation (7.4) follows by induction.

7.2 Proof of Theorem 4.1

We prepare the proof of Theorem 4.1 with a few more lemmas. Recall from Section 2.3.2 that $\exp _N\colon \mathcal {T}_0^{N} \to \mathcal {T}_1^{N}$ denotes the exponential map in the truncated tensor algebra $\mathcal {T\,}^{N}$ for some $N\in {\mathbb {N}_{\ge 1}}$ , which is given by the power series

where all concatenation products are understood in the truncated tensor algebra $\mathcal {T\,}^{N}_1$ , and the summation is over all $\ell = (l_1, \dots , l_k) \in ({\mathbb {N}_{\ge 1}})^{k}$ , $k\in {\mathbb {N}_{\ge 1}}$ with

and

. Also recall that $\log _N: \mathcal {T}_1 \to \mathcal {T}_0$ denotes the logarithm in the truncated tensor algebra $\mathcal {T\,}^{N}$ , which is analogously given by the corresponding $\log $ -power series.

Lemma 7.5. Let $N \in {\mathbb {N}_{\ge 1}}$ ; then we have the following directional derivatives of the truncated exponential map $\exp _N\colon \mathcal {T}_0^{N} \to \mathcal {T}_1^{N}$

$$ \begin{align*} (\partial_w \exp_N)(\mathbf{x}) &=G(\operatorname{\mathrm{ad}}{\mathbf{x}})(e_w)\exp_N({\mathbf{x}}) = \exp_N({\mathbf{x}}) G(-\operatorname{\mathrm{ad}}{\mathbf{x}})(e_w), \quad \mathbf{x}\in \mathcal{T}_0^{N}, \\ (\partial_w\partial_{w'} \exp_N)(\mathbf{x}) &= \widetilde{Q}(\operatorname{\mathrm{ad}}{\mathbf{x}})(e_w\otimes e_{w'})\exp_N(\mathbf{x}), \quad \mathbf{x}\in\mathcal{T}_0^{N}, \end{align*} $$

for all words $w, w' \in \mathcal {W}_d$ with $1\le |w|, |w'| \le N$, where G is defined in equation (4.1) and $\widetilde{Q}$ is given by

$$ \begin{align*} \widetilde Q(\operatorname{\mathrm{ad}}{\mathbf{x}})(a\otimes b) &=\; G(\operatorname{\mathrm{ad}}{\mathbf{x}})(b) G(\operatorname{\mathrm{ad}}{\mathbf{x}})(a) + \int_0^1 \tau [G (\tau \operatorname{\mathrm{ad}}{\mathbf{x}})(a), e^{\tau \operatorname{\mathrm{ad}}{\mathbf{x}}}(b)] \,\mathrm{d} \tau\\ &=\; \sum_{n,m = 0}^{N} \frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^n(b)}{(n + 1) !} \frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^m(a)}{(m + 1) !} + \sum_{n,m = 0}^{N} \frac{[(\operatorname{\mathrm{ad}}{\mathbf{x}})^n(a), (\operatorname{\mathrm{ad}}{\mathbf{x}})^m(b)]_{}}{(n + m + 2) (n + 1)!\,m!}, \quad \mathbf{x},a,b \in \mathcal{T}_0^{N}. \end{align*} $$

Proof. For all $w\in \mathcal {W}_d$ with $0 \le |w| \le N$ and $\mathbf {x}\in \mathcal {T}_0^{N}$ , the expression $\exp _N(\mathbf {x})^{w}$ is a polynomial in the tensor components $(\mathbf {x}^{v})_{1 \le |v| \le |w|}$ . Therefore the map $\exp _N: \mathcal {T\,}^{N}_1 \to \mathcal {T}_0^{N}$ is smooth, and in particular the first and second-order partial derivatives exist in all directions. For a proof of the explicit form of the first-order partial derivatives, we refer to [Reference Friz and Victoir27, Lemma 7.23]. For the second-order derivatives, we follow the proof of [Reference Kamm, Pagliarani and Pascucci37, Lemma A.1]. Therefore, let $\mathbf {x}\in \mathcal {T}_0^{N}$ and $w, {w^{\prime }}$ arbitrary with $1\le |w|, |{w^{\prime }}| \le N$ . Then we have by the definition of the partial derivatives in $\mathcal {T}_0^{N}$ and the product rule

$$ \begin{align*} \partial_w(\partial_{w^{\prime}}\exp_N(\mathbf{x})) &= \frac{\mathrm d}{\mathrm dt}\Big(G(\operatorname{\mathrm{ad}}{\mathbf{x} + t e_w})(e_{{w^{\prime}}})\exp_N(\mathbf{x} + t e_w)\Big) \Big\vert_{t=0} \\ &= \frac{\mathrm d}{\mathrm dt}G(\operatorname{\mathrm{ad}}{\mathbf{x} + t e_w})(e_{{w^{\prime}}})\Big\vert_{t=0} \exp_N(\mathbf{x}) + G(\operatorname{\mathrm{ad}}{\mathbf{x}})(e_{{w^{\prime}}})G(\operatorname{\mathrm{ad}}{\mathbf{x}})(e_{w})\exp_N(\mathbf{x}). \end{align*} $$

From [Reference Friz and Victoir27, Lemma 7.22], it holds that $\exp _N(\operatorname {\mathrm {ad}}{\mathbf {x}})({y}) = \exp _N(\mathbf {x}) y \exp _N(-\mathbf {x})$ for all $\mathbf {x},y\in \mathcal {T}_0^{N}$ , and it follows further by representing G in integral form that

$$ \begin{align*} \frac{\mathrm d}{\mathrm dt}G(\operatorname{\mathrm{ad}}{\mathbf{x} + t e_w})(e_{{w^{\prime}}})\Big\vert_{t=0} &= \frac{\mathrm d}{\mathrm dt}\bigg(\int_0^{1}\exp_N(\tau \operatorname{\mathrm{ad}}{\mathbf{x}+t e_w})(e_{w^{\prime}})\,\mathrm{d}\tau\bigg)\bigg\vert_{t=0}\\ &=\int_0^{1}\frac{\mathrm d}{\mathrm dt}\Big(\exp_N(\tau(\mathbf{x}+t e_w))e_{w^{\prime}}\exp_N(-\tau(\mathbf{x}+t e_w))\Big)\Big\vert_{t=0}\,\mathrm{d}\tau\\ &=\int_0^{1} \tau G(\operatorname{\mathrm{ad}}{\tau \mathbf{x}})(e_w) \exp_N(\tau \mathbf{x}) e_{w^{\prime}} \exp_N(-\tau \mathbf{x})\,\mathrm{d}\tau \\&\quad- \int_0^{1} \tau \exp_N(\tau \mathbf{x}) e_{w^{\prime}} \exp_N(-\tau \mathbf{x}) G(\operatorname{\mathrm{ad}}{\tau \mathbf{x}})(e_w)\,\mathrm{d}\tau\\ &= \int_0^{1} \tau \left[ G(\tau\operatorname{\mathrm{ad}}{\mathbf{x}})(e_w), \exp_N(\tau\operatorname{\mathrm{ad}}{\mathbf{x}})(e_{w^{\prime}}) \right] \,\mathrm{d}\tau. \end{align*} $$

Then the proof is finished after noting that

$$ \begin{align*} \int_0^{1} \tau G(\tau\operatorname{\mathrm{ad}}{\mathbf{x}})(e_w)\exp(\tau\operatorname{\mathrm{ad}}{\mathbf{x}})(e_{w^{\prime}})\,\mathrm{d}\tau &= \int_0^{1}\tau\sum_{n,m=0}^{N}\frac{(\tau \operatorname{\mathrm{ad}}{\mathbf{x}})^{n}(e_w)}{(n+1)!}\frac{(\tau \operatorname{\mathrm{ad}}{\mathbf{x}})^{m}(e_{w^{\prime}})}{m!}\,\mathrm{d}\tau\\ &= \int_0^{1}\sum_{n,m=0}^{N}\frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^{n}(e_w)}{(n+1)!}\frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^{m}(e_{w^{\prime}})}{m!}\tau^{1+m+n}\,\mathrm{d}\tau \\ &= \sum_{n,m=0}^{N}\frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^{n}(e_w)(\operatorname{\mathrm{ad}}{\mathbf{x}})^{m}(e_{w^{\prime}})}{(n+1)!m!(n+m+2)}. \end{align*} $$
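Although Lemma 7.5 is stated in the truncated tensor algebra, the first-derivative formula is an instance of the general identity $\tfrac{\mathrm{d}}{\mathrm{d}t}\exp(\mathbf{x}+t\,e_w)\vert_{t=0} = G(\operatorname{\mathrm{ad}}{\mathbf{x}})(e_w)\exp(\mathbf{x})$, valid in any associative algebra, which permits a quick numerical sanity check (our own) with $3\times 3$ matrices.

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 3))                # base point x
W = rng.normal(size=(3, 3))                # direction e_w

ad = lambda A, B: A @ B - B @ A            # commutator bracket

# G(ad X)(W) = sum_{k >= 0} (ad X)^k (W) / (k + 1)!
G, term = np.zeros((3, 3)), W.copy()
for k in range(25):
    G += term / factorial(k + 1)
    term = ad(X, term)

# finite-difference directional derivative of the exponential at X towards W
h = 1e-7
fd = (expm(X + h * W) - expm(X)) / h
print(np.max(np.abs(fd - G @ expm(X))))    # small, of finite-difference size
```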

Note that the operator Q defined in equation (4.1) differs from the operator $\tilde {Q}$ defined above. However, we have the following:

Lemma 7.6. Let $N\in {\mathbb {N}_{\ge 1}}$ and $\mathbf {x}\in \mathcal {T}_0^{N}$ ; then it holds that

$$\begin{align*}\widetilde Q(\operatorname{\mathrm{ad}}{\mathbf{x}})(\mathbb{A}) = Q(\operatorname{\mathrm{ad}}{\mathbf{x}})(\mathbb{A}),\end{align*}$$

for all $\mathbb {A}\in \mathcal {T}_0^{N}\otimes \mathcal {T}_0^{N}$ with symmetric coefficients $\mathbb {A}^{w_1, w_2} = \mathbb {A}^{w_2, w_1}$ for all $w_1, w_2 \in \mathcal {W}_d$ .

Proof. Let $N\in {\mathbb {N}_{\ge 1}}$ and $\mathbf {x}\in \mathcal {T}_0^N$ be arbitrary. Then from the bilinearity of $\tilde {Q}(\operatorname {\mathrm {ad}}{\mathbf {x}})$ and the symmetry of $\mathbb {A}$ , we have, with summation over all words $w_1, w_2$ with $1 \le |w_1|,|w_2|\le N$ ,

$$ \begin{align*} \widetilde Q(\operatorname{\mathrm{ad}}{\mathbf{x}})(\mathbb{A}) &=\sum_{w_1, w_2} \mathbb{A}^{w_1, w_2}\sum_{n,m = 0}^{N}\bigg( \frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^n(e_{w_2})}{(n + 1)!} \frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^m(e_{w_1})}{(m + 1)!} + \frac{[(\operatorname{\mathrm{ad}}{\mathbf{x}})^n(e_{w_1}), (\operatorname{\mathrm{ad}}{\mathbf{x}})^m(e_{w_2})]}{(n + m + 2) (n + 1)!\,m!} \bigg)\\ &=\sum_{w_1, w_2} \mathbb{A}^{w_1, w_2}\bigg( \sum_{n,m = 0}^{N} \frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^n(e_{w_2})}{(n + 1)!} \frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^m(e_{w_1})}{(m + 1) !} \\&\quad+ \frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^n(e_{w_1})(\operatorname{\mathrm{ad}}{\mathbf{x}})^m(e_{w_2}) - (\operatorname{\mathrm{ad}}{\mathbf{x}})^m(e_{w_2})(\operatorname{\mathrm{ad}}{\mathbf{x}})^n(e_{w_1})}{(n + m + 2) (n + 1)!\,m!} \bigg) \\ &=\sum_{w_1, w_2} \mathbb{A}^{w_1, w_2}\bigg( \sum_{n,m = 0}^{N} \frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^n(e_{w_1})}{(n + 1)!} \frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^m(e_{w_2})}{(m + 1) !} \\&\quad+ \frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^n(e_{w_1})(\operatorname{\mathrm{ad}}{\mathbf{x}})^m(e_{w_2})}{(n + m + 2) (n + 1)!\,m!} -\frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^n(e_{w_1})(\operatorname{\mathrm{ad}}{\mathbf{x}})^m(e_{w_2})}{(m + n + 2) (m + 1)!\,n!} \bigg) \\ &= \sum_{w_1, w_2} \mathbb{A}^{w_1, w_2} \sum_{n,m = 0}^{N} (2m+2)\frac{(\operatorname{\mathrm{ad}}{\mathbf{x}})^n(e_{w_1}) (\operatorname{\mathrm{ad}}{\mathbf{x}})^m(e_{w_2})}{(n + 1) ! (m+1) ! (n + m + 2)} = Q(\operatorname{\mathrm{ad}}{\mathbf{x}})(\mathbb{A}). \end{align*} $$

The following two applications of Itô’s formula in the non-commutative setting will be a key ingredient in the proof of Theorem 4.1.

Lemma 7.7 (Itô’s product rule)

Let $\mathbf {X}, \mathbf {Y} \in \mathscr {S}(\mathcal {T}_1^N)$ for some $N\in {\mathbb {N}_{\ge 1}}$ ; then it holds that

Proof. The statement is an immediate consequence of the one-dimensional Itô’s product rule for càdlàg semimartingales (e.g., [Reference Protter57, Ch. II, Corollary 2]) and the definition of the outer bracket and the multiplication map in Section 2.4.

Lemma 7.8. Let $\mathbf {X} \in \mathscr {S}(\mathcal {T}_0^{N})$ for some $N\in {\mathbb {N}_{\ge 1}}$ ; then it holds that

for all $0 \le t \le T$ .

Proof. As discussed in the proof of Lemma 7.5, it is clear that the map $\exp _N\colon \mathcal {T}_0^{N} \to \mathcal {T}_1^{N}$ is smooth. Further, $\mathcal {T}_0^{N}$ is isomorphic to $\mathbb {R}^{D}$ with $D = d + \dotsb + d^{N}$ , and we can apply the multidimensional Itô’s formula for càdlàg semimartingales (e.g., [Reference Protter57, Ch. II, Theorem 33]) to obtain

$$ \begin{align*} \exp_N({\mathbf{X}_t}) - \exp_N({\mathbf{X}_0}) &= \sum_{1 \le |w|\le N}\int_{(0,t]} (\partial_w \exp_N)(\mathbf{X}_{u-})\,\mathrm{d} \mathbf{X}^{w}_u \\&\quad+ \frac{1}{2}\sum_{1 \le |w_1|,|w_2|\le N}\int_0^{t}(\partial_{w_1}\partial_{w_2}\exp_N)(\mathbf{X}_{u-})\,\mathrm{d}\langle \mathbf{X}^{w_1c}, \mathbf{X}^{w_2c} \rangle_u \\&\quad+ \sum_{0<u\le t}\bigg(\exp_N(\mathbf{X}_u) - \exp_N(\mathbf{X}_{u-}) - \sum_{1 \le |w|\le N} (\partial_w \exp_N)(\mathbf{X}_{u-})(\Delta \mathbf{X}^{w}_u) \bigg) \end{align*} $$

for all $0 \le t \le T$ . From Lemma 7.5 we then have for the first integral term

$$ \begin{align*} \sum_{1 \le |w|\le N}\int_{(0,t]} (\partial_w \exp_N)(\mathbf{X}_{u-})\,\mathrm{d} \mathbf{X}^{w}_u &= \sum_{1 \le |w|\le N}\int_{(0,t]} G(\operatorname{\mathrm{ad}}{\mathbf{X}_{u-}})(e_w)\exp_N(\mathbf{X}_{u-})\,\mathrm{d} \mathbf{X}^{w}_u \\ &= \int_{(0,t]} G(\operatorname{\mathrm{ad}}{\mathbf{X}_{u-}})(\mathrm{d}\mathbf{X}_u)\exp_N(\mathbf{X}_{u-}), \end{align*} $$

and analogously

$$ \begin{align*} \sum_{1 \le |w|\le N} (\partial_w \exp_N)(\mathbf{X}_{u-})(\Delta \mathbf{X}^{w}_u) &= \sum_{1 \le |w|\le N} G(\operatorname{\mathrm{ad}}{\mathbf{X}_{u-}})(\Delta\mathbf{X}_u)\exp_N(\mathbf{X}_{u-}). \end{align*} $$

Moreover, from Lemma 7.5 and the definition of the outer bracket in Section 2.4,

Finally, the outer bracket

is symmetric in the sense of Lemma 7.6, and therefore we can replace $\tilde {Q}$ with Q in the above identity.

Lemma 7.9. Let $\mathbf {X} \in \mathscr {S}(\mathcal {T}_0)$ , and let $\mathbf {A}\in \mathscr {V}(\mathcal {T}_0)$ . For all $k \in {\mathbb {N}_{\ge 1}}$ and $\ell = (l_1, \dotsc , l_k)\in ({\mathbb {N}_{\ge 1}})^k$ , it holds that

for all $0 \le t \le T$ . Furthermore, let $(\mathbb {A}_t)_{0 \le t \le T}$ be a process taking values in $\mathcal {T}_0\otimes \mathcal {T}_0$ such that $\mathbb {A}^{w_1,w_2}\in \mathscr {V}$ for all $w_1, w_2 \in \mathcal {W}_d$ . Then it holds that for all $0 \le t \le T$

Proof. Recall from equation (2.3) that we expand iterated adjoint operations into a sum of left and right tensor multiplications and apply Lemma 7.2. Note again that for homogeneous tensors $\mathbf {x}$ and $\mathbf {y}$, it holds that $\left \vert \mathbf {x}\mathbf {y} \right \vert \le \left \vert \mathbf {x} \right \vert \left \vert \mathbf {y} \right \vert $. Therefore the statement follows by counting the terms in the expansion.

Lemma 7.10. Let $N\in {\mathbb {N}_{\ge 1}}$ , $\Delta \mathbf {x},\mathbf {y},\Delta \mathbf {y} \in \mathcal {T}_0^{N}$ , and define the function

$$ \begin{align*} f\colon[0,1]\times[0,1] \to \mathcal{T\,}^N_1, \quad (s,t) \mapsto f(s,t) = \exp_N({s \Delta \mathbf{x}})\exp_N({\mathbf{y} + t\Delta \mathbf{y}})\exp_N({-\mathbf{y}}). \end{align*} $$

Then $f(0,0) = 1$ , and the first-order partial derivatives of f at $(s,t)=(0,0)$ are given by

$$ \begin{align*} (\partial_s f)\vert_{(s,t)=(0,0)} = \Delta \mathbf{x}, \quad (\partial_t f)\vert_{(s,t)=(0,0)} = G(\operatorname{\mathrm{ad}}{\mathbf{y}})(\Delta \mathbf{y}). \end{align*} $$

Further, the following explicit bound for the second-order partial derivatives holds

for all $n \in \{2, \dotsc , N\}$ , where $c_n> 0$ is a constant depending only on n, $\ell =(l_1,\dotsc ,l_k)\in ({\mathbb {N}_{\ge 1}})^{k}$ with $|\ell | = k$ and $z_l := \max \{|\Delta \mathbf {x}^{(l)}|, |\mathbf {y}^{(l)}|, |(\mathbf {y}+\Delta \mathbf {y})^{(l)}|\}$ for all $l\in \{1, \dotsc , N-2\}$ .

Proof. The tensor components of $f(s,t)$ are polynomial in s and t, and it follows that f is smooth. From Lemma 7.5, we have that the first-order partial derivatives of f are given by

$$ \begin{align*} (\partial_s f)\vert_{(s,t)} &= G(\operatorname{\mathrm{ad}}{s\Delta \mathbf{x}})(\Delta \mathbf{x})\exp_N({s \Delta \mathbf{x}})\exp_N({\mathbf{y} + t\Delta \mathbf{y}})\exp_N({-\mathbf{y}}), \\ &= \Delta \mathbf{x}\exp_N({s \Delta \mathbf{x}})\exp_N({\mathbf{y} + t\Delta \mathbf{y}})\exp_N({-\mathbf{y}}), \\[.5em] (\partial_t f)\vert_{(s,t)} &= \exp_N({s \Delta \mathbf{x}})G(\operatorname{\mathrm{ad}}{\mathbf{y} + t\Delta \mathbf{y}})(\Delta \mathbf{y})\exp_N({\mathbf{y} + t\Delta \mathbf{y}})\exp_N({-\mathbf{y}}). \end{align*} $$

Evaluating at $s=t=0$, we obtain the first result (for the first derivative, note that $\operatorname{\mathrm{ad}}(s\Delta\mathbf{x})(\Delta\mathbf{x}) = s[\Delta\mathbf{x},\Delta\mathbf{x}] = 0$, so that only the zeroth-order term of $G$ contributes). Further, by Lemma 7.5 the second-order derivatives are given by

$$ \begin{align*} (\partial_{ss} f)\vert_{(s,t)} &= (\Delta \mathbf{x})^{2}\exp_N({s \Delta \mathbf{x}})\exp_N({\mathbf{y} + t\Delta \mathbf{y}})\exp_N({-\mathbf{y}}), \\[.5em] (\partial_{st} f)\vert_{(s,t)} &= \Delta \mathbf{x}\exp_N({s \Delta \mathbf{x}})G(\operatorname{\mathrm{ad}}{\mathbf{y} + t\Delta \mathbf{y}})(\Delta \mathbf{y})\exp_N({\mathbf{y} + t\Delta \mathbf{y}})\exp_N({-\mathbf{y}}), \\[.5em] (\partial_{tt} f)\vert_{(s,t)} &= \exp_N({s\Delta \mathbf{x}})Q(\operatorname{\mathrm{ad}}{\mathbf{y} + t \Delta \mathbf{y}})((\Delta \mathbf{y})^{\otimes 2})\exp_N({\mathbf{y} + t \Delta \mathbf{y}})\exp_N({-\mathbf{y}}). \end{align*} $$

Now let $n \in \{2, \dotsc , N\}$ . Then it follows from above and Lemma 7.9 that we can bound the second-order derivatives as follows

and

where $c_n^{\prime }, c_n^{\prime \prime }, c_n^{\prime \prime \prime }>0$ are constants depending only on n, and the second statement of the lemma follows.
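The derivative formulas above are purely algebraic and hold verbatim in any associative algebra with an exponential. As a quick numerical sanity check, the following minimal sketch (ours, not part of the proof; it assumes NumPy and SciPy are available) replaces truncated tensors by $4\times4$ matrices and compares $G(\operatorname{\mathrm{ad}}\mathbf{y})(\Delta\mathbf{y})$ with a finite-difference approximation of $(\partial_t f)\vert_{(0,0)}$:

```python
# Illustration only (not the authors' code): check the first-derivative formula
#   (d/dt)|_{t=0} exp(y + t*dy) exp(-y) = G(ad_y)(dy),  G(z) = (e^z - 1)/z,
# of Lemma 7.10 with 4x4 matrices standing in for truncated tensors.
import math
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
y, dy = 0.1 * rng.normal(size=(2, 4, 4))   # small matrices in place of y, Delta y
ad = lambda a, b: a @ b - b @ a            # commutator bracket [a, b]

# G(ad_y)(dy) = sum_{k>=0} (ad_y)^k(dy) / (k+1)!  (terms decay factorially)
G, term = dy.copy(), dy.copy()
for k in range(1, 25):
    term = ad(y, term)
    G = G + term / math.factorial(k + 1)

h = 1e-6                                   # forward difference in t at (s,t) = (0,0)
fd = (expm(y + h * dy) @ expm(-y) - np.eye(4)) / h
print(np.allclose(fd, G, atol=1e-5))       # True
```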

Lemma 7.11. For all $N \in {\mathbb {N}_{\ge 1}}$ and all $\mathbf {x} \in \mathcal {T}_0^{N}$ , it holds that

$$ \begin{align*} H(\operatorname{\mathrm{ad}} \mathbf{x}) \circ G(\operatorname{\mathrm{ad}} \mathbf{x}) = \mathrm{Id}, \end{align*} $$

where G and H are defined in equation (4.1). Hence, the identity also holds for all $\mathbf{x}\in \mathcal {T}_0$.

Proof. Recall the exponential generating function of the Bernoulli numbers $(B_k)_{k\ge0}$: for $z$ near $0$,

$$ \begin{align*} H(z) = \sum_{k=0}^{\infty}\frac{B_k}{k!} z^k = \frac{z}{e^z - 1}, \quad G(z) = \sum_{k=0}^{\infty} \frac{1}{(k+1)!} z^k = \frac{e^z-1}{z}. \end{align*} $$

Therefore $H(z)G(z) = 1$ for all $z$ in a neighbourhood of zero. Comparing Taylor coefficients on both sides then yields the following property of the Bernoulli numbers:

$$ \begin{align*} \sum_{k=0}^{n}\frac{B_k}{k!}\frac{1}{(n-k+1)!} = 0, \quad n\in{\mathbb{N}_{\ge1}}. \end{align*} $$

Hence the statement of the lemma follows by projecting $H(\operatorname {\mathrm {ad}} \mathbf {x}) \circ G(\operatorname {\mathrm {ad}} \mathbf {x})$ to each tensor level.
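The displayed coefficient identity is also easy to confirm symbolically; the following sketch (illustration only, assuming SymPy) reads off $B_k/k!$ from the Taylor expansion of $H$ and checks that the Cauchy-product coefficients of $H(z)G(z)$ vanish for $n \ge 1$:

```python
# Symbolic check (illustration only) of the displayed Bernoulli identity:
# read off b[k] = B_k / k! from the Taylor series of H(z) = z / (e^z - 1),
# then verify sum_{k=0}^{n} b[k] / (n-k+1)! = 0 for n >= 1.
import math
from sympy import Rational, exp, series, symbols

z = symbols('z')
H = series(z / (exp(z) - 1), z, 0, 10).removeO()
b = [H.coeff(z, k) for k in range(10)]      # b[k] = B_k / k!

for n in range(1, 9):
    s = sum(b[k] * Rational(1, math.factorial(n - k + 1)) for k in range(n + 1))
    assert s == 0
print("identity verified for n = 1, ..., 8")
```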

We are now ready to give the

Proof of Theorem 4.1

Note that $\pi _{(0,N)}\mathrm {Sig}(\mathbf {X}) = \mathrm {Sig}(\mathbf {X}^{(0,N)})$ for any $\mathbf {X}\in \mathscr {S}(\mathcal {T}_0)$ and all truncation levels $N\in {\mathbb {N}_{\ge 1}}$ . Therefore it suffices to show that the identities in equations (4.2) and (4.3) hold for the signature cumulant of an arbitrary $\mathcal {T\,}^{N}_0$ -valued semimartingale $\mathbf {X}\in \mathscr {S}(\mathcal {T}_0^{N})$ that satisfies the integrability condition $\vert \mkern -2.5mu\vert \mkern -2.5mu\vert {\mathbf {X}}\vert \mkern -2.5mu\vert \mkern -2.5mu\vert _{\mathscr {H}^{1,N}} < \infty $ . Recall from Theorem 3.2 that this implies that $\vert \mkern -2.5mu\vert \mkern -2.5mu\vert {\mathrm {Sig}(\mathbf {X})}\vert \mkern -2.5mu\vert \mkern -2.5mu\vert _{\mathscr {H}^{1,N}} < \infty $ , and thus the truncated signature cumulant ${\boldsymbol {\kappa }} = (\log _N\mathbb {E}_t(\mathrm {Sig}(\mathbf {X})_{t,T}))_{0 \le t \le T} \in \mathscr {S}(\mathcal {T}_0^N)$ is well defined. Throughout the proof, we will use the symbol $\lesssim $ to denote an inequality that holds up to a multiplication of the right-hand side by a constant that may depend only on d and N.
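For orientation, it may help to keep a classical special case in mind (essentially the one-dimensional instance of Fawcett's formula; it is not needed for the argument below): for $d=1$ and $\mathbf{X}_t = B_t e_1$ with $B$ a standard Brownian motion, all iterated Stratonovich integrals reduce to powers of the increment, so that $\mathrm{Sig}(\mathbf{X})_{t,T} = \exp_N((B_T - B_t)e_1)$, and normality of the increment gives

$$ \begin{align*} {\boldsymbol{\kappa}}_t = \log_N \mathbb{E}_t\big(\exp_N((B_T - B_t) e_1)\big) = \log_N \exp_N\Big(\frac{T-t}{2}\, e_1 e_1\Big) = \frac{T-t}{2}\, e_1 e_1, \end{align*} $$

a deterministic, continuous process of finite variation with ${\boldsymbol{\kappa}}_T = 0$.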

Recall the definition of the signature in the Marcus sense from Section 2.6. Projecting equation (2.11) to the truncated tensor algebra, we see that the signature process ${S} = (\mathrm {Sig}(\mathbf {X})_{0,t})_{0 \le t \le T} \in \mathscr {S}(\mathcal {T}^N_1)$ satisfies the integral equation

(7.7) $$ \begin{align} {S}_{t} &= 1 + \int_{(0,t]} {S}_{u-} \mathrm{d}\mathbf{X}_u + \frac{1}{2} \int_0^{t} {S}_{u} \mathrm{d}\langle \mathbf{X}^{c} \rangle_u + \sum_{0<u\le t} {S}_{u-}\big(\exp_N({\Delta \mathbf{X}_u}) - 1 - \Delta \mathbf{X}_u\big), \end{align} $$

for $0 \le t \le T$ . Then by Chen’s relation in equation (2.12), we have

$$ \begin{align*} \mathbb{E}_t ({S}_{T} \exp_N({\boldsymbol{\kappa}}_T)) = \mathbb{E}_t (\mathrm{Sig}(\mathbf{X})_{0,T}) = {S}_{t} \mathbb{E}_t(\mathrm{Sig}(\mathbf{X})_{t,T}) = {S}_{t} \exp_N({{\boldsymbol{\kappa}}_t}), \quad 0 \le t \le T. \end{align*} $$

It then follows from the above identity and the integrability of ${S}_{T}$ that the process ${S}\exp _N({{\boldsymbol {\kappa }}})$ is a $\mathcal {T\,}^{N}_1$ -valued martingale in the sense of Section 2.4. On the other hand, we have by applying Itô’s product rule in Lemma 7.7

Further, applying Itô’s rule for the exponential map from Lemma 7.8 to the $\mathcal {T}_0^{N}$-valued semimartingale ${\boldsymbol {\kappa }}$ and using equation (7.7), we obtain the following form of the continuous covariation term

and for the jump covariation term

$$ \begin{align*} \sum_{0<u\le t}\Delta{S}_{u}\,\Delta\exp_N({{\boldsymbol{\kappa}}_u}) = \sum_{0<u\le t}{S}_{u-}\big(\exp_N({\Delta \mathbf{X}_u}) -1 \big)\big(\exp_N({{\boldsymbol{\kappa}}_{u}})\exp_N({{-} {\boldsymbol{\kappa}}_{u-}}) - 1\big)\exp_N({{\boldsymbol{\kappa}}_{u-}}). \end{align*} $$

From the above identities, again with Lemma 7.8 and equation (7.7), we have

(7.8) $$ \begin{align} {S}_{t} \exp_N({{\boldsymbol{\kappa}}_t}) - 1 = \int_{(0,t]} {S}_{u-} \mathrm{d}(\mathbf{L}_u + {\boldsymbol{\kappa}}_u)\exp_N({{\boldsymbol{\kappa}}_{u-}}), \quad 0 \le t \le T, \end{align} $$

where $\mathbf {L} \in \mathscr {S}(\mathcal {T}_0^N)$ is defined by

with $\mathbf {Y}\in \mathscr {S}(\mathcal {T}_0^{N})$ , $\mathbf {V}, \mathbf {C}, \mathbf {J} \in \mathscr {V}(\mathcal {T}_0^{N})$ given by

(7.9)

Note that we have explicitly separated the identity operator $\mathrm {Id}$ from G in the above definition of $\mathbf {L}$ .

Since the left-hand side in equation (7.8) is a martingale, and since ${S}_t$ and $\exp ({\boldsymbol {\kappa }}_t)$ have multiplicative left and right inverses ${S}_t^{-1}$ and $\exp (-{\boldsymbol {\kappa }}_t)$, respectively, for all $0\le t \le T$, it follows that $\mathbf {L} + {\boldsymbol {\kappa }}$ is a $\mathcal {T}_0^N$-valued local martingale. Let $(\tau _k)_{k\ge 1}$ be an increasing sequence of stopping times with $\tau _k \to T$ a.s. as $k \to \infty $, such that the stopped process $(\mathbf {L}_{t \wedge \tau _k} + {\boldsymbol {\kappa }}_{t\wedge \tau _k})_{0\le t\le T}$ is a true martingale. Then we have

(7.10) $$ \begin{align} {\boldsymbol{\kappa}}_{t \wedge \tau_k, T \wedge \tau_k} = \mathbb{E}_{t}\big\{ \mathbf{L}_{t\wedge\tau_k,T\wedge\tau_k}\big\},\quad 0 \le t \le T, \; k\in{\mathbb{N}_{\ge1}}. \end{align} $$

The estimate in equation (7.11) below shows that $\mathbf {L}$ is sufficiently integrable to apply the dominated convergence theorem and pass to the limit $k \to \infty $ in identity (7.10), which yields precisely identity (4.2) (recall that ${\boldsymbol {\kappa }}_T = 0$) and hence concludes the first part of the proof of Theorem 4.1.
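Before turning to the integrability estimates, we note that the algebra used so far (truncated concatenation product, $\exp_N$, $\log_N$, Chen's relation) is easy to experiment with numerically in the simplest deterministic setting. The following self-contained sketch is our own illustration, assuming only NumPy; helper names such as exp_N, log_N and lift are ours. For a piecewise-linear path, the Marcus correction terms in equation (7.7) vanish, the signature is the product of truncated tensor exponentials of the segment increments, Chen's relation can be checked directly, and the signature cumulant over $[t,T]$ reduces to a plain truncated logarithm, no conditional expectation being needed:

```python
# Truncated tensor algebra over R^d up to level N: an element is a list x with
# x[k] an ndarray of shape (d,)*k. Illustration only (deterministic path case).
import functools
import numpy as np

d, N = 2, 4

def unit():
    return [np.ones(()) if k == 0 else np.zeros((d,) * k) for k in range(N + 1)]

def mul(a, b):  # truncated concatenation (tensor) product
    return [sum(np.multiply.outer(a[i], b[k - i]) for i in range(k + 1))
            for k in range(N + 1)]

def exp_N(x):   # x with scalar component 0
    out, power, fact = unit(), unit(), 1.0
    for m in range(1, N + 1):
        power, fact = mul(power, x), fact * m
        out = [o + p / fact for o, p in zip(out, power)]
    return out

def log_N(g):   # g with scalar component 1; log(1 + x) series with x = g - 1
    x = [g[0] - 1.0] + list(g[1:])
    out, power = [np.zeros((d,) * k) for k in range(N + 1)], unit()
    for m in range(1, N + 1):
        power = mul(power, x)
        out = [o + ((-1) ** (m + 1) / m) * p for o, p in zip(out, power)]
    return out

def lift(v):    # a vector increment, placed at tensor level one
    x = [np.zeros((d,) * k) for k in range(N + 1)]
    x[1] = np.asarray(v, dtype=float)
    return x

prod = lambda gs: functools.reduce(mul, gs, unit())
segs = np.random.default_rng(1).normal(size=(5, d))   # segment increments

sig_0T = prod(exp_N(lift(v)) for v in segs)           # Sig over [0, T]
sig_0t = prod(exp_N(lift(v)) for v in segs[:2])       # Sig over [0, t]
sig_tT = prod(exp_N(lift(v)) for v in segs[2:])       # Sig over [t, T]

chen = mul(sig_0t, sig_tT)                            # Chen's relation
print(all(np.allclose(u, v) for u, v in zip(sig_0T, chen)))  # True

kappa_t = log_N(sig_tT)   # deterministic signature cumulant over [t, T]
print(float(kappa_t[0]))  # 0.0: the scalar component of a cumulant vanishes
```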

Claim 7.12. It holds that

(7.11) $$ \begin{align} \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{L}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{1,N}} \lesssim \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{X}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{1,N}}. \end{align} $$

Proof of Claim 7.12

According to Lemma 7.4, it suffices to show that for all $n\in \{1, \dotsc , N\}$ , it holds that

$$ \begin{align*} \left\Vert {\mathbf{L}^{(n)}} \right\Vert _{\mathscr{H}^{N/n}} \lesssim \sum_{\Vert\ell\Vert = n} \left\Vert {\mathbf{X}^{(l_1)}} \right\Vert _{\mathscr{H}^{N/l_1}}\cdots \left\Vert {\mathbf{X}^{(l_j)}} \right\Vert _{\mathscr{H}^{N/l_j}} =: \rho^{n}_{\mathbf{X}}, \end{align*} $$

where the summation above (and in the rest of the proof) is over multi-indices $\ell = (l_1, \dotsc , l_j)\in ({\mathbb {N}_{\ge 1}})^j$ , with $|\ell |=j$ and $\Vert \ell \Vert = l_1 + \dotsb + l_j$ . For $\mathbf {M}\in \mathscr {M}_{\mathrm {loc}}(\mathcal {T}_0^{N})$ and $\mathbf {A}\in \mathscr {V}(\mathcal {T}_0^{N})$ define

$$ \begin{align*} {\rho}_{\mathbf{M}, \mathbf{A}}^{n} := \sum_{\Vert\ell\Vert = n} \zeta^{N/l_1}(\mathbf{M}^{(l_1)}, \mathbf{A}^{(l_1)}) \cdots \zeta^{N/l_j}(\mathbf{M}^{(l_j)}, \mathbf{A}^{(l_j)}), \quad n = 1, \dotsc, N, \end{align*} $$

where for any $q\in [1,\infty )$

Note that it holds

(7.12) $$ \begin{align} {\rho}_{\mathbf{M}, \mathbf{A}}^{n} \;\le\; \sum_{\Vert\ell\Vert = n}\left({{\rho}_{\mathbf{M}, \mathbf{A}}}^{l_1} \cdots {{\rho}_{\mathbf{M}, \mathbf{A}}}^{l_j}\right) \;\lesssim\; {\rho}_{\mathbf{M}, \mathbf{A}}^{n}. \end{align} $$

Furthermore, it follows from the definition of the $\mathscr {H}^{q}$ -norm that

(7.13) $$ \begin{align} \rho_{\mathbf{X}}^{n} = \inf_{\mathbf{X} = \mathbf{M} + \mathbf{A}} {\rho}_{\mathbf{M}, \mathbf{A}}^{n}, \end{align} $$

where the infimum is taken over all semimartingale decompositions of $\mathbf {X}$.
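All sums of the above type run over the compositions $\ell$ of $n$, of which there are exactly $2^{n-1}$; in particular, the number of terms is bounded in terms of $N$ alone, consistent with the constants in $\lesssim$ depending only on $d$ and $N$. A small enumeration sketch (illustration only, in Python):

```python
# Enumerate the multi-indices l = (l_1, ..., l_j) with ||l|| = l_1 + ... + l_j = n
# and l_i >= 1 (the compositions of n); there are 2**(n-1) of them.
def compositions(n):
    if n == 0:
        yield ()
        return
    for head in range(1, n + 1):
        for tail in compositions(n - head):
            yield (head,) + tail

for n in range(1, 7):
    assert len(list(compositions(n))) == 2 ** (n - 1)
print(list(compositions(3)))  # [(1, 1, 1), (1, 2), (2, 1), (3,)]
```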

Now fix $\mathbf {M}\in \mathscr {M}_{\mathrm {loc}}(\mathcal {T}_0^{N})$ and $\mathbf {A}\in \mathscr {V}(\mathcal {T}_0^{N})$ arbitrarily, such that $\mathbf {X} = \mathbf {M} + \mathbf {A}$ and ${{\rho }_{\mathbf {M}, \mathbf {A}}}^{n}<\infty $ for all $n\in \{1, \dotsc , N\}$ (such a decomposition always exists since $\mathbf {X} \in \mathscr {H}^{1,N}$ ). In particular, it holds that $\mathbf {M}$ is a true martingale. Next we will prove the following.

Claim 7.13. For all $n \in \{1, \dots , N\}$ , it holds that

(7.14) $$ \begin{align} \Vert {\mathbf{L}^{(n)}} \Vert _{\mathscr{H}^{N/n}}\lesssim {\rho}_{\mathbf{M}, \mathbf{A}}^{n}, \end{align} $$

and further, there exists a semimartingale decomposition $\boldsymbol {\kappa }^{(n)} = \boldsymbol {\kappa }_0^{(n)} + \mathbf {m}^{(n)} + \mathbf {a}^{(n)}$, with $\mathbf {m}^{(n)}\in \mathscr {M}(({\mathbb {R}^d})^{\otimes n})$ and $\mathbf {a}^{(n)}\in \mathscr {V}(({\mathbb {R}^d})^{\otimes n})$, such that $ \left \Vert {\mathbf {a}^{(n)}} \right \Vert _{\mathscr {V}^{{N/n}}} \lesssim {{\rho }_{\mathbf {M}, \mathbf {A}}}^{n}$; and in case $n \le N-1$, it holds that

(7.15) $$ \begin{align} \zeta^{N/n}(\mathbf{m}^{(n)}, \mathbf{a}^{(n)}) \lesssim {\rho}_{\mathbf{M}, \mathbf{A}}^{n}. \end{align} $$

Proof of Claim 7.13

We prove the claim by induction over $n \in \{1, \dots , N\}$. For $n=1$, note that $\mathbf {L}^{(1)} = \mathbf {X}^{(1)}$, and therefore

$$ \begin{align*} \left\Vert {\mathbf{L}^{(1)}} \right\Vert _{\mathscr{H}^{N}} \le \zeta^{N}(\mathbf{M}^{(1)}, \mathbf{A}^{(1)}) = {{\rho}_{\mathbf{M}, \mathbf{A}}}^{1}. \end{align*} $$

Using that $\mathbf {M}^{(1)}$ is a martingale, we can identify a semimartingale decomposition of $\boldsymbol {\kappa }^{(1)}$ by

$$ \begin{align*} \mathbf{m}^{(1)}_t := \mathbb{E}_t\left(\mathbf{A}^{(1)}_T\right) - \mathbb{E}\left(\mathbf{A}^{(1)}_T\right), \quad \mathbf{a}^{(1)}_t := -\mathbf{A}^{(1)}_t, \quad 0 \le t \le T. \end{align*} $$

In case $N\ge 2$ , we further have from the BDG-inequality and Doob’s maximal inequality that

and this shows the second part of the induction claim.

Now assume that $N \ge 2$ and that the induction claim in equations (7.14) and (7.15) holds true up to level $n-1$ for some $n \in \{2, \dotsc , N\}$ . Note that $\mathbf {L}^{(n)}$ has the following decomposition:

(7.16) $$ \begin{align} \mathbf{L}^{(n)}=\;& \left\{\mathbf{M}^{(n)} + \mathbf{N}^{(n)}\right\} + \left\{\mathbf{A}^{(n)} + \frac{1}{2} \left\langle \mathbf{X}^{c} \right\rangle^{(n)} + \mathbf{B}^{(n)}+ \mathbf{V}^{(n)} + \mathbf{C}^{(n)} + \mathbf{J}^{(n)}\right\}, \end{align} $$

where $\mathbf {N}^{(n)} \in \mathscr {M}_{\mathrm {loc}}((\mathbb {R}^{d})^{\otimes n})$ and $\mathbf {B}^{(n)} \in \mathscr {V}((\mathbb {R}^{d})^{\otimes n})$ form a decomposition of $\mathbf {Y}^{(n)} \in \mathscr {S}((\mathbb {R}^{d})^{\otimes n})$, defined by

$$ \begin{align*} \mathbf{N}^{(n)}_t &= \pi_{n}\int_{(0, t]} (G-\mathrm{Id})(\operatorname{\mathrm{ad}}{{\boldsymbol{\kappa}}_{u-}})(\mathrm{d} \overline{\mathbf{m}}_u), \\ \mathbf{B}^{(n)}_t &= \pi_{n}\int_{(0, t]}(G-\mathrm{Id})(\operatorname{\mathrm{ad}}{{\boldsymbol{\kappa}}_{u-}})(\mathrm{d} \overline{\mathbf{a}}_u), \quad 0 \le t \le T, \end{align*} $$

with $\overline {\mathbf {a}} = \pi _{(0,N)}(\mathbf {a}^{(1)} + \dots + \mathbf {a}^{(n-1)})\in \mathscr {V}(\mathcal {T}_0^{N})$ and $\overline {\mathbf {m}} = \pi _{(0,N)}(\mathbf {m}^{(1)} + \dotsb + \mathbf {m}^{(n-1)})\in \mathscr {M}(\mathcal {T}_0^{N})$ .

From Lemma 7.1 and the generalized Hölder inequality, we have

(7.17)

It follows from equation (7.15) and the induction basis that for all $l \in \{1, \dotsc , n-1\}$ , it holds that

(7.18) $$ \begin{align} \left\Vert {\boldsymbol{\kappa}^{(l)}} \right\Vert _{\mathscr{H}^{N/l}} = \left\Vert {\boldsymbol{\kappa}^{(l)} - \boldsymbol{\kappa}^{(l)}_0} \right\Vert _{\mathscr{H}^{N/l}} \lesssim {\rho}_{\mathbf{M}, \mathbf{A}}^{l}, \end{align} $$

and further that

(7.19) $$ \begin{align} \kappa^{l*}_{T} := \sup_{0\le t \le T} |\boldsymbol{\kappa}^{(l)}_t |, \quad \left\Vert {\boldsymbol{\kappa}^{(l)}} \right\Vert _{\mathscr{S}^{N/l}} = \left\Vert {\kappa_T^{l\ast}} \right\Vert _{\mathcal{L}^{N/l}} \le |\boldsymbol{\kappa}^{(l)}_0| + \left\Vert {\boldsymbol{\kappa}^{(l)}-\boldsymbol{\kappa}^{(l)}_0} \right\Vert _{\mathscr{S}^{N/l}} \lesssim {\rho}_{\mathbf{M}, \mathbf{A}}^{l}. \end{align} $$

From the definition and linearity of $Q(\operatorname {\mathrm {ad}}\mathbf {x})$ ($\mathbf {x}\in \mathcal {T}_0^N$), together with Lemmas 7.1 and 7.9, we have the following estimate:

It then follows from the generalized Hölder inequality

(7.20)

where the second inequality follows from the induction basis and the estimates in equations (7.19) and (7.15), noting that $\Vert \ell \Vert = n$ together with $|\ell | = j \ge 2$ implies $l_1, \dotsc , l_j \le n-1$, and the third inequality follows from equation (7.12).

(7.21)

and

(7.22)

For the local martingale $\mathbf {N}^{(n)}$ , we use Lemmas 7.1 and 7.9 to estimate its quadratic variation as follows:

Then it follows once again by the generalized Hölder inequality and the induction basis that

(7.23)

Finally, let us treat the term $\mathbf {J}^{(n)}$ . First define

And from equation (7.19), it follows that for all $l \in \{1, \dotsc , n-1\}$, it holds that

(7.24) $$ \begin{align} \left\Vert {Z^{l}} \right\Vert _{\mathcal{L}^{N/l}} &\le 2 \left\Vert {\mathbf{X}^{(l)}} \right\Vert _{\mathscr{S}^{N/l}} + \left\Vert {\boldsymbol{\kappa}^{(l)}} \right\Vert _{\mathscr{S}^{N/l}} \le 2 \left\Vert {\mathbf{X}^{(l)}} \right\Vert _{\mathscr{H}^{N/l}} + \left\Vert {\boldsymbol{\kappa}^{(l)}} \right\Vert _{\mathscr{S}^{N/l}} \lesssim {\rho}_{\mathbf{M}, \mathbf{A}}^{l}. \end{align} $$

Then by Taylor’s theorem and Lemma 7.10, we have

Hence it follows by the generalized Hölder inequality that

(7.25)

where the last estimate follows from equations (7.24), (7.18) and (7.12).

Summarizing the estimates in equations (7.20), (7.21), (7.22), (7.23) and (7.25), we have

(7.26)

which proves the first part of the induction claim in equation (7.14). It then follows from the dominated convergence theorem that projecting equation (7.10) to tensor level $n$ and passing to the limit $k\to \infty $ yields

$$ \begin{align*} \boldsymbol{\kappa}_t^{(n)} = \mathbb{E}_t\left(\mathbf{L}^{(n)}_{T,t}\right), \quad 0 \le t \le T. \end{align*} $$

Since $\mathbf {M}^{(n)}$ and $\mathbf {N}^{(n)}$ are true martingales (for the latter, this follows from equation (7.23)), we are able to identify a decomposition $\boldsymbol {\kappa }^{(n)} = \boldsymbol {\kappa }^{(n)}_0 + \mathbf {m}^{(n)} + \mathbf {a}^{(n)}$ by

$$ \begin{align*} \mathbf{a}^{(n)} &= - \left\{ \mathbf{A}^{(n)} + \frac{1}{2}\left\langle \mathbf{X}^{c} \right\rangle^{(n)} + \mathbf{B}^{(n)} + \mathbf{V}^{(n)} + \mathbf{C}^{(n)} + \mathbf{J}^{(n)}\right\}\\ \mathbf{m}^{(n)}_t &= \mathbb{E}\left(\mathbf{a}^{(n)}_T\right) - \mathbb{E}_{t}\left(\mathbf{a}^{(n)}_T\right), \quad 0 \le t \le T. \end{align*} $$

Again from the estimates in equations (7.20), (7.21), (7.23) and (7.25), it follows that

$$ \begin{align*} \left\Vert {\mathbf{a}^{(n)}} \right\Vert _{\mathscr{V}^{N/n}} \lesssim {\rho}_{\mathbf{M}, \mathbf{A}}^{n}, \end{align*} $$

and in case $n \le N - 1$ , it follows from the BDG-inequality and Doob’s maximal inequality that

which proves the second part of the induction claim in equation (7.15).

The estimate in equation (7.11) immediately follows from equations (7.13) and (7.14), which finishes the proof of Claim 7.12.

Note that since $\mathbf {Y} = \mathbf {N} + \mathbf {B}$ and $\left \langle \mathbf {X}^{c} \right \rangle $ , $\mathbf {V}$ , $\mathbf {C}, \mathbf {J}$ are independent of the decomposition $\mathbf {X} = \mathbf {M} + \mathbf {A}$ , it follows from taking the infimum over all such decompositions in the inequality in equation (7.26) that

(7.27)

for all $n \in \{1, \dotsc , N\}$. The same argument applies to ${\boldsymbol {\kappa }}$ and the estimate in equation (7.18), and we obtain

(7.28)

for all $n\in \{1,\dots , N-1\}$ .

Next we are going to show that ${\boldsymbol {\kappa }}$ satisfies the functional equation (4.3). Recall that $\mathbf {L} + {\boldsymbol {\kappa }} \in \mathscr {M}_{\mathrm {loc}}(\mathcal {T}_0^{N})$ . From Lemma 7.11, we have the following equality

$$ \begin{align*} \int_{(0,t]}H(\operatorname{\mathrm{ad}}{{\boldsymbol{\kappa}}_{u-}})\left(\mathrm{d}(\mathbf{L}_u + {\boldsymbol{\kappa}}_u)\right) = {\boldsymbol{\kappa}}_{t} - {\boldsymbol{\kappa}}_0 + \widetilde{\mathbf{L}}_t \end{align*} $$

for all $0\le t \le T$ , where

(7.29) $$ \begin{align} \widetilde{\mathbf{L}}_t =\;& \int_{(0,t]}H(\operatorname{\mathrm{ad}}{{\boldsymbol{\kappa}}_{u-}})\left\{\mathrm{d}\mathbf{X}_u + \frac{1}{2}\mathrm{d}\left\langle \mathbf{X}^{c} \right\rangle_{u} + \mathrm{d}\mathbf{V}_u + \mathrm{d}\mathbf{C}_u + \mathrm{d}\mathbf{J}_u \right\}. \end{align} $$

From Lemma 7.3 (Emery’s inequality) and the estimates in equations (7.27) and (7.28), it follows

Hence, by Lemma 7.4, it holds that

(7.30) $$ \begin{align} \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\widetilde{\mathbf{L}}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{1,N}} \lesssim \vert\mkern-2.5mu\vert\mkern-2.5mu\vert{\mathbf{X}}\vert\mkern-2.5mu\vert\mkern-2.5mu\vert_{\mathscr{H}^{1,N}}. \end{align} $$

Now note that we have already shown in Claim 7.13 that ${\boldsymbol {\kappa }} = {\boldsymbol {\kappa }}_0 + \mathbf {m} + \mathbf {a}$, with $\mathbf {m}\in \mathscr {M}(\mathcal {T}_0^{N})$ and $\mathbf {a}\in \mathscr {V}(\mathcal {T}_0^{N})$ satisfying $ \left \Vert {\mathbf {a}^{(n)}} \right \Vert _{\mathscr {V}^{N/n}} <\infty $ for all $n \in \{1, \dotsc , N\}$. Together with the above estimate, it then follows that $\boldsymbol {\kappa } + \widetilde {\mathbf {L}}$ is indeed a true martingale, and therefore

$$ \begin{align*} {\boldsymbol{\kappa}}_t = \mathbb{E}_t\left( \widetilde{\mathbf{L}}_{T,t} \right), \quad 0 \le t \le T, \end{align*} $$

which is precisely identity (4.3).

Acknowledgments

The authors thank the anonymous referees for their careful reading.

Funding statement

PKF has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 683164) and the DFG Research Unit FOR 2402. PH and NT have been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – The Berlin Mathematics Research Center MATH+ (EXC-2046/1, project ID: 390685689).

Conflicts of Interest

None.

Footnotes

1 Also known as a tensor product; see Section 2.1 for a precise definition.

2 Continuous from the right, limits from the left.

3 At the time of writing, available as a free sample chapter at https://www.princeton.edu/~yacine/research.

4 Diamond notation for Marcus SDEs, $\mathrm d S = S\,{\diamond \mathrm d} \mathbf {X}$ (see also [Reference Applebaum4]) will not be used here to avoid notational clash with the diamond product, introduced and studied in [Reference Alos, Gatheral and Radoičić3] and [Reference Friz, Gatheral and Radoičić23], respectively.

5 We thank the participants of Cumulants in Stochastic Analysis, TU Berlin (online), February 2021, for many discussions.

6 Here, $\circ $ denotes composition, not to be confused with Stratonovich integration ${\circ \mathrm {d}} \mathbf {X}$ .

7 All such series, seen as sequences of their partial sums, ‘converge’ if one equips the space of tensor series with the minimal topology that renders all finite-dimensional projections continuous.

8 The smallest right-continuous, $\mathbb {P}$ and $\{\mathbb {P}^{x}\}$ -complete filtration in $(\Omega ,\mathcal {F})$ that the Brownian motion B is adapted to.

9 In fact, as seen in [Reference Jacod35, Theorem 13.58], the existence of a unique solution to the martingale problem holds also under considerably weaker assumptions than (A1)–(A3): Lipschitz continuity can essentially be relaxed to continuity, and the moment conditions can be replaced by a suitable integrability condition on the kernel. Note, however, that the representation of the kernel as in assumption (A3), with the corresponding Lipschitz conditions on $\delta $, is standard when constructing a jump diffusion as a strong solution to an SDE driven by a Wiener process and a Poisson random measure (see, e.g., [Reference Jacod and Shiryaev36, XIII.3]).

10 Note that there is not much difference in defining these spaces with half-open time intervals $[0,T)$ , as the global Lipschitz assumptions always allow for a continuous extension.

References

Abi Jaber, E., Larsson, M., and Pulido, S., Affine Volterra processes, Ann. Appl. Probab. 29 (2019), no. 5, 3155–3200.
Aït-Sahalia, Y. and Jacod, J., High-frequency financial econometrics, Princeton University Press, 2014.
Alos, E., Gatheral, J., and Radoičić, R., Exponentiation of conditional expectations under stochastic volatility, Quantitative Finance 20 (2020), no. 1, 13–27.
Applebaum, D., Lévy processes and stochastic calculus, Cambridge University Press, 2009.
Arribas, I. P., Salvi, C., and Szpruch, L., Sig-SDEs model for quantitative finance, 1st ACM International Conference on AI in Finance (ICAIF 2020), 2020, arXiv:2006.00218 [q-fin.CP].
Blanes, S., Casas, F., Oteo, J., and Ros, J., The Magnus expansion and some of its applications, Phys. Rep. 470 (2009), no. 5-6, 151–238.
Bonnier, P. and Oberhauser, H., Signature cumulants, ordered partitions, and independence of stochastic processes, Bernoulli 26 (2020), no. 4, 2727–2757.
Bruned, Y., Curry, C., and Ebrahimi-Fard, K., Quasi-shuffle algebras and renormalisation of rough differential equations, Bulletin of the London Mathematical Society 52 (2020), no. 1, 43–63.
Casas, F. and Murua, A., An efficient algorithm for computing the Baker–Campbell–Hausdorff series and some of its applications, J. Math. Phys. 50 (2009), no. 3, 033513, 23 pp.
Celestino, A., Ebrahimi-Fard, K., Patras, F., and Perales, D., Cumulant–cumulant relations in free probability theory from Magnus’ expansion (2021).
Chen, K.-T., Iterated integrals and exponential homomorphisms, Proc. London Math. Soc. s3-4 (1954), no. 1, 502–512.
Chevyrev, I. and Friz, P. K., Canonical RDEs and general semimartingales as rough paths, Ann. Probab. 47 (2019), no. 1, 420–463.
Chevyrev, I. and Lyons, T., Characteristic functions of measures on geometric rough paths, Ann. Probab. 44 (2016), no. 6, 4049–4082.
Cohen, S. and Elliott, R. J., Stochastic calculus and applications, 2nd ed., Birkhäuser, Basel, 2015.
Cont, R. and Tankov, P., Financial modelling with jump processes, Financial Mathematics Series, Chapman & Hall/CRC, 2004.
Cuchiero, C., Filipović, D., Mayerhofer, E., and Teichmann, J., Affine processes on positive semidefinite matrices, Ann. Appl. Probab. 21 (2011), no. 2, 397–463.
Cuchiero, C., Keller-Ressel, M., and Teichmann, J., Polynomial processes and their applications to mathematical finance, Finance and Stochastics 16 (2012).
Cuchiero, C., Svaluto-Ferro, S., and Teichmann, J., Signature SDEs from an affine and polynomial perspective, 2021, in preparation.
Duffie, D., Filipović, D., and Schachermayer, W., Affine processes and applications in finance, Ann. Appl. Probab. 13 (2003), no. 3, 984–1053.
Estrade, A., Exponentielle stochastique et intégrale multiplicative discontinues, Ann. Inst. Henri Poincaré Probab. Stat. 28 (1992), no. 1, 107–129.
Fawcett, T., Problems in stochastic analysis: connections between rough paths and non-commutative harmonic analysis, Ph.D. thesis, University of Oxford, 2002.
Friedman, A., Partial differential equations of parabolic type, R.E. Krieger Publishing Company, 1983.
Friz, P. K., Gatheral, J., and Radoičić, R., Forests, cumulants, martingales, to appear in The Annals of Probability (2022).
Friz, P. K. and Hairer, M., A course on rough paths, 2nd ed., Universitext, Springer International Publishing, 2020.
Friz, P. K. and Shekhar, A., General rough integration, Lévy rough paths and a Lévy–Kintchine-type formula, Ann. Probab. 45 (2017), no. 4, 2707–2765.
Friz, P. K. and Victoir, N. B., The Burkholder–Davis–Gundy inequality for enhanced martingales, Lecture Notes in Mathematics 1934 (2006).
Friz, P. K. and Victoir, N. B., Multidimensional stochastic processes as rough paths: Theory and applications, Cambridge Studies in Advanced Mathematics, Cambridge University Press, 2010.
Fukasawa, M. and Matsushita, K., Realized cumulants for martingales, Electronic Communications in Probability 26 (2021), 1–10.
Gatheral, J. and Keller-Ressel, M., Affine forward variance models, Finance Stoch. 23 (2019), no. 3, 501–533.
Hairer, M., Solving the KPZ equation, Annals of Mathematics 178 (2013), no. 2, 559–664.
Hakim-Dowek, M. and Lépingle, D., L’exponentielle stochastique des groupes de Lie, Séminaire de Probabilités XX 1984/85, Springer, 1986, pp. 352–374.
Hausdorff, F., Die symbolische Exponentialformel in der Gruppentheorie, Ber. Verh. Kgl. Sächs. Ges. Wiss. Leipzig, Math.-phys. Kl. 58 (1906), 19–48.
Iserles, A. and Nørsett, S. P., On the solution of linear differential equations in Lie groups, Philos. Trans. Roy. Soc. A 357 (1999), no. 1754, 983–1019.
Iserles, A., Munthe-Kaas, H., Nørsett, S., and Zanna, A., Lie-group methods, Acta Numerica (2005).
Jacod, J., Calcul stochastique et problèmes de martingales, Lecture Notes in Mathematics, vol. 714, Springer, Berlin, Heidelberg, 1979.
Jacod, J. and Shiryaev, A. N., Limit theorems for stochastic processes, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 288, Springer, Berlin, 2003.
Kamm, K., Pagliarani, S., and Pascucci, A., The stochastic Magnus expansion, Journal of Scientific Computing 89 (2021), 56. https://doi.org/10.1007/s10915-021-01633-6.
Karatzas, I. and Shreve, S., Brownian motion and stochastic calculus, 2nd ed., Graduate Texts in Mathematics, vol. 113, Springer, New York, NY, 1998.
Keller-Ressel, M., Larsson, M., and Pulido, S., Affine rough models, arXiv e-prints (2018), arXiv:1812.08486.
Keller-Ressel, M., Schachermayer, W., and Teichmann, J., Affine processes are regular, Probab. Theory Related Fields 151 (2011), no. 3-4, 591–611.
Kurtz, T. G., Pardoux, E., and Protter, P., Stratonovich stochastic differential equations driven by general semimartingales, Ann. Inst. Henri Poincaré Probab. Stat. 31 (1995), no. 2, 351–377.
Lacoin, H., Rhodes, R., and Vargas, V., A probabilistic approach of ultraviolet renormalisation in the boundary sine-Gordon model, to appear in Probability Theory and Related Fields (2022).
Le Gall, J.-F., Brownian motion, martingales, and stochastic calculus, Springer, 2016.
LeJan, Y. and Qian, Z., Stratonovich’s signatures of Brownian motion determine Brownian sample paths, Probability Theory and Related Fields 157 (2013).
Lyons, T., Rough paths, signatures and the modelling of functions on streams, Proceedings of the International Congress of Mathematicians—Seoul 2014, Vol. IV, Kyung Moon Sa, Seoul, 2014, pp. 163–184.
Lyons, T. and Ni, H., Expected signature of Brownian motion up to the first exit time from a bounded domain, Ann. Probab. 43 (2015), no. 5, 2729–2762.
Lyons, T. and Victoir, N., Cubature on Wiener space, Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences 460 (2004), no. 2041, 169–198.
Magnus, W., On the exponential solution of differential equations for a linear operator, Commun. Pure Appl. Math. 7 (1954), no. 4, 649–673.
Marcus, S. I., Modeling and analysis of stochastic differential equations driven by point processes, IEEE Trans. Inform. Theory 24 (1978), no. 2, 164–172.
Marcus, S. I., Modeling and approximation of stochastic differential equations driven by semimartingales, Stochastics 4 (1981), no. 3, 223–245.
McKean, H. P., Stochastic integrals, AMS Chelsea Publishing Series, no. 353, American Mathematical Society, 1969.
Miller, W. Jr., Symmetry groups and their applications, Pure and Applied Mathematics, vol. 50, Academic Press, New York-London, 1972.
Mykland, P. A., Bartlett type identities for martingales, Ann. Statist. 22 (1994), no. 1, 21–38.
Ni, H., The expected signature of a stochastic process, Ph.D. thesis, University of Oxford, 2012.
Øksendal, B., Stochastic differential equations: An introduction with applications, 6th ed., Springer, Berlin, Heidelberg, 2014.
Pham, H., Optimal stopping of controlled jump diffusion processes: A viscosity solution approach, J. Math. Syst. Est. Control 8 (1998), 1–27.
Protter, P. E., Stochastic integration and differential equations, 2nd ed., Stochastic Modelling and Applied Probability, Springer-Verlag, Berlin, Heidelberg, 2005.
Reutenauer, C., Free Lie algebras, Handbook of Algebra, vol. 3, Elsevier, 2003, pp. 887–903.
Revuz, D. and Yor, M., Continuous martingales and Brownian motion, Grundlehren der mathematischen Wissenschaften, Springer, Berlin, Heidelberg, 2004.
Stroock, D. W., Diffusion processes associated with Lévy generators, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 32 (1975), no. 3, 209–244.
Stroock, D. W. and Varadhan, S. R. S., Diffusion processes with continuous coefficients, I, Communications on Pure and Applied Mathematics 22 (1969), no. 3, 345–400.
Young, L. C., An inequality of the Hölder type, connected with Stieltjes integration, Acta Math. 67 (1936), 251–282.
Figure 1 FunctEqu $\mathscr {S}$-SigCum (Theorem 4.1) and implications. $\mathscr {S}$ (respectively, $\mathscr {S}^{c}$) stands for general (respectively, continuous) semimartingales and $\mathscr {V}$ (respectively, $\mathscr {V}^{c}$) stands for finite variation (respectively, finite variation and continuous) processes.

Figure 2 Computational consequence: accompanying recursions.