
CHAD for expressive total languages

Published online by Cambridge University Press:  14 July 2023

Fernando Lucatelli Nunes
Affiliation:
Department of Information and Computing Sciences, Utrecht University, Utrecht, Netherlands
Matthijs Vákár*
Affiliation:
Department of Information and Computing Sciences, Utrecht University, Utrecht, Netherlands
*Corresponding author: Matthijs Vákár; Email: matthijsvakar@gmail.com

Abstract

We show how to apply forward and reverse mode Combinatory Homomorphic Automatic Differentiation (CHAD) (Vákár (2021). ESOP, 607–634; Vákár and Smeding (2022). ACM Transactions on Programming Languages and Systems 44 (3) 20:1–20:49.) to total functional programming languages with expressive type systems featuring the combination of

  • tuple types;

  • sum types;

  • inductive types;

  • coinductive types;

  • function types.

We achieve this by analyzing the categorical semantics of such types in $\Sigma$-types (Grothendieck constructions) of suitable categories. Using a novel categorical logical relations technique for such expressive type systems, we give a correctness proof of CHAD in this setting by showing that it computes the usual mathematical derivative of the function that the original program implements. The result is a principled, purely functional and provably correct method for performing forward- and reverse-mode automatic differentiation (AD) on total functional programming languages with expressive type systems.

Type
Special Issue: Differences and Metrics in Programs Semantics: Advances in Quantitative Relational Reasoning
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

1. Introduction

Automatic differentiation (AD) is a popular technique for computing derivatives of functions implemented by computer programs, essentially by applying the chain rule across the program code. It is typically the method of choice for computing derivatives in machine learning and scientific computing because of its efficiency and numerical stability. AD has two main variants: forward-mode AD, which calculates the derivative of a function, and reverse-mode AD, which calculates the (matrix) transpose of the derivative. Roughly speaking, for a function $f:{\mathbb{R}}^n\to {\mathbb{R}}^m$, reverse mode is the more efficient technique if $n\gg m$ and forward mode is the more efficient one if $n\ll m$. Seeing that we are usually interested in computing derivatives (or gradients) of functions $f:{\mathbb{R}}^n\to\mathbb{R}$ with very large n, reverse AD tends to be the more important algorithm in practice (Baydin et al. 2017).

While the study of AD has a long history in the numerical methods community, which we will not survey (see, e.g., Griewank and Walther 2008), there has recently been a proliferation of work by the programming languages community examining the technique from a new angle. New goals pursued by this community include

  • giving a concise, clear, and easy-to-implement definition of various AD algorithms;

  • expanding the languages and programming techniques that AD can be applied to;

  • relating AD to its mathematical foundations in differential geometry and proving that AD implementations correctly calculate derivatives;

  • performing AD at compile time through source code transformation, to maximally expose optimization opportunities to the compiler and to avoid interpreter overhead that other AD approaches can incur;

  • providing formal complexity guarantees for AD implementations.

We provide a brief summary of some of this more recent work in Section 16. The present paper adds to this new body of work by advancing the state of the art of the first four goals. We leave the fifth goal, as applied to our technique, mostly to future work (with the exception of Corollary 130). Specifically, we extend the scope of the Combinatory Homomorphic Automatic Differentiation (CHAD) method of forward and reverse AD (Vákár 2021; Vákár and Smeding 2022) from the previous state of the art, a simply typed $\lambda$-calculus, to total functional programming languages with expressive type systems, that is, the combination of:

  • tuple types, to enable programs that return or take as an argument more than one value;

  • sum types, to enable programs that define and branch on variant data types;

  • inductive types, to include programs that operate on labeled-tree-like data structures;

  • coinductive types, to deal with programs that operate on lazy infinite data structures such as streams;

  • function types, to encompass programs that use popular higher-order programming idioms such as maps and folds.

Although conceptually simple, this extension requires a considerable generalization of existing techniques in denotational semantics. The payoffs of this challenging development are surprisingly simple AD algorithms as well as reusable abstract semantic techniques.

The main contributions of this paper are as follows:

  • developing an abstract categorical semantics (Section 3) of such expressive type systems in suitable $\Sigma$-types of categories (Section 6);

  • presenting, as the initial instantiation of this abstract semantics, an idealized target language for CHAD when applied to such type systems (Section 7);

  • deriving the forward and reverse CHAD algorithms (Section 8) when applied to expressive type systems as the uniquely defined homomorphic functors (Section 4) from the source (Section 5) to the target language (Section 7);

  • introducing (categorical) logical relations techniques (aka sconing) for reasoning about expressive functional languages that include both inductive and coinductive types (Section 11);

  • using such a logical relations construction over the concrete denotational semantics (Section 10) of the source and target languages (Section 9) that demonstrates that CHAD correctly calculates the usual mathematical derivative (Section 12), even for programs between inductive types (Section 13);

  • discussing examples (Section 14) and applied considerations around implementing this extended CHAD method in practice (Section 15).

We start by giving a high-level overview of the key insights and theorems in this paper in Section 2.

2. Key Ideas

2.1 Origins in semantic derivatives and chain rules

CHAD starts from the observation that for a differentiable function:

$$f: {\mathbb{R}}^n\to {\mathbb{R}}^m$$

it is useful to pair the primal function value f(x) with f’s derivative Df(x) at x if we want to calculate derivatives in a compositional way (where we underline the spaces $\underline{\mathbb{R}}^n$ of tangent vectors to emphasize their algebraic structure and we write a linear function type for the derivative to indicate its linearity in its tangent vector argument):

\begin{align*}\mathcal{T}{f} : & \ {\mathbb{R}}^n \to {\mathbb{R}}^m\times (\underline{\mathbb{R}}^n\multimap \underline{\mathbb{R}}^m)\\&x\mapsto (f(x),Df(x)).\end{align*}

Indeed, the chain rule for derivatives teaches us that we compute the derivative of a composition $g\circ f$ of functions as follows, where we write $\mathcal{T}_{1}{f}\stackrel {\mathrm{def}}= \pi_1\circ \mathcal{T}{f}$ and $\mathcal{T}_{2}{f}\stackrel {\mathrm{def}}= \pi_2\circ \mathcal{T}{f}$ for the first and second components of $\mathcal{T}{f}$ , respectively:

$$\mathcal{T}\!\!{(g\circ f)}(x) = (\mathcal{T}_{1}{g}(\mathcal{T}_{1}{f}(x)),\mathcal{T}_{2}{g}(\mathcal{T}_{1}{f}(x))\circ \mathcal{T}_{2}{f}(x)).$$

We make two observations:

(1) the derivative of $g\circ f$ depends not only on the derivatives of g and f but also on the primal value of f;

(2) the primal value of f is used twice: once in the primal value of $g\circ f $ and once in its derivative; we want to share these repeated subcomputations.

Insight 1. This shows that it is wise to pair up computations of primal function values and derivatives and to share computation between both if we want to calculate derivatives of functions compositionally and efficiently.
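The forward chain rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `T_compose` and the example functions `T_f` and `T_g` are hypothetical and not part of CHAD's target language, and linear maps are modeled as ordinary closures.

```python
import math

def T_compose(Tg, Tf):
    """Chain rule for paired forward-mode maps: builds T(g . f) from T g and T f."""
    def T_gf(x):
        y, df = Tf(x)   # primal f(x) is computed once and shared
        z, dg = Tg(y)   # primal g(f(x)) and derivative of g at f(x)
        return z, (lambda v: dg(df(v)))  # Dg(f(x)) after Df(x)
    return T_gf

# Hypothetical example: f(x1, x2) = x1 * x2, with Df(x)(v1, v2) = x2*v1 + x1*v2
def T_f(x):
    x1, x2 = x
    return x1 * x2, (lambda v: x2 * v[0] + x1 * v[1])

# Hypothetical example: g(y) = sin(y), with Dg(y)(w) = cos(y) * w
def T_g(y):
    return math.sin(y), (lambda w: math.cos(y) * w)

T_h = T_compose(T_g, T_f)
value, deriv = T_h((2.0, 3.0))   # value is sin(6); deriv is the linear map D(g . f)(2, 3)
```

Note that `T_gf` calls `Tf` exactly once, so the primal value of f feeds both the primal and the derivative of the composite, as in observation (2).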

Similar observations can be made for f’s transposed (adjoint) derivative ${Df}^{t}$ , which propagates not tangent vectors but cotangent vectors and which we can pair up as:

\begin{align*}\mathcal{T}^*f : & \ {\mathbb{R}}^n \to {\mathbb{R}}^m\times (\underline{\mathbb{R}}^m\multimap \underline{\mathbb{R}}^n)\\ &x\mapsto (f(x),{Df}^{t}(x)) \end{align*}

to get the following chain rule:

$$\mathcal{T}^*{(g\circ f)}(x) = (\mathcal{T}^*_{1}{g}(\mathcal{T}^*_{1}{f}(x)),\mathcal{T}^*_{2}{f}(x)\circ\mathcal{T}^*_{2}{g}(\mathcal{T}^*_{1}{f}(x))).$$
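A matching sketch for the reverse chain rule, again with hypothetical names (`Tstar_compose`, `Tstar_f`, `Tstar_g`) and closures standing in for linear maps. The only structural difference from the forward case is the order of composition in the second component: cotangents pass through g first and then through f.

```python
def Tstar_compose(Tg, Tf):
    """Chain rule for paired reverse-mode maps: builds T*(g . f) from T* g and T* f."""
    def Tstar_gf(x):
        y, f_back = Tf(x)   # primal f(x), shared as in forward mode
        z, g_back = Tg(y)
        # transposed derivatives compose in the opposite order
        return z, (lambda w: f_back(g_back(w)))
    return Tstar_gf

# Hypothetical example: f(x1, x2) = (x1 * x2, x1 + x2)
# Transposed derivative: (w1, w2) |-> (x2*w1 + w2, x1*w1 + w2)
def Tstar_f(x):
    x1, x2 = x
    return (x1 * x2, x1 + x2), (lambda w: (x2 * w[0] + w[1], x1 * w[0] + w[1]))

# Hypothetical example: g(y1, y2) = y1 * y2
# Transposed derivative: w |-> (y2*w, y1*w)
def Tstar_g(y):
    y1, y2 = y
    return y1 * y2, (lambda w: (y2 * w, y1 * w))

value, back = Tstar_compose(Tstar_g, Tstar_f)((2.0, 3.0))
grad = back(1.0)   # gradient of x1*x2*(x1 + x2) at (2, 3)
```

Feeding the cotangent 1.0 into `back` yields the full gradient in a single backward pass, which is why reverse mode is preferred for functions with many inputs and one output.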

CHAD directly implements the operations $\mathcal{T}$ and $\mathcal{T}^*$ as source code transformations $\overrightarrow{\mathcal{D}}$ and $\overleftarrow{\mathcal{D}}$ on a functional language to implement forward- and reverse-mode AD, respectively. These code transformations are defined compositionally through structural induction on the syntax, by making use of the chain rules above.

2.2 CHAD on a first-order functional language

We first discuss what the technique looks like on a standard typed first-order functional language. Despite our different presentation in terms of a $\lambda$-calculus rather than Elliott’s categorical combinators, this is essentially the algorithm of Elliott (2018). Types $\tau,\sigma,\rho$ are either statically sized arrays of n real numbers ${\mathbf{real}}^n$ or tuples $\tau\boldsymbol{\mathop{*}}\sigma$ of types $\tau,\sigma$. We consider programs t of type $\sigma$ in typing context $\Gamma=x_1:\tau_1,\ldots,x_n:\tau_n$, where $x_i$ are identifiers. We write such a typing judgment for programs in context as $\Gamma\vdash t:\sigma$. As long as our language has certain primitive operations (which we represent schematically)

$$\frac{\Gamma \vdash t_1 : {\mathbf{real}}^{n_1}\quad\cdots\quad \Gamma \vdash t_k : {\mathbf{real}}^{n_k}}{\Gamma \vdash \mathrm{op}(t_1,\ldots,t_k) : {\mathbf{real}}^m}$$

such as constants (as nullary operations), (elementwise) addition and multiplication of arrays, inner products and certain nonlinear functions like sigmoid functions, we can write complex programs by sequencing together such operations. For example, writing $\mathbf{real}$ for ${\mathbf{real}}^1$ , we can write a program $x_1:\mathbf{real},x_2:\mathbf{real},x_3:\mathbf{real},x_4:\mathbf{real}\vdash s:\mathbf{real}$ by:

\begin{align*}&\mathbf{let}\,{y}=\,{x_1 * x_4 + 2 * x_2 }\,\mathbf{in}\,{}\\&\mathbf{let}\,{z}=\,{y* x_3}\,\mathbf{in}\,{}\\&\mathbf{let}\,w=\,{z+ x_4}\,\mathbf{in}\,{\sin{(w)}},\end{align*}

where we indicate shared subcomputations with $\mathbf{let}$-bindings.

CHAD observes that we can define for each language type $\tau$ associated types of

  • forward-mode primal values $\overrightarrow{\mathcal{D}}(\tau)_{1}$ ;

we define $\overrightarrow{\mathcal{D}}({\mathbf{real}}^n)_1={\mathbf{real}}^n$ and $\overrightarrow{\mathcal{D}}(\tau\boldsymbol{\mathop{*}}\sigma)_1=\overrightarrow{\mathcal{D}}(\tau)_1\boldsymbol{\mathop{*}}\overrightarrow{\mathcal{D}}(\sigma)_1$, that is, for now $\overrightarrow{\mathcal{D}}(\tau)_1=\tau$;

  • reverse-mode primal values $\overleftarrow{\mathcal{D}}(\tau)_1$ ;

we define $\overleftarrow{\mathcal{D}}({\mathbf{real}}^n)_1={\mathbf{real}}^n$ and $\overleftarrow{\mathcal{D}}(\tau\boldsymbol{\mathop{*}}\sigma)_1=\overleftarrow{\mathcal{D}}(\tau)_1\boldsymbol{\mathop{*}}\overleftarrow{\mathcal{D}}(\sigma)_1$; that is, for now $\overleftarrow{\mathcal{D}}(\tau)_1=\tau$;

  • forward-mode tangent values $\overrightarrow{\mathcal{D}}(\tau)_2$ ;

we define $\overrightarrow{\mathcal{D}}({\mathbf{real}}^n)_2=\underline{\mathbf{real}}^n$ and $\overrightarrow{\mathcal{D}}(\tau\boldsymbol{\mathop{*}}\sigma)_2=\overrightarrow{\mathcal{D}}(\tau)_2\boldsymbol{\mathop{*}}\overrightarrow{\mathcal{D}}(\sigma)_2$;

  • reverse-mode cotangent values $\overleftarrow{\mathcal{D}}(\tau)_2$ ;

we define $\overleftarrow{\mathcal{D}}({\mathbf{real}}^n)_2=\underline{\mathbf{real}}^n$ and $\overleftarrow{\mathcal{D}}(\tau\boldsymbol{\mathop{*}}\sigma)_2=\overleftarrow{\mathcal{D}}(\tau)_2\boldsymbol{\mathop{*}}\overleftarrow{\mathcal{D}}(\sigma)_2$.

Indeed, the justification for these definitions is the crucial observation that a (co)tangent vector to a product of spaces is precisely a pair of (co)tangent vectors to the two spaces. Put differently, the space $\mathcal{T}_{(x,y)}{(X\times Y)}$ of (co)tangent vectors to $X\times Y$ at a point (x,y) equals the product space $(\mathcal{T}_{x}X) \times (\mathcal{T}_{y} Y)$ (Tu 2011).

We write the (co)tangent types associated with ${\mathbf{real}}^n$ as $\underline{\mathbf{real}}^n$ to emphasize that it is a linear type and to distinguish it from the cartesian type ${\mathbf{real}}^n$. In particular, we will see that tangent and cotangent values are elements of linear types that come equipped with a commutative monoid structure $(\underline{0},+)$. Indeed, (transposed) derivatives are linear functions: homomorphisms of this monoid structure.$^{1}$ We extend these operations $\overrightarrow{\mathcal{D}}$ and $\overleftarrow{\mathcal{D}}$ to act on typing contexts $\Gamma$:

\begin{align*}\overrightarrow{\mathcal{D}}(x_1:\tau_1,\ldots,x_n:\tau_n)_1&=x_1:\overrightarrow{\mathcal{D}}(\tau_1)_1,\ldots, x_n:\overrightarrow{\mathcal{D}}(\tau_n)_1 \\\overleftarrow{\mathcal{D}}(x_1:\tau_1,\ldots,x_n:\tau_n)_1&=x_1:\overleftarrow{\mathcal{D}}(\tau_1)_1,\ldots, x_n:\overleftarrow{\mathcal{D}}(\tau_n)_1 \\\overrightarrow{\mathcal{D}}(x_1:\tau_1,\ldots,x_n:\tau_n)_2&=\overrightarrow{\mathcal{D}}(\tau_1)_2\boldsymbol{\mathop{*}}\cdots\boldsymbol{\mathop{*}}\overrightarrow{\mathcal{D}}(\tau_n)_2\\\overleftarrow{\mathcal{D}}(x_1:\tau_1,\ldots,x_n:\tau_n)_2&=\overleftarrow{\mathcal{D}}(\tau_1)_2\boldsymbol{\mathop{*}}\cdots\boldsymbol{\mathop{*}}\overleftarrow{\mathcal{D}}(\tau_n)_2.\end{align*}
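As a small worked instance of these clauses (the concrete context is chosen here purely for illustration), take $\Gamma = x:{\mathbf{real}}^2, y:{\mathbf{real}}\boldsymbol{\mathop{*}}{\mathbf{real}}^3$. Unfolding the definitions for reverse mode gives

\begin{align*}\overleftarrow{\mathcal{D}}(\Gamma)_1&=x:{\mathbf{real}}^2, y:{\mathbf{real}}\boldsymbol{\mathop{*}}{\mathbf{real}}^3\\\overleftarrow{\mathcal{D}}(\Gamma)_2&=\underline{\mathbf{real}}^2\boldsymbol{\mathop{*}}(\underline{\mathbf{real}}\boldsymbol{\mathop{*}}\underline{\mathbf{real}}^3),\end{align*}

so the primal context is unchanged and the cotangent type collects one linear component per identifier. For this first-order fragment, the forward-mode types $\overrightarrow{\mathcal{D}}(\Gamma)_1$ and $\overrightarrow{\mathcal{D}}(\Gamma)_2$ unfold to the same expressions.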

To each program $\Gamma\vdash t:\sigma$ , CHAD associates programs calculating the forward-mode and reverse-mode derivatives $\overrightarrow{\mathcal{D}}_{{\overline{\Gamma}}}(t)$ and $\overleftarrow{\mathcal{D}}_{{\overline{\Gamma}}}(t)$ , which are indexed by the list ${\overline{\Gamma}}$ of identifiers that occur in $\Gamma$ :

\begin{align*}&\overrightarrow{\mathcal{D}}(\Gamma)_1\vdash \overrightarrow{\mathcal{D}}_{\overline{\Gamma}}(t) : \overrightarrow{\mathcal{D}}(\sigma)_1\boldsymbol{\mathop{*}} \left( \overrightarrow{\mathcal{D}}(\Gamma)_2\multimap \overrightarrow{\mathcal{D}}(\sigma)_2\right)\\&\overleftarrow{\mathcal{D}}(\Gamma)_1\vdash \overleftarrow{\mathcal{D}}_{\overline{\Gamma}}(t) : \overleftarrow{\mathcal{D}}(\sigma)_1\boldsymbol{\mathop{*}} \left( \overleftarrow{\mathcal{D}}(\sigma)_2\multimap\overleftarrow{\mathcal{D}}(\Gamma)_2 \right).\end{align*}

Observing that each program t computes a differentiable function $\unicode{x27E6} t\unicode{x27E7}$ between Euclidean spaces, as long as all primitive operations op are differentiable, the key property that we prove for these code transformations is that they actually calculate derivatives:

Theorem A (Correctness of CHAD, Theorem 124). For any well-typed program:

$$x_1:{\mathbf{real}}^{n_1},\ldots,x_k:{\mathbf{real}}^{n_k}\vdash {t}:{\mathbf{real}}^m$$

we have that $\unicode{x27E6} \overrightarrow{\mathcal{D}}_{x_1,\ldots,x_k}(t)\unicode{x27E7}=\mathcal{T}{\unicode{x27E6} t\unicode{x27E7}}\;\text{ and }\;\unicode{x27E6} \overleftarrow{\mathcal{D}}_{x_1,\ldots,x_k}(t)\unicode{x27E7}=\mathcal{T}^*{\unicode{x27E6} t\unicode{x27E7}}.$

Once we fix the semantics for the source and target languages, we can show that this theorem holds if we define $\overrightarrow{\mathcal{D}}$ and $\overleftarrow{\mathcal{D}}$ on programs using the chain rule. The proof works by plain induction on the syntax. For example, we can correctly define reverse-mode CHAD on a first-order language as follows:

\begin{align*} &\overleftarrow{\mathcal{D}}_{\overline{\Gamma}}(\mathrm{op}(t_1,\ldots,t_k)) \stackrel{\mathrm{def}}{=} && \mathbf{let}\,\langle x_1,x_1'\rangle=\,\overleftarrow{\mathcal{D}}_{\overline{\Gamma}}(t_1)\,\mathbf{in}\,\cdots\\ &&& \mathbf{let}\,\langle x_k,x_k'\rangle=\,\overleftarrow{\mathcal{D}}_{\overline{\Gamma}}(t_k)\,\mathbf{in}\,\\ &&&\langle\mathrm{op}(x_1,\ldots,x_k),\underline{\lambda} \mathsf{v}. \mathbf{let}\,\mathsf{v}=\,{D\mathrm{op}}^{t}(x_1,\ldots,x_k;\mathsf{v})\,\mathbf{in}\,\\ &&&\phantom{\langle \mathrm{op}(x_1,\ldots,x_k),\underline{\lambda} \mathsf{v}.\rangle}x_1'\bullet \mathbf{proj}_{1}\,{\mathsf{v}}+\cdots+x_k'\bullet \mathbf{proj}_{k}\,{\mathsf{v}}\rangle\end{align*}

\begin{align*}&\overleftarrow{\mathcal{D}}_{\overline{\Gamma}}(x) \stackrel{\mathrm{def}}{=} && \langle x,\underline{\lambda} \mathsf{v}. \mathbf{coproj}_{\mathbf{idx}(x; {\overline{\Gamma}})\,}\,(\mathsf{v})\rangle\end{align*}

\begin{align*} &\overleftarrow{\mathcal{D}}_{\overline{\Gamma}}(\mathbf{let}\,x=\,t\,\mathbf{in}\,s)\stackrel{\mathrm{def}}{=} &&\mathbf{let}\,\langle x,x'\rangle=\,\overleftarrow{\mathcal{D}}_{\overline{\Gamma}}(t)\,\mathbf{in}\,\\ &&& \mathbf{let}\,\langle y,y'\rangle=\,\overleftarrow{\mathcal{D}}_{\overline{\Gamma},x}(s)\,\mathbf{in}\,\\ &&& \langle y, \underline{\lambda} \mathsf{v}. \mathbf{let}\,\mathsf{v}=\,y'\bullet \mathsf{v}\,\mathbf{in}\, \mathbf{fst}\,\mathsf{v}+x'\bullet (\mathbf{snd}\, \mathsf{v})\rangle\end{align*}

\begin{align*}&\overleftarrow{\mathcal{D}}_{\overline{\Gamma}}(\langle t, s\rangle) \stackrel {\mathrm{def}}= &&\mathbf{let}\,\langle x,x'\rangle=\,\overleftarrow{\mathcal{D}}_{\overline{\Gamma}}(t)\,\mathbf{in}\, \\ &&&\mathbf{let}\,\langle y,y'\rangle=\,\overleftarrow{\mathcal{D}}_{\overline{\Gamma}}(s)\,\mathbf{in}\,\\ &&&\langle \langle x, y \rangle,\underline{\lambda} \mathsf{v}. x'\bullet (\mathbf{fst}\,\mathsf{v}) + {y'\bullet(\mathbf{snd}\, \mathsf{v})}\rangle\end{align*}

\begin{align*}&\overleftarrow{\mathcal{D}}_{\overline{\Gamma}}(\mathbf{fst}\, t) \stackrel {\mathrm{def}}= &&\mathbf{let}\,\langle x,x'\rangle=\,\overleftarrow{\mathcal{D}}_{\overline{\Gamma}}(t)\,\mathbf{in}\,\langle \mathbf{fst}\, x, \underline{\lambda} \mathsf{v}. x'\bullet \langle\mathsf{v},\underline{0}\rangle\rangle\end{align*}

\begin{align*}&\overleftarrow{\mathcal{D}}_{\overline{\Gamma}}(\mathbf{snd}\, t) \stackrel {\mathrm{def}}= &&\mathbf{let}\,\langle x,x'\rangle=\,\overleftarrow{\mathcal{D}}_{\overline{\Gamma}}(t)\,\mathbf{in}\,\langle\mathbf{snd}\, x,\underline{\lambda} \mathsf{v}. x'\bullet\langle\underline{0},\mathsf{v}\rangle\rangle\end{align*}

Here, we write $\underline{\lambda} \mathsf{v}. t$ for a linear function abstraction (merely a notational convention – it can simply be thought of as a plain function abstraction) and $t\bullet s$ for a linear function application (which again can be thought of as a plain function application). Furthermore, given $\Gamma;\mathsf{v}:\underline{\alpha}\vdash t:\boldsymbol{(}\underline{\sigma}_1 \boldsymbol{\mathop{*}} \cdots \boldsymbol{\mathop{*}} \underline{\sigma}_n\boldsymbol{)}$, we write $\Gamma;\mathsf{v}:\underline{\alpha}\vdash \mathbf{proj}_{i}\,(t):\underline{\sigma}_i$ for the i-th projection of t. Similarly, given $\Gamma;\mathsf{v}:\underline{\alpha}\vdash t:\underline{\sigma}_i$, we write the i-th coprojection $\Gamma;\mathsf{v}:\underline{\alpha}\vdash\mathbf{coproj}_{i}\,(t)= \langle \underline{0},\ldots,\underline{0},t,\underline{0},\ldots,\underline{0}\rangle:\boldsymbol{(}\underline{\sigma}_1 \boldsymbol{\mathop{*}} \cdots \boldsymbol{\mathop{*}} \underline{\sigma}_n\boldsymbol{)}$ and we write $\mathbf{idx}(x_i; x_1,\ldots,x_n)\,=i$ for the index of an identifier in a list of identifiers. Finally, ${D\mathrm{op}}^{t}$ here is a linear operation that implements the transposed derivative of the primitive operation op.

Note, in particular, that CHAD pairs up primal and (co)tangent values and shares common subcomputations. We see that what CHAD achieves is a compositional efficient reverse-mode AD algorithm that computes the (transposed) derivatives of a composite program in terms of the (transposed) derivatives ${D\mathrm{op}}^{t}$ of the basic building blocks op.
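The rules above can be mirrored by a small Python sketch. It is not CHAD itself: CHAD is a source code transformation, whereas this sketch evaluates the transformed program directly, under several assumptions made here for illustration. Terms are nested tuples, identifiers are assumed distinct, constants are handled as a separate case rather than as nullary operations, cotangents of the context are kept as a list with one entry per identifier, and the names `rev`, `OPS`, `zero_like`, and `add` are hypothetical.

```python
import math

# Cotangents: floats for reals, tuples for product types.
def zero_like(p):
    return tuple(zero_like(c) for c in p) if isinstance(p, tuple) else 0.0

def add(a, b):
    if isinstance(a, tuple):
        return tuple(add(x, y) for x, y in zip(a, b))
    return a + b

# Primitive operations: name -> (function, transposed derivative).
OPS = {
    "add": (lambda a, b: a + b, lambda a, b, v: (v, v)),
    "mul": (lambda a, b: a * b, lambda a, b, v: (b * v, a * v)),
    "sin": (lambda a: math.sin(a), lambda a, v: (math.cos(a) * v,)),
}

def rev(term, names, env):
    """Return (primal, back); back maps an output cotangent to a list of
    cotangents, one per identifier in names (assumed distinct)."""
    tag = term[0]
    zeros = lambda: [zero_like(env[n]) for n in names]
    if tag == "const":
        return term[1], lambda v: zeros()
    if tag == "var":  # coprojection at the index of the identifier
        i = names.index(term[1])
        def back(v):
            out = zeros()
            out[i] = v
            return out
        return env[term[1]], back
    if tag == "op":
        f, f_t = OPS[term[1]]
        results = [rev(a, names, env) for a in term[2]]
        xs = [p for p, _ in results]
        def back(v):
            vs = f_t(*xs, v)  # transposed derivative of op
            total = zeros()
            for (_, b), vi in zip(results, vs):
                total = [add(s, c) for s, c in zip(total, b(vi))]
            return total
        return f(*xs), back
    if tag == "let":
        _, x, t, s = term
        px, bx = rev(t, names, env)
        py, by = rev(s, names + [x], {**env, x: px})
        def back(v):
            w = by(v)  # cotangent for the extended context
            return [add(a, b) for a, b in zip(w[:-1], bx(w[-1]))]
        return py, back
    if tag == "pair":
        px, bx = rev(term[1], names, env)
        py, by = rev(term[2], names, env)
        return (px, py), lambda v: [add(a, b) for a, b in zip(bx(v[0]), by(v[1]))]
    if tag == "fst":
        p, b = rev(term[1], names, env)
        return p[0], lambda v: b((v, zero_like(p[1])))
    if tag == "snd":
        p, b = rev(term[1], names, env)
        return p[1], lambda v: b((zero_like(p[0]), v))
    raise ValueError(tag)

# The example program s from Section 2.2.
V = lambda n: ("var", n)
op = lambda name, *args: ("op", name, list(args))
s = ("let", "y", op("add", op("mul", V("x1"), V("x4")), op("mul", ("const", 2.0), V("x2"))),
     ("let", "z", op("mul", V("y"), V("x3")),
      ("let", "w", op("add", V("z"), V("x4")), op("sin", V("w")))))
env = {"x1": 1.0, "x2": 2.0, "x3": 3.0, "x4": 4.0}
value, back = rev(s, ["x1", "x2", "x3", "x4"], env)
grad = back(1.0)  # one cotangent per input identifier
```

Each subterm is evaluated once, and its primal value is reused by both the result and the backward closure, which is the sharing that the let-bindings in the rules express.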

2.3 CHAD on a higher-order language: a categorical perspective saves the day

So far, this account of CHAD has been smooth sailing: we can simply follow the usual mathematics of (transposed) derivatives of functions ${\mathbb{R}}^n\to {\mathbb{R}}^m$ and implement it in code. A challenge arises when trying to extend the algorithm to more expressive languages with features that do not have an obvious counterpart in multivariate calculus, like higher-order functions.

Vákár and Smeding (2022) and Vákár (2021) solve this problem by observing that we can understand CHAD through the categorical structure of Grothendieck constructions (aka $\Sigma$-types of categories). In particular, they observe that the syntactic category of the target language for CHAD, a language with both cartesian and linear types, forms a locally indexed category ${\mathbf{LSyn}}:{\mathbf{CSyn}}^{op}\to \mathbf{Cat}$, that is, a functor to the category of categories and functors for which $\mathrm{obj} \left( {\mathbf{LSyn}}(\tau)\right)=\mathrm{obj} \left( {\mathbf{LSyn}}(\sigma)\right)$ for all $\tau,\sigma\in\mathrm{obj} \left( {\mathbf{CSyn}}\right)$ and ${\mathbf{LSyn}}(\tau\xrightarrow{t}\sigma):{\mathbf{LSyn}}(\sigma)\to{\mathbf{LSyn}}(\tau)$ is identity on objects. Here, ${\mathbf{CSyn}}$ is the syntactic category whose objects are cartesian types $\tau,\sigma,\rho$ and morphisms $\tau\to \sigma$ are programs $x:\tau\vdash t:\sigma$, up to a standard program equivalence. Similarly, ${\mathbf{LSyn}}(\tau)$ is the syntactic category whose objects are linear types $\underline{\alpha},\underline{\sigma},\underline{\gamma}$ and morphisms $\underline{\alpha}\to\underline{\gamma}$ are programs $x:\tau;\mathsf{v}:\underline{\alpha}\vdash t:\underline{\gamma}$ of type $\underline{\gamma}$ that have a free variable x of cartesian type $\tau$ and a free variable $\mathsf{v}$ of linear type $\underline{\alpha}$. The key observation then is the following.

Theorem B (CHAD from a universal property, Corollary 69). Forward- and reverse-mode CHAD are the unique structure-preserving functors:

\begin{align*} &\overrightarrow{\mathcal{D}}({-}):\mathbf{Syn}\to \Sigma_{{\mathbf{CSyn}}}{\mathbf{LSyn}}\\ &\overleftarrow{\mathcal{D}}({-}):\mathbf{Syn}\to \Sigma_{{\mathbf{CSyn}}}{\mathbf{LSyn}}^{op}\end{align*}

from the syntactic category $\mathbf{Syn}$ of the source language to the (opposite) Grothendieck construction of the target language ${\mathbf{LSyn}}:{\mathbf{CSyn}}^{op}\to \mathbf{Cat}$ that send primitive operations op to their derivative $D\mathrm{op}$ and transposed derivative ${D\mathrm{op}}^{t}$, respectively.

In particular, they prove that this is true for the unambiguous definitions of CHAD for a source language that is the first-order functional language we have considered above, which we can see as the freely generated category $\mathbf{Syn}$ with finite products, generated by the objects ${\mathbf{real}}^n$ and morphisms op. That is, for this limited language, “structure-preserving functor” should be interpreted as “finite product-preserving functor.”

This leads Vákár (2021) and Vákár and Smeding (2022) to the idea of using Theorem B as a definition of CHAD on more expressive programming languages. In particular, they consider a higher-order functional source language $\mathbf{Syn}$, that is, the freely generated cartesian closed category on the objects ${\mathbf{real}}^n$ and morphisms op, and try to define $\overrightarrow{\mathcal{D}}(-)$ and $\overleftarrow{\mathcal{D}}(-)$ as the (unique) structure-preserving (meaning: cartesian closed) functors to $\Sigma_{{\mathbf{CSyn}}}{\mathbf{LSyn}}$ and $\Sigma_{{\mathbf{CSyn}}}{\mathbf{LSyn}}^{op}$ for a suitable linear target language ${\mathbf{LSyn}}:{\mathbf{CSyn}}^{op}\to \mathbf{Cat}$. The main contribution then is to identify conditions on a locally indexed category $\mathcal{L}:\mathcal{C}^{op}\to \mathbf{Cat}$ that guarantee that $\Sigma_{\mathcal{C}}\mathcal{L}$ and $\Sigma_{\mathcal{C}}\mathcal{L}^{op}$ are cartesian closed and to take the target language ${\mathbf{LSyn}}:{\mathbf{CSyn}}^{op}\to \mathbf{Cat}$ as a freely generated such category.

Insight 2. To understand how to perform CHAD on a source language with language feature X (e.g., higher-order functions), we need to understand the categorical semantics of language feature X (e.g., categorical exponentials) in categories of the form $\Sigma_\mathcal{C}\mathcal{L}$ and $\Sigma_\mathcal{C}\mathcal{L}^{op}$. Giving sufficient conditions on $\mathcal{L}$ for such a semantics to exist yields a suitable target language for CHAD, with the definition of the algorithm following from the universal property of the source language.

Furthermore, we observe in these papers that Theorem A again holds for this extended definition of CHAD on higher-order languages. However, to prove this, plain induction no longer suffices and we instead need to use a logical relations construction over the semantics (in the form of categorical sconing) that relates differentiable curves to their associated primal and (co)tangent curves. This is necessary because the program t may use higher-order constructions such as $\lambda$ -abstractions and function applications in its definition, even if the input and output types are plain first-order types that implement some Euclidean space.

Insight 3. To obtain a correctness proof of CHAD on source languages with language feature X, it suffices to give a concrete denotational semantics for the source and target languages as well as a categorical semantics of language feature X in a category of logical relations (a scone) over these concrete semantics. The main technical challenge is to analyze logical relations techniques for language feature X.

Finally, these papers observe that the resulting target language can be implemented as a shallowly embedded DSL in standard functional languages, using a module system to implement the required linear types as abstract types, with a reference Haskell implementation available at https://github.com/VMatthijs/CHAD. In fact, Vytiniotis et al. (2019) had proposed the same CHAD algorithm for higher-order languages, arriving at it from practical considerations rather than abstract categorical observations.

Insight 4. The code generated by CHAD naturally comes equipped with very precise (e.g., linear) types. These types emphasize the connections to its mathematical foundations and provide scaffolding for its correctness proof. However, they are unnecessary for a practical implementation of the algorithm: CHAD can be made to generate standard functional (e.g., Haskell) code; the type safety can even be rescued by implementing the linear types as abstract types.

2.4 CHAD for sum types: a challenge – (co)tangent spaces of varying dimension

A natural approach when extending CHAD to yet more expressive source languages is, therefore, to try to use Theorem B as a definition. In the case of sum types (aka variant types), we should consider their categorical equivalent, distributive coproducts, and seek conditions on $\mathcal{L}:\mathcal{C}^{op}\to\mathbf{Cat}$ under which $\Sigma_{\mathcal{C}}\mathcal{L}$ and $\Sigma_{\mathcal{C}}\mathcal{L}^{op}$ have distributive coproducts. The difficulty is that these categories tend not to have coproducts if $\mathcal{L}$ is locally indexed. Instead, the desire to have coproducts in $\Sigma_{\mathcal{C}}\mathcal{L}$ and $\Sigma_{\mathcal{C}}\mathcal{L}^{op}$ naturally leads us to consider more general strictly indexed categories $\mathcal{L}:\mathcal{C}^{op}\to\mathbf{Cat}$.

In fact, this is compatible with what we know from differential geometry (Tu 2011): coproducts allow us to construct spaces with multiple connected components, each of which may have a distinct dimension. To make things concrete, the space $\mathcal{T}_{x}{({\mathbb{R}}^2 \sqcup {\mathbb{R}}^3)}$ of tangent vectors to ${\mathbb{R}}^2 \sqcup {\mathbb{R}}^3$ is either $\underline{\mathbb{R}}^2$ or $\underline{\mathbb{R}}^3$ depending on whether the base point x is chosen in the left or right component of the coproduct. More generally, a differentiable function $f:X\to Y$ between spaces of varying dimension (which can be formalized as manifolds with multiple connected components) induces functions on the spaces of tangent and cotangent vectors$^{2}$:

\begin{align*}\mathcal{T}{f}&:\Pi_{x\in X}\Sigma_{y\in Y}(\mathcal{T}_{x} X\multimap \mathcal{T}_{y}Y)\\\mathcal{T}^*{f}&:\Pi_{x\in X}\Sigma_{y\in Y}(\mathcal{T}^*_{y} Y\multimap \mathcal{T}^*_{x}X),\end{align*}

whose first component is f itself and whose second component is the action on (co)tangent vectors that f induces.

If the types $\overrightarrow{\mathcal{D}}(\tau)_2$ and $\overleftarrow{\mathcal{D}}(\tau)_2$ are to represent spaces of tangent and cotangent vectors to the spaces that $\overrightarrow{\mathcal{D}}(\tau)_{1}$ and $\overleftarrow{\mathcal{D}}(\tau)_1$ represent, we would expect them to be types that vary with the particular base point (primal) we choose. This leads to a refined view of CHAD: while $\vdash \overrightarrow{\mathcal{D}}(\tau)_1:\mathrm{type} $ and $\vdash\overleftarrow{\mathcal{D}}(\tau)_1:\mathrm{type}$ can remain (closed/nondependent) cartesian types, ${p}:\overrightarrow{\mathcal{D}}(\tau)_1\vdash \overrightarrow{\mathcal{D}}(\tau)_2:\mathrm{ltype}$ and ${p}:\overleftarrow{\mathcal{D}}(\tau)_1\vdash \overleftarrow{\mathcal{D}}(\tau)_2:\mathrm{ltype}$ are, in general, linear dependent types.

Insight 5. To accommodate sum types in CHAD, it is natural to consider a target language with dependent types: this allows the dimension of the spaces of (co)tangent vectors to vary with the chosen primal. In categorical terms, we need to consider general strictly indexed categories $\mathcal{L}:\mathcal{C}^{op}\to \mathbf{Cat}$ instead of merely locally indexed ones.
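The dependence of (co)tangent spaces on the primal can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: elements of $\mathbb{R}^2 \sqcup \mathbb{R}^3$ are modeled as tagged tuples, and `T_f` and `Tstar_f` are hand-written pairings for one hypothetical function f, not output of CHAD. Python cannot express the dependent types, so the varying dimension shows up only at run time.

```python
def T_f(x):
    """Forward pairing for a hypothetical f : R^2 + R^3 -> R with
    f(inl (a1, a2)) = a1 * a2 and f(inr (b1, b2, b3)) = b1 + b2 + b3."""
    tag, p = x
    if tag == "inl":
        a1, a2 = p
        # tangent input lives in R^2 because the primal is in the left component
        return a1 * a2, (lambda v: a2 * v[0] + a1 * v[1])
    if tag == "inr":
        # tangent input lives in R^3 because the primal is in the right component
        return sum(p), (lambda v: v[0] + v[1] + v[2])
    raise ValueError(tag)

def Tstar_f(x):
    """Reverse pairing for the same f; the cotangent output dimension
    depends on the component that the primal x lies in."""
    tag, p = x
    if tag == "inl":
        a1, a2 = p
        return a1 * a2, (lambda w: (a2 * w, a1 * w))
    if tag == "inr":
        return sum(p), (lambda w: (w, w, w))
    raise ValueError(tag)

left_value, left_back = Tstar_f(("inl", (2.0, 3.0)))
right_value, right_back = Tstar_f(("inr", (1.0, 2.0, 3.0)))
left_grad = left_back(1.0)     # a cotangent in R^2
right_grad = right_back(1.0)   # a cotangent in R^3
```

In a target language with linear dependent types, the type of the second component would record this dependence on the tag statically.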

The CHAD transformations of a program now become typed in the following more precise way:

\[\begin{array}{l} \overrightarrow{\mathcal{D}}(\Gamma)_1\vdash \overrightarrow{\mathcal{D}}_{\overline{\Gamma}}(t):\Sigma{{p}:\overrightarrow{\mathcal{D}}(\tau)_1}.{\overrightarrow{\mathcal{D}}(\Gamma)_2\multimap \overrightarrow{\mathcal{D}}(\tau)_2}\\ \overleftarrow{\mathcal{D}}(\Gamma)_1\vdash \overleftarrow{\mathcal{D}}_{\overline{\Gamma}}(t):\Sigma{{p}:\overleftarrow{\mathcal{D}}(\tau)_1}.{\overleftarrow{\mathcal{D}}(\tau)_2\multimap \overleftarrow{\mathcal{D}}(\Gamma)_2},\end{array}\]

where the action of $\overrightarrow{\mathcal{D}}({-})_2$ and $\overleftarrow{\mathcal{D}}(-)_2$ on typing contexts $\Gamma=x_1:\tau_1,\ldots,x_n:\tau_n$ has been refined to

\[ \overrightarrow{\mathcal{D}}(\Gamma)_2\stackrel {\mathrm{def}}= \boldsymbol{(}\overrightarrow{\mathcal{D}}(\tau_1)_2[{}^{x_1}\!/\!_{{p}}] \boldsymbol{\mathop{*}} \cdots \boldsymbol{\mathop{*}} \overrightarrow{\mathcal{D}}(\tau_n)_2[{}^{x_n}\!/\!_{{p}}]\boldsymbol{)}\qquad\quad \overleftarrow{\mathcal{D}}(\Gamma)_2\stackrel {\mathrm{def}}= \boldsymbol{(}\overleftarrow{\mathcal{D}}(\tau_1)_2[{}^{x_1}\!/\!_{{p}}] \boldsymbol{\mathop{*}} \cdots \boldsymbol{\mathop{*}} \overleftarrow{\mathcal{D}}(\tau_n)_2[{}^{x_n}\!/\!_{{p}}]\boldsymbol{)}.\]

All given definitions remain valid, where we simply reinterpret some tuples as having a $\Sigma$ -type rather than the more limited original tuple type.

We prove the following novel results.

Theorem C (Bicartesian closed structure of $\Sigma$ -categories, Propositions 17 and 18, Theorems 25, 26, and 39, and Corollaries 35 and 36). For a category $\mathcal{C}$ and a strictly indexed category $\mathcal{L}:\mathcal{C}^{op}\to \mathbf{Cat}$ , $\Sigma_\mathcal{C} \mathcal{L}$ and $\Sigma_\mathcal{C} \mathcal{L}^{op}$ have

  • (fibered) finite products, if $\mathcal{C}$ has finite products and $\mathcal{L}$ has strictly indexed products and coproducts;

  • (fibered) finite coproducts, if $\mathcal{C}$ has finite coproducts and $\mathcal{L}$ is extensive;

  • exponentials, if $\mathcal{L}$ is a biadditive model of the dependently typed enriched effect calculus (we intentionally keep this vague here to aid legibility; the point is that these are relatively standard conditions).

Furthermore, the coproducts in $\Sigma_\mathcal{C} \mathcal{L}$ and $\Sigma_\mathcal{C} \mathcal{L}^{op}$ distribute over the products, as long as those in $\mathcal{C}$ do, even in the absence of exponentials. Notably, the exponentials are not generally fibered over $\mathcal{C}$ .

The crucial notion here is our (novel) notion of extensivity of an indexed category, which generalizes well-known notions of extensive categories. In particular, we call $\mathcal{L}:\mathcal{C}^{op}\to \mathbf{Cat}$ extensive if the canonical functor $\mathcal{L}(\sqcup_{i=1}^n C_i)\to \prod_{i=1}^n \mathcal{L}(C_i)$ is an equivalence. Furthermore, we note that we need to reestablish the product and exponential structures of $\Sigma_\mathcal{C} \mathcal{L}$ and $\Sigma_\mathcal{C} \mathcal{L}^{op}$ due to the generalization from locally indexed to arbitrary strictly indexed categories $\mathcal{L}$ .
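To make the extensivity condition concrete, here is a small worked instance for $n=2$. It assumes that the indexed category $\mathbf{FVect}$ of families of vector spaces, used later in the paper, sends a set $S$ to the category $\mathbf{Vect}^{S}$ of $S$-indexed families of vector spaces; that description is an assumption of this sketch and is not spelled out in this section.

```latex
\[
\mathbf{FVect}(S_1\sqcup S_2)
  \;=\; \mathbf{Vect}^{S_1\sqcup S_2}
  \;\simeq\; \mathbf{Vect}^{S_1}\times \mathbf{Vect}^{S_2}
  \;=\; \mathbf{FVect}(S_1)\times \mathbf{FVect}(S_2)
\]
```

Under this assumption, a family of vector spaces over a disjoint union amounts to a pair of families, one over each summand, which is the shape of equivalence the definition asks for.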

Using these results, we construct a suitable target language ${\mathbf{LSyn}}:{\mathbf{CSyn}}^{op}\to\mathbf{Cat}$ for CHAD on a source language with sum types (and tuple and function types) and derive the forward and reverse CHAD algorithms for such a language and reestablish Theorems A and B in this more general context. This target language is a standard dependently typed enriched effect calculus with cartesian sum types and extensive families of linear types (i.e., dependent linear types that can be defined through case distinction). Again, the correctness proof of Theorem A uses the universal property of Theorem B and a logical relations (categorical sconing) construction over the denotational semantics of the source and target languages. This logical relations construction is relatively straightforward and relies on well-known sconing methods for bicartesian closed categories. In particular, we obtain the following formulas for a sum type $\left\{\ell_1\tau_1\mid\cdots\mid \ell_n\tau_n\right\}$ with constructors $\ell_1,\ldots,\ell_n$ that take arguments of type $\tau_1,\ldots,\tau_n$ :

\begin{align*} &\overrightarrow{\mathcal{D}}(\left\{\ell_1\tau_1\mid\cdots\mid \ell_n\tau_n\right\})_1 \stackrel {\mathrm{def}}= \left\{\ell_1\overrightarrow{\mathcal{D}}(\tau_1)_1\mid \cdots \mid\ell_n\overrightarrow{\mathcal{D}}(\tau_n)_1\right\}\\ &\overrightarrow{\mathcal{D}}(\left\{\ell_1\tau_1\mid\cdots\mid \ell_n\tau_n\right\})_2\stackrel {\mathrm{def}}= {\mathbf{case}\,{p}\,\mathbf{of}\,\{{\ell_1{p}\to \overrightarrow{\mathcal{D}}(\tau_1)_2\mid\cdots\mid \ell_n{p}\to\overrightarrow{\mathcal{D}}(\tau_n)_2}\}}\\ &\overleftarrow{\mathcal{D}}(\left\{\ell_1\tau_1\mid\cdots\mid \ell_n\tau_n\right\})_1 \stackrel {\mathrm{def}}= \left\{\ell_1\overleftarrow{\mathcal{D}}(\tau_1)_1\mid \cdots \mid\ell_n\overleftarrow{\mathcal{D}}(\tau_n)_1\right\}\\ &\overleftarrow{\mathcal{D}}(\left\{\ell_1\tau_1\mid\cdots\mid \ell_n\tau_n\right\})_2\stackrel {\mathrm{def}}= {\mathbf{case}\,{p}\,\mathbf{of}\,\{{\ell_1{p}\to \overleftarrow{\mathcal{D}}(\tau_1)_2\mid\cdots\mid \ell_n{p}\to\overleftarrow{\mathcal{D}}(\tau_n)_2}\}},\end{align*}

mirroring our intuition that the (co)tangent bundle to a coproduct of spaces decomposes (extensively) into the (co)tangent bundles to the component spaces.

2.5 CHAD for (co)inductive types: where do we begin?

If we are to really push forward the dream of differentiable programming, we need to learn how to perform AD on programs that operate on data types. To this effect, we analyze CHAD for inductive and coinductive types. If we want to follow our previous methodology to find suitable definitions and correctness proofs, we first need a good categorical axiomatization of such types. It is well known that inductive types correspond to initial algebras of functors, while coinductive types are precisely terminal coalgebras. The question, however, is what class of functors to consider. That choice makes the vague notion of (co)inductive types precise.

Following Santocanale (2002), we work with the class of $\mu\nu$ -polynomials, a relatively standard choice. These are the functors that can be defined inductively through the combination of

  • constants for primitive types ${\mathbf{real}}^n$ ;

  • type variables ${\alpha}$ ;

  • unit and tuple types $\mathbf{1}$ and $\tau\boldsymbol{\mathop{*}}\sigma$ of $\mu\nu$ -polynomials;

  • sum types $\left\{\ell_1\tau_1\mid\cdots\mid \ell_n\tau_n\right\}$ of $\mu\nu$ -polynomials;

  • initial algebras $\mu{\alpha}.\tau$ of $\mu\nu$ -polynomials;

  • terminal coalgebras $\nu{\alpha}.\tau$ of $\mu\nu$ -polynomials.

Notably, we exclude function types, as the non-fibered nature of exponentials in $\Sigma_\mathcal{C} \mathcal{L}$ and $\Sigma_\mathcal{C} \mathcal{L}^{op}$ would significantly complicate the technical development. While this excludes certain examples like the free state monad (which, for a state of type $\sigma$ , would be the initial algebra $\mu{\alpha}.\left\{Get (\sigma\to {\alpha})\mid Put (\sigma\boldsymbol{\mathop{*}} {\alpha})\right\}$ ), it still includes the vast majority of examples of eager and lazy types that one uses in practice, for example, lists $\mu{\alpha}.\left\{Empty\,\mathbf{1}\mid Cons (\sigma\boldsymbol{\mathop{*}} {\alpha})\right\}$ , (finitely branching) labeled trees like $\mu{\alpha}.\left\{Leaf\,\mathbf{1}\mid Node (\sigma\boldsymbol{\mathop{*}} {\alpha}\boldsymbol{\mathop{*}} {\alpha})\right\}$ , streams $\nu{\alpha}.\sigma\boldsymbol{\mathop{*}} {\alpha}$ , and many more.

We characterize conditions on a strictly indexed category $\mathcal{L}:\mathcal{C}^{op}\to\mathbf{Cat}$ that guarantee that $\Sigma_\mathcal{C}\mathcal{L}$ and $\Sigma_\mathcal{C} \mathcal{L}^{op}$ have this precise notion of inductive and coinductive types. The first step is to give a characterization of initial algebras and terminal coalgebras of split fibration endofunctors on $\Sigma_\mathcal{C}\mathcal{L}$ and $\Sigma_\mathcal{C} \mathcal{L}^{op}$ . For legibility, we state the results here for simple endofunctors and (co)algebras, but they generalize to parameterized endofunctors and (co)algebras.

Theorem D (Characterization of initial algebras and terminal coalgebras in $\Sigma$ -categories, Corollary 49 and Theorem 52). Let E be a split fibration endofunctor on $\Sigma_\mathcal{C}\mathcal{L}$ (resp. $\Sigma_\mathcal{C} \mathcal{L}^{op}$ ) and let $(\overline{E},e)$ be the corresponding strictly indexed endofunctor on $\mathcal{L}$ . Then, E has a (fibered) initial algebra if

  • $\overline{E}:\mathcal{C}\to\mathcal{C}$ has an initial algebra $\mathbf{\mathfrak{in}} _{\overline{E}}:\overline{E}(\mu\overline{E})\to \mu\overline{E}$ ;

  • $\mathcal{L}(\mathbf{\mathfrak{in}} _{\overline{E}} )^{-1} e_{\mu\overline{E}} :\mathcal{L}(\mu\overline{E})\to \mathcal{L}(\mu\overline{E})$ has an initial algebra (resp. terminal coalgebra);

  • $\mathcal{L}(f)$ preserves initial algebras (resp. terminal coalgebras) for all morphisms $f\in \mathcal{C}$ ;

and E has a (fibered) terminal coalgebra if

  • $\overline{E}:\mathcal{C}\to\mathcal{C}$ has a terminal coalgebra $\mathbf{\mathfrak{out}} _{\overline{E}}:\nu\overline{E}\to \overline{E}(\nu\overline{E})$ ;

  • $\mathcal{L}(\mathbf{\mathfrak{out}} _{\overline{E}} ) e_{\nu\overline{E}} :\mathcal{L}(\nu\overline{E})\to \mathcal{L}(\nu\overline{E})$ has a terminal coalgebra (resp. initial algebra);

  • $\mathcal{L}(f)$ preserves terminal coalgebras (resp. initial algebras) for all morphisms $f\in \mathcal{C}$ .

We use this result to give sufficient conditions for (fibered) $\mu\nu$ -polynomials (including their fibered initial algebras and terminal coalgebras) to exist in $\Sigma_\mathcal{C}\mathcal{L}$ and $\Sigma_\mathcal{C}\mathcal{L}^{op}$ . In particular, we show that it suffices to extend the target language ${\mathbf{LSyn}}:{\mathbf{CSyn}}^{op}\to \mathbf{Cat}$ with both cartesian and linear inductive and coinductive types to perform CHAD on a source language $\mathbf{Syn}$ with inductive and coinductive types. Again, an equivalent of Theorem B holds.

We write $\mathbf{roll}\,{x}$ for the constructor of inductive types (applied to an identifier x), $\mathbf{unroll}\,x$ for the destructor of coinductive types, and ${\tau}.\mathbf{roll}^{-1}\,x\stackrel {\mathrm{def}}= \mathbf{fold}\,x\,\mathbf{with}\,y\to\tau[{}^{y\vdash \mathbf{roll}\,y}\!/\!_{{\alpha}}]$ , where we write $\tau[{}^{y\vdash \mathbf{roll}\,y}\!/\!_{{\alpha}}]$ for the functorial action of the parameterized type $\tau$ with type parameter ${\alpha}$ on the term $\mathbf{roll}\,y$ in context y. This yields the following formulas for the spaces of primals and (co)tangent vectors to (co)inductive types:

\begin{align*}&\overrightarrow{\mathcal{D}}({\alpha})_1\stackrel {\mathrm{def}}= {\alpha} \qquad\qquad & \overrightarrow{\mathcal{D}}({\alpha})_2 = \underline{\alpha}\\&\overrightarrow{\mathcal{D}}(\mu{\alpha}.\tau)_1\stackrel {\mathrm{def}}= \mu{\alpha}.\overrightarrow{\mathcal{D}}(\tau)_1\qquad\qquad&\overrightarrow{\mathcal{D}}(\mu{\alpha}.\tau)_2\stackrel {\mathrm{def}}= \underline{\mu}\underline{\alpha}.\overrightarrow{\mathcal{D}}(\tau)_2[{}^{{\overrightarrow{\mathcal{D}}(\tau)_1}.\mathbf{roll}^{-1}{p}}\!/\!_{{p}}]\\&\overrightarrow{\mathcal{D}}(\nu{\alpha}.\tau)_1\stackrel {\mathrm{def}}= \nu{\alpha}.\overrightarrow{\mathcal{D}}(\tau)_1\qquad\qquad&\overrightarrow{\mathcal{D}}(\nu{\alpha}.\tau)_2\stackrel {\mathrm{def}}= \underline{\nu}\underline{\alpha}.\overrightarrow{\mathcal{D}}(\tau)_2[{}^{\mathbf{unroll}\,{p}}\!/\!_{{p}}]\\&\overleftarrow{\mathcal{D}}({\alpha})_1\stackrel {\mathrm{def}}= {\alpha} \qquad\qquad & \overleftarrow{\mathcal{D}}({\alpha})_2 = \underline{\alpha}\\&\overleftarrow{\mathcal{D}}(\mu{\alpha}.\tau)_1\stackrel {\mathrm{def}}= \mu{\alpha}.\overleftarrow{\mathcal{D}}(\tau)_1\qquad\qquad&\overleftarrow{\mathcal{D}}(\mu{\alpha}.\tau)_2\stackrel {\mathrm{def}}= \underline{\nu}\underline{\alpha}.\overleftarrow{\mathcal{D}}(\tau)_2[{}^{{\overleftarrow{\mathcal{D}}(\tau)_1}.\mathbf{roll}^{-1}{p}}\!/\!_{{p}}]\\&\overleftarrow{\mathcal{D}}(\nu{\alpha}.\tau)_1\stackrel {\mathrm{def}}= \nu{\alpha}.\overleftarrow{\mathcal{D}}(\tau)_1\qquad\qquad&\overleftarrow{\mathcal{D}}(\nu{\alpha}.\tau)_2\stackrel {\mathrm{def}}= \underline{\mu}\underline{\alpha}.\overleftarrow{\mathcal{D}}(\tau)_2[{}^{\mathbf{unroll}\,{p}}\!/\!_{{p}}]\end{align*}

Insight 6. Types of primals to (co)inductive types are (co)inductive types of primals, types of tangents to (co)inductive types are linear (co)inductive types of tangents, and types of cotangents to inductive types are linear coinductive types of cotangents and vice versa.

For example, for a type $\tau=\mu{\alpha}.\left\{Empty\,\mathbf{1}\mid Cons (\sigma\boldsymbol{\mathop{*}} {\alpha})\right\}$ of lists of elements of type $\sigma$ , we have a cotangent space:

\[\overleftarrow{\mathcal{D}}(\tau)_2 = \underline{\nu}\underline{\alpha}.{\mathbf{case}\,{\mathbf{roll}^{-1}\,{{p}}}\,\mathbf{of}\,\{{Empty\,\_\to \underline{\mathbf{1}}\mid Cons\, {p}\to \overleftarrow{\mathcal{D}}(\sigma)_2[{}^{\mathbf{fst}\,{p}}\!/\!_{{p}}]\boldsymbol{\mathop{*}}\underline{\alpha}}\}},\]

where

\[\mathbf{roll}^{-1}\,{{p}}=\mathbf{fold}\,{p}\,\mathbf{with}\,y\to{\mathbf{case}\,y\,\mathbf{of}\,\{{Empty\,y\to Empty\,y\mid Cons \, y \to Cons\langle\mathbf{fst}\, y, \mathbf{roll}\,(\mathbf{snd}\,y)\rangle}\}},\]

and, for a type $\tau=\nu{\alpha}.\sigma\boldsymbol{\mathop{*}} {\alpha}$ of streams, we have a cotangent space:

$$\overleftarrow{\mathcal{D}}(\tau)_2 = \underline{\mu}\underline{\alpha}.\overleftarrow{\mathcal{D}}(\sigma)_2[{}^{\mathbf{fst}\,(\mathbf{unroll}\,{p})}\!/\!_{{p}}]\boldsymbol{\mathop{*}}\underline{\alpha}.$$
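As an informal sanity check of these two cotangent spaces, consider the special case $\sigma={\mathbf{real}}$. The unfolding below is a sketch, not a derivation taken from the paper: it assumes that $\overleftarrow{\mathcal{D}}({\mathbf{real}})_2=\underline{\mathbf{real}}$ and reads $\cong$ as "isomorphic after unrolling the linear (co)inductive type along the given primal".

```latex
\begin{align*}
&\text{lists, at } p=\mathbf{roll}\,(Cons\langle x_1,\mathbf{roll}\,(Cons\langle x_2,\mathbf{roll}\,(Empty\,\langle\rangle)\rangle)\rangle):
&&\overleftarrow{\mathcal{D}}(\tau)_2 \;\cong\; \underline{\mathbf{real}}\boldsymbol{\mathop{*}}(\underline{\mathbf{real}}\boldsymbol{\mathop{*}}\underline{\mathbf{1}})\;\cong\;\underline{\mathbf{real}}^2\\
&\text{streams, at any } p:
&&\overleftarrow{\mathcal{D}}(\tau)_2 \;\cong\; \underline{\mu}\underline{\alpha}.\,\underline{\mathbf{real}}\boldsymbol{\mathop{*}}\underline{\alpha}
\end{align*}
```

In a semantics in real vector spaces, one would therefore expect the cotangent space at a list of length two to be ${\mathbb{R}}^2$, one component per element. For streams, one would expect the initial algebra of $V\mapsto {\mathbb{R}}\times V$, which, when computed as the colimit of the usual initial chain, is the space of finitely supported real sequences.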

We demonstrate that the strictly indexed category $\mathbf{FVect}:\mathbf{Set}^{op}\to \mathbf{Cat}$ of families of vector spaces also satisfies our conditions, so it gives a concrete denotational semantics of the target language ${\mathbf{LSyn}}:{\mathbf{CSyn}}^{op}\to\mathbf{Cat}$ , by Theorem B. To reestablish the correctness Theorem A, existing logical relations techniques do not suffice, as far as we are aware. Instead, we achieve it by developing a novel theory of categorical logical relations (sconing) for languages with expressive type systems like our AD source language.

Insight 7. We can obtain powerful logical relations techniques for reasoning about expressive type systems by analyzing when the forgetful functor from a category of logical relations to the underlying category is comonadic and monadic.

In almost all instances, the forgetful functor from a category of logical relations to the underlying category is comonadic and in many instances, including ours, it is even monadic. This gives us the following logical relations techniques for expressive type systems:

Theorem E (Logical relations for expressive types, Section 11). Let $G:\mathcal{C}\to\mathcal{D}$ be a functor. We observe

  • If $\mathcal{D}$ has binary products, then the forgetful functor from the scone (the comma category) $\mathcal{D}\downarrow G\to \mathcal{D}\times\mathcal{C}$ is comonadic (Theorem 97).

  • If G has a left adjoint and $\mathcal{C} $ has binary coproducts, then $\mathcal{D}\downarrow G\to \mathcal{D}\times\mathcal{C}$ is monadic (Corollary 99).

This is relevant because:

  • comonadic functors create initial algebras (Theorem 109);

  • monadic functors create terminal coalgebras (Theorem 109);

  • monadic–comonadic functors create $\mu\nu$ -polynomials (Corollary 110);

  • if $\mathcal{E}$ is monadic–comonadic over $\mathcal{E}'$ , then $\mathcal{E}$ is finitely complete cartesian closed if $\mathcal{E}'$ is (Proposition 103).

As a consequence, we can lift our concrete denotational semantics of all types, including inductive and coinductive types, to our categories of logical relations over the semantics.

These logical relations techniques are sufficient to yield the correctness Theorem A. Indeed, as long as derivatives of primitive operations are correctly implemented in the sense that $\unicode{x27E6} D\mathrm{op}\unicode{x27E7}=D\unicode{x27E6} \mathrm{op}\unicode{x27E7}$ and $\unicode{x27E6} {D\mathrm{op}}^{t}\unicode{x27E7}={D\unicode{x27E6} \mathrm{op}\unicode{x27E7}}^{t}$ , Theorem E tells us that the unique structure-preserving functors:

\begin{align*}&(\unicode{x27E6} -\unicode{x27E7},\unicode{x27E6} \overrightarrow{\mathcal{D}}(-)\unicode{x27E7}):\mathbf{Syn}\to \mathbf{Set}\times \Sigma_\mathbf{Set} \mathbf{FVect}\\ & (\unicode{x27E6} -\unicode{x27E7},\unicode{x27E6} \overleftarrow{\mathcal{D}}(-)\unicode{x27E7}):\mathbf{Syn}\to \mathbf{Set}\times \Sigma_\mathbf{Set}\mathbf{FVect}^{op}\end{align*}

lift to the scones of $\mathrm{Hom}(({\mathbb{R}}^k,({\mathbb{R}}^k,\underline{\mathbb{R}}^k)),-) :\mathbf{Set}\times \Sigma_\mathbf{Set} \mathbf{FVect}\to\mathbf{Set}$ and $\mathrm{Hom}(({\mathbb{R}}^k, ({\mathbb{R}}^k,\underline{\mathbb{R}}^k)),-) :\mathbf{Set}\times \Sigma_\mathbf{Set} \mathbf{FVect}^{op}\to \mathbf{Set}$ where we lift the image of ${\mathbf{real}}^n$ , respectively, to the logical relations:

\begin{align*}&\left\{(f,(g,h))\mid f=g\text{ and } h = Df\phantom{{}^t}\right\}\hookrightarrow (\mathbf{Set}\times \Sigma_\mathbf{Set} \mathbf{FVect}\phantom{{}^{op}})\left(({\mathbb{R}}^k,({\mathbb{R}}^k,\underline{\mathbb{R}}^k)), ({\mathbb{R}}^n,({\mathbb{R}}^n,\underline{\mathbb{R}}^n))\right)\\&\left\{(f,(g,h))\mid f=g\text{ and } h = {Df}^{t}\right\}\hookrightarrow (\mathbf{Set}\times \Sigma_\mathbf{Set} \mathbf{FVect}^{op})\left(({\mathbb{R}}^k,({\mathbb{R}}^k,\underline{\mathbb{R}}^k)), ({\mathbb{R}}^n,({\mathbb{R}}^n,\underline{\mathbb{R}}^n))\right).\end{align*}

We see that $\unicode{x27E6} \overrightarrow{\mathcal{D}}(t)\unicode{x27E7}$ and $\unicode{x27E6} \overleftarrow{\mathcal{D}}(t)\unicode{x27E7}$ propagate derivatives and transposed derivatives of differentiable k-surfaces (differentiable functions ${\mathbb{R}}^k\to \mathrm{dom}\unicode{x27E6} t\unicode{x27E7}$ ) correctly for all programs t. Seeing that $(\mathrm{id},(\mathrm{id},x\mapsto \mathrm{id}))$ is one such k-surface in the logical relation associated with ${\mathbf{real}}^k$ , we see that $(\unicode{x27E6} t\unicode{x27E7},(\pi_1\circ\unicode{x27E6} \overrightarrow{\mathcal{D}}(t)\unicode{x27E7},\pi_2\circ\unicode{x27E6} \overrightarrow{\mathcal{D}}(t)\unicode{x27E7}))$ and $(\unicode{x27E6} t\unicode{x27E7},(\pi_1\circ \unicode{x27E6} \overleftarrow{\mathcal{D}}(t)\unicode{x27E7},\pi_2\circ \unicode{x27E6} \overleftarrow{\mathcal{D}}(t)\unicode{x27E7}))$ are k-surfaces in the relations as well, for any $x:{\mathbf{real}}^k\vdash t:{\mathbf{real}}^n$ . That is, Theorem A holds.
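The argument rests on the hypothesis that the derivative supplied for each primitive operation agrees with the mathematical derivative of its denotation. A minimal Python sketch of how one might spot-check that hypothesis numerically for a single operation is given below. The operation `op`, its hand-written derivative `d_op`, and the helper names are hypothetical, and a finite-difference comparison at sample points is only a plausibility test, not a proof.

```python
import math


def op(x, y):
    """A hypothetical primitive operation op : real * real -> real."""
    return x * math.sin(y)


def d_op(x, y):
    """Hand-written derivative of op at (x, y), as a linear map on tangent vectors."""
    return lambda dx, dy: math.sin(y) * dx + x * math.cos(y) * dy


def finite_difference(f, point, direction, eps=1e-6):
    """Central finite-difference approximation of a directional derivative."""
    plus = [p + eps * d for p, d in zip(point, direction)]
    minus = [p - eps * d for p, d in zip(point, direction)]
    return (f(*plus) - f(*minus)) / (2 * eps)


def agrees_at(point, direction, tol=1e-6):
    """Check that d_op matches the numerical derivative of op at one sample."""
    expected = finite_difference(op, point, direction)
    actual = d_op(*point)(*direction)
    return abs(expected - actual) < tol
```

For instance, `agrees_at((1.5, 0.3), (1.0, 0.0))` compares the two derivatives along the first coordinate direction.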

Our novel logical relations machinery is in no way restricted to the context of CHAD, however. In fact, it is widely applicable for reasoning about total functional languages with expressive type systems.

2.6 Inductive types and derivatives

So far, we have phrased the CHAD correctness Theorem A only for programs t with domain/codomain isomorphic to some Euclidean space ${\mathbb{R}}^n$ , even if t may make use of any complex types (including variant, inductive, coinductive, and function types) in its computation. The reason for this restriction is that this limited context of functions $f:{\mathbb{R}}^n\to {\mathbb{R}}^m$ is an obvious setting where we have a simple, canonical, unambiguous notion of derivative $\mathcal{T}{f}:{\mathbb{R}}^n\to {\mathbb{R}}^m\times (\underline{\mathbb{R}}^n\multimap \underline{\mathbb{R}}^m)$ , allowing us to phrase an obvious correctness criterion.

More generally, for $f:X\to Y$ where X and Y are manifolds, we also have an unambiguous notion of derivative $\mathcal{T}{f}:\Pi_{x\in X} \Sigma_{y\in Y}\mathcal{T}_{x}X\multimap \mathcal{T}_{y}Y$ , which allows us to strengthen our correctness result. In fact, for our purposes, it suffices to consider the relatively simple context of differentiable functions $f:\coprod\limits_{i\in I}{\mathbb{R}}^{n_i}\to \coprod\limits_{j\in J}{\mathbb{R}}^{m_j}$ between very simple manifolds that arise as disjoint unions of (finite-dimensional) Euclidean spaces. Such functions f decompose uniquely as copairings $f=[\iota_{\phi(i)}\circ g_i]_{i\in I}$ where we write $\iota_k$ for the k-th coprojection and where $\phi:I\to J$ is some function and $g_i:{\mathbb{R}}^{n_i}\to {\mathbb{R}}^{m_{\phi(i)}}$ . That is, f can be understood as the family $(g_i)_{i\in I}$ and its derivative $\mathcal{T}{f}$ decomposes uniquely as the family of plain derivatives $\mathcal{T}{g_i}$ in the usual sense. We have a similar decomposition for the transposed derivatives $\mathcal{T}^*f$ .
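The decomposition of such a function into a family of components can be sketched in Python as follows. The encoding is an assumption of this sketch: a point of a disjoint union is an `(index, vector)` pair, and f is stored as a dictionary mapping each index i to the triple of $\phi(i)$, $g_i$, and a hand-written derivative of $g_i$. The index sets and component functions are hypothetical examples.

```python
# A point of a disjoint union of Euclidean spaces is a pair (index, vector).
# A function f = [iota_{phi(i)} . g_i]_i is stored as: i -> (phi(i), g_i, Dg_i).

def apply_family(family, point):
    """Evaluate f at the point (i, x): apply g_i and tag the result with phi(i)."""
    i, x = point
    j, g, _dg = family[i]
    return (j, g(x))


def derivative(family, point, tangent):
    """Push a tangent vector at (i, x) forward along the component g_i."""
    i, x = point
    _j, _g, dg = family[i]
    return dg(x)(tangent)


# Hypothetical example: I = {"a", "b"}, J = {"c"}, n_a = 1, n_b = 2, m_c = 1.
family = {
    "a": ("c", lambda x: (x[0] ** 2,),
          lambda x: lambda v: (2 * x[0] * v[0],)),
    "b": ("c", lambda x: (x[0] * x[1],),
          lambda x: lambda v: (x[1] * v[0] + x[0] * v[1],)),
}
```

The derivative at a point only ever consults the component selected by the point's index, which is why it suffices to differentiate each $g_i$ separately.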

This notion of derivatives of functions between disjoint unions of Euclidean spaces is relevant to our context, as we have the following result.

Theorem F (Canonical form of $\mu$ -polynomial semantics, Corollary 27). For any type $\tau$ built from Euclidean spaces ${\mathbf{real}}^n$ , tuple types $\tau_1\boldsymbol{\mathop{*}}\tau_2$ , variant types $\left\{\ell_1\tau_1\mid\cdots\mid \ell_n\tau_n\right\}$ , type variables ${\alpha}$ , and inductive types $\mu{\alpha}.\tau'$ (a so-called $\mu$ -polynomial), the denotation $\unicode{x27E6} \tau\unicode{x27E7}$ is isomorphic to a manifold of the form $\coprod\limits_{i\in I}{\mathbb{R}}^{n_i}$ for some countable set I and some $n_i\in \mathbb{N}$ .

Consequently, we can strengthen Theorem A in the following form:

Theorem G (Correctness of CHAD (Generalized), Theorem 129). For any well-typed program

$$x_1:\tau_1,\ldots,x_k:\tau_k\vdash {t}:\sigma,$$

where $\tau_i,\sigma$ are all (closed) $\mu$ -polynomials, we have that $\unicode{x27E6} \overrightarrow{\mathcal{D}}_{x_1,\ldots,x_k}(t)\unicode{x27E7}=\mathcal{T}{\unicode{x27E6} t\unicode{x27E7}}\;\text{ and }\;\unicode{x27E6} \overleftarrow{\mathcal{D}}_{x_1,\ldots,x_k}(t)\unicode{x27E7}=\mathcal{T}^*{\unicode{x27E6} t\unicode{x27E7}}.$

Again, t can make use of coinductive types and function types in the middle of its computation, but they may not occur in the input or output types. The reason is that, as far as we are aware, there is no canonical notion of semantic derivative for functions between the sort of infinite-dimensional spaces that co-datatypes such as coinductive types and function types implement. This makes it challenging to even phrase what semantic correctness at such types would mean.

2.7 How does CHAD for expressive types work in practice?

The CHAD code transformations we describe in this paper are well behaved in practical implementations in the sense of the following compile-time complexity result.

Theorem H (No code blowup, Corollary 130). The size of the code of the CHAD transformed programs $\overrightarrow{\mathcal{D}}_{\overline{\Gamma}}(t)$ and $\overleftarrow{\mathcal{D}}_{{\overline{\Gamma}}}(t)$ grows linearly with the size of the original source program t.

We have taken care to pair up the primal and (co)tangent computations in our CHAD transformation and to share common subcomputations wherever possible, using $\mathbf{let}$ -bindings. However, we leave a formal study of the runtime complexity of our technique to future work.
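The idea of pairing a primal computation with its (co)tangent computation while sharing intermediate results can be illustrated with a small hand-written Python sketch. It is not output of the CHAD transformation: the source function `f` and the names `forward_f` and `reverse_f` are hypothetical, and the derivatives were derived by hand for this one example.

```python
import math


def f(x, y):
    """Hypothetical source program: f(x, y) = sin(x) * y."""
    return math.sin(x) * y


def forward_f(x, y):
    """Forward mode: the primal paired with a linear map on tangents."""
    s = math.sin(x)  # bound once, used by both the primal and the tangent map
    c = math.cos(x)
    primal = s * y
    tangent = lambda dx, dy: c * y * dx + s * dy
    return primal, tangent


def reverse_f(x, y):
    """Reverse mode: the primal paired with the transposed linear map on cotangents."""
    s = math.sin(x)  # bound once, used by both the primal and the cotangent map
    c = math.cos(x)
    primal = s * y
    cotangent = lambda dz: (c * y * dz, s * dz)
    return primal, cotangent
```

Binding `s` and `c` once, in the style of a $\mathbf{let}$-binding, avoids recomputing `sin` and `cos` when the returned linear map is applied.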

As formulated in this paper, CHAD generates code with linear dependent types. This seems very hard to implement in practice. However, this is an illusion: we can use the code generated by CHAD and interpret it as less precise types. We sketch how all type dependency can be erased and how all linear types other than the linear (co)inductive types can be implemented as abstract types in a standard functional language like Haskell. In fact, we describe three practical implementation strategies for our treatment of sum types, none of which require linear or dependent types. All three strategies have been shown to work in the CHAD reference implementation. We suggest how linear (co)inductive types might be implemented in practice, based on their concrete denotational semantics, but leave the actual implementation to future work.

3. Background: Categorical Semantics of Expressive Total Languages

In this section, we fix some notation and recall the well-known abstract categorical semantics of total functional languages with expressive type systems (Crole 1993; Pitts 1995; Santocanale 2002), which builds on the usual semantics of the simply typed $\lambda$ -calculus in Cartesian closed categories (Lambek and Scott 1988). In this paper, we will be interested in a few particular instantiations (or models) of such an abstract categorical semantics $\mathcal{C}$ :

  • the initial model $\mathbf{Syn}$ (Section 5), which represents the programming language under consideration, up to $\beta\eta$ -equivalence; this will be the source language of our AD code transformation;

  • the concrete denotational model $\mathbf{Set}$ (Section 9) in terms of sets and functions, which represents our default denotational semantics of the source language;

  • models $\Sigma_{\mathcal{C}}\mathcal{L}$ and $\Sigma_{\mathcal{C}}\mathcal{L}^{op}$ (Section 6) in the $\Sigma$ -types of suitable indexed categories $\mathcal{L}:\mathcal{C}^{op}\to\mathbf{Cat}$ ;

  • in particular, the models $\Sigma_{{\mathbf{CSyn}}}{\mathbf{LSyn}}$ and $\Sigma_{{\mathbf{CSyn}}}{\mathbf{LSyn}}^{op}$ (Section 7) built out of the target language, which yield forward and reverse-mode CHAD code transformations, respectively;

  • sconing (categorical logical relations) constructions $\overrightarrow{\mathbf{Scone}} $ and $\overleftarrow{\mathbf{Scone}} $ (Section 11) over the models $\mathbf{Set}\times \Sigma_{\mathbf{Set}}\mathbf{FVect}$ and $\mathbf{Set}\times \Sigma_{\mathbf{Set}}\mathbf{FVect}^{op}$ that yield the correctness arguments for forward- and reverse-mode CHAD, respectively, where $\mathbf{FVect}:\mathbf{Set}^{op}\to \mathbf{Cat}$ is the strictly indexed category of families of real vector spaces.

We deem it relevant to discuss the abstract categorical semantic framework for our language as we need these various instantiations of the framework.

3.1 Basics

We use standard definitions from category theory; see, for instance, Mac Lane (1971), Leinster (2014). A category $\mathcal{C}$ can be seen as a semantics for a typed functional programming language, whose types correspond to objects of $\mathcal{C}$ and whose programs that take an input of type A and produce an output of type B are represented by the homset $\mathcal{C}(A,B)$ . Identity morphisms $\mathrm{id}_{A}$ represent programs that simply return their input (of type A) unchanged as output and composition $g\circ f$ of morphisms f and g represents running the program g after the program f. Notably, the equations that hold between morphisms represent program equivalences that hold for the particular notion of semantics that $\mathcal{C}$ represents. Some of these program equivalences are so fundamental that we demand them as structural equalities that need to hold in any categorical model (such as the associativity law $f\circ (g \circ h)=(f\circ g)\circ h$ ). In programming language terms, these are known as the $\beta$ - and $\eta$ -equivalences of programs.

3.2 Tuple types

Tuple types represent a mechanism for letting programs take more than one input or produce more than one output. Categorically, a tuple type corresponds to a product $\prod_{i\in I}A_i$ of a finite family of types $\left\{A_i\right\}_{i\in I}$ , which we also write $\mathbb{1}$ or $A_1\times A_2$ in the case of nullary and binary products. For basic aspects of products, we refer the reader to Mac Lane (1971, Chapter III).

We write $\left({f_i}\right)_{i\in I}:C\to \prod_{i\in I}A_i$ for the product pairing of $\left\{f_i:C\to A_i\right\}_{i\in I}$ and $\pi_{j}:\prod_{i\in I}A_i\to A_j$ for the j-th product projection, for $j\in I$ . As such, we say that a categorical semantics $\mathcal{C}$ models (finite) tuples if $\mathcal{C}$ has (chosen) finite products.
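In the model of sets and functions, pairing and projection can be sketched in a few lines of Python. The function names are choices made for this sketch, and the projections are 0-indexed, unlike the notation in the text.

```python
def pairing(*fs):
    """Product pairing (f_i)_i : C -> prod_i A_i of morphisms f_i : C -> A_i."""
    return lambda c: tuple(f(c) for f in fs)


def proj(j):
    """The j-th product projection pi_j (0-indexed in this sketch)."""
    return lambda t: t[j]


def compose(g, f):
    """Composition g . f: run f first, then g."""
    return lambda x: g(f(x))


double = lambda x: 2 * x
square = lambda x: x * x
both = pairing(double, square)
```

Composing a projection with a pairing recovers the corresponding component, for example `compose(proj(1), both)` behaves like `square`, and the empty pairing `pairing()` is the unique map into the nullary product.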

3.3 Primitive types and operations

We are interested in programming languages that have support for a certain set $\textrm{Ty}$ of ground types such as integers and (floating point) real numbers as well as certain sets $\mathsf{Op}(T_1,\ldots, T_n; S)$ , for $T_1,\ldots,T_n,S\in\textrm{Ty}$ , of operations on these basic types such as addition, multiplication, and sine functions. We model such primitive types and operations categorically by demanding that our category has a distinguished object $C_T$ for each $T\in \textrm{Ty}$ to represent the primitive types and a distinguished morphism $f_{\mathrm{op}}\in \mathcal{C}(C_{T_1}\times \ldots\times C_{T_n}, C_S)$ for all primitive operations $\mathrm{op}\in \mathsf{Op}(T_1,\ldots, T_n; S)$ . For basic aspects of categorical type theory, see, for instance, Crole (1993, Chapters 3 and 4).

3.4 Function types

Function types let us type popular higher-order programming idioms such as maps and folds, which capture common control flow abstractions. Categorically, a type of functions from A to B is modeled as an exponential $A\Rightarrow B$ . We write $\mathrm{ev}:(A\Rightarrow B)\times A\to B$ (evaluation) for the counit of the adjunction $(-)\times A\dashv A\Rightarrow(-)$ and $\Lambda$ for the Currying natural isomorphism $\mathcal{C}(A\times B, C)\to \mathcal{C}(A,B\Rightarrow C)$ . We say that a categorical semantics $\mathcal{C}$ with tuple types models function types if $\mathcal{C}$ has a chosen right adjoint $(-)\times A\dashv A\Rightarrow(-)$ .
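In the model of sets and functions, the Currying isomorphism and evaluation can be sketched in Python as follows. The names `curry`, `uncurry`, and `ev` are choices made for this sketch, with pairs encoded as Python tuples.

```python
def curry(f):
    """Lambda: turn f : A x B -> C into a map A -> (B => C)."""
    return lambda a: lambda b: f((a, b))


def ev(pair):
    """Evaluation ev : (A => B) x A -> B."""
    g, a = pair
    return g(a)


def uncurry(g):
    """Inverse of curry, built from ev: (a, b) |-> ev(g(a), b)."""
    return lambda ab: ev((g(ab[0]), ab[1]))


add = lambda ab: ab[0] + ab[1]
```

Here `uncurry(curry(add))` agrees with `add` on every pair, which is one half of the statement that $\Lambda$ is an isomorphism.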

3.5 Sum types (aka variant types)

Sum types (aka variant types) let us model data that exists in multiple different variants and branch in our code on these different possibilities. Categorically, a sum type is modeled as a coproduct $\coprod_{i\in I}A_i$ of a finite family $\left\{A_i\right\}_{i\in I}$ of types, which we also write ${\mathbb{0}}$ or $A_1\sqcup A_2$ in the case of nullary and binary coproducts. We write $\left[{f_i}\right]_{i\in I}:\coprod_{i\in I}C_i\to A$ for the copairing of $\left\{f_i:C_i\to A\right\}_{i\in I}$ and $\iota_{j}:A_j\to \coprod_{i\in I}A_i$ for the j-th coprojection. In fact, in presence of tuple types, a more useful programming interface is obtained if one restricts to distributive coproducts, that is, coproducts $\coprod_{i\in I}A_i$ such that the map $\left[{\left({\iota_{i}\circ\pi_{1}},{\pi_{2}}\right)}\right]_{i\in I}:\coprod_{i\in I}(A_i\times B)\to (\coprod_{i\in I}A_i)\times B$ is an isomorphism; see, for instance, Carboni et al. (1993), Lack (2012). Note that in presence of function types, coproducts are automatically distributive since the left adjoint functors $(-)\times A$ preserve colimits; see, for instance, Leinster (2014, 6.3). As such, we say that a categorical semantics $\mathcal{C}$ models (finite) sum types if $\mathcal{C}$ has (chosen) finite distributive coproducts.
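In the model of sets and functions, coprojections, copairing, and the distributivity isomorphism can be sketched in Python. The tagged-tuple encoding and all function names are assumptions of this sketch.

```python
def inj(j):
    """The j-th coprojection iota_j: tag a value with its label j."""
    return lambda a: (j, a)


def copairing(fs):
    """Copairing [f_i]_i: dispatch on the label; fs maps labels to functions."""
    return lambda tagged: fs[tagged[0]](tagged[1])


def canonical(tagged):
    """The canonical map coprod_i (A_i x B) -> (coprod_i A_i) x B."""
    j, (a, b) = tagged
    return ((j, a), b)


def distribute(pair):
    """Its inverse, (coprod_i A_i) x B -> coprod_i (A_i x B)."""
    (j, a), b = pair
    return (j, (a, b))


describe = copairing({"left": lambda n: n + 1, "right": lambda s: len(s)})
```

In this model `distribute` and `canonical` are mutually inverse, which is what distributivity of the coproduct asks for.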

3.6 Inductive and coinductive types

We employ the usual semantic interpretation of inductive and coinductive types as, respectively, initial algebras and terminal coalgebras of a certain class of functors. We refer the reader, for instance, to Barr and Wells (2005, Chapter 9), Santocanale (2002), and Adamek et al. (2010).

Most of this section is dedicated to describing precisely the class of functors for which we consider initial algebras and terminal coalgebras, a class we call $\mu\nu$ -polynomials. Roughly speaking, we define $\mu\nu$ -polynomials to be functors that can be constructed from products, coproducts, projections, diagonals, constants, initial algebras, and terminal coalgebras.

To fix terminology and for future reference of the detailed constructions, we recall below basic aspects of parameterized initial algebras and parameterized terminal coalgebras.

Definition 1. (The category of E-algebras). Let $E : \mathcal{D}\to \mathcal{D}$ be an endofunctor. The category of E-algebras, denoted by $E\textrm{-}\mathrm{Alg}$ , is defined as follows. The objects are pairs $(W, \zeta ) $ in which $W\in \mathcal{D} $ and $ \zeta : E(W)\to W $ is a morphism of $\mathcal{D} $ . A morphism between E-algebras $(W, \zeta ) $ and $(Y, \xi) $ is a morphism $g: W\to Y $ of $\mathcal{D} $ such that the evident square from $E(W)$ to $Y$ commutes, that is,

(1) \begin{equation} g\circ \zeta = \xi\circ E(g).\end{equation}

Dually, we define the category $E\textrm{-}\mathrm{CoAlg}$ of E-coalgebras by:

(2) \begin{equation} E\textrm{-}\mathrm{CoAlg} := \left(E^{\mathrm{op}}\textrm{-}\mathrm{Alg}\right) ^{\mathrm{op}}\end{equation}

in which $E^{\mathrm{op}} : \mathcal{D} ^{\mathrm{op}}\to \mathcal{D} ^{\mathrm{op}} $ is the image of E by $\mathrm{op} :\mathbf{Cat}\to \mathbf{Cat} $ .

Definition 2. (Initial algebra and terminal coalgebra). Let $E : \mathcal{D}\to \mathcal{D}$ be an endofunctor. Provided that they exist, the initial object $(\mu E, \mathbf{\mathfrak{in}} _E ) $ of $E\textrm{-}\mathrm{Alg}$ and the terminal object $(\nu E, \mathbf{\mathfrak{out}} _E ) $ of $E\textrm{-}\mathrm{CoAlg} $ are, respectively, referred to as the initial E-algebra and the terminal E-coalgebra.

Remark 3. By Lambek’s Theorem, provided that the initial algebra $(\mu E, \mathbf{\mathfrak{in}} _E ) $ of an endofunctor E exists, we have that $\mathbf{\mathfrak{in}} _E$ is invertible. Dually, we get the result for terminal coalgebras.

Assuming the existence of the initial E-algebra and the terminal E-coalgebra, we denote by:

(3) \begin{equation} \mathrm{fold}_E (Y, \xi): \mu E \to Y, \quad \mathrm{unfold}_E (X, \varrho ): X\to \nu E\end{equation}

the unique morphisms in $\mathcal{D} $ such that

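(4) \begin{equation} \begin{array}{ccc} E(\mu E) & \xrightarrow{\ E\left(\mathrm{fold}_E (Y, \xi)\right)\ } & E(Y) \\ {\scriptstyle \mathbf{\mathfrak{in}} _E}\big\downarrow & & \big\downarrow{\scriptstyle \xi} \\ \mu E & \xrightarrow{\ \mathrm{fold}_E (Y, \xi)\ } & Y \end{array} \qquad\qquad \begin{array}{ccc} X & \xrightarrow{\ \mathrm{unfold}_E (X, \varrho )\ } & \nu E \\ {\scriptstyle \varrho}\big\downarrow & & \big\downarrow{\scriptstyle \mathbf{\mathfrak{out}} _E} \\ E(X) & \xrightarrow{\ E\left(\mathrm{unfold}_E (X, \varrho )\right)\ } & E(\nu E) \end{array}\end{equation}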

commute. Whenever it is clear from the context, we denote $ \mathrm{fold}_E (Y, \xi) $ by $\mathrm{fold}_E \xi $ , and $\mathrm{unfold}_E (X, \varrho )$ by $\mathrm{unfold}_E \varrho $ .
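As an informal illustration, which is not part of the formal development, the following Python sketch instantiates these definitions for the endofunctor $E(W) = \mathbb{1}\sqcup (A\times W)$ on sets. Its initial algebra is the set of finite lists over $A$, and its terminal coalgebra consists of finite and infinite lists. The encoding of $E(W)$ as `None` or a pair, and all names (`fmap_E`, `In`, `Nu`, `fold_E`, `unfold_E`, `take`), are assumptions made for this example only.

```python
# Sketch for E(W) = 1 + A x W: an element of E(W) is None or a pair (a, w).

def fmap_E(g):
    """Action of E on a morphism g : W -> Y."""
    return lambda ew: None if ew is None else (ew[0], g(ew[1]))


class In:
    """Element of mu E; the constructor plays the role of in_E : E(mu E) -> mu E."""
    def __init__(self, unwrap):
        self.unwrap = unwrap  # inverse of in_E, which exists by Lambek's theorem


def fold_E(xi):
    """fold_E(xi) : mu E -> Y for an E-algebra xi : E(Y) -> Y."""
    def go(t):
        # Defining square: go . in_E = xi . E(go)
        return xi(fmap_E(go)(t.unwrap))
    return go


class Nu:
    """Element of nu E, stored lazily as a coalgebra together with a seed."""
    def __init__(self, rho, seed):
        self.rho, self.seed = rho, seed

    def out(self):
        """out_E : nu E -> E(nu E), computed one layer at a time."""
        return fmap_E(lambda s: Nu(self.rho, s))(self.rho(self.seed))


def unfold_E(rho):
    """unfold_E(rho) : X -> nu E for an E-coalgebra rho : X -> E(X)."""
    return lambda x: Nu(rho, x)


def take(n, s):
    """Observe at most n elements of an element of nu E."""
    acc = []
    while n > 0:
        layer = s.out()
        if layer is None:
            break
        acc.append(layer[0])
        s, n = layer[1], n - 1
    return acc


# An algebra that sums a list, and coalgebras for a countdown and an infinite stream.
total = lambda ew: 0 if ew is None else ew[0] + ew[1]
countdown = lambda n: None if n == 0 else (n, n - 1)
naturals = lambda n: (n, n + 1)
```

In this sketch, `fold_E(total)` sums a finite list, while `unfold_E(naturals)(0)` is an infinite stream that can only be inspected one layer at a time through `out`.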

Given a functor $ H : \mathcal{D} '\times\mathcal{D} \to \mathcal{D} $ and an object X of $\mathcal{D} ' $ , we denote by $H^X $ the endofunctor:

(5) \begin{equation} H(X, -): \mathcal{D} \to \mathcal{D} .\end{equation}

In this setting, if $\mu H^X$ exists for any object $X\in\mathcal{D}'$ then the universal properties of the initial algebras induce a functor denoted by $\mu H : \mathcal{D}' \to \mathcal{D}$ , called the parameterized initial algebra. In the following, we spell out how to construct parameterized initial algebras and terminal coalgebras.

Proposition 4 ( $\mu$ -operator and $\nu$ -operator). Let $ H:\mathcal{D} '\times \mathcal{D} \to\mathcal{D} $ be a functor. Assume that, for each object $X\in\mathcal{D} '$ , the functor $H^X = H(X,-) $ is such that $\mu H ^X $ exists. In this setting, we have the induced functor:

\begin{eqnarray*} \mu H : \mathcal{D} ' & \to & \mathcal{D}\\ X & \mapsto & \mu H^X\\ \left( f: X\to Y \right) & \mapsto & \mathrm{fold} _{H^X} \left( \mathbf{\mathfrak{in}}_{H^Y}\circ H(f, \mu H ^Y)\right). \end{eqnarray*}

Dually, assuming that, for each object $X\in\mathcal{D} '$ , $\nu H ^X $ exists, we have the induced functor:

\begin{eqnarray*} \nu H : \mathcal{D} ' & \to & \mathcal{D}\\ X & \mapsto & \nu H^X\\ \left( f: X\to Y \right) & \mapsto & \mathrm{unfold} _{H^Y} \left( H(f, \nu H ^X)\circ \mathbf{\mathfrak{out}}_{H^X}\right). \end{eqnarray*}

Proof. We assume that the functor $ H:\mathcal{D} '\times \mathcal{D} \to\mathcal{D} $ is such that, for any object $X\in\mathcal{D} ' $ , $\mu H ^X $ exists. For each morphism $f: X\to Y $ , we define $\mu H (f) = \mathrm{fold} _{H^X} \left( \mathbf{\mathfrak{in}}_{H^Y}\circ H(f, \mu H ^Y)\right) $ as above. We prove below that this makes $\mu H $ a functor.

Given $X\in\mathcal{D} '$ ,

\begin{align*} & \mu H( \mathrm{id} _ X ) \\ & = \mathrm{fold} _{H^X} \left( \mathbf{\mathfrak{in}}_{H^X}\circ H( \mathrm{id} _ X , \mu H ^X)\right) \\ & = \mathrm{fold} _{H^X} \left( \mathbf{\mathfrak{in}}_{H^X} \right) \\ & = \mathrm{id} _{\mu H ^X} . \end{align*}

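Moreover, given morphisms $f: X\to Y $ and $g: Y\to Z $ in $\mathcal{D} '$ , we have that

\begin{align*} & \mu H(g)\circ \mathbf{\mathfrak{in}}_{H^Y}\circ H(f, \mu H ^Y) \\ & = \mathbf{\mathfrak{in}}_{H^Z}\circ H(g, \mu H ^Z)\circ H^Y\left( \mu H(g)\right) \circ H(f, \mu H ^Y) \\ & = \mathbf{\mathfrak{in}}_{H^Z}\circ H(g, \mu H ^Z)\circ H(f, \mu H ^Z)\circ H^X\left( \mu H(g)\right) \\ & = \mathbf{\mathfrak{in}}_{H^Z}\circ H(g\circ f, \mu H ^Z)\circ H^X\left( \mu H(g)\right) \end{align*}

and, hence, the diagram:

\begin{equation*} \begin{array}{ccccc} H^X(\mu H^X) & \xrightarrow{\ H^X\left(\mu H(f)\right)\ } & H^X(\mu H^Y) & \xrightarrow{\ H^X\left(\mu H(g)\right)\ } & H^X(\mu H^Z) \\ {\scriptstyle \mathbf{\mathfrak{in}}_{H^X}}\big\downarrow & & \big\downarrow{\scriptstyle \mathbf{\mathfrak{in}}_{H^Y}\circ H(f, \mu H ^Y)} & & \big\downarrow{\scriptstyle \mathbf{\mathfrak{in}}_{H^Z}\circ H(g\circ f, \mu H ^Z)} \\ \mu H^X & \xrightarrow{\ \mu H(f)\ } & \mu H^Y & \xrightarrow{\ \mu H(g)\ } & \mu H^Z \end{array}\end{equation*}

commutes. By the universal property of the initial algebra $\left( \mu H^X, \mathbf{\mathfrak{in}} _{H ^X}\right) $ , we conclude that

\begin{equation*} \mu H(g)\circ \mu H(f) = \mathrm{fold} _{H^X} \left( \mathbf{\mathfrak{in}}_{H^Z}\circ H(g\circ f, \mu H ^Z)\right) = \mu H(g\circ f). \end{equation*}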

It is worth noting that in Proposition 4, $\mathcal{D}'$ can be any category. However, in the standard setting of initial algebra semantics, there is a special interest in the case where $\mathcal{D}' = \mathcal{D}^{n-1}$ and $n>1$ , which is described below.

Proposition 5 (Parameterized initial algebras and terminal coalgebras). Let $ H:\mathcal{D} ^n\to\mathcal{D} $ be a functor in which $n>1 $ . Assume that, for each object $X\in\mathcal{D} ^{n-1}$ , $\mu H ^X $ exists. In this setting, we have the induced functor:

\begin{eqnarray*} \mu H : \mathcal{D} ^{n-1} & \to & \mathcal{D}\\ X & \mapsto & \mu H^X\\ \left( f: X\to Y \right) & \mapsto & \mathrm{fold} _{H^X} \left( \mathbf{\mathfrak{in}}_{H^Y}\circ H(f, \mu H ^Y)\right). \end{eqnarray*}

Dually, if $\nu H ^X $ exists for any $X\in\mathcal{D} ^{n-1}$ , we have the induced functor:

\begin{eqnarray*} \nu H : \mathcal{D}^{n-1} & \to & \mathcal{D}\\ X & \mapsto & \nu H^X\\ \left( f: X\to Y \right) & \mapsto & \mathrm{unfold} _{H^Y} \left( H(f, \nu H ^X)\circ \mathbf{\mathfrak{out}}_{H^X}\right). \end{eqnarray*}
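For a concrete instance with $n=2$, take $H(X,W)=\mathbb{1}\sqcup(X\times W)$ on sets, so that $\mu H^X$ is the set of finite lists over $X$. The Python sketch below implements the functorial action $\mu H(f)=\mathrm{fold}_{H^X}\left(\mathbf{\mathfrak{in}}_{H^Y}\circ H(f,\mu H^Y)\right)$ and recovers the familiar `map` operation on lists. The encoding and the names `H`, `In`, `fold`, `mu_H`, `from_list`, and `to_list` are illustrative assumptions, not notation from the paper.

```python
# Sketch for H(X, W) = 1 + X x W: an element of H(X, W) is None or a pair (x, w).

def H(f, g):
    """Action of the bifunctor H on a pair of morphisms (f, g)."""
    return lambda hw: None if hw is None else (f(hw[0]), g(hw[1]))


identity = lambda v: v


class In:
    """Element of mu H^X; the constructor plays the role of in_{H^X}."""
    def __init__(self, unwrap):
        self.unwrap = unwrap


def fold(xi):
    """fold_{H^X}(xi) : mu H^X -> Y for an H^X-algebra xi : H(X, Y) -> Y."""
    def go(t):
        return xi(H(identity, go)(t.unwrap))
    return go


def mu_H(f):
    """mu H (f) = fold_{H^X}(in_{H^Y} . H(f, mu H^Y)) : mu H^X -> mu H^Y."""
    return fold(lambda hw: In(H(f, identity)(hw)))


def from_list(xs):
    """Encode a Python list as an element of mu H^X."""
    t = In(None)
    for x in reversed(xs):
        t = In((x, t))
    return t


def to_list(t):
    """Decode an element of mu H^X; the decoder is itself a fold."""
    return fold(lambda hw: [] if hw is None else [hw[0]] + hw[1])(t)
```

On sample inputs one can check the two functor laws, for example that `mu_H(identity)` leaves a list unchanged and that `mu_H(g)` after `mu_H(f)` agrees with `mu_H` of the composite.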

In order to model inductive and coinductive types coming from parameterized types not involving function types, we introduce the following notions.

Definition 6. ( $\mu\nu$ -polynomials). Assuming that $\mathcal{D} $ has finite coproducts and finite products, the category $\mu\nu\mathsf{Poly} _ \mathcal{D} $ is the smallest subcategory of $\mathbf{Cat} $ satisfying the following.

(O). The objects are defined inductively by:

  • (O1) the terminal category $\mathbb{1} $ is an object of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ ;

  • (O2) the category $\mathcal{D} $ is an object of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ ;

  • (O3) for any pair of objects $\left( \mathcal{D} ', \mathcal{D} '' \right) \in \mu\nu\mathsf{Poly} _ \mathcal{D}\times \mu\nu\mathsf{Poly} _ \mathcal{D} $ , the product $\mathcal{D} '\times \mathcal{D} '' $ is an object of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ .

(M) The morphisms satisfy the following properties:

  • (M1) for any object $\mathcal{D} '$ of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ , the unique functor $\mathcal{D} '\to \mathbb{1} $ is a morphism of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ ;

  • (M2) for any object $\mathcal{D} '$ of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ , all the functors $\mathbb{1} \to \mathcal{D} ' $ are morphisms of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ ;

  • (M3) the binary product $\times : \mathcal{D} \times\mathcal{D} \to \mathcal{D} $ is a morphism of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ ;

  • (M4) the binary coproduct $\sqcup : \mathcal{D}\times \mathcal{D} \to \mathcal{D} $ is a morphism of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ ;

  • (M5) for any pair of objects $\left( \mathcal{D} ', \mathcal{D} '' \right) \in \mu\nu\mathsf{Poly} _ \mathcal{D}\times \mu\nu\mathsf{Poly} _ \mathcal{D} $ , the projections:

    \begin{equation*} \pi _1 : \mathcal{D} '\times \mathcal{D} '' \to \mathcal{D} ',\qquad \pi _2 : \mathcal{D} '\times \mathcal{D} '' \to \mathcal{D} '' \end{equation*}
    are morphisms of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ ;
  • (M6) given objects $ \mathcal{D} ', \mathcal{D} '' , \mathcal{D} '''$ of $\mu\nu\mathsf{Poly} _\mathcal{D} $ , if $E: \mathcal{D} ' \to \mathcal{D} '' $ and $J : \mathcal{D} ' \to \mathcal{D} ''' $ are morphisms of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ , then so is the induced functor $(E,J) :\mathcal{D} ' \to \mathcal{D} '' \times \mathcal{D} ''' $ ;

  • (M7) if $\mathcal{D} '$ is an object of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ , $H: \mathcal{D} '\times \mathcal{D} \to\mathcal{D} $ is a morphism of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ and $\mu H : \mathcal{D} ' \to \mathcal{D} $ exists, then $\mu H $ is a morphism of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ ;

  • (M8) if $\mathcal{D} '$ is an object of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ , $H: \mathcal{D} '\times \mathcal{D} \to\mathcal{D} $ is a morphism of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ and $\nu H : \mathcal{D} ' \to \mathcal{D} $ exists, then $\nu H $ is a morphism of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ .

We say that $\mathcal{D} $ has $\mu\nu$ -polynomials if $\mathcal{D} $ has finite coproducts and products and, for any endofunctor $ E: \mathcal{D} \to\mathcal{D} $ in $ \mu\nu\mathsf{Poly} _ \mathcal{D} $ , $\mu E $ and $\nu E $ exist. We say that $\mathcal{D} $ has chosen $\mu\nu$ -polynomials if we have additionally made a choice of initial algebras and terminal coalgebras for all $\mu\nu$ -polynomials.
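To illustrate Definition 6, the following worked example shows how the functor that defines lists arises as a $\mu\nu$ -polynomial. The names $K$ and $H$, and the notation $!$ for the unique functor to $\mathbb{1}$ and $\mathbb{1}_{\mathcal{D}} : \mathbb{1}\to\mathcal{D}$ for the functor selecting the terminal object of $\mathcal{D}$, are introduced here only for the purpose of this sketch.

\begin{align*} K &:= \left( \mathcal{D}\times\mathcal{D}\xrightarrow{\ !\ }\mathbb{1}\xrightarrow{\ \mathbb{1}_{\mathcal{D}}\ }\mathcal{D}\right) , && \text{a morphism by (M1) and (M2)},\\ H &:= \sqcup\circ (K, \times ) : \mathcal{D}\times\mathcal{D}\to\mathcal{D}, \quad H(X,W) = \mathbb{1}\sqcup (X\times W), && \text{a morphism by (M3), (M4), and (M6)},\\ \mu H &: \mathcal{D}\to\mathcal{D}, \quad X\mapsto \mu \left( W\mapsto \mathbb{1}\sqcup (X\times W)\right) , && \text{a morphism by (M7), if it exists},\\ \nu H &: \mathcal{D}\to\mathcal{D}, \quad X\mapsto \nu \left( W\mapsto \mathbb{1}\sqcup (X\times W)\right) , && \text{a morphism by (M8), if it exists}. \end{align*}

In the category of sets, $\mu H (X)$ is the set of finite lists of elements of $X$, and $\nu H (X)$ is the set of finite and infinite lists of elements of $X$.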

Remark 7 (Self-duality). A category $\mathcal{D} $ has $\mu\nu$ -polynomials if and only if $\mathcal{D}^{\mathrm{op}} $ has $\mu\nu$ -polynomials as well.

Another suitably equivalent way of defining $\mu\nu\mathsf{Poly} _ \mathcal{D}$ is the following. The category $\mu\nu\mathsf{Poly} _ \mathcal{D}$ is the smallest subcategory of $\mathbf{Cat} $ such that:

  • the inclusion $\mu\nu\mathsf{Poly} _ \mathcal{D}\to\mathbf{Cat} $ creates finite products;

  • $\mathcal{D}$ is an object of the subcategory $\mu\nu\mathsf{Poly} _ \mathcal{D}$ ;

  • for any object $\mathcal{D} '$ of $\mu\nu\mathsf{Poly} _ \mathcal{D}$ , all the functors $\mathbb{1} \to \mathcal{D} ' $ are morphisms of $\mu\nu\mathsf{Poly} _ \mathcal{D}$ ;

  • the binary product $\times : \mathcal{D} \times\mathcal{D} \to \mathcal{D} $ is a morphism of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ ;

  • the binary coproduct $\sqcup : \mathcal{D}\times \mathcal{D} \to \mathcal{D} $ is a morphism of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ ;

  • if $\mathcal{D} '$ is an object of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ , $H: \mathcal{D} '\times \mathcal{D} \to\mathcal{D} $ is a morphism of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ and $\mu H : \mathcal{D} ' \to \mathcal{D} $ exists, then $\mu H $ is a morphism of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ ;

  • if $\mathcal{D} '$ is an object of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ , $H: \mathcal{D} '\times \mathcal{D} \to\mathcal{D} $ is a morphism of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ and $\nu H : \mathcal{D} '\to \mathcal{D} $ exists, then $\nu H $ is a morphism of $\mu\nu\mathsf{Poly} _ \mathcal{D} $ .

Lemma 8. Let $\mathcal{C}$ be a category with $\mu\nu$ -polynomials. If $\mathcal{D} $ is an object of $\mu\nu\mathsf{Poly} _ \mathcal{C} $ and

$$H: \mathcal{D} \times \mathcal{C} \to\mathcal{C} $$

is a functor in $\mu\nu\mathsf{Poly} _ \mathcal{C} $ , then $\mu H : \mathcal{D}\to\mathcal{C} $ and $\nu H : \mathcal{D}\to\mathcal{C} $ exist (and, hence, they are morphisms of $\mu\nu\mathsf{Poly} _ \mathcal{C} $ ).

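Proof. Let X be any object of $ \mathcal{D} $ . Denoting by $X: \mathbb{1} \to \mathcal{D} $ the functor constantly equal to X and by $!: \mathcal{C}\to\mathbb{1} $ the unique functor to the terminal category, the functor $H^X $ is the composition below.

\begin{equation*} \mathcal{C}\xrightarrow{\ \left( X\circ\, !,\ \mathrm{id}_{\mathcal{C}}\right) \ }\mathcal{D}\times\mathcal{C}\xrightarrow{\ H\ }\mathcal{C} \end{equation*}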

Since all the morphisms above are in $ \mu\nu\mathsf{Poly} _ \mathcal{C} $ , we conclude that $H^X $ is an endomorphism of $ \mu\nu\mathsf{Poly} _ \mathcal{C} $ . Therefore, since $\mathcal{C} $ has $\mu\nu$ -polynomials, $\mu H^X $ and $\nu H ^X $ exist.

By Proposition 4, since $\mu H^X $ and $\nu H ^X $ exist for any X in $\mathcal{D} $ , $\mu H $ and $\nu H $ exist.

We say that a categorical semantics $\mathcal{C}$ with (finite) sum and tuple types supports inductive and coinductive types if $\mathcal{C}$ has chosen $\mu\nu$ -polynomials. Note that we do not consider the more general notion of (co)inductive types defined by endofunctors that may contain function types in their construction.

4. Structure-Preserving Functors

In this paper, the definition of our AD macro, the definitions of the concrete semantics, and logical relations are all framed in terms of appropriate structure-preserving functors. This fact highlights the significance of the suitable notions of structure-preserving functors in our work.

Structure-preserving functors between bicartesian closed categories are, of course, bicartesian closed functors. We usually assume that those are strict, which means that the functors preserve the structure on the nose.

It remains to establish the notion of structure-preserving functor between categories with $\mu\nu$ -polynomials. We do it below, starting by establishing the notion of preservation/creation/reflection of initial algebras and terminal coalgebras.

4.1 Preservation, reflection, and creation of initial algebras

We begin by recalling a fundamental result on lifting functors from the base categories to the categories of algebras in Lemma 9. This is actually related to the universal property of the categories of algebras.

Lemma 9. Let $F : \mathcal{D}\to\mathcal{C} $ be a functor. Given endofunctors $E : \mathcal{C}\to\mathcal{C} $ , $E': \mathcal{D}\to\mathcal{D} $ and a natural transformation $\gamma : E \circ F\longrightarrow F\circ E ' $ , we have an induced functor defined by:

\begin{eqnarray*} \check{F}_{\gamma } : & E'\textrm{-}\mathrm{Alg} & \to E\textrm{-}\mathrm{Alg} \\ & \left( X, \zeta \right) & \mapsto \left( F(X), F ( \zeta ) \circ \gamma _X \right)\\ & g & \mapsto F(g). \end{eqnarray*}

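Proof. Indeed, if $g : W\to Z $ is the underlying morphism of an algebra morphism between $(W, \zeta ) $ and $(Z, \xi ) $ , we have, since g is a morphism of E'-algebras and $\gamma$ is natural, that

\begin{align*} F(g)\circ F(\zeta )\circ \gamma _W & = F(\xi )\circ F E'(g)\circ \gamma _W \\ & = F(\xi )\circ \gamma _Z\circ E F(g) \end{align*}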

which proves that F(g) in fact gives a morphism between the algebras $ \left( F(W), F(\zeta ) \circ \gamma _W \right) $ and $\left( F (Z), F (\xi ) \circ \gamma _Z \right) $ . The functoriality of $\check{F} _\gamma $ follows, then, from that of F.

Dually, we have:

Lemma 10. Let $E : \mathcal{C}\to\mathcal{C} $ , $G : \mathcal{C}\to\mathcal{D} $ , and $E': \mathcal{D}\to\mathcal{D} $ be functors. Each natural transformation $\beta :G\circ E \longrightarrow E'\circ G$ induces a functor:

\begin{eqnarray*} \tilde{G}^{\beta } : & E\textrm{-}\mathrm{CoAlg} & \to E'\textrm{-}\mathrm{CoAlg} \\ & (W, \xi ) & \mapsto \left( G(W), \beta _W\circ G ( \xi ) \right)\\ & f & \mapsto G(f). \end{eqnarray*}

Below, whenever we talk about strict preservation, we are assuming that we have chosen initial objects (terminal objects) in the respective categories of (co)algebras.

We can, now, establish the definition of preservation, reflection, and creation of initial algebras using the respective notions for the induced functor. More precisely:

Definition 11. (Preservation, reflection, and creation of initial algebras). We say that a functor $F : \mathcal{D}\to \mathcal{C} $ (strictly) preserves the initial algebra/reflects the initial algebra/creates the initial algebra of the endofunctor $E: \mathcal{C}\to \mathcal{C} $ if, whenever $E' : \mathcal{D}\to\mathcal{D} $ is such that $\gamma : E\circ F\cong F\circ E ' $ (or, in the strict case, $F\circ E ' = E\circ F$ ), the functor:

\begin{eqnarray*} \check{F}_ \gamma : & E'\textrm{-}\mathrm{Alg} & \to E\textrm{-}\mathrm{Alg} \\ & \left( X , \zeta \right) & \mapsto \left( F (X), F ( \zeta )\circ \gamma _X \right)\\ & g & \mapsto F (g). \end{eqnarray*}

induced by $\gamma$ (strictly) preserves the initial object/reflects the initial object/creates the initial object.

Finally, we say that a functor $F : \mathcal{D}\to \mathcal{C} $ (strictly) preserves initial algebras/reflects initial algebras/creates initial algebras if F (strictly) preserves initial algebras/reflects initial algebras/creates initial algebras of any endofunctor on $\mathcal{D} $ .

Remark 12. In other words, let $F : \mathcal{D}\to \mathcal{C} $ be a functor.

  • (I) We say that F (strictly) preserves initial algebras, if: for any natural isomorphism $\gamma : E\circ F\cong F\circ E ' $ (or, in the strict case, for each identity $E\circ F = F\circ E ' $ ) in which E and E' are endofunctors, assuming that $ \left( \mu E', \mathbf{\mathfrak{in}} _{E'} \right) $ is the initial E'-algebra, the E-algebra $ \left( F \left( \mu E'\right), F \left( \mathbf{\mathfrak{in}} _{E'} \right)\circ \gamma _{\mu E' } \right) $ is an initial object of $E\textrm{-}\mathrm{Alg} $ (the chosen initial object of $E\textrm{-}\mathrm{Alg} $ , in the strict case).

  • (II) We say that F reflects initial algebras, if: for any natural isomorphism $\gamma : E\circ F\cong F\circ E ' $ in which E and E' are endofunctors, if $ \left( F(Y), F\left( \xi\right)\circ \gamma _ Y \right) $ is an initial E-algebra and $(Y, \xi) $ is an E'-algebra, then $(Y, \xi) $ is an initial E'-algebra.

  • (III) We say that F creates initial algebras if: (A) F reflects and preserves initial algebras and, moreover, (B) for any $\gamma : E\circ F\cong F\circ E ' $ in which E and E' are endofunctors, $E '\textrm{-}\mathrm{Alg} $ has an initial algebra if $E\textrm{-}\mathrm{Alg} $ does.

Definition 13. (Preservation, reflection, and creation of terminal coalgebras). We say that a functor $G : \mathcal{C}\to \mathcal{D} $ (strictly) preserves the terminal coalgebra/reflects the terminal coalgebra/creates the terminal coalgebra of an endofunctor $E:\mathcal{C}\to\mathcal{C} $ if, for any natural isomorphism $\beta : G\circ E \cong E'\circ G $ (or, in the strict case, $GE = E'G$ ), the functor:

\begin{eqnarray*} \tilde{G}^\beta : & E\textrm{-}\mathrm{CoAlg} & \to E'\textrm{-}\mathrm{CoAlg} \\ & \left( W, \xi \right) & \mapsto \left( G(W), \beta_W\circ G ( \xi ) \right)\\ & f & \mapsto G(f). \end{eqnarray*}

induced by $\beta$ (strictly) preserves the terminal object/reflects the terminal object/creates the terminal object.

Finally, we say that $G : \mathcal{C}\to \mathcal{D}$ (strictly) preserves terminal coalgebras/reflects terminal coalgebras/creates terminal coalgebras if G (strictly) preserves terminal coalgebras/reflects terminal coalgebras/creates terminal coalgebras of any endofunctor on $\mathcal{C} $ .

4.2 $\mu\nu$ -polynomial-preserving functors

Finally, we can introduce the concept of a structure-preserving functor for $\mu\nu$ -polynomials.

Definition 14. A functor $G: \mathcal{D}\to\mathcal{C}$ (strictly) preserves $\mu\nu$ -polynomials if it (strictly) preserves finite coproducts, finite products, as well as initial algebras and terminal coalgebras of $\mu\nu$ -polynomials.

5. An Expressive Functional Language as a Source Language for AD

We describe a source language for our AD code transformations. We consider a standard total functional programming language with an expressive type system, over ground types ${\mathbf{real}}^n$ for arrays of real numbers of static length n, for all $n\in \mathbb{N}$ , and sets $\mathsf{Op}_{n_1,\ldots,n_k}^m$ of primitive operations op, for all $k, m, n_1,\ldots, n_k\in \mathbb{N}$ . These operations op will be interpreted as differentiable functions $({\mathbb{R}}^{n_1}\times \cdots\times {\mathbb{R}}^{n_k})\to {\mathbb{R}}^m$ , and the reader can keep the following examples in mind:

  • constants $\underline{c}\in \mathsf{Op}_{}^n$ for each $c\in {\mathbb{R}}^n$ , for which we slightly abuse notation and write $\underline{c}(\langle \rangle)$ as $\underline{c}$ ;

  • elementwise addition and product $(+),(*)\!\in\!\mathsf{Op}_{n,n}^n$ and matrix-vector product $(\star)\!\in\!\mathsf{Op}_{n\cdot m, m}^n$ ;

  • operations for summing all the elements in an array: $\mathrm{sum}\in\mathsf{Op}_{n}^1$ ;

  • some nonlinear functions like the sigmoid function $\varsigma\in \mathsf{Op}_{1}^1$ .
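For intuition, the example operations above could be interpreted as follows on arrays represented as Python lists of floats. This is only a sketch of one possible interpretation. The function names, and the encoding of an $n\times m$ matrix as a row-by-row flattened array of length $n\cdot m$ with $m\geq 1$, are assumptions of this example and not part of the language definition.

```python
import math


def add(x, y):
    """(+) in Op_{n,n}^n: elementwise addition."""
    return [a + b for a, b in zip(x, y)]


def mul(x, y):
    """(*) in Op_{n,n}^n: elementwise product."""
    return [a * b for a, b in zip(x, y)]


def matvec(mat, v):
    """Matrix-vector product in Op_{n*m,m}^n; mat is flattened row by row (assumed)."""
    m = len(v)
    n = len(mat) // m
    return [sum(mat[i * m + j] * v[j] for j in range(m)) for i in range(n)]


def total(x):
    """sum in Op_n^1: the sum of all entries, returned as an array of length 1."""
    return [sum(x)]


def sigmoid(x):
    """Sigmoid in Op_1^1, acting on an array of length 1."""
    return [1.0 / (1.0 + math.exp(-x[0]))]
```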

Its kinds, types, and terms are generated by the grammar in Fig. 1. We write $\Delta\vdash\tau:\mathrm{type}$ to specify that the type $\tau$ is well kinded in kinding context $\Delta$ , where $\Delta$ is a list of the form $\alpha_1:\mathrm{type},\ldots,\alpha_n:\mathrm{type}$ . The idea is that the type variable identifiers $\alpha_1,\ldots, \alpha_n$ can be used in the formation of $\tau$ . These kinding judgments are defined according to the rules displayed in Fig. 2. We write $\Delta\mid\Gamma \vdash t : \tau$ to specify that the term t is well typed in the typing context $\Gamma$ , where $\Gamma$ is a list of the form $x_1:\tau_1,\ldots,x_n:\tau_n$ for variable identifiers $x_i$ and types $\tau_i$ that are well kinded in kinding context $\Delta$ . These typing judgments are defined according to the rules displayed in Fig. 3. As Fig. 4 displays, we consider the terms of our language up to the standard $\beta\eta$ -theory. To present this equational theory, we define in Fig. 5, by induction, some syntactic sugar for the functorial action $\Delta,\Delta'\mid\Gamma,x:\tau{}[^{\sigma}\!/\!_{{\alpha}}]\vdash \tau{}[^{x\vdash t}\!/\!_{{\alpha}}] :\tau{}[^{\rho}\!/\!_{{\alpha}}]$ in argument ${\alpha}$ of parameterized types $\Delta,{\alpha}:\mathrm{type}\vdash \tau:\mathrm{type}$ on terms $\Delta'\mid\Gamma,x:\sigma\vdash t:\rho$ .

Figure 1: Grammar for the kinds, types, and terms of the source language for our AD transformations.

Figure 2: Kinding rules for the AD source language. Note that we only consider the formation of function types of nonparameterized types (shaded in gray).

Figure 3: Typing rules for the AD source language.

Figure 4: We consider the standard $\beta\eta$ -laws above for our language. We write $\stackrel{\# {x_1,\ldots,x_n}}{=}$ to indicate that the variables $x_1,\ldots,x_n$ need to be fresh in the left-hand side. Equations hold on pairs of terms of the same type. As usual, we only distinguish terms up to $\alpha$ -renaming of bound variables.

Figure 5: Functorial action $\Delta,\Delta'\mid\Gamma,x:\tau{}[^{\sigma}\!/\!_{{\alpha}}]\vdash \tau{}[^{x\vdash t}\!/\!_{{\alpha}}] :\tau{}[^{\rho}\!/\!_{{\alpha}}]$ in argument ${\alpha}$ of parameterized types $\Delta,{\alpha}:\mathrm{type}\vdash \tau:\mathrm{type}$ on terms $\Delta'\mid\Gamma,x:\sigma\vdash t:\rho$ of the source language.

We employ the usual conventions of free and bound variables and write $\tau{}[^{\sigma}\!/\!_{{\alpha}}]$ for the capture-avoiding substitution of the type $\sigma$ for the identifier ${\alpha}$ in $\tau$ (and similarly, $t{}[^{s}\!/\!_{x}]$ for the capture-avoiding substitution of the term s for the identifier x in t). We make liberal use of the standard syntactic sugar $\mathbf{let}\,\langle x, y \rangle=\,t\,\mathbf{in}\,s\stackrel {\mathrm{def}}= \mathbf{let}\,z=\,t\,\mathbf{in}\,\mathbf{let}\,x=\,\mathbf{fst}\, z\,\mathbf{in}\,\mathbf{let}\,y=\,\mathbf{snd}\, z\,\mathbf{in}\,s$ .

This standard language is equivalent to the freely generated bicartesian closed category $\mathbf{Syn}$ with $\mu\nu$ -polynomials on the directed polygraph (computad) given by the ground types ${\mathbf{real}}^n$ as objects and primitive operations op as arrows. Equivalently, we can see it as the initial category that supports tuple types, function types, sum types, inductive and coinductive types, and primitive types $\textrm{Ty}=\left\{{\mathbf{real}}^n\mid n\in\mathbb{N}\right\}$ and primitive operations $\mathsf{Op}({\mathbf{real}}^{n_1},\ldots,{\mathbf{real}}^{n_k};{\mathbf{real}}^m)=\mathsf{Op}_{n_1,\ldots,n_k}^m$ (in the sense of Section 3). $\mathbf{Syn}$ effectively represents programs as (categorical) combinators, also known as “point-free style” in the functional programming community. Concretely, $\mathbf{Syn}$ has types as objects, homsets $\mathbf{Syn}(\tau,\sigma)$ consist of $(\alpha)\beta\eta$ -equivalence classes of terms $\cdot\mid x:\tau\vdash t:\sigma$ , identities are $\cdot\mid x:\tau\vdash x:\tau$ , and the composition of $\cdot\mid x:\tau\vdash t:\sigma$ and $\cdot\mid y:\sigma\vdash s:\rho$ is given by $\cdot\mid x:\tau\vdash \mathbf{let}\,y=\,t\,\mathbf{in}\,s:\rho$ .

Corollary 15 (Universal property of $\mathbf{Syn}$ ). Given any bicartesian closed category with $\mu\nu$ -polynomials $\mathcal{C}$ , any consistent assignment of $F({\mathbf{real}}^n )\in\mathrm{obj} \left( \mathcal{C}\right)$ and $F(\mathrm{op})\in \mathcal{C}(F({\mathbf{real}}^{n_1})\times \cdots\times F({\mathbf{real}}^{n_k}), F({\mathbf{real}}^m))$ for $\mathrm{op}\in\mathsf{Op}_{n_1,\ldots,n_k}^m$ extends to a unique $\mu\nu$ -polynomial-preserving bicartesian closed functor $F:\mathbf{Syn}\to\mathcal{C}$ .

6. Modeling Expressive Functional Languages in Grothendieck Constructions

In this section, we present a novel construction of categorical models (in the sense of Section 3) $\Sigma_\mathcal{C}\mathcal{L}$ and $\Sigma_\mathcal{C}\mathcal{L}^{op}$ of expressive functional languages (like our AD source language of Section 5) in $\Sigma$ -types of suitable indexed categories $\mathcal{L}:\mathcal{C}^{op}\to \mathbf{Cat}$ . In particular, the problem we solve in this section is to identify suitable sufficient conditions to put on an indexed category $\mathcal{L}:\mathcal{C}^{op}\to \mathbf{Cat}$ , whose base category we think of as the semantics of a cartesian type theory and whose fiber categories we think of as the semantics of a dependent linear type theory, such that $\Sigma_\mathcal{C}\mathcal{L}$ and $\Sigma_\mathcal{C}\mathcal{L}^{op}$ are categorical models of expressive functional languages in this sense. We call such an indexed category a $\Sigma$ -bimodel of language feature X if it satisfies our sufficient conditions for $\Sigma_\mathcal{C}\mathcal{L}$ and $\Sigma_\mathcal{C}\mathcal{L}^{op}$ to be categorical models of language feature X.

This abstract material in many ways forms the theoretical crux of this paper. We consider two particular instances of this idea later:

  • the case where $\mathcal{L}$ is the syntactic category ${\mathbf{LSyn}}:{\mathbf{CSyn}}^{op}\to \mathbf{Cat}$ of a suitable target language for AD translations (Section 7); the universal property of the source language $\mathbf{Syn}$ then yields unique structure-preserving functors $\overrightarrow{\mathcal{D}}:\mathbf{Syn}\to\Sigma_{{\mathbf{CSyn}}}{\mathbf{LSyn}}$ and $\overleftarrow{\mathcal{D}}:\mathbf{Syn}\to\Sigma_{{\mathbf{CSyn}}}{\mathbf{LSyn}}^{op}$ implementing forward- and reverse-mode AD, respectively;

  • the case where $\mathcal{L}$ is the indexed category of families of real vector spaces $\mathbf{FVect}:\mathbf{Set}^{op}\to \mathbf{Cat}$ (Section 9); this gives a concrete denotational semantics to the target language, which we use in the correctness proof of AD.

6.1 Basics: the categories $\Sigma_\mathcal{C} \mathcal{L}$ and $\Sigma_\mathcal{C} \mathcal{L}^{\mathrm{op}}$

Recall that for any strictly indexed category, that is, a (strict) functor $\mathcal{L}:\mathcal{C}^{\mathrm{op}}\to\mathbf{Cat}$ , we can consider its total category (or Grothendieck construction) $\Sigma_\mathcal{C} \mathcal{L}$ , which is a fibered category over $\mathcal{C}$ (see Johnstone 2002, Sections A1.1.7, B1.3.1). We can view it as a $\Sigma$ -type of categories, which generalizes the cartesian product. Further, given a strictly indexed category $\mathcal{L}:\mathcal{C}^{\mathrm{op}}\to \mathbf{Cat}$ , we can consider its fiberwise dual category $\mathcal{L}^{\mathrm{op}}:\mathcal{C}^{\mathrm{op}}\to \mathbf{Cat}$ , which is defined as the composition $\mathcal{C}^{\mathrm{op}}\xrightarrow{\mathcal{L}}\mathbf{Cat}\xrightarrow{\mathrm{op}}\mathbf{Cat}$ , where op is defined by $A\mapsto A^{\mathrm{op}} $ . Thus, we can apply the same construction to $\mathcal{L}^{\mathrm{op}}$ to obtain a category $\Sigma_{\mathcal{C}}\mathcal{L}^{\mathrm{op}}$ .

Concretely, $\Sigma_\mathcal{C} \mathcal{L}$ is the following category:

  • objects are pairs (W,w) of an object W of $\mathcal{C}$ and an object w of $\mathcal{L}(W)$ ;

  • morphisms $(W,w)\to (X,x)$ are pairs (f, f’) with $f :W\to X$ in $\mathcal{C}$ and $ {f}' : w \to \mathcal{L}(f)(x)$ in $\mathcal{L}(W)$ ;

  • identities ${\mathrm{id}}_{(W,w)}$ are $({\mathrm{id}}_{W},{\mathrm{id}}_{w})$ ;

  • composition of $(W,w)\xrightarrow{(f, {f}' )}(X,x)$ and $(X,x)\xrightarrow{(g, {g}' )}(Y,y)$ is given by:

$$(g\circ f, \mathcal{L}(f)( {g}' ) \circ {f}' ) .$$

Concretely, $\Sigma_{\mathcal{C}}\mathcal{L}^{\mathrm{op}}$ is the following category:

  • objects are pairs (W, w) of an object W of $\mathcal{C}$ and an object w of $\mathcal{L}(W)$ ;

  • morphisms $(W,w)\to (X,x)$ are pairs (f, f’) with $f :W\to X$ in $\mathcal{C}$ and $ {f}' : \mathcal{L}(f)(x)\to w $ in $\mathcal{L}(W)$ ;

  • identities ${\mathrm{id}}_{(W,w)}$ are $({\mathrm{id}}_{W},{\mathrm{id}}_{w})$ ;

  • composition of $(W,w)\xrightarrow{(f, {f}' )}(X,x)$ and $(X,x)\xrightarrow{(g, {g}' )}(Y,y)$ is given by:

$$(g\circ f, {f}' \circ \mathcal{L}(f)( {g}' ) ) .$$
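The two composition rules can be read as the chain rule in forward and in reverse mode. The following Python sketch makes this concrete under simplifying assumptions that are ours rather than the paper's: objects of $\mathcal{C}$ are sets of real numbers, every fiber object is the constant family $\mathbb{R}$, and a fiber morphism ${f}'$ is represented by a function that takes a point of $W$ together with a (co)tangent value and is linear in the latter. The class names `Fwd` and `Rev` and the helper `identity` are hypothetical.

```python
class Fwd:
    """Morphism (f, f') of Sigma_C L: f' goes from w to L(f)(x), pushing tangents forward."""
    def __init__(self, f, df):
        self.f, self.df = f, df

    def then(self, other):
        """Composite (g . f, L(f)(g') . f'), where self = (f, f') and other = (g, g')."""
        f, df, g, dg = self.f, self.df, other.f, other.df
        # L(f)(g') is g' reindexed along f, that is, g' evaluated at the point f(p).
        return Fwd(lambda p: g(f(p)), lambda p, t: dg(f(p), df(p, t)))


class Rev:
    """Morphism (f, f') of Sigma_C L^op: f' goes from L(f)(x) to w, pulling cotangents back."""
    def __init__(self, f, df):
        self.f, self.df = f, df

    def then(self, other):
        """Composite (g . f, f' . L(f)(g')), where self = (f, f') and other = (g, g')."""
        f, df, g, dg = self.f, self.df, other.f, other.df
        return Rev(lambda p: g(f(p)), lambda p, c: df(p, dg(f(p), c)))


def identity(cls):
    """Identity morphism (id_W, id_w) in either total category."""
    return cls(lambda p: p, lambda p, v: v)


# square(p) = p^2 and triple(q) = 3q, each paired with its derivative.
square_fwd = Fwd(lambda p: p * p, lambda p, t: 2 * p * t)
triple_fwd = Fwd(lambda q: 3 * q, lambda q, t: 3 * t)
square_rev = Rev(lambda p: p * p, lambda p, c: 2 * p * c)
triple_rev = Rev(lambda q: 3 * q, lambda q, c: 3 * c)
```

In this one-dimensional setting, a linear map and its transpose coincide, so both composites compute the same derivative of $p\mapsto 3p^2$. The two rules differ in the order in which the fiber components are applied, which is what distinguishes forward from reverse mode in higher dimensions.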

6.2 Products in total categories

We start by studying the cartesian structure of $\Sigma_\mathcal{C}\mathcal{L}$ . We refer to Gray (1966) as a basic reference on fibrations/indexed categories and properties of the total category.

Definition 16. A strictly indexed category $\mathcal{L}$ has strictly indexed finite (co)products if

  • (i) each fiber $\mathcal{L}(C)$ has chosen finite (co)products $(\times , \mathbb{1})$ (respectively, $(\sqcup , {\mathbb {0}})$ );

  • (ii) change of base strictly preserves these (co)products in the sense that $\mathcal{L}(f)$ preserves the chosen finite products (respectively, finite coproducts) on the nose for all morphisms f in $\mathcal{C}$ .

We recall the well-known fact that $\Sigma_\mathcal{C} \mathcal{L}$ ( $\Sigma_\mathcal{C} \mathcal{L}^{\mathrm{op}}$ ) has finite products if $\mathcal{C} $ has finite products and $\mathcal{L} $ has indexed finite products (coproducts).

Proposition 17 (Cartesian structure of $\Sigma_\mathcal{C} \mathcal{L} $ ). Assuming that $\mathcal{C}$ has finite products $(\mathbb{1},\times)$ and $\mathcal{L}$ has indexed finite products $(\mathbb{1},\times)$ , we have that $\Sigma_{\mathcal{C}}\mathcal{L}$ has (fibered) terminal object $\mathbb{1} =\left(\mathbb{1},\mathbb{1}\right)$ and (fibered) binary product $(W,w)\times (Y,y)=(W\times Y,\mathcal{L}(\pi_1)(w)\times \mathcal{L}(\pi_2)(y))$ .

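Proof. We have (natural) bijections, in which $\Sigma$ on the right-hand sides denotes a disjoint union of sets:

\begin{align*} \Sigma_\mathcal{C}\mathcal{L}\left( (V,v), (\mathbb{1},\mathbb{1})\right) & = \Sigma_{f\in\mathcal{C}(V,\mathbb{1})}\,\mathcal{L}(V)\left( v, \mathcal{L}(f)(\mathbb{1})\right) \\ & = \Sigma_{f\in\mathcal{C}(V,\mathbb{1})}\,\mathcal{L}(V)\left( v, \mathbb{1}\right) \cong \{\ast\} \\ \Sigma_\mathcal{C}\mathcal{L}\left( (V,v), (W\times Y,\mathcal{L}(\pi_1)(w)\times \mathcal{L}(\pi_2)(y))\right) & = \Sigma_{(f,g)\in\mathcal{C}(V,W\times Y)}\,\mathcal{L}(V)\left( v, \mathcal{L}((f,g))\left( \mathcal{L}(\pi_1)(w)\times \mathcal{L}(\pi_2)(y)\right) \right) \\ & = \Sigma_{(f,g)\in\mathcal{C}(V,W\times Y)}\,\mathcal{L}(V)\left( v, \mathcal{L}(f)(w)\times \mathcal{L}(g)(y)\right) \\ & \cong \Sigma_{(f,g)\in\mathcal{C}(V,W)\times\mathcal{C}(V,Y)}\,\mathcal{L}(V)\left( v, \mathcal{L}(f)(w)\right) \times \mathcal{L}(V)\left( v, \mathcal{L}(g)(y)\right) \\ & \cong \Sigma_\mathcal{C}\mathcal{L}\left( (V,v), (W,w)\right) \times \Sigma_\mathcal{C}\mathcal{L}\left( (V,v), (Y,y)\right) . \end{align*}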

In particular, finite products in $\Sigma_{\mathcal{C}}\mathcal{L}$ are fibered in the sense that the projection functor $\Sigma_{\mathcal{C}}\mathcal{L}\to \mathcal{C}$ preserves them, on the nose. Codually, we have:

Proposition 18 (Cartesian structure of $\Sigma_\mathcal{C} \mathcal{L} ^{\mathrm{op}} $ ). Assuming that $\mathcal{C}$ has finite products $(\mathbb{1},\times)$ and $\mathcal{L}$ has indexed finite coproducts $({\mathbb {0}},\sqcup )$ , we have that $\Sigma_{\mathcal{C}}\mathcal{L}^{\mathrm{op}}$ has (fibered) terminal object $\mathbb{1}=(\mathbb{1},{\mathbb {0}} )$ and (fibered) binary product $(W,w)\times (Y,y)=(W\times Y,\mathcal{L}(\pi_1)(w)\sqcup \mathcal{L}(\pi_2)(y))$ .

That is, in our terminology, $\mathcal{L}:\mathcal{C}^{op}\to \mathbf{Cat}$ is a $\Sigma$ -bimodel of tuple types if $\mathcal{C}$ has chosen finite products and $\mathcal{L}$ has finite strictly indexed products and coproducts.
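For intuition, consider the indexed category of families of real vector spaces $\mathbf{FVect}:\mathbf{Set}^{op}\to \mathbf{Cat}$ . We assume here, as a description supplied only for this illustration, that an object of the fiber over a set $W$ is a family $w=(w_a)_{a\in W}$ of real vector spaces and that change of base acts by reindexing, so that $\mathcal{L}(f)(x)_a = x_{f(a)}$ . Under these assumptions, the formulas of Propositions 17 and 18 specialize pointwise as follows:

\begin{align*} \left( \mathcal{L}(\pi_1)(w)\times \mathcal{L}(\pi_2)(y)\right) _{(a,b)} & = w_a\times y_b, \\ \left( \mathcal{L}(\pi_1)(w)\sqcup \mathcal{L}(\pi_2)(y)\right) _{(a,b)} & = w_a\sqcup y_b \cong w_a\times y_b, \qquad (a,b)\in W\times Y. \end{align*}

Since the product and the coproduct of two vector spaces are both given by the direct sum, the binary products of $(W,w)$ and $(Y,y)$ in the two total categories have the same underlying object in this example.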

We will, in particular, apply the results above in the situation where $\mathcal{L}$ has indexed finite biproducts in the sense of Definition 19, in which case the finite product structures of $\Sigma_\mathcal{C}\mathcal{L}$ and $\Sigma_\mathcal{C}\mathcal{L}^{op}$ coincide.

Definition 19. (Strictly indexed finite biproducts). A category with finite products and coproducts is semi-additive if the binary coproduct functor is naturally isomorphic to the binary product functor; see, for instance, Lack (2012), Lucatelli Nunes (2019). In this case, the product/coproduct is called a biproduct, and the biproduct structure is denoted by $(\times, \mathbb{1}) $ or $(+, {\mathbb {0}})$ .

A strictly indexed category $\mathcal{L}$ has strictly indexed finite biproducts if

  • $\mathcal{L} $ has strictly indexed finite products and coproducts;

  • each fiber $\mathcal{L}(C)$ is semi-additive.

6.3 Generators

In this section, we establish the obvious sufficient (and necessary) conditions for $\Sigma_\mathcal{C}\mathcal{L}$ and $\Sigma_\mathcal{C}\mathcal{L}^{op}$ to model primitive types and operations in the sense of Section 3. These conditions are an immediate consequence of the structure of $\Sigma_\mathcal{C}\mathcal{L}$ and $\Sigma_\mathcal{C}\mathcal{L}^{op}$ as cartesian categories.

We say that $\mathcal{L}:\mathcal{C}^{op}\to \mathbf{Cat}$ is a $\Sigma$ -bimodel of primitive types $\textrm{Ty}$ and operations $\mathsf{Op}$ if

  • for all $T\in\textrm{Ty}$ , we have a choice of objects $C_T\in \mathrm{obj} \left( \mathcal{C}\right)$ and $L_T,L'_T\in \mathrm{obj} \left( \mathcal{L}(C_T)\right)$ ;

  • for all $\mathrm{op}\in \mathsf{Op}(T_1,\ldots, T_n; S)$ , we have a choice of morphisms:

\begin{align*} &f_{\mathrm{op}}\in \mathcal{C}(C_{T_1}\times \ldots\times C_{T_n}, C_S)\\ &g_{\mathrm{op}}\in \mathcal{L}(C_{T_1}\times \ldots\times C_{T_n})(\mathcal{L}(\pi_1)(L_{T_1})\times\cdots\times \mathcal{L}(\pi_n)(L_{T_n}), \mathcal{L}(f_{\mathrm{op}})(L_S))\\ &g'_\mathrm{op}\in \mathcal{L}(C_{T_1}\times \ldots\times C_{T_n})(\mathcal{L}(f_{\mathrm{op}})(L'_S),\mathcal{L}(\pi_1)(L'_{T_1})\sqcup\cdots\sqcup \mathcal{L}(\pi_n)(L'_{T_n})). \end{align*}

We say that such a model has self-dual primitive types in case $L_T=L'_T$ for all $T\in\textrm{Ty}$ .
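As an illustration of these data, consider the primitive type ${\mathbf{real}}^n$ and the elementwise product $(*)$ . One natural choice, sketched below with arrays represented as Python lists, takes $f_{\mathrm{op}}$ to be the operation itself, $g_{\mathrm{op}}$ to be its derivative, and $g'_{\mathrm{op}}$ to be the transposed derivative. The function names, and the representation of fiber morphisms by functions that take the base point as their first arguments, are assumptions of this example.

```python
def f_op(x, y):
    """Primal map C_T x C_T -> C_T: elementwise product."""
    return [a * b for a, b in zip(x, y)]


def g_op(x, y, dx, dy):
    """Derivative at (x, y): maps a pair of input tangents to an output tangent."""
    return [a * db + b * da for a, b, da, db in zip(x, y, dx, dy)]


def g_op_transposed(x, y, c):
    """Transposed derivative at (x, y): maps an output cotangent to a pair of input cotangents."""
    return [b * ci for b, ci in zip(y, c)], [a * ci for a, ci in zip(x, c)]


def dot(u, v):
    """Standard inner product, used to express that the two maps are mutually transposed."""
    return sum(a * b for a, b in zip(u, v))
```

The relation between the two fiber morphisms can be checked numerically: pairing `g_op(x, y, dx, dy)` with a cotangent `c` gives the same number as pairing `(dx, dy)` with `g_op_transposed(x, y, c)`.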

6.4 Cartesian closedness of total categories

The question of Cartesian closure of the categories $\Sigma_\mathcal{C} \mathcal{L}$ and $\Sigma_\mathcal{C} \mathcal{L}^{op}$ is a lot more subtle. In particular, the formulas for exponentials tend to involve $\Pi$ - and $\Sigma$ -types; hence, we need to recall some definitions from categorical dependent type theory. As also suggested by Kerjean and Pédrot (2021), these formulas relate closely to the Diller–Nahm variant (Diller 1974; Hyland 2002; Moss and von Glehn 2018) of the Dialectica interpretation (Gödel 1958) and Altenkirch et al. (2010)’s formula for higher-order containers. We plan to explain this connection in detail in future work as it would form a distraction from the point of the current paper.

We use standard definitions from the semantics of dependent type theory and the dependently typed enriched effect calculus. An interested reader can find background on this material in Vákár (Reference Vákár2017, Chapter 5) and Ahman et al. (Reference Ahman, Ghani and Plotkin2016). We briefly recall some of the usual vocabulary from Vákár (Reference Vákár2017, Chapter 5).

Definition 21. Given an indexed category $\mathcal{D}:\mathcal{C}^{op}\to \mathbf{Cat}$ , we say:

  • it satisfies the comprehension axiom if: $\mathcal{C}$ has a chosen terminal object $\mathbb{1}$ ; $\mathcal{D}$ has strictly indexed terminal objects $\mathbb{1}$ (i.e., chosen terminal objects $\mathbb{1}\in\mathcal{D}(X)$ , such that $\mathcal{D}(g)(\mathbb{1})=\mathbb{1}\in\mathcal{D} (W) $ for all $g:W\to X$ in $\mathcal{C}$ ); and, for each object $\left( X, x\right)\in \Sigma_\mathcal{C} \mathcal{D} $ , the functor:

\begin{align*} \mathfrak{re}_{(X,x)} : (\mathcal{C}/X)^{op} &\to \mathbf{Set} \\ \left( W, W\xrightarrow{f}X \right) & \mapsto \mathcal{D}(W)(\mathbb{1}, \mathcal{D}(f)(x)) \end{align*}

is representable by an object $\left( X.x, X.x\xrightarrow{{\mathbf{p}_{X,x}}} X\right) $ of $\mathcal{C}/X$ :

\begin{align*} \mathfrak{re}_{(X,x)} \left( W, W\xrightarrow{f}X \right) = \mathcal{D}(W)(\mathbb{1}, \mathcal{D}(f)(x)) &\cong \mathcal{C}/X\left( \left( W, f \right) ,\left( X.x, {\mathbf{p}_{X,x}}\right) \right)\\ b&\mapsto (f,b). \end{align*}

We write ${\mathbf{v}_{X,x}}$ for the unique element of $\mathcal{D}(X.x)(\mathbb{1}, \mathcal{D}({\mathbf{p}_{X,x}})(x))$ such that $(\mathbf{p}_{X,x}, \mathbf{v}_{X,x})={\mathrm{id}}_{{\mathbf{p}_{X,x}}}$ (the universal element of the representation).

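Furthermore, given $f: W\to X $ , we write $\mathbf{q}_{f,x}$ for the unique morphism $(f\circ {\mathbf{p}_{W,\mathcal{D}(f)(x)}}, \mathbf{v}_{W,{\mathcal{D}(f)(x)}})$ making the square below a pullback:

$$\begin{array}{ccc} W.\mathcal{D}(f)(x) & \xrightarrow{\ \mathbf{q}_{f,x}\ } & X.x\\ {\scriptstyle \mathbf{p}_{W,\mathcal{D}(f)(x)}}\downarrow & & \downarrow{\scriptstyle \mathbf{p}_{X,x}}\\ W & \xrightarrow{\quad f\quad } & X \end{array}$$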

We henceforth call such squares $\mathbf{p}$ -squares;

  • it supports $\Sigma$ -types if we have left adjoint functors $\Sigma_w\dashv \mathcal{D}({\mathbf{p}_{W,w}}):\mathcal{D}(W.w)\leftrightarrows \mathcal{D}(W)$ satisfying the left Beck–Chevalley condition for $\mathbf{p}$ -squares w.r.t. $\mathcal{D} $ (this means that the canonical maps $\Sigma_{\mathcal{D}(f)(x)} \circ \mathcal{D}({\mathbf{q}_{f,x}}) \to \mathcal{D}(f) \circ \Sigma_x $ are the identity);

  • it supports $\Pi$ -types if $\mathcal{D}^{op}$ supports $\Sigma$ -types; explicitly, that is the case iff we have right adjoint functors $\mathcal{D}({\mathbf{p}_{W,w}})\dashv \Pi_w:\mathcal{D}(W)\leftrightarrows \mathcal{D}(W.w)$ satisfying the right Beck–Chevalley condition for $\mathbf{p}$ -squares in the sense that the canonical maps $\mathcal{D}(f)\circ \Pi_x \to \Pi_{\mathcal{D}(f)(x)} \circ \mathcal{D}({\mathbf{q}_{f,x}})$ are the identity.

Definition 22. In case $\mathcal{D}:\mathcal{C}^{op}\to \mathbf{Cat}$ satisfies the comprehension axiom, we say that

  • it satisfies democratic comprehension if the comprehension functor:

    \begin{align*} \mathcal{D}(W)(w',w) & \xrightarrow{{\mathbf{p}_{W,-}}} \mathcal{C}/W\left( \left( W.w', {\mathbf{p}_{W,w'}}\right) , \left( W.w, {\mathbf{p}_{W,w}}\right) \right)\\ d & \mapsto (\mathbf{p}_{W,w'},\mathcal{D}({\mathbf{p}_{W,w'}})(d)\circ \mathbf{v}_{W,w'}) \end{align*}

    defines an isomorphism of categories $\mathcal{D}(\mathbb{1} )\cong \mathcal{C}/\mathbb{1} \cong \mathcal{C}$ ;

  • it satisfies full/faithful comprehension if the comprehension functor is full/faithful;

  • it supports (strong) $\Sigma$ -types (i.e., $\Sigma$ -types with a dependent elimination rule, which in particular makes $\mathcal{D}$ support $\Sigma$ -types) if dependent projections compose: for every triple $\left( W, w , s\right)$ where $W\in\mathcal{C}$ , $w\in \mathrm{obj} \left(\mathcal{D}(W)\right) $ and $s\in \mathrm{obj} \left( \mathcal{D}(W.w)\right) $ , we have

    $${\mathbf{p}_{W,w}}\circ {\mathbf{p}_{W.w,s}}\cong {\mathbf{p}_{W,\Sigma_w s}};$$

    then, in particular, $W.\Sigma_w s\cong W.w.s$ ; further, we have projection morphisms $\pi_1\in\mathcal{D}(W)(\Sigma_w s, w)$ and $\pi_2\in \mathcal{D}(W.w)(\mathbb{1}, s)$ .

Remark 23 ( $\Sigma$ - and $\Pi$ -types as dependent product and function types). In case $\mathcal{D}$ satisfies fully faithful comprehension,

  • $\Sigma_w \mathcal{D}({\mathbf{p}_{W,w}})(v)$ gives the categorical product $w\times v$ of $w$ and $v$ in $\mathcal{D}(W)$ ;

  • $\Pi_w \mathcal{D}({\mathbf{p}_{W,w}})(v)$ gives the categorical exponential $w\Rightarrow v$ of $w$ and $v$ in $\mathcal{D}(W)$ .
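
The second bullet can be derived from the first one. The following is only a sketch: it takes the first bullet for granted and uses the adjunctions $\Sigma_w\dashv \mathcal{D}({\mathbf{p}_{W,w}})\dashv \Pi_w$ of Definition 21. For any $u\in\mathcal{D}(W)$ , we have bijections, natural in $u$ :

\begin{align*} \mathcal{D}(W)\left(u, \Pi_w \mathcal{D}({\mathbf{p}_{W,w}})(v)\right) &\cong \mathcal{D}(W.w)\left(\mathcal{D}({\mathbf{p}_{W,w}})(u), \mathcal{D}({\mathbf{p}_{W,w}})(v)\right) && (\mathcal{D}({\mathbf{p}_{W,w}})\dashv \Pi_w)\\ &\cong \mathcal{D}(W)\left(\Sigma_w\mathcal{D}({\mathbf{p}_{W,w}})(u), v\right) && (\Sigma_w\dashv \mathcal{D}({\mathbf{p}_{W,w}}))\\ &\cong \mathcal{D}(W)\left(w\times u, v\right) && (\text{first bullet}). \end{align*}

This is the universal property of the exponential $w\Rightarrow v$ in $\mathcal{D}(W)$ .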

Definition 24. ( $\Sigma$ -bimodel for function types). We call a strictly indexed category $\mathcal{L}:\mathcal{C}^{op}\to\mathbf{Cat}$ a $\Sigma$ -bimodel for function types if it is a biadditive model of the dependently typed enriched effect calculus in the sense that it comes equipped with

  • ( $\mathcal{L}$ A) a model of cartesian dependent type theory in the sense of a strictly indexed category $\mathcal{C}':\mathcal{C}^{op}\to \mathbf{Cat}$ that satisfies full, faithful, democratic comprehension with $\Pi$ -types and strong $\Sigma$ -types;

  • ( $\mathcal{L}$ B) strictly indexed finite biproducts in the sense of Definition 19 in $\mathcal{L}$ ;

  • ( $\mathcal{L}$ C) $\Sigma$ - and $\Pi$ -types in $\mathcal{L}$ ;

  • ( $\mathcal{L}$ D) a strictly indexed functor $\multimap: \mathcal{L}^{\mathrm{op}}\times\mathcal{L}\to\mathcal{C}'$ and a natural isomorphism:

    $$\mathcal{L}(W)(w,x)\cong \mathcal{C}'(W)(\mathbb{1}, w\multimap x).$$

We can immediately note that our notion of $\Sigma$ -bimodel of function types is also a $\Sigma$ -bimodel of tuple types. Indeed, strong $\Sigma$ -types and comprehension give us, in particular, chosen finite products in $\mathcal{C}$ .

We next show why this name is justified: we show that the Grothendieck construction of a $\Sigma$ -bimodel of function types is cartesian closed.

In the following, we slightly abuse notation to aid legibility:

  • denoting by $!_{W}: W\to\mathbb{1} $ the only morphism, we will sometimes conflate $Z\in\mathrm{obj} \mathcal{C}'(\mathbb{1} )$ and $\mathbb{1} .Z\in\mathrm{obj} \left( \mathcal{C}\right)$ as well as $f\in \mathcal{C}'(W)(\mathbb{1}, \mathcal{C}'(!_{W} )(Z))$ and $(!_{W},f)\in \mathcal{C}(W, \mathbb{1} .Z)$ ; this is justified by the democratic comprehension axiom;

  • we will sometimes simply write z for $\mathcal{D}({\mathbf{p}_{W,w}})(z)$ where the weakening map $\mathcal{D}({\mathbf{p}_{W,w}})$ is clear from context.

Given $X, Y\in \mathcal{C}$ we will write ${\mathrm{ev1}}$ for the obvious $\mathcal{C}$ -morphism

$${\mathrm{ev1}}:\Pi_{X} \Sigma_{Y}Z.X\to Y,$$

that is, the morphism obtained as the composition (where we write $\pi_1$ for the projection $\Sigma_{Y}Z\to Y$ ):

$$\Pi_{X}\Sigma_{Y}Z.X\cong (\Pi_{X}\Sigma_{Y}Z)\times X\xrightarrow{(\Pi_{X}\pi_1)\times X}(\Pi_{X}Y)\times X\cong (X\Rightarrow Y)\times X\xrightarrow{\mathrm{ev}}Y$$

With these notational conventions in place, we can describe the cartesian closed structure of Grothendieck constructions.

Theorem 25 (Exponentials of the total category). For a $\Sigma$ -bimodel $\mathcal{L}$ for function types, $\Sigma_{\mathcal{C}}\mathcal{L}$ has exponential:

$$(X,x)\Rightarrow (Y,y)= (\Pi_{X}\Sigma_{Y} \mathcal{L}(\pi_1)(x)\multimap \mathcal{L}(\pi_2)(y), \Pi_{X} \mathcal{L}({\mathrm{ev1}})(y)).$$

Proof. We have (natural) bijections:

Codually, we have

Theorem 26. For a $\Sigma$ -bimodel $\mathcal{L}$ for function types, $\Sigma_{\mathcal{C}}\mathcal{L}^{\mathrm{op}}$ has exponential:

$$(X,x)\Rightarrow (Y,y)= (\Pi_{X}\Sigma_{Y} \mathcal{L}(\pi_2)(y)\multimap \mathcal{L}(\pi_1)(x), \Sigma_{X} \mathcal{L}({\mathrm{ev1}})(y)).$$

Note that these exponentials are not fibered over $\mathcal{C}$ in the sense that the projection functors $\Sigma_\mathcal{C} \mathcal{L}\to \mathcal{C}$ and $\Sigma_\mathcal{C} \mathcal{L}^{op}\to \mathcal{C}$ are generally not cartesian closed functors. This is in contrast with the interpretation of all other type formers we consider in this paper.
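
To make the two formulas more concrete, here is a sketch under assumptions made only for illustration: $\mathcal{C}=\mathcal{C}'(\mathbb{1})=\mathbf{Set}$ , $\mathcal{L}(W)$ is the category of $W$-indexed families of vector spaces (so that $x=(x_a)_{a\in X}$ and $y=(y_b)_{b\in Y}$ ), and $x_a\multimap y_b$ is the set of linear maps from $x_a$ to $y_b$ . In that setting, $\Pi$ and $\Sigma$ in $\mathcal{C}'$ are set-indexed products and disjoint unions, while $\Pi_X$ and $\Sigma_X$ in $\mathcal{L}$ are products and direct sums of vector spaces. The exponentials of Theorems 25 and 26 then unfold to:

\begin{align*} \text{in } \Sigma_\mathcal{C}\mathcal{L}: \quad (X,x)\Rightarrow (Y,y) &= \Big( \prod_{a\in X}\coprod_{b\in Y} (x_a\multimap y_b), \ \big( \textstyle\prod_{a\in X} y_{\mathrm{ev1}(h,a)}\big)_{h} \Big),\\ \text{in } \Sigma_\mathcal{C}\mathcal{L}^{\mathrm{op}}: \quad (X,x)\Rightarrow (Y,y) &= \Big( \prod_{a\in X}\coprod_{b\in Y} (y_b\multimap x_a), \ \big( \textstyle\bigoplus_{a\in X} y_{\mathrm{ev1}(h,a)}\big)_{h} \Big), \end{align*}

where $h$ ranges over the first component and $\mathrm{ev1}(h,a)\in Y$ is the first coordinate of the pair $h(a)$ . In words, a point of the exponential sends each $a\in X$ to a value $b\in Y$ together with a linear map between the fibers at $a$ and $b$ , going forward in $\Sigma_\mathcal{C}\mathcal{L}$ and backward in $\Sigma_\mathcal{C}\mathcal{L}^{\mathrm{op}}$ . The first component of the exponential is not simply the set of functions $X\to Y$ , which matches the observation that the projection functors are not cartesian closed.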

6.5 Coproducts in total categories

We now study the coproducts in the total categories $\Sigma_\mathcal{C}\mathcal{L} $ and $\Sigma_\mathcal{C}\mathcal{L} ^{\mathrm{op}} $ . We are particularly interested in the case of extensive indexed categories, a notion introduced in Section 6.6. For future reference, we start by recalling the general case: see, for instance, Gray (Reference Gray1966) for a basic reference on properties of the total categories.

Proposition 27 (Initial object in $\Sigma_\mathcal{C} \mathcal{L} $ ). Let $\mathcal{L} : \mathcal{C} ^{\mathrm{op}}\to\mathbf{Cat} $ be a strictly indexed category. We assume that

  • (i) $\mathcal{C} $ has initial object ${\mathbb {0}} $ ;

  • (ii) $\mathcal{L} ({\mathbb {0}} ) $ has initial object, denoted, by abuse of language, by ${\mathbb {0}} $ .

In this case, $\left( {\mathbb {0}} ,{\mathbb {0}} \right) $ is the initial object of $\Sigma_\mathcal{C} \mathcal{L} $ .

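Proof. Assuming the hypotheses above, given any object $(Y,y)\in\Sigma_\mathcal{C}\mathcal{L} $ , we have natural bijections:

\begin{align*} \Sigma_\mathcal{C}\mathcal{L}\left( ({\mathbb {0}},{\mathbb {0}}), (Y,y)\right) &= \coprod_{n\in\mathcal{C}({\mathbb {0}},Y)} \mathcal{L}({\mathbb {0}})\left({\mathbb {0}}, \mathcal{L}(n)(y)\right)\\ &\cong \coprod_{n\in\mathcal{C}({\mathbb {0}},Y)} \mathbb{1} \cong \mathcal{C}({\mathbb {0}},Y)\cong \mathbb{1}. \end{align*}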

Proposition 28 (Coproducts in $\Sigma_\mathcal{C} \mathcal{L} $ ). Let $\mathcal{L} : \mathcal{C} ^{\mathrm{op}}\to\mathbf{Cat} $ be a strictly indexed category. We assume that

  • (i) $ ((W_i,w_i)) _{i\in I} $ is a family of objects of $\Sigma_\mathcal{C}\mathcal{L}$ ;

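  • (ii) the category $\mathcal{C} $ has the coproduct:

    (6) \begin{equation} \left( W_t \xrightarrow{\iota _{W_t}} \coprod _{i\in I} W_i \right) _{t\in I}\end{equation}

    of the objects in $ \left( (W_i,w_i)\right) _{i\in I} $ ;
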
  • (iii) there is an adjunction $\mathcal{L} (\iota _ {W_i} )! \dashv \mathcal{L} (\iota _ {W_i} ) $ for each $i\in I$ ;

  • (iv) $\mathcal{L} \left( \displaystyle\coprod _{i\in I } W_i \right) $ has the coproduct $\displaystyle \coprod _{i\in I} \mathcal{L} (\iota _ {W_i} )! (w_i) $ of the objects $\left( \mathcal{L} (\iota _ {W_i} )! (w_i)\right) _{i\in I} $ .

In this case,

$$\left( \displaystyle\coprod _{i\in I } W_i ,\quad \displaystyle\coprod _{i\in I } \mathcal{L} (\iota _ {W_i} )! (w_i) \right) $$

is the coproduct of the objects $ \left( (W_i,w_i )\right) _{i\in I} $ in $\Sigma_\mathcal{C} \mathcal{L} $ .

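Proof. Assuming the hypotheses above, given any object $(Y,y)\in\Sigma_\mathcal{C}\mathcal{L} $ , we have natural bijections:

\begin{align*} \Sigma_\mathcal{C}\mathcal{L}\left( \left( \coprod _{j\in I } W_j , \coprod _{i\in I } \mathcal{L} (\iota _ {W_i} )! (w_i) \right), (Y,y)\right) &= \coprod_{f\in\mathcal{C}\left(\coprod _{j\in I } W_j, Y\right)} \mathcal{L}\left(\coprod _{j\in I } W_j\right)\left( \coprod _{i\in I } \mathcal{L} (\iota _ {W_i} )! (w_i), \mathcal{L}(f)(y)\right)\\ &\cong \coprod_{f\in\mathcal{C}\left(\coprod _{j\in I } W_j, Y\right)} \prod_{i\in I} \mathcal{L}\left(\coprod _{j\in I } W_j\right)\left( \mathcal{L} (\iota _ {W_i} )! (w_i), \mathcal{L}(f)(y)\right)\\ &\cong \coprod_{f\in\mathcal{C}\left(\coprod _{j\in I } W_j, Y\right)} \prod_{i\in I} \mathcal{L}(W_i)\left( w_i, \mathcal{L}(f\circ \iota _{W_i})(y)\right)\\ &\cong \prod_{i\in I} \coprod_{f_i\in\mathcal{C}(W_i, Y)} \mathcal{L}(W_i)\left( w_i, \mathcal{L}(f_i)(y)\right)\\ &= \prod_{i\in I} \Sigma_\mathcal{C}\mathcal{L}\left( (W_i,w_i), (Y,y)\right), \end{align*}

using, in order, the universal property of the coproduct in $\mathcal{L} \left( \coprod _{j\in I } W_j \right) $ , the adjunctions $\mathcal{L} (\iota _ {W_i} )! \dashv \mathcal{L} (\iota _ {W_i} ) $ , and the universal property of the coproduct in $\mathcal{C} $ .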

Codually, we get results on the initial objects and coproducts in the category $\Sigma_\mathcal{C} \mathcal{L} ^{\mathrm{op}} $ below.

Corollary 29 (Initial object in $\Sigma_\mathcal{C} \mathcal{L} ^{\mathrm{op}} $ ). Let $\mathcal{L} : \mathcal{C} ^{\mathrm{op}}\to\mathbf{Cat} $ be a strictly indexed category. We assume that

  • (i) $\mathcal{C} $ has initial object ${\mathbb {0}} $ ;

  • (ii) $\mathcal{L} ({\mathbb {0}} ) $ has terminal object $\mathbb{1} $ .

In this case, $\left( {\mathbb {0}} ,\mathbb{1} \right) $ is the initial object of $\Sigma_\mathcal{C} \mathcal{L} ^{\mathrm{op}} $ .

Corollary 30 (Coproducts in $\Sigma_\mathcal{C} \mathcal{L} ^{\mathrm{op}} $ ). Let $\mathcal{L} : \mathcal{C} ^{\mathrm{op}}\to\mathbf{Cat} $ be a strictly indexed category. We assume that

  • (i) $ ((W_i,w_i)) _{i\in I} $ is a family of objects of $\Sigma_\mathcal{C}\mathcal{L}$ ;

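  • (ii) the category $\mathcal{C} $ has the coproduct:

    (7) \begin{equation} \left( W_t \xrightarrow{\iota _{W_t}} \coprod _{i\in I} W_i \right) _{t\in I}\end{equation}

    of the objects in $ \left( (W_i,w_i)\right) _{i\in I} $ ;
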
  • (iii) there is an adjunction $\mathcal{L} (\iota _ {W_i} )\dashv \mathcal{L} (\iota _ {W_i} )^\ast $ for each $i\in I$ ;

  • (iv) $\mathcal{L} \left( \displaystyle\coprod _{i\in I } W_i \right) $ has the product $\displaystyle \prod _{i\in I} \mathcal{L} (\iota _ {W_i} )^\ast (w_i) $ of the objects $\left( \mathcal{L} (\iota _ {W_i} )^\ast (w_i)\right) _{i\in I} $ .

In this case,

$$\left( \displaystyle\coprod _{i\in I } W_i ,\quad \displaystyle\prod _{i\in I } \mathcal{L} (\iota _ {W_i} )^\ast (w_i) \right) $$

is the coproduct of the objects $ \left( (W_i,w_i )\right) _{i\in I} $ in $\Sigma_\mathcal{C} \mathcal{L} ^{\mathrm{op}} $ .
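
To illustrate Proposition 28 and Corollary 30 side by side, here is a sketch in an assumed example (chosen only for illustration): $\mathcal{C}=\mathbf{Set}$ , $\mathcal{L}(W)=\mathbf{Vect}^W$ is the category of $W$-indexed families of vector spaces, and $I=\{1,2\}$ . Write $[w_1,w_2]$ for the family over $W_1\sqcup W_2$ that restricts to $w_i$ on $W_i$ , and $0$ for a constant family of zero vector spaces. Both adjoints to restriction then extend a family by zero:

\begin{align*} \mathcal{L}(\iota_{W_1})!(w_1) &\cong \mathcal{L}(\iota_{W_1})^\ast(w_1)\cong [w_1, 0], \\ \mathcal{L}(\iota_{W_2})!(w_2) &\cong \mathcal{L}(\iota_{W_2})^\ast(w_2)\cong [0, w_2], \\ [w_1,0]\sqcup[0,w_2] &\cong [w_1,0]\times[0,w_2]\cong [w_1,w_2]. \end{align*}

So, in this example, the coproduct of $(W_1,w_1)$ and $(W_2,w_2)$ is $(W_1\sqcup W_2,[w_1,w_2])$ both in $\Sigma_\mathcal{C}\mathcal{L}$ and in $\Sigma_\mathcal{C}\mathcal{L}^{\mathrm{op}}$ . The coincidence relies on the zero space being both initial and terminal in $\mathbf{Vect}$ .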

6.6 Extensive indexed categories and coproducts in total categories

We introduce a special property that fits our context well. We call this property extensivity because it generalizes the concept of extensive categories (see Section 6.12 for the notion of extensive category).

As we will show, the property of extensivity is a crucial requirement for our models. One significant advantage of this property is that it allows us to easily construct coproducts in the total categories, even under lenient conditions. We demonstrate this in Corollary 35.

  • We assume that the category $\mathcal{C} $ has finite coproducts. Given $ W, X\in\mathcal{C} $ , we denote by:

(8) \begin{equation} W\xrightarrow{\iota _W} W\sqcup X \xleftarrow{\iota _X} X\end{equation}

the coproduct (and coprojections) in $\mathcal{C} $ , and by ${\mathbb {0}} $ the initial object of $\mathcal{C} $ .

Definition 31. (Extensive indexed categories). We call an indexed category $\mathcal{L} : \mathcal{C} ^{\mathrm{op}} \to \mathbf{Cat} $ extensive if, for any $(W,X)\in\mathcal{C}\times\mathcal{C} $ , the unique functor:

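(9) \begin{equation} {\left(\mathcal{L} (\iota _ W), \mathcal{L} (\iota _ X)\right)} : \mathcal{L} (W\sqcup X)\to \mathcal{L} (W)\times \mathcal{L} (X)\end{equation}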

induced by the functors:

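(10) \begin{equation} \mathcal{L} (\iota _ W) : \mathcal{L} (W\sqcup X)\to \mathcal{L} (W), \qquad \mathcal{L} (\iota _ X) : \mathcal{L} (W\sqcup X)\to \mathcal{L} (X)\end{equation}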

is an equivalence. In this case, for each $(W,X)\in\mathcal{C}\times\mathcal{C} $ , we denote by:

(11) \begin{equation} \mathcal{S} ^{(W,X)} : \mathcal{L} (W)\times \mathcal{L} (X)\to \mathcal{L} (W\sqcup X)\end{equation}

an inverse equivalence of ${\left(\mathcal{L} (\iota _ W), \mathcal{L} (\iota _ X)\right)} $ .

Since the products of $\mathcal{C} ^{\mathrm{op}} $ are the coproducts of $\mathcal{C} $ , the extensive condition described above is equivalent to saying that the (pseudo)functor $\mathcal{L} : \mathcal{C} ^{\mathrm{op}} \to \mathbf{Cat} $ preserves binary (bicategorical) products (up to equivalence).

Since our cases of interest are strict, this leads us to consider strict extensivity, that is to say, whenever we talk about extensive strictly indexed categories, we are assuming that (9) is invertible. In this case, it is even clearer that extensivity coincides with the well-known notion of preservation of binary products.
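
For orientation, here is a sketch of one example, under the assumption (made only for illustration) that $\mathcal{C}=\mathbf{Set}$ and that $\mathcal{L}(W)=\mathbf{Vect}^{W}$ is the category of $W$-indexed families of vector spaces, with $\mathcal{L}(f)$ given by reindexing along $f$ . A family indexed by a disjoint union is the same thing as a pair of families, so the comparison functor has an inverse:

\begin{align*} \mathbf{Vect}^{W\sqcup X} &\to \mathbf{Vect}^{W}\times \mathbf{Vect}^{X}, & (v_c)_{c\in W\sqcup X} &\mapsto \left( (v_{\iota_W(a)})_{a\in W}, (v_{\iota_X(b)})_{b\in X}\right),\\ \mathcal{S} ^{(W,X)} : \mathbf{Vect}^{W}\times \mathbf{Vect}^{X} &\to \mathbf{Vect}^{W\sqcup X}, & \left( (w_a)_{a\in W}, (x_b)_{b\in X}\right) &\mapsto [w,x], \end{align*}

where $[w,x]$ denotes the family with $[w,x]_{\iota_W(a)}=w_a$ and $[w,x]_{\iota_X(b)}=x_b$ . Under this assumption, $\mathcal{L}$ is strictly extensive.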

Lemma 32 (Extensive strictly indexed categories). Let $\mathcal{L} : \mathcal{C} ^{\mathrm{op}} \to \mathbf{Cat} $ be an indexed category. $\mathcal{L} $ is strictly extensive if, and only if, $\mathcal{L} $ is a functor that preserves binary products.

Recall that, in general, preservation of binary products implies preservation of preterminal objects; see, for instance, Lucatelli Nunes (Reference Lucatelli Nunes2022, Remark 4.14). Lemma 33 is the appropriate analog of this observation suitably applied to the context of extensive indexed categories. Moreover, Lemma 33 can be seen as a generalization of Carboni et al. (Reference Carboni, Lack and Walters1993, Proposition 2.8).

Lemma 33 (Preservation of terminal objects). Let $ \mathcal{L} : \mathcal{C} ^{\mathrm{op}} \to \mathbf{Cat} $ be an extensive indexed category which is not (naturally isomorphic to the functor) constantly equal to ${\mathbb {0}} $ . The unique functor

(12) \begin{equation} \mathcal{L} ({\mathbb {0}} )\to \mathbb{1} \end{equation}

is an equivalence. If, furthermore, (9) is an isomorphism, then (12) is invertible.

Proof. Firstly, given any $X\in\mathcal{C} $ such that $\mathcal{L} (X) $ is not (isomorphic to) the initial object of $\mathbf{Cat} $ , we have that $\mathcal{L} ( i_X : {\mathbb {0}}\to X ) $ is a functor from $\mathcal{L} (X) $ to $\mathcal{L} ({\mathbb {0}} ) $ . Hence, $\mathcal{L} ({\mathbb {0}} ) $ is not isomorphic to the initial category either.

Secondly, since $\iota _{{\mathbb {0}} } : {\mathbb {0}} \to {\mathbb {0}}\sqcup {\mathbb {0}} $ is an isomorphism, $\left(\mathcal{L} (\iota _ {{\mathbb {0}}} ), \mathcal{L} (\iota _ {\mathbb {0}} )\right)$ is an equivalence and

(13)

we conclude that $ \pi _{\mathcal{L}({\mathbb {0}} )} $ is an equivalence. This proves that $\mathcal{L} ({\mathbb {0}} ) \to \mathbb{1} $ is an equivalence by Appendix A, Lemma 132.
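
As a quick sanity check, consider the example (assumed here only for illustration) in which $\mathcal{C}=\mathbf{Set}$ and $\mathcal{L}(W)=\mathbf{Vect}^W$ is the category of $W$-indexed families of vector spaces. The conclusion of Lemma 33 can then be verified directly:

$$\mathcal{L} ({\mathbb {0}} )=\mathbf{Vect}^{\emptyset}\cong \mathbb{1},$$

since the only $\emptyset$-indexed family is the empty one, and its only endomorphism is the identity.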

We proceed to study the cocartesian structure of $\Sigma_\mathcal{C} \mathcal{L} $ (and $\Sigma_\mathcal{C} \mathcal{L} ^{\mathrm{op}} $ ) when $\mathcal{L} $ is extensive. We start by proving in Theorem 34 that, in the case of extensive indexed categories, the adjunction hypothesis of Proposition 28 always holds.

Theorem 34. Let $\mathcal{L} : \mathcal{C} ^{\mathrm{op}} \to \mathbf{Cat} $ be an extensive (strictly) indexed category. Assume that X is an object of $\mathcal{C} $ such that $\mathcal{L} (X) $ has initial object ${\mathbb {0}} $ . In this case, for any $W\in \mathcal{C} $ , we have an adjunction:

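(14) \begin{equation} \mathcal{S} ^{(W,X)}\circ \left({\mathrm{id}}_{\mathcal{L} (W)}, {\mathbb {0}} \right) \dashv \mathcal{L} (\iota _W ) : \mathcal{L} (W)\leftrightarrows \mathcal{L} (W\sqcup X)\end{equation}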

in which, by abuse of language, ${\mathbb {0}} : \mathcal{L} (W)\to \mathcal{L} (X) $ is the functor constantly equal to ${\mathbb {0}} $ . Dually, we have an adjunction:

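(15) \begin{equation} \mathcal{L} (\iota _W ) \dashv \mathcal{S} ^{(W,X)}\circ \left({\mathrm{id}}_{\mathcal{L} (W)}, \mathbb{1} \right) : \mathcal{L} (W\sqcup X)\leftrightarrows \mathcal{L} (W)\end{equation}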

provided that $\mathcal{L} (X) $ has terminal object $\mathbb{1} $ and, by abuse of language, we denote by $\mathbb{1} : \mathcal{L} (W)\to \mathcal{L} (X) $ the functor constantly equal to $\mathbb{1} $ .

Proof. Assuming that $\mathcal{L} (X) $ has initial object ${\mathbb {0}} $ , we have the adjunction:

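(16) \begin{equation} \left({\mathrm{id}}_{\mathcal{L} (W)}, {\mathbb {0}} \right) \dashv \pi _{\mathcal{L} (W)} : \mathcal{L} (W)\leftrightarrows \mathcal{L} (W)\times \mathcal{L} (X)\end{equation}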

whose unit is the identity and counit is pointwise given by $ \varepsilon _{(w,x)} = ({\mathrm{id}}_w, {\mathbb {0}} \to x ) $ . Therefore, we have the composition of adjunctions:

Corollary 35 (Cocartesian structure of $\Sigma_\mathcal{C} \mathcal{L} $ ). Let $\mathcal{L} : \mathcal{C} ^{\mathrm{op}} \to \mathbf{Cat} $ be an extensive strictly indexed category, with initial objects ${\mathbb {0}} \in\mathcal{L} (W)$ for each $W\in \mathcal{C} $ . In this case, the category $\Sigma_\mathcal{C} \mathcal{L} $ has initial object ${\mathbb {0}} = ({\mathbb {0}} , {\mathbb {0}} )\in \Sigma_\mathcal{C} \mathcal{L}$ , and (fibered) binary coproduct given by $(W, w)\sqcup (X,x) = \left( W\sqcup X, \mathcal{S} ^{(W,X)} (w,x) \right) $ .

Proof. In fact, by Proposition 27, we have that $({\mathbb {0}} , {\mathbb {0}} )$ is the initial object of $\Sigma_\mathcal{C} \mathcal{L} $ . Moreover, given $\left( (W,w), (X,x)\right) \in \Sigma_\mathcal{C}\mathcal{L} \times \Sigma_\mathcal{C}\mathcal{L} $ , we have that

\begin{eqnarray*}\mathcal{S} ^{(W,X)}\circ \left({\mathrm{id}}_{\mathcal{L} (W)}, {\mathbb {0}} \right) = \mathcal{L} (\iota _W )! &\dashv &\mathcal{L} (\iota _W )\\\mathcal{S} ^{(W,X)}\circ \left( {\mathbb {0}} , {\mathrm{id}}_{\mathcal{L} (X)}\right) = \mathcal{L} (\iota _X )! &\dashv &\mathcal{L} (\iota _X )\end{eqnarray*}

by Theorem 34. Therefore, we get that