1 Introduction
In today’s programming languages, we find a wealth of powerful constructs and features (exceptions, higher-order store, dynamic method dispatch, coroutines, explicit continuations, concurrency features, Lisp-style ‘quote’ and so on) which may be present or absent in various combinations in any given language. There are, of course, many important pragmatic and stylistic differences between languages, but here we are concerned with whether languages may differ more essentially in their expressive power, according to the selection of features they contain.
One can interpret this question in various ways. For instance, Felleisen (1991) considers the question of whether a language $\mathcal{L}$ admits a translation into a sublanguage $\mathcal{L}'$ in a way which respects not only the behaviour of programs but also aspects of their (global or local) syntactic structure. If the translation of some $\mathcal{L}$ program into $\mathcal{L}'$ requires a complete global restructuring, we may say that $\mathcal{L}'$ is in some way less expressive than $\mathcal{L}$. In the present paper, however, we have in mind even more fundamental expressivity differences that would not be bridged even if whole-program translations were admitted. These fall under two headings.

1. Computability: Are there operations of a given type that are programmable in $\mathcal{L}$ but not expressible at all in $\mathcal{L}'$?

2. Complexity: Are there operations programmable in $\mathcal{L}$ with some asymptotic runtime bound (e.g. $\mathcal{O}(n^2)$) that cannot be achieved in $\mathcal{L}'$?
We may also ask: are there examples of natural, practically useful operations that manifest such differences? If so, this might be considered as a significant advantage of $\mathcal{L}$ over $\mathcal{L}'$.
If the ‘operations’ we are asking about are ordinary first-order functions, that is, functions whose inputs and outputs are both of ground type (strings, arbitrary-size integers, etc.), then the situation is easily summarised. At such types, all reasonable languages give rise to the same class of programmable functions, namely, the Church-Turing computable ones. As for complexity, the runtime of a program is typically analysed with respect to some cost model for basic instructions (e.g. one unit of time per array access). Although the realism of such cost models in the asymptotic limit can be questioned (see, e.g., Knuth, 1997, Section 2.6), it is broadly taken as read that such models are equally applicable whatever programming language we are working with, and moreover that all respectable languages can represent all algorithms of interest; thus, one does not expect the best achievable asymptotic runtime for a typical algorithm to be sensitive to the choice of programming language, except perhaps in marginal cases.
The situation changes radically, however, if we consider higher-order operations: that is, programmable operations whose inputs may themselves be programmable operations. Here it turns out that both what is computable and the efficiency with which it can be computed can be highly sensitive to the selection of language features present. This is essentially because a program may interact with a given function only in ways prescribed by the language (for instance, by applying it to an argument), and typically has no access to the concrete representation of the function at the machine level.
Most work in this area to date has focused on computability differences. One of the best known examples is the parallel if operation which is computable in a language with parallel evaluation but not in a typical ‘sequential’ programming language (Plotkin, 1977). It is also well known that the presence of control features or local state enables observational distinctions that cannot be made in a purely functional setting: for instance, there are programs involving ‘call/cc’ that detect the order in which a (call-by-name) ‘+’ operation evaluates its arguments (Cartwright & Felleisen, 1992). Such operations are ‘non-functional’ in the sense that their output is not determined solely by the extension of their input (seen as a mathematical function $\mathbb{N}_\bot \times \mathbb{N}_\bot \rightarrow \mathbb{N}_\bot$); however, there are also programs with ‘functional’ behaviour that can be implemented with control or local state but not without them (Longley, 1999). More recent results have exhibited differences lower down in the language expressivity spectrum: for instance, in a purely functional setting à la Haskell, the expressive power of recursion increases strictly with its type level (Longley, 2018), and there are natural operations computable by recursion but not by iteration (Longley, 2019). Much of this territory, including the mathematical theory of some of the natural definitions of computability in a higher-order setting, is mapped out by Longley & Normann (2015).
Relatively few results of this character have so far been established on the complexity side. Pippenger (1996) gives an example of an ‘online’ operation on infinite sequences of atomic symbols (essentially a function from streams to streams) such that the first n output symbols can be produced within time $\mathcal{O}(n)$ if one is working in an ‘impure’ version of Lisp (in which mutation of ‘cons’ pairs is admitted), but with a worst-case runtime no better than $\Omega(n \log n)$ for any implementation in pure Lisp (without such mutation). This example was reconsidered by Bird et al. (1997) who showed that the same speedup can be achieved in a pure language by using lazy evaluation. Another candidate is the familiar $\log n$ overhead involved in implementing maps (supporting lookup and extension) in a pure functional language (Okasaki, 1999), although to our knowledge this situation has not yet been subjected to theoretical scrutiny. Jones (2001) explores the approach of manifesting expressivity and efficiency differences between certain languages by restricting attention to ‘cons-free’ programs; in this setting, the classes of representable first-order functions for the various languages are found to coincide with some well-known complexity classes.
Our purpose in this paper is to give a clear example of such an inherent complexity difference higher up in the expressivity spectrum. Specifically, we consider the following generic count problem, parametric in n: given a boolean-valued predicate P on the space ${\mathbb B}^n$ of boolean vectors of length n, return the number of such vectors q for which $P\,q = \mathrm{true}$. We shall consider boolean vectors of any length to be represented by the type $\mathrm{Nat} \to \mathrm{Bool}$; thus for each n, we are asking for an implementation of a certain third-order operation
$$\mathrm{count}_n : ((\mathrm{Nat} \to \mathrm{Bool}) \to \mathrm{Bool}) \to \mathrm{Nat}.$$
Naturally, we do not expect such a generic operation to compete in efficiency with a bespoke counting operation for some specific predicate, but it is nonetheless interesting to ask how efficient it is possible to be with this more modular approach.
A naïve implementation strategy, supported by any reasonable language, is simply to apply P to each of the $2^n$ vectors in turn. A much less obvious, but still purely ‘functional’, approach inspired by Berger (1990) achieves the effect of ‘pruned search’ where the predicate allows it (serving as a warning that counterintuitive phenomena can arise in this territory). This implementation is of interest in its own right and will be discussed in Section 7. Nonetheless, under a certain natural condition on P (namely that it must inspect all n components of the given vector before returning), both the above approaches will have $\Omega(n 2^n)$ runtime.
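The naïve strategy can be sketched in a few lines of Python, used here purely for illustration. The names `naive_count` and `parity` and the `stats` counter are hypothetical and do not come from the paper; `parity` is one example of a predicate that inspects all n components before returning.

```python
from itertools import product

def naive_count(pred, n, stats=None):
    """Apply pred to each of the 2**n boolean vectors of length n."""
    total = 0
    for bits in product((True, False), repeat=n):
        # A vector is presented to pred as a function from indices to booleans.
        def point(i, bits=bits):
            if stats is not None:
                stats["queries"] += 1  # record each component lookup
            return bits[i]
        if pred(point):
            total += 1
    return total

def parity(n):
    """Predicate that is true iff an odd number of the n components are true."""
    def pred(q):
        acc = False
        for i in range(n):
            acc ^= q(i)  # inspects every component exactly once
        return acc
    return pred

stats = {"queries": 0}
result = naive_count(parity(3), 3, stats)
```

For such a predicate the sketch performs n lookups for each of the $2^n$ vectors, which is where the $n 2^n$ factor arises.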
What we will show is that in a typical call-by-value functional language without advanced control features, one cannot improve on this: any implementation of $\mathrm{count}_n$ must necessarily take time $\Omega(n2^n)$ on predicates P of a certain kind. Furthermore, we will show that the same lower bound also applies to a richer language supporting affine effect handlers, which suffices for the encoding of exceptions, local state, coroutines, and single-shot continuations. On the other hand, if we move to a language with general effect handlers, it becomes possible to bring the runtime down to $\mathcal{O}(2^n)$: an asymptotic gain of a factor of n. We also show that our implementation method transfers to the more familiar generic search problem: that of returning the list of all vectors q such that $P\,q = \mathrm{true}$.
The idea behind the speedup is easily explained and will already be familiar, at least informally, to programmers who have worked with multi-shot continuations. Suppose for example $n=3$, and suppose that the predicate P always inspects the components of its argument in the order 0,1,2. A naïve implementation of $\mathrm{count}_3$ might start by applying the given P to $q_0 = (\mathrm{true},\mathrm{true},\mathrm{true})$, and then to $q_1 = (\mathrm{true},\mathrm{true},\mathrm{false})$. Clearly, there is some duplication here: the computations of $P\,q_0$ and $P\,q_1$ will proceed identically up to the point where the value of the final component is requested. What we would like to do, then, is to record the state of the computation of $P\,q_0$ at just this point, so that we can later resume this computation with false supplied as the final component value in order to obtain the value of $P\,q_1$. (Similarly for all other internal nodes in the evident binary tree of boolean vectors.) Of course, such a ‘backup’ approach is easy to realise if one is implementing a bespoke search operation for some particular choice of P; but to apply this idea of resuming previous subcomputations in the generic setting (that is, uniformly in P) requires some feature such as general effect handlers or multi-shot continuations.
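The sharing of work at internal nodes can be made visible in a small Python sketch in which the predicate is presented not as a function but as an explicit binary decision tree. This is only an illustration of the ‘backup’ idea for a bespoke presentation, not the generic setting; the names `Answer`, `Query`, `parity_tree` and `count_tree` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Answer:
    value: bool          # leaf: the predicate's final result

@dataclass
class Query:
    index: int           # internal node: which component is requested
    if_true: "Tree"      # rest of the computation if the component is true
    if_false: "Tree"     # rest of the computation if the component is false

Tree = Union[Answer, Query]

def parity_tree(n, i=0, acc=False):
    """Decision tree of a predicate that inspects components 0..n-1 in order."""
    if i == n:
        return Answer(acc)
    return Query(i, parity_tree(n, i + 1, not acc), parity_tree(n, i + 1, acc))

def count_tree(tree, stats):
    """Count the true leaves, visiting each internal node exactly once."""
    if isinstance(tree, Answer):
        return 1 if tree.value else 0
    stats["queries"] += 1  # the work up to this node is shared by both subtrees
    return count_tree(tree.if_true, stats) + count_tree(tree.if_false, stats)

stats = {"queries": 0}
result = count_tree(parity_tree(3), stats)
```

For $n=3$ the traversal answers 7 queries (one per internal node, $2^n - 1$ in general), against the 24 component lookups made when the predicate is rerun from scratch on each of the 8 vectors.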
One could also obviate the need for such a feature by choosing to present the predicate P in some other way, but from our present perspective this would be to move the goalposts: our intention is precisely to show that our languages differ in an essential way as regards their power to manipulate data of type $(\mathrm{Nat} \to \mathrm{Bool}) \to \mathrm{Bool}$. Indeed, a key aspect of our approach, inherited from Longley & Normann (2015), is that by allowing ourselves to fix the way in which data is given to us, we are able to uncover a wealth of interesting expressivity differences between languages, despite the fact that they are in some sense inter-encodable. Such an approach also seems reasonable from the perspective of programming in the large: when implementing some program module one does not always have the freedom to choose the form or type of one’s inputs, and in such cases, the kinds of expressivity distinctions we are considering may potentially make a real practical difference.
This idea of using first-class control to achieve ‘backtracking’ has been exploited before and is fairly widely known (see e.g. Kiselyov et al., 2005), and there is a clear programming intuition that this yields a speedup unattainable in languages without such control features. Our main contribution in this paper is to provide, for the first time, a precise mathematical theorem that pins down this fundamental efficiency difference, thus giving formal substance to this intuition. Since our goal is to give a realistic analysis of the asymptotic runtimes achievable in various settings, but without getting bogged down in inessential implementation details, we shall work concretely and operationally with a CEK-style abstract machine semantics as our basic model of execution time. The details of this model are only explicitly used for showing that our efficient implementation of generic count with effect handlers has the claimed $\mathcal{O}(2^n)$ runtime; but it also plays a background role as our reference model of runtime for the $\Omega(n2^n)$ lower bound results, even though we here work mostly with a simpler kind of operational semantics.
In the first instance, we formulate our results as a comparison between a purely functional base language ${\lambda_{\textrm{b}}}$ (a version of call-by-value PCF) and an extension ${\lambda_{\textrm{h}}}$ with general effect handlers. This allows us to present the key idea in a simple setting, but we then show how our runtime lower bound is also applicable to a more sophisticated language ${\lambda_{\textrm{a}}}$ with affine effect handlers, intermediate in power between ${\lambda_{\textrm{b}}}$ and ${\lambda_{\textrm{h}}}$ and corresponding broadly to ‘single-shot’ uses of delimited control. Our proof involves some general machinery for reasoning about program evaluation in ${\lambda_{\textrm{a}}}$ which may be of independent interest.
In summary, our purpose is to exhibit an efficiency difference between single-shot and multi-shot versions of delimited control which, in our view, manifests a fundamental feature of the programming language landscape. Since many widely used languages do not support multi-shot control features, this challenges a common assumption that all real-world programming languages are essentially ‘equivalent’ from an asymptotic point of view. We also situate our results within a broader context by informally discussing the attainable efficiency for generic count within a spectrum of weaker languages. We believe that such results are important not only for a rounded understanding of the relative merits of existing languages but also for informing future language design.
For their convenience as structured delimited control operators, we adopt effect handlers (Plotkin & Pretnar, 2013) as our universal control abstraction of choice, but our results adapt mutatis mutandis to other first-class control abstractions such as ‘call/cc’ (Sperber et al., 2009), ‘control’ ($\mathcal{F}$) and ‘prompt’ ($\#$) (Felleisen, 1988), or ‘shift’ and ‘reset’ (Danvy & Filinski, 1990).
The rest of the paper is structured as follows.

Section 2 provides an introduction to effect handlers as a programming abstraction.

Section 3 presents a pure PCF-like language ${\lambda_{\textrm{b}}}$ and an extension ${\lambda_{\textrm{h}}}$ with general effect handlers.

Section 4 defines abstract machines for ${\lambda_{\textrm{b}}}$ and ${\lambda_{\textrm{h}}}$, yielding a runtime cost model.

Section 5 introduces the generic count problem and some associated machinery and presents an implementation in ${\lambda_{\textrm{h}}}$ with runtime $\mathcal{O}(2^n)$ (perhaps with small additional logarithmic factors according to the precise details of the cost model).

Section 6 discusses some extensions and variations of the foregoing result, adapting it to deal with a wider class of predicates and bridging the gap between generic count and generic search. We also briefly outline how one can use sufficient effect polymorphism to adapt the result to a setting with a type-and-effect system.

Section 7 surveys a range of approaches to generic counting in languages weaker than ${\lambda_{\textrm{h}}}$, including the one suggested by Berger (1990), emphasising how the attainable efficiency varies according to the language, but observing that none of these approaches match the $\mathcal{O}(2^n)$ runtime bound of our effectful implementation.

Section 8 establishes that any generic count implementation within ${\lambda_{\textrm{b}}}$ must have runtime $\Omega(n2^n)$ on predicates of a certain kind.

Section 9 refines our definition of ${\lambda_{\textrm{h}}}$ to yield a language ${\lambda_{\textrm{a}}}$ for affine effect handlers, clarifying its relationship to ${\lambda_{\textrm{b}}}$ and ${\lambda_{\textrm{h}}}$.

Section 10 develops some machinery for reasoning about program evaluation in ${\lambda_{\textrm{a}}}$, and applies this to establish the $\Omega(n2^n)$ bound for generic count programs in this language.

Section 11 reports on experiments showing that the theoretical efficiency difference we describe is manifested in practice, using implementations in OCaml of various search procedures.

Section 12 concludes.
The languages ${\lambda_{\textrm{b}}}$ and ${\lambda_{\textrm{h}}}$ are rather minimal versions of previously studied systems: we include only the machinery needed for illustrating the generic search efficiency phenomenon. Some of the less interesting proof details are relegated to the appendices.
Relation to prior work. This article is an extended version of the following previously published paper and Chapter 7 of the first author’s PhD dissertation:

Hillerström, D., Lindley, S. & Longley, J. (2020) Effects for efficiency: Asymptotic speedup with first-class control. Proc. ACM Program. Lang. 4(ICFP), 100:1–100:29.

Hillerström, D. (2021) Foundations for Programming and Implementing Effect Handlers. Ph.D. thesis. The University of Edinburgh, Scotland, UK
The main new contribution in the present version is that we introduce a language ${\lambda_{\textrm{a}}}$ for arbitrary affine effect handlers and develop the theory needed to extend our lower bound result to this language (Section 9), whereas in the previous version, only an extension with local state was considered. We have also included an account of the Berger search procedure (Section 7.3) and have simplified our original proof of the $\Omega(n2^n)$ bound for ${\lambda_{\textrm{b}}}$ (Section 8). The benchmarks have been ported to OCaml 5.1.1 in such a way that the effectful procedures make use of effect handlers internally (Section 11).
2 Effect handlers primer
Effect handlers were originally studied as a theoretical means to provide a semantics for exception handling in the setting of algebraic effects (Plotkin & Power, 2001; Plotkin & Pretnar, 2013). Subsequently, they have emerged as a practical programming abstraction for modular effectful programming (Convent et al., 2020; Kammar et al., 2013; Kiselyov et al., 2013; Bauer & Pretnar, 2015; Leijen, 2017; Hillerström et al., 2020; Sivaramakrishnan et al., 2021). In this section, we give a short introduction to effect handlers. For a thorough introduction to programming with effect handlers, we recommend the tutorial by Pretnar (2015), and as an introduction to the mathematical foundations of handlers, we refer the reader to the founding paper by Plotkin & Pretnar (2013) and the excellent tutorial paper by Bauer (2018).
Viewed through the lens of universal algebra, an algebraic effect is given by a signature $\Sigma$ of typed operation symbols along with an equational theory that describes the properties of the operations (Plotkin & Power, 2001). An example of an algebraic effect is nondeterminism, whose signature consists of a single nondeterministic choice operation: $\Sigma \mathrel{\overset{{\mbox{def}}}{=}} \{ \mathrm{Branch} : \mathrm{Unit} \to \mathrm{Bool} \}$. The operation takes a single parameter of type unit and ultimately produces a boolean value. The pragmatic programmatic view of algebraic effects differs from the original development as no implementation accounts for equations over operations yet.
As a simple example, let us use the operation Branch to model a coin toss. Suppose we have a data type $\mathrm{Toss} \mathrel{\overset{{\mbox{def}}}{=}} \mathrm{Heads} \mid\mathrm{Tails}$; then we may implement a coin toss as follows.
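A definition consistent with the surrounding description can be sketched as below. This is an assumed rendering written in direct style, not a verbatim reproduction; the mapping of true to Heads is inferred from the later walkthrough of the handler.

```latex
\[
\begin{array}{l}
\mathrm{toss} : \mathrm{Unit} \to \mathrm{Toss} \\
\mathrm{toss}\;\langle\rangle \mathrel{\overset{\mbox{def}}{=}}
  \mathbf{if}\;\mathbf{do}\;\mathrm{Branch}\;\langle\rangle\;
  \mathbf{then}\;\mathbf{return}\;\mathrm{Heads}\;
  \mathbf{else}\;\mathbf{return}\;\mathrm{Tails}
\end{array}
\]
```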
From the type signature, it is clear that the computation returns a value of type Toss. It is not clear from the signature of toss whether it performs an effect. However, from the definition, it evidently performs the operation Branch with argument $\langle \rangle$ using the $\mathbf{do}$ invocation form. The result of the operation determines whether the computation returns either Heads or Tails. Systems such as Effekt (Brachthäuser et al., 2020), Frank (Lindley et al., 2017; Convent et al., 2020), Helium (Biernacki et al., 2019, 2020), Koka (Leijen, 2017), and Links (Hillerström & Lindley, 2016; Hillerström et al., 2020) include type-and-effect systems (or in the case of Effekt a capability type system) which track the use of effectful operations, whilst systems such as Eff (Bauer & Pretnar, 2015) and Multicore OCaml (Dolan et al., 2015)/OCaml 5 (Sivaramakrishnan et al., 2021) choose not to track effects in the type system. Our language is closer to the latter two.
An effectful computation may be used as a subcomputation of another computation, e.g. we can use toss to implement a computation that performs two coin tosses.
We may view an effectful computation as a tree, where the interior nodes correspond to operation invocations and the leaves correspond to return values. The computation tree for tossTwice is as follows.
It models the interaction with the environment. The operation Branch can be viewed as a query for which the response is either true or false. The response is provided by an effect handler. As an example, consider the following handler which enumerates the possible outcomes of two coin tosses.
The $\mathbf{handle}$ construct generalises the exceptional syntax of Benton & Kennedy (2001). This handler has a success clause and an operation clause. The success clause determines how to interpret the return value of tossTwice, or equivalently how to interpret the leaves of its computation tree. It lifts the return value into a singleton list. The operation clause determines how to interpret occurrences of Branch in toss. It provides access to the argument of Branch (which is unit) and its resumption, r. The resumption is a first-class delimited continuation which captures the remainder of the tossTwice computation from the invocation of Branch inside the first instance of toss up to its nearest enclosing handler.
Applying r to true resumes evaluation of tossTwice via the true branch, which causes another invocation of Branch to occur, resulting in yet another resumption. Applying this resumption yields a possible return value of $[\mathrm{Heads},\mathrm{Heads}]$, which gets lifted into the singleton list $[[\mathrm{Heads},\mathrm{Heads}]]$. Afterwards, the latter resumption is applied to false, thus producing the value $[[\mathrm{Heads},\mathrm{Tails}]]$. Before returning to the first invocation of the initial resumption, the two lists get concatenated to obtain the intermediary result $[[\mathrm{Heads},\mathrm{Heads}],[\mathrm{Heads},\mathrm{Tails}]]$. Thereafter, the initial resumption is applied to false, which symmetrically returns the list $[[\mathrm{Tails},\mathrm{Heads}],[\mathrm{Tails},\mathrm{Tails}]]$. Finally, the two intermediary lists get concatenated to produce the final result $[[\mathrm{Heads},\mathrm{Heads}],[\mathrm{Heads},\mathrm{Tails}],[\mathrm{Tails},\mathrm{Heads}],[\mathrm{Tails},\mathrm{Tails}]]$.
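The evaluation just traced can be replayed in Python by representing computations explicitly as trees, with operation nodes carrying their resumption as an ordinary function. This is a sketch of the tree view of effectful computations under that encoding, not native effect handlers; the names `Return`, `Op`, `bind`, `toss_twice` and `all_results` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Return:
    value: Any                      # leaf of the computation tree

@dataclass
class Op:
    label: str                      # operation symbol, e.g. "Branch"
    arg: Any                        # operation argument
    resume: Callable[[Any], Any]    # maps the operation's result to the remaining tree

def bind(m, f):
    """Sequencing: graft the computation f(v) onto every leaf v of m."""
    if isinstance(m, Return):
        return f(m.value)
    return Op(m.label, m.arg, lambda x: bind(m.resume(x), f))

def toss():
    return Op("Branch", (), lambda b: Return("Heads" if b else "Tails"))

def toss_twice():
    return bind(toss(), lambda first:
                bind(toss(), lambda second:
                     Return([first, second])))

def all_results(m):
    """Handler: a return value x becomes [x]; Branch resumes twice and concatenates."""
    if isinstance(m, Return):
        return [m.value]
    if m.label == "Branch":
        # The resumption is itself handled, as in a deep handler.
        r = lambda b: all_results(m.resume(b))
        return r(True) + r(False)
    raise ValueError(f"unhandled operation {m.label}")

outcomes = all_results(toss_twice())
```

Here `outcomes` lists the four results in the order derived above, starting with the pair of heads and ending with the pair of tails. In this encoding invoking a resumption twice is unremarkable because it is a plain function; the point of the calculus is that a handler obtains this capability for computations written in direct style.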
3 Calculi
In this section, we present our base language ${\lambda_{\textrm{b}}}$ and its extension with effect handlers ${\lambda_{\textrm{h}}}$.
3.1 Base calculus
The base calculus ${\lambda_{\textrm{b}}}$ is a fine-grain call-by-value (Levy et al., 2003) variation of PCF (Plotkin, 1977). Fine-grain call-by-value is similar to A-normal form (Flanagan et al., 1993) in that every intermediate computation is named, but unlike A-normal form is closed under reduction.
The syntax of ${\lambda_{\textrm{b}}}$ is as follows.
The ground types are Nat and Unit which classify natural number values and the unit value, respectively. The function type $A \to B$ classifies functions that map values of type A to values of type B. The binary product type $A \times B$ classifies pairs of values whose first and second components have types A and B, respectively. The sum type $A + B$ classifies tagged values of either type A or B. Type environments $\Gamma$ map term variables to their types. For hygiene, we require that the variables appearing in a type environment are distinct.
We let k range over natural numbers and c range over primitive operations on natural numbers ($+, -, =$). We let x, y, z range over term variables. We also use f, g, h, q for variables of function type and i, j for variables of type Nat. The value terms are standard.
All elimination forms are computation terms. Abstraction is eliminated using application ($V\,W$). The product eliminator $(\mathbf{let} \; \langle x,y \rangle = V \; \mathbf{in} \; N)$ splits a pair V into its constituents and binds them to x and y, respectively. Sums are eliminated by a case split ($\mathbf{case}\; V\;\{\mathbf{inl}\; x \mapsto M; \mathbf{inr}\; y \mapsto N\}$). A trivial computation $(\mathbf{return}\;V)$ returns value V. The sequencing expression $(\mathbf{let} \; x \leftarrow M \; \mathbf{in} \; N)$ evaluates M and binds the result value to x in N.
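Collecting the forms described in prose, the syntax can be summarised by a grammar along the following lines. This is a sketch assembled from the description above: the value forms (in particular the recursion construct $\mathbf{rec}$, expected of a PCF variant) and the omission of type annotations on injections are assumptions.

```latex
\[
\begin{array}{lrcl}
\text{Types} & A, B & ::= & \mathrm{Nat} \mid \mathrm{Unit} \mid A \to B \mid A \times B \mid A + B \\
\text{Type environments} & \Gamma & ::= & \cdot \mid \Gamma, x : A \\
\text{Values} & V, W & ::= & x \mid k \mid c \mid \langle\rangle \mid \langle V, W \rangle
      \mid \mathbf{inl}\;V \mid \mathbf{inr}\;W \mid \lambda x^A. M \mid \mathbf{rec}\;f\;x. M \\
\text{Computations} & M, N & ::= & V\,W
      \mid \mathbf{let}\;\langle x, y\rangle = V\;\mathbf{in}\;N
      \mid \mathbf{case}\;V\;\{\mathbf{inl}\;x \mapsto M; \mathbf{inr}\;y \mapsto N\} \\
 & & \mid & \mathbf{return}\;V \mid \mathbf{let}\;x \leftarrow M\;\mathbf{in}\;N
\end{array}
\]
```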
The typing rules are those given in Figure 1, along with the familiar Exchange, Weakening and Contraction rules for environments. (Note that thanks to Weakening we are able to type terms such as $(\lambda x^A. (\lambda x^B.x))$ , even though environments are not permitted to contain duplicate variables.) We require two typing judgements: one for values and the other for computations. The judgement $\Gamma \vdash \square : A$ states that a $\square$ term has type A under type environment $\Gamma$ , where $\square$ is either a value term (V) or a computation term (M). The constants have the following types.
We give a small-step operational semantics for ${\lambda_{\textrm{b}}}$ with evaluation contexts in the style of Felleisen (1987). The reduction relation $\leadsto$ is defined on computation terms via the rules given in Figure 2. The statement $M \leadsto N$ reads: term M reduces to term N in one step. We write $R^+$ for the transitive closure of relation R and $R^*$ for the reflexive, transitive closure of relation R.
Most often, we are interested in $\leadsto$ as a relation on closed terms. However, we will sometimes consider it as a relation on terms involving free variables, with the stipulation that none of these free variables also occur as bound variables within the terms. Since we never perform reductions under a binder, this means that the notation $M[V/x]$ in our rules may be taken simply to mean M with V textually substituted for free occurrences of x (no variable capture is possible). We also take $\ulcorner c \urcorner$ to mean the usual interpretation of constant c as a meta-level function on closed values.
The type soundness of our system is easily verified. This is subsumed by the property we shall formally state for the richer language ${\lambda_{\textrm{h}}}$ in Theorem 1 below.
When dealing with reductions $N \leadsto N'$, we shall often make use of the idea that certain subterm occurrences within $N'$ arise from corresponding identical subterms of N. For instance, in the case of a reduction $(\lambda x^A .M) V \leadsto M[V/x]$, we shall say that any subterm occurrence P within any of the substituted copies of V on the right-hand side is a descendant of the corresponding subterm occurrence within the V on the left-hand side. (Descendants are called residuals e.g. in Barendregt, 1984.) Similarly, any subterm occurrence Q of $M[V/x]$ not overlapping with any of these substituted copies of V is a descendant of the corresponding occurrence of an identical subterm within the M on the left. This notion extends to the other reduction rules in the evident way; we suppress the formal details. If $P'$ is a descendant of P, we also say that P is an ancestor of $P'$. By transitivity we extend these notions to the relations $\leadsto^+$ and $\leadsto^*$. Note that if $N \leadsto^* N'$ then a subterm occurrence in $N'$ may have at most one ancestor in N but a subterm occurrence in N may have many descendants in $N'$.
Notation. We elide type annotations when clear from context. For convenience we often write code in direct-style assuming the standard left-to-right call-by-value elaboration into fine-grain call-by-value (Moggi, 1991; Flanagan et al., 1993). For example, the expression $f\,(h\,w) + g\,{\langle \rangle}$ is syntactic sugar for:
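Under a left-to-right call-by-value elaboration, every intermediate result is named in evaluation order, so the expression would unfold to something like the following. The variable names x, y, z are chosen here for illustration, and $y + z$ is itself shorthand for applying the primitive $+$ to the pair $\langle y, z \rangle$.

```latex
\[
\begin{array}{l}
\mathbf{let}\;x \leftarrow h\,w\;\mathbf{in} \\
\mathbf{let}\;y \leftarrow f\,x\;\mathbf{in} \\
\mathbf{let}\;z \leftarrow g\,\langle\rangle\;\mathbf{in} \\
y + z
\end{array}
\]
```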
We define sequencing of computations in the standard way.
We make use of standard syntactic sugar for pattern matching. For instance, we write
for suspended computations, and if the binder has a type other than Unit, we write:
We use the standard encoding of booleans as a sum:
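One common form of this encoding is sketched below. The choice of the left injection for true is an assumption; only the use of a sum type is stated in the text.

```latex
\[
\begin{array}{rcl}
\mathrm{Bool} & \mathrel{\overset{\mbox{def}}{=}} & \mathrm{Unit} + \mathrm{Unit} \\
\mathrm{true} & \mathrel{\overset{\mbox{def}}{=}} & \mathbf{inl}\;\langle\rangle \\
\mathrm{false} & \mathrel{\overset{\mbox{def}}{=}} & \mathbf{inr}\;\langle\rangle \\
\mathbf{if}\;V\;\mathbf{then}\;M\;\mathbf{else}\;N & \mathrel{\overset{\mbox{def}}{=}} &
  \mathbf{case}\;V\;\{\mathbf{inl}\;\langle\rangle \mapsto M;\ \mathbf{inr}\;\langle\rangle \mapsto N\}
\end{array}
\]
```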
3.2 Handler calculus
We now define ${\lambda_{\textrm{h}}}$ as an extension of ${\lambda_{\textrm{b}}}$.
We assume a fixed effect signature $\Sigma$ that associates types $\Sigma(\ell)$ to finitely many operation symbols $\ell$. An operation type $A \to B$ classifies operations that take an argument of type A and return a result of type B. A handler type $C \Rightarrow D$ classifies effect handlers that transform computations of type C into computations of type D. Following Pretnar (2015), we assume a global signature for every program. Computations are extended with operation invocation ($\mathbf{do}\;\ell\;V$) and effect handling ($\mathbf{handle}\; M \;\mathbf{with}\; H$). Handlers are constructed from a single success clause $(\{\mathbf{val}\; x \mapsto M\})$ and an operation clause $(\{ \ell \; p \; r \mapsto N \})$ for each operation $\ell$ in $\Sigma$; here the x, p, r are considered as bound variables. Following Plotkin & Pretnar (2013), we adopt the convention that a handler with missing operation clauses (with respect to $\Sigma$) is syntactic sugar for one in which all missing clauses perform explicit forwarding:
The typing rules for ${\lambda_{\textrm{h}}}$ are those of ${\lambda_{\textrm{b}}}$ (Figure 1) plus three additional rules for operations, handling, and handlers given in Figure 3. The T-Do rule ensures that an operation invocation is only well-typed if the operation $\ell$ appears in the effect signature $\Sigma$ and the argument type A matches the type of the provided argument V. The result type B determines the type of the invocation. The T-Handle rule types handler application. The T-Handler rule ensures that the bodies of the success clause and the operation clauses all have the output type D. The type of x in the success clause must match the input type C. The type of the parameter p ($A_\ell$) and resumption r ($B_\ell \to D$) in operation clause $H^{\ell}$ is determined by the type of $\ell$; the return type of r is D, as the body of the resumption will itself be handled by H. We write $H^{\mathrm{val}}$ and $H^{\ell}$ for projecting the success clause and the operation clause for $\ell$, respectively.
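A forwarding clause of this kind re-invokes the operation and passes its result to the resumption. A sketch of its usual shape follows, with the bound variable x chosen for illustration.

```latex
\[
\{\ell\;p\;r \mapsto \mathbf{let}\;x \leftarrow \mathbf{do}\;\ell\;p\;\mathbf{in}\;r\,x\}
\]
```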
We extend the operational semantics to ${\lambda_{\textrm{h}}}$ . Specifically, we add two new reduction rules: one for handling return values and another for handling operation invocations.
The first rule invokes the success clause. The second rule handles an operation via the corresponding operation clause.
To allow for the evaluation of subterms within $\mathbf{handle}$ expressions, we extend our earlier grammar for evaluation contexts to one for handler contexts:
We then replace the SLift rule with a corresponding rule for handler contexts.
However, it is critical that the rule SOp is restricted to pure evaluation contexts ${{{{{{\mathcal{E}}}}}}}$ rather than handler contexts. This ensures that the $\mathbf{do}$ invocation is handled by the innermost handler (recalling our convention that all handlers handle all operations). If arbitrary handler contexts ${{{{{{\mathcal{H}}}}}}}$ were permitted in this rule, the semantics would become nondeterministic, as any handler in scope could be selected.
The ancestor-descendant relation for subterm occurrences extends to ${\lambda_{\textrm{h}}}$ in the obvious way.
We now characterise normal forms and state the standard type soundness property of ${\lambda_{\textrm{h}}}$ .
Definition 1 (Computation normal forms). A computation term N is normal with respect to $\Sigma$ if $N = \mathbf{return}\;V$ for some V or $N = {{{{{{\mathcal{E}}}}}}}[\mathbf{do}\;\ell\,W]$ for some $\ell \in dom(\Sigma)$ , ${{{{{{\mathcal{E}}}}}}}$ , and W.
Theorem 1 (Type Soundness for ${\lambda_{\textrm{h}}}$ ) If $ \vdash M : C$ , then either there exists $ \vdash N : C$ such that $M {{{{{{\leadsto}}}}}}^\ast N$ and N is normal with respect to $\Sigma$ , or M diverges.
It is worth observing that our language does not prohibit ‘operation extrusion’: even if we begin with a term in which all $\mathbf{do}$ invocations fall within the scope of a handler, this property need not be preserved by reductions, since a $\mathbf{do}$ invocation may pass another $\mathbf{do}$ to the outermost handler. Such behaviour may be readily ruled out using a type-and-effect system, but this additional machinery is not necessary for our present purposes.
4 Abstract machine semantics
Thus far we have introduced the base calculus ${\lambda_{\textrm{b}}}$ and its extension with effect handlers ${\lambda_{\textrm{h}}}$. For each calculus, we have given a small-step operational semantics which uses a substitution model for evaluation. Whilst this model is semantically pleasing, it falls short of providing a realistic account of practical computation, as substitution is an expensive operation. We now develop a more practical model of computation based on an abstract machine semantics.
4.1 Base machine
We choose a CEK-style abstract machine semantics (Felleisen & Friedman, 1987) for ${\lambda_{\textrm{b}}}$ based on that of Hillerström et al. (2020). The CEK machine operates on configurations which are triples of the form $\langle M \mid \gamma \mid \sigma \rangle$. The first component contains the computation currently being evaluated. The second component contains the environment $\gamma$ which binds free variables. The third component contains the continuation which instructs the machine how to proceed once evaluation of the current computation is complete. The syntax of abstract machine states is as follows.
Values consist of function closures, constants, pairs, and left or right tagged values. We refer to continuations of the base machine as pure. A pure continuation is a stack of pure continuation frames. A pure continuation frame $(\gamma, x, N)$ closes a let-binding $\mathbf{let} \;x \leftarrow [~] \;\mathbf{in}\;N$ over environment $\gamma$. We write $[]$ for an empty pure continuation and $\phi :: \sigma$ for the result of pushing the frame $\phi$ onto $\sigma$. We use pattern matching to deconstruct pure continuations.
The abstract machine semantics is given in Figure 4. The transition relation ( $\longrightarrow$ ) makes use of the value interpretation ( $[\![ - ]\!]$ ) from value terms to machine values. The machine is initialised by placing a term in a configuration alongside the empty environment ( $\emptyset$ ) and the identity pure continuation ( $[]$ ). The rules (MApp), (MRec), (MConst), (MSplit), (MCaseL), and (MCaseR) eliminate values. The (MLet) rule extends the current pure continuation with let bindings. The (MRetCont) rule extends the environment in the top frame of the pure continuation with a returned value. Given an input of a well-typed closed computation term $\vdash M : A$, the machine will either diverge or return a value of type A. A final state is given by a configuration of the form $\langle \mathbf{return}\;V \mid \gamma \mid [] \rangle$, in which case the final return value is given by the denotation $[\![ V ]\!] \gamma$ of V under environment $\gamma$.
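To make the shape of these transitions concrete, here is a minimal Python sketch of a CEK-style machine for a small fragment (variables, constants, functions, application, let and return). The tuple encoding of terms and the names interp, step and run are our own assumptions for illustration and are not taken from Figure 4; environments are copied Python dictionaries rather than persistent maps, so the sketch illustrates the transitions only, not the cost model.

```python
# Minimal CEK-style machine for a let/return/application fragment.
# Term encodings (tagged tuples) are illustrative assumptions.

def interp(v, env):
    """Value interpretation: turn a value term into a machine value."""
    tag = v[0]
    if tag == "var":
        return env[v[1]]
    if tag == "const":
        return v[1]
    if tag == "lam":                      # close the function over env
        return ("closure", env, v[1], v[2])
    raise ValueError(f"unknown value term: {v!r}")

def step(conf):
    """Perform one transition on a configuration (M, env, sigma)."""
    m, env, sigma = conf
    tag = m[0]
    if tag == "app":                      # in the spirit of MApp
        _, cenv, x, body = interp(m[1], env)
        return (body, {**cenv, x: interp(m[2], env)}, sigma)
    if tag == "let":                      # in the spirit of MLet: push a frame
        _, x, m1, n = m
        return (m1, env, ((env, x, n), sigma))
    if tag == "return" and sigma is not None:   # in the spirit of MRetCont
        (fenv, x, n), rest = sigma
        return (n, {**fenv, x: interp(m[1], env)}, rest)
    raise ValueError("stuck or final configuration")

def run(m):
    """Run a closed computation; return its value and the number of steps."""
    conf = (m, {}, None)                  # empty environment, empty continuation
    steps = 0
    while not (conf[0][0] == "return" and conf[2] is None):
        conf = step(conf)
        steps += 1
    return interp(conf[0][1], conf[1]), steps

# let y <- (fun x. return x) 42 in return y
prog = ("let", "y",
        ("app", ("lam", "x", ("return", ("var", "x"))), ("const", 42)),
        ("return", ("var", "y")))
print(run(prog))  # (42, 3)
```

The continuation is a nested pair with None as the empty stack, so pushing a frame never mutates the existing continuation.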
Correctness. The base machine faithfully simulates the operational semantics for ${\lambda_{\textrm{b}}}$; most transitions correspond directly to $\beta$-reductions, but MLet performs an administrative step to bring the computation M into evaluation position. We formally state and prove the correspondence in Appendix A, relying on an inverse map from configurations to terms (Hillerström et al., 2020).
4.2 Handler machine
We now enrich the ${\lambda_{\textrm{b}}}$ machine to a ${\lambda_{\textrm{h}}}$ machine. We extend the syntax as follows.
The notion of configurations changes slightly in that the continuation component is replaced by a generalised continuation $\kappa \in \mathsf{Cont}$ (Hillerström et al., 2020); a continuation is now a list of resumptions. A resumption is a pair of a pure continuation (as in the base machine) and a handler closure ( $\chi$ ). A handler closure consists of an environment and a handler definition, where the former binds the free variables that occur in the latter. The machine is initialised by placing a term in a configuration alongside the empty environment ( $\emptyset$ ) and the identity continuation ( $\kappa_0$ ). The latter is a singleton list containing the identity resumption, which consists of the identity pure continuation paired with the identity handler closure:
Machine values are augmented to include resumptions as an operation invocation causes the topmost frame of the machine continuation to be reified (and bound to the resumption parameter in the operation clause).
The handler machine adds transition rules for handlers and modifies $(\text{MLet})$ and $(\text{MRetCont})$ from the base machine to account for the richer continuation structure. Figure 5 depicts the new and modified rules. The $(\text{MHandle})$ rule pushes a handler closure along with an empty pure continuation onto the continuation stack. The $(\text{MRetHandler})$ rule transfers control to the success clause of the current handler once the pure continuation is empty. The $(\text{MHandleOp})$ rule transfers control to the matching operation clause on the topmost handler, and during the process it reifies the handler closure. Finally, the $(\text{MResume})$ rule applies a reified handler closure, by pushing it onto the continuation stack. The handler machine has two possible final states: either it yields a value or it gets stuck on an unhandled operation.
Correctness. The handler machine faithfully simulates the operational semantics of ${\lambda_{\textrm{h}}}$ . Extending the result for the base machine, we formally state and prove the correspondence in Appendix B.
4.3 Realisability and asymptotic complexity
As witnessed by the work of Hillerström & Lindley (2016), the machine structures are readily realisable using standard persistent functional data structures. Pure continuations on the base machine and generalised continuations on the handler machine can be implemented using linked lists with a time complexity of $\mathcal{O}(1)$ for the extension operation $(\_::\_)$. The topmost pure continuation on the handler machine may also be extended in time $\mathcal{O}(1)$, as extending it only requires reaching under the topmost handler closure. Environments, $\gamma$, can be realised using a map, with a time complexity of $\mathcal{O}(\log|\gamma|)$ for extension and lookup (Okasaki, 1999). We can use the same technique to realise label lookup, $H^{\ell}$, with time complexity $\mathcal{O}(\log|\Sigma|)$. However, in Section 5.4, we shall work only with a single effect operation, so $|\Sigma| = 1$, meaning that in our analysis we can treat label lookup as a constant-time operation.
The worst-case time complexity of a single machine transition is exhibited by rules that involve the value translation function, $[\![ - ]\!] \gamma$, as it is defined structurally on values. Its worst-case time complexity is exhibited by a nesting of pairs of variables $[\![ \langle x_1,\langle x_2,\cdots,\langle x_{n-1},x_n \rangle\cdots \rangle \rangle ]\!] \gamma$, which has complexity $\mathcal{O}(n\log|\gamma|)$.
Continuation copying. On the handler machine the topmost continuation frame can be copied in constant time due to the persistent runtime and the layout of machine continuations. An alternative design would be to make the runtime non-persistent, in which case copying a continuation frame $((\sigma, \chi) :: \_)$ would be an $\mathcal{O}(|\sigma| + |\chi|)$ time operation, where $|\chi|$ is the size of the handler closure $\chi$.
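The constant-time claim rests on structural sharing. As a minimal sketch (assuming, purely for illustration, that a continuation is an immutable linked list of opaque frames; the names push and to_list are hypothetical), two resumptions of the same captured continuation can share its tail without any copying:

```python
# Persistent continuation as an immutable linked list: None is the empty
# continuation and (frame, rest) is a frame pushed onto rest.

def push(frame, sigma):
    """Extend a continuation in O(1); sigma itself is left untouched."""
    return (frame, sigma)

def to_list(sigma):
    """Flatten a continuation into a Python list (for inspection only)."""
    frames = []
    while sigma is not None:
        frame, sigma = sigma
        frames.append(frame)
    return frames

base = push("f2", push("f1", None))   # a captured continuation
left = push("g", base)                # first resumption extends it
right = push("h", base)               # second resumption shares the same tail
```

Here left and right refer to the very same base object, so "copying" the captured continuation amounts to copying a reference. A mutable stack would instead have to duplicate every frame before the second resumption.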
Primitive operations on naturals. Our model assumes that arithmetic operations on arbitrary natural numbers take $\mathcal{O}(1)$ time. This is common practice in the study of algorithms when the main interest lies elsewhere (Cormen et al., 2009, Section 2.2). If desired, one could adopt a more refined cost model that accounted for the bit-level complexity of arithmetic operations; however, doing so would have the same impact on both of the situations we are wishing to compare, and thus would add nothing but noise to the overall analysis.
5 Predicates, decision trees and generic count
We now come to the crux of the paper. In this section and the next, we prove that ${\lambda_{\textrm{h}}}$ supports implementations of certain operations with an asymptotic runtime bound that cannot be achieved in ${\lambda_{\textrm{b}}}$ (Section 8). While the positive half of this claim essentially consolidates a known piece of folklore, the negative half appears to be new. To establish our result, it will suffice to exhibit a single ‘efficient’ program in ${\lambda_{\textrm{h}}}$ , then show that no equivalent program in ${\lambda_{\textrm{b}}}$ can achieve the same asymptotic efficiency. We take generic search as our example.
Generic search is a modular search procedure that takes as input a predicate P on some multidimensional search space and finds all points of the space satisfying P. Generic search is agnostic to the specific instantiation of P, and as a result is applicable across a wide spectrum of domains. Classic examples such as Sudoku solving (Bird, 2006), the n-queens problem (Bell & Stevens, 2009) and graph colouring can be cast as instances of generic search, and similar ideas have been explored in connection with exact real integration (Simpson, 1998; Daniels, 2016).
For simplicity, we will restrict attention to search spaces of the form ${{{{{{\mathbb{B}}}}}}}^n$ , the set of bit vectors of length n. To exhibit our phenomenon in the simplest possible setting, we shall actually focus on the generic count problem: given a predicate P on some ${{{{{{\mathbb{B}}}}}}}^n$ , return the number of points of ${{{{{{\mathbb{B}}}}}}}^n$ satisfying P. However, we shall explain why our results are also applicable to generic search proper.
We shall view $\mathbb{B}^n$ as the set of functions $\mathbb{N}_n \to \mathbb{B}$, where $\mathbb{N}_n \mathrel{\overset{{\mbox{def}}}{=}} \{0,\dots,n-1\}$. In both ${\lambda_{\textrm{b}}}$ and ${\lambda_{\textrm{h}}}$ we may represent such functions by terms of type $\mathrm{Nat} \to \mathrm{Bool}$. We will often informally write $\mathrm{Nat}_n$ in place of Nat to indicate that only the values $0,\dots,n-1$ are relevant, but this convention has no formal status since our setup does not support dependent types.
To summarise, in both ${\lambda_{\textrm{b}}}$ and ${\lambda_{\textrm{h}}}$ we will be working with the types
and will be looking for programs
such that for suitable terms P representing semantic predicates $\Pi: {{{{{{\mathbb{B}}}}}}}^n \to {{{{{{\mathbb{B}}}}}}}$ , $\mathrm{count}_n~P$ finds the number of points of ${{{{{{\mathbb{B}}}}}}}^n$ satisfying $\Pi$ .
Before formalising these ideas more closely, let us look at some examples, which will also illustrate the machinery of decision trees that we will be using.
5.1 Examples of points, predicates and trees
Consider first the following terms of type Point:
(Here $\bot$ is the diverging term $(\mathbf{rec}\; f\,i.f\,i)\,{\langle \rangle}$ .) Then $\mathrm{q}_0$ represents $\langle{\mathrm{true},\dots,\mathrm{true}}\rangle \in {{{{{{\mathbb{B}}}}}}}^n$ for any n; $\mathrm{q}_1$ represents $\langle{\mathrm{true},\mathrm{false},\dots,\mathrm{false}}\rangle \in {{{{{{\mathbb{B}}}}}}}^n$ for any $n \geq 1$ ; and $\mathrm{q}_2$ represents $\langle{\mathrm{true},\mathrm{false}}\rangle \in {{{{{{\mathbb{B}}}}}}}^2$ .
Next some predicates. First, the following terms all represent the constant true predicate ${{{{{{\mathbb{B}}}}}}}^2 \to {{{{{{\mathbb{B}}}}}}}$ :
These illustrate that in the course of evaluating a predicate term P at a point q, for each $i<n$ the value of q at i may be inspected zero, one or many times.
Likewise, the following all represent the ‘identity’ predicate $\mathbb{B}^1 \to \mathbb{B}$, in a sense to be made precise below (here $\&\&$ is short-circuit ‘and’):
Slightly more interestingly, for each n we have the following program which determines whether a point contains an odd number of true components:
Here fold and map are the standard combinators on lists, and $\otimes$ is exclusive-or. Applying $\mathrm{Odd}_2$ to $\mathrm{q}_0$ yields false; applying it to $\mathrm{q}_1$ or $\mathrm{q}_2$ yields true.
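For readers who prefer executable notation, the points and the parity predicate can be transcribed into Python as follows. This is a sketch reconstructed from the prose descriptions above rather than from the displayed terms; the function names mirror the text, and the infinite loop in q2 is our stand-in for the diverging term.

```python
from functools import reduce
from operator import xor

def q0(i):
    """Represents <true, ..., true> for any n."""
    return True

def q1(i):
    """Represents <true, false, ..., false> for any n >= 1."""
    return i == 0

def q2(i):
    """Represents <true, false> in B^2; undefined elsewhere."""
    if i == 0:
        return True
    if i == 1:
        return False
    while True:          # stand-in for divergence at indices >= 2
        pass

def odd(n):
    """Predicate: does the point have an odd number of true components?"""
    return lambda q: reduce(xor, map(q, range(n)), False)

print(odd(2)(q0), odd(2)(q1), odd(2)(q2))  # False True True
```

Note that odd(n) queries each coordinate below n exactly once, in increasing order, which is what makes its decision tree n-standard.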
We can think of a predicate term P as participating in a ‘dialogue’ with a given point $Q : \mathrm{Point}_n$ . The predicate may query Q at some coordinate k; Q may respond with true or false and this returned value may influence the future course of the dialogue. After zero or more such query/response pairs, the predicate may return a final answer (true or false).
The set of possible dialogues with a given term P may be organised in an obvious way into an unrooted binary decision tree, in which each internal node is labelled with a query $\mathord{?} k$ (with $k<n$ ), and with left and right branches corresponding to the responses true, false respectively. Any point will thus determine a path through the tree, and each leaf is labelled with an answer $\mathord{!} \mathrm{true}$ or $\mathord{!} \mathrm{false}$ according to whether the corresponding point or points satisfy the predicate.
Decision trees for a sample of the above predicate terms are depicted in Figure 6; the relevant formal definitions are given in the next subsection. In the case of $\mathrm{I}_2$ , one of the $\mathord{!} \mathrm{false}$ leaves will be ‘unreachable’ if we are working in ${\lambda_{\textrm{b}}}$ (but reachable in a language supporting mutable state).
We think of the edges in the tree as corresponding to portions of computation undertaken by P between queries, or before delivering the final answer. The tree is unrooted (i.e. starts with an edge rather than a node) because in the evaluation of $P\,Q$ there is potentially some ‘thinking’ done by P even before the first query or answer is reached. For the purpose of our runtime analysis, we will also consider timed variants of these decision trees, in which each edge is labelled with the number of computation steps involved.
It is possible that for a given P, the construction of a decision tree may hit trouble, because at some stage P either goes undefined or gets stuck at an unhandled operation. It is also possible that the decision tree is infinite because P can keep asking queries forever. However, we shall be restricting our attention to terms representing total predicates: those with finite decision trees in which every path leads to a leaf.
In order to present our complexity results in a simple and clear form, we will give special prominence to certain well-behaved decision trees. For $n \in \mathbb{N}$, we shall say a tree is n-standard if it is total (i.e. every maximal path leads to a leaf labelled with an answer) and along any path to a leaf, each coordinate $k<n$ is queried once and only once. Thus, an n-standard decision tree is a complete binary tree of depth $n+1$, with $2^n - 1$ internal nodes and $2^n$ leaves. However, there is no constraint on the order of the queries, which indeed may vary from one path to another. One pleasing property of this notion is that for a predicate term with an n-standard decision tree, the number of points in $\mathbb{B}^n$ satisfying the predicate is precisely the number of $\mathord{!} \mathrm{true}$ leaves in the tree.
Of the examples we have given, the tree for $\mathrm{T}_0$ is 0-standard; those for $\mathrm{I}_0$ and $\mathrm{I}_1$ are 1-standard; that for $\mathrm{Odd}_n$ is n-standard; and the rest are not n-standard for any n.
5.2 Formal definitions
We now formalise the above notions. We will present our definitions in the setting of ${\lambda_{\textrm{h}}}$, but everything can clearly be relativised to ${\lambda_{\textrm{b}}}$ with no change to the meaning in the case of ${\lambda_{\textrm{b}}}$ terms. For the purpose of this subsection, we fix $n \in \mathbb{N}$, set $\mathbb{N}_n \mathrel{\overset{{\mbox{def}}}{=}} \{0,\ldots,n-1\}$, and use k to range over $\mathbb{N}_n$. We write $\mathbb{B}$ for the set of booleans, which we shall identify with the (encoded) boolean values of ${\lambda_{\textrm{h}}}$, and use b to range over $\mathbb{B}$.
As suggested by the foregoing discussion, we will need to work with both syntax and semantics. For points, the relevant definitions are as follows.
Definition 2 (points). A closed value $Q : \mathrm{Point}$ is said to be a syntactic n-point if:
A semantic n-point $\pi$ is simply a mathematical function $\pi: \mathbb{N}_n \to \mathbb{B}$. (We shall also write $\pi \in \mathbb{B}^n$.) Any syntactic n-point Q is said to denote the semantic n-point $[\![ Q ]\!]$ given by
Any two syntactic n-points Q and $Q'$ are said to be distinct if $[\![ Q ]\!] \neq [\![ Q' ]\!]$.
By default, the unqualified term n-point will from now on refer to syntactic n-points.
Likewise, we wish to work with predicates both syntactically and semantically. By a semantic n-predicate, we shall mean simply a mathematical function $\Pi: \mathbb{B}^n \to \mathbb{B}$. One slick way to define syntactic n-predicates would be as closed terms $P:\mathrm{Predicate}$ such that for every n-point Q, $P\,Q$ evaluates to either $\mathbf{return}\;\mathrm{true}$ or $\mathbf{return}\;\mathrm{false}$. For our purposes, however, we shall favour an approach to n-predicates via decision trees, which will yield more information on their behaviour.
We will model decision trees as certain partial functions from addresses to labels. An address will specify the position of a node in the tree via the path that leads to it, while a label will represent the information present at a node. Formally:
Definition 3 (untimed decision tree).

(i) The address set $\mathsf{Addr}$ is simply the set $\mathbb{B}^\ast$ of finite lists of booleans. If $bs,bs' \in \mathsf{Addr}$, we write $bs \sqsubseteq bs'$ (resp. $bs \sqsubset bs'$) to mean that bs is a prefix (resp. proper prefix) of $bs'$.

(ii) The label set $\mathsf{Lab}$ consists of queries parameterised by a natural number and answers parameterised by a boolean:
\[ \mathsf{Lab} \mathrel{\overset{{\mbox{def}}}{=}} \{\mathord{?} k \mid k \in {{{{{{\mathbb{N}}}}}}} \} \cup \{\mathord{!} b \mid b \in {{{{{{\mathbb{B}}}}}}} \} \] 
(iii) An (untimed) decision tree is a partial function $\tau : \mathsf{Addr}\rightharpoonup \mathsf{Lab}$ such that:

The domain of $\tau$ (written $dom(\tau)$ ) is prefix closed.

Answer nodes are always leaves: if $\tau(bs) = \mathord{!} b$ then $\tau(bs')$ is undefined whenever $bs \sqsubset bs'$ .

As our goal is to reason about the time complexity of generic count programs and their predicates, it is also helpful to decorate decision trees with timing data that records the number of machine steps taken for each piece of computation performed by a predicate:
Definition 4 (timed decision tree). A timed decision tree is a partial function $\tau : \mathsf{Addr} \rightharpoonup\mathsf{Lab} \times {{{{{{\mathbb{N}}}}}}}$ such that its first projection $bs \mapsto \tau(bs).1$ is a decision tree. We write $\mathsf{labs}(\tau)$ for the first projection ( $bs \mapsto \tau(bs).1$ ) and $\mathsf{steps}(\tau)$ for the second projection ( $bs \mapsto \tau(bs).2$ ) of a timed decision tree.
Here we think of $\mathsf{steps}(\tau)(bs)$ as the computation time associated with the edge whose target is the node addressed by bs.
We now come to the method for associating a specific tree with a given term P. One may think of this as a kind of denotational semantics, but here we shall extract a tree from a term by purely operational means using our abstract machine model. The key idea is to try applying P to a distinguished free variable $q: \mathrm{Point}$ , which we think of as an ‘abstract point’. Whenever P wants to interrogate its argument at some index i, the computation will get stuck at some term $q\,i$ : this both flags up the presence of a query node in the decision tree, and allows us to explore the subsequent behaviour under both possible responses to this query.
Our definition captures this idea using abstract machine configurations. We write ${\mathrm{Conf}}_q$ for the set of ${\lambda_{\textrm{h}}}$ configurations possibly involving q (but no other free variables). We write $a \simeq b$ for Kleene equality: either both a and b are undefined or both are defined and $a = b$.
It is convenient to define the timed tree and then extract the untimed one from it:
Definition 5.

(i) Define $\mathcal{T}: {\mathrm{Conf}}_q \to \mathsf{Addr} \rightharpoonup (\mathsf{Lab} \times {{{{{{\mathbb{N}}}}}}})$ to be the minimal family of partial functions satisfying the following equations:
Here $\mathrm{inc}(\ell, s) = (\ell, s + 1)$ , and in all of the above equations $\gamma(q) = \gamma'(q) = q$ . Clearly $\mathcal{T}(\mathcal{C})$ is a timed decision tree for any $\mathcal{C} \in {\mathrm{Conf}}_q$ .

(ii) The timed decision tree of a computation term is obtained by placing it in the initial configuration: $\mathcal{T}(M) \mathrel{\overset{{\mbox{def}}}{=}} \mathcal{T}({{{{{{\langle M, \emptyset[q \mapsto q], \kappa_0 \rangle}}}}}})$ .

(iii) The timed decision tree of a closed value $P:\mathrm{Predicate}$ is $\mathcal{T}(P\,q)$ . Since q plays the role of a dummy argument, we will usually omit it and write $\mathcal{T}(P)$ for $\mathcal{T}(P\,q)$ .

(iv) The untimed decision tree $\mathcal{U}(P)$ is obtained from $\mathcal{T}(P)$ via first projection: $\mathcal{U}(P) = \mathsf{labs}(\mathcal{T}(P))$ .
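The idea of probing a predicate with an abstract point can be mimicked outside the calculus. The following Python sketch is not the machine-based construction of Definition 5: it recovers only the untimed tree, and it does so by replay, re-running the predicate from scratch for each address and raising an exception once the scripted responses run out. It assumes the predicate is a deterministic Python function that does not intercept exceptions; the names Query, probe and tree_of, and the encoding of labels as tagged pairs, are our own.

```python
class Query(Exception):
    """Raised when the predicate asks a question with no scripted response."""
    def __init__(self, k):
        super().__init__(k)
        self.k = k

_MISSING = object()

def probe(pred, bs):
    """Run pred against the responses bs; return ("?", k) or ("!", b)."""
    responses = iter(bs)
    def point(k):
        b = next(responses, _MISSING)
        if b is _MISSING:        # script exhausted: this is a query node
            raise Query(k)
        return b
    try:
        return ("!", pred(point))
    except Query as q:
        return ("?", q.k)

def tree_of(pred, max_depth=16):
    """Untimed decision tree as a dict from addresses (bool tuples) to labels."""
    tree = {}
    def explore(bs):
        if len(bs) > max_depth:
            raise RuntimeError("decision tree deeper than max_depth")
        label = probe(pred, bs)
        tree[bs] = label
        if label[0] == "?":      # explore both responses to the query
            explore(bs + (True,))
            explore(bs + (False,))
    explore(())
    return tree

xor_tree = tree_of(lambda q: q(0) != q(1))
```

The resulting dictionary has a prefix-closed domain and answer nodes only at leaves, as the definition of decision tree requires. Replay repeats work along every path, so this sketch says nothing about the step counts recorded in timed trees.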
If the execution of a configuration $\mathcal{C}$ runs forever or gets stuck at an unhandled operation, then $\mathcal{T}(\mathcal{C})(bs)$ will be undefined for all bs. Although this is admitted by our definition of decision tree, we wish to exclude such behaviours for the terms we accept as valid predicates. Specifically, we frame the following definition:
Definition 6 A decision tree $\tau$ is an n-predicate tree if it satisfies the following:

For every query $\mathord{?} k$ appearing in $\tau$ , we have $k \in {{{{{{\mathbb{N}}}}}}}_n$ .

Every query node has both children present:
\[ \forall bs \in \mathsf{Addr},\, k \in {{{{{{\mathbb{N}}}}}}}_n,\, b \in {{{{{{\mathbb{B}}}}}}}.~ \tau(bs) = \mathord{?} k \Rightarrow {{{{{{bs \mathbin{+\!\!+} [b]}}}}}} \in dom(\tau) \] 
All paths in $\tau$ are finite (so every maximal path terminates in an answer node).
A closed term $P: \mathrm{Predicate}$ is a (syntactic) n-predicate if $\mathcal{U}(P)$ is an n-predicate tree.
If $\tau$ is an n-predicate tree, clearly any semantic n-point $\pi$ gives rise to a path $b_0 b_1 \dots $ through $\tau$, given inductively by
This path will terminate at some answer node $b_0 b_1 \dots b_{r-1}$ of $\tau$, and we may write $\tau \bullet \pi \in \mathbb{B}$ for the answer at this leaf.
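The operation $\tau \bullet \pi$ is easy to state operationally: start at the empty address, and at each query node extend the address with the point's response. A minimal Python sketch follows, assuming (as an illustrative encoding of our own) that a tree is a dictionary from tuples of booleans to labels ("?", k) or ("!", b).

```python
# Decision tree for exclusive-or on two coordinates.
XOR_TREE = {
    (): ("?", 0),
    (True,): ("?", 1), (False,): ("?", 1),
    (True, True): ("!", False), (True, False): ("!", True),
    (False, True): ("!", True), (False, False): ("!", False),
}

def answer(tree, point):
    """Follow the path determined by point and return the answer at its leaf."""
    bs = ()
    while True:
        kind, payload = tree[bs]
        if kind == "!":
            return payload
        bs += (point(payload),)   # respond to query ?k with point(k)

print(answer(XOR_TREE, lambda k: (True, False)[k]))  # True
```

Termination of the loop relies on the tree being an n-predicate tree: every query node has both children and every path is finite.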
Proposition 1. If P is an n-predicate and Q is an n-point, then $P\,Q \leadsto^\ast \mathbf{return}\;b$ where $b = \mathcal{U}(P) \bullet [\![ Q ]\!]$.
Proof By interleaving the computation for the relevant path through $\mathcal{U}(P)$ with computations for queries to Q, and appealing to the correspondence between the small-step reduction and abstract machine semantics. We omit the routine details.
It is thus natural to define the denotation of an n-predicate P to be the semantic n-predicate $[\![ P ]\!]$ given by $[\![ P ]\!] (\pi) = \mathcal{U}(P) \bullet \pi$.
As mentioned earlier, we shall also be interested in a more constrained class of trees and predicates:
Definition 7 (standard trees and predicates). An n-predicate tree $\tau$ is said to be n-standard if the following hold:

The domain of $\tau$ is precisely $\mathsf{Addr}_n$ , the set of bit vectors of length $\leq n$ .

There are no repeated queries along any path in $\tau$ :
\[ \forall bs, bs' \in dom(\tau),\, k \in {{{{{{\mathbb{N}}}}}}}_n.~ bs \sqsubseteq bs' \wedge \tau(bs)=\tau(bs')=\mathord{?} k \Rightarrow bs=bs' \]
A timed decision tree $\tau$ is n-standard if its underlying untimed decision tree $\mathsf{labs}(\tau)$ is too. An n-predicate P is n-standard if $\mathcal{U}(P)$ is n-standard.
Clearly, in an n-standard tree, each of the n queries $\mathord{?} 0,\dots, \mathord{?}(n-1)$ appears exactly once on the path to any leaf, and there are $2^n$ leaves, all of them answer nodes.
It is also clear how for any n-standard tree $\tau$ we may construct a predicate P that denotes it, simply by mirroring the structure of $\tau$ with nested $\mathbf{if}$ expressions:
Definition 8 (canonical standard predicates). Given an n-standard tree $\tau$, we may associate to each address $bs \in dom(\tau)$ a ${\lambda_{\textrm{b}}}$ term $T_q(\tau,bs)$ (with free variable $q : \mathrm{Point}$) by reverse induction on the length of bs:
We then define
(so that clearly $\mathcal{U}(P(\tau)) = \tau$), and call $P(\tau)$ the canonical n-standard predicate for $\tau$.
In practice, we will omit the subscript q from uses of T.
Note that the use of lists here is entirely at the meta-level, and none of the terms $T(\tau,bs)$ themselves involve list data. Because of their simple, standardised form, canonical n-standard predicates will play a useful role in our lower bound analysis in Section 8.
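The construction of a canonical predicate by mirroring the tree with nested conditionals can be sketched in Python as follows. The dictionary encoding of trees and the name canonical are assumptions of ours; as in the text, addresses are used only while building the predicate, and the resulting function is a pure nest of conditionals over queries to the point.

```python
def canonical(tree):
    """Build a predicate whose branching mirrors an n-standard tree."""
    def term(bs):
        kind, payload = tree[bs]
        if kind == "!":                       # answer node: return the boolean
            return lambda q: payload
        yes = term(bs + (True,))              # subterm for the response true
        no = term(bs + (False,))              # subterm for the response false
        return lambda q: yes(q) if q(payload) else no(q)
    return term(())

# A 2-standard tree in which the second query depends on the first response.
TREE = {
    (): ("?", 1),
    (True,): ("?", 0), (False,): ("?", 0),
    (True, True): ("!", True), (True, False): ("!", False),
    (False, True): ("!", False), (False, False): ("!", False),
}

both = canonical(TREE)   # true exactly when coordinates 0 and 1 are both true
```

Each call of the resulting predicate queries every coordinate exactly once along the path it takes, which is the defining property of an n-standard predicate.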
5.3 Specification of counting programs
We can now specify what it means for a program $K : \mathrm{Predicate} \to \mathrm{Nat}$ to implement counting.
Definition 9.

(i) The count of a semantic n-predicate $\Pi$, written $\sharp \Pi$, is simply the number of semantic n-points $\pi \in \mathbb{B}^n$ for which $\Pi(\pi)=\mathrm{true}$.

(ii) If P is any n-predicate, we say that K correctly counts P if $K\,P \leadsto^\ast \mathbf{return}\;m$, where $m = \sharp [\![ P ]\!]$.
This definition gives us the flexibility to talk about counting programs that operate on various classes of predicates, allowing us to state our results in their strongest natural form. On the positive side, we shall shortly see that there is a single ‘efficient’ program in ${\lambda_{\textrm{h}}}$ that correctly counts all n-standard ${\lambda_{\textrm{h}}}$ predicates for every n; in Section 6.1 we improve this to one that correctly counts all n-predicates of ${\lambda_{\textrm{h}}}$. On the negative side, we shall show that an n-indexed family of counting programs written in ${\lambda_{\textrm{b}}}$, even if only required to work correctly on canonical n-standard ${\lambda_{\textrm{b}}}$ predicates, can never compete with our ${\lambda_{\textrm{h}}}$ program for asymptotic efficiency even in the most favourable cases.
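As a point of reference for the specification, here is a naive counting program in Python that simply enumerates all $2^n$ points and applies the predicate to each. It is a sketch for illustration only (the name naive_count is ours, and n is passed explicitly); it is not one of the programs analysed in the paper. It meets the specification, but a predicate that inspects all n coordinates is re-run from scratch on every point, for roughly $n 2^n$ queries in total.

```python
from itertools import product

def naive_count(n, pred):
    """Count the points of B^n satisfying pred by exhaustive enumeration."""
    total = 0
    for bits in product([True, False], repeat=n):
        point = lambda i, bits=bits: bits[i]   # the point as a function
        if pred(point):
            total += 1
    return total

def parity3(q):
    """True when an odd number of the first three coordinates are true."""
    return (q(0) != q(1)) != q(2)

print(naive_count(3, parity3))  # 4
```

Nothing computed by the predicate on one point is reused on the next, even when the two points agree on a long prefix of queries.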
5.4 Efficient generic count with effects
We now present the simplest version of our effectful implementation of counting: one that works on n-standard predicates.
Our program uses a variation of the handler for nondeterministic computation that we gave in Section 2. The main idea is to implement points as ‘nondeterministic computations’ using the Branch operation such that the handler may respond to every query twice, by invoking the provided resumption with true and subsequently false. The key insight is that the resumption restarts computation at the invocation site of Branch, meaning that prior computation performed by the predicate need not be repeated. In other words, the resumption ensures that common portions of computations prior to any query are shared between both branches.
We assert that $\mathrm{Branch} : \mathrm{Unit} \to \mathrm{Bool} \in \Sigma$ is a distinguished operation that may not be handled in the definition of any input predicate (it has to be forwarded according to the default convention). The algorithm is then as follows.
The handler applies predicate pred to a single ‘generic point’ defined using Branch. The boolean return value is interpreted as a single solution, whilst Branch is interpreted by alternately supplying true and false to the resumption and summing the results. The sharing enabled by the use of the resumption is exactly the ‘magic’ we need to make it possible to implement generic count more efficiently in ${\lambda_{\textrm{h}}}$ than in ${\lambda_{\textrm{b}}}$. A curious feature of effcount is that it works for all n-standard predicates without having to know the value of n.
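Python has no multi-shot resumptions, so the handler cannot be transcribed directly. The sharing it achieves can nevertheless be simulated if we assume the predicate is written by hand in continuation-passing style, so that the rest of the computation after a query is available as an ordinary function that may be called twice. The sketch below makes that assumption explicit; the names branch, value and odd_cps are ours, and the CPS calling convention (a query function taking an index and a continuation, plus a return continuation) is not part of the paper's calculus.

```python
def effcount(pred):
    """Count satisfying points of a CPS predicate, resuming each query twice."""
    def branch(_index, resume):
        # Plays the role of the Branch clause: resume with true, then with
        # false, and sum the two counts. The queried index is ignored.
        return resume(True) + resume(False)
    def value(b):
        # Plays the role of the success clause: a true answer is one solution.
        return 1 if b else 0
    return pred(branch, value)

def odd_cps(n):
    """CPS parity predicate: queries coordinates 0..n-1 once each."""
    def pred(query, ret):
        def go(i, acc):
            if i == n:
                return ret(acc)
            return query(i, lambda b: go(i + 1, acc != b))
        return go(0, False)
    return pred

print(effcount(odd_cps(3)))  # 4
```

For odd_cps(3) the query function is invoked 7 times in total, once per internal node of the decision tree, whereas enumerating the 8 points separately would issue 24 queries. As with effcount itself, the counting function never needs to be told the value of n.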
We may now articulate the crucial correctness and efficiency properties of effcount.
Theorem 2. The following hold for any $n \in \mathbb{N}$ and any n-standard predicate P of ${\lambda_{\textrm{h}}}$:

1. effcount correctly counts P.

2. The number of machine steps required to evaluate $\mathrm{effcount}~P$ is
\[ \left( \displaystyle\sum_{bs \in \mathsf{Addr}_n} \mathsf{steps}(\mathcal{T}(P))(bs) \right) ~+~ {{{{{{\mathcal{O}}}}}}}(2^n) \]
Proof [Outline.] Suppose $bs \in \mathsf{Addr}_n$ , with length j. From the construction of $\mathcal{T}(P)$ , one may easily read off a configuration $\mathcal{C}_{bs}$ whose execution is expected to compute the count for the subtree below node bs, and we can explicitly describe the form $\mathcal{C}_{bs}$ will have. We write $\mathrm{Hyp}(bs)$ for the claim that $\mathcal{C}_{bs}$ correctly counts this subtree and does so within the following number of steps:
The $9*(2^{n-j}-1)$ expression is the number of machine steps contributed by the Branch-case inside the handler, whilst the $2*2^{n-j}$ expression is the number of machine steps contributed by the $\mathbf{val}$-case. We prove $\mathrm{Hyp}(bs)$ by a laborious but entirely routine downwards induction on the length of bs. The proof combines counting of explicit machine steps with ‘oracular’ appeals to the assumed behaviour of P as modelled by $\mathcal{T}(P)$. Once $\mathrm{Hyp}([])$ is established, both halves of the theorem follow easily. Full details are given in Appendix C of Hillerström et al. (2020).
The above formula can clearly be simplified for certain reasonable classes of predicates. For instance, suppose we fix some constant $c \in \mathbb{N}$, and let $\mathcal{P}_{n,c}$ be the class of all n-standard predicates P for which all the edge times $\mathsf{steps}(\mathcal{T}(P))(bs)$ are bounded by c. (Many reasonable predicates will belong to $\mathcal{P}_{n,c}$ for some modest value of c: for instance, the membership test for some regular language $\mathcal{L} \subseteq \{0,1\}^*$, or even for many languages defined by deterministic pushdown automata if cons-lists are added to our language.) Since the number of sequences bs in question is less than $2^{n+1}$, we may read off from the above formula that for predicates in $\mathcal{P}_{n,c}$, the runtime of effcount is $\mathcal{O}(c2^n)$.
Alternatively, should we wish to use the finer-grained cost model that assigns an $\mathcal{O}(\log |\gamma|)$ runtime to each abstract machine step (see Section 4.3), we may note that any environment $\gamma$ arising in the computation contains at most n entries introduced by the let-bindings in effcount, and (if $P \in\mathcal{P}_{n,c}$) at most $\mathcal{O}(cn)$ entries introduced by P. Thus, the time for each step in the computation remains $\mathcal{O}(\log c+ \log n)$, and the total runtime for effcount is $\mathcal{O}(c 2^n (\log c + \log n))$.
One might also ask about the execution time for an implementation of ${\lambda_{\textrm{h}}}$ that performs genuine copying of continuations, as in systems such as MLton (2020). As MLton copies the entire continuation (stack), whose size is ${{{{{{\mathcal{O}}}}}}}(n)$ , at each of the $2^n$ branches, continuation copying alone takes time ${{{{{{\mathcal{O}}}}}}}(n2^n)$ and the effectful implementation offers no performance benefit. More refined implementations (Farvardin & Reppy, Reference Farvardin and Reppy2020; Flatt & Dybvig, Reference Flatt and Dybvig2020) that are able to take advantage of delimited control operators or sharing in copies of the stack can bring the complexity of continuation copying back down to ${{{{{{\mathcal{O}}}}}}}(2^n)$ .
Finally, one might consider another dimension of cost, namely, the space used by effcount. Consider a class $\mathcal{Q}_{n,c,d}$ of n-standard predicates P for which the edge times in $\mathcal{T}(P)$ never exceed c and the sizes of pure continuations never exceed d. If we consider any $P \in \mathcal{Q}_{n,c,d}$ then the total number of environment entries is bounded by cn, taking up space $\mathcal{O}(cn \log(cn))$. We must also account for the pure continuations. There are n of these, each taking at most d space. Thus, the total space is $\mathcal{O}(n(d + c(\log c + \log n)))$.
6 Extensions and variations
Our efficient implementation method is robust under several variations. We outline here how the idea generalises beyond n-standard predicates and adapts from generic count to generic search. We also indicate how one may obtain the speedup in question in the presence of a type-and-effect system.
6.1 Beyond n-standard predicates
The n-standardness restriction on predicates serves to make the efficiency phenomenon stand out as clearly as possible. However, we can relax the restriction by tweaking effcount to handle repeated queries and missing queries. The trade-off is that the analysis of effcount becomes more involved. The key to relaxing the n-standardness restriction is the use of state to keep track of which queries have been computed. We can give stateful implementations of effcount without changing its type signature by using parameter-passing (Kammar et al., Reference Kammar, Lindley and Oury2013; Pretnar, Reference Pretnar2015) to internalise state within a handler. Parameter-passing abstracts every handler clause such that the current state is supplied before the evaluation of a clause continues and the state is threaded through resumptions: a resumption becomes a two-argument curried function $r : B \to S \to D$, where the first argument, of type B, is the return value of the operation and the second argument is the updated state of type S.
Repeated queries. We can generalise effcount to handle repeated queries by memoising previous answers. First, we generalise the type of Branch to carry an index of a query.
For fixed n, we assume a type of natural number to boolean maps, $\mathrm{Map}_n$ , with the following interface.
Invoking $\mathrm{lookup}~i~map$ returns $\mathbf{inl}~{\langle \rangle}$ if i is not present in map, and $\mathbf{inr}~ans$ if i is associated by map with the value $ans : \mathrm{Bool}$. Allowing ourselves a few extra constant-time arithmetic operations, we can realise suitable maps in ${\lambda_{\textrm{b}}}$ such that the time complexity of $\mathrm{add}_n$ and $\mathrm{lookup}_n$ is $\mathcal{O}(\log n)$ (Okasaki, Reference Okasaki1999). We can then use parameter-passing to support repeated queries as follows.
The state parameter s memoises query results, thus avoiding double-counting and enabling $\mathrm{effcount}'_n$ to work correctly for predicates performing the same query multiple times.
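The memoising behaviour can be illustrated with a small Python sketch. It is not the paper's $\mathrm{effcount}'_n$: as an assumption, computations are modelled as explicit trees with indexed Branch nodes, and the state is a Python dict copied at each branch rather than a persistent map with $\mathcal{O}(\log n)$ operations. All names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Union


@dataclass
class Return:
    value: bool


@dataclass
class Branch:
    index: int                          # which component is being queried
    resume: Callable[[bool], "Comp"]    # stands in for the resumption


Comp = Union[Return, Branch]


def handle_count_memo(comp: Comp, state: Optional[Dict[int, bool]] = None) -> int:
    state = {} if state is None else state
    if isinstance(comp, Return):
        return 1 if comp.value else 0
    if comp.index in state:
        # Repeated query: replay the remembered answer, do not branch again.
        return handle_count_memo(comp.resume(state[comp.index]), state)
    # Fresh query: explore both answers, threading the extended state.
    on_true = handle_count_memo(comp.resume(True), {**state, comp.index: True})
    on_false = handle_count_memo(comp.resume(False), {**state, comp.index: False})
    return on_true + on_false


def repeated() -> Comp:
    # Hypothetical predicate on 2 components: queries 0, then 1, then 0 again.
    return Branch(0, lambda a: Branch(1, lambda b: Branch(0, lambda a2: Return(a2 or b))))
```

Without the state, the second query of component 0 would be branched on independently and the example predicate would be counted over 8 paths instead of 4 points.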
Missing queries. Similarly, we can use parameter-passing to support missing queries.
The parameter d tracks the depth and the returned result is scaled by $2^{n-d}$, accounting for the unexplored part of the current subtree. This enables $\mathrm{effcount}''_n$ to operate correctly on predicates that inspect each of the n components at most once. We leave it as an exercise for the reader to combine $\mathrm{effcount}'_n$ and $\mathrm{effcount}''_n$ to handle both repeated queries and missing queries.
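The scaling by $2^{n-d}$ can be illustrated as follows. This is a hedged Python sketch under the same assumption as before (computations as explicit trees, since Python lacks multi-shot resumptions); `handle_count_depth` and the example predicate are illustrative names, not the paper's $\mathrm{effcount}''_n$.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Return:
    value: bool


@dataclass
class Branch:
    resume: Callable[[bool], "Comp"]


Comp = Union[Return, Branch]


def handle_count_depth(comp: Comp, n: int, depth: int = 0) -> int:
    if isinstance(comp, Return):
        # A true answer after only `depth` queries stands for every point
        # in the unexplored subtree, i.e. 2^(n - depth) points.
        return 2 ** (n - depth) if comp.value else 0
    return (handle_count_depth(comp.resume(True), n, depth + 1)
            + handle_count_depth(comp.resume(False), n, depth + 1))


def first_two() -> Comp:
    # Hypothetical predicate: true iff components 0 and 1 are both true;
    # it stops after one query when component 0 is false.
    return Branch(lambda a: Branch(lambda b: Return(b)) if a else Return(False))
```

The sketch is only correct when no component is queried twice on a path, matching the restriction stated above.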
6.2 From generic count to generic search
We can generalise the problem of generic counting to generic searching. The key difference is that a generic search procedure must materialise a list of solutions, thus its type is
where $\mathrm{List}_A$ is the type of cons-lists whose elements have type A. We modify effcount to return a list of solutions rather than the number of solutions by lifting each result into a singleton list and using list concatenation instead of addition to combine partial results $xs_\mathrm{true}$ and $xs_\mathrm{false}$ as follows.
The Branch operation is now parameterised by an index i. The handler is now parameterised by the current path as a point q, which is output at a leaf if it is in the predicate. A little care is required to ensure that $\text{effSearch}_\textit{n}$ has runtime $\mathcal{O}(2^n)$; naïve use of cons-list concatenation would result in $\mathcal{O}(n2^n)$ runtime, as cons-list concatenation is linear in its first operand. In place of cons-lists, we use Hughes lists (Hughes, Reference Hughes1986), which admit constant-time concatenation: $\text{HList}_A \mathrel{\overset{{\mbox{def}}}{=}} \mathrm{List}_A \to \mathrm{List}_A$. The empty Hughes list $\mathrm{nil} : \text{HList}_A$ is defined as the identity function: $\mathrm{nil} \mathrel{\overset{{\mbox{def}}}{=}} \lambda xs. xs$.
We use the function $\text{toConsList}$ to convert the final Hughes list to a standard cons-list. This conversion has linear time complexity (it just conses all of the elements of the list together).
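A minimal Python sketch of Hughes lists follows. As an assumption for illustration, a cons-list is encoded as nested pairs `(head, tail)` with `None` as the empty list; the helper names `singleton`, `append`, `to_cons_list` and `to_python_list` are not taken from the paper.

```python
from typing import Any, Callable, List, Optional, Tuple

ConsList = Optional[Tuple[Any, Any]]      # None is the empty cons-list
HList = Callable[[ConsList], ConsList]    # HList_A = List_A -> List_A


def nil(xs: ConsList) -> ConsList:
    # The empty Hughes list is the identity function.
    return xs


def singleton(x: Any) -> HList:
    # Prepends x to whatever list is eventually supplied.
    return lambda xs: (x, xs)


def append(f: HList, g: HList) -> HList:
    # Concatenation is function composition: constant time to build.
    return lambda xs: f(g(xs))


def to_cons_list(h: HList) -> ConsList:
    # Apply to the empty list; each element is consed exactly once.
    return h(None)


def to_python_list(cell: ConsList) -> List[Any]:
    # Convenience for inspecting results; not part of the technique.
    out = []
    while cell is not None:
        head, cell = cell
        out.append(head)
    return out
```

For example, `to_python_list(to_cons_list(append(singleton(1), singleton(2))))` evaluates to `[1, 2]`.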
6.3 Type-and-effect system
Many practical implementations of effect handlers come equipped with rich type systems that track which effectful operations any function may perform (Bauer & Pretnar, Reference Bauer and Pretnar2014; Hillerström & Lindley, Reference Hillerström and Lindley2016; Leijen, Reference Leijen2017; Biernacki et al., Reference Biernacki, Piróg, Polesiuk and Sieczkowski2019; Brachthäuser et al., Reference Brachthäuser, Schuster and Ostermann2020). One may wonder whether our result transfers to such a system, as we make crucial use of the ability to inject an effectful operation into a computation, which at first glance might seem to require a change of (effect) types.
However, as we shall briefly outline, with sufficient polymorphism we need not change the effect types. Our generic count program does not perform any externally visible effects. Therefore, if we equip our simple type system with some form of rank-2 effect polymorphism, then we do not morally require a change of types even in the presence of the richer types provided by effect tracking.
Suppose we track the effects on function types, e.g. $A \to B !\varepsilon$ denotes a function that accepts a value of type A as input and produces some value of type B as output using effects $\varepsilon$ . Here $\varepsilon$ is intended to be an effect variable which may be instantiated to name concrete effectful operations that the function may perform such as $\mathrm{Branch} : \mathrm{Unit} \to \mathrm{Bool}$ . We shall not concern ourselves with a particular effect type formalism here, but rather just note that there are many approaches to realising such an effect system, e.g. using row types (Hillerström & Lindley, Reference Hillerström and Lindley2016; Leijen, Reference Leijen2017), subtyping (Bauer & Pretnar, Reference Bauer and Pretnar2014), intersection types (Brachthäuser et al., Reference Brachthäuser, Schuster and Ostermann2020), etc.
We can give a fully effect-parametric signature to generic count using rank-2 effect polymorphism.
Here $\emptyset$ denotes that an application of Count does not perform any externally visible effects. The parameter type of Count is a rank-2 effect type. It effectively hides the implementation detail of the provided point from the predicate. Thus, the implementation of Count is allowed to supply a point that performs any effectful operation, provided that the implementation guarantees to handle any such operation. The idea of using rank-2 polymorphism in this way dates back at least to McCracken (Reference McCracken1984); it has been used in practice in Haskell as the primary means for state encapsulation since Launchbury & Jones (Reference Launchbury and Jones1994).
7 Generic count in weaker languages
We have shown that there is an implementation of generic count in ${\lambda_{\textrm{h}}}$ with a runtime bound of $\mathcal{O}(2^n)$ for certain well-behaved predicates. Our eventual goal is to prove that such a runtime bound is unattainable in ${\lambda_{\textrm{b}}}$ (Section 8), or indeed in the stronger language ${\lambda_{\textrm{a}}}$ (Section 10). In this section, we provide some context for these results by surveying a range of possible approaches to generic counting in languages weaker than ${\lambda_{\textrm{h}}}$, emphasising how the attainable efficiency varies according to the expressivity of the language. Since the purpose here is simply to situate our main results within a broader landscape which may itself call for further investigation, our discussion in this section will be informal and intuitive rather than mathematically rigorous.
7.1 Naïve count
The naïve approach is simply to apply the given predicate P to all $2^n$ possible n-points in turn, keeping a count of those on which P yields true. This approach could be readily implemented in ${\lambda_{\textrm{b}}}$; but it is also clear how it could be effected in an even weaker language, in which the recursion construct of ${\lambda_{\textrm{b}}}$ is replaced by a weaker iteration construct. (For a comparison of the power of iteration and recursion, see Longley, Reference Longley2019.) For instance, the following operator (definable in ${\lambda_{\textrm{b}}}$) allows one to achieve the effect of while-loops manipulating data of type A:
Let us write ${{{{{{\lambda_{\textrm{i}}}}}}}}$ for the sublanguage of ${\lambda_{\textrm{b}}}$ allowing $\mathrm{while}_A$ for each type A, but disallowing all uses of $\mathbf{rec}$ elsewhere. Then it is a straightforward coding exercise to write a ${{{{{{\lambda_{\textrm{i}}}}}}}}$ program
that implements generic counting using the naïve strategy.
The evaluation of an n-standard predicate on an individual n-point must clearly take time $\Omega(n)$. It is therefore clear that in whatever way the naïve count strategy is implemented, the runtime on any n-standard predicate P must be $\Omega(n2^n)$. If P is not n-standard, the $\Omega(n)$ bound on each point application need not apply, but we may still say that a naïve count for any predicate P (at level n) must take time $\Omega(2^n)$.
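The naïve strategy and its $n2^n$ query count can be made concrete with a short Python sketch. It uses plain iteration only; the function name `naive_count` and the query counter are illustrative additions, and the counter exists purely to expose how often points are inspected.

```python
from itertools import product
from typing import Callable, Tuple

Point = Callable[[int], bool]
Predicate = Callable[[Point], bool]


def naive_count(n: int, pred: Predicate) -> Tuple[int, int]:
    """Return (number of satisfying n-points, total component queries)."""
    count = 0
    queries = 0
    # Enumerate all 2^n boolean vectors of length n.
    for bits in product([True, False], repeat=n):
        def point(i: int, bits=bits) -> bool:
            nonlocal queries
            queries += 1          # instrumentation only
            return bits[i]
        if pred(point):
            count += 1
    return count, queries
```

For a predicate that inspects every component exactly once, such as `lambda q: q(0) ^ q(1) ^ q(2)` with n = 3, the second component of the result is $3 \cdot 2^3 = 24$.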
One might at first suppose that these properties are inevitable for any implementation of generic count within ${\lambda_{\textrm{b}}}$, or indeed any purely functional language: surely, the only way to learn something about the behaviour of P on every possible n-point is to apply P to each of these points in turn? It turns out, however, that the $\Omega(2^n)$ lower bound can sometimes be circumvented by implementations that cleverly exploit nesting of calls to P. In the next section, we illustrate the germ of this idea, and in Section 7.3, we show how it gives rise to a practically superior counting program within ${\lambda_{\textrm{b}}}$.
7.2 The nesting trick
The germ of the idea may be illustrated even within ${{{{{{\lambda_{\textrm{i}}}}}}}}$ . Suppose that we first construct some program
which, given a predicate P, returns some n-point Q such that $P~Q$ evaluates to true, if such a point exists, and any point at all if no such point exists. (In other words, ${\mathrm{bestshot}}_n$ embodies Hilbert’s choice operator $\varepsilon$ on predicates.) It is once again routine to construct such a program by naïve means, and we may moreover assume that for any P, the evaluation of ${\mathrm{bestshot}}_n\;P$ takes only constant time, all the real work being deferred until the argument of type $\mathrm{Nat}_n$ is supplied.
Now consider the following program:
Here the term $pred~({\mathrm{bestshot}}_n~pred)$ serves to test whether there exists an n-point satisfying pred: if there is not, our count program may return 0 straightaway. It is thus clear that ${\mathrm{lazycount}}_n$ is a correct implementation of generic count, and also that if pred is the predicate $\lambda q.\mathrm{false}$ then ${\mathrm{lazycount}}_n\;pred$ returns 0 within $\mathcal{O}(1)$ time, thus violating the $\Omega(2^n)$ lower bound suggested above.
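The nesting trick can be sketched in Python as follows. This is an illustrative reading of the description above, not the paper's program text: `bestshot` here is a naïve search whose work is deferred until the returned point is first queried, and `lazy_count` falls back to a naïve count only when the test succeeds. All names are assumptions.

```python
from itertools import product
from typing import Callable

Point = Callable[[int], bool]
Predicate = Callable[[Point], bool]


def naive_count(n: int, pred: Predicate) -> int:
    return sum(1 for bits in product([True, False], repeat=n)
               if pred(lambda i, bits=bits: bits[i]))


def bestshot(n: int, pred: Predicate) -> Point:
    cache = []  # filled on the first query, so building the point is O(1)

    def point(i: int) -> bool:
        if not cache:
            # Naive search; fall back to the all-false point if none is found.
            found = next((bits for bits in product([True, False], repeat=n)
                          if pred(lambda j, bits=bits: bits[j])),
                         (False,) * n)
            cache.append(found)
        return cache[0][i]

    return point


def lazy_count(n: int, pred: Predicate) -> int:
    # Nested call: pred is applied to a point that itself calls pred.
    if pred(bestshot(n, pred)):
        return naive_count(n, pred)
    return 0
```

On the constantly false predicate the point is never queried, so `lazy_count` invokes the predicate exactly once regardless of n.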
This might seem like a footling point, as ${\mathrm{lazycount}}_n$ offers this efficiency gain only on (certain implementations of) the constantly false predicate. However, it turns out that by a recursive application of this nesting trick, we may arrive at a generic count program in ${\lambda_{\textrm{b}}}$ that spectacularly defies the $\Omega(2^n)$ lower bound for an interesting class of (non-n-standard) predicates, and indeed proves quite viable for counting solutions to ‘n-queens’ and similar problems. In contrast to the naïve strategy, however, this approach relies crucially on the use of recursion and cannot be implemented in a language such as ${\lambda_{\textrm{i}}}$ with mere iteration.
We shall refer to this ${\lambda_{\textrm{b}}}$ program as ${\mathrm{Bergercount}}$, as it is modelled largely on Berger’s PCF implementation of the so-called fan functional (Berger, Reference Berger1990; Longley & Normann, Reference Longley and Normann2015). We give an implementation of ${\mathrm{Bergercount}}$ in the next section.
7.3 Berger count
Berger’s original program (Berger, Reference Berger1990) introduced a remarkable search operator for predicates on infinite streams of booleans and has played an important role in higher-order computability theory (Longley & Normann, Reference Longley and Normann2015). What we wish to highlight here is that if one applies the algorithm to predicates on finite boolean vectors, the resulting program, though no longer interesting from a computability perspective, still holds some interest from a complexity standpoint: indeed, it yields what seems to be the best known implementation of generic count within a PCF-style ‘functional’ language (provided one accepts the use of a primitive for call-by-need evaluation).
We give the gist of an adaptation of Berger’s search algorithm on finite spaces. The key ingredient of Berger’s search algorithm is the ${\mathrm{bestshot}}_n$ function, which given any n-standard predicate P returns a point satisfying P if one exists, or the dummy point $\lambda i.\mathrm{false}$ if not. Figure 7 depicts the implementation of this function. It is implemented via two mutually recursive auxiliary functions whose workings are admittedly hard to elucidate in a few words. The function ${\mathrm{bestshot}}'_n$ is a generalisation of ${\mathrm{bestshot}}_n$ that makes a best shot at finding a point $\pi$ satisfying the given predicate and matching some specified list start in some initial segment of its components $[\pi(0),\dots,\pi(i-1)]$. Here we use two customary operations on lists: we write $|{-}|$ for the list length function and nth for the function which projects the nth element of a list. The ${\mathrm{bestshot}}'_n$ function works ‘lazily’, drawing its values from start wherever possible, and performing an actual search only when required. This actual search is undertaken by ${\mathrm{bestshot}}''_n$, which proceeds by first searching for a solution that extends the specified list with true (using the standard list append function); but if no such solution is forthcoming, it settles for false as the next component of the point being constructed. The whole procedure relies on a subtle combination of laziness, recursion and implicit nesting of calls to the provided predicate, which means that the search is self-pruning in regions of the binary tree where the predicate only demands some initial segment $q~0, \dots, q~(i-1)$ of its argument q.
The above program makes use of an operation
which transforms a given thunk into an equivalent ‘memoised’ version, i.e. one that caches its value after its first invocation and immediately returns this value on all subsequent invocations. Such an operation may readily be implemented with the help of local state, or alternatively may simply be added as a primitive in its own right. The latter has the advantage that it preserves the purely ‘functional’ character of the language, in the sense that every program is observationally equivalent to a ${\lambda_{\textrm{b}}}$ program, namely the one obtained by replacing memoise by the identity.
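The following Python sketch combines a memoising thunk wrapper with a best-shot search in the spirit of the prose description above. It is a hedged reconstruction from that description only; the paper's figure may differ in detail. The names `bestshot1` and `bestshot2` stand in for ${\mathrm{bestshot}}'_n$ and ${\mathrm{bestshot}}''_n$, and the guard for queries at index n or beyond is an added assumption.

```python
from typing import Callable, List

Point = Callable[[int], bool]
Predicate = Callable[[Point], bool]


def memoise(thunk: Callable[[], Point]) -> Callable[[], Point]:
    cell = []  # local state caching the thunk's value

    def memoised() -> Point:
        if not cell:
            cell.append(thunk())   # first invocation: run and cache
        return cell[0]             # later invocations: return cached value

    return memoised


def bestshot(n: int, pred: Predicate) -> Point:
    return bestshot1(n, pred, [])


def bestshot1(n: int, pred: Predicate, start: List[bool]) -> Point:
    # Lazy: answers from `start` where possible, searches only on demand.
    rest = memoise(lambda: bestshot2(n, pred, start))

    def point(i: int) -> bool:
        if i < len(start):
            return start[i]
        return rest()(i)

    return point


def bestshot2(n: int, pred: Predicate, start: List[bool]) -> Point:
    if len(start) >= n:
        return lambda i: False     # guard (assumption): nothing left to search
    # First try extending with true; the nested call to pred does the search.
    candidate = bestshot1(n, pred, start + [True])
    if pred(candidate):
        return candidate
    # Otherwise settle for false as the next component.
    return bestshot1(n, pred, start + [False])
```

Replacing `memoise` by the identity on thunks would leave the results unchanged but repeat the nested searches, which is the sense in which the operation is observationally pure.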
Figure 8 depicts an implementation that exploits the above idea to yield a generic count program (this development appears to be new). Again, ${\mathrm{Bergercount}}_n$ is implemented by means of two mutually recursive auxiliary functions. The function $\mathrm{count}'_n$ counts the solutions to the provided predicate pred that start with the specified list of booleans, adding their number to a previously accumulated total given by acc. The function $\mathrm{count}''_n$ does the same thing, but exploiting the knowledge that a best shot at the ‘leftmost’ solution to P within this subtree has already been computed. (We are visualising n-points as forming a binary tree with true to the left of false at each fork.) Thus, $\mathrm{count}''_n$ will not re-examine the portion of the subtree to the left of this candidate solution, but rather will start at this solution and work rightward.
This gives rise to an n-count program that can work efficiently on predicates that tend to ‘fail fast’: more specifically, predicates P that inspect the components of their argument q in order $q~0$, $q~1$, $q~2$, $\dots$, and which are frequently able to return false after inspecting just a small number of these components. Generalising our program from binary to k-ary branching trees, we see that the n-queens problem provides a typical example: most points in the space can be seen not to be solutions by inspecting just the first few components. Our experimental results in Section 11 attest to the viability of this approach and its overwhelming superiority over the naïve functional method.
By contrast, the above program is not able to exploit parts of the tree where our predicate ‘succeeds fast’, i.e. returns true after seeing just a few components. Unlike the effectful count program of Section 5.4, which may sometimes add $2^{n-d}$ to the count in a single step, the Berger approach can only count solutions one at a time. Thus, for an n-standard predicate P, the evaluation of ${\mathrm{Bergercount}}_n~P$ that returns a natural number k must take time $\Omega(k)$.
7.4 Pruned count
To do better than ${\mathrm{Bergercount}}$ , it seems that we must ascend to a more powerful language. We now briefly outline another approach, using ideas from Longley (Reference Longley1999), which yields a more efficient form of pruned search in an extension of $\lambda_b$ with local state of ground type. Since local state can certainly be encoded using affine effect handlers with no essential loss of efficiency, this approach falls within the ambit of what can be achieved within the language ${{{{{{\lambda_{\textrm{a}}}}}}}}$ to be introduced in Section 9.
The key idea is that each time we apply a predicate to a point, we may use local state to detect which components of the point are actually inspected by the predicate. We do this using a third-order function Modulus, which encloses the point in a wrapper that logs all calls to the point, then passes this wrapped point to the predicate:
This is somewhat different from the modulus functional considered in Longley (Reference Longley1999), which returns a sorted list of the arguments to which the point is applied. The latter has the theoretically pleasant consequence that the modulus is an example of a sequentially realisable functional – its externally observable behaviour is purely functional (i.e. extensional) although the function it implements cannot be realised in pure $\lambda_b$ . However, this property is purchased at the cost of the extra work needed to return a sorted list and is of little relevance to our present concerns.
The essential point is that if $\mathrm{Modulus}~pred~point$ returns $\langle b,ilist \rangle$, then we know immediately that $pred~point'$ would also return the value b for every $point'$ that agrees with point at the components listed in ilist. This property can be used as the basis of a program
that takes care, at each stage, to apply the predicate to some ‘new’ point at which the value is not already known on the basis of previous calls, and which then increments the accumulator by either 0 (if the predicate returns false) or the appropriate $2^{n-d}$ (if it returns true). In contrast to ${\mathrm{Bergercount}}$, this has the effect of pruning the search space both where the predicate fails fast and where it succeeds fast.
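A hedged Python sketch of the logging wrapper and of one possible pruned count built on it is given below. The `modulus` function follows the description above (it returns the unsorted log of queried indices). The `pruned_count` function is a hypothetical variant written for illustration, not the paper's ${\mathrm{prunedcount}}$: it assumes a deterministic predicate that only queries indices below n, and it re-runs the predicate from scratch at each stage.

```python
from typing import Callable, Dict, List, Optional, Tuple

Point = Callable[[int], bool]
Predicate = Callable[[Point], bool]


def modulus(pred: Predicate, point: Point) -> Tuple[bool, List[int]]:
    log: List[int] = []            # local state recording inspected components

    def wrapped(i: int) -> bool:
        log.append(i)
        return point(i)

    result = pred(wrapped)
    return result, log


def pruned_count(n: int, pred: Predicate,
                 known: Optional[Dict[int, bool]] = None) -> int:
    known = {} if known is None else known
    # Probe a point that agrees with `known` and is false elsewhere.
    result, ilist = modulus(pred, lambda i: known.get(i, False))
    fresh = [i for i in ilist if i not in known]
    if not fresh:
        # The answer depends only on fixed components, so it holds for all
        # 2^(n - d) points extending `known`, where d = len(known).
        return 2 ** (n - len(known)) if result else 0
    i = fresh[0]                   # first component not yet fixed
    return (pruned_count(n, pred, {**known, i: True})
            + pruned_count(n, pred, {**known, i: False}))
```

For instance, on `lambda q: q(0) and q(1)` with n = 3 the sketch prunes both the failing subtree below `q(0) = False` and the succeeding subtree below `q(0) = q(1) = True`, returning 2.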
Of course, the ability to prune in the ‘true’ case makes no difference for search problems such as n-queens, where the predicate never returns true without inspecting all components. Even for searches of this kind, however, ${\mathrm{prunedcount}}$ performs significantly better in practice than ${\mathrm{Bergercount}}$, which achieves its pruning of ‘false’ subtrees by much more convoluted means. (The difference is clearly manifested by the experimental results reported in Section 11.) Indeed, in the absence of advanced control features, we are not aware of any approach to generic counting that essentially does better than ${\mathrm{prunedcount}}$.
It is clear, however, that in the case of n-standard predicates, which always inspect all n components of their points, no pruning at all is possible, and neither ${\mathrm{Bergercount}}$ nor ${\mathrm{prunedcount}}$ improves on the $\Omega(n2^n)$ runtime of ${\mathrm{naivecount}}$.
8 A lower bound for ${\lambda_{\textrm{b}}}$
The above discussion strongly suggests that the $\mathcal{O}(2^n)$ runtime of our ${\lambda_{\textrm{h}}}$ generic count implementation is unattainable in ${\lambda_{\textrm{b}}}$, but also points out the existence of phenomena in this area that defy intuition (Escardó, Reference Escardó2007 gives some striking further examples). In this section, we prove rigorously that any implementation of generic counting in ${\lambda_{\textrm{b}}}$ must have runtime $\Omega(n2^n)$ on certain n-standard predicates. In the following two sections, we shall apply a similar analysis to the richer language ${\lambda_{\textrm{a}}}$. This mathematically robust characterisation of the efficiency gap between languages with and without first-class control constructs is the central contribution of the paper.
One might ask at this point whether the claimed lower bound could not be obviated by means of some known continuation passing style (CPS) or monadic transform of effect handlers (Hillerström et al., Reference Hillerström, Lindley, Atkey and Sivaramakrishnan2017; Leijen, Reference Leijen2017; Hillerström et al., Reference Hillerström, Lindley and Longley2020). This can indeed be done, but only by dint of changing the type of our predicates P – which, as noted in the introduction, would defeat the purpose of our enquiry. Our intention is precisely to investigate the relative power of various languages for manipulating predicates that are given to us in a certain way which we do not have the luxury of choosing.
As a first step, we note that where lower bounds are concerned, it will suffice to work with the small-step operational semantics of ${\lambda_{\textrm{b}}}$ rather than the more elaborate abstract machine model employed in Section 4.1. This is because, as observed in Section 4.1, there is a tight correspondence between these two execution models such that for the evaluation of any closed term, the number of abstract machine steps is always at least the number of small-step reductions. Thus, if we are able to show that the number of small-step reductions for any generic count program in ${\lambda_{\textrm{b}}}$ on the predicates of interest is $\Omega(n2^n)$, this will establish the desired lower bound on the runtime.
To establish a formal contrast with ${\lambda_{\textrm{h}}}$, it will in fact suffice to show a lower bound of $\Omega(n2^n)$ on the worst-case runtime for generic count programs in ${\lambda_{\textrm{b}}}$. For this purpose, it is convenient to focus on a specialised class of predicate terms that will be easy to work with. We therefore declare that our intention is initially to analyse the runtime of any generic count program in ${\lambda_{\textrm{b}}}$ on any canonical n-standard predicate as in Definition 8. However, we shall subsequently remark that in fact the same lower bound applies to arbitrary n-standard predicates.
Let us suppose, then, that K is a program of ${\lambda_{\textrm{b}}}$ that correctly counts all canonical n-standard predicates of ${\lambda_{\textrm{b}}}$ for some specific n. We now establish a key lemma, which vindicates the naïve intuition that if P is a canonical n-standard predicate, the only way for K to discover the correct value for $\sharp [\![ P ]\!]$ is to perform $2^n$ separate applications $P\;Q$ (allowing for the possibility that these applications need not be performed ‘in turn’ but might be nested in some complex way).
Lemma 1 (No shortcuts). Suppose K correctly counts all canonical n-standard predicates of ${\lambda_{\textrm{b}}}$. If P is a canonical n-standard predicate, then K applies P to at least $2^n$ distinct n-points. More formally, for any of the $2^n$ possible semantic n-points $\pi : \mathbb{N}_n \to \mathbb{B}$, there is a term $\mathcal{E}[P~Q]$ appearing in the small-step reduction of $K~P$ such that Q is an n-point and $[\![ Q ]\!] = \pi$.
Proof. Suppose $\pi$ is some semantic npoint. Since P is canonical, we have $P = P(\tau)$ for some $\tau$ . Let l be the maximal path through $\tau$ associated with $\pi$ : that is, the one we construct by responding to each query $\mathord{?} k$ with $\pi(k)$ . Then l is a leaf node such that $\tau(l) = \mathord{!} (\tau \bullet \pi)$ . Let $\tau'$ be obtained from $\tau$ by simply negating this answer value at l, and take $P' = P(\tau')$ .
Since the numbers of true-leaves in $\tau$ and $\tau'$ differ by 1, it is clear that if K indeed correctly counts all canonical n-standard predicates, then the values returned by $K~P$ and $K~P'$ will have an absolute difference of 1. On the other hand, we shall argue that if the computation of $K~P$ never actually ‘visits’ the leaf l in question, then K will be unable to detect any difference between $P$ and $P'$. The situation is reminiscent of Milner’s context lemma (Milner, Reference Milner1977), which loosely says that the only way to observe a difference between two programs is to apply them to some argument on which they differ.
Without loss of generality we shall assume $\tau(l) = \mathrm{true}$ and $\tau'(l) = \mathrm{false}$. This means that for some term context $C[-]:\mathrm{Bool}$ with a single occurrence of a hole of type Bool, we have $P \equiv \lambda q.\,C[\mathrm{true}]$ and $P' \equiv \lambda q.\,C[\mathrm{false}]$.
Now consider the reduction sequence starting from $K~(\lambda q.\,C[-])$ (treating the hole ‘$-$’ as an additional variable). This cannot be infinite, for then the reduction of $K~P$ would also be infinite, since valid reduction steps are closed under substituting true for ‘$-$’; thus K would not correctly count all canonical n-standard predicates. Neither can this reduction terminate in a numeral k, for then both $K~P$ and $K~P'$ would evaluate to k for a similar reason, whereas the correct results should differ by 1. Nor can it terminate in just the term ‘$-$’, as this does not have the correct type. We conclude that the reduction of $K~(\lambda q.\,C[-])$ gets stuck at some term with the hole in head position: more precisely, since ‘$-$’ formally has type ${\langle \rangle} + {\langle \rangle}$, we see by inspection of the reduction rules that it must get stuck at some term $\mathcal{D}[\mathbf{case}\;{-}\;\{\cdots\}]$, where $\mathcal{D}$ is an evaluation context. We write this term as $D[-]$, where the $D[~]$ abstracts only this head occurrence of the hole (there may well be other occurrences of the hole within D). From the form of evaluation contexts, we know that this hole occurrence does not appear under a $\lambda$ binder.
We now trace back through the reduction $K~(\lambda q.\,C[-]) \leadsto^\ast D[-]$ looking at the ancestors of this occurrence of ‘$-$’, and identifying the last point in the reduction at which this ancestor occurs within a descendant of the original $\lambda q.\,C[-]$. Since $C[-]$ has no free variables other than the hole occurrence, and the only rule for eliminating a $\lambda$ is S-App, it is clear that at this point we have a term $\mathcal{E}[(\lambda q.\,C[-])Q]$ with $\mathcal{E}$ an evaluation context, $C[~]$ a context abstracting only this ancestor occurrence of ‘$-$’, and Q a closed term of type Point. This reduces in the next step to $\mathcal{E}[E[-]]$ where $E[-] \equiv C[-][Q/q]$.
We now claim that Q is an n-point and $[\![ Q ]\!] = \pi$ as required. For this, we appeal to the fact that $P \equiv \lambda q.\,C[\mathrm{true}]$ is canonical, so that $C[-]$ is simply a complex of nested $\mathbf{if}$ expressions as in Definition 8, with a hole replacing the leaf literal at the position indicated by the path l. It follows that $E[-]$ itself is a complex of nested $\mathbf{if}$ expressions with branch conditions Q(k) and with the hole at one of the leaves. It is now clear that the only way for this hole to become later exposed (as it is in $D[-]$) is for each of the branch conditions Q(k) to evaluate to $\pi(k)$, so that the evaluation indeed follows the path l and we have $E[-] \leadsto^* -$. But because $\tau$ is n-standard, each of $Q(0),\ldots,Q(n-1)$ occurs exactly once on this path, so the above is exactly the condition for Q to be an n-point with value $\pi$.
Corollary 1. Suppose K and P are as in Lemma 1. For any semantic n-point $\pi$ and any natural number $k < n$, the reduction sequence for $K~P$ contains a term $\mathcal{F}[Q~k]$, where $\mathcal{F}$ is an evaluation context and $[\![ Q ]\!] = \pi$.
Proof Suppose $\pi \in \mathbb{B}^n$. By Lemma 1, the computation of $K~P$ contains some $\mathcal{E}[P~Q]$ where $[\![ Q ]\!] = \pi$, and the above analysis of the computation of $P~Q$ shows that it contains a term $\mathcal{E}'[Q~k]$ for each $k < n$. The corollary follows, taking $\mathcal{F}[-] \mathrel{\overset{\mbox{def}}{=}} \mathcal{E}[\mathcal{E}'[-]]$.
This gives our desired lower bound. Since our n-points Q are values, it is clearly impossible that $\mathcal{F}[Q~k] = \mathcal{F}'[Q'~k']$ (where $\mathcal{F},\mathcal{F}'$ are evaluation contexts) unless $Q=Q'$ and $k=k'$. We may therefore read off $\pi$ from $\mathcal{F}[Q~k]$ as $[\![ Q ]\!]$. There are thus at least $n2^n$ distinct terms in the reduction sequence for $K~P$, so the reduction has length $\geq n 2^n$. We have thus proved:
Theorem 3. If K is a $\lambda_{\textrm{b}}$ program that correctly counts all canonical n-standard $\lambda_{\textrm{b}}$ predicates, and P is any canonical n-standard $\lambda_{\textrm{b}}$ predicate, then the evaluation of $K~P$ must take time $\Omega(n2^n)$.
In Hillerström et al. (Reference Hillerström, Lindley and Longley2020), a more complex proof was given, modelled on traditional proofs of Milner’s context lemma. This established the slightly stronger conclusion that the evaluation of $K~P$ takes time $\Omega(n2^n)$ for all n-standard predicates P, not just the canonical ones (under the strengthened hypothesis that K correctly counts all n-standard $\lambda_{\textrm{b}}$ predicates).
It is worth noting where our argument breaks down if applied to $\lambda_{\textrm{h}}$. In $\lambda_{\textrm{b}}$, in the course of computing $K~P$, every Q to which P is applied will be a self-contained closed term denoting some specific point $\pi$. This is intuitively why we may only learn about one point at a time. In $\lambda_{\textrm{h}}$, this is not the case, because of the presence of operation symbols. For instance, our effcount program from Section 5.4 will apply P to the ‘generic point’ $\lambda\_.\,\mathbf{do}\;\mathrm{Branch}~\langle \rangle$. Thus, it need no longer be the case that the reduction of each term $Q\,k$ yields a value: it may get stuck at some invocation of $\ell$, so that control will then pass to the effect handler.
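To make the quantity $n2^n$ concrete, the following Python sketch (an informal illustration, not part of the formal development) counts the component lookups $Q\,k$ performed by a naive generic count that applies the predicate to every point in turn. The names `make_canonical_predicate` and `naive_count` are ours, and we assume for simplicity a predicate that queries the components in the fixed order $0,\ldots,n-1$, which is only a special case of the canonical n-standard predicates.

```python
from itertools import product


def make_canonical_predicate(n, leaves):
    """Predicate that queries components 0..n-1 once each, in order.

    `leaves` lists the 2**n boolean leaf values of the decision tree;
    the fixed query order is a simplifying assumption of this sketch.
    """
    def predicate(point):
        index = 0
        for k in range(n):
            index = 2 * index + (1 if point(k) else 0)
        return leaves[index]
    return predicate


def naive_count(n, predicate):
    """Count satisfying points by applying `predicate` to every point.

    Returns (count, queries), where `queries` is the total number of
    component lookups Q(k) performed over the whole run.
    """
    queries = 0
    count = 0
    for bits in product([False, True], repeat=n):
        def point(k, bits=bits):
            nonlocal queries
            queries += 1
            return bits[k]
        if predicate(point):
            count += 1
    return count, queries


n = 3
leaves = [True, False, False, True, True, True, False, False]
count, queries = naive_count(n, make_canonical_predicate(n, leaves))
print(count, queries)
```

For $n = 3$ this reports 4 satisfying points and $3 \cdot 2^3 = 24$ lookups, matching the number of distinct terms $\mathcal{F}[Q~k]$ identified in the argument above.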
9 Affine effect handlers
Having established our $\Omega(n2^n)$ runtime bound for implementations of generic count in the relatively simple setting of $\lambda_{\textrm{b}}$, we now wish to show that the same bound applies for a much richer language $\lambda_{\textrm{a}}$ supporting affine effect handlers: intuitively those in which each resumption r may be invoked at most once. This will show that the multiple invocation of r within our effcount program is essential to its efficiency, and will formally locate the fundamental efficiency gap as occurring between $\lambda_{\textrm{a}}$ and $\lambda_{\textrm{h}}$. Since affine effect handlers suffice for encoding many language features such as exceptions (Pretnar, Reference Pretnar2015), local state (Plotkin & Pretnar, Reference Plotkin and Pretnar2009), coroutines (Kawahara & Kameyama, Reference Kawahara and Kameyama2020), and single-shot continuations (Sivaramakrishnan et al., Reference Sivaramakrishnan, Dolan, White, Kelly, Jaffer and Madhavapeddy2021), this will come close to showing that the speedup we have discussed is unattainable in real languages such as Standard ML, Java, and Python (for some appropriate class of predicate terms).
In this section, we present the definition of our language ${{{{{{\lambda_{\textrm{a}}}}}}}}$ , outlining its relationship to ${\lambda_{\textrm{h}}}$ and ${\lambda_{\textrm{b}}}$ . In the following section, we will prove some key properties of evaluation in this language and use these to establish a version of Theorem 3 for ${{{{{{\lambda_{\textrm{a}}}}}}}}$ .
Our language $\lambda_{\textrm{a}}$ will be essentially a sublanguage of $\lambda_{\textrm{h}}$ in which the relevant restriction on the use of resumption variables is enforced by means of an affine type system in the tradition of linear logic. Many approaches are possible here, for instance: Girard’s intuitionistic linear logic ILL (Girard, Reference Girard1987), Barber’s dual intuitionistic linear logic DILL (Barber, Reference Barber1996), and Benton’s adjoint calculus (Benton, Reference Benton1994). We choose to work with a variant of fine-grain call-by-value based on DILL; an advantage over vanilla ILL is that it readily admits a local encoding of our intuitionistic base calculus.
9.1 $\lambda_{\textrm{a}}$ as a dual intuitionistic-affine calculus
We present the type system of $\lambda_{\textrm{a}}$ in terms of dual-context judgements $\Delta; \Gamma \vdash \square : A$, stating that a term $\square$ (which may be a value term V or a computation term M) has type A under intuitionistic type environment $\Delta$ and affine type environment $\Gamma$. Informally, variables in the intuitionistic environment may be used zero, one or many times within $\square$, while those in the affine environment may be used at most once.
As before, environments are lists assigning types to variables. For hygiene, we suppose we have disjoint lexical categories of intuitionistic and affine variables (each ranged over by metavariables x,y), and the variables within each of the environments $\Delta,\Gamma$ are required to be distinct.
The syntax of ${{{{{{\lambda_{\textrm{a}}}}}}}}$ is as follows:
The type constructors $\multimap$, $\otimes$, $\oplus$, and $!$ are borrowed from linear logic. Here we informally understand Nat, Unit, $A \multimap B$, $A \otimes B$, and $A \oplus B$ as types of values that may be used at most once, and $!A$ as a type of values of type A that may be used as many times as desired. (The way DILL manifests the latter is to allow a value of $!A$ to be used at most once by binding it to an intuitionistic variable of type A which can subsequently be used as many times as desired. Thus, technically $!A$ is affine just like all other types, but it provides access to an unlimited source of identical affine values of type A.)
The typing rules are those shown in Figure 9, along with Exchange, Weakening and Contraction for the intuitionistic environment, and Exchange and Weakening (but not Contraction) for the affine one. To understand the workings of this type system, it is helpful to think of the affine arrow $\multimap$ and the affine environment $\Gamma$ as playing the primary role: for instance, lambda abstraction is supported for affine variables but not intuitionistic ones. The sole purpose of the intuitionistic environment is to allow for multiple uses of values of $!$ type: the rule TLLetBang allows such a value to be bound to an intuitionistic variable. Note too that values of $!$ type are formed via the TLBang rule, which allows a value $W : A$ to be ‘promoted’ to a reusable value $!W: !A$ if all free variables that went into the making of W are themselves reusable (we here write $!\Gamma$ to mean that every type in $\Gamma$ is of the form $!A$ for some A). Similarly, TLBangComp allows promotion of computations. This latter form introduces some minor complications and is not, strictly speaking, necessary, but we include it nonetheless in order to admit a straightforward translation of let-binding in the inclusion $\lambda_{\textrm{b}} \hookrightarrow \lambda_{\textrm{a}}$ outlined at the end of this section.
The crucial restrictions in the rule for handlers are that the operation argument p and the resumption variable r are now affine. Notice that the success clause may involve affine variables as it will be invoked at most once (this idea will be substantiated in Section 10 below). By contrast, the operation clauses cannot involve affine variables as they may be invoked multiple times.
There is also a small subtlety with $\mathbf{rec}$ . The function argument f is bound in the intuitionistic type environment, allowing f to be used many times within M. The operational intention is that f can be unfolded to $\lambda x.M$ as often as necessary; for this reason, it is required that M involves no affine variables other than x.
All of the above syntactic forms are shared with $\lambda_{\textrm{h}}$, with the exception of $!W$, $!M$, and $\mathbf{let} !\;x=V\;\mathbf{in}\;N$. To give a small-step operational semantics for $\lambda_{\textrm{a}}$, we may therefore take all the operational rules for $\lambda_{\textrm{h}}$ as given in Section 3 (along with the machinery of evaluation contexts and handler contexts), together with the new rules
and additional evaluation and handler contexts for promoted computations:
As usual, we take $\leadsto^\ast$ to be the reflexive transitive closure of $\leadsto$, and define the notions of ancestor and descendant in the evident way. This completes our definition of $\lambda_{\textrm{a}}$.
The notion of normal form may now be defined just as in Definition 1. Once again, the following is straightforward to verify:
Theorem 4 (Type Soundness for ${{{{{{\lambda_{\textrm{a}}}}}}}}$ ). If $ \vdash M : C$ , then either there exists $ \vdash N : C$ such that $M {{{{{{\leadsto}}}}}}^\ast N$ and N is normal with respect to $\Sigma$ , or M diverges.
9.2 Relationship to ${\lambda_{\textrm{h}}}$ and ${\lambda_{\textrm{b}}}$
We now outline how we intend to view ${{{{{{\lambda_{\textrm{a}}}}}}}}$ as a sublanguage of ${\lambda_{\textrm{h}}}$ , and ${\lambda_{\textrm{b}}}$ as a sublanguage of ${{{{{{\lambda_{\textrm{a}}}}}}}}$ . A brief sketch will suffice here, as these translations will only play a minor role in what follows and are mentioned here mainly for the sake of orientation.
For the inclusion $\lambda_{\textrm{a}} \hookrightarrow \lambda_{\textrm{h}}$, the broad idea is simply that any well-typed term of $\lambda_{\textrm{a}}$ will certainly remain well-typed if all affineness restrictions on variables are waived. Formally, we may define a translation that erases the intuitionistic/affine distinction completely. The translation on types may be defined by
The translation on terms is given in the obvious way for the syntactic forms common to ${{{{{{\lambda_{\textrm{a}}}}}}}}$ and ${\lambda_{\textrm{h}}}$ , the three new forms being treated by
A typing judgement $\Delta;\Gamma \vdash \square:A$ of ${{{{{{\lambda_{\textrm{a}}}}}}}}$ then becomes a judgement , where $\Delta,\Gamma$ is the result of rolling $\Delta$ and $\Gamma$ into a single environment, and is the result of applying to the types of all its variables.
Under this translation, it is easy to check that every derivable typing judgement in ${{{{{{\lambda_{\textrm{a}}}}}}}}$ yields one in ${\lambda_{\textrm{h}}}$ , and also that if $M {{{{{{\leadsto}}}}}} M'$ in ${{{{{{\lambda_{\textrm{a}}}}}}}}$ then in ${\lambda_{\textrm{h}}}$ .
For the inclusion ${{{{{{\lambda_{\textrm{b}}}}}}}} \hookrightarrow {{{{{{\lambda_{\textrm{a}}}}}}}}$ , we give a translation $()^\star$ inspired by the familiar Girard translation from intuitionistic types to linear ones, wrapping each subformula of a type by a ‘ $!$ ’ except for the return types of functions. The translation on types is as follows.
A type environment $\Gamma$ of $\lambda_{\textrm{b}}$ is translated to the environment $\Gamma^\star ; \cdot$ of $\lambda_{\textrm{a}}$: that is, all variables are treated as intuitionistic. The translation of value and computation terms therefore needs to eliminate ‘$!$’ types at bindings in favour of intuitionistic variables. We give here a selection of the clauses for the translation on terms; for all syntactic forms not covered here, the translation is defined homomorphically on term structure in the obvious way.
Once again, it is routine to check that every derivable typing judgement in ${\lambda_{\textrm{b}}}$ yields one in ${{{{{{\lambda_{\textrm{a}}}}}}}}$ , and that if $M{{{{{{\leadsto}}}}}} M'$ in ${\lambda_{\textrm{b}}}$ then $M^\star {{{{{{\leadsto}}}}}}^\ast {M'}^\star$ in ${{{{{{\lambda_{\textrm{a}}}}}}}}$ .
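As an informal aid, the prose description of the type translation $()^\star$ (wrap each subformula in ‘$!$’ except for the return types of functions) can be sketched in Python as follows. This is only one plausible reading of that description: the tuple encoding of types, the function name `star`, and in particular the clauses chosen for products and sums are our assumptions and should not be taken as the official definition.

```python
def star(ty):
    """Hypothetical rendering of the Girard-style translation on types.

    Source types are tuples: ("Nat",), ("Unit",), ("fun", A, B),
    ("prod", A, B), ("sum", A, B). Target types additionally use
    ("bang", A), ("lolli", A, B), ("tensor", A, B) and ("plus", A, B).
    """
    if ty in (("Nat",), ("Unit",)):
        return ty  # base types are left unchanged (assumption)
    tag, left, right = ty
    if tag == "fun":
        # The argument is wrapped in '!', the return type is not.
        return ("lolli", ("bang", star(left)), star(right))
    if tag == "prod":
        return ("tensor", ("bang", star(left)), ("bang", star(right)))
    if tag == "sum":
        return ("plus", ("bang", star(left)), ("bang", star(right)))
    raise ValueError(f"unknown type constructor: {tag!r}")


BOOL = ("sum", ("Unit",), ("Unit",))
POINT = ("fun", ("Nat",), BOOL)
PREDICATE = ("fun", POINT, BOOL)
print(star(PREDICATE))
```

Under this reading, a predicate takes a reusable point (an argument of ‘$!$’ type), which is in keeping with the intention that all variables of the base language are treated as intuitionistic.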
10 Affine effect computations and generic count
We begin with some general machinery for managing resumptions and for tracking the evaluation of subterms through reductions, allowing for the ‘thread-switching’ behaviour that $\lambda_{\textrm{a}}$ supports. We expect that this machinery will be quite widely applicable to any kind of reasoning about the behaviour of effectful programs in $\lambda_{\textrm{a}}$. In Section 10.3, we apply this machinery to the specific scenario of the generic count problem.
Throughout this section, ‘subterm’ will always mean ‘subterm occurrence’.
10.1 Tracking of resumptions
To make the role of resumptions more explicit, it will be convenient to recast the small-step operational semantics for $\lambda_{\textrm{a}}$ slightly, presenting it as a reduction system for pairs $\langle M \mid \Xi \rangle$, where M is a term and $\Xi$ is a resumption environment, mapping finitely many resumption variables $\hat{r}$ to terms of the form $\lambda y.\,\mathbf{handle} \; \mathcal{E}[\mathbf{return} \; y] \; \mathbf{with} \; H$. Note that these terms may themselves involve other resumption variables.
The only reduction rules in which $\Xi$ plays an active role are the following. We write $\Xi \backslash \hat{r}$ for $\Xi$ with the entry for $\hat{r}$ deleted.
All other reduction rules are carried over from the original semantics in the obvious way: for each reduction rule $M {{{{{{\leadsto}}}}}} M'$ except for SOp, we now have a rule ${{{{{{\langle M \mid \Xi \rangle}}}}}} {{{{{{\leadsto}}}}}}{{{{{{\langle M' \mid \Xi \rangle}}}}}}$ . To initiate a reduction sequence for a closed term M, we start from the configuration ${{{{{{\langle M \mid \emptyset \rangle}}}}}}$ .
The main purpose of this semantics is to make explicit the points at which resumptions are invoked (as the points at which SRes is applied). In the original semantics, such steps appear simply as $\beta$ reductions, which may not be distinguishable, on the face of it, from other $\beta$ reductions that occur.
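The bookkeeping role of $\Xi$ can be mimicked by a small Python sketch. It is purely illustrative: the class `ResumptionEnv` and its method names are hypothetical, resumptions are modelled as ordinary Python callables rather than terms, and only the two environment-sensitive rules are represented (storing a fresh entry, as in $\text{SOp}'$, and looking up and deleting an entry, as in SRes).

```python
class ResumptionEnv:
    """Hypothetical model of a resumption environment (names are ours)."""

    def __init__(self):
        self._entries = {}
        self._next_id = 0

    def bind(self, resumption):
        """Model of SOp': store a captured resumption under a fresh name."""
        name = f"r{self._next_id}"
        self._next_id += 1
        self._entries[name] = resumption
        return name

    def resume(self, name, value):
        """Model of SRes: look up the entry, delete it, then invoke it."""
        resumption = self._entries.pop(name)  # KeyError if already consumed
        return resumption(value)

    def domain(self):
        """The set of resumption variables currently bound."""
        return set(self._entries)


env = ResumptionEnv()
r = env.bind(lambda y: y + 1)
first = env.resume(r, 41)

try:
    env.resume(r, 0)       # a second SRes for the same variable
    blocked = False
except KeyError:
    blocked = True         # the lookup fails: the entry was deleted

print(first, blocked, env.domain())
```

In this model a second attempt to resume the same variable fails at the lookup, which is precisely the kind of blocking that one needs to rule out for well-typed $\lambda_{\textrm{a}}$ programs.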
It is intuitively clear that the reduction of ${{{{{{\langle M \mid \emptyset \rangle}}}}}}$ under the new semantics proceeds in lockstep with the reduction of M under the original semantics. One half of this is formalised by the following proposition (we write ${{{{{{\leadsto}}}}}}^m$ for reduction in exactly m steps).
Proposition 2. For any m, if $\langle M \mid \emptyset \rangle \leadsto^m \langle M' \mid \Xi \rangle$ then $M \leadsto^m M''$, where $M''$ is obtained from $M'$ by repeatedly expanding all resumption variables as specified by $\Xi$ until no longer possible.
Proof Easy induction on m. Note that in the case of $\text{SOp}'$ , the number of rounds of expansion needed may increase by 1.
The converse to the above proposition — that if $M {{{{{{\leadsto}}}}}}^m M''$ then ${{{{{{\langle M \mid \emptyset \rangle}}}}}} {{{{{{\leadsto}}}}}}^m {{{{{{\langle M' \mid \Xi \rangle}}}}}}$ for some $M',\Xi$ — is not quite clear at this point, because of the worry that an application of SRes might be blocked because the relevant $\hat{r}$ is not present in $dom\,\Xi$ , having been deleted by an earlier application of SRes. We shall see shortly, however, that such blocking never happens, so that our two semantics do indeed work perfectly in lockstep.
Let us say a configuration ${{{{{{\langle M' \mid \Xi \rangle}}}}}}$ is naturally arising if it appears in the course of reduction of ${{{{{{\langle M \mid \emptyset \rangle}}}}}}$ for some closed M.
In the typing rules for ${{{{{{\lambda_{\textrm{a}}}}}}}}$ , the critical typing restriction is that in the handler clauses $\ell\,p\;r \mapsto N_\ell$ , the variable r is used affinely within $N_\ell$ . This does not mean that r can occur at most once within $N_\ell$ (in view of the TLCase rule); and even if it does, the variable r may subsequently be copied in the course of a $\beta$ reduction (again because of TLCase). However, the affineness restriction does buy us the following crucial property:
Lemma 2 (Single-shot resumptions). For any naturally arising $\langle M \mid \Xi \rangle$ and any $\hat{r} \in dom(\Xi)$, the reduction sequence starting from $\langle M \mid \Xi \rangle$ contains at most one instance of SRes for the variable $\hat{r}$.
Proof Since ${{{{{{\langle M \mid \Xi \rangle}}}}}}$ is naturally arising, we have ${{{{{{\langle M_0 \mid \emptyset \rangle}}}}}} {{{{{{\leadsto}}}}}}^\ast {{{{{{\langle M \mid \Xi \rangle}}}}}}$ for some $M_0$ , and we may as well assume that ${{{{{{\langle M \mid \Xi \rangle}}}}}}$ appears as early as possible in this reduction, i.e. at the point where $\hat{r}$ is introduced, so that M has the form $N[V/p,\,\hat{r}/r]$ as in the $\text{SOp}'$ rule.
We first claim that in this situation, $M,\Xi$ satisfy the following two conditions, writing $\hat{r}_0,\ldots,\hat{r}_{k-1}$ for the elements of $dom(\Xi)$.

1. $\hat{r}$ appears in at most one of the $k+1$ terms $M,\Xi(\hat{r}_0),\ldots,\Xi(\hat{r}_{k-1})$.

2. If N is one of these terms and $\hat{r}$ appears in N, then no two occurrences of $\hat{r}$ within N share the same set of enclosing $\mathbf{case}$ clauses. (A $\mathbf{case}$ clause is a subphrase $\mathbf{inl}\;x \mapsto P$ or $\mathbf{inr}\;y \mapsto Q$ within a $\mathbf{case}$ expression.)
Condition 1 holds because $\hat{r}$ is fresh and so does not appear in any of the $\Xi(\hat{r}_i)$ . Condition 2 is a general property of occurrences of affine variables within terms: an inspection of the typing rules shows that the TLCase rule is the only possible source of multiple occurrences of $\hat{r}$ , and it is clear that if we know the set of enclosing $\mathbf{case}$ clauses then the occurrence is uniquely determined.
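Condition 2 can be checked mechanically on a toy term representation, as in the following Python sketch. The tuple encoding of terms and the function names are our own assumptions, binders are omitted, and only variables, constants, application and $\mathbf{case}$ are covered, so this should be read as an illustration of the condition rather than as a type checker for $\lambda_{\textrm{a}}$.

```python
def occurrence_contexts(term, name, enclosing=frozenset(), path=()):
    """Return, for each occurrence of `name`, its set of enclosing case clauses.

    Terms are tuples: ("var", x), ("const", c), ("app", f, a),
    ("case", scrutinee, inl_body, inr_body). A clause is identified by the
    position of its case expression together with the tag "inl" or "inr".
    """
    kind = term[0]
    if kind == "var":
        return [enclosing] if term[1] == name else []
    if kind == "const":
        return []
    if kind == "app":
        return (occurrence_contexts(term[1], name, enclosing, path + (0,))
                + occurrence_contexts(term[2], name, enclosing, path + (1,)))
    if kind == "case":
        _, scrutinee, inl_body, inr_body = term
        inl_clause = path + ("inl",)
        inr_clause = path + ("inr",)
        return (occurrence_contexts(scrutinee, name, enclosing, path + (0,))
                + occurrence_contexts(inl_body, name,
                                      enclosing | {inl_clause}, path + (1,))
                + occurrence_contexts(inr_body, name,
                                      enclosing | {inr_clause}, path + (2,)))
    raise ValueError(f"unknown term form: {kind!r}")


def satisfies_condition_2(term, name):
    """True iff no two occurrences of `name` share their enclosing clauses."""
    contexts = occurrence_contexts(term, name)
    return len(set(contexts)) == len(contexts)


# r occurs twice, but in different clauses of the same case expression.
branching = ("case", ("var", "b"),
             ("app", ("var", "r"), ("const", 1)),
             ("app", ("var", "r"), ("const", 2)))
# r occurs twice outside any case clause.
duplicated = ("app", ("var", "r"), ("app", ("var", "r"), ("const", 0)))

print(satisfies_condition_2(branching, "r"))
print(satisfies_condition_2(duplicated, "r"))
```

The first term has two syntactic occurrences of `r` yet satisfies the condition, since at most one branch is ever executed; the second does not, as both occurrences lie on the same execution path.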
Next, we claim that Conditions 1 and 2 above are maintained as invariants by all the reduction rules of our new semantics. Since Condition 2 is a general property of affine variables, and our reduction rules are easily seen to respect the type system, the preservation of this condition is automatic, so it will suffice to show that Condition 1 is preserved. We reason by cases on the possible forms for a reduction ${{{{{{\langle M \mid \Xi \rangle}}}}}} {{{{{{\leadsto}}}}}} {{{{{{\langle M' \mid \Xi' \rangle}}}}}}$ .

For $\text{{SOp}}'$ (applied within some handler context ${{{{{{\mathcal{H}}}}}}}$ and introducing a fresh $\hat{r}'$ ): Suppose $\hat{r}$ appears within the relevant subterm $\mathbf{handle} \; {{{{{{\mathcal{E}}}}}}}[\mathbf{do} \; \ell \, V] \; \mathbf{with} \; H$ (the situation for occurrences of $\hat{r}$ elsewhere is straightforward). Since this subterm is in evaluation position, it is not within a $\mathbf{case}$ clause, so by Conditions 1 and 2 for ${{{{{{\langle M \mid \Xi \rangle}}}}}}$ , there are no other occurrences of $\hat{r}$ elsewhere, and all occurrences are within just one of ${{{{{{\mathcal{E}}}}}}}[~], V, H$ . If they are within V, then because of the affineness of p within N, $\hat{r}$ may appear within the resulting term $N[V/p, \hat{r}'/r]$ , but will not appear in the new $\Xi'(\hat{r}')$ or elsewhere. If within ${{{{{{\mathcal{E}}}}}}}[~]$ or H, the occurrences of $\hat{r}$ will all be moved to $\Xi'(\hat{r}')$ , and there will be none elsewhere.

For SRes (applied to some subterm $\hat{r}'W$ within some ${{{{{{\mathcal{H}}}}}}}$ ): If $\hat{r}$ occurs within W, it may appear within the resulting $R[W/y]$ , but not elsewhere. If $\hat{r}$ occurs within $\Xi(\hat{r}')$ (i.e. within R), then it does not appear elsewhere in ${{{{{{\langle M \mid \Xi \rangle}}}}}}$ . So after the application of SRes and the deletion of $\hat{r}'$ from $\Xi$ , $\hat{r}$ may appear in $R[W/y]$ but nowhere else.

For the rules carried over from the original semantics (applied within some ${{{{{{\mathcal{H}}}}}}}$ ), the preservation of Condition 1 is immediate, since $\Xi$ is unchanged.
To complete the proof, suppose that within the reduction sequence from some naturally arising ${{{{{{\langle M \mid \Xi \rangle}}}}}}$ we have an application of SRes for a given variable $\hat{r}$ : that is, we have some configuration ${{{{{{\langle {{{{{{\mathcal{H}}}}}}}[\hat{r}W] \mid \Xi' \rangle}}}}}}$ . Since this satisfies Conditions 1 and 2, we see that the highlighted occurrence of $\hat{r}$ is its only appearance within ${{{{{{\mathcal{H}}}}}}}[\hat{r}W]$ or the range of $\Xi'$ , and it follows that $\hat{r}$ does not appear at all within the resulting configuration ${{{{{{\langle {{{{{{\mathcal{H}}}}}}}[R[W/y]] \mid \Xi' \backslash \hat{r} \rangle}}}}}}$ . There is therefore no danger of a later instance of SRes for $\hat{r}$ .
We can now lay to rest the worry mentioned earlier:
Proposition 3.

(i) For any naturally arising ${{{{{{\langle M \mid \Xi \rangle}}}}}}$ , all free variables appearing within M or any $\Xi(\hat{r})$ are contained in $dom\;\Xi$ .

(ii) If $M {{{{{{\leadsto}}}}}}^m M''$ under the original rules, then ${{{{{{\langle M \mid \emptyset \rangle}}}}}} {{{{{{\leadsto}}}}}}^m {{{{{{\langle M' \mid \Xi \rangle}}}}}}$ for some $M',\Xi$ .
Proof

(i) The property in question clearly holds for initial configurations ${{{{{{\langle M \mid \emptyset \rangle}}}}}}$ with M closed, and it is easy to see that it is preserved by all reduction steps, given that SRes completely expunges the variable $\hat{r}$ as established within the proof of Lemma 2.

(ii) From (i) we know that an application of SRes will never be blocked by the failure of a lookup $\Xi(\hat{r})$. A reduction $M \leadsto^m M''$ can therefore be lifted to one $\langle M \mid \emptyset \rangle \leadsto^m \langle M' \mid \Xi \rangle$ where $M',\Xi,M''$ are related as in Proposition 2, by induction on m and an easy comparison between the two reduction systems.
Lemma 2 is the crucial property of $\lambda_{\textrm{a}}$ on which our whole argument hinges. This property is flagrantly violated by $\lambda_{\textrm{h}}$, as illustrated by effcount with its essential use of multi-shot resumptions. Our next task is to show how, in view of Lemma 2, the evaluation of a given subterm may be tracked in a sequential way through a reduction sequence.
10.2 Tracking of active subterms
We shall say a subterm S of M is active if it occurs in an evaluation position, i.e. $M = {{{{{{\mathcal{H}}}}}}}[S]$ for some handler context ${{{{{{\mathcal{H}}}}}}}$ . We introduce the following concepts for tracking the evaluation of S through the reduction of M with respect to some resumption context $\Xi$ .
Clearly, if $\langle M \mid \Xi \rangle$ is naturally arising and $M = \mathcal{H}[S]$, any reduction sequence $\langle S \mid \Xi \rangle \leadsto^\ast \langle S' \mid \Xi' \rangle$ will yield a reduction sequence $\langle \mathcal{H}[S] \mid \Xi \rangle \leadsto^\ast \langle \mathcal{H}[S'] \mid \Xi' \rangle$.
We then say the occurrence of $S'$ highlighted by $\mathcal{H}[S']$ is an active reduct of the original occurrence of S highlighted by $\mathcal{H}[S]$.
In this situation, there are four possibilities:

1. The reduction of ${{{{{{\langle S \mid \Xi \rangle}}}}}}$ may continue forever.

2. The reduction may terminate in some ${{{{{{\langle \mathbf{return}\;V \mid \Xi' \rangle}}}}}}$ where V is a value.

3. The reduction of ${{{{{{\langle S \mid \Xi \rangle}}}}}}$ may get stuck at some configuration ${{{{{{\langle {{{{{{\mathcal{E}}}}}}}[\mathbf{do}\;\ell\,V] \mid \Xi' \rangle}}}}}}$ where the $\mathbf{do}\;\ell\,V$ is not handled anywhere within ${{{{{{\mathcal{H}}}}}}}[{{{{{{\mathcal{E}}}}}}}[\mathbf{do}\;\ell\,V]]$ — in this situation, we say the entire computation is absolutely blocked.

4. The reduction may get stuck at some $\langle \mathcal{E}[\mathbf{do}\;\ell\,V] \mid \Xi' \rangle$, where the $\mathbf{do}\;\ell\,V$ is not handled within $\mathcal{E}[\mathbf{do}\;\ell\,V]$ itself, but is handled further out within $\mathcal{H}[-]$.
In case 4, $\mathcal{H}[-]$ will have the form $\mathcal{H}'[\mathbf{handle} \; \mathcal{F}[-] \; \mathbf{with} \; H]$ where $\mathcal{F}$ is an evaluation context, and the $\text{SOp}'$ rule will then apply to $\langle \mathbf{handle} \; \mathcal{F}[\mathcal{E}[\mathbf{do}\;\ell\,V]] \;\mathbf{with}\; H \mid \Xi' \rangle$. This will result in a new resumption environment entry
and we may call the subterm ${{{{{{\mathcal{E}}}}}}}[\mathbf{return} \; y]$ here a dormant reduct of the original S.
As the reduction of the original ${{{{{{\langle M \mid \Xi \rangle}}}}}}$ continues, this environment entry will remain unaffected until, if ever, $\hat{r}$ is activated by SRes (and by Lemma 2, this will happen at most once). This activation step will have the form
where $\mathcal{H}_1'[\mathbf{handle}\;\mathcal{F}[-]\;\mathbf{with}\;H]$ is itself a handler context, which we shall write as $\mathcal{H}_1[-]$ for compatibility with our earlier convention. So writing $S_1$ for $\mathcal{E}[\mathbf{return}\;W]$, we have arrived at
and we shall again designate this occurrence of $S_1$ as an active reduct of the original S.
For the purpose of tracking the fate of the original subterm S, it will also be convenient to say that ${{{{{{\langle S \mid \Xi \rangle}}}}}}$ gives rise in this context to a pseudoreduction sequence
in which all steps but the last are genuine reductions, but the step flagged by ${{{{{{\leadsto}}}}}}^!$ is considered as a ‘pseudoreduction’ (note that this step has a seemingly ‘nondeterministic’ character in that it depends crucially on information from outside ${{{{{{\langle {{{{{{\mathcal{E}}}}}}}[\mathbf{do}\;\ell\,V] \mid \Xi' \rangle}}}}}}$ ). The point is simply to have a way of saying how the evaluation of ${{{{{{\mathcal{E}}}}}}}[\mathbf{do}\;\ell\,V]$ continues after being temporarily suspended by a switch to another thread of control.
We may now repeat exactly the same procedure starting from ${{{{{{\langle {{{{{{\mathcal{H}}}}}}}_1[S_1] \mid \Xi_1 \rangle}}}}}}$ , potentially yielding further (active and dormant) reducts of the original S:
In this way, we obtain an extended pseudoreduction sequence for S, consisting of ordinary reduction sequences interspersed with pseudoreductions of the above kind, jumping straight from some ${{{{{{\langle {{{{{{\mathcal{E}}}}}}}_i[\mathbf{do}\;\ell_i\,V_i] \mid \Xi_i \rangle}}}}}}$ to the corresponding ${{{{{{\langle {{{{{{\mathcal{E}}}}}}}_i[\mathbf{return}\;W_i] \mid \Xi_{i+1} \rangle}}}}}}$ .
This pseudoreduction sequence may continue forever, or it may be absolutely blocked, or it may end with a dormant reduct in a resumption environment entry that is never subsequently activated, or it may terminate in some ${{{{{{\langle \mathbf{return}\;V \mid \Xi'_i \rangle}}}}}}$ where V is a value. In the last case, we say the evaluation of the original S completes (in the context of ${{{{{{\langle {{{{{{\mathcal{H}}}}}}}[S] \mid \Xi \rangle}}}}}}$ ).
It is thanks to Lemma 2 that the evaluation behaviour of S may be represented in this way by a single linear reduction sequence rather than by a branching tree. The notion of pseudoreduction sequence thus allows us to reason about subterm evaluations much as in the familiar setting, rendering the thread-switching machinery largely transparent, its only trace being in the ‘nondeterministic’ character of the pseudoreduction steps.
It is also clear that the notions of ancestor and descendant make sense for subterms appearing within pseudoreduction sequences, providing one considers configurations $\langle S \mid \Xi \rangle$ as a whole: a subterm within the main term may have descendants within the resumption environment, and vice versa. For a pseudoreduction step $\langle S \mid \Xi \rangle \leadsto^! \langle S' \mid \Xi' \rangle$, we say a subterm of the right-hand side is a descendant of one on the left iff it is a descendant with respect to the genuine reduction sequence that witnesses this pseudostep.
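The linear shape of such a sequence can be illustrated, loosely, with Python generators, which can likewise be resumed only along a single line of control. In the sketch below (our own construction, with hypothetical names), a generator plays the role of the subterm S, each `yield` stands for a $\mathbf{do}\;\ell\,V$ at which S is suspended, and the replies are supplied from outside, mirroring the way a pseudoreduction step depends on information external to S.

```python
def sample_subterm():
    """A computation that performs two operations and then returns."""
    x = yield ("Ask", "first")
    y = yield ("Ask", "second")
    return x + y


def trace_evaluation(make_computation, replies):
    """Record the suspend/resume history of one computation as a flat list."""
    trace = []
    computation = make_computation()
    try:
        operation = next(computation)        # run up to the first operation
    except StopIteration as finished:
        return [("completed", finished.value)]
    for reply in replies:
        trace.append(("suspended", operation))   # a dormant reduct is stored
        trace.append(("resumed", reply))         # the pseudoreduction step
        try:
            operation = computation.send(reply)
        except StopIteration as finished:
            trace.append(("completed", finished.value))
            return trace
    trace.append(("suspended", operation))
    trace.append(("never resumed",))             # dormant reduct never activated
    return trace


for step in trace_evaluation(sample_subterm, [1, 2]):
    print(step)
```

With replies `[1, 2]` the evaluation completes with value 3; with the single reply `[1]` the trace ends in a suspension that is never resumed, corresponding to a dormant reduct whose resumption environment entry is never activated.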
10.3 Application to generic count
We now apply the above notions to the analysis of generic counting in $\lambda_{\textrm{a}}$, obtaining a lower bound analogous to that of Theorem 3 for $\lambda_{\textrm{b}}$. Proceeding as in Section 8, we fix $n \in \mathbb{N}$, and suppose that K is some program of $\lambda_{\textrm{a}}$ that correctly counts all canonical n-standard predicates P (noting that all such predicates are actually terms from our base language $\lambda_{\textrm{b}}$). Once again, focusing on this restricted class of predicates will greatly simplify our task, while still giving all we need for a worst-case lower bound.
We recall here that we are thinking of ${\lambda_{\textrm{b}}}$ as included in ${{{{{{\lambda_{\textrm{a}}}}}}}}$ via the intuitionistic encoding $()^\star$ defined in Section 9. Since our intention is that our lower bound for ${{{{{{\lambda_{\textrm{a}}}}}}}}$ should generalise the one for ${\lambda_{\textrm{b}}}$ , this means that the types Point and Predicate appear within ${{{{{{\lambda_{\textrm{a}}}}}}}}$ as
Formally, then, we will be considering the reduction behaviour of $\langle K\,(!P) \mid \emptyset \rangle$, where $P:\mathrm{Predicate}$ is the $\star$-translation of a canonical n-standard predicate, and K is a generic count program of $\lambda_{\textrm{a}}$ assumed to count all such predicates correctly. (We may assume without loss of generality that K is a closed term.) By hypothesis, this reduction will terminate in some $\langle \mathbf{return}\;k \mid \Xi_{end} \rangle$ where k is a numeral.
By an application of P, we shall mean an occurrence of a term $P\,Q$ in evaluation position in some reduct of ${{{{{{\langle K\,(!P) \mid \emptyset \rangle}}}}}}$ (so that