1 Introduction
1.1. What predicativism, and why?
In [Reference Shapiro44] the basic historic problem of the research in foundations of mathematics (FOM) is formulated as follows:
How to reconstruct mathematics on a secure basis, one maximally immune to rational doubts.
The predicativist program [Reference Crosilla, Jaeger and Sieg11, Reference Crosilla, Mainzer, Schuster and Schwichtenberg12, Reference Feferman and Shapiro24, Reference Weaver49, Reference Weaver51] has been one of the attempts to solve this basic problem of FOM. It seeks to establish certainty in mathematics without revolutionizing it or changing its underlying classical logic (as the intuitionistic program does). The program was initiated by Poincaré [Reference Poincaré36–Reference Poincaré39]. Its viability was demonstrated by Weyl, who seriously developed it for the first time in his famous small book “Das Kontinuum” [Reference Weyl52, Reference Weyl, Pollard and Bole54]. Weyl, and then Feferman [Reference Feferman22, Reference Feferman25], have shown that a very large part of classical analysis can be developed within their predicative systems. Feferman further argued that predicative mathematics in fact suffices for developing all the mathematics that is actually indispensable to presentday natural sciences. Hence the predicativist program has been successful in solving the basic problem of FOM. (In my opinion it is the only one about which this can truly be said.)
Poincaré’s predicativism started as a reaction to the settheoretical paradoxes. However, in the writings of both Poincaré and Weyl, predicativity derives not so much from the need to avoid paradoxes, but from their definitionist view that infinite objects, such as sets or functions, exist only in so far as they are introduced through legitimate definitions:
“No one can describe an infinite set other than by indicating properties which are characteristic of the elements of the set. And no one can establish a correspondence among infinitely many things without indicating a rule, i.e., a relation, which connects the corresponding objects with one another. The notion that an infinite set is a ‘gathering’ brought together by infinitely many individual arbitrary acts of selection, assembled and then surveyed as a whole by consciousness, is nonsensical.” [Reference Weyl, Pollard and Bole54, p. 23]
The implications of the above principle concerning infinite objects depend of course in a crucial way on the question: What definitions should be accepted as ‘legitimate’? Therefore it is no wonder that ‘predicativism’ (like ‘constructivism’) becomes a name of a group of approaches to mathematics and its foundations [Reference Crosilla, Mainzer, Schuster and Schwichtenberg12, Reference Feferman and Shapiro24].Footnote ^{1} We emphasize that in this paper we reserve this name solely to the program as it was initiated by Poincaré and pursued by Weyl. That program is known nowadays as ‘predicativity given the natural numbers’, since in addition to the definitionist principle mentioned above, it also accepts the collection $\mathcal {N}$ of the natural numbers as a well understood mathematical concept that constitutes a set. Moreover, it views the idea of iterating an operation or a relation a finite number of times as fundamental, and accepts induction on the natural numbers as a universally valid method of proof.Footnote ^{2} Still, even with this restriction, the word ‘predicative’ has two different interpretations, corresponding to Poincaré’s “two distinct diagnoses of the source of the paradoxes” ([Reference Feferman and Shapiro24]; see also [Reference Crosilla, Jaeger and Sieg11]). We call them ‘Russell’s predicativity’ and ‘Poincaré–Weyl predicativity’. This paper is devoted to the second one. However, since it is the first which is usually identified with predicativism, we discuss it first.
1.2 Russell’s predicativity
Adopting the analysis indicated in Richard’s paper [Reference Richard40], Poincaré’s first diagnosis was that the definition of Richard’s paradoxical real number is circular: it uses the totality of all definitions, to which it already belongs. The corresponding Vicious Circle Principle, VCP, was adopted by Russel in [Reference Russell41] and in Principia Mathematica [Reference Whitehead and Russell55]. According to the latter, a vicious circle arises when we assume that “a collection of objects may contain members which can only be defined in terms of the collection itself”.Footnote ^{3} A clearer (and stronger) formulation of the VCP has been given by Kreisel in [Reference Kreisel30]:
“A predicative definition of a set (say, of natural numbers) is required to use quantification only over ‘previously defined’ totalities; the set of natural numbers themselves is supposed to be given, or the notion of ‘finite’ is supposed to be welldefined.”
Kreisel then went on and note that
“The traditional way of making the idea of a predicative definition explicit is by introducing a ramified hierarchy.”
The idea of ramified hierarchy was introduced and used by Russell in Principia Mathematica. Later it was generalized by Wang [Reference Wang48] and Kreisel. In the secondorder context the generalization is explained in [Reference Feferman and Shapiro24] as follows:
“The basic step in that hierarchy consists in passing from a collection D of subsets of $\mathcal {N}$ to a new collection $D^\star $ by putting a set S in $D^\star $ just in case there is a formula $\varphi (x)$ of secondorder arithmetic such that for all n, $n\in S\leftrightarrow (\varphi (n))^D,$ where the superscript ‘D’ indicates that all secondorder quantifiers in $\varphi $ are relativized to range over D. Then we can define the collections $R_\alpha $ for arbitrary ordinals $\alpha $ by $R_0=\emptyset , R_{\alpha +1}=(R_\alpha )^\star , \mbox { and for limit } \alpha , R_\alpha =\bigcup _{\beta <\alpha }R_\beta $ .”
This description raises the question: What ordinals $\alpha $ can serve for the purpose of constructing this ramified hierarchy of $R_{\alpha }$ s? To answer this, Kreisel proposed in [Reference Kreisel29] an autonomous process, where a wellordering becomes available at some stage only if it has been defined and recognized (as a wellordering of $\omega $ ) at an earlier stage. Without the ‘recognition’ criterion, which introduces prooftheoretic considerations, we are left with a purely semantic condition that allows to go up to every wellordering $<\omega _1^{CK}$ (Church–Kleene’s first nonrecursive ordinal). With the recognition condition, Feferman [Reference Feferman15] and Schütte [Reference Schütte, Crossley and Dummett42] independently replaced $\omega _1^{CK}$ by the much smaller $\Gamma _0$ (the Feferman–Schütte ordinal). Following their work, the hierarchies of formal systems up to $\Gamma _0$ which were developed by them (on the basis of the intuitive semantics of the $R_{\alpha }$ s) have become the “canonical reference: one considers predicative any formal system which can be reduced to a system in that hierarchy” [Reference Crosilla, Jaeger and Sieg11]. Accordingly, $\Gamma _0$ is almost universally accepted as the ‘ordinal of predicativity’. An example of the implications of this is given in [Reference Crosilla, Oliveri, Ternullo and Boscolo13]: “The fact that the prooftheoretic strength of theories of inductive definitions exceeds the strength of the whole ramified hierarchy is taken as clear indication that generalized inductive definitions involve impredicativity.”
Up to now, the only mathematician to reject the ‘ $\Gamma _0$ thesis’ has been Weaver, who forcefully attacked this thesis in [Reference Weaver50]. Unfortunately, his (in my opinion quite justified) criticism of the various justifications given by Feferman for this thesis has been almost totally ignored by the logical community. Nevertheless, as a true predicativist (which is what I am taking myself to be), it is clear to me that the identification of predicativity with the ramified systems of Feferman and Schütte cannot be correct. A first, very simple, problem with it is that no predicativist (and for that matter—no mathematician) thinks in terms of ramified systems. Moreover, Feferman admits that “ramified theories are unsuitable as a framework for the development of analysis” [Reference Feferman and Shapiro24]. Another problem, that was repeatedly noted by Feferman himself, is that the general notions of wellordering and ordinal on which they are based are not predicatively acceptable.Footnote ^{4} Third, and most important: just the description and understanding of the ramified hierarchy rely on principles that are not included in the theories in [Reference Feferman15] and [Reference Schütte43] which are based on this hierarchy. This is rather clear in case $\alpha $ is a limit ordinal: $R_{\alpha }$ is actually defined in this case as $\bigcup \{R(x)\mid x\in \{\beta \mid \beta <\alpha \}\}$ . Hence it is based on accepting at least some instances of ZF’s axioms of union and replacement. However, if we let $R_0=\mathcal {N}$ (which is the predicativist natural starting point, rather then $R_0=\emptyset $ ), then similar problems exist even if we do not use transfinite ordinals, but only the natural numbers (as Russell did). Thus already in constructing elements of $R_2$ we allow quantification over $R_1$ . This should mean that $R_1$ is taken as a “complete totality”. But $R_1$ is not obtained using just “quantification only over previously defined totalities”, and it is actually unclear what $R_1$ is at the first place. Usually it is identified with the collection (set?) of arithmetical subsets of $\mathcal {N}$ . If so, then each element of $R_1$ is indeed obtained using just quantification only over $\mathcal {N}$ . But this does not mean that so is $R_1$ itself. (For example, if we identify ordinals with von Neumann’s ordinals, then $\Gamma _0$ is not predicatively definable according to the $\Gamma _0$ thesis, even though each element of $\Gamma _0$ is.) In fact, the only reasonable definition of the collection of arithmetical sets I can think of is the following:
where True is the set of true sentences in the firstorder language of PA, and $predicate(n)$ means that n is a formula with exactly one free variable. (As usual, we identify here a formula with its Gödel number.) However, this definition relies on the availability of the set True, which is not arithmetical. It follows that if $R_1$ is the collection of arithmetical sets then it can only be defined using a set that at best belongs to some $R_i$ such that $i>1$ , and so is defined in terms of $R_1$ . This is obviously circular. To construct the ramified hierarchy we have therefore either to accept in addition to $\mathcal {N}$ infinitely many other sets as ‘given’ to us, or to realize that some hidden predicatively acceptable principles are involved already in the passage from $\mathcal {N}$ to $R_1$ . The first option completely goes against the central ideas of Poincaré and Weyl. So we are left with the second.
Note 1.1. A close inspection shows that in general, the passage from D to $D^\star $ implicitly involves accepting two principles:

• One may use the modeltheoretic operation that associates with any formula $\varphi (x)$ of secondorder arithmetic the set $\{n\in N\mid N\models (\varphi (n))^D\}$ .

• One may take as predicatively valid the instance of the replacement axiom that allows us to construct the image of this modeltheoretic operation.
Note 1.2. There are some other indications that the ‘ $\Gamma _0$ thesis’ might be wrong.

• A system T which is prooftheoretically reducible to $\bigcup _{\alpha <\Gamma _0} R_{\alpha }$ is called in the literature ‘locally predicative’. About such T Feferman wrote in [Reference Feferman and Shapiro24]:
Though the system T as a whole may not be justifiable predicatively, each theorem $\varphi $ of T rests on predicative grounds, at least indirectly. In practice, more can be said: T is conservative over the autonomous ramified progression for arithmetic sentences, i.e., if $\varphi $ is arithmetical and provable in T then it is provable in that progression. For secondorder T this can often be strengthened to conservativity for $\Pi _1^1$ sentences.

• In the conclusion of [Reference Pohlers, Kahle and Rathjen35], Pohlers made the following observation:
The studies in the present paper lead to the suspicion that the role of $\Gamma _0$ as the limit of predicativity might not depend on deep philosophical but rather on technical reasons.
1.3 Poincaré and Weyl’s view of predicativity as invariance
As noted above, a notion of predicativity which is quite different from the Russellian one was introduced by Poincaré in [Reference Poincaré38]. This was particularly emphasized in [Reference Crosilla, Jaeger and Sieg11]:
“For Poincare impredicative definitions are problematic as they treat as completed infinite classes which are instead openended or incomplete by their very nature. Predicative definitions, instead, guarantee that the classes so defined are stable and invariant.”
In Poincaré own words:
“Hence a distinction between two species of classifications, which are applicable to the elements of infinite collections: the predicative classifications, which cannot be disordered by the introduction of new elements; the non predicative classifications, which are forced to remain without end by the introduction of new elements.”
This view of predicativity underlies Weyl’s great work in [Reference Weyl52]. A careful reading (see [Reference Adams and Luo2, Reference Avron6]) of this book and of related papers of Weyl on the subject shows that predicativity as invariance is based in his work on the following principles:Footnote ^{5}

1. Sets are ‘produced’ genetically, that is, from applying legitimate operations to sets which are accepted as basic, or had previously been produced.
“If we imagine, as is appropriate for an intuitive understanding, that the relations and corresponding sets are ‘produced’ genetically, then this production will …occur in merely parallel individual acts (so to speak).” [Reference Weyl, Pollard and Bole54, p. 40]

2. Accordingly, the elements of a set are logically prior to that set.

3. Sets are extensional, and the identity of a set is fully determined by the identity of its elements—sets that have the same elements are identical.

4. There is no single, complete intended universe V of sets. The ‘universe of sets’ is created in stages, and is always open and growing. To each stage corresponds what Weyl called a sphere of operation (i.e., a definite universe of sets equipped with some (finite) collection of predicates and operations) in which terms and formulas take values.
“Thus, contrary to Cantor’s proposal, no universal scale of infinite ordinal and cardinal numbers applicable to every sphere of operation can exist.” [Reference Weyl, Pollard and Bole54, p. 24]
“The numbers can (in any sphere of operation) be used to determine the cardinality of sets.” [Reference Weyl, Pollard and Bole54, p. 55]

5. The current sphere of operation may be expanded in the future, e.g., by introducing new legitimate methods of defining sets, which in turn might produce new sets. The truth values of formulas may then be changed.
“If we regard the principles of definition as an ‘open’ system, i.e., if we reserve the right to extend them when necessary by making additions, then in general the question of whether a given function is continuous must also remain open (though we may attempt to resolve any delimited question). For a function which, within our current system, is continuous can lose this property if our principles of definition are expanded and, accordingly, the real numbers ‘presently’ available are joined by others.” [Reference Weyl, Pollard and Bole54, p. 87]

6. Values of terms and truthvalues of formulas are always evaluated with respect to some definite sphere of operation—never with respect to the whole open ‘universe of sets’. Hence classical logic is accepted as valid. For example, $\neg \varphi \vee \varphi $ is valid in any sphere of operation, even though the truth value of $\varphi $ may change when the current sphere of operation is expanded.

7. In contrast, the identity of already existing objects (and so the value of terms) should remain the same even if the current sphere of operation is expanded. Accordingly, a definition of an object is legitimate (or ‘predicative’) if, and only if, the identity of the object it defines is invariant under extension. (Because of this principle, there are certain constraints in Weyl’s system on the use of quantification in definitions. However, there are no constrains in that system on the use of quantifiers in building formulas.) Weyl called an object so defined “a definite, selfexistent object”. Similarly, a definition of an operation is legitimate if the results of its applications depend only on the identity of the arguments, but not on the specific sphere of operation in which the application is made.

8. Any theory we develop should be true not only in the current sphere of operation, but in any future one. Hence our current theories impose constraints on future spheres of operation. Accordingly, expanding our spheres of operation and extending our theories are done simultaneously. Moreover:
“Our principles for the formation of derived relations can be formulated as axioms concerning sets and functions; and, in fact, mathematics will proceed in such a way that it draws the logical consequences of these axioms.” [Reference Weyl, Pollard and Bole54, p. 44]

9. The predicates of elementhood ( $\in $ ) and equality (=) are basic.

10. Using ramification in definitions, or classifying sets according to ‘levels’, should be avoided.
“The temptation to pass beyond the first level of construction must be resisted; instead, one should try to make the range of constructible relations as wide as possible by enlarging the stock of basic operations.” [Reference Weyl53]
It should be emphasized that according to the last quotation, Principle 10 is not due just to the inconvenience which is caused by using ramified systems. There is also a direct conflict between the Russellian approach and Weyl’s approach. The former is implicitly based on the view that there are no predicatively legitimate methods of defining subsets of $\mathcal {N}$ beyond those that are allowed in the construction of the hierarchy of $R_{\alpha }$ s. This view entails that the union of the $R_{\alpha }$ s includes all the predicatively acceptable subsets of $\mathcal {N}$ . In contrast, already the collection of firstlevel subsets of $\mathcal {N}$ is an open collection according to Weyl’s view. Hence it may always be extended “by enlarging the stock of basic operations”. Weyl indeed did just that when he added the very important operation of iteration to this stock. As discovered by Feferman in [Reference Feferman, Cellucci and Sambin20], this operation makes it possible to define nonarithmetical subsets of $\mathcal {N}$ . Hence Weyl recognized as firstlevel subsets of $\mathcal {N}$ sets that do not belong to what is called above $R_1$ !Footnote ^{6}
Note 1.3. Actually, the invariance criterion has been used by Feferman too in some of his unramified systems. Thus, IR, the first of the two unramified secondorder locally predicative systems which were given in [Reference Feferman15], uses the Hyperarithmetic Comprehension Rule $\Delta _1^1$ CR. This is justified in [Reference Feferman and Shapiro24] as follows:
“The motivation for $\Delta _1^1$ CR is the recognizable absoluteness (or invariance) of provably $\Delta _1^1$ definitions, in the following sense. At each stage one has recognized certain closure conditions on the ‘open’ universe of sets, and the definitions $D(x)$ of sets introduced at the next stage should be independent of what further closure conditions may be accepted. In the words of Poincaré, the definitions used of objects in an incomplete totality should not be “disturbed by the introduction of new elements.” Thus if U represents a universe of sets (subsets of $\mathcal {N}$ ) satisfying given closure conditions and is extended to $S^\prime $ (satisfying the same closure conditions and possibly further ones) one wants D to be provably invariant or absolute in the sense that $(\forall x)[D^U(x)\leftrightarrow D^{U^\prime }(x)]$ . This requirement is easily seen to hold for provably $\Delta _1^1$ formulas D.”
This passage contains a rather accurate description of the invariance criterion. However, it provides no explanation how it is connected with the Russellian approach, on which Feferman’s ramified systems are based.
1.4 Predicative set theories: Why and how
1.4.1 Why
Feferman’s systems (and to a lesser degree also Weyl’s system in [Reference Weyl52], as formalized in [Reference Avron6]) have one big drawback: they are practically inaccessible to the majority of the mathematical community. We believe that the major reason for this is that those systems do not use the framework of axiomatic set theory, which is almost universally accepted as the basic framework that provides the foundations of mathematics. What is more: Feferman’s systems are by far more complicated in comparison to impredicative axiomatic set theories like ZF, which are currently used for developing the whole of present day mathematics.
Another reason to prefer the settheoretical framework is that some of its principles are anyway underlying the constructions on which the secondorder ramified systems of Feferman and Schütte are based. Thus we have seen in Section 1.2 that the construction of the hierarchy of $R_\alpha $ s uses instances of the union and replacement axioms of $\mathbf {ZF}$ . But in what cases is such a use predicatively justified? It seems to me that developing predicative set theories is the only way to answer such questions. What is more, the notion of ordinal, which is crucial for the ramified systems but is also very problematic in their context, is not problematic at all in the settheoretical one (assuming that the $\in $ relation is wellfoundedFootnote ^{7} ). In the latter, one can use the notion of von Neumann ordinals, and those are defined by a simple, absolute formula.
Finally, it is worth noting that predicativism was born as a reaction to the settheoretical paradoxes, and was intended to provide a satisfactory solution to them. So (at least in my opinion) it should be most natural to develop predicative mathematics in the framework in which it has started.
Note 1.4. Locally predicative (in the sense of being prooftheoretically reducible to the systems of [Reference Feferman15]) set theories have already been introduced by Feferman in [Reference Feferman16, Reference Feferman, Scott and Jech18]. We shall say more about them in Section 4.4.
1.4.2 How
In the case of pure set theories, the main principle of the predicativity as invariance view of Poincaré and Weyl can be expressed as follows: a set exists if and only if can be determined by an invariant definition. Accordingly, the main two features of the system $\mathbf {PW}$ which is developed in the sequel are:

(I) Any existence claim which is made in one of those axioms of $\mathbf {PW}$ whose purpose is to allow the expansion of the sphere of operation is actually an existence and uniqueness claim. In other words, positive occurrences of the quantifier $\exists $ in such an axiom are in the form $\exists !$ . (This, of course, rules out the axiom of choice, as well as the axiom of $\Delta _0$ collection of Kripke–Platek set theory as given in [Reference Barwise9].)

(II) Following Principles 1 and 7 in Section 1.3, any definition of a set which is made in $\mathbf {PW}$ is invariant. This is ensured by employing a syntactically defined invariance relation $\succ $ between formulas and finite sets of variables. The intended meaning of “ $\varphi (x_1,\ldots \!,\! x_n,y_1,\ldots \!,\! y_k)\succ \{x_1,\ldots \!,\! x_n\}$ ” is: “The identity of $\{\langle x_1,\ldots \!,\! x_n \rangle \mid \varphi \}$ is invariant: it depends only on the values assigned to $y_1,\ldots \!,\! y_k$ , but not on the surrounding universe”.Footnote ^{8}
Note 1.5. Principle (I) is not applicable to general validities like logically valid formulas or instances of $\in $ induction. A trivial example is given by $\forall x\neg \varphi \vee \exists x\varphi $ , where $\varphi $ is arbitrary. In general, theorems of $\mathbf {PW}$ of this sort cannot be used for introducing new sets, or for providing absolute identification of existing ones.
The following other important features of $\mathbf {PW}$ also directly correspond to the principles of Weyl and Poincaré that were described in Section 1.3:

1. Like in ZF, and unlike in the systems of Weyl and Feferman, our system has a single type (or ‘category’, in Weyl’s terminology) of objects: sets. (This partially corresponds to Principle 10 in Section 1.3.)

2. Like in most of Feferman’s systems, $\mathbf {PW}$ is practically not really a single theory, but involves many theories, all of them firstorder. In each stage of working within it, we do have a single theory T, but we have two options how to proceed: We may simply derive new theorems in T, but we may also move to a strictly stronger theory $\mathbf {T}^\star $ in an expanded language.Footnote ^{9} (This feature of $\mathbf {PW}$ implements Principles 4, 5, and 8 in Section 1.3.)

3. The logic of all theories in $\mathbf {PW}$ is classical logic (Principle 6 in Section 1.3).

4. The initial language of the system includes just two predicate symbols: $=$ and $\in $ (Principle 9 in Section 1.3) and a constant $\omega $ for the set of natural numbers (taken to be the finite von Neumann ordinals). The inclusion of the latter is actually not essential, but it reflects well the central place that the natural numbers have in the predicativism of Poincaré and Weyl.

5. The following axiom and axiom schemas are included in all theories in $\mathbf {PW}$ :

• The axiom of extensionality (Principle 3 in Section 1.3).

• Comprehension for invariant formulas (Principle 7 in Section 1.3).

• $\in $ induction (which implements the vague Principle 2 in Section 1.3).Footnote ^{10}


6. Our main method of extending a given predicative set theory T to a stronger predicative set theory $\mathbf {T}^\star $ is by adding a new symbol to the signature of T, together with an axiom that defines it. (In addition, we include of course in $\mathbf {T}^\star $ all the instances in the extended language of the axiom schemas of T.) Such an extension is done by applying one of the syntactic methods that $\mathbf {PW}$ provides for this purpose.

7. As usual, extending T by an operation symbol is allowed only if T proves some corresponding existence and uniqueness conditions. Still, the extension is usually not conservative.

8. Adding an nary predicate symbol P is allowed only if its defining axiom implies its absoluteness. Similarly, adding an nary operation symbol F is allowed only if its defining axiom implies that the formula $y=F(x_1,\ldots ,x_n)$ is invariant with respect to y in case $y,x_1,\ldots ,x_n$ are distinct.
Note 1.6. In designing $\mathbf {PW}$ we have adopted two additional principles:

• Platonists should be able to accept any theory in our framework. In particular, every such theory is actually a subsystem of $\mathbf {ZF}(P)$ (ZF without the powerset axiom). This principle immediately rules out, e.g., Axiom VIII (Enumerability) of $\mathbf {PS_1E}$ from [Reference Feferman, Scott and Jech18] and the Axiom of Countability of $\mathbf {ATR_0^{Set}}$ from [Reference Simpson46] (which says that every set is countable).Footnote ^{11}

• Every rule or axiom of $\mathbf {PW}$ is a very close counterpart of some rule or axiom that was used by Feferman in one of his predicative (or locally predicative) systems. Hence $\mathbf {PW}$ should be accepted as predicative by anyone who accepts those systems of Feferman’s as predicative.
Note 1.7. Among the axioms of ZFC, $\mathbf {PW}$ completely rejects the axiom of powerset and the axiom of choice, and it restricts the axiom schema of separation to the case in which the separating condition is absolute. It also accepts only special cases of the axiom schema of replacement. In our opinion, these are properties that should be shared by any predicative set theory.
Note 1.8. At least in my opinion, $\mathbf {PW}$ is simpler and easier to work in than any of Feferman’s systems (see Section 4.4). Moreover, we believe that as long as we confine ourselves to $\mathbf {ZF}(P)$ (i.e., neither the powerset axiom nor the axiom of choice is allowed to be used), working in $\mathbf {PW}$ exactly reflects the way work in set theory is done in reality. In particular, one can use abstraction terms as well introduce new operation and predicate symbols in the usual way. Only when there is a worry whether what is done is predicatively justified, a need arises to check whether certain syntactic conditions are satisfied. In most cases this only involves checking whether a certain formula is bounded or is a $\Sigma $ formula in the usual sense of this concept (see Note 4.7). However, there is one important case (the comprehension schema) in which one should check for some formula $\varphi $ whether $\varphi \succ \{x_1,\ldots ,x_n\}$ or not (where $x_1,\ldots ,x_n$ are variables). This is also rather easy once one gets used to the syntactic definition of $\succ $ , since that definition closely follows the logical structure of $\varphi $ . In fact, what is involved is just a short mechanical check, which is usually much easier than proving that the existence of a certain set logically follows from the standard comprehension axioms of $\mathbf {ZF}(P)$ (or full $\mathbf {ZF}$ ).
1.5 The structure of the paper
Section 2 explains our notations and terminology. Section 3 establishes the predicativity of the set of natural numbers, using a rather weak subsystem of $\mathbf {PW}$ . $\mathbf {PW}$ itself is precisely defined, justified, and compared with Feferman’s Systems in Section 4. Section 5 includes important examples of the power of $\mathbf {PW}$ . Section 6 develops in $\mathbf {PW}$ the fundamentals of the theory of von Neumann ordinals. Section 7 includes the main results of this paper: that $\mathbf {PW}$ provides terms which define $\Gamma _0$ and much bigger ordinals, and that it can prove the main properties of those ordinals. We conclude in Section 8 with some remarks and directions for further research.
2 Terminology and notations
2.1 The difference between operations and functions
In standard textbooks on firstorder theory, it is common to refer to the symbols in a signature of a firstorder language as ‘relation symbols’ and ‘function symbols’. We cannot use this terminology here, since we reserve the words ‘relation’ and ‘function’ to their official meaning in set theory, that is, to sets of pairs satisfying certain conditions. Instead, we use the name ‘predicate’ for any “relation” that is not a set (like the predicates $\in $ or $=$ ), and we use the name ‘operation’ for any “function” that is not a set (like the operation of union on sets or the operation of addition on ordinals).Footnote ^{12} Accordingly, the symbols of a firstorder signature are divided in this paper into predicate symbols, operation symbols, and constants. The latter may actually be viewed as operations with arity 0, except that they should always be interpreted as sets.
2.2 Notations
We assume that every firstorder language for a set theory has infinitely many variables for sets, officially taken here to be $v_0,v_1,\ldots $ . We use letters (small or capital) from the end of the Latin alphabet to vary over these variables, and letters from the beginning of that alphabet as general variables for sets in the metalanguage. As usual, $i,j,k,l,m,n$ are used as special variables for natural numbers (in both the metalanguage and the formal language). t and s serve as variables (in the metalanguage) for terms, and $\varphi ,\psi ,\theta $ (and sometimes also A and B) as variables for formulas. In all cases the variables may be decorated with subscripts or superscripts. We denote by $Fv(\varphi )$ ( $Fv(t)$ ) the set of free variables of $\varphi $ (of t). When we denote a formula by $\varphi (x_1,\ldots ,x_n)$ it means that $\{x_1,\ldots ,x_n\}\subseteq Fv(\varphi )$ . On the other hand, when we write $\varphi (\vec {y},x)$ it means that $\vec {y}=\langle y_1,\ldots ,y_n \rangle $ for some n (whose identity may be obtained from the context or it does not matter); the variables $x,y_1,\ldots ,y_n$ are all distinct from each other; and $Fv(\varphi )=\{x,y_1,\ldots ,y_n\}$ .
The substitution of a term t for a free variable y in a formula $\varphi $ (a term s) is denoted by $\varphi \{t/y\}$ ( $s\{t/y\}$ ). However, when we denote a formula by $\varphi (y)$ ( $\varphi (x,y)$ , $\varphi (\vec {x},y)$ ) we might simply write $\varphi (t)$ ( $\varphi (x,t)$ , $\varphi (\vec {x},t)$ ) instead.
Given a firstorder signature $\sigma $ , we take a structure for $\sigma $ to be a pair $\mathcal {M}=\langle \mathcal {D},I \rangle $ , where $\mathcal {D}\neq \emptyset $ is the domain of $\mathcal {M}$ and I is its interpretation function. If r is one of the symbols in $\sigma $ we shall usually write $r^I$ instead of $I(r)$ . If $\nu $ is an assignment of elements of $\mathcal {D}$ to variables of the language, $x_1,\ldots ,x_n$ are n distinct variables, and $\vec {a}\in \mathcal {D}^n$ , we denote by $\nu \{\vec {x}:=\vec {a}\}$ the assignment which is obtained from $\nu $ by assigning $a_i$ to $x_i$ ( $i=1,\ldots ,n$ ). We denote by $\nu _{\mathcal {M}}[t]$ the element of $\mathcal {D}$ that $\nu $ assigns according to I to the term t of $\sigma $ . Similarly, if f is an operation symbol of $\sigma $ then we use in the metalanguage square brackets to denote applications of $f^I$ to arguments in $\mathcal {D}$ . Thus, if f is nary, and $\nu $ is an assignment in $\mathcal {D}$ , then $\nu _{\mathcal {M}}[f(t_1,\ldots ,t_n)]=f^I[\nu _{\mathcal {M}}[t_1],\ldots ,\nu _{\mathcal {M}}[t_n]]$ . We write $\mathcal {M},\nu \models \varphi $ in case $\nu $ satisfies in $\mathcal {M}$ the formula $\varphi $ of $\sigma $ . If $Fv(\varphi )=\{x_1,\ldots ,x_n\}$ , and $\nu [x_i]=a_i$ ( $i=1,\ldots ,n$ ) then we might write instead $\mathcal {M}\models \varphi (a_1,\ldots ,a_n)$ .
Finally, when we refer in the metalanguage to the collection of things that satisfy a certain condition C we shall denote it by $[x\mid C(x)]$ , reserving the notation $\{x\!\mid \!\varphi (x)\}$ for being used in our formal system. Moreover, in case there is a danger of confusion, we shall use ‘:’ in the metalanguage instead of ‘ $\in $ ’. (Recall that the latter is a basic symbol of the language of our system.)
3 The predicativity of the natural numbers
Recall that all of our systems are based on classical firstorder logic with identity.
Our system $\mathbf {PW}$ includes a constant $\omega $ for the natural numbers. Before presenting $\mathbf {PW}$ in the next section, we should justify this inclusion on the basis of the invariance criterion. We do that by:

1. Providing a bounded formula $N(x)$ in the language of set theory that defines when x is a natural number (i.e., a finite von Neumann ordinal).

2. Present a basic predicative set theory $\mathbf {VBS}$ , in which one can show that $N(x)$ is adequate for the task. This is done by proving in it all the properties that one expects from a formula that defines the natural numbers.

3. Give an intuitive proof in the metalanguage that the formula $N(x)$ is invariant, and so it may be used for defining a new set.
We start by presenting $\mathbf {VBS}$ .Footnote ^{13} The axioms of this system include the Extensionality axiom [Ext] and the $\in $ induction axiom schema [ $\in $ ind] from Section 4.1.6, as well as the following four elementary instances of the general predicative comprehension scheme:

1. Empty Set ([Em]):
$$ \begin{align*}\exists Z\forall x(x\in Z\leftrightarrow x\neq x).\end{align*} $$ 
2. Pairing ([Pa]):
$$ \begin{align*}\forall x\forall y\forall\exists Z\forall w(w\in Z\leftrightarrow w=x\vee w=y).\end{align*} $$ 
3. Union ([U]):
$$ \begin{align*}\forall x\exists Z\forall w(w\in Z\leftrightarrow\exists y(y\in x\wedge w\in y)).\end{align*} $$ 
4. Difference ([D]):
$$ \begin{align*}\forall x\forall y\exists Z \forall w(w\in Z\leftrightarrow w\in x\wedge w\not\in y).\end{align*} $$
Note 3.1. It is easy to see that the structure $\langle \mathsf {HF},\in \rangle $ , where HF (which is identical to $V_{\omega }$ ) is the set of the hereditarily finite sets, forms the minimal model of $\mathbf {VBS}$ . Moreover, $\langle V_\alpha ,\in \rangle $ is a model of $\mathbf {VBS}$ whenever $\alpha $ is a limit ordinal.Footnote ^{14}
In order to present the formula $N(x)$ it is convenient (though not really necessary) to use the usual procedure of extension by definitions, and develop $\mathbf {VBS}$ in an enriched language in which the four axioms above are replaced by:Footnote ^{15}

1. [Em]: $\forall x(x\not \in \emptyset ).$

2. [Pa]: $\forall x\forall y\forall w(w\in \{x,y\}\leftrightarrow w=x\vee w=y).$

3. [U]: $\forall x\forall w(w\in \bigcup x\leftrightarrow \exists y(y \in x\wedge w\in y)).$

4. [D]: $\forall x\forall y\forall w(w\in xy\leftrightarrow w\in x\wedge w\not \in y).$
Note 3.2. In general, one should be careful when applying the extension by definitions procedure to theories with axioms schemas, since the extension of such a schema to the expanded language involves the addition of infinitely many new axioms besides those that are allowed by the procedure. This is not a problem in cases like we have here, where every axiom schema is pure in the sense that no constraint is imposed on the formulas to which it may be applied. Therefore we may assume that every instance of $[{\in}{}ind]$ in the expanded language is an axiom of $\mathbf {VBS}$ . However, one should be cautious about the issue of being conservative when the procedure is applied in more complicated cases, like in case we have $[{\in}{}ind]$ restricted to some class of formulas (e.g., bounded formulas).
Proposition 3.3. For every $n\geq 0$ ,
Proof We do the case $n=2$ (which trivially implies the cases $n=0$ and $n=1$ ). The proof for any other n is similar.
Given $x_0$ , to show that $\forall x_1\in x_0\forall x_2\in x_1.x_0\not \in x_2$ , we may assume (using $[{\in}{}ind]$ ) that $(\star )\mbox {\ \ \ } \forall y_0 \in x_0 \forall y_1\in y_0\forall y_2\in y_1.y_0\not \in y_2$ . Suppose now that there are $x_1$ and $x_2$ such that: $x_1\in x_0\wedge x_2\in x_1\wedge x_0\in x_2$ Since $x_1\in x_0$ , we may apply $(\star )$ with $y_0=x_1$ , $y_1=x_2$ , and $y_2=x_0$ , to get that $x_1\not \in x_0$ , which is a contradiction. So no such $x_1$ and $x_2$ exist.
We are ready to introduce our definition of the notion of a natural number:
Definition 3.4.

1. $S(x):=x\cup \{x\}$ (where $\{x\}=\{x,x\}$ ).

2. $N(x):=\forall y\in S(x)( y=\emptyset \vee \exists z\in x. y=S(z)).$
Note 3.5. Officially, $N(x)$ is the following formula:
Proposition 3.6. Let $0:=\emptyset $ . The following are provable in $\mathbf {VBS}$ :

1. $N(0)$ .

2. $\forall x(N(x)\leftrightarrow N(S(x)))$ .

3. $S(x)\neq 0.$

4. $S(x)=S(y) \to x=y.$
Proof Points 1 and 3 are trivial. Point 4 easily follows from Proposition 3.3. We show point 2.

• Suppose that $N(x)$ . We show that $N(S(x))$ , i.e., that
$$ \begin{align*}\forall y\in S(S(x))( y=0\vee\exists z\in S(x). y=S(z)).\end{align*} $$So let $y\in S(S(x))$ . We show that $y=0\vee \exists z\in S(x). y=S(z)$ . Since we assume $N(x)$ , this is obvious in case $y\in S(x)$ , because every $z\in x$ is also in $S(x)$ . There remains the case $y=S(x)$ , but this case is trivial, since $x\in S(x)$ (and so x is an element z in $S(x)$ such that $y=S(z)$ ). 
• Suppose that $N(S(x))$ , i.e.,
$$ \begin{align*}\forall y\in S(S(x))( y=0\vee\exists z\in S(x). y=S(z)).\end{align*} $$We show that $N(x)$ . So let $y\in S(x)$ . We show that $y=0\vee \exists z\in x. y=S(z)$ . Suppose that the first disjunct fails, i.e., $y\neq 0$ . Since $S(x)\subseteq S(S(x))$ , our assumption implies that there is $z\in S(x)$ such that $y=S(z)$ . It is impossible that $z=x$ , since in this case we would get that $S(x)\in S(x)$ . Hence $z\in x$ , and we are done.
Proposition 3.7. $\vdash _{\mathbf {VBS}}\varphi \{0/x\}\wedge \forall x(\varphi \to \varphi \{S(x)/x\})\to \forall x(N(x)\to \varphi ).$
Proof Assume $\mathrm {(I)}\ \varphi \{0/x\}\wedge \forall x(\varphi \to \varphi \{S(x)/x\})$ . To show $\forall x(N(x)\to \varphi )$ , it suffices by $[{\in}{}ind]$ to show that $\forall x(\forall y\in x(N(y)\to \varphi \{y/x\})\to (N(x)\to \varphi ))$ . So assume that $\mathrm {(II)}\ \forall y\in x(N(y)\to \varphi \{y/x\})$ and $\mathrm {(III)}\ N(x)$ . We show $\varphi $ . If $x=0$ then this is implied by (I). If not, then it follows from (III) that there is $z\in x$ such that $x=S(z)$ . Hence (II) entails that $N(z)\to \varphi \{z/x\}$ . But $N(z)$ follows from (III) by the second item of Proposition 3.6. Therefore $\varphi \{z/x\}$ . From this $\varphi $ (which is equivalent to $\varphi \{S(z)/x\}$ ) follows by (I).
Proposition 3.8. Let $<:=\in $ . The following are provable in $\mathbf {VBS}$ :

1. $x\not <0.$

2. $x<S(y) \leftrightarrow x<y\vee x=y.$

3. $N(x)\wedge y<x\to N(y).$
Proof The first two items are trivial. We prove the third by induction on x (i.e., by using Proposition 3.7). So let $\varphi :=\forall y(N(x)\wedge y<x\to N(y))$ . Obviously, $\vdash _{\mathbf {VBS}}\varphi \{0/x\}$ . We show that $\vdash _{\mathbf {VBS}}\forall x(\varphi \to \varphi \{S(x)/x\})$ . So assume $\varphi (x)$ and that $N(S(x))$ and $y<S(x)$ . By Proposition 3.6, the first assumption implies that $N(x)$ . Hence $y=x$ implies that $N(y)$ . Otherwise $y<x$ , and so the induction hypothesis implies that $N(y)$ .
Finally we show that $N(x)$ (intuitively) defines an invariant collection. Since this is an intuitive theorem in the metalanguage, the proof is intuitive too. Still, it employs only predicatively acceptable principles.
Proposition 3.9. Let $\mathcal {M}_1$ and $\mathcal {M}_2$ be transitive models of VBS (in the usual sense of set theory) such that $\mathcal {M}_1\subseteq \mathcal {M}_2$ . Then:
Proof The fact that $N(x)$ is bounded (and so absolute) implies that $\omega _1\subseteq \omega _2$ . For the converse, we show that $\forall a\in \mathcal {M}_2((\mathcal {M}_2\models N(a))\to (a\in \mathcal {M}_1\wedge \mathcal {M}_1\models N(a)))$ . So let $a\in \mathcal {M}_2$ , and assume that $\mathcal {M}_2\models N(a)$ . By $[{\in}{}ind]$ we may assume also that $\forall b\in a((\mathcal {M}_2\models N(b))\to (b\in \mathcal {M}_1\wedge \mathcal {M}_1\models N(b)))$ . Since $\mathcal {M}_2\models N(a)$ , either $a=\emptyset $ , or there exists $b\in a$ such that $\mathcal {M}_2\models a=S(b)$ . In the first case $a\in \mathcal {M}_1$ holds trivially. So assume the second case. Then item 2 of Proposition 3.6 implies that $\mathcal {M}_2\models N(b)$ . Hence the induction hypothesis implies that $b\in \mathcal {M}_1\wedge \mathcal {M}_1\models N(b)$ . Again by Proposition 3.6, this entails that $S(b)\in \mathcal {M}_1\wedge \mathcal {M}_1\models N(S(b))$ . Hence $a\in \mathcal {M}_1\wedge \mathcal {M}_1\models N(a)$ .
4 The system $\mathbf {PW}$
4.1 A description of the system
The language of the system $\mathbf {PW}$ is an onesorted firstorder language with equality. As in Section 2, we take its variables to be $v_0,v_1,\ldots $ , and use letters (both small and capital) from the end of the Latin alphabet to vary over them. All other components of the language (predicate and operation symbols, terms, formulas, the invariance relation $\succ $ and the $\Sigma $ formulas) are simultaneously generated as described in Sections 4.1.1–4.1.5. The axioms and rules of $\mathbf {PW}$ are presented in Sections 4.1.6 and 4.1.7, respectively. (Note that in the formulation of the last rule [Unif] there is a use of the formula $Fun$ , the operation $Dom$ , and the term $f(x)$ . They are all introduced in Section 5.1 without using [Unif]. $Fun(f)$ says that f is a function. $Dom(f)$ and $f(x)$ have their usual meanings.)
4.1.1 Predicate symbols and operation symbols

1. $=$ and $\in $ are binary predicate symbols.

2. $\omega $ is a constant (i.e., an 0ary operation symbol).

3. If $\varphi \succ \emptyset $ and $Fv(\varphi )=\{v_0 ,\ldots \!,\! v_{n}\}$ then $P_{\varphi }$ is an $n+1$ predicate symbol.

4. If $\varphi $ is $\Sigma $ and $Fv(\varphi )=\{v_0 ,\ldots \!,\! v_{n}\}$ then $F_{\varphi }$ is an nary operation symbol.
4.1.2 Terms

1. Every variable is a term.

2. $F(t_1 ,\ldots \!,\! t_n)$ is a term if F is an nary operation, and $t_1 ,\ldots \!,\! t_n$ are terms.
4.1.3 Formulas

1. $P(t_1 ,\ldots \!,\! t_n)$ is a formula if P is an nary predicate, $t_1 ,\ldots \!,\! t_n$ are terms.

2. The formulas are closed under $\neg $ , $\wedge $ , $\vee $ , and $\to $ .

3. If $\varphi $ is a formula and x is a variable, then $\exists x\varphi $ and $\forall x\varphi $ are formulas.
4.1.4 The invariance relation $\succ $

1. $t\in s\succ \emptyset $ and $t=s\succ \emptyset $ .

2. $P_{\varphi }(t_0 ,\ldots \!,\! t_n)\succ \emptyset $ .

3. $\varphi \succ \{x\}$ if $\varphi \in \{x\neq x,x=t,t=x,x\in t\}$ and $x\not \in Fv(t)$ .

4. $\neg \varphi \succ \emptyset $ if $\varphi \succ \emptyset $ .

5. $\varphi \vee \psi \succ X$ if $\varphi \succ X$ and $\psi \succ X$ .

6. $\varphi \wedge \psi \succ X\cup Y$ if $\varphi \succ X$ , $\psi \succ Y$ and $Y\cap Fv(\varphi )=\emptyset $ .

7. $\exists y \varphi \succ X\{y\}$ if $y\in X$ and $\varphi \succ X$ .
4.1.5 $\Sigma $ formulas

1. If $\varphi \succ \emptyset $ then $\varphi $ is $\Sigma $ . Such formulas are called absolute.

2. If $\varphi $ and $\psi $ are $\Sigma $ then so are $\varphi \vee \psi $ and $\varphi \wedge \psi $ .

3. If $\varphi $ is $\Sigma $ then so is $\exists x\varphi $ .

4. If $\varphi \succ \{y_1,\ldots \!,\! y_k\}$ , and $\psi $ is $\Sigma $ , then $\forall y_1\dots \forall y_k(\varphi \to \psi )$ is $\Sigma $ .
4.1.6 Axioms

1. [Fol]: Every formula which is valid in firstorder logic with equality.Footnote ^{16}

2. [Ext] (Extensionality): $\forall z (z\in x \leftrightarrow z\in y)\to x=y$ .

3. [Comp] ( $\succ $ Comprehension): $\exists !Z\forall x(x\in Z\leftrightarrow \varphi )$ , provided that $\varphi \succ \{x\}$ .Footnote ^{17}

4. [ $\in $ ind] ( $\in $ induction): $(\forall x(\forall y(y\in x\to \varphi \{y/x\})\to \varphi ))\to \forall x\varphi $ .

5. [Inf] (Infinity): $\forall x(x\in \omega \leftrightarrow N(x))$ , where $N(x)$ is presented in Note 3.5.

6. [PrI] $P_{\varphi }(v_0 ,\ldots \!,\! v_{n})\leftrightarrow \varphi $ (provided that $\varphi $ is as in Section 4.1.1).
4.1.7 Rules

1. [MP]: From $\varphi $ and $\varphi \to \psi $ infer $\psi $ .

2. [Gen]: From $\vdash _{\mathbf {PW}}\varphi $ infer $\vdash _{\mathbf {PW}}\forall x\varphi $ .

3. [OpI] From $\vdash _{\mathbf {PW}}\forall v_1\dots \forall v_n\exists !v_0\varphi $ infer $\vdash _{\mathbf {PW}} F_{\varphi }(v_1 ,\ldots \!,\! v_n)=v_0\leftrightarrow \varphi $ (provided that $\varphi $ is as in Section 4.1.1, i.e., $\varphi $ is $\Sigma $ and $Fv(\varphi )=\{v_0 ,\ldots \!,\! v_{n}\}$ ).

4. [Unif] (Unification Rule):
$$\begin{align*}\frac {\vdash_{\mathbf{PW}}\forall y_1\forall y_2(\varphi\{y_1/y\}\wedge\varphi\{y_2/y\}\to y_1=y_2)} {\vdash_{\mathbf{PW}}\forall x\in Z\exists y \varphi\to\exists!f(Fun(f)\wedge Dom(f)=Z\wedge \forall x\in Z\varphi\{f(x)/y\})}. \end{align*}$$Provided $\varphi $ is $\Sigma $ , x and y are distinct variables in $Fv(\varphi )$ , and $Z\not \in Fv(\varphi )$ .
Definition 4.1.

1. A proof in $\mathbf {PW}$ of a formula $\varphi $ is a sequence of formulas ending with $\varphi $ , such that each element of the sequence is either an axiom of $\mathbf {PW}$ or it can be inferred from previous elements of the sequence by one of the four rules of $\mathbf {PW}$ . $\varphi $ is a theorem of $\mathbf {PW}$ if it has a proof in $\mathbf {PW}$ .

2. A derivation in $\mathbf {PW}$ of a formula $\varphi $ from a set $\mathbf {T}$ of assumptions is a sequence of formulas ending with $\varphi $ , such that each element of the sequence is either a theorem of $\mathbf {PW}$ , or an element of $\mathbf {T}$ , or it can be inferred from two previous elements of the sequence by [MP]. $\varphi $ is derivable in $\mathbf {PW}$ from $\mathbf {T}$ if it has a derivation in $\mathbf {PW}$ from $\mathbf {T}$ .
Note 4.2. It should be emphasized that by Definition 4.1 (as well as by the difference in the way the rules of $\mathbf {PW}$ are formulated in Section 4.1.7), [MP] is the only rule of derivation of $\mathbf {PW}$ . All the rest are only rules of proof. This means that they can be applied only in assumptionsfree derivations (i.e., pure derivations from axioms). It follows that the deduction theorem obtains for $\mathbf {PW}$ : to show $\mathbf {T}\vdash _{\mathbf {PW}}(\varphi \to \psi )$ it suffices to prove that $\mathbf {T},\varphi \vdash _{\mathbf {PW}}\psi $ . However, while using this theorem in order to show that $\mathbf {T}\vdash _{\mathbf {PW}}(\varphi \to \psi )$ , one should be careful not to rely on a proof of $\psi $ from $\mathbf {T}\cup \{\varphi \}$ in which [Gen],[OpI] or [Unif] are applied to a formula which depends on an assumption in $\mathbf {T}\cup \{\varphi \}$ .Footnote ^{18}
Note 4.3. In the formulation above we have used the quantifier $\exists !$ in [Comp] and in [Unif]. This was done according to Principle (I) in Section 1.4.2. However, in both cases we can actually use the simpler connective $\exists $ . In the case of [Comp] this is due to the axiom [Ext], while in the case of [Unif] it follows from the premise of that rule.
Note 4.4. Like in the system PZF from [Reference Avron and Schindler5], the predicativity of definitions of sets is ensured $\mathbf {PW}$ by using an appropriate syntactic invariance relation $\succ $ between a formula $\varphi $ and subsets of $Fv(\varphi )$ . (The intended meaning of $\succ $ was described in item (II) at the beginning of Section 1.4.2.) Relations of this sort have originally been introduced in [Reference Avron and Hendricks3, Reference Avron4] in order to provide a unified theory of constructions and operations as they are used in different branches of mathematics and computer science, including set theory, computability theory, and database theory (see footnote 8). Further important theorems about them can be found in [Reference Avron, Lev and Levi8].Footnote ^{19}
Note 4.5. It is easy to show by induction that if $\varphi \succ X$ and $Y\subseteq X$ then $\varphi \succ Y$ . Therefore we have not included this important condition from [Reference Avron, Lev and Levi8] in our present definition of $\succ $ , but we shall use it freely in what follows. Another important condition which we shall treat as if it included in the definition of $\succ $ is that $\forall x_1\dots \forall x_n(\varphi \to \psi )\succ \emptyset $ if $\varphi \succ \{x_1 ,\ldots \!,\! x_n\}$ and $\psi \succ \emptyset $ . The reason is that every consequence of this condition can easily be derived without it (because $\forall x_1\dots \forall x_n(\varphi \to \psi )$ is logically equivalent to $\neg \exists x_1\dots \forall x_n(\varphi \wedge \neg \psi )$ .)
Note 4.6. The clauses in the definition of $\Sigma $ formulas are taken from [Reference Avron4]. This definition is a straightforward generalization of the usual definition of $\Sigma $ formulas. In particular, it includes all the formulas which are called ‘essentially existential HFformulas’ in [Reference Feferman17]. From Theorem 4.1 of [Reference Feferman17] it follows that every persistent formula is equivalent in $\mathbf {PW}$ to a $\Sigma $ formula, but we shall not use this fact here.
Note 4.7. By the results of [Reference Avron, Lev and Levi8], it is possible to use in [Pri], [OpI], and [Unif] bounded formulas instead of our absolute formulas, and the ordinarily defined class of $\Sigma $ formulas instead of our somewhat bigger class.
Note 4.8. Actually, the use of [PrI] does not really increase the power of $\mathbf {PW}$ , and so we can omit it from $\mathbf {PW}$ . However, it is very convenient to include it.
4.2 $\mathbf {PW}$ and $\mathbf {ZF}(P)$
In this section we prove that $\mathbf {PW}$ is a subsystem of (an extension by definitions of) $\mathbf {ZF}(P)$ . It follows that every theorem of $\mathbf {PW}$ is acceptable to the ordinary mathematicians.
Definition 4.9. Let e be a term or a formula or an operation symbol of $\mathbf {PW}$ .

1. $\sigma _{e}$ is the minimal signature $\sigma $ that satisfies the following conditions:

• It includes $\in $ and $=$ .

• It includes all the predicate symbols and operation symbols (including constants) that occur in e.

• If either $P_\psi $ or $F_\psi $ is in $\sigma $ , then so are all the predicate symbols and operation symbols that occur in $\psi $ .


2. e is legal if the premises of [OpI] obtain whenever $F_{\varphi }$ is in $\sigma _{e}$ .

3. For legal e, $\mathbf {PW}_{e}$ is the set of all theorem of $\mathbf {PW}$ in the language of $\sigma _{e}$ .
Definition 4.10. Let e be legal. A structure $\mathcal {M}$ is adequate for e if:

• $\mathcal {M}$ is a structure for a signature that contains $\sigma _{e}$ .

• $\mathcal {M}$ is a model of $\mathbf {PW}_{e}$ .
Definition 4.11. Let $\mathcal {M}$ be a transitive set or class. $\mathcal {M}^e$ is the structure for $\sigma _{e}$ which is obtained from $\mathcal {M}$ by defining an interpretation $I_e$ of $\sigma _{e}$ in $\mathcal {M}$ using the following recursive definition:

• $\in ^{I_e}=\in $ .

• $P_{\psi }^{I_e}=[\vec {a}: \mathcal {M}^n\mid \mathcal {M}^e,\{\vec {v}:=\vec {a}\}\models \psi ]$ in case $P_{\psi }$ is an nary predicate.

• If $F_{\psi }$ is an nary operation, then $F_{\psi }^{I_e}[\vec {a}]$ is the unique $b\in \mathcal {M}$ such that $\mathcal {M}^e,\{\vec {v}:=\vec {a}\;; \;v_n:=b\}\models \psi $ in case such a unique b exists, $\emptyset $ otherwise.
Theorem 4.12. Let $\mathcal {M}$ be a transitive model of $\mathbf {ZF}(P)$ , and let $\theta $ be a formula of $\sigma _{e}$ . Suppose that $\theta \succ \{x_1,\ldots ,x_n\}$ where $n>0$ . Then the following collection is an element of $\mathcal {M}$ for every assignment $\nu $ in $\mathcal {M}$ :
Proof By induction on the structure of $\theta $ , using the relevant clauses in the definition of $\succ $ .

• If $\theta $ is $x\neq x$ then $S_{\theta ,x}^{\nu }=\emptyset $ for every $\nu $ . Hence $S_{\theta ,x}^{\nu }\in \mathcal {M}$ .

• If $\theta $ is $x\in t$ where $x\not \in Fv(t)$ , then $S_{\theta ,x}^{\nu }=\nu [t]$ (because $\mathcal {M}$ is transitive). Hence $S_{\theta ,x}^{\nu }\in \mathcal {M}$ .

• If $\theta $ is $x=t$ where $x\not \in Fv(t)$ , then $S_{\theta ,x}^{\nu }=\{\nu [t]\}$ . Hence $S_{\theta ,x}^{\nu }\in \mathcal {M}$ by the pairing axiom.

• If $\theta $ is $\varphi \vee \psi $ , where $\varphi \succ \{x_1,\ldots \!,\! x_n\}$ and $\psi \succ \{x_1,\ldots \!,\! x_n\}$ , then $S_{\theta ,\vec {x}}^{\nu }=S_{\varphi ,\vec {x}}^{\nu }\cup S_{\psi ,\vec {x}}^{\nu }$ . Hence $S_{\theta ,\vec {x}}^{\nu }\in \mathcal {M}$ by the induction hypotheses for $\varphi $ and $\psi $ .

• Suppose $\theta $ is $\varphi \wedge \psi $ , where $\varphi \succ \{w_1,\ldots w_l\}$ , $\psi \succ \{y_1,\ldots ,y_k\}$ , $n=l+k$ , $\{y_1,\ldots ,y_k\}\cap Fv(\varphi )=\emptyset $ , and $\vec {x}=\langle w_1,\ldots w_l,y_1,\ldots ,y_k \rangle $ . We have three subcases to consider here.

– $k=0$ and $n=l$ . Then $S_{\theta ,\vec {x}}^{\nu }= [\vec {a}\in S_{\varphi ,\vec {x}}^{\nu }\mid \psi ]$ . Hence $S_{\theta ,\vec {x}}^{\nu }\in \mathcal {M}$ by the induction hypothesis for $\varphi $ and the separation axiom.

– $l=0$ and $n=k$ . This case is similar to the previous one.

– $l>0$ and $k>0$ . To simplify notation, assume that $l=k=1$ , $Fv(\varphi )=\{w,z\}$ , $Fv(\psi )=\{w,y,z\}$ . For $c\in \mathcal {M}$ , let
$$ \begin{align*}Z(c)=[a\in \mathcal{M}\mid \mathcal{M}^e\models \varphi(a,c)].\end{align*} $$Since $\varphi \succ \{x\}$ , $Z(c)\in \mathcal {M}$ by the induction hypothesis for $\varphi $ . By the induction hypothesis for $\psi $ , this and the fact that $\psi \succ \{y\}$ imply that $[b\in \mathcal {M}\mid \mathcal {M}^e\models \psi (d,b,c)]\in \mathcal {M}$ for every $c\in \mathcal {M}$ and $d\in Z(c)$ . Denote this set by $W(c,d)$ . Then $[\langle a,b \rangle \in \mathcal {M}^2\mid \mathcal {M}^e\models \theta (a,b,c)]$ equals $\bigcup _{d\in Z(c)}\{d\}\times W(c,d)$ . Hence $[\langle a,b \rangle \in \mathcal {M}^2\mid \mathcal {M}^e\models \theta (a,b,c)]\in \mathcal {M}$ by the axioms of replacement and union.


• Suppose $\theta $ is $\exists y\varphi $ , where $\varphi \succ \{x_1,\ldots \!,\! x_n,y\}$ . Then
$$ \begin{align*}S_{\theta,\vec{x}}^{\nu}=[\vec{a}\in \mathcal{M}^n\mid \exists b\in M.\langle a_1,\ldots,a_n,b \rangle\in S_{\varphi,\vec{x}}^{\nu}].\end{align*} $$Hence $S_{\theta ,\vec {x}}^{\nu }\in \mathcal {M}$ , by the induction hypothesis for $\varphi $ and the fact that $\mathcal {M}$ is a model of $\mathbf {Z}(P)$ (and so is closed under the projection operation).
Corollary 4.13. From a platonist point of view, every collection which is defined by [Comp] is set (i.e., an element of the platonist universe $\mathsf {V}$ ).
Proof $\mathsf {V}$ is (believed to be) a model of $\mathbf {Z}(P)$ . Hence Theorem 4.12 applies.
Note 4.14. Another platonist collection which is known to be a model of $\mathbf {Z}(P)$ is $\mathsf {HC}$ (the collection of hereditarily countable sets). From Theorem 4.24 it follows that if $\theta \succ \{x_1 ,\ldots \!,\! x_n\}$ then the collection $S_{\theta ,\vec {x}}^{\nu }$ is the same in $\mathsf {V}$ and in $\mathsf {HC}$ for every assignment $\nu $ in $\mathsf {HC}$ . Hence it is countable for every such $\nu $ .
Theorem 4.15. Every transitive model $\mathcal {M}$ of $\mathbf {Z}(P)$ can be expanded to a model of $\mathbf {PW}$ .
Proof Obviously, the recursive definition given in Definition 4.11 can be extended to provide an expansion of $\mathcal {M}$ to a structure $\mathcal {M}^{\mathbf {PW}}$ for the language of $\mathbf {PW}$ . Now the axiom [Comp] is valid in this structure by Theorem 4.12, while the validity of [Unif] follows from the fact that if we replace this rule by the corresponding implication, we get a schema which is equivalent (in ZF without powerset and replacement) to the replacement schema. Hence [Unif] is valid in $\mathcal {M}^{\mathbf {PW}}$ . That all other axioms and rules of $\mathbf {PW}$ are valid there is obvious.
Corollary 4.16. $\mathsf {HC}$ is a model of $\mathbf {PW}$ . (More precisely, $\mathsf {HC}$ can be expanded to such a model.)
Theorem 4.17. $\mathbf {PW}$ is equivalent to a subsystem of $\mathbf {ZF}(P)$ .
Proof This easily follows from Theorem 4.15 using ZF and modeltheoretic methods. Alternative (and also predicatively acceptable) method is to turn the proof of Theorem 4.12 to a proof that every theorem of $\mathbf {PW}$ is provable in (an extension by definition of) $\mathbf {ZF}(P)$ . This is not difficult. That the other axioms and rules of $\mathbf {PW}$ are derivable in $\mathbf {ZF}(P)$ is (almost) obvious.
4.3 Predicative justification of $\mathbf {PW}$
From the nonlogical axioms and rules of $\mathbf {PW}$ , [Ext] and [PrI] need no justification, while [Inf] was justified in Section 3. It remains to justify the other nonlogical axioms and rules of $\mathbf {PW}$ .
4.3.1 [Comp] and [OpI]
Theorem 4.12 alone is not sufficient for justifying [Comp]. In order to really justify this axiom from a predicative point of view, we should show that the identity of the sets it defines is invariant. Similarly, in order to justify [OpI] we should show that the operation defined by it is invariant. The first step towards these goals is to provide a precise notion of invariance which is adequate for the settheoretical context.
Obviously, talking about invariance of definitions of sets and operations, when we expand one sphere of operation $\mathcal {M}_1$ to a bigger one $\mathcal {M}_2$ , can make sense only if the identities of the elements of $\mathcal {M}_1$ are preserved in $\mathcal {M}_2$ . Since in the context of set theory we take the identity of a set to be fully determined by the identity of its elements, this means that the same objects should belong to an element a of $\mathcal {M}_1$ in both $\mathcal {M}_1$ and $\mathcal {M}_2$ . Accordingly, we define the following:
Definition 4.18. Let $\mathcal {M}_1=\langle \mathcal {D}_1,I_1 \rangle $ and $\mathcal {M}_2=\langle \mathcal {D}_2,I_2 \rangle $ be structures for signatures that contain $\in $ , and let both be models of [Ext]. $\mathcal {M}_2$ is an $\in $ extension of $\mathcal {M}_1$ if $\mathcal {D}_1\subseteq \mathcal {D}_2$ , and the following holds for every element a of $\mathcal {D}_1$ :
Note 4.19. From a platonistic point of view, if $\in ^{I_1}$ and $\in ^{I_2}$ are both the ‘real’ $\in $ of $\mathsf {V}$ , then $\mathcal {M}_2$ is an $\in $ extension of $\mathcal {M}_1$ iff $\mathcal {D}_1$ is a transitive subset of $\mathcal {D}_2$ .
Definition 4.20. For legal e, $\langle \mathcal {M}_1,\mathcal {M}_2 \rangle $ is an epair if:

• $\mathcal {M}_1$ and $\mathcal {M}_2$ are adequate for e.

• $\mathcal {M}_2$ is an $\in $ extension of $\mathcal {M}_1$ .

• $[a\in \mathcal {D}_2\mid \mathcal {M}_2\models N(a)]=[a\in \mathcal {D}_1\mid \mathcal {M}_1\models N(a)].$
Definition 4.21.

1. A legal term t of $\mathbf {PW}$ is invariant if $\nu _{\mathcal {M}_1}[t]=\nu _{\mathcal {M}_2}[t]$ whenever $\langle \mathcal {M}_1,\mathcal {M}_2 \rangle $ is a tpair, and $\nu $ is an assignment in $\mathcal {M}_1$ .

2. A legal nary operation F of $\mathbf {PW}$ is invariant if
$$ \begin{align*}F^{\mathcal{M}_1}[a_1 ,\ldots\!,\! a_n]=F^{\mathcal{M}_2}[a_1 ,\ldots\!,\! a_n]\end{align*} $$whenever $\langle \mathcal {M}_1,\mathcal {M}_2 \rangle $ is an Fpair, and $a_1 ,\ldots \!,\! a_n$ are elements of $\mathcal {M}_1$ .

3. A legal formula $\varphi $ of $\mathbf {PW}$ such that $\{x_1 ,\ldots \!,\! x_n\}\subseteq Fv(\varphi )$ is invariant with respect to $\{x_1 ,\ldots \!,\! x_n\}$ if the following holds for every assignment $\nu $ in $\mathcal {M}_1$ :
$$ \begin{align*}[\vec{a}: \mathcal{D}_2^n\mid \mathcal{M}_2,\nu\{\vec{x}:=\vec{a}\}\models\varphi]= [\vec{a}: \mathcal{D}_1^n\mid \mathcal{M}_1,\nu\{\vec{x}:=\vec{a}\}\models\varphi].\end{align*} $$ 
4. A legal formula $\varphi $ of $\mathbf {PW}$ is persistent if $\mathcal {M}_1,\nu \models \varphi $ implies that ${\mathcal {M}_2,\nu \models \varphi }$ as well, whenever $\langle \mathcal {M}_1,\mathcal {M}_2 \rangle $ is a $\varphi $ pair, and $\nu $ is an assignment in $\mathcal {M}_1$ .
Note 4.22. The definition of invariability of formulas can also be formulated as follows: a legal formula $\varphi $ of $\mathbf {PW}$ such that $Fv(\varphi )=\{x_1 ,\ldots \!,\! x_n,y_1 ,\ldots \!,\! y_k\}$ is invariant with respect to $\{x_1 ,\ldots \!,\! x_n\}$ if the following holds whenever $\langle \mathcal {M}_1,\mathcal {M}_2 \rangle $ is a $\varphi $ pair, and $c_1, \ldots , c_k$ are elements of $\mathcal {M}_1$ :
(where $\langle a_1 ,\ldots \!,\! a_n \rangle ^{\frown }\!\langle c_1 ,\ldots \!,\! c_k \rangle = \langle a_1 ,\ldots \!,\! a_n,c_1 ,\ldots \!,\! c_k \rangle $ .)
Note 4.23. Identifying $\langle \rangle $ with $\emptyset $ , we get that in case $n=0$ , the collection $[\vec {a}: \mathcal {D}^n\mid \mathcal {M}\models \varphi (\vec {a}^{\frown }\!\vec {c})]$ is either 1 ( $=\{\emptyset \}$ ) or 0 ( $=\emptyset $ ), depending on whether $\mathcal {M}\models \varphi (\vec {c})$ or not. Hence invariance with respect to $\emptyset $ is simply absoluteness.
Convention. From now on, when we talk about terms, we shall mean legal terms. The same convention applies to formulas and operation symbols.
Theorem 4.24.

1. Every term t of $\mathbf {PW}$ is invariant.

2. Every operation F of $\mathbf {PW}$ is invariant.

3. If $\varphi \succ \{x_1 ,\ldots \!,\! x_n\}$ in $\mathbf {PW}$ then $\varphi $ is invariant with respect to $\{x_1 ,\ldots \!,\! x_n\}$ .

4. Every $\Sigma $ formula of $\mathbf {PW}$ is persistent.
Proof We prove all parts simultaneously, using induction on the complexity of e, where e is a term or a formula or an operation symbol of $\mathbf {PW}$ . Nevertheless, to facilitate reading and understanding, we split the various induction steps into groups that correspond to the four parts of the theorem.
In what follows we assume that for every e we consider, $\langle \mathcal {M}_1,\mathcal {M}_2 \rangle $ is an epair ( $\mathcal {M}_1=\langle \mathcal {D}_1,I_1 \rangle $ , $\mathcal {M}_2=\langle \mathcal {D}_2,I_2 \rangle $ ), and $\nu $ is an assignment in $\mathcal {D}_1$ .

Operations: Suppose that e is the nary operation symbol $F_{\varphi }$ . For convenience of presentation, assume that $n=1$ . Let $a:\mathcal {D}_1$ , and let $b=F_{\varphi }^{I_1}[a]$ . Since $\mathcal {M}_1$ is a model of $\mathbf {PW}_e$ , the legality of $F_{\varphi }$ and the rule [OpI] imply that $\mathcal {M}_1\models \varphi (\langle a,b \rangle )$ . Since $\varphi $ should be a $\Sigma $ formula, the induction hypothesis for $\varphi $ implies that $\mathcal {M}_2\models \varphi (\langle a,b \rangle )$ too. Since $\mathcal {M}_2$ is a model of $\mathbf {PW}_e$ , this in turn implies (with the help of [OpI]) that $b=F_{\varphi }^{I_2}[a]$ as well.

Terms:

• The case where e is a variable is trivial.

• The case $e=\omega $ is immediate from the third item of Definition 4.20.

• Suppose that e is the term $F_{\varphi }(s_1 ,\ldots \!,\! s_n)$ . Then
$$ \begin{align*}\nu_{\mathcal{M}_1}[e]=F_{\varphi}^{I_1}[\nu_{\mathcal{M}_1}[s_1] ,\ldots\!,\! \nu_{\mathcal{M}_1}[s_n]]\ \ \ \nu_{\mathcal{M}_2}[e]=F_{\varphi}^{I_2}[\nu_{\mathcal{M}_2}[s_1] ,\ldots\!,\! \nu_{\mathcal{M}_2}[s_n]].\end{align*} $$Now from the induction hypothesis for $F_{\varphi }$ we get:
$$\begin{align*}F_{\varphi}^{I_1}[\nu_{\mathcal{M}_1}[s_1] ,\ldots\!,\! \nu_{\mathcal{M}_1}[s_n]]= F_{\varphi}^{I_2}[\nu_{\mathcal{M}_1}[s_1] ,\ldots\!,\! \nu_{\mathcal{M}_1}[s_n]]\end{align*}$$while from the induction hypotheses for $s_1 ,\ldots \!,\! s_n$ we get that
$$\begin{align*}F_{\varphi}^{I_2}[\nu_{\mathcal{M}_1}[s_1] ,\ldots\!,\! \nu_{\mathcal{M}_1}[s_n]]= F_{\varphi}^{I_2}[\nu_{\mathcal{M}_2}[s_1] ,\ldots\!,\! \nu_{\mathcal{M}_2}[s_n]].\end{align*}$$It follows that $\nu _{\mathcal {M}_1}[e]=\nu _{\mathcal {M}_2}[e]$ .


The invariance relation: $\succ $

•

– To show the absoluteness (invariance with respect to $\emptyset $ ) of $s\in t$ , let $\nu $ be an assignment in $\mathcal {M}_1$ . Assume first that $\mathcal {M}_1,\nu \models s\in t$ . Then $\nu _{\mathcal {M}_1}[s]\in ^{I_1}\nu _{\mathcal {M}_1}[t]$ . It follows by the induction hypotheses for s and t and the fact that $\mathcal {M}_2$ is an $\in $ extension of $\mathcal {M}_1$ that $\nu _{\mathcal {M}_2}[s]\in ^{I_2}\nu _{\mathcal {M}_2}[t]$ too. Hence $\mathcal {M}_2,\nu \models s\in t$ . The proof of the converse (i.e., that if $\mathcal {M}_2,\nu \models s\in t$ then $\mathcal {M}_1,\nu \models s\in t$ ) is similar.

– We leave the simpler proof that $s=t$ is absolute to the reader.

– The case $e=P_{\varphi }(t_1,\ldots \!,\! t_n)$ is immediate from the induction hypothesis for $\varphi $ , and the $\varphi $ instance of [PrI] (which is in $\mathbf {PW}_e$ ).


•

– That $x\neq x$ is invariant with respect to x follows from the fact that for every $\mathcal {M}=\langle \mathcal {D},I \rangle $ and $\nu $ , $[a:\mathcal {D}\mid \mathcal {M},\nu \{x:=a\}\models x\neq x]=\emptyset $ .

– Let e be the formula $x= t$ , where $x\not \in Fv(t)$ . Then the collection $[a\in \mathcal {D}_1\mid \mathcal {M}_1,\nu \{x:=a\}\models x=t]$ is the singleton of $\nu _{\mathcal {M}_1}[t]$ , while $[a\in \mathcal {D}_2\mid \mathcal {M}_2,\nu \{x:=a\}\models x=t]$ is the singleton of $\nu _{\mathcal {M}_2}[t]$ . Hence the induction hypothesis for t implies that the two sets are equal.

– Let e be the formula $x\in t$ , where $x\not \in Fv(t)$ . Then
$$ \begin{align*}[a\in\mathcal{D}_1\mid \mathcal{M}_1,\nu\{x:=a\}\models x\in t]=[a\in\mathcal{D}_1\mid a\in^{\mathcal{M}_1}\nu_{\mathcal{M}_1}[t]],\end{align*} $$$$ \begin{align*}[a\in\mathcal{D}_2\mid \mathcal{M}_2,\nu\{x:=a\}\models x\in t]=[a\in\mathcal{D}_2\mid a\in^{\mathcal{M}_2}\nu_{\mathcal{M}_2}[t]].\end{align*} $$Since $\nu _{\mathcal {M}_1}[t]=\nu _{\mathcal {M}_2}[t]$ by the induction hypothesis for t, these two equations and the fact that $\mathcal {M}_2$ is an $\in $ extension of $\mathcal {M}_1$ imply:$$ \begin{align*}&[a\in\mathcal{D}_1\mid \mathcal{M}_1,\nu\{x:=a\}\models x\in t]\\&\quad=[a\in\mathcal{D}_2\mid \mathcal{M}_2,\nu\{x:=a\}\models x\in t].\end{align*} $$


• The proof that if $\varphi $ is absolute then so is $\neg \varphi $ is left to the reader.

• Let e be $\varphi \vee \psi $ , where $\varphi \succ \{x_1,\ldots \!,\! x_n\}$ and $\psi \succ \{x_1,\ldots \!,\! x_n\}$ . Then the collection $[\vec {a}\in \mathcal {D}_1^n\mid \mathcal {M}_1,\nu \{\vec {x}:=\vec {a}\}\models \varphi \vee \psi ]$ is the union of $[\vec {a}\in \mathcal {D}_1^n\mid \mathcal {M}_1,\nu \{\vec {x}:=\vec {a}\}\models \varphi ]$ and $[\vec {a}\in \mathcal {D}_1^n\mid \mathcal {M}_1,\nu \{\vec {x}:=\vec {a}\}\models \psi ]$ . A similar equation holds for $[\vec {a}\in \mathcal {D}_2^n\mid \mathcal {M}_2,\nu \{\vec {x}:=\vec {a}\}\models \varphi \vee \psi ]$ . The equality of these two collections follows therefore from the induction hypotheses for $\varphi $ and $\psi $ .

• Let e be $\theta =\varphi \wedge \psi $ , where $\varphi \succ X$ , $\psi \succ Y$ , and $Y\cap Fv(\varphi )=\emptyset $ . To simplify notation, assume that $Fv(\varphi )=\{x,z\}$ , $Fv(\psi )=\{x,y,z\}$ , $X=\{x\}$ , $Y=\{y\}$ . For $c\in \mathcal {D}_1$ , let $Z(c)=[a\in \mathcal {D}_2\mid \mathcal {M}_2\models \varphi (a,c)]$ . Since $\varphi \succ X$ , $Z(c)=[a\in \mathcal {D}_1\mid \mathcal {M}_1\models \varphi (a,c)]$ as well (by the induction hypothesis for $\varphi $ ). Hence $Z(c)\subseteq \mathcal {D}_1$ . By the induction hypothesis for $\psi $ , this and the fact that $\psi \succ Y$ imply that if $d\in Z(c)$ then
$$ \begin{align*}[b\in \mathcal{D}_2\mid \mathcal{M}_2\models \psi(d,b,c)]= [b\in \mathcal{D}_1\mid \mathcal{M}_1\models \psi(d,b,c)].\end{align*} $$Denote this set by $W(c,d)$ . Now both $[\langle a,b \rangle \in \mathcal {D}_2^2\mid \mathcal {M}_2\models \theta (a,b,c)]$ and $[\langle a,b \rangle \in \mathcal {D}_1^2\mid \mathcal {M}_1\models \theta (a,b,c)]$ equal $\bigcup _{d\in Z(c)}\{d\}\times W(c,d)$ . Hence these two sets are the same (for every $c\in \mathcal {D}_1$ ). 
• We leave to the reader the case $e=\exists y\varphi $ , where $\varphi \succ X$ and $y\in X$ .

•

$\Sigma $ formulas:

• Invariance implies persistence. Hence if $\varphi $ is absolute then it is persistent by what we have shown above.

• It is well known that persistence of formulas is closed under disjunction, conjunction, and existential quantification, so we leave to the reader the standard proofs of these cases.

• Suppose that $\theta $ is $\forall x(\varphi \to \psi )$ , where $\varphi \succ \{x\}$ and $\psi $ is a $\Sigma $ formula. By the induction hypothesis for $\varphi $ and $\psi $ , $\varphi $ is invariant with respect to x, and $\psi $ is persistent. We show that $\theta $ is persistent too. So let $\nu $ be an assignment in $\mathcal {M}_1$ such that $\mathcal {M}_1,\nu \models \theta $ . We show that $\mathcal {M}_2,\nu \models \theta $ . So let $a\in \mathcal {D}_2$ . We should show that $\mathcal {M}_2,\nu \{x:=a\}\models \varphi \to \psi $ . This is certainly true in case $\mathcal {M}_2,\nu \{x:=a\}\not \models \varphi $ . So assume that $\mathcal {M}_2,\nu \{x:=a\}\models \varphi $ . Then the invariability of $\varphi $ with respect to x implies that $a\in \mathcal {D}_1$ , $\nu \{x:=a\}$ is an assignment in $\mathcal {D}_1$ , and $\mathcal {M}_1,\nu \{x:=a\}\models \varphi $ . Since $\mathcal {M}_1,\nu \models \theta $ , also $\mathcal {M}_1,\nu \{x:=a\}\models \varphi \to \psi $ . It follows that $\mathcal {M}_1,\nu \{x:=a\}\models \psi $ . This implies that $\mathcal {M}_2,\nu \{x:=a\}\models \psi $ , since $\psi $ is persistent. It follows that $\mathcal {M}_2,\nu \{x:=a\}\models \varphi \to \psi $ in this case too.

Clearly, Theorem 4.24 provides the predicative justification of [Comp] (according to the invariance criterion) that Theorem 4.12 does not. As for [OpI], its premise ensures that $F_{\varphi }$ can be introduced using the usual procedure of extension by definitions. That the resulting operation is invariant follows again from Theorem 4.24.
4.3.2 [ $\in $ ind]
That [ $\in $ ind] is predicatively valid should be obvious: In any sphere of operation, the empty set is the starting building block of all sets, every other set A is formed from the elements of A, and those elements are logically prior to A. Therefore in any sphere of operation, a property that the empty set has and is inherited by a set from its elements should necessarily hold for all sets. Hence [ $\in $ ind] is valid in any acceptable sphere of operation.
Another way to look at the matter is by asking what properties the basic predicate $\in $ should have according to the predicativist view of sets. Since predicatively accepted sets are constructed bottomup, it should be clear that the ‘wellfoundedness’ of $\in $ should be one of those properties. But only the general, openended schema of [ $\in $ ind] fully exploits what we really have in mind when we say that $\in $ should be well founded. (In contrast, the intuitive absoluteness of the wellfoundness of $\in $ is not fully captured by any of the standard formal definitions of this notion.Footnote ^{20} For example, whether any subset of a given set A has a minimal element depends on what subsets of A are available, while the nonexistence of a descending $\in $ chain of elements of A depends on what sequences of elements of A are available.)
Still another point of view that might be taken here is that of a modest platonist who looks at the predicative spheres of operation. As was emphasized in Note 1.6 and shown in Theorem 4.17, in $\mathbf {PW}$ we restrict ourselves to predicative subsystems (of some extension by definitions) of $\mathbf {ZF}(P)$ . Assuming that the latter is consistent, it is impossible to define in it a set which have elements that form an infinite descending chain with respect to $\in $ . Hence this certainly cannot be done in more constrained predicative systems like $\mathbf {PW}$ .
Here it is worth noting that the predicativist conception of sets that underlies $\mathbf {PW}$ and the platonist cumulative one have a lot in common. According to both, all sets are created in stages, where the latter can be taken to be the von Neumann’s ordinals. (The difference is that according to predicativists, the “creation” is done by using legitimate definitions, while for platonists this can also be done by methods that go beyond what actual people can use.) Moreover, as we are going to see, both predicativists and platonists associate with every set A which is available to the former the same stage (called $rank(A)$ ) in which it first becomes available (even though the platonist set $\mathsf {V}_\alpha $ is available to predicativists only for $\alpha \leq \omega $ ). It seems to me obvious that this view of sets that predicativists and platonists share dictates for both the validity of [ $\in $ ind].
Two related notes:

1. In a private correspondence, Gerhard Jäger has raised two objections against accepting full [ $\in $ ind] as a predicatively accepted principle:

• [ $\in $ ind] is a minimality condition on the universe and thus leads to an inherent vicious circle.
I do not share this view. In my opinion, [ $\in $ ind] is not a minimality condition, but only a constraint that our spheres of operation should respect. These are two different things. What is more: I believe (though this belief is not reflected in the present paper) that not all forms of ‘circularity’ should be rejected from the invariance point of view. Thus, although one might claim that all sorts of recursive definitions are inherently circular, some forms of them, like primitive recursive definitions, are certainly predicatively acceptable.
In any case, the same argument can be raised against accepting the schema of induction on the natural numbers. However this general schema was accepted and used Weyl in [Reference Weyl52] (see [Reference Adams and Luo2, Reference Avron6]), and in fact it is accepted as predicative by almost everyone interested in the subject.

• In the form $\exists x \varphi [x] \to \exists x ( \varphi [x] \land (\forall y \in x)(\neg \varphi [y])$ the schema of [ $\in $ ind] claims the existence of a set, without presenting an explicit definition of this set, and without ensuring that its identity is invariant.
This issue has already been dealt with above in Note 1.5. As was emphasized there, every predicative system which is based on classical logic proves pure existential propositions of this sort, and there is nothing impredicative about that, as long as such propositions are understood and used in an appropriate way.


2. Although this is not really an acceptable argument for the predicative validity of [ $\in $ ind], it is still interesting to note that Feferman himself included this schema in the predicative set theories he constructed in [Reference Feferman, Scott and Jech18]. This clearly implies that he saw this schema as predicatively valid. In response one may argue that Feferman justified his set theories in [Reference Feferman, Scott and Jech18] only by reducing them to what he did in [Reference Feferman15]. However, Feferman did not take his set theories out of the blue; actually he too was led to them by pursuing the invariance criterion (see also [Reference Feferman17]). Hence the fact that he included [ $\in $ ind] already in the system $\mathbf {PS}_0$ , which is by far the weakest system studied in [Reference Feferman, Scott and Jech18], is telling. Moreover, Feferman explicitly said about the system PS that he studied in [Reference Feferman16] (note the name of that paper!) that an ideal predicativist can recognize as correct any particular axiom and rule of inference of that system. Since any instance of [ $\in $ ind] can be derived from the axioms and rules of PS, this means that according to Feferman a predicativist can recognize as correct any instance of [ $\in $ ind].
A final remark: personally, I have no doubt that the answer to the question whether [ $\in $ ind] is predicatively valid is positive, and that the arguments given above for this answer should be convincing (and would be accepted as such by Weyl and even Feferman). However, this question is not a mathematical one, and so some people might have different views on this point. Since I see debates on the exact meaning of a given word as useless, I simply take such conflicting views as indicating that like ‘predicativity’ in general, also ‘predicativity as invariance’ is a family of approaches to the foundations of mathematics. I would be quite happy to call mine ‘predicativity given that $\in $ is wellfounded’.
4.3.3 [Unif]
Let $\varphi $ be $\Sigma $ . For convenience, let $Fv(\varphi )=\{x,y\}$ . Suppose that at a certain point of working in $\mathbf {PW}$ we have reached a stage $\mathsf {s}$ in which the premise of [Unif] has been derived. Then from that point on it is valid in any sphere of operation that we reach. Let a be an object in some such sphere of operation. If $\forall x\in Z\exists y \varphi $ is false for $Z:=a$ in every sphere of operation that includes a which is reached at stage $\mathsf {s}$ or later, then certainly the corresponding conclusion of [Unif] is valid in all such spheres. Otherwise there is a stage $\mathsf {s}^\prime $ in which $\forall x\in Z\exists y \varphi $ is true for $Z:=a$ . Since $\varphi $ is $\Sigma $ , so is $\forall x\in Z\exists y \varphi $ . Hence Theorem 4.24 implies that $\forall x\in Z\exists y \varphi $ remains true for $Z:=a$ at any stage from $\mathsf {s}^\prime $ on. This and the validity of the premise of [Unif] in the sphere of operation $\mathcal {M}$ of each such stage imply that the collection of pairs $\langle c,d \rangle $ such that $c\in a$ and $\langle c,d \rangle $ satisfies $\varphi $ in $\mathcal {M}$ forms (‘in $\mathsf {V}$ ’) a function f on a. From the fact that $\varphi $ is $\Sigma $ it again follows that as a collection of pairs f remains invariant. Therefore f is entitled to be added at some stage to the sphere of operation of that stage, and from that point the corresponding instance of [Unif] remains valid.
Note 4.25. The only justification given by Feferman in [Reference Feferman and Lorenz19] for the version of [Unif] which is used in his auxiliary system $\exists /P$ is that he does not believe there can be a real dispute about it. However, one might wonder why his rule cannot simply be replaced by an axiom stating that its premise implies its conclusion (both without the ‘ $\vdash $ ’ in front). Why could there be a real dispute about this implication, but not about the rule? The answer is clear from the argument given above: For the truth of the conclusion of the rule for some a in some $\mathcal {M}$ such that $ \forall x\in Z\exists y \varphi $ holds in $\mathcal {M}$ for $Z:=a$ , it is crucial that the premise of the rule (i.e., the unicity condition) remains valid in any sphere of operation that contains $\mathcal {M}$ ; its truth just at $\mathcal {M}$ itself is insufficient.
4.4 Comparison with Feferman’s systems
The design of $\mathbf {PW}$ has a lot in common with  and was greatly influenced by  the secondorder system $P+\exists /P$ for predicative analysis that Feferman has developed in [Reference Feferman and Lorenz19]. In particular:

• In practice, neither $\mathbf {PW}$ nor $P+\exists /P$ has a signature which is fixed in advance, and the method of repeatedly extending the language and adding new corresponding axioms is an essential component of the work in both.

• The definitions of both systems involve a simultaneous recursive construction of their sets of symbols, terms, formulas, axioms, and rules.

• The two special rules of $\mathbf {PW}$ , [OpI] and [Unif], which (as is shown below) give this system its strong power (far beyond that of the systems $\mathbf {RST}\omega $ and $\mathbf {PZF}$ investigated in [Reference Avron and Schindler5, Reference Avron and Cohen7, Reference Cohen and Avron10]), generalize (respectively) to set theory the following two rules from [Reference Feferman and Lorenz19]:

Functional defining rule: This rule allows to infer
$$ \begin{align*}\vdash_P\forall v_1\forall v_0(\varphi(v_1,v_0)\leftrightarrow v_0=F_\varphi(v_1))\end{align*} $$from the premises $\vdash _P \forall v_1\forall v_2\forall v_3(\varphi (v_1,v_2)\wedge \varphi (v_1,v_3)\to v_2=v_3)$ and $\vdash _{\exists /P}\forall v_1\exists v_0\varphi (v_1,v_0)$ , provided that $\varphi $ is a formula in which quantification is made only over the lowest type N (implying that $\varphi $ is absolute). 
Unification rule: This rule allows to infer
$$ \begin{align*}\vdash_{\exists/P} \forall v_1\in\mathbf{N}\exists v_0\varphi(v_1,v_0)\to\exists f\forall v_1\in\mathbf{N}\varphi(v_1,f(v_1))\end{align*} $$from $\vdash _P \forall v_1\in \mathbf {N}\forall v_2\forall v_3(\varphi (v_1,v_2)\wedge \varphi (v_1,v_3)\to v_2=v_3)$ , provided that $\varphi $ is a formula in which quantification is made only over N.
It should be noted that a preliminary version, called [F], of a combination of [OpI] and [Unif] appears already in the last section of [Reference Feferman15], where predicativity at higher types is discussed. [F] allows to infer from $\forall x\exists ! y \varphi (x,y)$ the formula $\exists f\forall x\forall y(\varphi (x,y)\leftrightarrow y=f(x))$ , provided that $\varphi $ is a $\Sigma $ formula. (Note that unlike in the present context of typefree set theory, in [F] the variable f is of type higher than that of x.) The rule [F] was then implicitly split in [Reference Feferman and Lorenz19] into the two rules described above.

Nevertheless, there are also important differences between $\mathbf {PW}$ and ${P+\exists /P}$ :

1. $\mathbf {PW}$ is a single system. In contrast, $P+\exists /P$ is a combination of two different ones: P and $\exists /P$ . P is taken to be the major system, while the stronger system $\exists /P$ is taken to be only an auxiliary one, which is needed for the precise definition of P. The connection between these two systems involves some choices for which no justification is given in [Reference Feferman and Lorenz19]. (See Weaver’s criticism, with which I fully agree, in Section 1.6 of [Reference Weaver50].) We believe, in fact (though we have not tried to show), that with different (but still predicatively justified) choices, $P+\exists /P$ can be strengthened to a single theory which is as strong as $\mathbf {PW}$ .

2. P and $\exists /P$ are based on an extensive system of types, which has types of all finite levels (but actually uses only those of levels 0, 1, and 2). $\mathbf {PW}$ , in contrast, is a typefree, singlesorted set theory.

3. Unlike $\mathbf {PW}$ , which is purely firstorder, P and $\exists /P$ have secondorder variables in addition to the firstorder ones. In the case of $\exists /P$ it is even allowed (under certain conditions) to quantify on them. Moreover, both systems employ specific secondorder rules, like substitution of terms, and even (again under certain conditions) of formulas, for secondorder variables. (No rule of substitution is needed in the case of $\mathbf {PW}$ .)

4. The set of natural numbers is taken as given in P. In $\mathbf {PW}$ it is defined.
Note 4.26. Because of these differences, and because $\mathbf {PW}$ has simpler language and less rules than $P+\exists /P$ , we believe that it is fair to say that $\mathbf {PW}$ is significantly simpler then $P+\exists /P$ . The latter, in turn, is according to Feferman himself more perspicuous than the original predicative systems of [Reference Feferman15]. These observations justify the claim made at the beginning of Note 1.8.
Other systems of Feferman which are obviously related to $\mathbf {PW}$ are the pure set theories $\mathbf {PS_i}$ and $\mathbf {PS_iE}$ ( $i=0,1$ ) which were introduced in [Reference Feferman, Scott and Jech18] (see Note 1.4). Like $\mathbf {PW}$ , they too are intuitively motivated by the invariance criterion, and so $\Sigma $ formulas and absolute formulas play an important role in their formulation. Accordingly, those systems are less restricted than the secondorder systems of [Reference Feferman15]. Thus the union operation is allowed in them, although it is taken as impredicative in most of Feferman’s papers.Footnote ^{21} On the other hand, unlike $\mathbf {PW}$ , none of those systems respects our principle (I) from Section 1.4.2, since their axiomatizations include purely existential principles. An example is given by the $\Sigma $ reflection rule, which allows to infer $\exists a\psi ^{(a)}$ from $\psi $ in case $\psi $ is a $\Sigma $ formula (where $\psi ^{(a)}$ is obtained from $\psi $ by restricting each quantifier in it to a). $\mathbf {PS_iE}$ ( $i=0,1$ ) violates Principle (II) (from Section 1.4.2) as well, since it includes an axiom that says that every set is enumerable. Another very significant point of difference is due to the fact (shown in [Reference Feferman, Scott and Jech18]) that $\mathbf {PS_1}$ and $\mathbf {PS_1E}$ has a minimum model consisting of all sets constructible before $\Gamma _0$ . In contrast, the results of Section 7 imply than any transitive model of $\mathbf {PW}$ should contain $\Gamma _0$ . It follows that although the predicativity of the systems in [Reference Feferman, Scott and Jech18] is dubious, in some important sense they are all weaker than $\mathbf {PW}$ .Footnote ^{22}
5 Some examples of the power of $\mathbf {PW}$
5.1 Abstraction terms and RST
Let $Fv(\varphi )=\{x,v_1 ,\ldots \!,\! v_n\}$ ( $n\geq 0$ ), and suppose that $\varphi \succ \{x\}$ . Using [Comp], this entails
Since the formula $\varphi ^\star :=\forall x(x\in v_0\leftrightarrow \varphi )$ is $\Sigma $ ,Footnote ^{23} an application of [OpI] yields
Henceforth we shall write $\{x\mid \varphi (x,v_1 ,\ldots \!,\! v_n)\}$ instead of $F_{\varphi ^\star }(v_1 ,\ldots \!,\! v_n)$ (where $\varphi ^\star $ is defined from $\varphi $ as above), and call this an abstraction term. Obviously, $\vdash _{\mathbf {PW}} \forall x(x\in \{x\mid \varphi \}\leftrightarrow \varphi )$ . It follows that $\mathbf {PW}$ is an extension of the theory $\mathbf {RST}$ from [Reference Avron and Schindler5]. By the results of that paper, this implies that every rudimentary operation is definable in $\mathbf {PW}$ . Here are some examples of terms available in $\mathbf {RST}$ , and so in $\mathbf {PW}$ :

• ${\emptyset }=_{Df}\{x\mid x\neq x\}.$

• ${\{t_1 ,\ldots \!,\! t_n\}}=_{Df}\{x\mid x=t_1\vee \dots \vee x=t_n\}.$

• ${\langle t,s \rangle }=_{Df}\{\{t\},\{t,s\}\}.$

• ${\{x\in t\mid \varphi \}}=_{Df}\{x\mid x\in t\wedge \varphi \}$ (provided $\varphi \succ \emptyset $ and $x\not \in Fv(t)$ ).

• ${\{t(x)\mid x\in s\}}=_{Df}\{y\mid \exists x.x\in s\wedge y=t\}.$

• ${\bigcup t}=_{Df}\{x\mid \exists y.y\in t\wedge x\in y\}.$

• ${s\times t}=_{Df}\{x\mid \exists a \exists b. a\in s \wedge b\in t \wedge x=\langle a,b \rangle \}.$

• ${\iota x\varphi }=_{Df}\bigcup \{x\mid \varphi \}$ (provided $\varphi \succ \{x\}$ ).

• ${\lambda x\in s.t}=_{Df} \{\langle x,t \rangle \mid x\in s\}.$

• ${f(x)}=_{Df} \iota y.\exists z\exists v(z\in f\wedge v\in z\wedge y\in v\wedge z=\langle x,y \rangle ).$

• $Dom(f)=\{x\in \bigcup f\mid \exists y\in \bigcup \bigcup f.\langle x,y \rangle \in f\}.$

• $Im(f)=\{x\in \bigcup f\mid \exists y\in \bigcup \bigcup f.\langle y,x \rangle \in f\}.$

• $f\upharpoonright s=_{Df}\{\langle x,f(x) \rangle \mid x\in s\}$ (where x is new).
Note 5.1. The definition of $\succ $ does not directly imply that $z=\langle x,y \rangle \succ \{y\}$ . Therefore we could not have defined above $f(x):=\iota y.\exists z(z\in f\wedge z=\langle x,y \rangle )$ . However, it is not difficult to construct a formula $\psi (x,y,z)$ that is equivalent in $\mathbf {PW}$ to $z=\langle x,y \rangle $ and $\psi (x,y,z)\succ \{y\}$ , and even $\psi (x,y,z)\succ \{x,y\}$ . (Implicitly, this is what is done above in the definition of $f(x)$ .) Therefore from now in we assume that $z=\langle x,y \rangle \succ \langle x,y \rangle $ . (In addition $z=\langle x,y \rangle \succ \{z\}$ too, of course.)
The following are examples of easy related theorems of RST:

• $\exists ! x\varphi (x)\to \forall x(\varphi (x)\leftrightarrow x=\iota x\varphi (x))$ (if $\varphi \succ \{x\}$ ).

• $u\in s\to (\lambda x\in s.t)u=t\{u/x\}$ (if u is free for x in t).

• $Fun(f)\to (\langle x,y \rangle \in f\leftrightarrow y=f(x))$ , where $Fun(f)$ is the following absolute formula (which says that f is a function):
$$ \begin{align*}\forall z\in f\exists x\exists y(z=\langle x,y \rangle)\wedge\forall x\forall y_1\forall y_2( \langle x,y_1 \rangle\in f\wedge \langle x,y_2 \rangle\in f\to y_1=y_2).\end{align*} $$
5.2 Explicit definitions and the extended [OpI]
5.2.1 Explicit definitions
Explicit definitions of operations are particularly simple case of applying [OpI]:
Proposition 5.2. Let t be a term of $\mathbf {PW}$ such that $Fv(t)=\{v_1 ,\ldots \!,\! v_n\}$ . Then $\mathbf {PW}$ has an operation F such that $\vdash _{\mathbf {PW}}\forall v_1\cdots \forall v_n F(v_1 ,\ldots \!,\! v_n)=t$ .
Proof $F=F_\varphi $ , where $\varphi $ is $v_0=t$ .
5.2.2 The extended [OpI]
The use of [OpI] can be made more effective with the help of the following proposition: