
That Does Not Compute: David Lewis on Credence and Chance

Published online by Cambridge University Press:  16 February 2023

Gordon Belot
Department of Philosophy, University of Michigan, Ann Arbor, MI, USA

Abstract

Like Lewis, many philosophers hold reductionist accounts of chance (on which claims about chance are to be understood as claims that certain patterns of events are instantiated) and maintain that rationality requires that credence should defer to chance (in the sense that under certain circumstances one’s credence in an event must coincide with the chance of that event). It is a shortcoming of an account of chance if it implies that this norm of rationality is unsatisfiable by computable agents. This shortcoming is more common than one might have hoped.

Type
Contributed Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of the Philosophy of Science Association

1. Introduction

David Lewis was a reductionist about chance. Footnote 1 He thought that a full description of the pattern of instantiation of fundamental properties and relations at a possible world would not mention any facts about chances. But Lewis suggested that our actual cognitive abilities and limitations determine an ideal scientific theory for each possible world—the theory that best balances (by our standards) simplicity with informativeness about the pattern of events at that world. Footnote 2 On his Best System Account of chance the true claims about chance at a world are those that are implied by the ideal theory for that world.

Lewis also had an account of the relation between chance and rational credence: the Principal Principle. Footnote 3 Our primary interest here will be in a consequence of the Principal Principle: A credence function $c$ is rationally permitted only if, for any event $E$, when $c$ is conditionalized on the proposition that the chance of $E$ is $x$, it assigns credence $x$ to $E$.

For Lewis, the Principal Principle tells us something about what it means to be rational and at the same time encapsulates important truths about the concept of chance. Footnote 4 I suggest the following as an example. The Principal Principle gives us possible grounds for criticizing reductionist accounts of chance: If we are thinking of theories of chance as something like summaries meant to be used by beings like us, there is something pathological about a theory of chance relative to which no priors count as rationally permitted.

I aim here to explain why it is harder than one might expect to find a satisfying package combining the Best System Account of chance and the Principal Principle. Footnote 5 One can show that for a certain prima facie attractive version of the Best System Account of chance, the only priors that satisfy the Principal Principle are non-computable. So fans of the Lewisian package must either find a more suitable version of the Best System Account, weaken the Principal Principle, or maintain that rationality requires us to perform tasks beyond the capability of any Turing machine. Footnote 6

2. Mathematical framework

A binary sequence is a map $S:{\mathbb{N}^ + } \to \left\{ {0,1} \right\}$ . We denote by ${2^\omega }$ the set of all binary sequences. For any binary sequence $S$ and natural number $n$ , we write $S{\upharpoonright} n$ for the binary string consisting of the first $n$ bits of $S$ . If $\tau $ is a binary string, then we denote by ${[[}{ \tau} {]]}$ the set of all binary sequences whose initial bits are given by $\tau $ . We call such ${[[} \tau {]]}$ the basic subsets of ${2^\omega }$ .

We will call a subset of ${2^\omega }$ an event if it can be constructed out of basic subsets by taking complements and unions a countable number of times in any order. Footnote 7 A probability measure on ${2^\omega }$ is a map $\nu $ that assigns numbers between 0 and 1 (inclusive) to events, that assigns 1 to ${2^\omega }$ , and that is countably additive (i.e., if ${E_1},{E_2}, \ldots $ are pairwise disjoint events, then $\nu \left( { \cup {E_k}} \right) = \sum \nu \left( {{E_k}} \right)$ ). Any such map is completely determined by its behavior on basic subsets.

A binary sequence $S$ is computable if there is a Turing machine that on input of any $n \in \mathbb{N}$ gives as output $S{\upharpoonright}n$ . A real number $x$ is computable if there exists a Turing machine that on input $n$ returns a rational number within $1/n$ of $x$ . A probability measure $\nu $ on ${2^\omega }$ is computable if there exists a Turing machine that on input of $n$ and binary string $\tau $ returns a rational number within $1/n$ of $\nu ( [[{ \tau ]]})$ .
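To make these definitions concrete, here is a minimal Python sketch in which each kind of computable object is represented by a program that produces approximations on demand. The function names and the particular examples are mine, chosen only for illustration.

```python
from fractions import Fraction

# A computable binary sequence: on input n, return the first n bits S|n.
def alternating_sequence(n):
    """The sequence 0, 1, 0, 1, ... ."""
    return [i % 2 for i in range(n)]

# A computable real: on input n, return a rational within 1/n of the number.
def approx_one_over_sqrt_two(n):
    """Rational approximation of 1/sqrt(2) to within 1/n, by bisection."""
    lo, hi = Fraction(0), Fraction(1)
    while hi - lo > Fraction(1, n):
        mid = (lo + hi) / 2
        if 2 * mid * mid < 1:
            lo = mid
        else:
            hi = mid
    return lo  # the true value lies in [lo, hi], so lo is within 1/n of it

# A computable probability measure: on input n and a binary string tau, return
# a rational within 1/n of the measure of the basic subset [[tau]].
def fair_coin(n, tau):
    """The fair coin measure: [[tau]] has probability 2**-len(tau) exactly."""
    return Fraction(1, 2 ** len(tau))

print(alternating_sequence(8))                # [0, 1, 0, 1, 0, 1, 0, 1]
print(float(approx_one_over_sqrt_two(1000)))  # approximately 0.7071
print(fair_coin(10, "010"))                   # 1/8
```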

If $S$ is a binary sequence then the Dirac measure ${\delta _S}$ concentrated on $S$ is the probability measure that assigns probability 1 to event $E$ if $S \in E$ and assigns probability 0 to $E$ otherwise. A Dirac measure is computable if and only if the sequence it is concentrated on is.

If $r$ is a number (strictly) between zero and one, then the Bernoulli measure ${\nu _r}$ with parameter $r$ is the probability measure that, for any binary string $\tau $ containing $\ell $ 0s and $m$ 1s, assigns the basic subset $[[{ \tau} ]]$ probability ${r^\ell }{(1 - r)^m}$ . We call ${\nu _{.5}}$ the fair coin measure. A Bernoulli measure is computable if and only if its parameter is.
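As a further illustration of how these two families of measures act on basic subsets, here is a hedged Python sketch; the helper names are mine, and a sequence is again represented by a program returning its initial segments.

```python
from fractions import Fraction

def bernoulli_basic(r, tau):
    """nu_r([[tau]]) = r**(number of 0s in tau) * (1 - r)**(number of 1s in tau)."""
    return r ** tau.count("0") * (1 - r) ** tau.count("1")

def dirac_basic(first_bits, tau):
    """delta_S([[tau]]) is 1 if tau is an initial segment of S, else 0.
    Here first_bits(n) returns the first n bits of S as a string."""
    return 1 if first_bits(len(tau)) == tau else 0

# The fair coin measure gives every length-4 string probability 1/16.
assert bernoulli_basic(Fraction(1, 2), "0010") == Fraction(1, 16)
# A coin biased 2:1 toward 0s.
assert bernoulli_basic(Fraction(2, 3), "001") == Fraction(4, 27)

# The Dirac measure concentrated on the alternating sequence 0101...
alternating = lambda n: "".join(str(i % 2) for i in range(n))
assert dirac_basic(alternating, "0101") == 1
assert dirac_basic(alternating, "0100") == 0
```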

3. Worlds

We will be concerned with simple worlds whose histories can be encoded in binary sequences: At each such world, time has the structure of the natural numbers and the history of such a world just tells us, for each time, whether or not a single property is instantiated at that time. We work with these worlds in order to have a precise and tractable framework in which to interpret Lewis’s ideas about chance and credence. The lessons we learn can be translated to richer settings (our “possible worlds” could always be interpreted as subsystems of more interesting worlds).

A probability measure $\nu $ on ${2^\omega }$ can be thought of as an instruction manual for building a world. Nature is equipped with a set of coins with each possible bias. A probability measure $\nu $ on ${2^\omega }$ tells Nature to flip a coin with bias $\nu \left( {[[} 0{]]} \right)$ to determine whether the first bit is a 0. More generally, $\nu $ tells Nature to flip a coin of bias $\nu ({[[}{\tau 0}{]]}|{[[}\tau {]]})$ to determine whether the next bit will be 0, given that $\tau $ gives the history so far. Footnote 8 A Dirac measure ${\delta _S}$ instructs Nature to construct a history by flipping coins of maximal bias in a way guaranteed to generate $S$ . A Bernoulli measure ${\nu _r}$ tells Nature to generate each bit by flipping a coin of bias $r$ .
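The instruction-manual picture can be made concrete with a short simulation: given access to a measure on basic subsets, Nature generates a history bit by bit using the conditional probabilities just described. A minimal Python sketch, with illustrative function names of my own:

```python
import random
from fractions import Fraction

def sample_history(measure, length, seed=None):
    """Generate a history of the given length from a measure, where measure(tau)
    returns the probability of the basic subset [[tau]].  At each step Nature
    flips a coin of bias measure([[tau0]] | [[tau]]) to decide whether the next
    bit is a 0, exactly as in the instruction-manual picture."""
    rng = random.Random(seed)
    tau = ""
    for _ in range(length):
        p_next_zero = measure(tau + "0") / measure(tau)
        tau += "0" if rng.random() < p_next_zero else "1"
    return tau

# The Bernoulli measure with parameter 2/3: each bit is a 0 with probability 2/3.
bernoulli = lambda tau: Fraction(2, 3) ** tau.count("0") * Fraction(1, 3) ** tau.count("1")
# The Dirac measure concentrated on the alternating sequence 0101...
dirac_alt = lambda tau: 1 if all(b == str(i % 2) for i, b in enumerate(tau)) else 0

print(sample_history(bernoulli, 30, seed=0))  # roughly two-thirds of the bits are 0s
print(sample_history(dirac_alt, 30))          # exactly "0101...": coins of maximal bias
```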

4. Chance and credence

Reductionism about chance is the thesis that chance-y facts supervene on non-chance-y facts. So in our context, a reductionist account of chance can be encoded in a map from ${2^\omega }$ to the space of probability measures over ${2^\omega }$ (this map will be merely partially defined if there are lawless worlds without chance facts). We call a probability measure $\lambda $ a chance law of such an account if the relevant map assigns it to some world.

For present purposes, a Bayesian prior is a probability measure on ${2^\omega }$ that assigns positive probability to each basic subset of ${2^\omega }$ . Footnote 9

It has seemed to many that rationality requires credence to defer to chance in a certain sense: In some situations, if you are rational, then your credence in an event must coincide with the chance of that event.

Knowing only that the chance of drawing a red ball from an urn is 0.95, everyone agrees, in accordance with the law of likelihood, that a guess of “red” about some trial is much better supported than one of “not-red.” But nearly everyone will go further, and agree that 0.95 is a good measure of the degree to which “red” is supported by the limited data. (Hacking 1965, 136)

[T]he chancemaking pattern in the arrangement of qualities must be something that would, if known, correspondingly constrain rational credence. Whatever makes it true that the chance of decay is 50% must also, if known, make it rational to believe to degree 50% that decay will occur. (Lewis 1994, 478)

Fix a reductionist account of chance (such as the Best System Account). Corresponding to any chance law $\lambda $ of the theory, there is the set of worlds ${\rm{\Lambda }}$ at which $\lambda $ gives the chance facts. We call this set the chance hypothesis corresponding to that law. This is the chancemaking proposition: the proposition that the chance facts are given by $\lambda $ .

Definition 1 (Chance-Credence Principle). Relative to a given reductionist account of chance, a prior $\mu $ on ${2^\omega }$ satisfies the Chance-Credence Principle if, for any chance law $\lambda $ with chance hypothesis ${\rm{\Lambda }}$ and for any event $A$ , we have $\mu (A|{\rm{\Lambda }}) = \lambda \left( A \right)$ .

The requirement that rational priors should satisfy the Chance-Credence Principle for a given reductionist theory of chance is a weak form of Lewis’s Principal Principle. Footnote 10

A prior $\mu $ satisfies the Chance-Credence Principle for a given reductionist theory of chance if and only if: (i) the theory of chance admits only countably many chance laws ${\lambda _1},{\lambda _2}, \ldots $ (with chance hypotheses ${{\rm{\Lambda }}_1},{{\rm{\Lambda }}_2}, \ldots $ ); (ii) each chance law ${\lambda _k}$ of the theory is proper, in the sense that ${\lambda _k}\left( {{{\rm{\Lambda }}_k}} \right) = 1$ ; and (iii) $\mu $ can be written as a weighted sum of probability measures in which each ${\lambda _k}$ appears with non-zero weight (and in which any other summands are concentrated on the set of lawless worlds).
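To see why a prior of this form satisfies the principle, here is an illustrative check of one direction (relying only on properness and on the fact, built into the setup, that distinct chance laws have disjoint chance hypotheses): write $\mu = \mathop \sum \nolimits_j {w_j}{\lambda _j} + {w_0}\rho $ , with each ${w_j} \gt 0$ and with $\rho $ concentrated on the set of lawless worlds. Then ${\lambda _j}\left( {{{\rm{\Lambda }}_k}} \right) = 0$ for $j \ne k$ and $\rho \left( {{{\rm{\Lambda }}_k}} \right) = 0$ , so for any event $A$ we have $\mu (A|{{\rm{\Lambda }}_k}) = {w_k}{\lambda _k}\left( {A \cap {{\rm{\Lambda }}_k}} \right)/{w_k}{\lambda _k}\left( {{{\rm{\Lambda }}_k}} \right) = {\lambda _k}\left( A \right)$ , the last step using ${\lambda _k}\left( {{{\rm{\Lambda }}_k}} \right) = 1$ .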

5. The Best System Account

Lewis (1994, §4) tells us a bit about the map that encodes his favored reductionist theory of chance. He works in the context of finite worlds that can be encoded in binary strings. In this context, under a straightforward frequentist approach a world $\sigma $ would be assigned a Bernoulli measure ${\nu _r}$ as its chance law if and only if $r$ gives the relative frequency of 0s at $\sigma $ . Lewis emphasizes that his Best System Account departs in two ways from this sort of frequentism.

  • Some worlds will not be assigned a Bernoulli measure as their Best-System chance law: some worlds exhibit patterns that render it natural to think of Nature as generating bits by following instructions that call for a coin of bias ${r_1}$ to be flipped to generate the first bit; a coin of bias ${r_2}$ to be flipped to generate the second bit; and so on. As an extreme case, Lewis mentions a world where history alternates between 0 and 1, which is naturally thought of as generated by alternating between flipping coins with maximal bias in favor of 0 and 1—in other words, the chance facts there are encoded in a Dirac measure.

  • On the other hand, Lewis thinks that in some cases the chance law at a world should be a Bernoulli measure ${\nu _r}$ even though $r$ does not give the relative frequency of 0s at that world (e.g., in a case where the relative frequency of 0s at a world is given by a messy number ${r^{\rm{*}}}$ close to $.5$ , the fair coin measure may provide a better candidate than ${\nu _{{r^{\rm{*}}}}}$ to be the Best-System chance law of the world—although of course for long-lived worlds, the chance of getting a relative frequency of 0s differing even a little from $.5$ by tossing a fair coin will be minuscule).

How does this look when transposed to the setting of worlds whose histories are coded in infinite binary sequences? It is helpful to note two important features of the infinite context.

  • The idea behind the Best System Approach is that the chance facts at a world are given by the best scientific summary of that world—where best means something like best by our ordinary scientific standards. It is crucial to this picture, then, that the systems in competition be finitarily specifiable objects—otherwise, the question of which of them is best according to ordinary scientific standards loses all sense. The specification of an arbitrary real number is an infinitary task—requiring, in general, the specification of infinitely many bits. So the specification of an arbitrary probability measure on ${2^\omega }$ is infinitary twice over—requiring, in general, the assignment of a real number to each of the infinitely many binary strings. Now, as Turing (1936, 236) tells us: “The ‘computable’ numbers may be described briefly as the real numbers whose expressions as a decimal are calculable by finite means.” In the same way, the finitarily specifiable probability measures are the computable probability measures. So in the infinite context, only computable measures should be eligible candidates to be chance laws under the Best System Approach.

  • In this setting there is a natural way to make precise the notion of a random-looking sequence: the notion of a Martin–Löf random sequence. Footnote 11 If $S$ is a binary sequence and $\mu $ is a computable probability measure, then, roughly speaking, $S$ is $\mu $ -Martin–Löf random just in the case that $S$ exhibits no finitarily specifiable patterns of behavior that would be arbitrarily surprising to an agent expecting to see a data stream generated by $\mu $ . Footnote 12

Example 1 If a computable Bernoulli measure ${\nu _r}$ considers a binary sequence $S$ to be Martin–Löf random, then: $S$ is non-computable; the relative frequency of 0s in $S$ is $r$ ; each finite string occurs infinitely often in $S$ (and with the right relative frequency); as one looks at longer and longer initial segments, one finds that the relative frequency of 0s is greater than $r$ infinitely often (and less than $r$ infinitely often); and, further, $S$ satisfies every other effectively specifiable criterion for a data stream to be sampled from ${\nu _r}$ .

I claim that any plausible version of the Best System Analysis should conform to the following constraints.

BS1 If $S$ is assigned chance law $\lambda ,$ then $S$ is Martin–Löf random relative to $\lambda $ .

BS2 Each computable sequence is assigned a law of chance.

BS3 A sequence is assigned a computable Bernoulli measure ${\nu _r}$ as its law of chance if and only if it is ${\nu _r}$ -Martin–Löf random.

Remark 1 (On BS1). Let $S$ be a sequence and $\lambda $ a computable measure. If $S$ is not $\lambda $ -Martin–Löf random then a computable agent whose data consists of sufficiently long initial segments of $S$ will eventually be in a position to reject at any level of significance the hypothesis that the data is being sampled from $\lambda $ —which would appear to rule out the possibility that the Best System Account ratifies $\lambda $ as the chance law for $S$ . Note that each computable probability measure assigns measure one to the set of sequences that it considers Martin–Löf random. Footnote 13

Remark 2 (On BS2). Let $S$ be computable. Then ${\delta _S}$ is prima facie a very good candidate to be a chance law for $S$ : $S$ is ${\delta _S}$ -Martin–Löf random; ${\delta _S}$ , being computable, is only finitely complex; at the same time, it gives a maximally informative characterization of the infinite object $S$ . It may be that some simpler measure achieves a better balance of simplicity and informativeness (maybe $S$ begins with a long string that looks like it was generated by flipping a fair coin before settling down to a regular pattern). Footnote 14 So perhaps ${\delta _S}$ is not the Best-System chance law of $S$ —but given that ${\delta _S}$ is available as an option, it is implausible that our ordinary standards would decree that $S$ has no chance law whatsoever. Note that it then follows via BS1 that the chance law $\lambda $ of a computable sequence $S$ must satisfy $\lambda \left( {\left\{ S \right\}} \right) \gt 0$ . Footnote 15

Remark 3 (On BS3). Suppose that $S$ is Martin–Löf random relative to a computable Bernoulli measure ${\nu _r}$ . Then there is a sense in which ${\nu _r}$ provides a maximally simple theory of $S$ : If $S$ is also Martin–Löf random relative to $\mu $ , then, as $\mu $ is conditionalized on longer and longer initial segments of $S$ , the probabilities that it gives for the next bit to be 0 must converge to $r$ . Footnote 16 It is easy, however, to generate rivals to ${\nu _r}$ that are in a sense more informative than it: If $\tau $ is any initial segment of $S$ , then $S$ will be Martin–Löf random relative to the measure $\nu _r^\tau $ that says that the initial bits of history are given by $\tau $ and the rest generated by tossing a coin of bias $r$ . So, intuitively, if $S$ begins in some striking way—say with a hundred consecutive 0s—but otherwise looks like it was generated by flipping a fair coin, then a measure of the form $\nu _{.5}^\tau $ would offer a better balance of informativeness and simplicity than ${\nu _{.5}}$ does. But I think that this intuition is an artifact of our habitual attention to small data sets. Maybe if the first three tosses of a coin come up Heads, we are a bit suspicious that the coin is not fair—but this suspicion is washed away when we see enough data to be reassured that each three-bit string occurs equally often (and in no special pattern). We should have the same reaction to a random-looking binary sequence that begins with a hundred 0s; by the time you have seen the billionth block of a billion consecutive 0s, that initial hundred will not look impressive anymore—because you will have seen that each one-hundred-bit string occurs (essentially) equally often (and in no special pattern).

6. Trouble

BS1–BS3 and the Chance-Credence Principle are mutually consistent. Consider the version of the Best System Account in which each computable sequence is assigned the corresponding Dirac measure as its chance law, each non-computable sequence is assigned a computable Bernoulli measure as its chance law if and only if it is Martin–Löf random with respect to it, and all other sequences are lawless. Then any prior that can be written as a weighted sum in which each of the countably many computable Dirac measures and computable Bernoulli measures appears with positive weight will satisfy the Chance-Credence Principle.

But no computable prior can satisfy the Chance-Credence Principle for any reductionist account of chance obeying BS1 and BS2. As noted above, BS1 and BS2 imply that each computable sequence $S$ is assigned a chance law that assigns $\left\{ S \right\}$ positive probability. It follows that if a prior $\mu $ satisfies the Chance-Credence Principle for an account of chance obeying BS1 and BS2, then it must be possible to write $\mu $ as a weighted sum of probability measures in which each computable Dirac measure appears with positive weight. We will show that no such prior can be computable. Footnote 17 Recall that $S{\upharpoonright}k$ is the binary string consisting of the first $k$ bits of $S$ . So $\mu ({[[} S{\upharpoonright}\left( {n + 1} \right) {]]}|{[[}S {{\upharpoonright}n} {]]})$ is the probability that $\mu $ gives to seeing the next bit of $S$ after having been shown the first $n$ bits of $S$ . Let us say that prior $\mu $ learns sequence $S$ if, after seeing sufficiently long initial segments of $S$ , $\mu $ is always able to correctly predict the next bit with credence exceeding some fixed cut-off. Footnote 18 Consider any prior $\mu $ that can be written in the form $\mu : = \nu + c \cdot {\delta _S}$ , where $S$ is a binary sequence, $c \gt 0$ , and $\nu $ is a measure that considers $\left\{ S \right\}$ a nullset. Since $\nu \left( {\left\{ S \right\}} \right) = 0$ , we can make $\nu \left( {[[} {S{\upharpoonright}n} {]]} \right)$ —and hence also $\nu \left( {[[} {S{\upharpoonright}\left( {n + 1} \right)} {]]}\right)$ —as small as we like by choosing $n$ sufficiently large. Footnote 19 So for large $n$ the conditional probability $\mu ({[[} S{\upharpoonright}\left( {n + 1} \right) {]]}|{[[}S {{\upharpoonright}n} {]]})$ is at least $c/(c + \nu ({[[} {S{\upharpoonright}n} {]]}))$ , and hence as close to 1 as we like. It follows that $\mu $ learns $S$ . So, in particular, if $\mu $ satisfies the Chance-Credence Principle for a Best System Account obeying BS1 and BS2, then $\mu $ must learn each computable sequence. But given any computable prior $\mu $ , we can construct a computable binary sequence $S_\mu ^\ast $ that $\mu $ does not learn: $S_\mu ^\ast $ consists of blocks of 0s separated from each other by individual 1s; a 1 is called for whenever $\mu $ has just seen at least one 0 and thinks that it is more likely than not that the next bit will be a 0. So no computable prior can satisfy the Chance-Credence Principle for a reductionist account of chance obeying BS1 and BS2.
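The diagonal construction at the end of this argument can be written out as a short program. In the sketch below a prior is represented by a function returning $\mu ({[[} \tau {]]})$ exactly (an idealization; a computable prior only supplies rational approximations), the strict cut-off $1/2$ stands in for the computable cut-off of footnote 18, and the function names are mine.

```python
from fractions import Fraction

def diagonal_sequence(mu, length):
    """Build an initial segment of the sequence S*_mu that the prior mu does not
    learn: blocks of 0s separated from each other by single 1s, where a 1 is
    written whenever mu has just seen at least one 0 and thinks it more likely
    than not that the next bit will be a 0."""
    tau = ""
    for _ in range(length):
        prob_zero = mu(tau + "0") / mu(tau)  # mu's predictive probability of a 0
        if tau.endswith("0") and prob_zero > Fraction(1, 2):
            tau += "1"   # frustrate the confident prediction
        else:
            tau += "0"   # extend the current block of 0s
    return tau

# Example: a prior that mixes a Dirac measure on the all-0s sequence with the
# fair coin measure.  The diagonal sequence inserts a 1 as soon as the Dirac
# component makes the prior confident that the next bit is a 0; after that the
# fair-coin component never yields a confident prediction, so only 0s follow.
def mixture(tau):
    dirac_part = Fraction(1, 2) if tau.count("1") == 0 else Fraction(0)
    return dirac_part + Fraction(1, 2) * Fraction(1, 2) ** len(tau)

print(diagonal_sequence(mixture, 12))   # "010000000000"
```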

A variant of this argument shows that no computable probability measure can be written as a weighted sum of probability measures in which each computable Bernoulli measure appears with non-zero weight—from which it follows that no computable prior can satisfy the Chance-Credence Principle for any reductionist account of chance obeying BS3.

7. Options

Roughly speaking, the problem that we have run into is that on Lewis’s Best System Account, the chance laws at a world are just good scientific summaries of the patterns of events at that world; and his Principal Principle implies that if you are rational, then to the extent that you are confident in a chance hypothesis it should guide your estimates of probability. Taken together, the package seems to imply that in order to be rational, you need to be a certain sort of universal learner: Whatever the chance law at your world, if you are a good scientist and you see enough data, you should become confident in something like that law and so you should eventually mimic it in your estimates of probability. As Lewis (1986, 121) puts it: “if we start with a reasonable initial credence function and do enough feasible investigation, we may expect our credences to converge to the chances.” But this is a setting in which no computable universal learner can exist.

I do not think we should rest content with an account of chance and credence that tells us that rationality requires us to adopt a non-computable credence function—any more than we would be satisfied with an account of rationality that required rational agents to be able to solve the halting problem. So I think we ought to be interested in revising some part of the framework used above, rejecting one or more of BS1–BS3, or weakening the Chance-Credence Principle. I end with a few observations about these options.

  (1) Would countenancing rational credal states represented by merely finitely additive probability measures help? No. For suppose that a prior $M$ of this kind exists such that: (i) for any computable $S$ , $M\left( {\left\{ S \right\}} \right) \gt 0$ ; and (ii) $M\left( {[[}\tau {]]} \right)$ is defined for each binary string $\tau $ . Then there must exist a countably additive $\mu $ such that $\mu \left( {[[} \tau {]]} \right) = M\left( {[[} \tau {]]} \right)$ for each string $\tau $ , and such that $\mu \left( {\left\{ S \right\}} \right) \gt 0$ for each computable sequence $S$ . Footnote 20 Our diagonalization argument above then tells us that such an $M$ cannot be computable even in the sense that its restriction to basic sets is computable.

  (2) Our Chance-Credence Principle requires that each law of chance $\lambda $ be proper, in the sense that it assigns probability one to its own chance hypothesis ${\rm{\Lambda }}$ . We could relax this restriction to allow $\lambda \left( {\rm{\Lambda }} \right) \in \left( {0,1} \right]$ . In that case it would be natural to replace the Chance-Credence Principle by the following. Footnote 21

Definition 2 (New Chance-Credence Principle). Relative to a given reductionist account of chance, a prior $\mu $ on ${2^\omega }$ satisfies the New Chance-Credence Principle if, for any chance law $\lambda $ with chance hypothesis ${\rm{\Lambda }}$ and for any event $A$ , we have $\mu (A|{\rm{\Lambda }}) = \lambda (A|{\rm{\Lambda }})$ .

But this turns out to be of no help with the problem we have run into above concerning accounts of chance satisfying BS1 and BS2.

  (3) Some philosophers amend the Principal Principle to require priors to satisfy substantive conditions relative to chance laws only if they assign the corresponding chance hypotheses positive credence. Footnote 22 I reject such approaches on the grounds that the amended principle is too weak to embody the requirement of learnability of chances by rational priors that Lewis, rightly to my mind, built into the Principal Principle (consider, e.g., a prior concentrated on the lawless worlds of a reductionist account of chance).

  (4) We could replace BS2 with the assumption that the set of computable sequences to be assigned laws of chance is a set that is learnable in the sense considered above (and proceed analogously with BS3). There is a sense in which the sets of so-learnable sequences are those that can be computed quickly. Footnote 23 So in making this move, there is a sense in which the computable sequences that we deem lawless are ones that exceed a certain complexity cut-off—which might seem consonant with the spirit of the Best System Approach. Maybe. But there is always going to be a worry about arbitrariness here: If a set of computable binary sequences is learnable in the relevant sense, then so is the result of enlarging that set by adding any finite set of computable binary sequences to it; and if two sets of computable sequences are learnable in this sense, then so is their union.

  (5) There are many notions of learning that one might substitute for the one implicit in the Principal Principle—and thereby construct weaker alternatives to the Chance-Credence Principle that may comport better with computability. One possibility would be to require that as a rational prior is conditionalized on longer and longer initial segments of a (typical) sequence associated with a given chance law, its probability estimates for the next bit converge to those given by that chance law (conditional on the same data). Footnote 24 An interesting feature of this requirement is that it is consistent with BS3: the so-called indifference (or Bayes–Laplace) prior meets the requisite condition with respect to each computable Bernoulli measure (see the sketch following this list). But problems remain with BS1 and BS2: A family of Dirac measures is learnable in this weaker sense if and only if it is learnable in the sense relevant to the Chance-Credence Principle.
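For instance, the indifference (Bayes–Laplace) prior just mentioned has predictive probabilities given by Laplace’s rule of succession, and on data sampled from a Bernoulli measure these track the chance law’s own predictive probabilities. The following small Python simulation illustrates the point (it does not, of course, prove the merging result cited in footnote 24); the function names are mine.

```python
import random
from fractions import Fraction

def laplace_predict_zero(tau):
    """Indifference (Bayes-Laplace) prior: after seeing a string tau containing
    k 0s among n bits, the predictive probability of a 0 is (k + 1) / (n + 2)."""
    return Fraction(tau.count("0") + 1, len(tau) + 2)

# Sample a data stream from the Bernoulli measure with parameter r = 2/3 and
# compare the indifference prior's next-bit prediction with r itself.
r = Fraction(2, 3)
rng = random.Random(0)
data = "".join("0" if rng.random() < r else "1" for _ in range(10_000))

print(float(laplace_predict_zero(data[:10])))  # based on 10 bits: can be well off 2/3
print(float(laplace_predict_zero(data)))       # based on 10,000 bits: close to 0.667
```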

The moral is: work remains to be done for anyone who is attracted to Lewis’s accounts of chance and credence and who is also inclined to take computability to be a constraint on rational priors.

Acknowledgments

Previous versions of the paper were presented at NYU, CMU, and the PSA. For helpful correspondence and suggestions, thanks very much to Dave Baker, Cian Dorr, John Earman, Dmitri Gallow, Kevin Kelly, Alex Meehan, Krzysztof Mierzewski, Richard Pettigrew, Laura Ruetsche, Tom Sterkenburg, Francesca Zaffora Blando, Snow Zhang, and, especially, Chris Porter.

Footnotes

2 Following Loewer (2004), I will speak of informativeness where Lewis speaks of strength and fit.

4 In addition to Lewis (1986, 1994), see Letters 659 and 660 in Beebee and Fisher (2021).

5 For (all too) full details, see Belot (2023).

6 It is convenient here to focus on the Lewisian package because of its clarity and precision, but it is far from being the only account of chance, credence, and their relation that faces this sort of problem.

7 For this (transfinite) construction, see, for example, Folland (1999, §1.6). The basic subsets generate the product topology on ${2^\omega }.$ The family of events is the corresponding $\sigma $ -algebra of Borel sets.

8 So a probability measure on ${2^\omega }$ gives the complete set of what Lewis (1986, 97) calls “history to chance conditionals”—and hence gives what he calls “the complete theory of chance” for a world.

9 So we are forbidding rational agents to be certain a priori that any particular finite data set will never arise. Note that the set of binary strings is countable.

10 On this point see Lewis (1986, 276f.) and Pettigrew (2012, fn. 4).

11 For details, see, for example, Shen et al. (2017, Chapter 3). Here is the rough idea: A sequence is not Martin–Löf random relative to a given computable measure $\mu $ just in the case that there is an effectively definable $\mu $ -nullset of ${2^\omega }$ to which it belongs.

12 If $\mu = {10^{ - 6}}{\delta _{\bar 0}} + \left( {1 - {{10}^{ - 6}}} \right){\delta _{\bar 1}}$ (where ${\bar 0}$ is the all-0s sequence and ${\bar 1}$ is the all-1s sequence), then an agent expecting to see data sampled from $\mu $ will be very surprised to see a data set beginning with an initial 0—but no more surprised to see an all-0 data set of any larger size. An agent expecting to see data sampled from the fair coin measure would not be very surprised by an initial 0—but would be arbitrarily surprised by arbitrarily large all-0 data sets.

13 Since there are only countably many Turing machines, there can be only countably many effective $\mu $ -nullsets—and the union of countably many $\mu $ -nullsets is a $\mu $ -nullset.

14 Thanks to Cian Dorr for emphasizing this point to me.

15 Since $S$ is computable, $\left\{ S \right\}$ is an effectively definable $\lambda $ -nullset if it is a $\lambda $ -nullset.

16 See Vovk (1987, Theorem 3).

17 The following is a variant of the diagonalization argument of Putnam (1963).

18 For what follows, we need to choose a computable cut-off $C$ such that there is no string $\tau $ with $\mu \left( {[[} \tau {]]} \right) = C$ . This is always possible (otherwise we could use $\mu $ to show that the computable sequences are uniformly computable—which they are not).

19 Because probability measures are continuous from above—see, for example, Folland (1999, Theorem 1.8).

20 See Propositions 3.2.7 and 3.3.1 of Bhaskara Rao and Bhaskara Rao (1983).

21 This stands to the New Principle of Hall (1994) and Lewis (1994) as the Chance-Credence Principle stands to the Principal Principle of Lewis (1986). For another direction in which one might generalize the Principal Principle in the face of improper laws of chance, see Roberts (2001) and Ismael (2008). For critical discussion, see Briggs (2009).

22 For this approach, see, for example, Pettigrew (2012).

23 See Blum and Blum (1975, 127).

24 For this notion see Kalai and Lehrer (1994).

References

Beebee, Helen and Fisher, A. R. J., eds. 2021. Philosophical Letters of David Lewis, vol. 1. Oxford: Oxford University Press.
Belot, Gordon. 2023. “Unprincipled.” To appear, The Review of Symbolic Logic.
Bhaskara Rao, K. P. S. and Bhaskara Rao, M. 1983. Theory of Charges: A Study of Finitely Additive Measures. London: Academic Press.
Blum, Lenore and Blum, Manuel. 1975. “Toward a Mathematical Theory of Inductive Inference.” Information and Control 28 (2):125–55.
Briggs, R. A. 2009. “The Anatomy of the Big Bad Bug.” Noûs 43 (3):428–49.
Folland, Gerald. 1999. Real Analysis: Modern Techniques and Their Applications. 2nd ed. New York: Wiley.
Hacking, Ian. 1965. Logic of Statistical Inference. Cambridge: Cambridge University Press.
Hall, Ned. 1994. “Correcting the Guide to Objective Chance.” Mind 103 (412):505–17.
Ismael, Jenann. 2008. “Raid! Dissolving the Big, Bad Bug.” Noûs 42 (2):292–307.
Kalai, Ehud and Lehrer, Ehud. 1994. “Weak and Strong Merging of Opinions.” Journal of Mathematical Economics 23 (1):73–86.
Lewis, David. 1986. “A Subjectivist’s Guide to Objective Chance.” In Philosophical Papers, vol. II, 83–132. Oxford: Oxford University Press.
Lewis, David. 1994. “Humean Supervenience Debugged.” Mind 103 (412):473–90.
Loewer, Barry. 2004. “David Lewis’s Humean Theory of Objective Chance.” Philosophy of Science 71 (5):1115–25.
Pettigrew, Richard. 2012. “Accuracy, Chance, and the Principal Principle.” Philosophical Review 121 (2):241–75.
Putnam, Hilary. 1963. Probability and Confirmation. Washington, DC: United States Information Agency.
Roberts, John. 2001. “Undermining Undermined: Why Humean Supervenience Never Needed to be Debugged (Even If It’s a Necessary Truth).” Philosophy of Science 68 (Proceedings):S98–S108.
Shen, Alexander, Uspensky, Vladimir, and Vereshchagin, Nikolay. 2017. Kolmogorov Complexity and Algorithmic Randomness. Providence, RI: American Mathematical Society.
Turing, Alan. 1936. “On Computable Numbers, with an Application to the Entscheidungsproblem.” Proceedings of the London Mathematical Society 42 (1):230–65.
Vovk, Vladimir. 1987. “On a Criterion for Randomness.” Soviet Mathematics Doklady 35:656–60.