
The Mathematical Analysis of Supreme Court Decisions: The Use and Abuse of Quantitative Methods

Published online by Cambridge University Press:  02 September 2013

Franklin M. Fisher
Affiliation: Society of Fellows, Harvard University

Extract

In the last decade, more and more political scientists have speculated on the possible applications of mathematical analysis to political phenomena. It is the position of this paper that such discussion, when in the abstract, serves little purpose, for the question of whether or not quantitative techniques can fruitfully be so applied is essentially an empirical one and can only be resolved by experiment. Yet even a specifically experimental approach becomes challengeable if it can be shown to misunderstand and hence misemploy otherwise sound techniques. The claim to have solved problems whose mathematical features have not, in fact, been comprehended seems especially harmful in a field where the application of mathematics is as yet in its infancy, and this not only because minor impurities at the base of a growing framework may assume major proportions at its apex, but because exposure of error may breed unjustified disenchantment or give solace to those who prefer a casual, imprecise impressionism in the social sciences.

Type: Research Article
Copyright © American Political Science Association 1958


References

2 For examples see especially the work of Herbert A. Simon. See also Anthony Downs, An Economic Theory of Democracy (New York, 1957); L. S. Shapley and Martin Shubik, "A Method for Evaluating the Distribution of Power in a Committee System," this Review, Vol. 48 (Sept. 1954), pp. 787–92; and Kenneth J. Arrow, "Mathematical Models in the Social Sciences," in Daniel Lerner and Harold D. Lasswell, eds., The Policy Sciences (Stanford, 1951), pp. 129–54.

3 C. H. Pritchett, The Roosevelt Court (New York, 1948).

4 Perhaps the ultimate expression of this has been provided in Harold D. Lasswell, "Current Studies in the Decision Process: Automation versus Creativity," Western Political Quarterly, Vol. 8 (Sept. 1955), pp. 381–99. Lasswell suggests (p. 398) that "the time is approaching when machines will be sufficiently well developed to make it practicable for trial runs to be carried out in which human decision-makers and robots are pitted against one another. When machines are more perfect a bench of judicial robots, for example, can be constructed." We shall see here, however, that the time "when machines are more perfect" is likely to come only when the law in a particular area has become so settled that the construction of "a bench of judicial robots" is both trivial and unnecessary.

5 This Review, Vol. 51 (March, 1957), pp. 1–12.

6 Moore v. Michigan, 355 U. S. 155 (December 1957).

7 Examples are: youth of the defendant; no assistance of counsel at various times in the proceedings; coercion to plead guilty; crime involved subject to capital punishment; and so forth. A complete list of the factors used by Kort is given in Table 1 below. We discuss the problem of their identification in the next section.

8 In the case of factors which can appear more than once in a case we can choose either of two procedures. Either we can let the variable corresponding to such a factor take on a value for a particular case equal to the number of appearances of the factor in the case, or we can define a separate one-zero variable for each appearance—that is, treat each new appearance as a separate factor. Which treatment is chosen makes little difference. The second is the more convenient for expository purposes and the first for computational purposes.
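
A minimal sketch of the two procedures in Python; the factor names and counts below are invented for illustration and are not drawn from Kort's data:

```python
# Hypothetical example: one factor appears twice in a single case.
case_factor_counts = {"youth of defendant": 1, "continuance denied": 2}

# First procedure: one variable per factor, valued at its number of appearances.
count_encoding = dict(case_factor_counts)
# {'youth of defendant': 1, 'continuance denied': 2}

# Second procedure: a separate one-zero variable for each appearance,
# i.e., each new appearance is treated as a separate factor.
dummy_encoding = {}
for factor, n in case_factor_counts.items():
    for i in range(1, n + 1):
        dummy_encoding[f"{factor} (appearance {i})"] = 1
# {'youth of defendant (appearance 1)': 1,
#  'continuance denied (appearance 1)': 1,
#  'continuance denied (appearance 2)': 1}
```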

9 Unless a case with no factors present is a pro, in which case K will be zero. This is trivial, however. See footnote 15 below.

10 This procedure is called “normalization.” It amounts to changing the scales on a graph by multiplying or dividing them by the same number. As in changing the scales on a graph, it is purely a convenience and doesn't change anything.
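
In symbols (a schematic restatement, assuming the weighted-sum-and-threshold form implied by footnote 9, since equation (2) itself is not reproduced in this extract): for any $K > 0$,

$$\sum_i a_i X_i \;\ge\; K \quad\Longleftrightarrow\quad \sum_i \frac{a_i}{K}\,X_i \;\ge\; 1,$$

so dividing every weight by $K$ rescales the picture but leaves each predicted decision unchanged, with the critical value set to unity.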

11 Strictly speaking, since $a_1$ or $a_2$ can be zero, the line might be vertical or horizontal. The important point is that it cannot slope uphill. That this construction is indeed the geometrical equivalent of (2) in two dimensions may be seen as follows: For any point above the line (3), say $(X_1^0, X_2^0)$, there must exist a point on the line, say $(X_1^1, X_2^1)$, such that $X_1^0 \ge X_1^1$ and $X_2^0 \ge X_2^1$, with at least one inequality strict. If $a_1$ and $a_2$ are both positive, the left-hand side of (3) will be greater for $(X_1^0, X_2^0)$ than for $(X_1^1, X_2^1)$, that is, greater than unity. (If one of the $a$'s, say $a_1$, is zero, the expression may equal unity, but then the first factor does not count anyway and two cases with equal $X_2$ are really identical.) Similar remarks hold, mutatis mutandis, for points below the line, and a similar situation obtains in the more general case of $n$ factors.
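
A purely illustrative numerical instance (the weights and points are chosen for convenience and, following the normalization of footnote 10, (3) is taken to be the line $a_1 X_1 + a_2 X_2 = 1$): with $a_1 = a_2 = \tfrac{1}{2}$, the point $(X_1^0, X_2^0) = (2, 1)$ lies above the line and dominates the point $(X_1^1, X_2^1) = (1, 1)$ on it, and indeed

$$\tfrac{1}{2}\cdot 2 + \tfrac{1}{2}\cdot 1 = \tfrac{3}{2} > 1.$$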

12 The reader should satisfy himself that the following two statements are really equivalent to Condition 1 and hence to each other, in the sense that any one of the three Conditions implies the other two.

13 Note, however, that we do not require the existence of cases in which they are actually followed. Cases presenting such clear features are, of course, not likely to come before the Court. What we require is merely the weak condition that if such a case does arise it is decided in the obvious way. Our Conditions would be satisfied if no such cases ever arose in fact.

14 The reader should satisfy himself that these statements really are the interpretation of Condition 1 and its equivalents.

15 Figure 3b is the trivial situation mentioned in footnote 9 above where a case with no factors present is a pro. Here (2) does not apply, since division by zero is impossible. However, any set of positive weights will perfectly predict. The line drawn in the Figure illustrates one such set.

16 That is, it is impossible to draw the requisite hyperplane using only the results of fourteen cases. At best only fourteen weights could be determined. Kort is guessing at least twelve of the weights. We shall return to this below.
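
Schematically (the notation here, with a superscript $c$ indexing cases, is introduced only for this illustration), the fourteen decided cases supply at most fourteen independent relations of the form

$$\sum_i a_i X_i^{(c)} \;\ge\; K \ \text{(pro cases)}, \qquad \sum_i a_i X_i^{(c)} \;<\; K \ \text{(con cases)}, \qquad c = 1, \dots, 14,$$

among the unknown weights, so with Kort's full list of factors at least twelve of the weights are left unconstrained by the cases themselves.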

17 Although, of course, any weight we assigned would do as well. The point is that there is no information in the cases considered as to what the “true” weight should be.

18 I have, throughout this paper, ignored the fairly obvious point that the effects of the various factors in the minds of the justices are almost certainly not additive as Kort assumes. The fact that a defendant is young and illiterate does not simply add something to his chances; it also increases the weight given to the various procedural guarantees in question. Similarly for factors involving the seriousness of the punishment. However, since Theorem 1 ensures the existence of perfect linear (i.e., additive) predictors, this does not seem practically important. If a linear predictor always works as well as a non-linear one, it is difficult to find much meaning in the statement that the latter is correct. Besides, information about the relative importance of youth, illiteracy, previous experience in court, and the like in influencing the weight given to the various procedural guarantees can be inferred from the linear analysis, which is also much simpler to perform.
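
A small sketch of the point in Python, with an invented non-additive rule and invented factor names (none of this is Kort's data): even when the "true" rule contains an interaction between youth and lack of counsel, a purely additive set of weights can reproduce every decision, so the cases alone cannot distinguish the two.

```python
from itertools import product

# Invented factors (all 0 or 1): x1 = youth, x2 = no counsel, x3 = coerced plea.
def non_additive_pro(x1, x2, x3):
    # Hypothetical "true" rule: youth raises the weight given to lack of
    # counsel (an interaction effect), so the rule is not additive.
    return x2 + x3 + 2 * x1 * x2 >= 2

cases = list(product((0, 1), repeat=3))
pros = [c for c in cases if non_additive_pro(*c)]
cons = [c for c in cases if not non_additive_pro(*c)]

# Brute-force search for purely additive weights reproducing every decision.
def separates(w1, w2, w3, k):
    score = lambda c: w1 * c[0] + w2 * c[1] + w3 * c[2]
    return all(score(c) >= k for c in pros) and all(score(c) < k for c in cons)

solutions = [(w, k) for w in product(range(4), repeat=3)
             for k in range(1, 7) if separates(*w, k)]
print(solutions[0])   # ((1, 2, 1), 3): an additive rule predicts perfectly
```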

19 If the factor does not appear in a pro at all, the con in which it appears with the greatest number of votes for the original defendant is taken.

20 Kort, op. cit., p. 6.

21 It is indeed quite possible that Kort's procedure here provides a roughly correct way of telling which factors are most important. This surmise is not supported by the test given below, however.

22 Only Kort knows the truth here, but one suspects that he arrived at his formula after first trying other simpler ones (such as the second one given in the text) on the data. If the data used in such a procedure included the cases in the “test group” then that group in reality formed part of the “source.”

23 I regret that I could not be as generous to Dr. Kort in this paper as he has been to me in promptly and graciously providing me with a list of the appearances of the various factors in each case, as noted by him.

24 I here pass over a number of minor disagreements. I do not understand, for example, why Kort observes the factor “jurisdictional issue” only in Rice v. Olson, 324 U. S. 786, when White v. Ragen, 324 U. S. 760, was actually dismissed for want of jurisdiction.

25 See F. C. Mills, Statistical Methods, 3d ed. (New York, 1955), pp. 311 ff.

26 The "backwards" weighting scheme, by the way, would misclassify at least two of the cases—not counting Canizio. Since a scheme that gives each of the factors equal weight does equally well, this is not impressive. Again, if the Kort solution is correct, it ought to work equally well both ways.

27 A technical and detailed description of this method, which was first applied in taxonomy and has since found major uses in psychology and sociology, may be found in G. Tintner, Econometrics (New York, 1952), pp. 96–102. The mechanics of the method are too complex to be dealt with here.

28 As we have seen, in samples such as that used by Kort, these will generally be the factors of most interest and importance. We have hence not bothered to perform the analysis on Kort's sample.

We add a technical point. If the sample is such that the appearances of two or more factors are not independent (e.g., if two factors always appear together), no inference as to the weights of all such factors will be possible, because there will be more unknowns to find than independent equations to solve. In particular, to take Kort's factors as an example, the four "seriousness of punishment" factors are not all independent, since one and only one of them must appear in any given case; the value of the variable corresponding to any one of them will equal one minus the sum of the values of the variables corresponding to the others. It is thus impossible to observe or infer the effects of any one of them, "other things being equal." In cases like this, discriminant analysis is impossible without removing one of the factors (i.e., assigning it zero weight or combining it with some other factor). In the present example, the factor "crime subject to five or ten years imprisonment" should be given a zero weight and removed, since every case has at least this much punishment (assuming that the more serious the punishment the higher the weight of the factor). See Daniel Suits, "The Use of Dummy Variables in Regression Equations," Journal of the American Statistical Association, Vol. 52 (Dec. 1957), pp. 548–51. I am indebted to John R. Meyer for this reference.
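
A brief sketch of the dependency in Python with NumPy; the design matrix below is invented for illustration (rows are hypothetical cases, a constant column stands in for the threshold term, and only the mildest punishment dummy is named, following the text), not taken from Kort's sample:

```python
import numpy as np

# Columns: constant, then the four "seriousness of punishment" dummies
# p1..p4, where p4 = "crime subject to five or ten years imprisonment";
# exactly one dummy is 1 in every (hypothetical) case.
X = np.array([
    # const  p1  p2  p3  p4
    [1,      1,  0,  0,  0],
    [1,      0,  1,  0,  0],
    [1,      0,  0,  1,  0],
    [1,      0,  0,  0,  1],
    [1,      1,  0,  0,  0],
    [1,      0,  0,  1,  0],
])

# The four dummies always sum to the constant column, so the five columns
# are linearly dependent and the five weights cannot all be determined.
print(np.linalg.matrix_rank(X))          # 4, not 5

# Dropping one dummy (here p4, the mildest punishment, as the text suggests)
# removes the dependency: the remaining four columns have full column rank.
print(np.linalg.matrix_rank(X[:, :4]))   # 4
```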

29 Note, however, that such violation of Condition 1 could always be found by simple inspection so that such a test provides merely an efficient check. The actual test is as follows: perform the analysis with the full set of factors, discard all factors with negative weights (such weights should be approximately zero) and recompute. Repeat this until only non-negative weights are obtained. (This should happen immediately.) If the weights thus found do not perfectly discriminate, then the Court has been inconsistent.
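
A schematic sketch of this iterative test in Python. An ordinary least-squares fit of a zero-one outcome on the factors is used here as a stand-in for the two-group discriminant computation (the two yield proportional weights), and the function, its names, and any data fed to it are illustrative assumptions rather than a transcription of the method cited from Tintner:

```python
import numpy as np

def iterative_discrimination_test(X, y, factor_names):
    """X: cases-by-factors 0/1 matrix; y: 1 for pro, 0 for con decisions.

    Repeatedly fit, discard factors with negative weights, and refit until
    every remaining weight is non-negative; then ask whether the surviving
    weights still separate the pros from the cons perfectly.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = list(range(X.shape[1]))
    while True:
        A = np.column_stack([np.ones(len(y)), X[:, keep]])   # constant + kept factors
        coefs, *_ = np.linalg.lstsq(A, y, rcond=None)         # stand-in for the discriminant fit
        weights = coefs[1:]
        negative = [f for f, w in zip(keep, weights) if w < 0]
        if not negative:                                       # all weights non-negative: stop
            break
        keep = [f for f in keep if f not in negative]          # discard and recompute
    scores = A @ coefs
    # Perfect discrimination means every pro case scores above every con case.
    perfect = scores[y == 1].min() > scores[y == 0].max()
    return {factor_names[f]: w for f, w in zip(keep, weights)}, perfect
```

If `perfect` comes back false after the recomputation, the weights that survive cannot reproduce every decision, which on the argument of the text signals that the Court has been inconsistent.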