“Ecological” Inference: The Use of Aggregate Data to Study Individuals*

W. Phillips Shively

doi:10.2307/1955079

Published online by Cambridge University Press: 01 August 2014

Affiliation: Yale University

Extract

Because they are inexpensive and easy to obtain, because they may be available under circumstances in which survey data are unavailable, and because they eliminate many of the measurement problems of survey research, data on geographic units such as counties or census tracts are often used by political scientists to measure individual behavior. This has involved us in the long-standing problem of inferring individual-level relationships from aggregate data, which was first raised by W. S. Robinson in the early nineteen fifties.

In this paper, I shall first discuss the problem raised by Robinson. I shall then review three partial solutions to the problem: the Duncan-Davis method of setting limits, Blalock's version of ecological regression, and Goodman's version of ecological regression. Finally, I shall propose some ways in which Goodman's method may be used so as to reduce the problem of bias in its estimates, and make it a more reasonable tool for research.

Our difficulty, as Robinson showed, is that we cannot necessarily infer the correlation between variables, taking people as the unit of analysis, on the basis of correlations between the same variables based on groups of people as units. For example, the “ecological” correlation between per cent black and per cent illiterate is +0.946, whereas the correlation between color and illiteracy among individuals is only +0.203.
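The gap between the two kinds of correlation can be reproduced on a small scale. The sketch below uses five hypothetical districts with invented counts (these are not Robinson's census figures) in which 30 per cent of X₁ individuals and 10 per cent of X₂ individuals are Y₁ in every district. The names `DISTRICTS`, `pearson`, `ecological_correlation`, and `individual_correlation` are illustrative assumptions.

```python
import math

# Hypothetical, synthetic district tables (not Robinson's data).
# Each tuple holds counts of (X1&Y1, X1&Y2, X2&Y1, X2&Y2) individuals.
DISTRICTS = [
    (30, 70, 90, 810),
    (90, 210, 70, 630),
    (150, 350, 50, 450),
    (210, 490, 30, 270),
    (270, 630, 10, 90),
]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ecological_correlation(districts):
    """Correlate district-level shares: proportion X1 against proportion Y1."""
    x_share = [(a + b) / (a + b + c + d) for a, b, c, d in districts]
    y_share = [(a + c) / (a + b + c + d) for a, b, c, d in districts]
    return pearson(x_share, y_share)


def individual_correlation(districts):
    """Phi coefficient for the pooled 2x2 table, with individuals as units."""
    a = sum(t[0] for t in districts)  # X1 and Y1
    b = sum(t[1] for t in districts)  # X1 and Y2
    c = sum(t[2] for t in districts)  # X2 and Y1
    d = sum(t[3] for t in districts)  # X2 and Y2
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


r_ecological = ecological_correlation(DISTRICTS)   # essentially 1.0
r_individual = individual_correlation(DISTRICTS)   # 0.25
```

Because the district shares of Y₁ are an exact linear function of the district shares of X₁, the ecological correlation is essentially 1.0, while the individual-level correlation in the pooled table is only 0.25.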

Type: Research Article
Information: American Political Science Review, Volume 63, Issue 4, December 1969, pp. 1183–1196

DOI: https://doi.org/10.2307/1955079
Copyright: Copyright © American Political Science Association 1969

Footnotes

The research from which this paper derives was made possible by assistance from the National Science Foundation and the graduate school of the University of Oregon. I am grateful to Robyn M. Dawes, William Keech, Gerald Kramer, Alden Lind, Trudi M. Lucas, and my wife Barbara, for their help and criticism. G. Wayne Peak and Robert Sternhell provided valuable assistance.

References

¹ Robinson, W. S., “Ecological Correlations and the Behavior of Individuals,” American Sociological Review, 15 (June, 1950), 351–357.

² Duncan, O. Dudley and Davis, Beverly, “An Alternative to Ecological Correlation,” American Sociological Review, 18 (December, 1953), 665–666; Blalock, Hubert M., Causal Inference in Non-Experimental Research (Chapel Hill: University of North Carolina Press, 1965), ch. 4; Goodman, Leo A., “Some Alternatives to Ecological Correlation,” American Journal of Sociology, 64 (May, 1959), 610–625. See also Duncan, Cuzzort, and Duncan, Statistical Geography (Glencoe, Ill.: The Free Press, 1961), pp. 60–80; Iversen, Gudmund R., Estimation of Cell Entries in Contingency Tables When Only Margins Are Observed (Ph.D. Dissertation, Department of Statistics, Harvard University, 1969); and Donald E. Stokes, “Ecological Regression as a Game With Nature” (unpublished ms.).

³ Among others, Menzel, Herbert, “Comment on Robinson's ‘Ecological Correlations and the Behavior of Individuals,’” American Sociological Review, 15 (October, 1950), 674–675, and Ranney, Austin, “The Utility and Limitations of Aggregate Data in the Study of Electoral Behavior,” in Ranney, Austin (ed.), Essays on the Behavioral Study of Politics (Urbana, Ill.: University of Illinois Press, 1962), pp. 91–102.

⁴ Studies using community-level variables, but not requiring ecological inferences, are too numerous to list. Two examples are Matthews, Donald R. and Prothro, James W., Negroes and the New Southern Politics (New York: Harcourt, Brace, & World, 1966), pp. 101–175; and Crain, Robert L. and Rosenthal, Donald B., “Community Status as a Dimension of Local Decision-Making,” American Sociological Review, 32 (December, 1967), 970–984. Some recent studies which use aggregate correlations to infer individual-level relationships are MacRae, Duncan Jr., and Meldrum, James A., “Critical Elections in Illinois: 1888–1958,” this Review, 54 (September, 1960), 669–684; Wilson, James Q. and Banfield, Edward C., “Public-Regardingness as a Value Premise in Voting Behavior,” this Review, 58 (December, 1964), 876–888; Pomper, Gerald, “Classification of Presidential Elections,” Journal of Politics, 29 (August, 1967), 535–567; and O'Lessker, Karl, “Who Voted For Hitler?”, American Journal of Sociology, 74 (July, 1968), 63–69.

⁵ Ranney, op. cit., 99–101. Pomper, for example, uses this argument to help justify his use of ecological correlations. Pomper, op. cit., 556.

⁶ Ranney, op. cit., 100.

⁷ The examples which Blalock uses in his chapter on aggregation are all of interval measure for the units which he groups together.

⁸ Goodman, op. cit.

⁹ For the sake of simplicity, I shall present Goodman's method as applied to two dichotomous variables. Since dichotomies can be treated as interval variables, Blalock's methods could also be used for two such variables, though the results would not be as simple conceptually as those from Goodman's method. But Goodman can be easily extended to non-dichotomous variables, which is not true of Blalock. To avoid creating the impression that there is a mysterious difference between Goodman's and Blalock's forms of ecological regression, I should point out that Goodman's version is equivalent to using Blalock's version with dummy variables, then transforming the resulting slopes back into proportions.

¹⁰ Goodman, op. cit., 612.

¹¹ Ibid., 612.

¹² Under a probabilistic grouping of individuals, the district frequencies would not be fixed precisely, but their average, or expected, frequencies would be. The example I present uses precise fixing of frequencies for convenience.

¹³ In this section I shall deal only with ways to reduce bias in Goodman's ecological regression. The suggestions I make should apply equally to Blalock's version, but I have not developed them along those lines here, since we only rarely use aggregate data to get at interval individual-level variables.

¹⁴ The problem of estimates which are higher than one hundred per cent or less than zero per cent has been dealt with elsewhere by Telser: Telser, L. G., “Least-Squares Estimation of Transition Probabilities,” in Christ, C. F., et al., Measurement in Economics (Stanford, Calif.: Stanford University Press, 1963). He is concerned with the problem when it is due to predictable estimation error in estimating parameters which fall near the extremes. His solution is to change the estimation procedure so that it cannot produce results of greater than one hundred or less than zero. This strikes me as a rather trivial problem, since an error of ten percentage points in estimating a parameter is just as harmful if it stays within the “respectable” 0–100 bounds as it is if it happens to make the researcher look silly by straying outside those bounds. I suspect, however, that the frequency with which one encounters these unnatural results in using Goodman's technique is due to a more serious problem than that of estimation error: violation of the assumption that data have not been grouped by the dependent variable.

¹⁵ P′ times the total number of Y₁ individuals in the population tells us how many individuals in the population are X₁ and Y₁. If we divide this figure by the total number of X₁ individuals in the population, we have the proportion of X₁'s that are Y₁.

¹⁶ Note that the direct and indirect estimates of P may differ as a result of sampling error, as well as because of differing magnitudes of bias. This is why I stated above that by comparing the two estimates we can pick the one which probably incorporates less bias. In fact, if there were no bias in either estimate of P, and we consistently chose the lower of the two, we would ourselves cause a slight downward bias in the estimation procedure. Accordingly, if there are only slight differences between the estimates, it may be better to take the average of the two estimates, rather than pick the one which (presumably) incorporates less bias.

¹⁷ Assuming that any relationship between X₁ and P is monotonic. See the Note at the end of this Appendix.
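Notes 9 and 15 describe two small computations that can be sketched in code. The first is Goodman's ecological regression for two dichotomies: the district proportion of Y₁ is regressed on the district proportion of X₁, and the fitted line is read back as proportions (the intercept for X₂'s, the intercept plus slope for X₁'s). The second is the conversion in note 15 from P′ to the proportion of X₁'s that are Y₁. This is only a sketch under stated assumptions: the regression is unweighted, the district shares and population totals are invented for illustration, and the function names are hypothetical rather than taken from the article.

```python
def ols_line(xs, ys):
    """Unweighted least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def goodman_estimates(x1_share, y1_share):
    """Turn the fitted line back into proportions (assumes constant rates).

    intercept         -> estimated proportion of X2 individuals who are Y1
    intercept + slope -> estimated proportion of X1 individuals who are Y1
    """
    intercept, slope = ols_line(x1_share, y1_share)
    return {"y1_given_x1": intercept + slope, "y1_given_x2": intercept}


def proportion_from_p_prime(p_prime, n_y1, n_x1):
    """Note 15: P' * N(Y1) counts the X1-and-Y1 individuals; divide by N(X1)."""
    return p_prime * n_y1 / n_x1


# Invented district shares: in every district 80% of X1's and 30% of X2's are Y1.
x1_share = [0.1, 0.3, 0.5, 0.7, 0.9]
y1_share = [0.3 + 0.5 * x for x in x1_share]
estimates = goodman_estimates(x1_share, y1_share)

# Invented totals: P' = 0.6, 400 Y1 individuals, 300 X1 individuals.
p_direct = proportion_from_p_prime(0.6, 400, 300)
```

With rates that really are constant across districts, the regression recovers them (about 0.8 and 0.3 here). When that assumption fails, the same arithmetic can return values below zero or above one, which is the situation note 14 discusses.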
