
A Further Note on Correlation Coefficients Derived From Cumulative Distributions*

Published online by Cambridge University Press:  30 January 2017

David P. Adam*
Affiliation:
Laboratory of Tree-Ring Research, University of Arizona, Tucson, Arizona 85721, U.S.A.

Abstract

This paper elaborates on the note by Andrews and others (1971). It demonstrates that one may obtain any arbitrary value of r between two series of observations by adjusting the mean values of the two series before cumulating them. A computer simulation is used to illustrate the behavior of random Normal series cumulated under varying conditions.

Résumé

This article builds on the note by Andrews and others (1971). It demonstrates that an arbitrary value of r can be obtained between two series of observations by adjusting the mean values of the two series before cumulating them. A computer simulation is used to illustrate the behavior of a random Normal series cumulated under various conditions.

Zusammenfassung

This contribution extends the note by Andrews and others (1971). It shows that any desired value of r between two series of observations can be obtained if the mean values of the two series are suitably adjusted before they are cumulated. A computer simulation is used to illustrate the behavior of random Normal distributions that are cumulated under varying conditions.

Type
Short Notes
Copyright
Copyright © International Glaciological Society 1972

Andrews and others (1971) have correctly noted that the product-moment correlation coefficient cannot be used on cumulative data. I wish to add that the value of r between two cumulative series is not independent of the scale used for measurement, and that it is in fact possible to obtain nearly any desired value of r by simply adjusting the mean values of the two series before cumulating them.

The product-moment correlation coefficient is designed to deal with data which follow a Normal distribution. Observations of such data may be expressed as

$$ x_i = \bar{x} + e_{x,i} \tag{1} $$

where $\bar{x}$ is some mean value and $e_x$ is $N(0, \sigma)$. When a series of observations of the Normal variate of Equation (1) is expressed in cumulative form, the nth observation becomes

$$ X_n = \sum_{i=1}^{n} x_i = n\bar{x} + \sum_{i=1}^{n} e_{x,i} \tag{2} $$

The final term in Equation (2) introduces a serial correlation which destroys the independence of the observations and converts the series of random Normal observations into a one-dimensional random walk (Mitchell and others, 1966, p. 6). This effect is illustrated in Figure 1: 500 random Normal observations were generated on a CDC 6400 computer using the algorithm of Naylor and others (1966, p. 95), and these are plotted as a raw series (Fig. 1a) and as a cumulated series (Fig. 1b). The cumulated series is clearly far from random. The transformation from raw to cumulated form also substantially alters the correlation between two random series, as Table I shows. Ten pairs of random Normal series, $e_x$ and $e_y$, were generated (N = 500), and the correlation within each pair was calculated for both the raw (Equation (1)) and the cumulated (Equation (2)) series, with $\bar{x} = \bar{y} = 0$. The correlations between the cumulated series give no hint of the basic lack of relationship between the raw observations.
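A minimal Python sketch of the Table I experiment follows. It is an illustration under stated assumptions, not the original program: Python's `random.gauss` stands in for the Naylor and others (1966) generator used on the CDC 6400, σ = 1 and the seed are arbitrary choices, and no mean is added to either series. It will not reproduce the values in Table I, only the contrast between near-zero raw correlations and much larger cumulated ones. The helper names `pearson` and `raw_vs_cumulated` are hypothetical.

```python
import itertools
import math
import random


def pearson(a, b):
    """Product-moment correlation coefficient r of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def raw_vs_cumulated(n_obs=500, n_pairs=10, seed=1):
    """Return (raw r, cumulated r) for n_pairs pairs of independent N(0, 1) series."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_pairs):
        e_x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        e_y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        # Equation (2) with a zero mean: the running sum of the noise terms.
        cum_x = list(itertools.accumulate(e_x))
        cum_y = list(itertools.accumulate(e_y))
        results.append((pearson(e_x, e_y), pearson(cum_x, cum_y)))
    return results


for r_raw, r_cum in raw_vs_cumulated():
    print(f"raw r = {r_raw:+.3f}   cumulated r = {r_cum:+.3f}")
```

Changing the seed changes the cumulated correlations substantially, while the raw ones stay close to zero.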

Fig. 1. A 500-observation random Normal series graphed (a) in raw form, and (b) in cumulated form.

Table I Series Means and Correlations Between Two Series of Random Normal Variates

Another potential source of error is that if the mean value of a series is different from zero, then the first term on the right side of Equation (2) will introduce a linear trend into the set of cumulative observations. The magnitude of the trend depends upon the absolute value of the mean and upon the length of the series, while the direction of the trend depends on the sign of the mean.
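The different behavior of the two terms of Equation (2) can be made explicit. The following is a short sketch, assuming (as in Equation (1)) that the $e_{x,i}$ are independent with mean 0 and standard deviation σ, and writing $X_n$ for the nth cumulated observation:

$$
\begin{aligned}
\mathrm{E}[X_n] &= n\bar{x}, \\
\operatorname{Var}(X_n) &= n\sigma^2, \\
\operatorname{Cov}(X_m, X_n) &= \sigma^2 \min(m, n).
\end{aligned}
$$

The expected value grows linearly with n (the trend), while the spread of the random walk grows only as $\sqrt{n}$, so the ratio of trend to spread is $n\bar{x}/(\sigma\sqrt{n}) = \bar{x}\sqrt{n}/\sigma$. With the illustrative values $\bar{x} = 0.2$, n = 500 and an assumed σ = 1, this ratio is $0.2 \times \sqrt{500} \approx 4.47$. The nonzero covariance for m ≠ n is the serial correlation contributed by the final term.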

When the means of both series in a correlation analysis differ from zero, the introduced linear trends tend to dominate the relationship between the two sets of cumulative observations. A simulation model was designed to study the behavior of the correlation coefficient between two cumulated random Normal series, x and y, when different combinations of $\bar{x}$ and $\bar{y}$ were added to the series before cumulation. Two 500-observation series of random Normal variates, corresponding to the $e_x$ (or $e_y$) terms of Equation (1), were generated. Values of $\bar{x}$ and $\bar{y}$ were varied from −0.2 to +0.2 in steps of 0.04. For each possible combination of $\bar{x}$ and $\bar{y}$, the two series were converted to cumulative form according to Equation (2), and the correlation between them was calculated.
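The simulation just described can be sketched in Python as follows. This is a hypothetical reconstruction rather than the original code: the function name `cumulated_r_grid`, σ = 1, the seed and the use of `random.gauss` are assumptions, so a given run will not match Table II value for value. As in the procedure above, one pair of noise series is reused for every cell, and only the added means change.

```python
import itertools
import math
import random


def pearson(a, b):
    """Product-moment correlation coefficient r of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def cumulated_r_grid(n_obs=500, seed=2):
    """Correlation of the cumulated series for every combination of added means."""
    rng = random.Random(seed)
    e_x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    e_y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    means = [k / 100 for k in range(-20, 21, 4)]  # -0.20, -0.16, ..., +0.20
    grid = {}
    for x_bar, y_bar in itertools.product(means, means):
        # Equation (1) with the chosen mean, then Equation (2) by running sums.
        cum_x = list(itertools.accumulate(x_bar + e for e in e_x))
        cum_y = list(itertools.accumulate(y_bar + e for e in e_y))
        grid[(x_bar, y_bar)] = pearson(cum_x, cum_y)
    return means, grid


means, grid = cumulated_r_grid()
print("rows: x_bar, columns: y_bar")
for x_bar in means:
    row = " ".join(f"{grid[(x_bar, y_bar)]:+.2f}" for y_bar in means)
    print(f"{x_bar:+.2f}  {row}")
```

The printed grid is indexed by $\bar{x}$ (rows) and $\bar{y}$ (columns); the corners with equal-sign means should show strong positive r and the opposite-sign corners strong negative r.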

The results for one run of this model are shown in Table II. When $\bar{x}$ and $\bar{y}$ are of the same sign, the two series are strongly positively correlated, but when they are of opposite sign, strong negative correlations result. By choosing different mean values for two unrelated series of random observations and then expressing those observations in cumulative form, it is thus possible to obtain almost any desired value of r.

Table II Correlations Between Two Cumulated Random Normal Series as a Function of the Means, $\bar{x}$ and $\bar{y}$, of Those Series. The Means of the Non-Cumulated Series Are 0.0151 and 0.0080, and the Correlation Between Them Is −0.0295

Indeed, it is not necessary that the two series be unrelated in order to be able to select r at will. The two simple examples in Table III and Figure 2 show that it is quite easy to completely reverse the sense of a relationship by using cumulative series instead of raw data.
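A small constructed example shows the mechanism behind such reversals. The data below are hypothetical and are not the values of Table III: the two variables share an alternating ±0.5 fluctuation, so the raw series are perfectly positively correlated, but x is offset by +1 and y by −1, so after cumulation they trend in opposite directions.

```python
import itertools
import math


def pearson(a, b):
    """Product-moment correlation coefficient r of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


# Hypothetical data: a shared fluctuation of +0.5, -0.5, +0.5, ...
fluctuation = [0.5 if i % 2 == 0 else -0.5 for i in range(10)]
x = [1.0 + d for d in fluctuation]   # mean +1
y = [-1.0 + d for d in fluctuation]  # mean -1, otherwise identical to x

r_raw = pearson(x, y)
r_cum = pearson(list(itertools.accumulate(x)), list(itertools.accumulate(y)))
print(f"raw r = {r_raw:+.3f}, cumulated r = {r_cum:+.3f}")
```

For these ten pairs the raw r is +1, while the cumulated r is about −0.985: the opposing trends from the two means swamp the shared fluctuation.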

Fig. 2. Two examples of the reversal of the correlation coefficient between two variables when the data are transformed from raw to cumulated series.

Table III Two Sets of Data Which Show Reversal of the Correlation Coefficient When the Data Are Transformed From Raw to Cumulative Series. Top, Data for Figure 2A; Bottom, Data for Figure 2B

The high correlations between cumulative series reported by Andrews and others (1971) result from the fact that they used random numbers with a mean value of 50 for both sets of observations. By choosing different mean values for their initial series, they could have obtained any value they wanted.
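The effect reported by Andrews and others (1971) can be imitated with a short sketch. The distribution they used is not stated in this note, so uniform integers from 0 to 100 (mean 50) are an assumption, as are the series length and the seed. Subtracting 100 from one series moves its mean to about −50 without changing the raw correlation, yet it flips the sign of the cumulated correlation.

```python
import itertools
import math
import random


def pearson(a, b):
    """Product-moment correlation coefficient r of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def cumulated_r(a, b):
    """Correlation between the running sums (Equation (2)) of two series."""
    return pearson(list(itertools.accumulate(a)), list(itertools.accumulate(b)))


rng = random.Random(3)
# Assumption: independent uniform integers 0..100, so both means are about 50.
a = [rng.randint(0, 100) for _ in range(500)]
b = [rng.randint(0, 100) for _ in range(500)]
b_shifted = [v - 100 for v in b]  # same fluctuations, mean about -50

print(f"raw r                        = {pearson(a, b):+.3f}")
print(f"cumulated r, means ~50, ~50  = {cumulated_r(a, b):+.3f}")
print(f"cumulated r, means ~50, ~-50 = {cumulated_r(a, b_shifted):+.3f}")
```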

Another quirk of the correlation coefficient between cumulated series is that it depends to a certain extent on the order in which the observations are cumulated. Only the final point in the cumulated series has a fixed value for a given set of points; the other points may assume different values depending on which point is chosen as the initial one and the sequence of the points which follow. When non-cumulated series are correlated, the order in which the pairs of observations are taken does not affect the correlation coefficient; in the case of cumulated series, however, variations in the magnitude of the coefficient do occur when different orders of accumulation are followed.
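The order dependence can be checked with a sketch that correlates the same set of pairs before and after a random reordering. The means 0.1 and 0.05, the unit standard deviations, the series length and the seed are arbitrary assumptions, and `raw_and_cumulated` is a hypothetical helper. The raw r and the final cumulated point agree (up to floating-point rounding) for both orders, whereas the cumulated r does not.

```python
import itertools
import math
import random


def pearson(a, b):
    """Product-moment correlation coefficient r of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def raw_and_cumulated(pairs):
    """Return (raw r, cumulated r, final cumulated point) for a list of (x, y) pairs."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    cum_x = list(itertools.accumulate(xs))
    cum_y = list(itertools.accumulate(ys))
    return pearson(xs, ys), pearson(cum_x, cum_y), (cum_x[-1], cum_y[-1])


rng = random.Random(4)
# Arbitrary assumed means (0.1, 0.05) and unit standard deviations.
pairs = [(rng.gauss(0.1, 1.0), rng.gauss(0.05, 1.0)) for _ in range(200)]
reordered = pairs[:]
rng.shuffle(reordered)  # same pairs, different order of accumulation

raw_1, cum_1, end_1 = raw_and_cumulated(pairs)
raw_2, cum_2, end_2 = raw_and_cumulated(reordered)
print(f"raw r:       {raw_1:+.4f} vs {raw_2:+.4f}")
print(f"cumulated r: {cum_1:+.4f} vs {cum_2:+.4f}")
print(f"final point: ({end_1[0]:.3f}, {end_1[1]:.3f}) vs ({end_2[0]:.3f}, {end_2[1]:.3f})")
```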

In summary, the transformation of a set of observations to cumulative form destroys the independence of the observations and makes the correlation coefficient strongly dependent on the scale used for measurement and on the length of the series. Correlating cumulated series is thus a procedure whose use should be restricted to special circumstances or completely eliminated.

Acknowledgements

A portion of this work was supported by NSF Grant GA-4128 to Valmore C. LaMarche. Computer time was supplied by the University of Arizona Computer Center. I thank John Sims for helpful criticism of the manuscript.

Footnotes

Present address: U.S. Geological Survey, 345 Middlefield Road, Menlo Park, California 94025, U.S.A.

* Publication No. 39. Department of Geosciences, University of Arizona.

References

Andrews, J. T., and others. 1971. Note on correlation coefficients derived from cumulative distributions with reference to glaciological studies, by J. T. Andrews, B. D. Fahey and D. Alford. Journal of Glaciology, Vol. 10, No. 58, p. 145-47.
Mitchell, J. M., jr., and others. 1966. Climatic change. Report of a working group of the Commission for Climatology prepared by J. M. Mitchell, Jr., chairman, B. Dzerdzeevskii, H. Flohn, W. L. Hofmeyr, H. H. Lamb, K. N. Rao, C. C. Wallén. World Meteorological Organization. Technical Note No. 79. (WMO No. 195. TP. 100.)
Naylor, T. H., and others. 1966. Computer simulation techniques, by T. H. Naylor, J. L. Balintfy, D. S. Burdick and K. Chu. New York, John Wiley and Sons, Inc.