
Corrections for Criterion Reliability in Validity Generalization: A False Prophet in a Land of Suspended Judgment

Published online by Cambridge University Press:  10 April 2015

James M. LeBreton, Pennsylvania State University
Kelly T. Scherer, Purdue University
Lawrence R. James, Georgia Institute of Technology

Corresponding author: James M. LeBreton, Department of Psychology, Pennsylvania State University, 140 Moore Building, University Park, PA 16802. E-mail: james.lebreton@psu.edu

Abstract

The results of meta-analytic (MA) and validity generalization (VG) studies continue to be impressive. In contrast to earlier findings that capped the variance accounted for in job performance at roughly 16%, many recent studies suggest that a single predictor variable can account for between 16% and 36% of the variance in some aspect of job performance. This article argues that this “enhancement” in variance accounted for is often attributable not to improvements in science but to a dumbing down of the standards for the values of statistics used in correction equations. With rare exceptions, applied researchers have suspended judgment about what is and is not an acceptable threshold for criterion reliability in their quest for higher validities. We demonstrate a statistical dysfunction that is a direct result of using low criterion reliabilities in corrections for attenuation. Corrections typically applied to a single predictor in a VG study are instead applied to multiple predictors. A multiple correlation analysis is then conducted on the corrected validity coefficients. It is shown that the corrections often used in single-predictor studies yield a squared multiple correlation that appears suspect. Basically, the multiple-predictor study exposes the tenuous statistical foundation of using abjectly low criterion reliabilities in single-predictor VG studies. Recommendations for restoring scientific integrity to the meta-analyses that permeate industrial–organizational (I–O) psychology are offered.
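The dysfunction the abstract describes can be illustrated with a short sketch. All numbers here are illustrative assumptions, not values from the article: an observed validity of .30, five equally valid predictors intercorrelated at .10, and criterion reliabilities ranging from 1.00 down to .30 (the .52 value is the mean interrater reliability for supervisory performance ratings commonly imported into such corrections). Spearman's correction for criterion unreliability divides the observed validity by the square root of the criterion reliability, so a lower reliability mechanically yields a larger "true" validity; feeding those inflated validities into a multiple correlation compounds the problem.

```python
import math

def corrected_validity(r_xy, r_yy):
    """Spearman's correction for criterion unreliability: r_xy / sqrt(r_yy)."""
    return r_xy / math.sqrt(r_yy)

def equicorrelated_r2(c, k, rho):
    """Squared multiple correlation for k predictors that each correlate c
    with the criterion and rho with one another. For this equicorrelated
    case the closed form is R^2 = k * c^2 / (1 + (k - 1) * rho)."""
    return k * c**2 / (1 + (k - 1) * rho)

r_obs, k, rho = 0.30, 5, 0.10  # illustrative assumptions
for r_yy in (1.00, 0.52, 0.30):
    c = corrected_validity(r_obs, r_yy)
    r2 = equicorrelated_r2(c, k, rho)
    print(f"r_yy = {r_yy:.2f}: corrected r = {c:.3f}, R^2 = {r2:.3f}")
# With r_yy = 0.30 the computed R^2 exceeds 1.0 -- an impossible value
# that exposes the correction as untenable at such low reliabilities.
```

Under these assumptions, dropping the criterion reliability from 1.00 to .52 roughly doubles the apparent variance accounted for, and at .30 the squared multiple correlation passes its logical upper bound of 1.0, which is the kind of "suspect" result the multiple-predictor analysis is designed to surface.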

Type: Focal Article

Copyright © Society for Industrial and Organizational Psychology 2014

