
Chapter 1 - Machine Learning Algorithms and Measurement

from Part I - Foundations

Published online by Cambridge University Press: 08 November 2023

Louis Tay, Purdue University, Indiana
Sang Eun Woo, Purdue University, Indiana
Tara Behrend, Purdue University, Indiana

Summary

This chapter provides an overview of the machine learning algorithms commonly used in psychological measurement, that is, the measurement of human attributes. Examples include algorithms that measure personality from interview videos, job satisfaction from open-ended text responses, and group-level emotions from social media posts and internet search trends. These algorithms enable effective and scalable measures of human psychology and behavior, driving technological advancements in measurement. The chapter consists of three parts. We first discuss machine learning and its unique contribution to measurement. We then provide an overview of the common machine learning algorithms used in measurement, with example applications. Finally, we provide recommendations and resources for using machine learning algorithms in measurement.

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2023


