Reducing the Noise From Scraping Social Media Content: Some Evidence-Based Recommendations

Filip Lievens; Chad H. Van Iddekinge

doi:10.1017/iop.2016.67

Published online by Cambridge University Press: 07 October 2016

Filip Lievens*: Department of Personnel Management and Work and Organizational Psychology, Ghent University, Ghent, Belgium
Chad H. Van Iddekinge: College of Business, Florida State University
*: Correspondence concerning this article should be addressed to Filip Lievens, Department of Personnel Management and Work and Organizational Psychology, Ghent University, Henri Dunantlaan 2, 9000 Ghent, Belgium. E-mail: filip.lievens@ugent.be

Type: Commentaries
Information: Industrial and Organizational Psychology, Volume 9, Issue 3, September 2016, pp. 660–666

DOI: https://doi.org/10.1017/iop.2016.67
Copyright: Copyright © Society for Industrial and Organizational Psychology 2016

Chamorro-Premuzic, Winsborough, Sherman, and Hogan (2016) describe a variety of new selection approaches (e.g., “scraping” of social media information, gamified assessments) in the staffing domain that might provide new sources of information about people. The authors also mention advantages and downsides of these potentially “new talent signals.”

We suggest that the next step is to identify conditions under which these new approaches might be best used to increase their probability of providing accurate job-related information on candidates’ knowledge, skills, abilities, and other characteristics (KSAOs). Although there has been little scientific research on these new assessment methods, we posit that some guidance may be found in the over 100 years of research on personnel selection. This makes sense because, as noted in the focal article, these new tools are mainly technologically enhanced versions of traditional assessment methods. Therefore, we draw on existing personnel selection knowledge to delineate a set of general recommendations to make these new talent signals less weak and “noisy” (i.e., more reliable and valid). We focus primarily on scraping social media information, although we show how some of these recommendations may be relevant for gamified assessment as well.

Evidence-Based Recommendations

Establish Links Between New Talent Signals and Job-Related Constructs

New talent signals from people's involvement in social media refer to the content posted on social media. Examples include comments posted on Facebook or Twitter, preferences for brands mentioned on Facebook pages, comments on other people's messages, likes, and membership in specific groups. As job relatedness is a cardinal rule to ensure validity and fairness, it is pivotal to map these new signals to job-related constructs (Society for Industrial and Organizational Psychology, 2003; Uniform Guidelines on Employee Selection Procedures, 1978). To this end, job analysis methods, subject matter judgments, and theoretical models can be used. Essentially, this endeavor is similar to earlier research that mapped interviewee answers, biodata responses, or assessment center observations onto job-related constructs. Afterward, the validity of the new signals can be ascertained by correlating them with more common measures of the same constructs or with criteria the new signals should predict.

So, we recommend that organizations determine beforehand which signals are indicators of well-known individual differences such as cognitive ability, knowledge, interests, personality, or motivation (Sackett, Lievens, Van Iddekinge, & Kuncel, in press). Along these lines, Roth, Bobko, Van Iddekinge, and Thatcher (2016) provided a good start in describing how social media content might be mapped to existing taxonomies of individual differences (KSAOs). Clearly, our recommendation to establish evidence-based or theory-based links between these new talent signals and job-related constructs applies not only to information gathered via social media but also to candidate actions in serious games (see Bedwell, Pavlas, Heyne, Lazzara, & Salas, 2012).

Adopt the Principle of Aggregation

Once job-related signals have been gathered, aggregation is a second cardinal principle to further reduce the noise of these job-related signals. We adopt the aggregation principle (Epstein, 1979) for reliability purposes almost everywhere in psychology. For instance, we use multiple items in scales, rely on multiple raters (e.g., interviewers), and combine data from several observations to form an overall evaluation.

In the context of scraping social media content, it is equally important to make sure that judgments about candidates’ standing on KSAOs are made on the basis of multiple signals. The consistency among these multiple, job-related social media signals can then be determined as a way of assessing their reliability (McFarland & Ployhart, 2015). For example, the consistency among the information captured can be examined across time or across different signals thought to reflect the same job-related construct.
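As a rough illustration of why aggregation helps, the sketch below computes two standard reliability indices for a set of signals assumed to reflect the same construct: Cronbach's alpha as an internal-consistency estimate, and the Spearman-Brown projection of how reliability grows as more parallel signals are combined. The function names, the data layout (one list of scores per signal, with candidates in the same order), and the example numbers are illustrative assumptions, not something taken from the studies cited above.

```python
from statistics import pvariance


def cronbach_alpha(signals):
    """Internal consistency of k signals scored on the same n candidates.

    `signals` is a list of k lists; signals[j][i] is candidate i's score
    on signal j (hypothetical layout chosen for this sketch).
    """
    k = len(signals)
    if k < 2:
        raise ValueError("need at least two signals")
    n = len(signals[0])
    if any(len(sig) != n for sig in signals):
        raise ValueError("all signals must cover the same candidates")
    # Sum of the signals for each candidate.
    totals = [sum(sig[i] for sig in signals) for i in range(n)]
    sum_of_signal_variances = sum(pvariance(sig) for sig in signals)
    total_variance = pvariance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum_of_signal_variances / total_variance)


def spearman_brown(single_signal_reliability, k):
    """Projected reliability of an aggregate of k parallel signals."""
    r = single_signal_reliability
    return (k * r) / (1 + (k - 1) * r)


# Hypothetical scores for four candidates on three signals.
example_signals = [
    [2, 4, 3, 5],
    [1, 4, 2, 5],
    [2, 5, 3, 4],
]
alpha = cronbach_alpha(example_signals)

# One noisy signal with reliability .30 versus an aggregate of five such
# signals: the projected reliability rises to about .68.
projected = spearman_brown(0.30, 5)
```

Both indices assume that the signals being combined tap the same construct, which is why the mapping of signals to job-related constructs has to come first.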

A specific example of such aggregation consists of combining information from different information sources. For instance, some social media content comes from self-reported information, whereas other information on social media is provided by others (e.g., endorsements and comments). Given that self- and other reports each have their respective benefits and weaknesses (Vazire, 2010), and that other reports have been found to incrementally predict over self-reports in the personality domain (Connelly & Ones, 2010; Oh, Wang, & Mount, 2011), scraping social media for both sources of information might be beneficial. For example, data from self and others could be correlated to assess the validity of new talent signals.

Standardize the Evoking and Extracting of Information

Using talent signals derived from social media is not without risks (Roth et al., 2016). For example, social media platforms often lack standardization and might provide abundant information for some people and no information for others. Another major challenge is that social media such as Facebook routinely provide personal information that organizations are not allowed to use for decision making, such as race, religion, political affiliation, and sexual orientation. Once decision makers are exposed to such information (also things like physical appearance and disability status, which can be apparent from pictures, etc.), it may be difficult for them to ignore. These issues complicate the task for raters and/or machines to make sense of that information, such as when comparing job candidates.

To reduce information processing and decision making biases, we suggest taking a page out of the book of structured interviews (e.g., Campion, Palmer, & Campion, 1997), structured references (e.g., Taylor, Pajo, Cheung, & Stringfield, 2004), assessment centers (e.g., Woehr & Arthur, 2003), and research on judgment and decision making in general. In these literatures, two main approaches have emerged to increase standardization. One approach consists of standardizing the questions and prompts given to candidates. Clearly, this is a challenge in the context of social media because these platforms are typically designed to facilitate social interaction rather than to evoke job-related information (Roth et al., 2016). This is especially the case with platforms such as Facebook, Twitter, and Google+. However, a professional and career-related social network such as LinkedIn is different because it has fixed rubrics that people are asked to complete. In addition, references can be given in LinkedIn.

Whereas standardizing the assessment of social media information could be challenging, this is less the case for other new talent signals from gamified assessment. In such gamified assessment, for instance, one might elicit relevant behavior because serious games provide a variety of fixed and dynamic stimuli to evoke job-related information (Bedwell et al., 2012). Although people will not follow the same path through the game due to its adaptive and interactive nature, the general idea is to build in more generic stimuli (Lievens, Schollaert, & Keen, 2015). For example, such generic stimuli might refer to dealing with ambiguity, conflict, scarce resources, or stress. Different rounds (levels) in a game might be built around one of these challenges and provide multiple data points of how candidates deal with them.

Another well-known standardization approach relates to providing training and other aids to raters who must extract and evaluate the information available. Again, there exists a rich literature in the interview and assessment center domains of training raters and equipping them with various aids. We urge industrial–organizational (I-O) psychologists to rely on this extensive knowledge base and devise training programs and rating tools to help raters (e.g., staffing professionals, managers) identify and evaluate relevant signals when screening social media information (Kluemper, Rosen, & Mossholder, 2012; McFarland & Ployhart, 2015). At the same time, there have also been rapid advancements in the automated scoring of texts, interviews, videos, and work samples (see below).

Use Mechanical Integration

Our suggestion to ensure that multiple signals are gathered from social media raises the question of how these signals should be integrated to form a final judgment regarding candidates’ KSAOs. Generally, two broad approaches can be distinguished: (a) judgmental/clinical integration via raters and (b) mechanical/statistical integration via algorithms. In the selection and educational domains, there is meta-analytical evidence that mechanical integration outperforms judgmental integration (Kuncel, Klieger, Connelly, & Ones, 2013). This suggests relying on machine-learning algorithms for mining social media data. Along these lines, Oswald and Putka (in press) reviewed a series of innovative tools that can be used for integrating information from big data. However, most of these tools require a large sample to train the algorithm to identify and categorize the information and then apply this algorithm to score future data (e.g., from actual job applicants).
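A minimal sketch of mechanical integration, assuming a fixed rule decided in advance: each signal is standardized across candidates, and the standardized scores are combined with preset weights (unit weights by default). This is a deliberately simple stand-in for the machine-learning tools mentioned above, not the procedure used in any cited study; the signal names and scores are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize scores across candidates (mean 0, SD 1)."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        # A signal with no variance cannot differentiate candidates.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def mechanical_composite(signals, weights=None):
    """Combine signals with a fixed, pre-specified rule.

    `signals` maps a signal name to a list of candidate scores, with
    candidates in the same order in every list. `weights` maps signal
    names to weights; unit weights are assumed when it is omitted.
    """
    names = list(signals)
    if not names:
        raise ValueError("no signals given")
    if weights is None:
        weights = {name: 1.0 for name in names}
    n = len(signals[names[0]])
    if any(len(signals[name]) != n for name in names):
        raise ValueError("all signals must cover the same candidates")
    standardized = {name: zscores(signals[name]) for name in names}
    total_weight = sum(weights[name] for name in names)
    # Weighted mean of standardized signals, one value per candidate.
    return [
        sum(weights[name] * standardized[name][i] for name in names) / total_weight
        for i in range(n)
    ]


# Hypothetical signals for three candidates.
composite = mechanical_composite({
    "endorsement_count": [3, 10, 6],
    "profile_completeness": [0.4, 0.9, 0.7],
})
```

The point of the rule is that it is applied identically to every candidate; a trained algorithm would replace the preset weights with weights estimated on a separate sample.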

Two Examples

Two recent studies illustrate the possibilities of using social media information to facilitate personnel decisions. Van Iddekinge, Lanivich, Roth, and Junco (in press) asked college recruiters to review the Facebook profiles of students who were on the job market and then rate the students on several general job-related constructs (e.g., adaptability, leadership, work ethic) and perceived hireability. The researchers followed up with students 6 to 12 months later to collect supervisor ratings of job performance and turnover information. The recruiter ratings did not predict performance, turnover intentions, or actual turnover. In addition, there was some evidence of subgroup differences in recruiter ratings that favored White and female ratees. Although the study design incorporated aggregation and mechanical integration, it did not incorporate the other recommendations we discussed. Importantly, Facebook information was not tied to job-specific constructs, and recruiters were not trained on how to search for and evaluate the information. Overall, Van Iddekinge et al.’s design may be closer to how recruiters and other decision makers (e.g., managers) currently scrape social media information (although many organizations likely use even less structured approaches) than to what organizations could do to maximize the potential of such information.

Park et al. (2015) also studied the use of Facebook information. Specifically, the researchers examined personality data from over 66,000 Facebook users obtained from a third-party application of Facebook (myPersonality). As such, their approach exemplifies our recommendation to adopt some standardization in eliciting stimuli. Moreover, they relied on existing theory and research that has linked language to the Big Five personality dimensions (see our first recommendation). One part of their sample was used to teach the machine to categorize linguistic features (words, phrases, and topics) into the Big Five dimensions. The stability of these language-based analyses of personality was examined by splitting the data into various time periods (test–retest reliabilities between .62 and .74). Park et al. used the other part of the sample to validate the language-based results against self- and other reports obtained from personality questionnaires (overall convergent validity correlation was .45). Results also revealed modest correlations between the language-based personality information (extracted from Facebook) and external criteria such as life satisfaction.
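The split-sample logic described above can be sketched in a few lines. The example below is a simplified illustration, not Park et al.'s pipeline: it assumes each user already has a language-based trait score per time period, and it shows only the random split into a training half and a validation half, plus the Pearson correlation that would serve for test–retest or convergent checks. All names and numbers are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def split_sample(user_ids, train_fraction=0.5, seed=0):
    """Randomly split users into a training and a validation set."""
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Hypothetical language-based extraversion scores for the same six users
# in two time periods, plus their questionnaire self-reports.
period_1 = [0.2, 1.1, -0.4, 0.8, -1.0, 0.3]
period_2 = [0.1, 0.9, -0.1, 1.0, -0.7, 0.0]
self_report = [3.1, 4.2, 2.8, 3.9, 2.0, 3.5]

train_ids, validation_ids = split_sample(range(6))
test_retest_r = pearson(period_1, period_2)      # stability over time
convergent_r = pearson(period_1, self_report)    # agreement with self-report
```

Keeping the validation users out of the training step is what makes the convergent correlation an honest estimate rather than a fit to the same data.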

Epilogue

In this commentary, we emphasized that I-O psychologists should not stand on the sidelines and engage only in a debate concerning whether organizations should use the new talent signals derived from scraping social media or serious games. Such a debate has the risk of widening the existing science–practice gap in these domains. Indeed, it is important to realize that candidates and organizations are already jumping on these fast moving trains. So, the key pressing question is this: How can we inform or improve the way organizations gather and use these new talent signals to make decisions about people?

Therefore, we posit that I-O psychologists should play an active role in shaping conditions under which these new selection approaches can be used, thereby buttressing the quality of the data derived from them. We began this commentary by formulating several recommendations (i.e., link new signals to job-related constructs, adopt the aggregation principle, standardize the evoking and extracting of information, and use mechanical integration) that are based on knowledge derived from decades of research. These recommendations are not meant to be exhaustive; we urge researchers and practitioners to build on them and conduct much-needed empirical research. In the past, we successfully engaged in similar endeavors to formulate evidence-based recommendations for increasing the quality of the data gathered in unproctored Internet testing (e.g., Bartram, 2008; Lievens & Burke, 2011) and in competency determinations (e.g., Campion et al., 2011; Lievens & Sanchez, 2007). There is no reason why we could not do this again this time.

References

Bartram, D. (2008). The advantages and disadvantages of on-line testing. In Cartwright, S. & Cooper, C. L. (Eds.), The Oxford handbook of personnel psychology (pp. 234–260). Oxford, UK: Oxford University Press.

Bedwell, W., Pavlas, D., Heyne, K., Lazzara, E., & Salas, E. (2012). Toward a taxonomy linking game attributes to learning: An empirical study. Simulation & Gaming, 43, 729–760.

Campion, M. A., Fink, A. A., Ruggeberg, B. J., Carr, L., Phillips, G. M., & Odman, R. B. (2011). Doing competencies well: Best practices in competency modeling. Personnel Psychology, 64, 225–262.

Campion, M. A., Palmer, D. K., & Campion, J. E. (1997). A review of structure in the selection interview. Personnel Psychology, 50, 655–702.

Chamorro-Premuzic, T., Winsborough, D., Sherman, R. A., & Hogan, R. (2016). New talent signals: Shiny new objects or a brave new world? Industrial and Organizational Psychology: Perspectives on Science and Practice, 9(3), 621–640.

Connelly, B. S., & Ones, D. S. (2010). Another perspective on personality: Meta-analytic integration of observers’ accuracy and predictive validity. Psychological Bulletin, 136, 1092–1122.

Epstein, S. (1979). The stability of behavior: I. On predicting most of the people much of the time. Journal of Personality and Social Psychology, 37, 1097–1126.

Kluemper, D. H., Rosen, P. A., & Mossholder, K. W. (2012). Social networking websites, personality ratings, and the organizational context: More than meets the eye. Journal of Applied Social Psychology, 42, 1143–1172.

Kuncel, N. R., Klieger, D. M., Connelly, B. S., & Ones, D. S. (2013). Mechanical versus clinical data combination in selection and admissions decisions: A meta-analysis. Journal of Applied Psychology, 98, 1060–1072.

Lievens, F., & Burke, E. (2011). Dealing with the threats inherent in unproctored Internet testing of cognitive ability: Results from a large-scale operational test program. Journal of Occupational and Organizational Psychology, 84, 817–824.

Lievens, F., & Sanchez, J. I. (2007). Can training improve the quality of inferences made by raters in competency modeling? A quasi-experiment. Journal of Applied Psychology, 92, 812–819.

Lievens, F., Schollaert, E., & Keen, G. (2015). The interplay of elicitation and evaluation of trait-expressive behavior: Evidence in assessment center exercises. Journal of Applied Psychology, 100, 1169–1188.

McFarland, L. A., & Ployhart, R. E. (2015). Social media in organizations: A theoretical framework to guide research and practice. Journal of Applied Psychology, 100, 1653–1677.

Oh, I. S., Wang, G., & Mount, M. K. (2011). Validity of observer ratings of the five-factor model of personality: A meta-analysis. Journal of Applied Psychology, 96, 762–773.

Oswald, F. L., & Putka, D. J. (in press). Statistical methods for big data. In Tonidandel, S., King, E., & Cortina, J. (Eds.), Big data at work: The data science revolution and organizational psychology. New York, NY: Routledge.

Park, G., Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Kosinski, M., Stillwell, D. J., . . . Seligman, M. E. (2015). Automatic personality assessment through social media language. Journal of Personality and Social Psychology, 108, 934–952.

Roth, P. L., Bobko, P., Van Iddekinge, C. H., & Thatcher, J. B. (2016). Social media in employee-selection-related decisions: A research agenda for uncharted territory. Journal of Management, 42, 269–298.

Sackett, P. R., Lievens, F., Van Iddekinge, C. H., & Kuncel, N. R. (in press). Individual differences and their measurement: A review of 100 years of research. Journal of Applied Psychology.

Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel selection procedures (4th ed.). Bowling Green, OH: Author.

Taylor, P. J., Pajo, K., Cheung, G. W., & Stringfield, P. (2004). Dimensionality and validity of a structured telephone reference check procedure. Personnel Psychology, 57, 745–772.

Uniform Guidelines on Employee Selection Procedures, 43 Fed. Reg. 38,295–38,309 (Aug. 25, 1978).

Van Iddekinge, C. H., Lanivich, S. E., Roth, P. L., & Junco, E. (in press). Social media for selection? Validity and adverse impact potential of a Facebook-based assessment. Journal of Management.

Vazire, S. (2010). Who knows what about a person? The self–other knowledge asymmetry (SOKA) model. Journal of Personality and Social Psychology, 98, 281–300.

Woehr, D. J., & Arthur, W. Jr. (2003). The construct-related validity of assessment center ratings: A review and meta-analysis of the role of methodological factors. Journal of Management, 29, 231–258.
