
Reducing the Noise From Scraping Social Media Content: Some Evidence-Based Recommendations

Published online by Cambridge University Press:  07 October 2016

Filip Lievens*
Department of Personnel Management and Work and Organizational Psychology, Ghent University, Ghent, Belgium
Chad H. Van Iddekinge
College of Business, Florida State University
Correspondence concerning this article should be addressed to Filip Lievens, Department of Personnel Management and Work and Organizational Psychology, Ghent University, Henri Dunantlaan 2, 9000 Ghent, Belgium. E-mail:



Copyright © Society for Industrial and Organizational Psychology 2016 

Chamorro-Premuzic, Winsborough, Sherman, and Hogan (2016) describe a variety of new selection approaches (e.g., “scraping” of social media information, gamified assessments) in the staffing domain that might provide new sources of information about people. The authors also mention advantages and downsides of these potentially “new talent signals.”

We suggest that the next step is to identify conditions under which these new approaches might be best used to increase their probability of providing accurate job-related information on candidates’ knowledge, skills, abilities, and other characteristics (KSAOs). Although there has been little scientific research on these new assessment methods, we posit that some guidance may be found in the over 100 years of research on personnel selection. This makes sense because, as noted in the focal article, these new tools are mainly technologically enhanced versions of traditional assessment methods. Therefore, we draw on existing personnel selection knowledge to delineate a set of general recommendations to make these new talent signals less weak and “noisy” (i.e., more reliable and valid). We focus primarily on scraping social media information, although we show how some of these recommendations may be relevant for gamified assessment as well.

Evidence-Based Recommendations

Establish Links Between New Talent Signals and Job-Related Constructs

New talent signals from people's involvement in social media refer to the content posted on social media. Examples include comments posted on Facebook or Twitter, preferences for brands mentioned on Facebook pages, comments on other people's messages, likes, and membership in specific groups. As job relatedness is a cardinal rule to ensure validity and fairness, it is pivotal to map these new signals to job-related constructs (Society for Industrial and Organizational Psychology, 2003; Uniform Guidelines on Employee Selection Procedures, 1978). To this end, job analysis methods, subject matter judgments, and theoretical models can be used. Essentially, this endeavor is similar to earlier research that mapped interviewee answers, biodata responses, or assessment center observations onto job-related constructs. Afterward, the validity of the new signals can be ascertained by correlating them with more common measures of the same constructs or with criteria the new signals should predict.

So, we recommend that organizations determine beforehand which signals are indicators of well-known individual differences such as cognitive ability, knowledge, interests, personality, or motivation (Sackett, Lievens, Van Iddekinge, & Kuncel, in press). Along these lines, Roth, Bobko, Van Iddekinge, and Thatcher (2016) provided a good start in describing how social media content might be mapped to existing taxonomies of individual differences (KSAOs). Clearly, our recommendation to establish evidence-based or theory-based links between these new talent signals and job-related constructs applies not only to information gathered via social media but also to candidate actions in serious games (see Bedwell, Pavlas, Heyne, Lazzara, & Salas, 2012).

Adopt the Principle of Aggregation

Once job-related signals have been gathered, aggregation is a second cardinal principle to further reduce the noise of these job-related signals. We adopt the aggregation principle (Epstein, 1979) for reliability purposes almost everywhere in psychology. For instance, we use multiple items in scales, rely on multiple raters (e.g., interviewers), and combine data from several observations to form an overall evaluation.
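
The reliability gain from aggregation can be illustrated with the Spearman-Brown formula, a standard psychometric result that the commentary itself does not cite. As a sketch, assume k parallel signals that each have reliability ρ (both symbols are introduced here purely for illustration); the reliability of their aggregate, ρ_k, is then:

```latex
\rho_k = \frac{k\,\rho}{1 + (k - 1)\,\rho}
```

For instance, under these assumptions ten signals that each have a reliability of only .20 would yield an aggregate reliability of 2.0 / 2.8 ≈ .71. Real social media signals are unlikely to be strictly parallel, so this is best read as an idealized upper-level illustration of why aggregation reduces noise.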

In the context of scraping social media content, it is equally important to make sure that judgments about candidates’ standing on KSAOs are made on the basis of multiple signals. The consistency among these multiple, job-related social media signals can then be determined as a way of assessing their reliability (McFarland & Ployhart, 2015). For example, the consistency among the information captured can be examined across time or across different signals thought to reflect the same job-related construct.
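
One common way to quantify consistency across signals thought to reflect the same construct is Cronbach's alpha. The following Python sketch is offered only as an illustration; the commentary does not prescribe a specific index, and the signal names and scores below are hypothetical, assuming each signal has already been coded numerically for every candidate.

```python
from statistics import variance


def cronbach_alpha(signals):
    """Return Cronbach's alpha for k signals scored on the same candidates.

    `signals` is a list of k lists; signals[j][i] is candidate i's score
    on signal j (hypothetical, already numerically coded).
    """
    k = len(signals)
    if k < 2:
        raise ValueError("need at least two signals")
    n = len(signals[0])
    if n < 2 or any(len(s) != n for s in signals):
        raise ValueError("each signal needs the same candidates (at least two)")

    # Sum of the sample variances of the individual signals.
    sum_item_var = sum(variance(s) for s in signals)
    # Variance of each candidate's total score across all signals.
    totals = [sum(scores) for scores in zip(*signals)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Hypothetical data: three coded signals for five candidates.
likes = [2, 4, 3, 5, 1]
group_memberships = [1, 4, 3, 4, 2]
endorsements = [2, 5, 2, 5, 1]
alpha = cronbach_alpha([likes, group_memberships, endorsements])
print(f"alpha = {alpha:.2f}")
```

A low value would suggest that the signals do not hang together well enough to be aggregated into a single construct score; consistency across time could be checked analogously by treating scores from different time periods as the "signals."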

A specific example of such aggregation consists of combining information from different information sources. For instance, some social media content comes from self-reported information, whereas other information on social media is provided by others (e.g., endorsements and comments). Given that self- and other reports each have their respective benefits and weaknesses (Vazire, 2010), and that other reports have been found to add incremental predictive validity beyond self-reports in the personality domain (Connelly & Ones, 2010; Oh, Wang, & Mount, 2011), scraping social media for both sources of information might be beneficial. For example, data from self and others could be correlated to assess the validity of new talent signals.

Standardize the Evoking and Extracting of Information

Using talent signals derived from social media is not without risks (Roth et al., 2016). For example, social media platforms often lack standardization and might provide abundant information for some people and no information for others. Another major challenge is that social media such as Facebook routinely provide personal information that organizations are not allowed to use for decision making, such as race, religion, political affiliation, and sexual orientation. Once decision makers are exposed to such information (or to characteristics such as physical appearance and disability status, which can be apparent from pictures), it may be difficult for them to ignore it. These issues make it harder for raters and/or machines to make sense of that information, for example when comparing job candidates.

To reduce information processing and decision making biases, we suggest taking a page out of the book of structured interviews (e.g., Campion, Palmer, & Campion, 1997), structured references (e.g., Taylor, Pajo, Cheung, & Stringfield, 2004), assessment centers (e.g., Woehr & Arthur, 2003), and research on judgment and decision making in general. In these literatures, two main approaches have emerged to increase standardization. One approach consists of standardizing the questions and prompts given to candidates. Clearly, this is a challenge in the context of social media because these platforms are typically designed to facilitate social interaction rather than to evoke job-related information (Roth et al., 2016). This is especially the case with platforms such as Facebook, Twitter, and Google+. However, a professional and career-related social network such as LinkedIn is different because it has fixed rubrics that people are asked to complete. In addition, references can be given on LinkedIn.

Whereas standardizing the assessment of social media information could be challenging, this is less the case for other new talent signals from gamified assessment. In such gamified assessment, for instance, one might elicit relevant behavior because serious games provide a variety of fixed and dynamic stimuli to evoke job-related information (Bedwell et al., 2012). Although people will not follow the same path through the game due to its adaptive and interactive nature, the general idea is to build in more generic stimuli (Lievens, Schollaert, & Keen, 2015). For example, such generic stimuli might refer to dealing with ambiguity, conflict, scarce resources, or stress. Different rounds (levels) in a game might be built around one of these challenges and provide multiple data points of how candidates deal with them.

Another well-known standardization approach relates to providing training and other aids to raters who must extract and evaluate the information available. Again, there exists a rich literature in the interview and assessment center domains of training raters and equipping them with various aids. We urge industrial–organizational (I-O) psychologists to rely on this extensive knowledge base and devise training programs and rating tools to help raters (e.g., staffing professionals, managers) identify and evaluate relevant signals when screening social media information (Kluemper, Rosen, & Mossholder, 2012; McFarland & Ployhart, 2015). At the same time, there have also been rapid advancements in the automated scoring of texts, interviews, videos, and work samples (see below).

Use Mechanical Integration

Our suggestion to ensure that multiple signals are gathered from social media raises the question as to how these signals should be integrated to form a final judgment regarding candidates’ KSAOs. Generally, two broad approaches can be distinguished: (a) judgmental/clinical integration via raters and (b) mechanical/statistical integration via algorithms. In the selection and educational domains, there is meta-analytical evidence that mechanical integration outperforms judgmental integration (Kuncel, Klieger, Connelly, & Ones, 2013). This suggests relying on machine-learning algorithms for mining social media data. Along these lines, Oswald and Putka (in press) reviewed a series of innovative tools that can be used for integrating information from big data. However, most of these tools require a large sample to train the algorithm to identify and categorize the information and then apply this algorithm to score future data (e.g., from actual job applicants).
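
To make the idea of mechanical integration concrete, the Python sketch below applies one fixed rule to every candidate: each signal is standardized and the standardized scores are averaged with equal (unit) weights. Unit weighting is chosen here only because it needs no training sample; it is a simple stand-in, not one of the machine-learning tools reviewed by Oswald and Putka. The signal names and scores are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z-scores across candidates."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A signal that does not vary carries no information about rank order.
        return [0.0 for _ in values]
    return [(v - m) / sd for v in values]


def mechanical_composite(signals, weights=None):
    """Combine signals with one fixed rule applied identically to everyone.

    `signals` maps a signal name to a list of scores (one per candidate).
    `weights` maps the same names to positive weights; equal weights
    (unit weighting) are used when none are given.
    """
    names = list(signals)
    if weights is None:
        weights = {name: 1.0 for name in names}
    z = {name: standardize(signals[name]) for name in names}
    n_candidates = len(signals[names[0]])
    total_weight = sum(weights[name] for name in names)
    return [
        sum(weights[name] * z[name][i] for name in names) / total_weight
        for i in range(n_candidates)
    ]


# Hypothetical coded signals for four candidates.
signals = {
    "self_described_skills": [3, 5, 2, 4],
    "peer_endorsements": [10, 40, 5, 20],
}
composite = mechanical_composite(signals)
# Candidate indices ordered from highest to lowest composite score.
ranking = sorted(range(len(composite)), key=lambda i: composite[i], reverse=True)
print(ranking)
```

The point of the sketch is that the combination rule is written down in advance and applied without case-by-case judgment. Where a sufficiently large training sample is available, the fixed weights could instead be estimated from data.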

Two Examples

Two recent studies illustrate the possibilities of using social media information to facilitate personnel decisions. Van Iddekinge, Lanivich, Roth, and Junco (in press) asked college recruiters to review the Facebook profiles of students who were on the job market and then rate the students on several general job-related constructs (e.g., adaptability, leadership, work ethic) and perceived hireability. The researchers followed up with students 6 to 12 months later to collect supervisor ratings of job performance and turnover information. The recruiter ratings did not predict performance, turnover intentions, or actual turnover. In addition, there was some evidence of subgroup differences in recruiter ratings that favored White and female ratees. Although the study design incorporated aggregation and mechanical integration, it did not incorporate the other recommendations we discussed. Importantly, Facebook information was not tied to job-specific constructs, and recruiters were not trained on how to search for and evaluate the information. Overall, Van Iddekinge et al.’s design may be closer to how recruiters and other decision makers (e.g., managers) currently scrape social media information than to what organizations could do to maximize the value of such information, although many organizations likely use even less structured approaches.

Park et al. (2015) also studied the use of Facebook information. Specifically, the researchers examined personality data from over 66,000 Facebook users obtained from a third-party application of Facebook (myPersonality). As such, their approach exemplifies our recommendation to adopt some standardization in eliciting stimuli. Moreover, they relied on existing theory and research that has linked language to the Big Five personality dimensions (see our first recommendation). One part of their sample was used to teach the machine to categorize linguistic features (words, phrases, and topics) into the Big Five dimensions. The stability of these language-based analyses of personality was examined by splitting the data into various time periods (test–retest reliabilities between .62 and .74). Park et al. used the other part of the sample to validate the language-based results against self- and other reports obtained from personality questionnaires (overall convergent validity correlation was .45). Results also revealed modest correlations between the language-based personality information (extracted from Facebook) and external criteria such as life satisfaction.
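
The split-sample logic described above (learn on one part of the sample, validate on the other) can be sketched in a few lines of Python. This is a deliberately simple stand-in and does not reproduce Park et al.'s actual modeling: each language feature is weighted by its correlation with the questionnaire score in the training part, and the resulting language-based scores are then correlated with questionnaire scores in the held-out part. All feature names and numbers are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def split_sample_validity(features, criterion, n_train):
    """Learn feature weights on the first n_train users, validate on the rest.

    `features` is a list of columns (one list of values per language feature);
    `criterion` holds the questionnaire-based trait score for each user.
    """
    train_cols = [col[:n_train] for col in features]
    holdout_cols = [col[n_train:] for col in features]

    # Training step: weight each feature by its correlation with the criterion.
    weights = [pearson_r(col, criterion[:n_train]) for col in train_cols]

    # Scoring step: weighted sum of features for each held-out user.
    n_holdout = len(criterion) - n_train
    predicted = [
        sum(w * col[i] for w, col in zip(weights, holdout_cols))
        for i in range(n_holdout)
    ]

    # Validation step: convergent correlation in the held-out part only.
    return pearson_r(predicted, criterion[n_train:])


# Hypothetical relative word frequencies and self-reported extraversion.
freq_party = [1, 3, 2, 4, 2, 4, 1, 3]
freq_alone = [4, 2, 3, 1, 3, 1, 4, 2]
extraversion = [2, 4, 3, 5, 3, 4, 2, 5]
validity = split_sample_validity([freq_party, freq_alone], extraversion, n_train=4)
print(f"held-out convergent validity: r = {validity:.2f}")
```

Keeping the validation users out of the training step is what makes the resulting correlation an honest estimate; evaluating on the same users used to learn the weights would overstate validity.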


In this commentary, we emphasized that I-O psychologists should not stand on the sidelines and engage only in a debate concerning whether organizations should use the new talent signals derived from scraping social media or serious games. Such a debate has the risk of widening the existing science–practice gap in these domains. Indeed, it is important to realize that candidates and organizations are already jumping on these fast moving trains. So, the key pressing question is this: How can we inform or improve the way organizations gather and use these new talent signals to make decisions about people?

Therefore, we posit that I-O psychologists should play an active role in shaping conditions under which these new selection approaches can be used, thereby buttressing the quality of the data derived from them. We began this commentary by formulating several recommendations (i.e., link new signals to job-related constructs, adopt the aggregation principle, standardize the evoking and extracting of information, and use mechanical integration) that are based on knowledge derived from decades of research. These recommendations are not meant to be exhaustive; we urge researchers and practitioners to build on them and conduct much-needed empirical research. In the past, we successfully engaged in similar endeavors to formulate evidence-based recommendations for increasing the quality of the data gathered in unproctored Internet testing (e.g., Bartram, 2008; Lievens & Burke, 2011) and in competency determinations (e.g., Campion et al., 2011; Lievens & Sanchez, 2007). There is no reason why we could not do so again.


References

Bartram, D. (2008). The advantages and disadvantages of on-line testing. In Cartwright, S. & Cooper, C. L. (Eds.), The Oxford handbook of personnel psychology (pp. 234–260). Oxford, UK: Oxford University Press.
Bedwell, W., Pavlas, D., Heyne, K., Lazzara, E., & Salas, E. (2012). Toward a taxonomy linking game attributes to learning: An empirical study. Simulation & Gaming, 43, 729–760.
Campion, M. A., Fink, A. A., Ruggeberg, B. J., Carr, L., Phillips, G. M., & Odman, R. B. (2011). Doing competencies well: Best practices in competency modeling. Personnel Psychology, 64, 225–262.
Campion, M. A., Palmer, D. K., & Campion, J. E. (1997). A review of structure in the selection interview. Personnel Psychology, 50, 655–702.
Chamorro-Premuzic, T., Winsborough, D., Sherman, R. A., & Hogan, R. (2016). New talent signals: Shiny new objects or a brave new world? Industrial and Organizational Psychology: Perspectives on Science and Practice, 9(3), 621–640.
Connelly, B. S., & Ones, D. S. (2010). Another perspective on personality: Meta-analytic integration of observers’ accuracy and predictive validity. Psychological Bulletin, 136, 1092–1122.
Epstein, S. (1979). The stability of behavior: I. On predicting most of the people much of the time. Journal of Personality and Social Psychology, 37, 1097–1126.
Kluemper, D. H., Rosen, P. A., & Mossholder, K. W. (2012). Social networking websites, personality ratings, and the organizational context: More than meets the eye. Journal of Applied Social Psychology, 42, 1143–1172.
Kuncel, N. R., Klieger, D. M., Connelly, B. S., & Ones, D. S. (2013). Mechanical versus clinical data combination in selection and admissions decisions: A meta-analysis. Journal of Applied Psychology, 98, 1060–1072.
Lievens, F., & Burke, E. (2011). Dealing with the threats inherent in unproctored Internet testing of cognitive ability: Results from a large-scale operational test program. Journal of Occupational and Organizational Psychology, 84, 817–824.
Lievens, F., & Sanchez, J. I. (2007). Can training improve the quality of inferences made by raters in competency modeling? A quasi-experiment. Journal of Applied Psychology, 92, 812–819.
Lievens, F., Schollaert, E., & Keen, G. (2015). The interplay of elicitation and evaluation of trait-expressive behavior: Evidence in assessment center exercises. Journal of Applied Psychology, 100, 1169–1188.
McFarland, L. A., & Ployhart, R. E. (2015). Social media in organizations: A theoretical framework to guide research and practice. Journal of Applied Psychology, 100, 1653–1677.
Oh, I. S., Wang, G., & Mount, M. K. (2011). Validity of observer ratings of the five-factor model of personality: A meta-analysis. Journal of Applied Psychology, 96, 762–773.
Oswald, F. L., & Putka, D. J. (in press). Statistical methods for big data. In Tonidandel, S., King, E., & Cortina, J. (Eds.), Big data at work: The data science revolution and organizational psychology. New York, NY: Routledge.
Park, G., Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Kosinski, M., Stillwell, D. J., . . . Seligman, M. E. (2015). Automatic personality assessment through social media language. Journal of Personality and Social Psychology, 108, 934–952.
Roth, P. L., Bobko, P., Van Iddekinge, C. H., & Thatcher, J. B. (2016). Social media in employee-selection-related decisions: A research agenda for uncharted territory. Journal of Management, 42, 269–298.
Sackett, P. R., Lievens, F., Van Iddekinge, C. H., & Kuncel, N. R. (in press). Individual differences and their measurement: A review of 100 years of research. Journal of Applied Psychology.
Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel selection procedures (4th ed.). Bowling Green, OH: Author.
Taylor, P. J., Pajo, K., Cheung, G. W., & Stringfield, P. (2004). Dimensionality and validity of a structured telephone reference check procedure. Personnel Psychology, 57, 745–772.
Uniform Guidelines on Employee Selection Procedures, 43 Fed. Reg. 38,295–38,309 (Aug. 25, 1978).
Van Iddekinge, C. H., Lanivich, S. E., Roth, P. L., & Junco, E. (in press). Social media for selection? Validity and adverse impact potential of a Facebook-based assessment. Journal of Management.
Vazire, S. (2010). Who knows what about a person? The self–other knowledge asymmetry (SOKA) model. Journal of Personality and Social Psychology, 98, 281–300.
Woehr, D. J., & Arthur, W. Jr. (2003). The construct-related validity of assessment center ratings: A review and meta-analysis of the role of methodological factors. Journal of Management, 29, 231–258.