Measurement is an essential activity in neurology, as it allows for collecting and sharing data that can be used for description, comparison, and decision making regarding the health status of patients. Motor and functional signs and symptoms of movement disorders must be assessed with instruments that have been developed and tested following a standardized methodology. The validation of a scale or instrument is an iterative process that includes several phases and the testing of a number of psychometric properties following the principles of Classical Test Theory or Latent Trait Theory, each with its own methods and statistical procedures. In this chapter, we review the characteristics and psychometric properties of the main measurement instruments and scales for assessing motor and functional symptoms in movement disorders, particularly those recommended by the Movement Disorders Society.
This methodological study aimed to adapt the DLS, originally developed for individuals aged 18–60 years, for use with those aged 60 years and older, and to determine its psychometric properties.
Methods
We collected the data between December 15, 2021 and April 18, 2022. We carried out the study with a sample of individuals aged 60 years and older living in the city center of Burdur, Turkey, selected using snowball sampling, a non-probability sampling technique. We collected the data using a questionnaire booklet comprising an 11-item demographic information form and the DLS, and analyzed them with reliability and validity analyses. The analyses were performed in SPSS 23.0, and a P value < 0.05 was considered statistically significant.
Results
The mean age of the participants was 68.29 years (SD = 6.36). The 61-item measurement tool was reduced to 57 items after 4 items were removed. Cronbach’s α was 0.936 for the mitigation/prevention subscale, 0.935 for the preparedness subscale, 0.939 for the response subscale, and 0.945 for the recovery/rehabilitation subscale.
Conclusions
As adapted in this study, the DLS-S can be validly and reliably used for individuals aged 60 years and older.
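The subscale reliabilities reported above are Cronbach’s α values, which can be computed directly from a respondents × items score matrix. Here is a minimal sketch of that computation; the function name and the random illustrative data are ours, not the study’s:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative only: random responses have no internal consistency,
# so alpha will be near zero here.
rng = np.random.default_rng(0)
print(cronbach_alpha(rng.integers(1, 6, size=(200, 14))))
```

In practice α would be computed separately on the items of each subscale, as the subscale-level values above suggest.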
The last two decades have been marked by excitement for measuring implicit attitudes and implicit biases, as well as optimism that new technologies have made this possible. Despite considerable attention, this movement is hampered by weak measures. Current implicit measures do not have the psychometric properties needed to meet the standards required for psychological assessment or necessary for reliable criterion prediction. Some of the creativity that defines this approach has also introduced measures with unusual properties that constrain their applications and limit interpretations. We illustrate these problems by summarizing our research using the Implicit Association Test (IAT) as a case study to reveal the challenges these measures face. We consider such issues as reliability, validity, model misspecification, sources of both random and systematic method variance, as well as unusual and arbitrary properties of the IAT’s metric and scoring algorithm. We then review and critique four new interpretations of the IAT that have been advanced to defend the measure and its properties. We conclude that the IAT is not a viable measure of individual differences in biases or attitudes. Efforts to prove otherwise have diverted resources and attention, limiting progress in the scientific study of racism and bias.
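The concerns about the IAT’s metric and scoring algorithm are easier to follow with the scoring idea in view. Below is a deliberately simplified sketch of the D-score; the published algorithm (Greenwald et al., 2003) adds trial-level exclusions and practice/test block pairing, so treat this as an illustration of the core quantity only:

```python
import numpy as np

def d_score_simplified(rt_compatible, rt_incompatible):
    """Simplified IAT D-score: the mean latency difference between
    incompatible and compatible blocks, divided by the standard
    deviation pooled over all trials in both blocks."""
    a = np.asarray(rt_compatible, dtype=float)
    b = np.asarray(rt_incompatible, dtype=float)
    pooled_sd = np.concatenate([a, b]).std(ddof=1)
    return (b.mean() - a.mean()) / pooled_sd
```

This within-person standardization is one of the algorithmic choices the critique above takes aim at.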
In this paper, I will review some aspects of psychometric projects that I have been involved in, emphasizing the nature of the work of the psychometricians involved, especially the balance between the statistical and scientific elements of that work. The intent is to seek to understand where psychometrics, as a discipline, has been and where it might be headed, in part at least, by considering one particular journey (my own). In contemplating this, I also look to psychometrics journals to see how psychometricians represent themselves to themselves, and in a complementary way, look to substantive journals to see how psychometrics is represented there (or perhaps not represented, as the case may be). I present a series of questions to consider what the appropriate foci of the psychometric discipline should be. As an example, I present one recent project at the end, where the roles of the psychometricians and the substantive researchers have had to become intertwined in order to make satisfactory progress. In the conclusion I discuss the consequences of such a view for the future of psychometrics.
This paper analyzes the theoretical, pragmatic, and substantive factors that have hampered the integration between psychology and psychometrics. Theoretical factors include the operationalist mode of thinking which is common throughout psychology, the dominance of classical test theory, and the use of “construct validity” as a catch-all category for a range of challenging psychometric problems. Pragmatic factors include the lack of interest in mathematically precise thinking in psychology, inadequate representation of psychometric modeling in major statistics programs, and insufficient mathematical training in the psychological curriculum. Substantive factors relate to the absence of psychological theories that are sufficiently strong to motivate the structure of psychometric models. Following the identification of these problems, a number of promising recent developments are discussed, and suggestions are made to further the integration of psychology and psychometrics.
Human abilities in perceptual domains have conventionally been described with reference to a threshold that may be defined as the maximum amount of stimulation which leads to baseline performance. Traditional psychometric links, such as the probit, logit, and t, are incompatible with a threshold as there are no true scores corresponding to baseline performance. We introduce a truncated probit link for modeling thresholds and develop a two-parameter IRT model based on this link. The model is Bayesian and analysis is performed with MCMC sampling. Through simulation, we show that the model provides for accurate measurement of performance with thresholds. The model is applied to a digit-classification experiment in which digits are briefly flashed and then masked. Using parameter estimates from the model, individuals’ thresholds for flashed-digit discrimination are estimated.
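The key idea of the truncated link can be sketched concretely. Assuming a two-choice task where chance accuracy is .5 (our reading; names and parameter values are illustrative), truncating the linear predictor at zero maps every ability at or below the threshold to Φ(0) = .5, which is exactly the baseline-performance region that ordinary probit or logit links cannot represent:

```python
import numpy as np
from scipy.stats import norm

def truncated_probit_p(theta, a, b):
    """Success probability under a truncated probit link for a
    two-choice task: the predictor a * (theta - b) is truncated at 0,
    so abilities at or below the threshold b give Phi(0) = .5,
    i.e. baseline (chance) performance."""
    eta = np.maximum(0.0, a * (np.asarray(theta, dtype=float) - b))
    return norm.cdf(eta)

# Flat at .5 below the threshold b = 0.5, rising above it:
print(truncated_probit_p(theta=[-1.0, 0.5, 1.5], a=1.2, b=0.5))
```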
Borsboom (2006) attacks psychologists for failing to incorporate psychometric advances in their work, discusses factors that contribute to this regrettable situation, and offers suggestions for ameliorating it. This commentary applauds Borsboom for calling the field to task on this issue and notes additional problems in the field regarding measurement that he could add to his critique. It also chastises Borsboom for occasionally being unnecessarily pejorative in his critique, noting that negative rhetoric is unlikely to make converts of offenders. Finally, it exhorts psychometricians to make their work more accessible and points to Borsboom, Mellenbergh, and Van Heerden (2003) as an excellent example of how this can be done.
Educational assessment concerns inference about students' knowledge, skills, and accomplishments. Because data are never so comprehensive and unequivocal as to ensure certitude, test theory evolved in part to address questions of weight, coverage, and import of data. The resulting concepts and techniques can be viewed as applications of more general principles for inference in the presence of uncertainty. Issues of evidence and inference in educational assessment are discussed from this perspective.
A taxonomy of latent structure assumptions (LSAs) for probability matrix decomposition (PMD) models is proposed which includes the original PMD model (Maris, De Boeck, & Van Mechelen, 1996) as well as a three-way extension of the multiple classification latent class model (Maris, 1999). It is shown that PMD models involving different LSAs are actually restricted latent class models with latent variables that depend on some external variables. For parameter estimation a combined approach is proposed that uses both a mode-finding algorithm (EM) and a sampling-based approach (Gibbs sampling). A simulation study is conducted to investigate the extent to which information criteria, specific model checks, and checks for global goodness of fit may help to specify the basic assumptions of the different PMD models. Finally, an application is described with models involving different latent structure assumptions for data on hostile behavior in frustrating situations.
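As background for readers less familiar with the latent class machinery the taxonomy builds on, here is a minimal EM sketch for an unrestricted two-class latent class model with binary items; the PMD models discussed above impose additional structure on top of this, and all names here are illustrative:

```python
import numpy as np

def lca_em(X, n_classes=2, n_iter=200, seed=0):
    """EM for a basic latent class model with binary items:
    estimates class weights pi[c] and item-endorsement
    probabilities p[c, j] from a respondents x items 0/1 matrix."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    pi = np.full(n_classes, 1.0 / n_classes)
    p = rng.uniform(0.25, 0.75, size=(n_classes, X.shape[1]))
    for _ in range(n_iter):
        # E-step: posterior class membership probabilities
        log_post = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T + np.log(pi)
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update class weights and item probabilities
        pi = post.mean(axis=0)
        p = np.clip((post.T @ X) / post.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)
    return pi, p
```

A Gibbs-sampling analogue would instead draw class memberships and parameters from their full conditionals, which is the combined estimation approach the abstract describes.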
The Psychometric Society is “devoted to the development of Psychology as a quantitative rational science”. Engineering is often set in contradistinction with science; art is sometimes considered different from science. Why, then, juxtapose the words in the title: psychometric, engineering, and art? Because an important aspect of quantitative psychology is problem-solving, and engineering solves problems. And an essential aspect of a good solution is beauty—hence, art. In overview and with examples, this presentation describes activities that are quantitative psychology as engineering and art—that is, as design. Extended illustrations involve systems for scoring tests in realistic contexts. Allusions are made to other examples that extend the conception of quantitative psychology as engineering and art across a wider range of psychometric activities.
Designers rely on many methods and strategies to create innovative designs. However, design research often overlooks the personality and attitudinal factors influencing method utility and effectiveness. This article defines and operationalizes the construct design mindset and introduces the Design Mindset Inventory (D-Mindset0.1), allowing us to measure and leverage statistical analyses to advance our understanding of its role in design. The inventory’s validity and reliability are evaluated by analyzing a large sample of engineering students (N = 473). Using factor analysis, we identified four underlying factors of D-Mindset0.1 related to the theoretical concepts: Conversation with the Situation, Iteration, Co-Evolution of Problem–Solution and Imagination. The latter part of the article finds statistically and theoretically meaningful relationships between design mindset and the three design-related constructs of sensation-seeking, self-efficacy and ambiguity tolerance. Ambiguity tolerance and self-efficacy emerge as positively correlated with design mindset. Sensation-seeking, which is only significantly correlated with subconstructs of D-Mindset0.1, is both negatively and positively correlated. These relationships lend validity to D-Mindset0.1 and, by drawing on previously established relationships between the three personality traits and specific behaviors, facilitate further investigations of what its subconstructs capture.
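For readers who want to see the shape of such an analysis, a minimal sketch of a four-factor exploratory factor analysis follows; the data here are random placeholders, not the D-Mindset0.1 items, and scikit-learn’s FactorAnalysis is just one of several reasonable tools:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Placeholder respondents x items matrix (not the actual inventory data).
rng = np.random.default_rng(0)
X = rng.normal(size=(473, 20))

fa = FactorAnalysis(n_components=4, rotation="varimax")
fa.fit(X)
loadings = fa.components_.T        # items x factors loading matrix
print(np.round(loadings[:5], 2))   # inspect which items load on which factor
```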
With the increased use of computer-based tests in clinical and research settings, assessing the retest reliability and reliable change of the NIH Toolbox-Cognition Battery (NIHTB-CB) and the Cogstate Brief Battery (Cogstate) is essential. Previous studies used mostly White samples, but Black/African Americans (B/AAs) must be included in this research to ensure that these measures are reliable for this population.
Method:
Participants were B/AA adults aged 60–85 years, consensus-confirmed as healthy controls (HCs; n = 49) or as having mild cognitive impairment (MCI; n = 34), who completed the NIHTB-CB and Cogstate for laptop at two timepoints within 4 months. Intraclass correlations, the Bland-Altman method, t-tests, and the Pearson correlation coefficient were used. Cut scores indicating reliable change are provided.
Results:
NIHTB-CB composite reliability ranged from .81 to .93 (95% CIs [.37–.96]). The Fluid Composite demonstrated a significant difference between timepoints and was less consistent than the Crystallized Composite. Subtests were less consistent for MCIs (ICCs = .01–.89, CIs [−1.00 to .95]) than for HCs (ICCs = .69–.93, CIs [.46–.92]). A moderate correlation was found for MCIs between timepoints and performance on the Total Composite (r = −.40, p = .03), Fluid Composite (r = −.38, p = .03), and Pattern Comparison Processing Speed (r = −.47, p = .006).
On Cogstate, HCs had lower reliability (ICCs = .47–.76, CIs [.05–.86]) than MCIs (ICCs = .65–.89, CIs [.29–.95]). Identification reaction time significantly improved between testing timepoints across samples.
Conclusions:
The NIHTB-CB and Cogstate for laptop show promise for use in research with B/AAs and were reasonably stable up to 4 months. Still, differences were found between those with MCI and HCs. It is recommended that race and cognitive status be considered when using these measures.
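The reliable-change cut scores mentioned in the Method are commonly derived from a reliable change index. A minimal sketch of the classic Jacobson–Truax form follows; this is one common approach, not necessarily the exact variant used in the study, and the values in the example are hypothetical:

```python
import math

def reliable_change_index(x1, x2, sd_baseline, r_xx):
    """Jacobson-Truax RCI: the retest difference divided by the
    standard error of the difference, built from the baseline SD
    and the test-retest reliability r_xx. |RCI| > 1.96 is the
    conventional threshold for reliable change."""
    sem = sd_baseline * math.sqrt(1 - r_xx)   # standard error of measurement
    se_diff = math.sqrt(2) * sem              # SE of the difference score
    return (x2 - x1) / se_diff

print(reliable_change_index(x1=100, x2=108, sd_baseline=10, r_xx=0.81))  # ~1.30
```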
Adequate measurement of psychological phenomena is a fundamental aspect of theory construction and validation. Forming composite scales from individual items has a long and honored tradition, although, for predictive purposes, the power of using individual items should be considered. We outline several fundamental steps in the scale construction process, including (1) choosing between prediction and explanation; (2) specifying the construct(s) to measure; (3) choosing items thought to measure these constructs; (4) administering the items; (5) examining the structure and properties of composites of items (scales); (6) forming, scoring, and examining the scales; and (7) validating the resulting scales.
In this chapter we review advanced psychometric methods for examining the validity of self-report measures of attitudes, beliefs, personality style, and other social psychological and personality constructs that rely on introspection. The methods include confirmatory-factor analysis to examine whether measurements can be interpreted as meaningful continua, and measurement invariance analysis to examine whether items are answered the same way in different groups of people. We illustrate the methods using a measure of individual differences in openness to political pluralism, which includes four conceptual facets. To understand how the facets relate to the overall dimension of openness to political pluralism, we compare a second-order factor model and a bifactor model. We also check to see whether the psychometric patterns of item responses are the same for males and females. These psychometric methods can both document the quality of obtained measurements and inform theorists about nuances of their constructs.
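The contrast between the two models compared above can be written compactly. As a sketch in generic notation (not the authors’ exact specification):

```latex
% Second-order model: item j loads only on its facet factor F_{k(j)},
% and the facets in turn load on the general factor G:
x_{j} = \lambda_{j}\, F_{k(j)} + \varepsilon_{j},
\qquad F_{k} = \gamma_{k}\, G + \zeta_{k}.
% Bifactor model: item j loads directly on G and on one specific
% facet factor S_{k(j)}, with G orthogonal to every S_{k}:
x_{j} = \lambda^{G}_{j}\, G + \lambda^{S}_{j}\, S_{k(j)} + \varepsilon_{j}.
```

In the second-order model the general factor influences items only through the facets; the bifactor model lets the general and facet factors compete directly for item variance, which is what makes the comparison informative about the construct’s structure.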
This study evaluated the validity and reliability of the Persian version of the Disaster Resilience Measuring Tool (DRMT-C19).
Methods
The research was a methodological, psychometric study. Standard translation processes were performed. Face validity and content validity were determined along with construct and convergent validity. To determine the final version of the questionnaire, 483 health care rescuers were selected using a consecutive sampling method. Other resilience-related questionnaires were used to assess concurrent validity. All quantitative data analyses were conducted using SPSS 22 and Jamovi 2.3.28 software.
Results
Content validity was indicated by a Scale Content Validity Ratio (S-CVR) of 0.92 and a Scale Content Validity Index (S-CVI) of 0.93, and the comprehensiveness of the measurement tool was 0.875. Cronbach’s α was 0.89, and test–retest reliability, assessed with intraclass correlation coefficients (ICCs), ranged from 0.68 to 0.92. Exploratory factor analysis identified 4 factors that accounted for 58.54% of the variance among the items. Confirmatory factor analysis determined 12 factors. The concurrent validity between the DRMT-C19 and the Connor-Davidson Resilience Scale (CD-RISC) was r = 0.604 (P ≤ 0.0001).
Conclusions
The DRMT-C19 has satisfactory psychometric properties and is a valid, reliable, and valuable tool for assessing resilience against disasters in Iran’s Persian-speaking health care rescuers.
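The S-CVR and S-CVI reported above are built from expert ratings of the items. As a minimal sketch of the standard Lawshe CVR and an items-averaged CVI (the counts below are hypothetical, and the exact aggregation used in the study may differ):

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR for one item: (n_e - N/2) / (N/2), where n_e is
    the number of experts rating the item 'essential' out of N."""
    return (n_essential - n_experts / 2) / (n_experts / 2)

def scale_cvi_average(item_relevance_counts, n_experts):
    """S-CVI/Ave: the mean over items of the proportion of experts
    rating each item relevant."""
    props = [c / n_experts for c in item_relevance_counts]
    return sum(props) / len(props)

# Hypothetical panel of 10 experts rating 4 items:
print(content_validity_ratio(n_essential=9, n_experts=10))   # 0.8
print(scale_cvi_average([10, 9, 9, 8], n_experts=10))        # 0.9
```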
In light of the growing threat of climate change and the urgency of mitigation at the societal and individual level, an exponentially growing body of research has addressed how and what people think about climate change—ranging from basic judgments of truth and attitudes about risk to predictions of future outcomes. However, the field is also beset by a striking variety of items and scales used to measure climate change beliefs, with notable differences in content, untested structural assumptions, and unsatisfactory or unknown psychometric properties. In a series of four studies (total N = 2,678), scales for the assessment of climate change beliefs are developed that are comprehensive and balanced in content and psychometrically sound. The latent construct structure is tested, and evidence of high rank-order stability (1-year retest reliability) and predictive validity (for policy preferences and actual behavior) is provided.
This study examined the validity of a visual inspection time (IT) task as a measure of processing speed (PS) in a sample of children with and without cerebral palsy (CP). IT tasks measure the speed of visual processing without relying on the speed of the motor response used to indicate decisions about stimulus properties.
Methods:
Participants were 113 children ages 8–16, including 45 with congenital CP, and 68 typically developing peers. Measures were a standard visual IT task that required dual key responding and a modified version using an assistive technology button with response option scanning. Performance on these measures was examined against traditional Wechsler PS measures (Coding, Symbol Search).
Results:
IT performance shared considerable variance with traditional paper-pencil PS measures for the group with CP, but not necessarily in the typically developing group. Concurrent validity was found for both IT task versions with traditional PS measures in the group with CP. IT classification accuracy for lowered PS showed modest sensitivity and good specificity particularly for the modified IT task.
Conclusions:
As measures of PS in children with CP who are unable to validly participate in traditional PS tasks, IT tasks demonstrate adequate concurrent validity and may serve as a beneficial alternative measure of PS in this population.
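The sensitivity and specificity figures above reduce to the usual 2 × 2 classification quantities; a minimal sketch with hypothetical counts (not the study’s data):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts for flagging lowered processing speed:
sens, spec = sensitivity_specificity(tp=12, fn=8, tn=70, fp=10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # 0.60, 0.88
```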
This chapter considers the role of neuropsychology in the diagnostic process. It covers who can undertake a neuropsychological assessment, when to undertake an assessment, and some of the assumptions underlying neuropsychological assessment. Basic psychometrics are covered, using the premise that understanding a few basic concepts is sufficient for most practitioners, as more complex ideas are developed from these basics. This includes the normal distribution, different types of average, the standard deviation, and the correlation. Next, the relationship between different types of metrics is discussed, focusing on IQ/Index scores, T-scores, scaled scores, and percentiles.
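The metrics listed at the end of that summary all share a z-score backbone, so conversion between them is mechanical. A minimal sketch using the conventional scalings (IQ/Index: mean 100, SD 15; T-score: 50, 10; scaled score: 10, 3; the percentile assumes normality):

```python
from scipy.stats import norm

def convert_standard_score(score, mean=100.0, sd=15.0):
    """Convert a score on one metric (default IQ/Index: 100, 15) to the
    other common neuropsychological metrics via the z-score."""
    z = (score - mean) / sd
    return {
        "z": z,
        "IQ/Index": 100 + 15 * z,
        "T-score": 50 + 10 * z,
        "scaled": 10 + 3 * z,
        "percentile": 100 * norm.cdf(z),  # assumes a normal distribution
    }

# One SD below the mean: T = 40, scaled = 7, roughly the 16th percentile.
print(convert_standard_score(85))
```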
Adult attention-deficit hyperactivity disorder (ADHD) clinics are in their infancy in Ireland and internationally. There is an urgent need for clinical evaluation of these services. Until now, clinical outcomes have relied mainly on functional scales and/or quality of life. However, adult ADHD is a longstanding disorder with many comorbidities. Although medication for ADHD symptoms can have immediate effects, co-occurring problems may take considerably longer to remediate.
Aims
To present the psychometrics of a short outcome measure of key clinical areas including symptoms.
Method
The ADHD Clinical Outcome Scale (ACOS), developed by the authors, is a clinician-rated scale and was administered to consecutive adults attending an ADHD clinic. A modified version was completed by the participant. A second clinician independently administered the scale in a subsample. The ACOS consists of 15 items rated on a Likert scale. Two self-report scales, the Adult ADHD Quality of Life Questionnaire (AAQoL) and the Weiss Functional Impairment Rating Scale (WFIRS), were also administered.
Results
The mean age of 148 participants was 30.1 years (s.d. = 9.71), and 81 were female (54.7%). The correlation for interrater reliability was r = 0.868, and that between the participant and clinician versions was r = 0.663. The intraclass correlation coefficient for the internal consistency was 0.829, and the correlations for concurrent validity with total AAQoL and WFIRS scores were r = −0.573 and r = 0.477, respectively. Factor analysis revealed four factors: (a) attentional/organisational problems; (b) hyperactivity/impulsivity; (c) comorbidities; and (d) alcohol/drug use, self-harm and tension in relationships.
Conclusions
The psychometrics of the ACOS are promising, and the inclusion of typically co-occurring clinical domains makes it suitable for use as a clinician-rated outcome measure in every contact with patients attending adult ADHD clinics.
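Since the results above report both Pearson correlations and an intraclass correlation coefficient, it may help to see how an ICC is computed. Here is a sketch of ICC(3,1) (two-way mixed, consistency) from a subjects × raters matrix; this is one of several ICC variants, and the abstract does not state which was used:

```python
import numpy as np

def icc_3_1(ratings):
    """ICC(3,1): (MS_rows - MS_error) / (MS_rows + (k - 1) * MS_error),
    from the two-way ANOVA decomposition of a subjects x raters matrix."""
    Y = np.asarray(ratings, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    row_means = Y.mean(axis=1)   # per-subject means
    col_means = Y.mean(axis=0)   # per-rater means
    ms_rows = k * ((row_means - grand) ** 2).sum() / (n - 1)
    resid = Y - row_means[:, None] - col_means[None, :] + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
```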
Whether the recent rise in adolescent self-reported depressive symptoms is influenced by changing reporting behavior is much debated. Most studies use observed sum scores to document trends but fail to assess whether their measures are invariant across time, a prerequisite for meaningful inferences about change. We examined whether measurement noninvariance, indicative of changing perceptions and reporting of symptoms, may influence the assessment of time trends in adolescent depressive symptoms.
Methods
Data stem from the nationwide repeated cross-sectional Ungdata-surveys (2010–2019) of 560 712 responses from adolescents aged 13 to 19 years. Depressive symptoms were measured with the Kandel and Davies' six-item Depressive Mood Inventory. Using structural equation modeling, we examined measurement invariance across time, gender and age, and estimated the consequences of noninvariance on cross-cohort time trends.
Results
Across most conditions, the instrument was found to be measurement invariant across time. The few noninvariant parameters detected had negligible impact on trend estimates. From 2014, latent mean depressive symptom scores increased among girls. For boys, a U-shaped pattern was detected, whereby an initial decrease in symptoms was followed by an increase from 2016. Larger issues of noninvariance were found across age in girls and between genders.
Conclusions
From a measurement perspective, the notion that changed reporting of symptoms has been an important driver of secular trends in depressive symptoms was not supported. Thus, other causes of these trends should be considered. However, noninvariance across age (in girls) and gender highlights that depressive symptoms are not necessarily perceived equivalently from early to late adolescence and across gender.
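The invariance levels such studies test form a nested hierarchy, which can be sketched in generic notation (not the authors’ exact model):

```latex
% Measurement model for item j in cohort/group g:
x_{jg} = \tau_{jg} + \lambda_{jg}\,\eta_{g} + \varepsilon_{jg}
% Configural invariance: the same factor pattern holds in every g.
% Metric invariance:     \lambda_{jg} = \lambda_{j} \ \text{for all } g.
% Scalar invariance:     additionally \tau_{jg} = \tau_{j} \ \text{for all } g;
% only then are comparisons of latent means across cohorts meaningful.
```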