Objectives
This study aims to validate the Palliative and Complex Chronic Pediatric Patients QoL Inventory (PACOPED QL), a new quality-of-life (QoL) assessment tool for pediatric palliative patients with complex chronic conditions. The goal is to create a comprehensive, inclusive instrument tailored to this population, addressing a gap left by existing tools that do not meet its specific needs.
Methods
The validation process included a literature review and consultations with experts. A pilot study refined the items, followed by a cross-sectional study involving pediatric palliative patients and their caregivers. Statistical analyses included Cronbach’s alpha for internal consistency and exploratory factor analysis for structural validity.
Results
The PACOPED QL, comprising 50 items across 8 domains and 6 subdomains, demonstrated strong reliability with Cronbach’s alpha and Guttman split-half reliability both exceeding .9. Validity assessments confirmed its suitability for children with complex illnesses. The tool was refined through expert consultations and pilot testing, reducing items from an initial 85 to a final 50, ensuring relevance and clarity.
Significance of results
The PACOPED QL shows strong reliability and validity in assessing QoL in pediatric palliative patients. Its comprehensive structure makes it a promising tool for clinical practice and research, addressing a critical need for a tailored assessment in this population. The instrument’s robust psychometric properties indicate its potential to improve QoL assessment and care for children with life-threatening illnesses. Further studies are encouraged to confirm its effectiveness across various settings.
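As an illustration of the reliability statistics reported in the Results above, here is a minimal computational sketch, assuming item responses stored as a respondents-by-items NumPy array; the data, the odd/even split, and all variable names are hypothetical, not taken from the study.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def guttman_split_half(scores):
    """Guttman (lambda-4) split-half coefficient for an odd/even item split."""
    half1 = scores[:, 0::2].sum(axis=1)
    half2 = scores[:, 1::2].sum(axis=1)
    c = np.cov(half1, half2, ddof=1)
    total_var = c[0, 0] + c[1, 1] + 2 * c[0, 1]
    return 2 * (1 - (c[0, 0] + c[1, 1]) / total_var)

# Hypothetical data: 200 respondents, 50 items sharing a common factor.
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
responses = common + rng.normal(size=(200, 50))
print(cronbach_alpha(responses), guttman_split_half(responses))
```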
A recent debate on implicit measures of racial attitudes has focused on the relative roles of the person, the situation, and their interaction in determining measurement outcomes. The chapter describes process models for separating stable person-related components of implicit measures from components attributable to the situation and to the person-situation interaction. Latent state-trait models allow one to assess the extent to which the measure reliably reflects the person and/or the situation and the person-situation interaction (Steyer, Geiser, & Fiege, 2012). Moreover, trait factor scores as well as situation-specific residual factor scores can be computed and related to third variables, allowing one to assess the extent to which the implicit measure validly reflects the person and/or the situation and the person-situation interaction. These methods are particularly helpful when combined with a process decomposition of implicit-measure data, such as a diffusion-model analysis of the IAT (Klauer, Voss, Schmitz, & Teige-Mocigemba, 2007).
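As a hedged sketch of the decomposition such models rest on (generic latent state-trait notation assumed here, not necessarily the chapter's exact specification), an observed score Y_{it} of person i on occasion t is split into a latent trait, a latent occasion-specific (situation and person-situation interaction) residual, and measurement error:

\[
Y_{it} = \alpha_{it} + \lambda_{it}\,\xi + \delta_{it}\,\zeta_t + \varepsilon_{it},
\]

so that reliability decomposes into a consistency (trait) share and an occasion-specificity (situation and interaction) share:

\[
\mathrm{Rel}(Y_{it}) = \frac{\lambda_{it}^2\,\mathrm{Var}(\xi)}{\mathrm{Var}(Y_{it})} + \frac{\delta_{it}^2\,\mathrm{Var}(\zeta_t)}{\mathrm{Var}(Y_{it})}.
\]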
In psychophysiology, an interesting question is how to estimate the reliability of event-related potentials collected by means of the Eriksen Flanker Task or similar tests. A special problem arises when the data represent neurological reactions associated with some responses (in the case of the Flanker Task, incorrect responses on a trial) but not others (such as correct responses), which inherently results in unequal numbers of observations per subject. The general trend in reliability research here is to use generalizability theory and Bayesian estimation. We show that a new approach based on classical test theory and frequentist estimation can do the job as well and in a simpler way, and it even provides additional insight into matters left unresolved in the generalizability approach. One of our contributions is the definition of a single, overall reliability coefficient for an entire group of subjects with unequal numbers of observations. The two methods have slightly different objectives. We argue in favor of the classical approach, but without rejecting the generalizability approach.
A coefficient derived from communalities of test parts has been proposed as a greatest lower bound to Guttman's “immediate retest reliability.” The communalities have at times been calculated from covariances between item sets, a practice that tends to underestimate appreciably. When items are experimentally independent, a consistent estimate of the greatest defensible internal-consistency coefficient is obtained by factoring item covariances. In samples of modest size, this analysis capitalizes on chance; an estimate subject to less upward bias is suggested. For estimating alternate-forms reliability, communality-based coefficients are less appropriate than stratified alpha.
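For reference, stratified alpha for a test partitioned into strata k = 1, ..., K (with stratum-score variances \sigma_{X_k}^2, stratum alphas \alpha_k, and total-score variance \sigma_X^2; notation assumed here) is usually written as

\[
\alpha_{\text{strat}} = 1 - \frac{\sum_{k=1}^{K} \sigma_{X_k}^2\,(1 - \alpha_k)}{\sigma_X^2}.
\]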
Corrections of correlations for range restriction (i.e., selection) and unreliability are common in psychometric work. The current rule of thumb for determining the order in which to apply these corrections looks to the nature of the reliability estimate (i.e., restricted or unrestricted). While intuitive, this rule of thumb is untenable when the correction includes the variable upon which selection is made, as is generally the case. Using classical test theory, we show that it is the nature of the range restriction, not the nature of the available reliability coefficient, that determines the sequence for applying corrections for range restriction and unreliability.
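For concreteness, the two corrections in question are, in their textbook forms (notation assumed here: r_{xy} is the restricted observed correlation, u = S_x/s_x the ratio of unrestricted to restricted standard deviations on the selection variable x, and r_{xx}, r_{yy} the reliabilities),

\[
r_{T_x T_y} = \frac{r_{xy}}{\sqrt{r_{xx}\,r_{yy}}}
\qquad\text{and}\qquad
R_{xy} = \frac{u\,r_{xy}}{\sqrt{1 - r_{xy}^2 + u^2 r_{xy}^2}},
\]

the correction for attenuation and the classical correction for direct range restriction on x; the paper's point concerns the order in which these two operations are applied.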
The asymptotic standard deviation (SD) of the alpha coefficient with standardized variables is derived under normality. The results show that the SD of the standardized alpha coefficient becomes smaller as the number of examinees and/or items increases. Furthermore, the degree to which the SD depends on the number of items is a function of the average correlation among items. When the average correlation approaches 1, the SD of the alpha coefficient decreases rapidly as the number of items increases, with the order of p. On the other hand, when the items are only weakly correlated, increasing the number of items decreases the SD of the alpha coefficient at a much slower rate.
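For reference, the standardized alpha coefficient for p items with average inter-item correlation \bar{r} (notation assumed here) is

\[
\alpha_{\text{std}} = \frac{p\,\bar{r}}{1 + (p - 1)\,\bar{r}},
\]

and it is the asymptotic standard deviation of the sample version of this quantity, under normality, whose dependence on p and \bar{r} the paper characterizes.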
An asymptotic expression for the reliability of a linearly equated test is developed using normal theory. The reliability is expressed as the product of two terms, the reliability of the test before equating, and an adjustment term. This adjustment term is a function of the sample sizes used to estimate the linear equating transformation. The results of a simulation study indicate close agreement between the theoretical and simulated reliability values for samples greater than 200. Findings demonstrate that samples as small as 300 can be used in linear equating without an appreciable decrease in reliability.
It is argued that the generalizability theory interpretation of coefficient alpha is important. In this interpretation, alpha is a slightly biased but consistent estimate of the coefficient of generalizability in a subjects x items design where both subjects and items are randomly sampled. This interpretation is based on “domain sampling” true scores. It is argued that these true scores have a more solid empirical basis than the true scores of Lord and Novick (1968), which are based on “stochastic subjects” (Holland, 1990), even though only a single observation is available for each within-subject distribution. Therefore, the generalizability interpretation of coefficient alpha is to be preferred, unless the true scores can be defined by a latent variable model that has undisputed empirical validity for the test and that is sufficiently restrictive to entail a consistent estimate of the reliability (as, for example, McDonald’s omega). If this model implies that the items are essentially tau-equivalent, both the generalizability and the reliability interpretation of alpha can be defensible.
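For reference, for k items with item variances \sigma_i^2 and total-score variance \sigma_X^2, coefficient alpha is

\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right),
\]

which equals the reliability of the total score under essential tau-equivalence, the condition invoked in the final sentence above.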
To date, virtually all techniques appropriate for ordinal data are based on the uniform probability distribution over the permutations. In this paper we introduce and examine an alternative probability model for the distribution of ordinal data. Preliminary to deriving the expectations of Spearman's rho and Kendall's tau under this model, we show how to compute certain conditional expectations of rho and tau under the uniform distribution. The alternative probability model is then applied to ordinal test theory, and the calculation of true scores and test reliability are discussed.
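A minimal sketch of computing the two rank statistics mentioned above with standard library routines; these implementations, like most, rest on the usual uniform-permutation model, and it is their expectations under the alternative model that the paper derives. The data are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

# Hypothetical ordinal ratings from two raters on the same ten objects.
rater_a = np.array([1, 2, 2, 3, 4, 4, 5, 5, 6, 7])
rater_b = np.array([2, 1, 3, 3, 4, 5, 5, 6, 6, 7])

rho, rho_p = spearmanr(rater_a, rater_b)    # Spearman's rho and its p-value
tau, tau_p = kendalltau(rater_a, rater_b)   # Kendall's tau-b and its p-value

print(f"rho = {rho:.3f} (p = {rho_p:.3f}), tau = {tau:.3f} (p = {tau_p:.3f})")
```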
Change scores obtained in pretest–posttest designs are important for evaluating treatment effectiveness and for assessing change of individual test scores in psychological research. However, over the years the use of change scores has raised much controversy. In this article, from a multilevel perspective, we provide a structured treatise on several persistent negative beliefs about change scores and show that these beliefs originated from the confounding of the effects of within-person change on change-score reliability and between-person change differences. We argue that psychometric properties of change scores, such as reliability and measurement precision, should be treated at suitable levels within a multilevel framework. We show that, if examined at the suitable levels with such a framework, the negative beliefs about change scores can be renounced convincingly. Finally, we summarize the conclusions about change scores to dispel the myths and to promote the potential and practical usefulness of change scores.
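The classical single-level result behind much of the controversy (notation assumed here: pretest X and posttest Y with reliabilities \rho_{XX'} and \rho_{YY'} and correlation \rho_{XY}) gives the reliability of the difference score D = Y - X as

\[
\rho_{DD'} = \frac{\sigma_X^2\,\rho_{XX'} + \sigma_Y^2\,\rho_{YY'} - 2\,\rho_{XY}\,\sigma_X\,\sigma_Y}{\sigma_X^2 + \sigma_Y^2 - 2\,\rho_{XY}\,\sigma_X\,\sigma_Y},
\]

which can be low even for reliable tests when X and Y are highly correlated; the article argues that interpreting such single-level summaries without a multilevel framework confounds within-person change with between-person differences in change.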
In the last decade several authors have discussed the so-called minimum trace factor analysis (MTFA), which provides the greatest lower bound (g.l.b.) to reliability. However, the MTFA fails to be scale free. In this paper we propose to solve the scale problem by maximizing the g.l.b. as a function of the weights. Closely related to the primal problem of g.l.b. maximization is the dual problem. We investigate the primal and dual problems using convex analysis techniques. The asymptotic distribution of the maximal g.l.b. is obtained, provided the population covariance matrix satisfies some uniqueness and regularity assumptions. Finally, we outline computational algorithms and consider numerical examples.
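A minimal computational sketch of the unweighted g.l.b. via minimum trace factor analysis, posed as a semidefinite program; it assumes the cvxpy package with an SDP-capable solver (such as SCS) and a sample covariance matrix, and it ignores the scale/weighting issue that the paper addresses.

```python
import numpy as np
import cvxpy as cp

def greatest_lower_bound(cov):
    """g.l.b. to reliability via minimum trace factor analysis.

    Maximizes the total unique variance theta subject to the constraint
    that cov - diag(theta) remains positive semidefinite.
    """
    cov = (cov + cov.T) / 2                      # enforce exact symmetry
    p = cov.shape[0]
    theta = cp.Variable(p, nonneg=True)
    constraints = [cov - cp.diag(theta) >> 0]
    cp.Problem(cp.Maximize(cp.sum(theta)), constraints).solve()
    return 1.0 - float(np.sum(theta.value)) / float(np.sum(cov))

# Hypothetical 5-variable covariance matrix with one shared factor.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5)) + rng.normal(size=(500, 1))
print(greatest_lower_bound(np.cov(X, rowvar=False)))
```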
In theory, the greatest lower bound (g.l.b.) to reliability is the best possible lower bound to the reliability based on a single test administration. Yet the practical use of the g.l.b. has been severely hindered by sampling bias problems. It is well known that the g.l.b. based on small samples (even a sample of one thousand subjects is not generally enough) may severely overestimate the population value, and a statistical treatment of the bias has been sorely lacking. The only results obtained so far concern the asymptotic variance of the g.l.b. and of its numerator (the maximum possible error variance of a test), based on first-order derivatives and the assumption of multivariate normality. The present paper extends these results by offering explicit expressions for the second-order derivatives. This yields a closed-form expression for the asymptotic bias of both the g.l.b. and its numerator, under the assumptions that the rank of the reduced covariance matrix is at or above the Ledermann bound and that the nonnegativity constraints on the diagonal elements of the matrix of unique variances are inactive. It is also shown that, when the reduced rank is at its highest possible value (i.e., the number of variables minus one), the numerator of the g.l.b. is asymptotically unbiased, and the asymptotic bias of the g.l.b. is negative. The latter results are contrary to common belief, but apply only to cases where the number of variables is small. The asymptotic results are illustrated by numerical examples.
To assess the reliability of congeneric tests, specifically designed reliability measures have been proposed. This paper emphasizes that such measures rely on a unidimensionality hypothesis, which can neither be confirmed nor rejected when there are only three test parts and will invariably be rejected when there are more than three test parts. Jackson and Agunwamba's (1977) greatest lower bound to reliability is proposed instead. Although this bound has a reputation for overestimating the population value when the sample size is small, this is no reason to prefer unidimensionality-based reliability. First, the sampling bias problem of the glb does not play a role when the number of test parts is small, as is often the case with congeneric measures. Second, the glb and unidimensionality-based reliability are often equal when there are three test parts, and when there are more test parts, their numerical values are still very similar. To the extent that the bias problem of the greatest lower bound does play a role, unidimensionality-based reliability is equally affected. Although unidimensionality and reliability are often thought of as unrelated, this paper shows that, from at least two perspectives, they act as antagonistic concepts. A measure, based on the same framework that led to the greatest lower bound, is discussed for assessing how close a set of variables is to unidimensionality. It is the percentage of common variance that can be explained by a single factor. An empirical example is given to demonstrate the main points of the paper.
In this rejoinder, we examine some of the issues raised by Peter Bentler, Eunseong Cho, and Jules Ellis. We suggest a methodologically solid way to construct a test, indicating that the importance of the particular reliability method used is minor, and we discuss future topics in reliability research.
Social scientists are frequently interested in assessing the qualities of social settings such as classrooms, schools, neighborhoods, or day care centers. The most common procedure requires observers to rate social interactions within these settings on multiple items and then to combine the item responses to obtain a summary measure of setting quality. A key aspect of the quality of such a summary measure is its reliability. In this paper we derive a confidence interval for reliability, a test for the hypothesis that the reliability meets a minimum standard, and the power of this test against alternative hypotheses. Next, we consider the problem of using data from a preliminary field study of the measurement procedure to inform the design of a later study that will test substantive hypotheses about the correlates of setting quality. The preliminary study is typically called the “generalizability study” or “G study” while the later, substantive study is called the “decision study” or “D study.” We show how to use data from the G study to estimate reliability, a confidence interval for the reliability, and the power of tests for the reliability of measurement produced under alternative designs for the D study. We conclude with a discussion of sample size requirements for G studies.
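As a hedged illustration of the kind of D-study reliability at stake, assume a fully crossed settings x items x raters design with variance components \sigma_s^2, \sigma_{si}^2, \sigma_{sr}^2, and residual \sigma_e^2 estimated in the G study (the design and notation are assumptions for illustration, not necessarily the authors' specification). The reliability of a setting-mean score based on n_i items and n_r raters then takes the familiar generalizability-coefficient form

\[
\lambda = \frac{\sigma_s^2}{\sigma_s^2 + \dfrac{\sigma_{si}^2}{n_i} + \dfrac{\sigma_{sr}^2}{n_r} + \dfrac{\sigma_e^2}{n_i n_r}},
\]

so the G-study estimates of the variance components drive both the point estimate of reliability and, via their sampling distributions, the confidence interval and power calculations described above.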
Maximum likelihood and Bayesian procedures for item selection and scoring of multidimensional adaptive tests are presented. A demonstration using simulated response data illustrates that multidimensional adaptive testing (MAT) can provide equal or higher reliabilities with about one-third fewer items than are required by one-dimensional adaptive testing (OAT). Furthermore, holding test length constant across the MAT and OAT approaches, substantial improvements in reliability can be obtained from multidimensional assessment. A number of issues relating to the operational use of multidimensional adaptive testing are discussed.
Coefficient κ is generally defined in terms of procedures of computation rather than in terms of a population. Here a population definition is proposed. On this basis, the interpretation of κ as a measure of diagnostic reliability in characterizing an individual, and the effect of reliability, as measured by κ, on estimation bias, precision, and test power are examined. Factors influencing the magnitude of κ are identified. Strategies to improve reliability are proposed, including that of combining multiple unreliable diagnoses.
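In population terms (notation assumed here), with p_o the probability that two diagnoses of the same individual agree and p_e the agreement probability expected by chance,

\[
\kappa = \frac{p_o - p_e}{1 - p_e},
\]

so that \kappa = 1 under perfect agreement and \kappa = 0 when agreement is no better than chance.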
A method is presented for estimating reliability using structural equation modeling (SEM) that allows for nonlinearity between factors and item scores. Assuming the focus is on consistency of summed item scores, this method for estimating reliability is preferred to those based on linear SEM models and to the most commonly reported estimate of reliability, coefficient alpha.
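For comparison, the summed-score reliability implied by a linear one-factor SEM, coefficient omega with loadings \lambda_i and unique variances \theta_{ii} (this is the kind of linear-SEM estimate the nonlinear method is contrasted with), is

\[
\omega = \frac{\bigl(\sum_i \lambda_i\bigr)^2}{\bigl(\sum_i \lambda_i\bigr)^2 + \sum_i \theta_{ii}}.
\]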
Reliability captures the influence of error on a measurement and, in the classical setting, is defined as one minus the ratio of the error variance to the total variance. Laenen, Alonso, and Molenberghs (Psychometrika 73:443–448, 2007) proposed an axiomatic definition of reliability and introduced the RT coefficient, a measure of reliability extending the classical approach to a more general longitudinal scenario. The RT coefficient can be interpreted as the average reliability over different time points and can also be calculated for each time point separately. In this paper, we introduce a new and complementary measure, the so-called RΛ, which implies a new way of thinking about reliability. In a longitudinal context, each measurement brings additional knowledge and leads to more reliable information. The RΛ captures this intuitive idea and expresses the reliability of the entire longitudinal sequence, in contrast to an average or occasion-specific measure. We study the measure’s properties using both theoretical arguments and simulations, establish its connections with previous proposals, and elucidate its performance in a real case study.
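In symbols, the classical definition recalled above is

\[
R = 1 - \frac{\sigma_e^2}{\sigma_{\text{total}}^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_e^2},
\]

with \sigma_T^2 the true-score variance and \sigma_e^2 the error variance; R_T averages such occasion-specific ratios over time points, whereas R_\Lambda is meant to express the reliability of the entire longitudinal sequence at once.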
One of the intriguing questions of factor analysis is the extent to which one can reduce the rank of a symmetric matrix by changing only its diagonal entries. We show in this paper that the set of matrices which can be reduced to rank r has positive (Lebesgue) measure if and only if r is greater than or equal to the Ledermann bound. In other words, the Ledermann bound is shown to be almost surely the greatest lower bound to a reduced rank of the sample covariance matrix. We then propose an asymptotic sampling theory of so-called minimum trace factor analysis (MTFA). The theory is based on continuity and differentiability properties of functions involved in the MTFA. Convex analysis techniques are utilized to obtain conditions for the differentiability of these functions.
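For reference, for p observed variables the Ledermann bound is

\[
L(p) = \frac{2p + 1 - \sqrt{8p + 1}}{2},
\]

the value of r at which the parameter count of an r-factor model (loadings plus unique variances, minus rotational freedom) matches the p(p+1)/2 distinct elements of the covariance matrix.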