We aimed to identify barriers for meeting the fruit, vegetable and fish guidelines in older Dutch adults and to investigate socio-economic status (SES) differences in these barriers. Furthermore, we examined the mediating role of these barriers in the association between SES and adherence to these guidelines.
Longitudinal Aging Study Amsterdam (LASA), the Netherlands.
We used data from 1057 community-dwelling adults, aged 55–85 years. SES was measured by level of education and household income. An FFQ was used to assess dietary intake and barriers were measured with a self-reported lifestyle questionnaire.
Overall, 48·9 % of the respondents perceived a barrier to adhering to the fruit guideline, 40·0 % to the vegetable guideline and 51·1 % to the fish guideline. The most frequently perceived barriers to meeting the guidelines were the high price of fruit and fish and a poor appetite for vegetables. Lower-SES groups met the guidelines less often and perceived more barriers. The association between income and adherence to the fruit guideline was mediated by ‘perceiving any barrier to meet the fruit guideline’ and the barrier ‘dislike fruit’. The association between income and adherence to the fish guideline was mediated by ‘perceiving any barrier to meet the fish guideline’ and the barrier ‘fish is expensive’.
Perceived barriers for meeting the dietary guidelines are common in older adults, especially in lower-SES groups. These barriers and in particular disliking and cost concerns explained the lower adherence to the guidelines for fruit and fish in lower-income groups in older adults.
Measurements are central to clinical practice and medical and health research. They form the basis of diagnosis, prognosis and evaluation of the results of medical interventions. Advances in diagnosis and care that were made possible, for example, by the widespread use of the Apgar scale and various imaging techniques, show the power of well-designed, appropriate measures. The key words here are ‘well-designed’ and ‘appropriate’. A decision-maker must know that the measure used is adequate for its purpose, how it compares with similar measures and how to interpret the results it produces.
For every patient or population group, there are numerous instruments that can be used to measure clinical condition or health status, and new ones are still being developed. However, in the abundance of available instruments, many have been poorly or insufficiently validated. This book primarily serves as a guide to evaluate properties of existing measurement instruments in medicine, enabling researchers and clinicians to avoid using poorly validated ones or alerting them to the need for further validation.
Validity is defined by the COSMIN panel as ‘the degree to which an instrument truly measures the construct(s) it purports to measure’ (Mokkink et al., 2010a). This definition seems to be quite simple, but there has been much discussion in the past about how validity should be assessed and how its results should be interpreted. Psychologists, in particular, have struggled with this problem, because, as we saw in Chapters 2 and 3, they often have to deal with ‘unobservable’ constructs. This makes it difficult for them to judge whether they are measuring the right thing. In general, three different types of validity can be distinguished: content validity, criterion validity and construct validity. Content validity focuses on whether the content of the instrument corresponds with the construct that one intends to measure, with regard to relevance and comprehensiveness. Criterion validity, applicable in situations in which there is a gold standard for the construct to be measured, refers to how well the scores of the measurement instrument agree with the scores on the gold standard. Construct validity, applicable in situations in which there is no gold standard, refers to whether the instrument provides the expected scores, based on existing knowledge about the construct. Within these three main types of validity, there are numerous subtypes, as we will see later in this chapter.
Systematic reviews are made for many different types of studies, such as randomized clinical trials (RCTs), observational studies and diagnostic studies. Researchers, doctors and policy-makers use the results and conclusions of systematic reviews for research purposes, development of guidelines, and evidence-based patient care and policy-making. It saves them a considerable amount of time in searching for literature, and reading and interpreting the relevant articles. For the same purposes, more and more systematic reviews of studies focusing on the measurement properties of measurement instruments are being published. The aim of such reviews is to find all the existing evidence of the properties of one or more measurement instruments, to evaluate the strength of this evidence, and come to a conclusion about the best instrument available for a particular purpose. They may also result in a recommendation for additional research.
Measuring is the cornerstone of medical research and clinical practice. Therefore, the quality of measurement instruments is crucial. This book offers tools to inform the choice of the best measurement instrument for a specific purpose, methods and criteria to support the development of new instruments, and ways to improve measurements and interpretation of their results.
With this book, we hope to show the reader, among other things,
why it is usually a bad idea to develop a new measurement instrument
that objective measures are not better than subjective measures
that Cronbach's alpha has nothing to do with validity
why valid instruments do not exist and
how to improve the reliability of measurements
The book is applicable to all medical and health fields and not directed at a specific clinical discipline. We will not provide the reader with lists of the best measurement instruments for paediatrics, cancer, dementia and so on, but rather with methods for evaluating measurement instruments and criteria for choosing the best ones. So, the focus is on the evaluation of instrument measurement properties, and on the interpretation of their scores.
The success of the Apgar score demonstrates the astounding power of an appropriate clinical instrument. This down-to-earth book provides practical advice, underpinned by theoretical principles, on developing and evaluating measurement instruments in all fields of medicine. It equips you to choose the most appropriate instrument for specific purposes. The book covers measurement theories, methods and criteria for evaluating and selecting instruments. It provides methods to assess measurement properties, such as reliability, validity and responsiveness, and interpret the results. Worked examples and end-of-chapter assignments use real data and well-known instruments to build your skills at implementation and interpretation through hands-on analysis of real-life cases. All data and solutions are available online. This is a perfect course book for students and a perfect companion for professionals/researchers in the medical and health sciences who care about the quality and meaning of the measurements they perform.
Field-testing of the measurement instrument is still part of the development phase. When a measurement instrument is considered to be satisfactory after one or more rounds of pilot-testing, it has to be applied to a large sample of the target population. The aims of this field-testing are item reduction and obtaining insight into the structure of the data, i.e. examining the dimensionality and then deciding on the definitive selection of items per dimension. These issues are only relevant for multi-item instruments that are used to measure unobservable constructs. Therefore, the focus of this chapter is purely on these measurement instruments. Other newly developed measurement instruments (e.g. single-item patient-reported outcomes (PROs)) and instruments to measure observable constructs go straight from the phase of pilot-testing to the assessment of validity, responsiveness and reliability (see Figure 3.1).
This chapter forms the backbone of the book. It deals with choices and decisions about what we measure and how we measure it. In other words, this chapter deals with the conceptual model behind the content of the measurements (what), and the methods of measurements and theories on which these are based (how). As described in Chapter 1, the scope of measurement in medicine is broad and covers many and quite different concepts. It is essential to define explicitly what we want to measure, as that is the ‘beginning of wisdom’.
In this chapter, we will introduce many new terms. An overview of these terms and their explanations is provided in Table 2.1.
Different concepts and constructs require different methods of measurement. This concerns not only the type of measurement instrument, for example an X-ray, performance test or questionnaire, but also the measurement theory underlying the measurements. Many of you may have heard of classical test theory (CTT), and some may also be familiar with item response theory (IRT). Both are measurement theories. We will explain the essentials of different measurement theories and discuss the assumptions to be made.
An essential requirement of all measurements in clinical practice and research is that they are reliable. Reliability is defined as ‘the degree to which the measurement is free from measurement error’ (Mokkink et al., 2010a). Its importance often remains unrecognized until repeated measurements are performed. To give a few examples of reliability issues: radiologists want to know whether their colleagues interpret X-rays or specific scans in the same way as they do, or whether they themselves would give the same rating if they had to assess the same X-ray twice. These are called the inter-rater and the intra-rater reliability, respectively. Repeated measurements of fasting blood glucose levels in patients with diabetes may differ due to day-to-day variation or to the instruments used to determine the blood glucose level. These sources of variation play a role in test–retest reliability. In a pilot study, we are interested in the extent of agreement between two physiotherapists who assess the range of movement in a shoulder, so that we can decide whether or not their ratings can be used interchangeably in the main study. The findings of such performance tests may differ for several reasons. For example, patients may perform the second test differently because of their experience with the first test, the physiotherapists may score the same performance differently or the instructions given by one physiotherapist may motivate the patients more than the instructions given by the other physiotherapist.
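As an illustration of the inter-rater question raised above, the sketch below computes Cohen's kappa, one common chance-corrected agreement statistic for two raters assigning categorical ratings. The choice of kappa is an assumption made for this example, not something the chapter summary prescribes, and the ratings, the function name `cohens_kappa` and the variable names are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same cases."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("Both raters must rate the same non-empty set of cases.")

    n = len(ratings_a)

    # Proportion of cases on which the two raters give the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / n**2

    if expected == 1.0:
        # Both raters used one and the same category throughout:
        # kappa is undefined in that case.
        return float("nan")

    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of ten X-rays by two radiologists.
rater_1 = ["normal"] * 6 + ["abnormal"] * 4
rater_2 = ["normal"] * 5 + ["abnormal"] * 4 + ["normal"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this made-up example the raters agree on 8 of 10 cases, but because chance alone would produce agreement on about half of them, the chance-corrected value is noticeably lower than the raw 80 %.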
Technical developments and advances in medical knowledge mean that new measurement instruments are still appearing in all fields of medicine. Think about recent developments such as functional MRI and DNA microarrays. Furthermore, existing instruments are continuously being refined and existing technologies are being applied beyond their original domains. The current attention to patient-oriented medicine has shifted interest from pathophysiological measurements to impact on functioning, perceived health and quality of life (QOL). Patient-reported outcomes (PROs) have therefore gained importance in medical research.
It is clear that the measurement instruments used in various medical disciplines differ greatly from each other. Therefore, it is evident that details of the development of measurement instruments must be specific to each discipline. However, from a methodological viewpoint, the basic steps in the development of all these measurement instruments are the same. Moreover, basic requirements with regard to measurement properties, which have to be considered in evaluating the adequacy of a new instrument, are similar for all measurement instruments. Chapters 3 and 4 are written from the viewpoint of developers of measurement instruments. When describing the different steps we have the development of PROs in mind. However, at various points in this chapter we will give examples to show analogies with other measurement instruments in medicine.
The ultimate goal of medicine is to cure patients. Therefore, assessing whether the disease status of patients has changed over time is often the most important objective of measurements in clinical practice and clinical and health research. In Section 3.2.3, we stated that we need measurement instruments with an evaluative purpose or application to detect changes in health status over time. These instruments should be responsive. Responsiveness is defined by the COSMIN panel as ‘the ability of an instrument to detect change over time in the construct to be measured’ (Mokkink et al., 2010a). In essence, when assessing responsiveness the hypothesis is tested that if patients change on the construct of interest, their scores on the measurement instrument assessing this construct change accordingly. The approach to assessing responsiveness is quite similar to that for validity, as we will show in this chapter. In Section 7.2, we will start by elaborating a bit more on the concept of responsiveness. We will discuss the relationship between responsiveness and validity, taking responsiveness as an aspect of validity, in a longitudinal context. We will also elaborate on the definition of responsiveness and the impact of this definition on the assessment of responsiveness.
After addressing the development of measurement instruments in Chapters 3 and 4 and evaluating measurement properties (i.e. reliability, validity and responsiveness) in Chapters 5–7, it is time to pay attention to the interpretability of the scores when applying the measurement instruments. For well-known instruments, such as blood pressure measurements and the Apgar score, the interpretability will cause no problems, but for new or lesser known instruments this may be challenging. This particularly applies to the scores for multi-item measurement instruments, the meaning of which is not immediately clear. For example, in a randomized trial on back pain carried out in the United Kingdom, the effectiveness of exercise therapy and manipulation was compared with usual care in 1334 patients with low back pain. The researchers used the Roland–Morris Disability Questionnaire (RDQ) to assess functional disability (UK BEAM trial team, 2004). The RDQ has a 0–24-point scale, with a score of 0 indicating no disability, and 24 indicating very severe disability. The mean baseline score for the patients with low back pain was 9.0. In the group who received usual care, the mean RDQ value decreased to 6.8 after 3 months, resulting in an average improvement of 2.2 points. This gives rise to the following questions: What does a mean value of 9.0 points on the 0–24 RDQ scale mean? In addition, is an improvement of 2.2 points meaningful for the patients? The primary focus of this chapter is on the interpretability of scores and change scores on a measurement instrument. In other words, the aim is to learn more about the measurement instrument, and not about the disease under study.
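The arithmetic behind the RDQ example can be sketched as follows. The means (9.0 and 6.8) and the 0–24 scale come from the text above. Re-expressing the change as a percentage of the baseline mean and of the scale range is an assumption added for illustration only, and the function name `describe_change` is hypothetical. These figures restate the change; they do not by themselves settle whether it is meaningful to patients.

```python
def describe_change(baseline_mean, follow_up_mean, scale_min, scale_max):
    """Re-express a mean improvement on a scale where lower scores are better."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min.")
    if baseline_mean == 0:
        raise ValueError("Percentage of baseline is undefined for a baseline of 0.")

    # Improvement = decrease in score, since 0 means no disability on the RDQ.
    change = baseline_mean - follow_up_mean

    return {
        "absolute_change": change,
        "percent_of_baseline": 100.0 * change / baseline_mean,
        "percent_of_scale_range": 100.0 * change / (scale_max - scale_min),
    }


# Values reported for the usual-care group in the text (RDQ, 0-24 scale).
summary = describe_change(baseline_mean=9.0, follow_up_mean=6.8,
                          scale_min=0, scale_max=24)

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

The same 2.2-point improvement looks larger as a share of the baseline mean than as a share of the full scale range, which is one reason why the interpretation of change scores needs dedicated methods.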
A decline in everyday cognitive functioning is important for diagnosing dementia. Informant questionnaires, such as the informant questionnaire on cognitive decline in the elderly (IQCODE), are used to measure this. Previously, conflicting results on the IQCODE's ability to discriminate between Alzheimer's disease (AD), mild cognitive impairment (MCI), and cognitively healthy elderly were found. We aim to investigate whether specific groups of items are more useful than others in discriminating between these patient groups. Informants of 180 AD, 59 MCI, and 89 patients with subjective memory complaints (SMC) completed the IQCODE. To investigate the grouping of questionnaire items, we used a two-dimensional graded response model (GRM). The association between IQCODE, age, gender, education, and diagnosis was modeled using structural equation modeling. The GRM with two groups of items fitted better than the unidimensional model. However, the high correlation between the dimensions (r=.90) suggested unidimensionality. The structural model showed that the IQCODE was able to differentiate between all patient groups. The IQCODE can be considered as unidimensional and as a useful addition to diagnostic screening in a memory clinic setting, as it was able to distinguish between AD, MCI, and SMC and was not influenced by gender or education. (JINS, 2011, 17, 674–681)