Although early detection of dementia has emerged as an important public health priority, dementia often remains undiagnosed in primary care settings until symptoms are moderate or severe (Callahan et al., 1995; Knopman et al., 2000; Larson, 1998; Valcour et al., 2000). The lack of efficient and effective methods to identify early dementia contributes to primary care physicians' (PCPs') uncertainty in diagnosing dementia (Bond et al., 2005; von Hout et al., 2000). Developing efficient and cost-effective strategies for use in primary care is challenging. As currently formulated, screening and diagnosis programs for dementia require substantial financial and human resources (Boustani et al., 2005; Callahan et al., 2006; Chodosh et al., 2006). In addition, the ethnic and racial composition of the aging population requires strategies that work equally well in diverse populations so that all patients can benefit from current and emerging treatments (Doody et al., 2007; Peskind, 2007).
As experts in the measurement of cognitive function, neuropsychologists have long taken a leadership role in developing strategies for screening and diagnosing dementia (e.g., Albert et al., 2001; Ferris et al., 2006; Fuld et al., 1990; Jacobs et al., 1995; Masur et al., 1989; Tabert et al., 2006; Tuokko et al., 1995; Welsh et al., 1994). Several screening strategies have been developed. A traditional approach is to use tests of global cognitive function such as the Mini-Mental State Exam (MMSE: Folstein et al., 1975) or the Blessed-Information-Memory-Concentration Test (BIMC: Blessed et al., 1968) to assess the severity of cognitive impairment (Tombaugh and McIntyre, 1992). A second approach is to measure memory, the only cognitive domain where impairment is required for diagnosis, with a test that controls attention and cognitive processing to identify memory impairment that is not secondary to other cognitive deficits (Grober & Buschke, 1987; Grober et al., 1988, 2000, 2008; Grober & Kawas, 1997). A third approach is to measure briefly several specific cognitive domains such as memory, attention–executive function, and visuospatial ability (Borson et al., 2005; Kilada et al., 2005; Lipton et al., 2003). A final approach is to interview a reliable informant about the patient's cognition and daily activities (Jorm & Korten, 1988). Unlike performance-based screening tests, informant interviews have the advantage of being race and education neutral (Jorm, 2004). Popular dementia screening instruments have been reviewed with recommendations for general practice (Brodaty et al., 2006) and for monitoring persons with mild cognitive impairment who are at increased risk of developing dementia (Peterson et al., 2001).
Neuropsychologists should play a central role in developing, implementing, and assessing the cost-effectiveness of strategies to identify early dementia in primary care. Effective screening strategies should be accurate (both sensitive and specific), and efficient. Two-stage screening models have been widely used to optimize accuracy and efficiency (Denny et al., 2000; Dunn et al., 1999; McNamee, 2003). All eligible subjects receive a brief, highly sensitive initial screen, and those who screen positive are assessed with a more specific but time-consuming second stage. This study presents the results of a two-stage approach that was designed to identify early dementia in primary care settings. Four candidate measures for the first stage of screening were selected based on their brevity and sensitivity to the cognitive domains impaired in early dementia (memory, verbal fluency, executive function, and visuospatial processing). A fifth measure was an informant questionnaire that probed cognitive change. The second stage consisted of a sensitive and specific verbal memory test. The central role of memory in both stages of the screening process derives from the requirement for memory impairment in the DSM-IV criteria for dementia.
A two-stage approach to identify early dementia was implemented in a racially mixed, urban academic primary care practice staffed by geriatricians, the Geriatric Ambulatory Practice (GAP) at Montefiore Medical Center (Bronx, NY), and validated against an independent, clinically assessed gold standard. Eligible participants (see below) who provided informed consent under an Institutional Review Board-approved protocol received detailed neuropsychological and clinical assessments. Presence versus absence of dementia was established by expert consensus using baseline information that did not include the screening tests being evaluated here. These consensus diagnoses were used to assess the concurrent construct validity of the first-stage screening tests, individually and in combination, followed by the second stage to diagnose memory impairment. The overall goal was to determine the most efficient, sensitive, and specific arrangement of tests to identify AD and other dementias.
The 2-hr neuropsychological evaluation was composed of the tests in the screening battery and an independent diagnostic battery used to determine dementia status. The evaluation was usually completed in two sessions approximately 3 to 4 months apart (median = 91 days; mean = 121 days) in accordance with scheduling practices at the GAP. Master's-level psychologists administered the screening and diagnostic tests when study participants came for their regularly scheduled medical appointments. The (first-stage) screening test for memory impairment was usually completed in the first session and the two multitrial list learning tasks were usually completed in the second session. A semistructured interview with the participant's designated informant was completed by telephone.
Study participants were tested between January 2003 and December 2005 and met the following criteria: 65 years of age or older; described themselves as white, not of Hispanic origin, or black, not of Hispanic origin; provided the name of a family member or friend who had known them for at least 5 years; had spoken English since age 30; and had adequate vision and hearing to complete the neuropsychological tests. To identify patients with mild dementia, patients with an MMSE score of less than 18 were excluded, except for two illiterate patients with scores of 13. Of the 1041 potential participants from the GAP whom we contacted by phone, 35% were ineligible due to ethnicity or language, 9% were ineligible due to advanced dementia, 18% were not interested, and 7% never completed the assessment. A total of 318 eligible patients who completed the baseline evaluation and were assigned a diagnosis provided the data for these analyses.
A consensus diagnosis for each participant was established among a neuropsychologist, a geriatrician, and a geriatric psychiatrist using DSM-IV criteria for dementia (American Psychiatric Association, 1994) purposely without input from the patient's primary care provider or knowledge of the screening test results to avoid diagnostic circularity. A report was generated for each patient containing the test scores in Table 1 along with percentile scores for each test based on the performance of GAP patients without dementia at baseline. Also included in the report were informant's responses to the Clinical Dementia Rating (CDR) interview (Morris, 1993).
Before meeting at the consensus conference, raters reviewed this information, made an independent determination of the patient's diagnostic status, and then rated the patient's cognitive performance and activities of daily living using the CDR scale (Hughes et al., 1982; Morris, 1993). At consensus conferences, patients were discussed when there was any disagreement on diagnostic criteria or CDR box score. The final CDR rating was based on the pattern of box scores (Morris, 1993). Dementia subtyping was accomplished after the conference by the study neurologist through chart review using established criteria for probable/possible AD (McKhann et al., 1984), probable/possible vascular dementia (VaD; Chui et al., 1992), probable/possible Lewy body dementia (McKeith et al., 1999), and frontotemporal dementia (Knopman et al., 2005).
Stage 1: Rapid Dementia Screen
Candidate tests for the Rapid Dementia Screen were chosen because of their previously demonstrated sensitivity and specificity for identifying cognitive impairment or early dementia (see Brodaty et al., 2006). As candidate Rapid Dementia Screens, a brief memory test, the Memory Impairment Screen (Buschke et al., 1999), was coupled with either Animal Fluency (Lipton et al., 2003), Clock Drawing (Kilada et al., 2005), or Oral Trails (Ricker & Axelrod, 1994). Adding tests that tap domains other than memory was intended to improve sensitivity, particularly to non-AD dementias where memory may not be the first affected domain. The other Rapid Dementia Screen was the short Informant Questionnaire on Cognitive Decline in the Elderly (IQCODE) (Jorm, 1994).
Memory Impairment Screen
The Memory Impairment Screen (MIS) is a 4-min, four-item, delayed free- and cued-recall controlled learning test of episodic memory (Buschke et al., 1999) and has been recommended by the American Academy of Neurology for dementia detection (Gifford & Cummings, 1999). Participants read four words aloud and then identify each word (e.g., pink) when its cue is presented (color). After 3 to 4 min of distraction, the individual is asked for free recall of the words followed by cued recall of words that are not retrieved by free recall. The number of items retrieved by free and cued recall is used to calculate the MIS score as follows: [2 × (free recall)] + [cued recall]. Scores ranged from 0 to 8.
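The scoring rule above can be sketched in a few lines. This is a minimal illustration rather than an official scoring program; the function names are hypothetical, and the default cutoff of 4 is the recommended cutoff (≤4) applied later in this study.

```python
def mis_score(free_recall: int, cued_recall: int) -> int:
    """Return the MIS score: 2 * (free recall) + (cued recall), range 0-8.

    Cued recall is only tested for items not retrieved by free recall,
    so cued_recall cannot exceed 4 - free_recall.
    """
    n_items = 4
    if not 0 <= free_recall <= n_items:
        raise ValueError("free_recall must be between 0 and 4")
    if not 0 <= cued_recall <= n_items - free_recall:
        raise ValueError("cued_recall must be between 0 and 4 - free_recall")
    return 2 * free_recall + cued_recall


def mis_screen_positive(score: int, cutoff: int = 4) -> bool:
    """Hypothetical helper: a score at or below the cutoff screens positive."""
    return score <= cutoff
```

For example, a participant who freely recalls one word and retrieves two more with cues scores 2 × 1 + 2 = 4 and screens positive at the ≤4 cutoff.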
For the Animal Fluency test (Rosen, 1980), participants are asked to generate the names of as many animals as possible in 1 min, providing a screen for semantic memory impairment. Compared with normal subjects, AD patients generate significantly fewer members of common semantic categories (Canning et al., 2004; Graham et al., 2004; Lipton et al., 2003, Monsch et al., 1994; Salmon et al., 1999), a result either of the loss of semantic information or its disorganization (e.g., Chertkow & Bub, 1990; Grober et al., 1985; Martin & Fedio, 1983).
For the Clock Drawing test, the participant is asked to draw a clock, first by drawing a circle, then inserting the numbers, and finally drawing “hands” to a specified time. Clocks are scored for contour, numbers, hands, and center for a total of 15 points (Freeman et al., 1994). Clock Drawing is frequently recommended as a screening test for dementia (Sunderland et al., 1989) and provides information on visuospatial ability and planning.
For the Oral Trails test, participants are asked to recite numbers and letters in alternating sequence (Ricker & Axelrod, 1994). In this oral version of a common paper-and-pencil test, visual and graphomotor reasons for poor performance are eliminated while retaining the executive behaviors that predict the subsequent development of AD in patients with memory impairment (Albert et al., 2001; Bozoki et al., 2001; Chen et al., 2001; Fabrigoule et al., 1998; Rapp & Reischies, 2005). The dependent measure was the number of errors. A performance was considered impaired if a participant gave up before completing the test.
Short Informant Questionnaire on Cognitive Decline in the Elderly
The Short Informant Questionnaire on Cognitive Decline in the Elderly (IQCODE) assesses change in memory and intelligence over time as rated by a family member or friend (Jorm & Korten, 1988). The short form includes 16 of the original 26 items, and it operates as well as the long form to distinguish demented from nondemented elderly (Jorm, 1994). A 5-point scale indicates the degree of change in daily activities (e.g., remembering recent conversations and events, making decisions); a score of 3 indicates no change. A 5-year time frame was used, which is long enough to observe functional decline but avoids the difficulty in finding informants who have 10 years of contact with the participant (Barba et al., 2000; Pisani et al., 2003). Higher scores mean greater impairment. The dependent measure was the average rating of the 16 items.
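The dependent measure described above is a simple average, sketched here for concreteness. The function names are hypothetical; the default cutoff of 3.2 is the one adopted later in this study, not the published recommendation.

```python
from typing import Sequence


def iqcode_score(ratings: Sequence[int]) -> float:
    """Average the 16 item ratings (1-5 scale; 3 means no change)."""
    if len(ratings) != 16:
        raise ValueError("the short IQCODE has exactly 16 items")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("each rating must be between 1 and 5")
    return sum(ratings) / len(ratings)


def iqcode_screen_positive(score: float, cutoff: float = 3.2) -> bool:
    """Hypothetical helper: higher scores mean greater decline."""
    return score >= cutoff
```

An informant who rates 12 items as unchanged (3) and 4 items as somewhat worse (4) produces an average of 3.25, which screens positive at the ≥3.2 cutoff.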
Stage 2: Diagnosing Memory Impairment
Free and Cued Selective Reminding Test (FCSRT). To diagnose memory impairment, the 16-item FCSRT was used, which includes the same controlled learning procedure as the MIS (Grober & Buschke, 1987). During the study phase, subjects are asked to search a card containing four pictures (e.g., grapes) for an item that goes with a unique category cue (fruit). After all four items are identified and named, immediate cued recall of just those four items is tested. The search procedure is continued until all 16 items are identified and retrieved in immediate cued recall. After the study phase, there are three test trials consisting of free recall followed by cued recall for items not retrieved by free recall. Total recall is the sum of free and cued recall. Trials are separated by 20 s of interference. The dependent measure used here is the sum of free recall over the three test trials for a maximum score of 48.
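As a minimal sketch of the dependent measure, free recall can be summed over the three test trials as follows. The function names are hypothetical, and the default cutoff of 25 is the one adopted in the Results of this study.

```python
from typing import Sequence


def fcsrt_free_recall_total(free_recall_by_trial: Sequence[int]) -> int:
    """Sum free recall over the three test trials (maximum 3 * 16 = 48)."""
    if len(free_recall_by_trial) != 3:
        raise ValueError("the FCSRT has three test trials")
    if any(not 0 <= n <= 16 for n in free_recall_by_trial):
        raise ValueError("free recall per trial must be between 0 and 16")
    return sum(free_recall_by_trial)


def fcsrt_memory_impaired(total_free_recall: int, cutoff: int = 25) -> bool:
    """Hypothetical helper: totals at or below the cutoff indicate impairment."""
    return total_free_recall <= cutoff
```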
The usual pattern in two-stage approaches to case detection is to allow sensitivity to increase in the screening stage followed by a thorough assessment of an essential domain in the second stage to increase diagnostic specificity. This pattern was not precisely followed, because time and resources are limited in primary care settings. Instead, the strategy was to limit the number of patients who would require second-stage testing. Cases of dementia missed in the first stage would presumably be detected when the patient underwent screening at a future visit.
Sensitivity, specificity, and the proportion of patients who screened positive in the first stage, herein referred to as efficiency, were examined for each candidate test individually and in combination for detecting dementia at various cutoffs. Three different first-stage dementia screens were developed. In the next set of analyses, the sensitivity and specificity of the FCSRT were assessed for identifying cases of dementia among the participants who screened positive in each of the three Rapid Dementia Screens. The McNemar test was then used to determine how each screen followed by FCSRT worked to identify AD and non-AD dementias. Next, race and education effects were examined by linear regression in which age, education, and race were used to predict FCSRT performance in participants without dementia. Finally, to determine whether there were significant differences in the specificity or sensitivity of the Rapid Dementia Screens followed by FCSRT in African Americans versus Caucasians or in participants who differed by educational level, Pearson's χ2 tests with Yates' continuity correction were performed on the classification accuracy of noncases and cases. All p values were two-sided.
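The three first-stage measures can be computed from paired screening results and consensus diagnoses, as in the following sketch. It is illustrative only: the names are hypothetical and no study data are reproduced.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScreenAccuracy:
    sensitivity: float  # proportion of dementia cases that screen positive
    specificity: float  # proportion of noncases that screen negative
    efficiency: float   # proportion of all patients who screen positive


def screen_accuracy(screen_positive: Sequence[bool],
                    has_dementia: Sequence[bool]) -> ScreenAccuracy:
    """Compare screen results with the consensus diagnosis, patient by patient."""
    if len(screen_positive) != len(has_dementia):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(screen_positive, has_dementia))
    tp = sum(1 for s, d in pairs if s and d)          # true positives
    fn = sum(1 for s, d in pairs if not s and d)      # false negatives
    fp = sum(1 for s, d in pairs if s and not d)      # false positives
    tn = sum(1 for s, d in pairs if not s and not d)  # true negatives
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one case and one noncase")
    return ScreenAccuracy(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        efficiency=(tp + fp) / len(pairs),
    )
```

Note that a lower efficiency value is better in this usage, because it means fewer patients are sent on to second-stage testing.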
Table 2 shows the demographic characteristics of the 318 participants by dementia status and CDR score. Fifty-six participants (17%) met DSM-IV criteria for dementia: 35 (62%) had very mild dementia (CDR 0.5), 15 (27%) had mild dementia (CDR 1), and 6 (11%) had moderately severe dementia (CDR 2). Fifty-four of them were subtyped: 25 (47%) had possible or probable AD; 8 (15%) had a mixed dementia (AD+VaD); 13 (24%) had possible or probable VaD; and 8 (15%) were diagnosed with other subtypes. AD and mixed dementia cases are combined in some analyses. Of the 262 patients who did not meet criteria for dementia, 128 were assigned a CDR rating of 0 and 134 were assigned a CDR rating of 0.5. Table 3 shows the average test scores by race, dementia status, and CDR score.
Table 2. Demographic information on the Bronx cohort
Table 3. Means and standard deviations of screening tests by race, dementia status, and CDR rating
Stage 1: Developing the Rapid Dementia Screen
Table 4 shows the sensitivity and specificity of the individual candidate tests taken one at a time across a broad range of cut scores. Using the recommended cutoff of ≤4, the MIS had modest sensitivity (.49), high specificity (.94), and was highly efficient in that only 14% of patients required stage 2 testing. At a MIS cutoff of ≤7, sensitivity increased (.93), specificity declined (.42), and efficiency was poor: nearly two of every three patients would be sent for the second stage. In the case of Animal Fluency, using a cutoff of ≤9 produced modest sensitivity (.59), good specificity (.85), and efficiency (.23). Raising the cutoff to the recommended 15 (Canning et al., 2004) captured 9 of 10 cases of dementia but nearly three of four patients would require the second stage. At a cutoff of ≤9 on Clock Drawing, sensitivity was modest (.47), specificity was high (.92), and only 14% screened positive. Raising the cutoff to ≤13 resulted in increased sensitivity and lower specificity (.71) and reduced efficiency (36% went to stage 2). In the case of Oral Trails, with three or more errors used as the cutoff, sensitivity was modest (.44), specificity was good (.84), and only 20% required stage 2.
Table 4. Sensitivity, specificity, efficiency, and 95% confidence intervals for candidate screening tests at various cutoffs
No single, brief performance-based test simultaneously provided adequate sensitivity and specificity. As planned, various combinations were tested to determine the most sensitive and specific arrangement of tests for the Rapid Dementia Screen. When the MIS (≤4) was combined with Animal Fluency (≤9) using a logical “or” as recommended in a study of telephone screening (Lipton et al., 2003), 45 of the 56 cases (i.e., sensitivity = .80) were captured, including 31 of the 34 patients with AD or AD/VaD, while maintaining good specificity (.81). Furthermore, using this combination, only 30% of participants needed to undergo second-stage memory testing. In comparison, the addition of Clock Drawing to the MIS did not capture as many additional cases as Animal Fluency, and more patients screened positive, necessitating second-stage testing. In addition, there were missing data as a result of visual problems, and scoring takes time and requires a degree of judgment (Philpot, 2004). For these reasons, Clock Drawing was eliminated from the Rapid Dementia Screen.
As a second candidate Rapid Dementia Screen, Oral Trails was added to the combination of MIS and Animal Fluency, which improved the detection of non-AD dementias. A cutoff of 3 or more errors was linked by a logical “or” to the other two tests. Using this procedure, five of the eleven previously missed cases were identified including three with non-AD dementias. However, this second strategy was less efficient than the first in that 40% of the participants needed to undergo the second stage (McNemar's χ2 test = 35.0; df = 1; p < .0001).
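The logical “or” rule used to combine the performance-based tests can be sketched as below. The cutoffs are those stated in the text (MIS ≤4, Animal Fluency ≤9, three or more Oral Trails errors); the function name is hypothetical, and the rule that giving up on Oral Trails counts as impaired is not modeled here.

```python
from typing import Optional


def rapid_screen_positive(mis: int,
                          animal_fluency: int,
                          oral_trails_errors: Optional[int] = None) -> bool:
    """Screen positive if ANY included test falls in its impaired range.

    Omitting oral_trails_errors gives the first candidate screen
    (MIS + Animal Fluency); supplying it gives the second.
    """
    positive = mis <= 4 or animal_fluency <= 9
    if oral_trails_errors is not None:
        positive = positive or oral_trails_errors >= 3
    return positive
```

Because tests joined by “or” can only add positives, adding Oral Trails can raise sensitivity but can never reduce the proportion of patients sent to the second stage.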
The third candidate Rapid Dementia Screen was the informant-based Short IQCODE (Table 4). Eleven participants were not included in these analyses because the informant they designated refused to participate. To optimize sensitivity a lower cutoff was used than is recommended (Jorm, 1994); with a cutoff of ≥3.2, sensitivity was .89, specificity was .74, and 37% of the patients needed to undergo the second stage. This strategy was less efficient than the use of the MIS and Animal Fluency (McNemar's χ2 = 6.64; df = 1; p = .01).
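For readers unfamiliar with the McNemar test, the following sketch shows how the statistic is formed from the two discordant counts: participants positive on the first screen only (b) and on the second screen only (c). The counts in the usage note are hypothetical, and whether the reported statistics used a continuity correction is not stated, so the correction is left optional.

```python
import math
from typing import Tuple


def mcnemar_chi2(b: int, c: int, continuity: bool = False) -> Tuple[float, float]:
    """Return (chi-square, two-sided p) for paired proportions, df = 1.

    b, c: counts of participants positive on exactly one of the two screens.
    """
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need non-negative counts with at least one discordant pair")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    stat = diff ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

With hypothetical counts b = 20 and c = 0, the uncorrected statistic is 20.0. A zero cell arises naturally when one screen is formed from another by adding a test with a logical “or”, since the larger screen flags everyone the smaller one does.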
Stage 2: Diagnosing Memory Impairment
The sensitivity and specificity of the FCSRT across a range of cut scores are shown at the bottom of Table 4. For purposes of developing the two-stage approach, it was assumed that participants who screened positive in each of the candidate Rapid Dementia Screens went on to the second stage. A cutoff of ≤25 in free recall summed over three trials was adopted, which maximized the sum of sensitivity and specificity. The results of applying this cut score to the participants who screened positive in each of the three Rapid Dementia Screens are shown in Table 5.
Table 5. Sensitivity and specificity of FCSRT following three different Rapid Dementia Screens
According to the McNemar test, the sensitivity of the FCSRT was not significantly different when the Rapid Dementia Screen consisted of the MIS and Animal Fluency (.75); the MIS, Animal Fluency, and Oral Trails (.82); or the Short IQCODE (.77) (p's > .10). Specificity of the FCSRT also did not differ significantly as a function of the Rapid Dementia Screen used (.90, .88, .90, respectively). Positive and negative predictive values for each strategy are shown on the right of Table 5. Using the 17% prevalence of mild dementia in the current cohort, the positive predictive value (PPV), or the proportion of patients who screened positive who had dementia, ranged from .59 to .63. Negative predictive value (NPV), or the proportion of patients who screened negative who did not have dementia, was high (.94 to .96). PPV increases with a doubling of the base rate of dementia, as shown in Table 5.
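The dependence of predictive values on prevalence follows from Bayes' rule and can be sketched as follows. The function name is hypothetical. Because the inputs in the usage note are the rounded values quoted above, the results approximate the values in Table 5 and need not match them exactly.

```python
from typing import Tuple


def predictive_values(sensitivity: float,
                      specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given base rate."""
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be proportions between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive value undefined for these inputs")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv
```

With sensitivity .75, specificity .90, and prevalence .17, this gives a PPV of about .61 and an NPV of about .95. Doubling the prevalence to .34 raises the PPV to about .79, which illustrates why PPV rises with the base rate.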
Next, sensitivity was assessed for the Rapid Dementia Screens followed by FCSRT for identifying AD and AD/VaD versus non-AD dementias shown in Table 6. The sensitivity of the MIS and Animal Fluency followed by the FCSRT was .85 for AD dementias and .56 for non-AD dementias (Fisher's exact test: p = .06). The addition of the Oral Trails had a modest effect on sensitivity for AD dementias (.91) and increased the sensitivity to non-AD dementias to .67. Despite the improved sensitivity to non-AD dementias, this second strategy was still significantly better at detecting AD than non-AD dementias (p = .04). The IQCODE showed a similar pattern: sensitivity was .84 for AD dementias and .67 for non-AD dementias (p = .19).
Table 6. Sensitivity of FCSRT following three different Rapid Dementia Screens for AD and non-AD dementias
The remaining analyses concerned race and education effects. Table 7 shows the results of a linear regression in which age, education, and race were used to predict free recall in the 262 patients without dementia. Each additional year of age reduced free recall score by 0.34 points. Each additional year of education increased free recall score by 0.31 points. Being African American increased free recall by 1.60 points. Adjusting for education did not change the age effect, and vice versa. However, adjusting for age eliminated the race effect because African American participants were younger. Adjusting for education increased the race effect and vice versa, because African American participants had less education. All two-way and three-way interactions were nonsignificant, with p values ranging from .16 to .78.
Table 7. Linear regression of age, education, and race on free recall performance in 262 patients without dementia
Table 8 shows the sensitivity and specificity by race of the three Rapid Dementia Screens followed by FCSRT. According to the χ2 tests, there was a nonsignificant trend for specificity to be higher in African Americans than in Caucasians for all three Rapid Dementia Screens followed by FCSRT (MIS + Animal Fluency: p = .09; MIS + Animal Fluency + Oral Trails: p = .06; IQCODE: p = .07). The differences in sensitivity were not significant (p's > .30).
Table 8. Sensitivity and specificity of the three screening strategies as a function of race
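The group comparisons of classification accuracy rely on Pearson's χ2 test with Yates' continuity correction for a 2 × 2 table. A minimal sketch follows; the function name is hypothetical and the counts in the usage note are invented for illustration, not taken from Table 8.

```python
import math
from typing import Tuple


def yates_chi2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Chi-square with Yates' correction for the 2 x 2 table [[a, b], [c, d]].

    Rows are the two groups; columns are correct vs. incorrect classification.
    Returns (chi-square, two-sided p) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column must have a nonzero total")
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    stat = n * corrected ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

With hypothetical counts of 90 of 100 noncases correctly classified in one group and 80 of 100 in the other, the corrected statistic is about 3.18 with p between .05 and .10, a nonsignificant trend of the kind reported above.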
Table 9 shows the sensitivity and specificity by level of education for the three Rapid Dementia Screens followed by FCSRT. Sensitivity did not differ as a function of educational level for any Rapid Dementia Screen (p's >.30). Specificity tended to improve with educational level, and the difference was significant when the Rapid Dementia Screen included Oral Trails (p = .02).
Table 9. Sensitivity and specificity of the three screening strategies as a function of educational level
The purpose of this study was to improve the detection of early dementia in primary care. A brief, high sensitivity dementia screen was used as a first stage, and only those individuals who failed required additional, second-stage testing to diagnose memory impairment. The approach was tested in African American and Caucasian patients from an urban, primary care practice in the Bronx, all of whom received an independent research diagnostic assessment for dementia and AD. A set of candidate dementia screening tests selected for their brevity and previously demonstrated sensitivity and specificity for dementia was applied to all patients over the age of 65.
Sensitivity and specificity of the candidate tests were assessed individually for dementia across a range of cut-scores. Though all tests performed well, Clock Drawing was sometimes impossible in the visually impaired and scoring might be difficult in primary care (Philpot, 2004). Accordingly, the MIS, Animal Fluency, and Oral Trails were emphasized as the performance-based tests to consider for the Rapid Dementia Screen. No performance-based test, used alone, provided the combination of sensitivity and specificity we sought in the Rapid Screen. Accordingly, these tests were combined to form two alternative Rapid Dementia Screens. The third alternative was the short IQCODE. Specificity was maximized to achieve the efficiency needed for primary care screening while still identifying at least three of four patients with early dementia. If individuals screen negative for dementia and have it (false negative), illness will most likely be detected and treated in a follow-up visit. The performance of the two-stage approach was evaluated by applying the FCSRT to the patients who screened positive in the first stage.
All three two-stage models performed well. Using the MIS and Animal Fluency efficiently minimized the number of patients requiring second-stage testing. Adding Oral Trails improved sensitivity. The IQCODE worked very well but can only be used if an informant is available. All three were more sensitive to AD than non-AD dementias due to the focus on memory in both stages of the strategies. Tests like the FCSRT and the MIS, which use controlled learning, powerfully discriminate between normal aging and dementia (Buschke, 1984; Buschke et al., 1995, 1999; Gebner et al., 1997; Grober et al., 1988, 2000; Grober & Kawas, 1997; Ferris et al., 2006; Peterson et al., 1994, 1995; Tounsi et al., 1999; Tuokko & Crockett, 1989). FCSRT, the memory test used here, has high sensitivity and specificity for the identification of dementia (Ferris et al., 2006; Gebner et al., 1997; Grober et al., 1988; Peterson et al., 1994; Tuokko & Crockett, 1989) and preclinical dementia (Grober & Kawas, 1997; Grober et al., 2000; Peterson et al., 1995; Robert et al., 2006; Sarazin et al., 2007). The controlled learning procedures minimize any influence of toxic-metabolic disorders on measures of memory performance, thereby reducing false-positive rates (Grober et al., 1989). Furthermore, the FCSRT has promise as a demographically neutral memory test because performance in nondemented elderly is unrelated to race or education (Grober et al., 1998; Ivnik et al., 1997).
Memory testing is critical to dementia screening because impaired memory is one of its earliest manifestations (e.g., Elias et al., 2000; Grober et al., 2000; Linn et al., 1995; Saxton et al., 2004; Tierney et al., 2005) and because memory is the only cognitive domain that must be impaired to diagnose dementia (American Psychiatric Association, 1994). Postmortem series have demonstrated that memory decline precedes decline in mental status in early pathologically defined AD (Grober et al., 1999). Finally, other causes of acquired memory impairment in the elderly are rare (Fratiglioni et al., 1991; Cummings and Benson, 1992). Therefore, in the absence of other identifiable etiologies, the identification of impaired memory is highly predictive of a diagnosis of dementia (Grober & Buschke, 1987; Grober et al., 1988, 2000; Grober & Kawas, 1997).
Sensitivity to non-AD dementias was improved by adding a measure of executive function, Oral Trails, to the Rapid Dementia Screen. This finding is consistent with evidence that individuals with non-AD dementias such as VaD or frontotemporal dementias may have greater impairment on tasks involving sequencing, set shifting, and/or self-monitoring than individuals with AD (Freilich et al., 2006), perhaps because the frontal regions of the brain (Stuss & Alexander, 2000) are especially prone to cerebrovascular disease and subsequent VaD (Roman et al., 2002).
The strategies were not as race- and education-neutral as had been hoped. Specificity of all three Rapid Dementia Screens followed by the FCSRT tended to be higher among African American than Caucasian patients and tended to improve with educational level. These trends did not reach statistical significance but might in larger samples.
Selecting one Rapid Dementia Screen over another will depend upon the clinical setting of screening, the goals of the screening program, and the consequences of false positives and false negatives (Teresi, 2007). If a reliable family member or friend is available, the short IQCODE can be used as the Rapid Dementia Screen, because its sensitivity and specificity compare favorably with the other strategies and it is equally sensitive in African Americans and Caucasians. If a family member or friend does not accompany the patient to the medical appointment, which is the typical situation in primary care, the MIS and Animal Fluency are recommended as the Rapid Dementia Screen. Using this combination, only 30% of patients screened positive and would have to undergo the second stage of FCSRT testing, significantly fewer than with the two other Rapid Dementia Screens. Three of four patients with dementia were identified with this strategy. Finally, adding Oral Trails to the Rapid Screen improved the identification of non-AD dementias.
Some of the individuals who screened positively for dementia but were not diagnosed as having dementia by the consensus process may be at increased risk for future dementia. Of 26 patients who were misclassified as having dementia with the two-stage approach, 15 had memory impairment according to the consensus process but did not meet other DSM-IV criteria. Nine of these 15 were considered demented by their PCP at the time. If some of these patients develop dementia on follow-up, the construct validity of this approach would improve.
A two-stage approach for identifying early dementia in primary care is advocated because it would be very time consuming to administer the FCSRT to everyone over the age of 65. One risk of two-stage screening is that individuals with disease may be excluded from follow-up based on the initial screen. The first stage must be very sensitive to avoid this problem (McNamee, 2003). Although there is no universally accepted standard, an evidence-based rule of thumb holds that the sum of sensitivity and specificity should exceed 1.6, or 1.7 to allow for shrinkage when the tests are applied to other settings (McNamee, 2003). The three two-stage strategies have sums of 1.65, 1.70, and 1.68, bracketing these benchmarks. Patients with MMSE scores of less than 18 were excluded; had they been included, this rule of thumb would most likely have been exceeded.
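The benchmark comparison is simple arithmetic, sketched here for concreteness. The function names are hypothetical. The inputs in the usage note are the rounded sensitivities and specificities quoted in the Results, so a recomputed sum can differ in the last digit from the sums reported above.

```python
def accuracy_sum(sensitivity: float, specificity: float) -> float:
    """Sum of sensitivity and specificity, rounded to two decimals."""
    return round(sensitivity + specificity, 2)


def exceeds_benchmark(total: float, benchmark: float) -> bool:
    """True only if the sum is strictly greater than the benchmark."""
    return total > benchmark
```

For the MIS and Animal Fluency strategy (.75 and .90), the sum of 1.65 exceeds the 1.6 benchmark but not the more conservative 1.7 benchmark.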
In addition to this external rule of thumb, it is also useful to compare the performance of the two-stage approach with alternative methods. In another study in the same cohort (Grober et al., 2008), the sensitivity and specificity of the two-stage approach were compared with those of the MMSE, the most widely used dementia screening test. In that study, the cutoff on the MMSE was adjusted to achieve the same level of sensitivity or specificity as the two-stage approach, depending upon whether classification accuracy for cases or noncases was being compared. With the specificity of both tests set to 90%, the sensitivity was 75% for the two-stage approach and 53% for the MMSE. When sensitivity of both tests was set to 75%, specificity was 95% for the two-stage approach and 73% for the MMSE. This pattern of significantly higher sensitivity and specificity for the two-stage approach compared with the MMSE was repeated in the results by race, with the two-stage approach outperforming the MMSE for both African American and Caucasian patients. This increased accuracy did not require additional resources (Grober et al., 2008).
As a practical matter, it is unlikely that PCPs will take the lead in implementing dementia screening strategies at their current stage of development. Neuropsychologists should take a leadership role in defining appropriate strategies for identifying early dementia and, together with neurologists and geropsychiatrists, should be involved in implementing and assessing the efficiency and cost-effectiveness of the different approaches in primary care settings (MacDonald et al., 2005). As the detection of dementia improves and as PCPs become more aware of the issues, they will become better able to identify for referral those patients who need detailed clinical assessments by a neuropsychologist or other specialists to address specific issues such as differentiating dementia from depression or subtyping dementia based on cognitive profile (AD vs. frontotemporal dementia vs. Lewy body dementia).
While the study results are encouraging, the study has limitations. First, a limited set of dementia screening tests was evaluated and then only in specific combinations. Therefore, the sensitivity and specificity of the strategies need to be assessed in a separate validation sample. Shrinkage is likely when the methods are applied to independent samples. Second, while the results appear applicable to a range of patients, the methods must be replicated in other primary care settings and in other ethnic groups. Sensitivity of the strategies to early dementia in college-educated patients needs to be improved. Third, the size of the cohort was modest; there were only 56 cases, including 34 with AD dementias. This is too small a sample to be confident that the strategies are more sensitive to AD dementias than to non-AD dementias, as the current study suggests. Future studies with greater numbers of cases will be needed to make these determinations.
Identifying individuals with early dementia in primary care settings is a first step toward delivering current and emerging treatments to all seniors who need them. Early identification is an essential step toward a disease management program.