Introduction
The contribution of primary care to health system performance has been widely examined nationally and internationally (Starfield et al., Reference Starfield, Shi and Macinko2005; Kringos et al., Reference Kringos, Boerma, Van der zee and Groenewegen2013). A systematic review of 36 studies in low- and middle-income countries showed that strong primary care leads to improved and more equitable health outcomes, especially in infants and children (Macinko et al., Reference Macinko, Starfield and Erinosho2009). Another study of 31 high-income countries in Europe showed that strong primary care system is associated with improved population health outcomes, reduced socioeconomic inequality in health outcomes, fewer unnecessary hospitalizations, and slower increases in the overall health-care expenditures (Kringos et al., Reference Kringos, Boerma, Van der zee and Groenewegen2013). Recognizing the value and effectiveness of primary care, many countries including China have identified primary care transformation as a major component of health reform.
Measuring progress towards high-performing primary health care can contribute to the achievement of sustainable development goals. Measuring the quality of primary care from patients’ perspective can provide actionable and comprehensive performance information to guide primary care reform efforts. Patients are the best evaluators of key aspects of their health care, including accessibility, continuity, interpersonal communication, respectfulness, family-centered care, whole-person care, and cultural sensitivity (Haggerty et al., Reference Haggerty, Burge, Levesque, Gass, Pineault, Beaulieu and Santor2007). To strengthen people-centered integrated primary care system building, there is increasing interests in patient experience measurement globally (Kruk et al., Reference Kruk, Pate and Mullan2017). Many instruments have been developed, such as Primary Care Assessment Tool [(PCAT); Shi et al., Reference Shi, Starfield and Xu2001].
The PCAT developed by Barbara Starfield has been extensively used internationally (Shi et al., Reference Shi, Starfield and Xu2001). Inspired by the World Health Organization definition of primary care, the PCAT was originally developed to measure the extent to which primary care is achieved from user perspective in the United States. Seven primary care domains were included in the original English PCAT-adult version: first contact utilization and access, ongoing care, coordination with specialists, comprehensiveness of service available and provided, family centeredness, community orientation, and cultural competency (Shi et al., Reference Shi, Starfield and Xu2001).
The original English PCAT-adult version has been translated into many languages. The PCAT validation studies were mostly conducted in the following countries: China (Wang et al., Reference Wang, Shi, Yin, Lai, Maitland and Nicholas2014; Yang et al., Reference Yang, Shi, Lebrun, Zhou, Liu and Wang2013; Wang et al., Reference Wang, Wong, Wong, Wang, Wei, Li, Tang and Griffiths2015a; Wei et al., Reference Wei, Li, Yang, Wong, Chong, Shi, Wong, Xu, Zhang, Tang, Li, Meng and Griffiths2015; Mei et al., Reference Mei, Liang, Shi, Zhao, Wang and Kuang2016), Canada (Haggerty et al., Reference Haggerty, Burge, Beaulieu, Pineault, Beaulieu, Levesque, Santor, Gass and Lawson2011b), South Korea (Lee et al., Reference Lee, Choi, Sung, Kim, Chung, Kim, Jeon and Park2009), Argentina (Berra et al., Reference Berra, Hauser, Audisio, Mantaras, Nicora, De Oliveira, Starfield and Harzheim2013; Vazquez Pena et al., Reference Vazquez Pena, Harzheim, Terrasa and Berra2017), Spain (Pasarin et al., Reference Pasarin, Berra, Rajmil, Solans, Borrell and Starfield2007; Berra et al., Reference Berra, Rocha, Rodriguez-Sanz, Pasarin, Rajmil, Borrell and Starfield2011), Brazil (Macinko et al., Reference Macinko, Almeida and De Sa2007), Japan (Aoki et al., Reference Aoki, Inoue and Nakayama2016), Vietnam (Hoa et al., Reference Hoa, Tam, Peersman, Derese and Markuns2018), Turkey (Lağarlıa et al., Reference Lağarlia, Eserb and Baydurc2014), and South Africa (Bresick et al., Reference Bresick, Sayed, Le Grange, Bhagwan and Manga2015). These validation studies suggest reasonable psychometric properties in the specific country context, but some common problems emerge, especially in factor structure and with response options. For example, factor analytic models do not support the underlying theoretical domains in several language versions (Lee et al., Reference Lee, Choi, Sung, Kim, Chung, Kim, Jeon and Park2009; Yang et al., Reference Yang, Shi, Lebrun, Zhou, Liu and Wang2013; Mei et al., Reference Mei, Liang, Shi, Zhao, Wang and Kuang2016). Response distributions tend to skew toward more favorable answers in most language versions, including English, compromising the formal statistical assumptions in the most psychometric analysis (Shi et al., Reference Shi, Starfield and Xu2001; Macinko et al., Reference Macinko, Almeida and De Sa2007; Pasarin et al., Reference Pasarin, Berra, Rajmil, Solans, Borrell and Starfield2007; Lee et al., Reference Lee, Choi, Sung, Kim, Chung, Kim, Jeon and Park2009; Berra et al., Reference Berra, Rocha, Rodriguez-Sanz, Pasarin, Rajmil, Borrell and Starfield2011; Reference Berra, Hauser, Audisio, Mantaras, Nicora, De Oliveira, Starfield and Harzheim2013; Haggerty et al., Reference Haggerty, Beaulieu, Lawson, Santor, Fournier and Burge2011a; Reference Haggerty, Burge, Beaulieu, Pineault, Beaulieu, Levesque, Santor, Gass and Lawson2011b; Yang et al., Reference Yang, Shi, Lebrun, Zhou, Liu and Wang2013; Lağarlıa et al., Reference Lağarlia, Eserb and Baydurc2014; Wang et al., Reference Wang, Shi, Yin, Lai, Maitland and Nicholas2014; Bresick et al., Reference Bresick, Sayed, Le Grange, Bhagwan and Manga2015; Aoki et al., Reference Aoki, Inoue and Nakayama2016; Mei et al., Reference Mei, Liang, Shi, Zhao, Wang and Kuang2016; Vazquez Pena et al., Reference Vazquez Pena, Harzheim, Terrasa and Berra2017).
Some of these issues were also found in the Tibetan version of PCAT that was translated from one (Yang et al., Reference Yang, Shi, Lebrun, Zhou, Liu and Wang2013) of the three available Chinese versions (Yang et al., Reference Yang, Shi, Lebrun, Zhou, Liu and Wang2013; Wei et al., Reference Wei, Li, Yang, Wong, Chong, Shi, Wong, Xu, Zhang, Tang, Li, Meng and Griffiths2015; Wang et al., Reference Wang, Wong, Wong, Wang, Wei, Li, Tang and Griffiths2015a). The Chinese version developed by Yang et al. (Reference Yang, Shi, Lebrun, Zhou, Liu and Wang2013) was adapted from adult version of PCAT longer version. The initial PCAT-Tibet included six domains of first contact, continuity, coordination, comprehensiveness, family-centeredness, and community orientation. The Tibet Autonomous Region is located in southwestern China, at an average elevation of 4000 m and an area about one eighth of China’s area, with a population of 3 million scattered over this large region. Since the national health reform initiated in China in 2009 (Yip et al., Reference Yip, Hsiao, Chen, Hu, Ma and Maynard2012) township health centers in Tibet were identified as the main primary care provider, and they received strong support from regional and national governments, including more health staff and salary increases, need-based training, and capital investments in infrastructure and necessary equipment. Actionable information about the effectiveness of these programs was needed for directing the resource allocation and to address the specific weakness in primary care service delivery.
The initial validation of PCAT-Tibetan suggested departures from the original factor structure and confirmed the skewed responses seen in other studies (Wang et al., Reference Wang, Shi, Yin, Lai, Maitland and Nicholas2014). For example, three original PCAT domains (first contact, continuity, and coordination) were split across five domains in the PCAT-Tibetan; and the original comprehensiveness domain was represented by two domains (Wang et al., Reference Wang, Shi, Yin, Lai, Maitland and Nicholas2014). These issues in psychometric properties in common with results from validation studies in other PCAT versions suggest a need for advanced psychometric analysis to examine the appropriateness of domains and items in relative to the original PCAT. In order to further examine the construct validity and reliability of PCAT-Tibetan version, this article reports on the results of confirmatory factor analysis to test the congruence between the theoretical primary care domains and the empirical results in Tibet, and on item response theory analysis to examine item performance and the appropriateness of item response options.
Methods
This was a further analysis of the initial PCAT-Tibetan validation study that was conducted among 1386 patients who visited their primary care providers in three different types of health facilities in two prefectures in 2013. This survey was administered through face-to-face interview. The detailed information about sampling and data collection can be found in our previous publication (Wang et al., Reference Wang, Shi, Yin, Lai, Maitland and Nicholas2014).
Maximum likelihood (ML) imputation method was used to replace the missing values; match age, sex, education, self-rated health status; 295 respondents were excluded because of missing values for the matching variables, leaving 1091 respondents in subsequent analyses. Those excluded were more likely to be female and less educated. To examine the robustness of our conclusion, we excluded all respondents who had at least one missing value on any item (listwise deletion) and repeated all data analyses (n = 729), which did not alter any of the general conclusions. Values were also imputed for the ‘not sure/don’t remember’ response option as an alternative to attributing the pre-set value of 2.5. In general, estimates produced with listwise deletion are less efficient than other methods of handling missing data. Therefore, we reported only results using database with ML imputation (Enders, Reference Enders2001).
In user-evaluation research, it is common to treat report and rating values as quasi-cardinal (DeVellis, Reference DeVellis1991). The items measuring patient experience in this study were strictly ordinal level, so we treated them as interval level, which was consistent with previous validation studies of original PCAT version in the United States (Shi et al., Reference Shi, Starfield and Xu2001) and the PCAT versions in other languages (Lee et al., Reference Lee, Choi, Sung, Kim, Chung, Kim, Jeon and Park2009; Haggerty et al., Reference Haggerty, Burge, Beaulieu, Pineault, Beaulieu, Levesque, Santor, Gass and Lawson2011b; Yang et al., Reference Yang, Shi, Lebrun, Zhou, Liu and Wang2013; Aoki et al., Reference Aoki, Inoue and Nakayama2016; Mei et al., Reference Mei, Liang, Shi, Zhao, Wang and Kuang2016). This approach was also used by the validation study of World Health Organization’s health system responsiveness survey (Valentine et al., Reference Valentine, Bonsel and Murray2007). Therefore, the four-point Likert response scale in PCAT-Tibetan version was treated as continuous variable.
First, the inter-item correlation and exploratory factor analysis (principle component analysis) by domain was conducted to flag items with low correlations (Pearson <0.20) or low factor loading (<0.30) for potential deletion. We repeated the exploratory factor analysis after deleting each item with low factor loading until all retained items had a factor loading of at least 0.30. The results of exploratory factor analysis guided subsequent confirmatory factor analysis.
Then confirmatory factor analysis using structural equation modeling was used to test the goodness of fit of the items to the theoretical domains of the original PCAT. Subsequently, confirmatory factor models were adjusted iteratively based on fit and judgment until the goodness-of-fit statistics were optimized. We also assessed the entire instrument including all domains in our data analysis. First, we included all domains in confirmatory factor analysis. Then we only included the four core domains (first contact, continuity, coordination, and comprehensiveness) in confirmatory factor analysis. The following goodness-of-fit statistics were used: normed fit index (NFI) ≥0.9 indicating good fit, comparative fit index (CFI) ≥0.9 indicating good fit, standardized root mean square residual (SRMR) ≤0.05 indicating acceptable fit.
For each domain confirmed as being unidimensional, we examined the distribution of response options of individual items within each domain based on domain performance using nonparametric item response theory analysis. However, we cannot get the exact information of each item performance. To be more precise, two parameter estimates (discriminability and difficulty) were subsequently generated using Samejima’s (Reference Samejima1969) grade response model.
We estimated the correlations between domains to examine how domains were related and whether they demonstrated distinctiveness.
Finally, we used logistic regression modeling to examine how primary care domains were associated with patients’ satisfaction with service attitude (one item) and perceived technical quality (one item) of their primary care provider. The five-point Likert response scale for the two items was dichotomized to indicate satisfaction (very satisfied or satisfied) versus dissatisfaction. All domains were put in the same model, and age, sex, education, and self-rated health status were included in the model as covariates. Education level was categorized into three groups: illiterate, primary school, junior high school and above. Self-rated health status was measured by a visual analogue scale with end points of 0 and 100, where 0 corresponds to ‘the worst health status’, and 100 corresponds to ‘the best health’.
Descriptive, correlation, and exploratory factor analysis were conducted with SPSS 22.0. ML imputation and confirmatory factor analysis were conducted with LISREL9.1. Nonparametric and parametric item response theory analyses was conducted with SPSS 22.0 and MULTILOG 7.03.
Results
Table 1 summarizes the response distribution and descriptive statistics of each item in initial PCAT-Tibetan. The percentage of true missing values for each item ranged from 0.9% to 4.4%. Most items are slightly negatively skewed, and with over 50% respondents reporting the most favorable response option ‘4 = Definitely’ on 22 of 36 items. The percentage of respondents choosing the response option of ‘not sure/don’t remember’ is higher, particularly for items in first contact access, coordination, and community orientation, suggesting that patients may not be the best information source for these domains or may not have direct experience with all aspects elicited.
a n = 1386. Mean, median, and standard deviation were calculated using the database with missing values imputed by maximum likelihood method (n = 1091).·Item labels being struck through (eg, FCA5) are deleted from the final version.
NS/DNK = not sure/don’t know.
For first contact access, two of the six items were deleted because of the unacceptable goodness-of-fit statistics (FCA5 and FCA6); and two items fit better in first contact utilization (FCA1 and FCA4), leaving only two items in first contact access and four in first contact utilization. In the eight-item ongoing care, two items were deleted because of unacceptable goodness-of-fit statistics (OC1 and OC7); and a further item (OC8) was removed to improve the goodness-of-fit statistics, leaving a five-item ongoing care scale. The eight-item comprehensiveness subscale showed unacceptable goodness-of-fit statistics on a single factor (NFI = 0.27, CFI = 0.27, SMRM = 0.19); and after removing four of the poorest fitting items, a final four-item comprehensiveness subscale demonstrated good fit (NFI = 0.98, CFI = 0.99, SMRM = 0.02). There is no change in items in other domains. The detailed iterative process to finalize the original factor structure was reported in Supplemental Table S1.
Table 2 shows the goodness-of-fit statistics of confirmatory factor analysis for each domain in the final structural model. The fit statistics of first contact utilization and coordination do demonstrate a moderate fit. For ongoing care, comprehensiveness, family centeredness, and community orientation, all the four confirmatory factor analysis models demonstrate a good fit. The empirical results suggest a best final structure for the PCAT-Tibetan of 7 domains and 27 items instead of 36. At least three items are needed for confirmatory factor analysis. First contact access only includes two items, so we did not have confirmatory factor analysis results.
a n = 1091. At least three items are needed for confirmatory factor analysis: first contact access only includes two items, so we did not have confirmatory factor analysis results.
b Weighted least square estimation method was used for confirmatory factor analysis using LISREL version 9.1.
c SRMR = standardized root mean square residual; ≤0.08 indicating acceptable fit.
d NFI = normed fit index; ≥0.9 indicating good fit.
e CFI = comparative fit index; ≥0.9 indicating good fit.
However, either the model including all domains (NFI = 0.46, CFI = 0.46, SMRM = 0.23) or the model including only the four core domains (NFI = 0.55, CFI = 0.56, SMRM = 0.14) shows unacceptable goodness-of-fit statistics on a single factor, which suggest that the domains included in original PCAT may not measure a common single construct in Tibet context, and it is not appropriate to report a total score.
The mean of each domain score is lower than the median and negatively skewed, indicating most of patients reporting favorable response answers. The first contact utilization score is the highest (3.66±0.48), while the first contact access score is the lowest (2.94±0.95). Cronbach α is over 0.70, indicating good internal consistency of items for all domains except first contact access (0.66) and ongoing care (0.66).
Non-parametric item response theory graphs were modeled on each unidimensional domain to provide further insight into item performance and reliability. In most items, the option characteristic curve for the response option ‘2 = Probably not’ is overshadowed by other options (see example in Figure 1a), indicating that nowhere along the primary care experience continuum was this option more likely to be chosen than other options, raising the question of the appropriateness of a four-point response scale. Only a few items perform optimally, such that the probability of choosing each response option is highest in a unique zone of primary care experience continuum, reflecting clearly ordinal response behavior appropriate to the assigned value for each option. Figure 1b shows a well-performing item from community orientation. A problem common to all items is that the extreme response option ‘4 = definitely’ covers a large area of primary care experience continuum and is most likely to be endorsed, even at below-average primary care experience level, suggesting that additional response options may be desirable.
The results from parametric item response theory analysis provide further evidence to confirm the findings from non-parametric item response theory analysis (Table 3). The discriminability parameter is over 1 for all items except one item on ongoing care, indicating that response options discriminate well between low and high levels of primary care experience. However, the difficulty parameters for almost all items are negative, indicating that positive ratings (b2, b3) are endorsed at less than average performance, reinforcing the pattern observed in Figure 1a. Likewise the information curves show that the majority of items are most informative in the negative zone of the underlying construct. Only CO1, illustrated in Figure 1b, shows that each response option corresponds to a distinct zone of the domain, including the most positive experience. Together these suggest that the items are most sensitive and specific for patients with poor primary care experience.
a n = 1091. Domains were scored by averaging the value of individual items. Item response scale ranged from 1 to 4. Higher score meant better patient experience.
b IRT parameter estimates were generated using Samejima’s (Reference Samejima1969) grade response model and using IRT software MULTILOG 7.03.
a = discriminability; value >1 indicating minimal discriminability. b = difficulty; highest probability of endorsing.
The Pearson correlations between the domains indicate the distinctiveness of each domain. Correlation coefficients between domains range from 0.23 to 0.61 and are lower than Cronbach α of each domain. Coordination is most highly correlated with family centeredness (0.61) and ongoing care (0.60). First contact utilization is also highly correlated with family centeredness (0.54) and ongoing care (0.55). First contact access, comprehensiveness, and community orientation have lower correlation with other domains.
Finally, Table 4) shows the extent to which a unit increase in each domain score increases the odds of patient satisfaction. Although most patients are satisfied with the service attitude (82.6%) and the technical quality (80.2%) of their primary care provider, a higher PCAT score generally is associated with higher satisfaction. For example, every unit increase in ongoing care score increases the likelihood of being satisfied with service attitude by 2.65 times.
Discussion
This advanced psychometric analysis of the PCAT-Tibetan versions provides further insight into some of the problematic psychometric properties found in the initial validation analysis, and it suggests avenues to improve the metric performance of the tool. Despite the metric problems, the PCAT-Tibetan domains of first-contact, ongoing care, and coordination with specialists are associated with an increased likelihood of patient satisfaction. This illustrates the potential of the PCAT and underlines the importance of improving the tool to address some of the metric problems. Some of the problems, such as the skewness of item response distribution found in many items, are shared with the original PCAT and other versions; and our results suggest some solutions that could improve performance. Others may be specific to the PCAT-Tibetan version – such as the non-optimal resolution of items relating to First Contact constructs – suggest the need for some principles in adapting the tool to different health system contexts.
The widespread use of the PCAT to evaluate primary care in many countries provides an opportunity to compare primary care across different contexts and to support a worldwide movement to improve primary care. Our results show not only the association of the PCAT-Tibetan with patient satisfaction but also other analysis that demonstrated the capacity to distinguish between health-care organizations in Tibet and other regions in China (Wang et al., Reference Wang, Wong, Wong, Wei, Wang, Li, Tang, Gao and Griffiths2013; McCollum et al., Reference Mccollum, Chen, Chenxiang, Liu, Starfield, Jinhuan and Tolhurst2014; Wang et al., Reference Wang, Shi, Yin, Mao, Maitland, Nicholas and Liu2015b; Hu et al., Reference Hu, Liao, Du, Hao, Liang and Shi2016; Feng et al., Reference Feng, Shi, Zeng, Chen and Ling2017). Other studies have shown the capacity of different versions of the PCAT to differentiate between delivery models. For instance, in the United States, community health centers have been showed to provide better quality primary care than health maintenance organizations, especially in continuity, coordination, and comprehensiveness (Shi et al., Reference Shi, Starfield, Xu, Politzer and Regan2003). Patients mainly receiving care from private general practitioners in Hong Kong reported better primary care experiences than those mainly receiving care from public general outpatient clinics, especially in accessibility and interpersonal relationships (Wong et al., Reference Wong, Kung, Griffiths, Carthy, Wong, Lo, Chung, Goggins and Starfield2010). In South Korea, among four types of primary care clinics staffed by family physicians, health cooperative clinics displayed the best primary care performance, while public health center clinics showed the worst performance (Sung et al., Reference Sung, Suh, Lee, Ahn, Choi and Lee2010).
The negative skewness of item response distribution in many items was noted in the original validation of the long PCAT version (Shi et al., Reference Shi, Starfield and Xu2001) and has been found in most other validation studies (Macinko et al., Reference Macinko, Almeida and De Sa2007; Lee et al., Reference Lee, Choi, Sung, Kim, Chung, Kim, Jeon and Park2009; Berra et al., Reference Berra, Rocha, Rodriguez-Sanz, Pasarin, Rajmil, Borrell and Starfield2011; Haggerty et al., Reference Haggerty, Burge, Beaulieu, Pineault, Beaulieu, Levesque, Santor, Gass and Lawson2011b; Berra et al., Reference Berra, Hauser, Audisio, Mantaras, Nicora, De Oliveira, Starfield and Harzheim2013; Yang et al., Reference Yang, Shi, Lebrun, Zhou, Liu and Wang2013; Lağarlıa et al., Reference Lağarlia, Eserb and Baydurc2014; Aoki et al., Reference Aoki, Inoue and Nakayama2016; Mei et al., Reference Mei, Liang, Shi, Zhao, Wang and Kuang2016) but may be more extreme in contexts such as China where low literacy requires face-to-face administration (Haggerty et al., Reference Haggerty, Beaulieu, Lawson, Santor, Fournier and Burge2011a). The item response theory analysis shows that the PCAT is most reliable in identifying negative experience of care, but the low information yield in the above average range of the domains means that it will have limited sensitivity for detecting improvements in care. Some researchers suggested that new response categories should be developed to minimize the favored response (Yang et al., Reference Yang, Shi, Lebrun, Zhou, Liu and Wang2013), but this would be challenging using the current response scale as it is difficult to imagine an intermediate category between ‘probably yes’ and ‘definitely yes’. The acceptable discriminability parameters for most items reflect adequate capacity to discriminate between poor and average performance on domains even though the four-point response values are not optimal. This suggests that some form of dichotomous scoring could be applied to each item to give greater weight to the more informative negative responses rather than averaging across all response options. This approach is used in a Brazil study (Macinko et al., Reference Macinko, Almeida and De Sa2007) and also in the Consumer Assessment of Healthcare Providers and Systems (CAHPS) (Dyer et al., Reference Dyer, Sorra, Smith, Cleary and Hays2012). Finally, to increase discriminability and improve the potential for sensitivity to improvement, it would be good to develop more items to measure excellent primary care experience. The community orientation item about home visits (CO1) is an example of an item that discriminates clearly between average and good primary care.
Another metric issue results from offering the ‘not sure/don’t remember’ option. The high rate of endorsing this response option is common in many language versions, especially in Asian countries including Korean, Japanese and Chinese. (Lee et al., Reference Lee, Choi, Sung, Kim, Chung, Kim, Jeon and Park2009; Yang et al., Reference Yang, Shi, Lebrun, Zhou, Liu and Wang2013; Wang et al., Reference Wang, Shi, Yin, Lai, Maitland and Nicholas2014; Aoki et al., Reference Aoki, Inoue and Nakayama2016; Mei et al., Reference Mei, Liang, Shi, Zhao, Wang and Kuang2016). Although respondents appreciate having such response option (Haggerty et al., Reference Haggerty, Beaulieu, Lawson, Santor, Fournier and Burge2011a), its management is analytically challenging. Although the PCAT scoring manual suggests attributing a value of 2.5, this is not supported for all items in a previous response theory analysis of English and French versions of the PCAT (Haggerty et al., Reference Haggerty, Burge, Beaulieu, Pineault, Beaulieu, Levesque, Santor, Gass and Lawson2011b). A more usual practice would be to treat the values as missing and attribute values using more ML methods. However, the approach used (excluding these as missing values) will impact on the factor analysis and the subsequent conclusions about the validity of PCAT version. Again, this seems to call for further work and international collaborations on response options that accord with patient experience in different contexts.
Collaborative international work could also address principals for measuring and/or comparing domains that affected health system specificities, for instance, in the access domain. First contact in the PCAT-Tibetan fails to meet optimal psychometric standards of construct validity and internal consistency. Similar problems were also found in other PCAT validation studies in China (Yang et al., Reference Yang, Shi, Lebrun, Zhou, Liu and Wang2013; Mei et al., Reference Mei, Liang, Shi, Zhao, Wang and Kuang2016) and other countries in Asia (Lee et al., Reference Lee, Choi, Sung, Kim, Chung, Kim, Jeon and Park2009; Lağarlıa et al., Reference Lağarlia, Eserb and Baydurc2014; Aoki et al., Reference Aoki, Inoue and Nakayama2016). Two studies of the PCAT-Chinese version found that Cronbach α was only 0.38 (Mei et al., Reference Mei, Liang, Shi, Zhao, Wang and Kuang2016) and 0.48 (Yang et al., Reference Yang, Shi, Lebrun, Zhou, Liu and Wang2013) for first contact utilization; similar values were found in the PCAT-Turkish (Lağarlıa et al., Reference Lağarlia, Eserb and Baydurc2014). The validation of PCAT-Korean concluded that first contact could not be assessed using a traditional scale with multiple correlated items, and it was treated to be a composite domain consisting of five independent single-item subscales (Lee et al., Reference Lee, Choi, Sung, Kim, Chung, Kim, Jeon and Park2009). The exploratory factor analysis of PCAT-Japanese collapsed first contact utilization and first contact access into one scale (Aoki et al., Reference Aoki, Inoue and Nakayama2016). These results are not surprising, given that access reflects the fit between how services are organized and the perceived need of the population (Penchansky and Thomas, Reference Penchansky and Thomas1981; Khan and Bhardwaj, Reference Khan and Bhardwaj1994). Consequently, the access dimension will be sensitive to contextual differences in how services are organized. Although these subscales perform well in studies in the North American context (Shi et al., Reference Shi, Starfield and Xu2001; Haggerty et al., Reference Haggerty, Levesque, Santor, Burge, Beaulieu, Bouharaoui, Beaulieu, Pineault and Gass2011c), the first contact items in original PCAT may not be appropriate in other countries with different health system organization and patient expectations. Making appointment in advance and gatekeeping by primary care worker do not pertain to many countries. For example, geographical accessibility is a major constraint for local people to get health-care services in Tibet, but it was not addressed in the PCAT. Context-specific items should be explored based on the operational definitions of first contact. Similar issues are likely to pertain to comprehensiveness and coordination with specialists.
In contrast, the domain of ongoing care most strongly predicts patient satisfaction with primary care provider in Tibet. This is consistent with a previous systematic review showing that the most important determinants of satisfaction are the interpersonal relationships and their related aspects of care (Crow et al., Reference Crow, Gage, Hampson, Hart, Kimber, Storey and Thomas2002). Recognizing the benefit of continuity of care, China is developing a family doctor contract service model to build a long-term trust doctor–patient relationship. Under this model, residents can sign a contract with a family doctor working at a community health center and be eligible for a service contract package including basic medical care, public health, and health management service. Although countries differ in the organizational support of continuing provider–patient relationships, our study and studies in other countries point to the value of a stable long-term relationship and mutual interactive communication between patient and primary care workers (Baker et al., Reference Baker, Mainous, Gray and Love2003; Paddison et al., Reference Paddison, Abel, Roland, Elliott, Lyratzopoulos and Campbell2015). Nonetheless, measures may be improved by accounting for cultural specificities in interpersonal communication and therapeutic relationships that impact on both patient experience and outcomes.
This study contributes to the growing international work supporting the relevance and need for valid and reliable measures of the patient experience of health care. Several lessons from PCAT-Tibetan validation study may be shared with our colleagues in low- and middle-income countries. First, the items or content of instruments from developed countries may not be appropriate in other countries. Some specific features of local primary care system may not be reflected in the translated instruments. For example, no items in PCAT could reflect the feature of geographical accessibility, which is a major aspect in Tibet. Some items in comprehensiveness domain are not appropriate in Tibet. Therefore, instead of adapting existed instrument directly, qualitative research is needed to understand the local population preference of primary care first. Second, more items to measure excellent primary care experience should be developed. Most existing items have adequate capacity to discriminate between poor and average performance on different primary care domains. However, items that discriminate clearly between average and good primary care are needed. Further research is required to explore the characteristics of some exemplars with good primary care experience and what the good primary care is from their narratives. Third, the choice of response categories should be careful. The current four-point response scale and its wording in PCAT may not be appropriate in some countries. Several factors could be considered when exploring the appropriate response categories, such as literacy level, response tendency, and judgment making of local population. This could be done through qualitative research.
Finally, a summary score of overall primary care experience including all domains, which is often the most used metric when assessing a health system, is not supported by our analysis of PCAT-Tibetan version. However, this psychometrically validated 27-item Tibetan version of PCAT will be useful in monitoring and evaluating the performance of primary care system in Tibet in specific areas, especially in accessibility, continuity, and coordination, which are the priorities of current health reform efforts in Tibet. The health service research is underdeveloped in Tibet, and there is no instrument measuring patient experience that could be used when this study was conducted. We hope this study could bring more researchers’ attention into primary care performance evaluation in Tibet. We also recognize that different policy interventions to achieve primary care functions are inter-related, but each policy has its own priorities. For example, the family doctor contract service model is being developed and expanded now to improve performance in accessibility and continuity; and the transformation of tiered health service delivery system aims to promote collaboration between different health-care providers and to improve coordination. Under this context, PCAT-Tibetan version is a potential useful instrument to evaluate the effectiveness of these policy interventions.
Acknowledgements
We thank Fatima Bouharaoui for statistical advice.
Financial Support
W.W. received salary support to conduct the analysis from McGill Research Chair in Family and Community Medicine at St. Mary’s Hospital and Steinberg Global Health Postdoctoral Fellowship, McGill University, Canada.
Conflict of Interest
None.
Ethical Standards
The Ethics Committee of Tibet Autonomous Regional Health and Family Planning Commission approved the study.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/S1463423619000239.