In the context of health care, economic evaluation (EE) can be defined as the analysis of the costs and effects of alternative interventions in a defined population (1). It is an important element of decision making about reimbursement or implementation of interventions in many countries (2–6). Systematic reviews (SRs) of EEs can play a key role in this process. Conducting SRs requires significant resources and the search approaches used, including the choice of databases to search, can impact on these resources (7). Ideally, researchers need to identify as many relevant records as possible, with maximum efficiency. The number of databases searched is a key factor in achieving an efficient approach. To inform the selection of databases for an efficient SR of EEs, evidence is needed on the yield of specific databases and database combinations.
More recent research related to the choice of search sources for EEs is available. Thielen et al. (16) make recommendations on the identification of studies for SRs of economic evidence for guideline development, but the selection of databases is based on a summary of recommendations and research predating the closure of NHS EED and HEED. A 2016 bibliometric analysis of the yield of fourteen databases for a reference set of EEs (9) reported that a combination of Scopus, MEDLINE, and Global Health identified 91 percent of the EEs (17). Rather than identifying a reference set of EEs through a hand-search of relevant journals, or through the lists of included studies in SRs of EEs, the authors used a set of focused database searches developed specifically for the study. These searches lacked the sensitivity of established search filters for EEs. These limitations may mean that not all EEs available in a database were retrieved by the authors’ searches, affecting the reliability of their conclusions on database yield (18).
In light of the need for updated guidance on the most appropriate information sources to identify EEs, this study aims to provide further evidence on the relative yield of databases (both alone and in combination) to inform database choice for SRs of EEs. We also investigated the characteristics of studies not retrieved from the databases.
The inclusion of a record in a database does not mean that it will be identified by a search strategy; recommendations on resources to search lack value if common search practices mean that relevant records are not retrieved from these resources. Therefore, we also evaluated the quality of the search strategies used in recent reviews of EEs, using the performance of the MEDLINE strategy to assess if strategies were sufficiently sensitive.
Objectives 1, 2, and 3
We used relative recall methodology (19) to build a quasi-gold standard (QGS). A QGS is a set of relevant records against which the performance of a search strategy, or coverage of a database, is tested to determine how effective it is at retrieving particular record types. Although a QGS can be formed by a hand-search for relevant records, a QGS formed from relative recall is often used to test the performance of search strategies and database yield (20–22). In this study, we used the QGS to assess the yield of each database, and combinations of databases.
The QGS comprised EEs included in reviews commissioned or carried out by the English National Institute for Health and Care Excellence (NICE). We selected SRs that were either: (i) commissioned and funded by the health technology assessment (HTA) program on behalf of NICE and published in the journal Health Technology Assessment or (ii) conducted as part of NICE guideline development in the Public Health work-stream and published on the NICE Web site.
Searches undertaken in this context are shaped by methodological standards, reporting guidelines, and requirements for this type of evidence (3;5;23). Therefore, these reviews might be assumed to be of good quality and likely to clearly describe their methodology. No additional quality assessment of individual search strategies was undertaken. The inclusion of reviews from the NICE Public Health work-stream reflected our intention to include nonclinical topics in the QGS, increasing generalizability.
Reviews were identified by hand-searching the journal Web site of Health Technology Assessment, starting at the most recent publication and working back in date. Reviews undertaken to inform published NICE Public Health guidance were identified by browsing the Guidance section of the NICE Web pages, filtered by guidance type and starting with the most recent. The identification of candidate reviews took place in February 2017.
Eligible candidate reviews had to meet prespecified criteria (Table 1). Results were screened by one reviewer; any reviews where a clear inclusion decision could not be made were discussed with a second reviewer (or third reviewer if necessary) and agreement was reached.
Table 1. Eligibility Criteria to Be Included in the Sample of Candidate Reviews
We aimed to harvest a minimum of 350 studies from eligible reviews, with approximately 280 (80 percent) sourced from reviews published in Health Technology Assessment, and approximately seventy (20 percent) sourced from reviews produced as part of the NICE Public Health work-stream. The 80 percent/20 percent split reflected the approximate ratio of technology appraisals to public health reviews on the NICE Web site in 2017.
We selected reviews using the eligibility criteria and harvested the included EEs from the reviews. Reviews were selected from the Health Technology Assessment journal and the NICE Web pages until we reached the target number of studies. Once we achieved the target number we continued harvesting studies from any remaining eligible reviews published in that same year so we would have a complete year.
EEs included in each identified review were extracted and added to an Excel spreadsheet. Duplicates (records included in more than one review) were removed. Material submitted to NICE by the manufacturer as part of the HTA process and cited as evidence was excluded and did not form part of the QGS. References where the citation details were ambiguous and where we could not confidently identify the item being cited were also excluded. The remaining references formed our QGS set of relevant studies.
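The harvesting and de-duplication steps above can be sketched in code. This is a minimal illustration, not the procedure the authors followed (they worked in an Excel spreadsheet). The field names `title`, `year`, `manufacturer_submission`, and `ambiguous` are hypothetical, and matching duplicates on a normalized title plus publication year is an assumption made for the example.

```python
import re


def normalise(title: str) -> str:
    """Lower-case a title and collapse punctuation and whitespace for matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def build_qgs(harvested: list) -> list:
    """Pool references harvested from several reviews into a de-duplicated QGS.

    Each reference is a dict with hypothetical keys: 'title', 'year',
    'manufacturer_submission' (bool) and 'ambiguous' (bool).
    """
    seen = set()
    qgs = []
    for ref in harvested:
        # Manufacturer submissions and ambiguous citations are excluded.
        if ref.get("manufacturer_submission") or ref.get("ambiguous"):
            continue
        # Assumption: same normalized title and year means the same record.
        key = (normalise(ref["title"]), ref["year"])
        if key in seen:
            continue  # already harvested from another review
        seen.add(key)
        qgs.append(ref)
    return qgs
```

In practice, duplicate detection on real citation data usually needs manual checking as well, because titles and years can be recorded inconsistently across reviews.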
We searched nine databases for each QGS reference, to ascertain which databases included each reference. The databases comprised:
• Five healthcare databases:
• One general economics database:
• Three multidisciplinary databases:
The databases represent the range of types of database that might be searched for EEs and that previous research and available guidance suggested were important for EEs (8;10–12;17). We also chose resources we could access and that provided suitable functionality for efficient searching in the context of an SR. We did not include NHS EED as we wanted to identify the best sources of EEs in the context of the closure of NHS EED to new records.
The presence or absence of each reference in each database was recorded in an Excel spreadsheet.
Results were analyzed in Excel to identify:
• The yield ((number of QGS references found in each database / total number of QGS references) × 100) for each database alone and for all databases combined.
• The number of unique references retrieved from each database.
• The most efficient combination of databases in three scenarios. We defined ‘efficient’ as the fewest databases that could be combined to find the largest number of QGS records. The three scenarios were:
◦ The most efficient combination overall
◦ The most efficient combination of healthcare databases in the event that searchers do not have access to multi-disciplinary resources
◦ The most efficient combination of free resources in the event that searchers do not have access to subscription databases.
• The number and characteristics of references not found in any of the nine databases.
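The analyses listed above can be expressed as a short sketch. It assumes the presence/absence spreadsheet has been converted to a mapping from database name to the set of QGS reference identifiers found in that database. The function names are hypothetical, and the exhaustive search over combinations is one straightforward reading of the definition of "efficient" given above, not necessarily how the analysis was carried out in Excel.

```python
from itertools import combinations


def yield_percent(found: set, qgs: set) -> float:
    """Yield = (QGS references found / total QGS references) x 100."""
    return 100 * len(found & qgs) / len(qgs)


def unique_references(coverage: dict) -> dict:
    """For each database, the references found in that database and no other."""
    result = {}
    for name, refs in coverage.items():
        others = set().union(*(r for n, r in coverage.items() if n != name))
        result[name] = refs - others
    return result


def most_efficient_combination(coverage: dict, qgs: set):
    """Fewest databases that together find the largest number of QGS records.

    The largest number is taken to be the number found by all supplied
    databases combined. Combinations are tried in order of increasing size,
    so the first one reaching that number uses the fewest databases.
    """
    best_total = len(set().union(*coverage.values()) & qgs)
    for size in range(1, len(coverage) + 1):
        for combo in combinations(sorted(coverage), size):
            found = set().union(*(coverage[d] for d in combo)) & qgs
            if len(found) == best_total:
                return combo, found
    return (), set()


def not_found_anywhere(coverage: dict, qgs: set) -> set:
    """QGS references absent from every database tested."""
    return qgs - set().union(*coverage.values())
```

The three scenarios correspond to calling `most_efficient_combination` with different subsets of `coverage`: all databases, healthcare databases only, or free resources only. With nine databases there are only 511 non-empty combinations, so an exhaustive search is feasible. Where several combinations of the same size tie, this sketch returns the first in alphabetical order.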
We evaluated the success of MEDLINE search strategies reported in a sample of SRs in retrieving studies included in the SR and available in MEDLINE.
Each eligible review was checked to see if it included EEs available in MEDLINE and reported a MEDLINE strategy in enough detail to enable reproduction. We reran the MEDLINE strategy in each review that met these criteria to see whether it retrieved the QGS records available in MEDLINE. We then calculated sensitivity, precision, and number needed to read (NNR) for each strategy. Sensitivity, precision, and NNR were defined as:
• Sensitivity % = (number of QGS records available in MEDLINE retrieved by reported MEDLINE strategy / total number of QGS records found in MEDLINE) × 100
• Precision % = (number of included QGS records available in MEDLINE retrieved by reported MEDLINE strategy / total number of MEDLINE records retrieved) × 100
• NNR = total number of MEDLINE records retrieved / number of included QGS records available in MEDLINE retrieved by reported MEDLINE strategy.
Results were then analyzed to identify:
• The number of MEDLINE strategies that missed at least one of the QGS records found in MEDLINE
• The total number of QGS records missed across all strategies
• Mean sensitivity, precision, and NNR across all strategies
• The reasons for nonretrieval of any studies not identified by the MEDLINE strategies.
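The three performance measures and the summary figures can be computed as in the sketch below. It assumes that, for each review, the records retrieved by the rerun MEDLINE strategy and the review's QGS records available in MEDLINE are held as sets of record identifiers. The function names and dictionary keys are hypothetical, and reporting NNR as infinite when a strategy retrieves no QGS records is a choice made for this example.

```python
from statistics import mean


def evaluate_strategy(retrieved: set, qgs_in_medline: set) -> dict:
    """Sensitivity, precision, and NNR for one reported MEDLINE strategy."""
    hits = retrieved & qgs_in_medline
    sensitivity = 100 * len(hits) / len(qgs_in_medline)
    precision = 100 * len(hits) / len(retrieved) if retrieved else 0.0
    # NNR is undefined when nothing relevant is retrieved; use infinity here.
    nnr = len(retrieved) / len(hits) if hits else float("inf")
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "nnr": nnr,
        "missed": qgs_in_medline - hits,
    }


def summarise(results: list) -> dict:
    """Summary figures across all evaluated strategies."""
    return {
        "strategies_with_misses": sum(1 for r in results if r["missed"]),
        "total_missed": sum(len(r["missed"]) for r in results),
        "mean_sensitivity": mean(r["sensitivity"] for r in results),
        "mean_precision": mean(r["precision"] for r in results),
        "mean_nnr": mean(r["nnr"] for r in results),
    }
```

For example, a strategy that retrieves 200 records, four of which are among five QGS records available in MEDLINE, has a sensitivity of 80 percent, a precision of 2 percent, and an NNR of 50. The reasons for nonretrieval cannot be derived from these counts and require manual inspection of each missed record against the strategy.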
DISCUSSION AND CONCLUSIONS
Our results suggest that searching Embase, the HTA Database, and either PubMed or MEDLINE will identify the majority of EEs relevant for inclusion in SRs. In the absence of NHS EED, searchers should not rely on PubMed or MEDLINE alone, as suggested by earlier studies (8;11).
Previous research did not test the value of the HTA Database in identifying EEs. However, the HTA Database identified more unique records from our QGS (thirteen records) than any other database tested. This is likely to be because the HTA Database indexes literature published by HTA agencies that is not routinely included in journal-focused bibliographic databases. HTAs from many agencies often include an EE. Since May 2018, the HTA Database has not been updated while the production process transfers to INAHTA. It is currently unclear whether this will result in differences in functionality and coverage: any changes may affect database utility. The uncertainty around the future of the HTA Database is concerning, as it is an important resource for identifying EEs and should be searched to identify publications related to healthcare decision making that are not available from other bibliographic databases. At present, outside the HTA Database, this material can only be identified by a time-consuming process of searching the Web pages of individual HTA agencies or by means of a general Web search engine. Producers of HTAs may consider exploring alternative methods to enhance the visibility and accessibility of their publications to researchers.
In addition to HTA Database, unique records were also found in Embase (two records) and Scopus (one record). Both of the unique records found in Embase were conference abstracts. This highlights the potential value of including databases that index conference abstracts if this type of material is eligible for inclusion in the review.
Embase was the highest yielding database. This is likely to partly reflect Embase's policy of indexing conference abstracts. For example, twenty-eight of the thirty-five QGS records found in Embase but not in MEDLINE or PubMed were conference abstracts.
The high yield of Embase can also be explained by Elsevier's project, launched in 2010, to include all MEDLINE records in Embase (25). Embase, therefore, contains two databases. Despite this ambition to include all of MEDLINE, six of our QGS records could be found in both MEDLINE and PubMed but not in Embase. Searching both resources was necessary to achieve the highest yield with the fewest possible databases. Our results, therefore, support the current recommendation to search both MEDLINE and Embase in the context of SRs (26). Searching both databases also allows searchers to exploit the differences in indexing, record structure, and search functionality between resources to maximize retrieval.
There was no difference in the performance of MEDLINE and PubMed: they retrieved the same records. This was because we searched all available segments of MEDLINE (including In-Process & Other Non-Indexed Citations). Searching Ovid MEDLINE without these segments would have resulted in a lower yield than PubMed. All available segments in Ovid MEDLINE should be searched to maximize sensitivity: the newly released segment MEDLINE All provides a simple way to achieve this.
We note that the QGS did not contain records for relatively recent publications. Searching PubMed in addition to MEDLINE has been suggested as beneficial for identifying very recent papers not yet fully indexed for MEDLINE (27). However, this research predates the expansion of MEDLINE with the addition of new segments such as MEDLINE All. We do not currently have sufficient evidence to say whether this affects the conclusions of the previous research. Any additional value from searching PubMed in terms of yield must be balanced against the comparatively limited search functionality of this interface. The inability to use advanced search syntax such as proximity searching makes it difficult to construct a strategy with the desired balance of sensitivity and precision.
Searching databases other than the core group had limited incremental yield, retrieving only four additional QGS references (1 percent). This small incremental yield does not allow for strong conclusions on the value of searching specific additional resources. However, the additional four references could all be retrieved by searching Scopus and three of the four references could be found in Science Citation Index and Social Sciences Citation Index. We suggest there is some evidence that researchers should consider searching a multidisciplinary database, particularly for nonclinical research topics. Pitt et al. have also suggested Scopus is a potentially useful resource for EEs (9).
Searching only freely available (nonsubscription) databases resulted in the identification of fewer QGS records (85 percent compared with 96 percent). Researchers who only have access to freely available databases should place an increased emphasis on supplementary search methods such as reference-list checking and citation searching to maximize retrieval of relevant studies.
Records for fourteen QGS references (4 percent) were not included in any database tested and largely comprised grey literature. Targeting grey literature with supplementary search methods designed to retrieve this type of evidence (e.g., searches of HTA agency Web sites, conference proceedings, online sources of nonjournal reports, reference list checking, and expert contact) may be a more efficient and effective use of resources than extensive database searching. This supports the conclusion by Royle and Waugh (10) that the majority of published EEs can be identified in a small number of core databases, and beyond this, supplementary search approaches may be most productive in finding additional studies.
Despite relatively high mean sensitivity, the MEDLINE search strategies developed by the authors in the included reviews had weaknesses, resulting in nonretrieval of relevant studies. As researchers can no longer rely on searches of NHS EED or HEED to retrieve EEs missed by suboptimal searches elsewhere, it is important that search strategies in large bibliographic databases are of high quality to maximize the likelihood of identifying all relevant studies. The efficient combinations of databases we have identified will only retrieve relevant EEs if researchers conduct adequately sensitive searches.
Only one of the twenty-five records (4 percent) missed by MEDLINE strategies was not retrieved because of the search terms used for the economics concept. This perhaps reflects the availability of published filters designed to identify EEs, such as that developed by the Centre for Reviews and Dissemination (28). Our findings suggest that improving the sensitivity of searches for EEs in these large bibliographic databases is likely to be more complex than simply encouraging the use of appropriate search filters for economic study designs. The reasons studies were missed (e.g., insufficiently sensitive search terms, illogical combinations of concepts, and the use of limits) suggest searchers need to be aware of a range of issues. We recommend that researchers designing strategies to identify EEs in general bibliographic databases such as MEDLINE use a published filter designed to identify EEs, ensure that terms for population and intervention concepts are sufficiently sensitive, consider whether date or publication type limits are appropriate, and check their strategies carefully to identify syntax errors.
The use of quality assessment tools for search strategies, such as the PRESS checklist (29), may help to achieve this. Research has also suggested that the involvement of a suitably experienced librarian or information specialist improves the quality of searches conducted as part of SRs (30–32).
The MEDLINE strategies that we tested demonstrated variable precision and NNR. NHS EED and HEED allowed the construction of relatively precise search strategies as their content was prefocused by study type. The closure of NHS EED and HEED and the subsequent reliance on larger bibliographic databases suggests that the ability of searchers to construct strategies that can achieve high sensitivity with reasonable precision will become more important. Precision can best be achieved by searching using sophisticated interfaces that allow the use of phrase searching, proximity operators and other techniques to introduce focus to strategies. Precision may be a particular challenge in reviews related to public health and other nonclinical topics.
In conclusion, we suggest searching for published EEs can be limited to key databases, as long as these databases are searched using methodologically appropriate strategies. Searchers should concentrate on developing suitable search strategies for key databases to ensure high sensitivity and adequate precision, in addition to using supplementary search approaches to retrieve evidence that is unpublished or unlikely to be identified by bibliographic databases.
Limitations of this Study
Several limitations should be taken into account when considering this study's results and conclusions.
The candidate reviews were screened against prespecified eligibility criteria by one reviewer. If a clear inclusion decision could not be made, agreement was reached with a second or third reviewer. Using a single reviewer for screening increased the risk of selection errors and bias.
Although the nine databases tested were chosen to represent the range of databases that might be searched for EEs, many other databases are available to researchers. Our study can only provide information on the yield of the included databases.
Our QGS comprised 351 EEs. Although this is a reasonably sized QGS, a larger reference set would have increased the generalizability of our research. Sourcing the QGS from a wider range of reviews could also have improved generalizability. All reviews were produced in the context of United Kingdom decision making and focused on clinical medicine or public health. Our findings may be less generalizable to reviews specifically relevant to other topics (e.g., mental health, health management) or healthcare contexts (e.g., low- and middle-income countries [LMICs]).
The robustness of these findings depends on the extent to which the QGS is representative of all relevant EEs. Representative QGS sets result from high quality search strategies that can be expected to retrieve a high proportion of all available relevant studies. Ideally, the quality of searches conducted by the SRs from which the QGS is harvested should be assessed. However, we took the pragmatic decision not to quality assess each search strategy in the reviews from which the QGS was sourced. The assumption that searches conducted in the context of NICE decision making would be of sufficient quality to provide a representative QGS is a potential limitation of our methodology. Weaknesses in the search methods of source reviews could have meant that eligible EEs were not retrieved, lessening the degree to which our QGS was representative of all relevant studies.
We define “efficient” as the fewest databases that could be combined to find the largest number of QGS records. The number of databases is one measure of search efficiency, but not the only one. We do not take into account, for example, time taken to search a database, time taken to export records, or the number of irrelevant records retrieved. Our study also does not consider the impact of database interfaces on efficiency. Many bibliographic databases are available on more than one platform, each providing different functionality that can impact on retrieval. When viewing the database combinations reported as most efficient, the limitations of our definition should be considered.
The sensitivity of each review's MEDLINE search was used as a proxy for search quality. This provides only limited information on the quality of search methodology. Although a record for a relevant study may be missed by the MEDLINE strategy, this does not mean the strategy, when translated, will also fail to identify that record in other databases searched. Additionally, the MEDLINE strategy may not reflect the search approaches used in the other databases searched by the review. There are other methods to assess quality of searches, such as the PRESS checklist (29). As we were only concerned with whether the strategy could retrieve QGS records, elements of search methodology assessed by PRESS (e.g., errors in search syntax, missing search terms, and inappropriate limits) were not of interest unless they impacted on retrieval. Reasons for nonretrieval did closely map to several aspects of search development covered by PRESS elements.
Implications for Practice
Our findings can inform researchers’ decisions on database choice when searching for EEs following the closure of NHS EED and HEED. Although our research was carried out in the context of searching for SRs, our findings on database yield and search quality are also likely to be relevant to those searching for EEs for other purposes.
Implications for Research
We can only provide information on the value of the nine databases tested. There is scope for analysis of further databases, particularly in the context of economic reviews with a specific focus (for example nursing, mental health, or health care in LMICs). The bibliometric analysis by Pitt et al. (9) of EEs from a global perspective suggested that Global Health, a database that has particular relevance for LMIC research (33), merits further exploration.
Similarly, we can only provide information on the value of the included databases in relation to our QGS set. QGS records were harvested from reviews with a particular focus (mainly clinical, with some public health) and research context (United Kingdom health care). Further research to investigate how these findings relate to QGS sets harvested from SRs with a different focus or research context would be valuable.
Records for fourteen QGS references (4 percent) were not included in any database and we suggest supplementary search methods are used to identify these types of studies. Evidence-based research on the relative value of different supplementary search methods is needed, particularly as many methods can be resource intensive.