Development and initial psychometric properties of the Research Complexity Index

Allison A. Norful; Bernadette Capili; Christine Kovner; Olga F. Jarrín; Laura Viera; Scott McIntosh; Jacqueline Attia; Bridget Adams; Kitt Swartz; Ashley Brown; Margaret Barton-Burke

doi:10.1017/cts.2024.534

Published online by Cambridge University Press: 09 May 2024

Allison A. Norful*: Affiliation:
Columbia University School of Nursing, New York, NY, USA
Bernadette Capili: Affiliation:
Rockefeller University, New York, NY, USA
Christine Kovner: Affiliation:
New York University Rory Meyers College of Nursing, New York, NY, USA
Olga F. Jarrín: Affiliation:
Rutgers The State University of New Jersey, New Brunswick, NJ, USA
Laura Viera: Affiliation:
University of North Carolina, Chapel Hill, NC, USA
Scott McIntosh: Affiliation:
University of Rochester Medical Center–CLIC, Rochester, NY, USA
Jacqueline Attia: Affiliation:
University of Rochester Medical Center–CLIC, Rochester, NY, USA
Bridget Adams: Affiliation:
Oregon Health & Science University, Portland, OR, USA
Kitt Swartz: Affiliation:
Oregon Health & Science University, Portland, OR, USA
Ashley Brown: Affiliation:
University of North Carolina, Chapel Hill, NC, USA
Margaret Barton-Burke: Affiliation:
Memorial Sloan Kettering Cancer Center, New York, NY, USA
*: Corresponding author: A. A. Norful; Email: aan2139@cumc.columbia.edu

Abstract

Objective:

Research study complexity refers to variables that contribute to the difficulty of a clinical trial or study. This includes variables such as intervention type, design, sample, and data management. High complexity often requires more resources, advanced planning, and specialized expertise to execute studies effectively. However, there are limited instruments that scale study complexity across research designs. The purpose of this study was to develop and establish initial psychometric properties of an instrument that scales research study complexity.

Methods:

Technical and grammatical principles were followed to produce clear, concise items using language familiar to researchers. Items underwent face, content, and cognitive validity testing through quantitative surveys and qualitative interviews. Content validity indices were calculated, and iterative scale revision was performed. The instrument underwent pilot testing using 2 exemplar protocols, asking participants (n = 31) to score 25 items (e.g., study arms, data collection procedures).

Results:

The instrument (Research Complexity Index) demonstrated face, content, and cognitive validity. Item mean and standard deviation ranged from 1.0 to 2.75 (Protocol 1) and 1.31 to 2.86 (Protocol 2). Corrected item-total correlations ranged from .030 to .618. Eight elements appear to be under correlated to other elements. Cronbach’s alpha was 0.586 (Protocol 1) and 0.764 (Protocol 2). Inter-rater reliability was fair (kappa = 0.338).

Conclusion:

Initial pilot testing demonstrates face, content, and cognitive validity, moderate internal consistency reliability, and fair inter-rater reliability. Further refinement of the instrument may increase reliability, thus providing a comprehensive method to assess study complexity and related resource quantification (e.g., staffing requirements).

Keywords

Clinical research; instrumentation; psychometric; research design; workload

Type: Research Article
Information: Journal of Clinical and Translational Science, Volume 8, Issue 1, 2024, e91

DOI: https://doi.org/10.1017/cts.2024.534
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2024. Published by Cambridge University Press on behalf of Association for Clinical and Translational Science

Introduction

The development and implementation of clinical research studies are challenging for investigators [Reference Smuck, Bettello and Berghout1,Reference Roche, Paul and Smuck2]. Barriers may include complex regulatory requirements, restrictive eligibility criteria, specific study timelines, and limited funding to support a study. To overcome these barriers and ensure quality and integrity [Reference Smuck, Bettello and Berghout1,Reference Good, Lubejko, Humphries and Medders3], studies must have sufficiently trained personnel to conduct the research. Specifically, clinical studies require appropriate staffing to support screening for establishing trial eligibility, participant recruitment and retention, obtaining informed consent, ensuring fidelity to treatment (e.g., study maintenance and adherence), and complying with adverse event (AE) reporting and follow-up. Despite the need for appropriate staffing, few quantitative models inform staffing needs to ensure the study’s safe conduct, achievement of study goals, and budget adherence. Further, variability across protocol complexity often presents challenges to planning appropriate allocation of resources [Reference Getz, Campo and Kaitin4].

A literature review confirmed that methods to estimate clinical research workload and study complexity across all study design types are scant. The majority of literature describes complexity in the context of clinical trials and is often isolated to oncology or pharmaceutical trials [Reference Makanju and Lai5–Reference Getz and Campo7]. Instruments specific to oncology research exist and include the Ontario Protocol Assessment Level [Reference Smuck, Bettello and Berghout1], the Wichita Community Clinical-Trial Oncology Protocol Acuity Tool [Reference Good, Lubejko, Humphries and Medders3], and the NCI Trial Complexity Elements & Scoring Model [Reference Richie, Gamble, Tavlarides, Strok and Griffin8]. These instruments do not differentiate between inpatient and outpatient settings, nor do they differentiate by cohort or credentials of the research staff (e.g., technician, clinical research nurse [RN], non-RN coordinator) who implement the research activities. Also, definitions for workload, study complexity, and methods to evaluate existing instruments’ reliability and validity are not described in detail.

Clinical research staff, often RNs, are responsible for managing complex teams and workflow of clinical research studies including (1) managing relationships with a variety of staff; (2) protocol review, logistics, staff adjustments, and budget; (3) protocol approval process including managing scientific and institutional review board committee meetings, reviews of protocol instruments, sponsor communication, and staff education; (4) research participant prescreening, protocol visit management, AEs, source documentation, invoicing, and query resolution; and (5) sponsor correspondence including AE reporting, monitoring sponsor visits, and study startup and close-out visits. These aspects of clinical research staff responsibilities must be accounted for when assessing study workload and complexity. Such metrics can provide research teams with an objective method to quantify the activities associated with clinical research studies based on factors contributing to workload. Thus, the purpose of this multi-phase study was to develop an instrument that may be used across all study design types to scale research complexity. This article describes the first two phases of the development of the Research Complexity Index (RCI): (1) identifying core elements of clinical research studies and (2) developing initial items to scale each element and evaluating the tool’s initial psychometric properties.

Materials and methods

Phase 1. Item development; content, face, and cognitive validity testing

Item development

In preliminary work, we conducted a literature review with content analysis to identify conceptual dimensions, definitions, and any existing instruments that scale research study complexity. The research team classified the content guided by the Donabedian model [Reference Donabedian9] – a process involving Structure (e.g., environment, personnel, resources), Processes (e.g., procedures), and Outcomes (e.g., study deliverables, dissemination goals). See Table 1. Through iterative revisions, we established a working definition of the construct of research complexity: Clinical research study complexity is defined as the elements that contribute to the intricacy and difficulty of a clinical study. It includes elements such as the nature of the intervention (e.g., novel drugs, complex procedures), the design of the study (e.g., randomized controlled trials, multi-site studies), the study population (e.g., rare diseases, multiple comorbidities), regulatory requirements, and data management needs. A complex study typically requires advanced planning and specialized expertise to manage and execute the study effectively.

Table 1. Proposed dimensions of research study and trial complexity (adapted from Donabedian’s model)

Next, we cross-checked our elements of research complexity against an existing instrument, the National Cancer Institute (NCI) Trial Complexity Elements and Scoring Model. This NCI instrument has 10 items to assess research complexity. After iterative discussion and review of our content analysis, we added 15 items and renamed some items to better capture the details of our applied theoretical model. Table 2 displays the match between the elements of the NCI instrument and the new instrument. Guidelines for technical and grammatical principles were followed to produce and revise clear and concise items that use language familiar to clinical research professionals [Reference Hinkin10]. We then revised items and response options to be useful in various study designs rather than limiting the instrument to clinical trials. Following a review for clarity and grammar, the new items, representing each of the study’s complexity elements, underwent face, content, and cognitive validity testing [Reference Drost11].

Table 2. Alignment of new instrument with existing National Cancer Institute tool

Data collection & analysis

In May 2022, we sent a REDCap electronic survey link via email to individuals through a random sample of institutions within the authors’ clinical research professional networks. The authors and the NCATS-funded University of Rochester Medical Center’s Center for Leading Innovation and Collaboration (CLIC) staff pretested the electronic version of the instrument for online functionality before its distribution via email. The email included a description of the project’s purpose, an anonymous survey link (content validity testing), and a request for potential participants to participate in an interview to assess face and cognitive validity. Employing a snowball technique, we also requested participants to invite colleagues to participate in the survey. CLIC staff collected and managed data using REDCap electronic data capture tools.

Content validity testing

Content validity ensures that the new scoring rubric and items are relevant to the content being measured [Reference Lynn12]. Eligibility criteria to participate were self-reported: (1) five or more years’ experience in preparing, directing, or coordinating clinical studies sponsored by industry, foundation, and/or government, and (2) completed training in research, ethics, and compliance (such as offered by the Collaborative Institutional Training Initiative (CITI Program) or equivalent).

The initial pool of items was built into REDCap survey software. Participants were recruited and asked to rate each item and response options (“scoring tiers”) on a 4-point Likert scale ranging from “highly relevant” (4) to “highly irrelevant” (1), and, separately, from “clear, requires no revision” (4) to “unclear, consider removal” (1). To establish initial content validity, the recommended sample size is a minimum of 6 participants. A content validity index (CVI) was computed for the individual items (I-CVI) and response options (R-CVI). Indices greater than 0.8 were eligible for inclusion and further psychometric testing [Reference Lynn12].
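The index calculation described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the common convention that an item's index is the proportion of raters who scored it 3 or 4 on the 4-point scale. The function name `item_cvi`, the item labels, and the ratings are hypothetical and are not study data.

```python
def item_cvi(ratings, relevant_min=3):
    """Proportion of raters scoring an item at or above relevant_min (4-point scale)."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(1 for r in ratings if r >= relevant_min) / len(ratings)


# Hypothetical relevance ratings from six raters per item (not study data).
example = {
    "study arms": [4, 4, 3, 4, 3, 4],
    "incentives": [4, 2, 3, 2, 4, 1],
}

cvi = {name: item_cvi(scores) for name, scores in example.items()}

# Only indices greater than 0.8 are eligible; everything else is flagged for revision.
flagged = [name for name, value in cvi.items() if value <= 0.8]
```

In this made-up example, "incentives" would be flagged for revision, while "study arms" would proceed to further testing.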

Face & cognitive validity testing

Each 1:1 interview was conducted via Zoom at a time convenient to the participant. The participant read each item aloud, interpreted what the item and response options were asking, and openly discussed its clarity and relevance to the construct of research study complexity. This approach permitted the researchers to focus on participant interpretations of individual survey items, relating their individual experiences to inform potential survey item revisions and to establish face and cognitive validity [Reference Geranpayeh and Taylor13]. Interviews were audio recorded to ensure descriptive validity. One interviewer, with expertise in qualitative methods and instrument development, moderated all interviews. The participant, interviewer, and at least one other study team member were present during the interview session. Study team members took notes throughout the interview pertaining to item feedback, interpretation, and suggestion for revision. Immediately following each interview, the researchers reviewed all notes and discussed participant feedback to recognize and differentiate the interpretation presented by the participants and the researchers’ interpretations of the items. Iterative revisions occurred concurrently with each subsequent interview. Through constant comparison and principles of saturation, the team conducted interviews to further revise each item for clarity and relevance until there was consensus that saturation was achieved and no new information was emerging [Reference Sandelowski14]. At this stage, the instrument was named the RCI.

Phase 2. Pilot testing and initial psychometric analysis

We pilot tested the revised instrument to obtain initial item analyses and preliminary assessment of reliability [Reference Fowler15]. We asked respondents to use the RCI to rate two preexisting protocols that were previously developed and implemented by a member of the study team. Because the targeted end user of the instrument may range from trainees to principal investigators, we purposively selected these protocols to ensure that they were not too complex, thus allowing a universal understanding of study procedures to be scored. Johanson and Brooks recommend a minimum of 30 participants for initial scale development [Reference Johanson and Brooks16]. We recruited an additional convenience sample of clinical research staff through the team’s research network using a snowball technique and the same eligibility criteria as used in the content validity testing [Reference Sadler, Lee, Lim and Fullerton17]. We sent an email to potential participants explaining the project, its voluntary nature, and the research team’s contact information [Reference Frohlich18]. An electronic survey link was embedded in the email to permit participants to easily access the pilot instrument and two unique protocol exemplars [Reference McPeake, Bateson and O’Neill19]. The first protocol exemplar was a mixed-methods study designed to evaluate general cardiovascular risk among individuals with HIV. The second protocol exemplar was a randomized, placebo-controlled clinical trial to evaluate the efficacy of fish oil and a controlled diet to reduce triglyceride levels in HIV. Participants were asked to score each protocol using the version of the instrument that resulted from the Phase 1 validity testing.

After 31 anonymous participants completed the pilot, a finalized dataset was established. The data were exported from REDCap to SPSS v.27 to perform initial psychometric analysis including item analysis and reliability statistics. Descriptive statistics, such as range, mean, and standard deviation, were computed for each item. Inter-item correlations (IIC) and the Cronbach’s alpha for the instrument were calculated [Reference Nunnally and Bernstein20]. Corrected item-total correlations were used to determine how each item correlates to other items in the instrument. A targeted range for IIC was 0.30–0.70 to prevent under- or over-correlation of the items on the instrument. Finally, a Fleiss’ Kappa statistic was used to assess inter-rater reliability. The Fleiss’ Kappa is a statistical measure for assessing reliability of agreement between more than two raters and is used for tools that have categorical response options [Reference Fleiss, Levin and Paik21]. Following pilot testing, our research team discussed the items to assess which items contribute to the overall scoring metric’s reliability and which items warrant further revision and/or removal [Reference Fowler15].
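The reliability statistics named above were computed in SPSS. The following is only a rough Python sketch of the standard formulas for Cronbach's alpha and the corrected item-total correlation. The function names and the ratings matrix are hypothetical (not study data), and applying the 0.30 to 0.70 band to item-total correlations is an assumption made for illustration.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a raters-by-items matrix (list of equal-length rows)."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation; returns NaN when either variable has zero variance."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(scores, j):
    """Correlate item j with the sum of all other items (item j itself excluded)."""
    item = [row[j] for row in scores]
    rest = [sum(row) - row[j] for row in scores]
    return pearson(item, rest)


# Hypothetical ratings: 5 raters x 4 items, each scored 1-3 (not study data).
demo = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 2, 2],
    [2, 3, 3, 2],
    [1, 1, 2, 2],
]

alpha = cronbach_alpha(demo)
item_total = [corrected_item_total(demo, j) for j in range(len(demo[0]))]

# Items outside the assumed 0.30-0.70 band; a NaN result also fails the check.
flagged = [j for j, r in enumerate(item_total) if not (0.30 <= r <= 0.70)]
```

The last column of the hypothetical matrix has zero variance, so its item-total correlation is undefined and the sketch returns NaN rather than raising an error.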

Results

Phase 1

Participants completed the initial content validity testing using the electronic rating scale for each item and response option. Initial content validity indices indicated that 34 out of 100 collective items and scoring response options fell below the 0.8 content validity threshold and justified the need for revision during the cognitive interview phase. Seven people participated in a 1:1 interview to establish cognitive validity through iterative discussion and revision of each item and scoring response option.

At the conclusion of cognitive validity testing, all elements were retained and response options were revised to enhance clarity and relevancy for the following elements: selection of study instruments; physical equipment; budget preparation; consultant agreements; facilities or vendor agreements; hiring and job descriptions; access to target population; vulnerable populations; participant eligibility; incentives; IRB preparation; compliance reporting; expected AEs/safety risk; statistical analysis, and dissemination. Within each response option, criteria for each item were refined to scale complexity and included “Low (1 point),” “Moderate (2 points),” and “High (3 points).” The scale range of the instrument was 25 (low complexity) – 75 (high complexity).
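The scoring rule described above (25 items, each scored 1 to 3 points, summed to a total between 25 and 75) can be illustrated with a short sketch. The names `TIER_POINTS`, `N_ITEMS`, and `rci_total` are hypothetical, and the example ratings are invented for illustration.

```python
TIER_POINTS = {"low": 1, "moderate": 2, "high": 3}
N_ITEMS = 25


def rci_total(tiers):
    """Sum tier points across all 25 items; the result lies between 25 and 75."""
    if len(tiers) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item ratings, got {len(tiers)}")
    return sum(TIER_POINTS[t] for t in tiers)


# Hypothetical protocol: 10 low, 10 moderate, and 5 high items (not study data).
example = ["low"] * 10 + ["moderate"] * 10 + ["high"] * 5
total = rci_total(example)  # 10*1 + 10*2 + 5*3 = 45
```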

Phase 2

Thirty-three respondents returned the survey. We reviewed the datasets from the two individual protocol scores and used case-wise deletion to manage missing data. Specifically, responses with greater than 80% missing data for each protocol independently were removed. Subsequently, 31 total responses were retained for each protocol. As shown in Table 3, most participants reported that they were female (74%) and White (72%). Over 80% of respondents had a master’s degree or higher. One-quarter of respondents reported their role as a principal investigator. Other roles included research coordinator (21.2%) and clinical research nurse (39%). Research areas varied widely (e.g., genetics/genomics; oncology).
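The case-wise deletion rule described above can be sketched as follows. This is an illustration under stated assumptions: missing answers are represented as `None`, the function name `drop_mostly_missing` is hypothetical, and the responses are invented rather than taken from the study.

```python
def drop_mostly_missing(responses, max_missing=0.80):
    """Keep responses whose share of missing items does not exceed max_missing."""
    kept = []
    for row in responses:
        share_missing = sum(value is None for value in row) / len(row)
        if share_missing <= max_missing:
            kept.append(row)
    return kept


# Hypothetical responses for one protocol, shortened to 5 items (not study data).
responses = [
    [1, 2, 3, 1, 2],                 # complete: kept
    [1, None, None, None, None],     # exactly 80% missing: kept
    [None, None, None, None, None],  # more than 80% missing: removed
]

kept = drop_mostly_missing(responses)
```

Because the rule is applied to each protocol independently, the function would be called once per protocol's set of responses.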

Table 3. Characteristics of respondents (phase 2 pilot testing)

* Student/trainees had at least 5 years’ experience in clinical research and therefore met the eligibility criteria.

As shown in Table 4, item means and standard deviations, indicating item difficulty, ranged from 1.0 to 2.75 in Protocol exemplar 1 and 1.31 to 2.86 in Protocol exemplar 2. In Protocol 1, corrected item-total correlations, indicating item discrimination, ranged from 0.030 to 0.536. Fifteen items were under correlated to the other items in the scale. No items were over correlated. In Protocol 2, corrected item-total correlations ranged from 0.012 to 0.618. Ten items were under correlated while no items were found to be over correlated. Across both protocols, eight items were under correlated to the other items on the scale. They include facilities and vendor agreements (item 5), multiple PI agreements (item 6), access to target populations (item 9), vulnerable populations (item 10), participant eligibility (item 11), intervention administration (item 16), IRB preparation (item 20), and follow-up (item 23). Initial Cronbach’s alpha for the total scale was 0.586 in Protocol 1 and 0.764 in Protocol 2.

Table 4. Individual item analysis of the Research Complexity Index

* Item 14 had zero variance in protocol 1.

The range of total composite scores was 32 to 48 for Protocol 1 and 40 to 60 for Protocol 2, indicating that the second protocol was higher on the scale of complexity. Fleiss’ Kappa agreement for inter-rater reliability indicated fair agreement for both Protocol 1 (0.338) and Protocol 2 (0.277). Upon assessment of individual item ratings, there were four items (access to target population, research team, data collection procedures, and statistical analysis) that did not yield at least 60% agreement for a scoring tier. The poor performance of these items may be driving the inter-rater reliability statistic lower. The final version of the RCI is displayed in Table 5.
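For readers who want to reproduce this kind of agreement check, the sketch below implements the standard Fleiss' kappa formula and a simple per-item agreement share. It is illustrative only. It assumes that each instrument item is treated as one rated unit and that every item is scored by the same number of raters. The function names and ratings are hypothetical and are not study data.

```python
from collections import Counter


def fleiss_kappa(ratings, categories=(1, 2, 3)):
    """Fleiss' kappa; ratings holds one list per rated unit, one category per rater."""
    n_units = len(ratings)
    n_raters = len(ratings[0])
    if any(len(unit) != n_raters for unit in ratings):
        raise ValueError("every unit needs the same number of raters")
    counts = [Counter(unit) for unit in ratings]
    # Observed agreement: share of agreeing rater pairs within each unit, averaged.
    p_bar = sum(
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_units
    # Chance agreement from the overall share of ratings in each category.
    shares = [sum(c[cat] for c in counts) / (n_units * n_raters) for cat in categories]
    p_e = sum(s * s for s in shares)
    if p_e == 1:
        return float("nan")  # all ratings fall in one category: kappa is undefined
    return (p_bar - p_e) / (1 - p_e)


def modal_agreement(unit):
    """Share of raters who chose the most common tier for one item."""
    return Counter(unit).most_common(1)[0][1] / len(unit)


# Hypothetical tiers from 5 raters on 4 items (not study data).
demo = [
    [1, 1, 1, 1, 2],
    [2, 2, 3, 2, 2],
    [1, 2, 3, 3, 2],
    [3, 3, 3, 3, 3],
]

kappa = fleiss_kappa(demo)
# Items where fewer than 60% of raters agreed on a single scoring tier.
low_agreement = [i for i, unit in enumerate(demo) if modal_agreement(unit) < 0.60]
```

In the hypothetical data, only the third item falls below the 60% agreement mark, which is the kind of item the paragraph above identifies as pulling the kappa statistic down.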

Table 5. Research Complexity Index (final piloted version)

Add all three columns to get final complexity score: ___________.

Discussion

This project developed a new 25-item instrument to scale research study complexity. Following initial item development and psychometric pilot testing, the RCI demonstrates face and cognitive validity, but only fair inter-rater reliability. We found that some items were under correlated with each other despite participants indicating their critical nature when scaling complexity. This indicates that the conceptual foundation of the construct, study complexity, remains unclear. Conceptual analyses that refine the antecedents, dimensions, and consequences of the construct of research study complexity should be explored concurrently with additional instrument revision and testing. Future testing should also include a larger sample to enable researchers to perform exploratory factor analyses. This may help form a more refined understanding of the conceptual underpinnings of the construct [Reference Braun and Clarke22].

The findings of this study are aligned with existing literature that notes the challenges of scaling research complexity. Project difficulty across all fields (e.g., engineering) is often defined as how hard it is to achieve goals and objectives [Reference Dao, Kermanshachi, Shane, Anderson and Hare23]. An empirical measure is helpful to quantify operational performance, allocate resources and personnel, and establish metrics for project or individual researcher success [Reference Jacobs24]. In academic medical institutions, researchers and academic leadership have noted the importance of recognizing resources, finances, and the establishment of guidelines and measurement systems to scale faculty effort in research [Reference Nutter, Bond and Coller25]. Some argue that, in lieu of determining effort by one’s level of grant support, transparent metrics are needed to help researchers distinguish the complexity of their activities and responsibilities [Reference Morrow, Gershkovich and Gibson26]. The RCI proposed by this study may better capture study complexity and allow researchers to better demonstrate the time, effort, and allocated resources regardless of study design or funding.

Similar to the goals of the original NCI Trial Complexity Model [Reference Richie, Gamble, Tavlarides, Strok and Griffin8], the proposed RCI may also be useful for estimating funding or resources required by the study’s most time-consuming tasks. Further, institutional allocation of resources is sometimes based on the level of acquired funding, and not necessarily informed by study design, proposed workload, or researcher experience. Yet, the experience level of investigators should be taken into consideration when scaling complexity. For example, a principal investigator with decades of experience conducting clinical trials may consider tasks less complex as compared to an early-stage investigator. The goal of any instrument used to measure research complexity should be to inform organizations how to best optimize research efficiency and cost-effectiveness through early and accurate evaluation of researcher needs. Across fields, there is some evidence that the three determinants of research efficiency include seniority, public funding, and institutional reputation [Reference Amara, Rhaiem and Halilem27]. Yet, it is recommended that institutions formulate strategies to better measure and promote operational and performance improvement [Reference Jiang, Lee and Rah28,Reference Gralka, Wohlrabe and Bornmann29]. As part of the ongoing development of this present instrument, we recommend future validity and reliability testing across settings with researchers who have varying levels of experience. Subsequently, we may grasp a better understanding of the stewardship of research resources (i.e., time, staff, budgets) needed by trainees, junior scientists, or senior faculty across all study designs [Reference Resnik30].

Limitations

This research project has limitations that should be considered prior to widespread adoption of the new instrument. First, we acknowledge that not all researchers have the same experience with all study design types (e.g., clinical trials), thus presenting potential variability of responses. Varying institution-specific resource access may also alter the complexity of a protocol. However, since the objective of this study was to create a more universal instrument for measuring complexity across researcher and institution types, we believe the initial piloted version serves as a sufficient prototype prior to additional testing. Future research may include homogeneous clusters of researchers based on level of experience and familiarity with specific study designs. Second, while effective in targeting experienced clinical research staff, the purposive sampling strategy may not have encompassed all categories of staff involved in clinical research. We acknowledge that variation in research fields may present challenges to using a universal scale that captures study complexity. However, our design built in variations with protocol exemplars and evaluated the instrument with participants of various levels of experience to allow a more rigorous analysis. The authors recognize that diversity, equity, and inclusion (DEI) is an important variable to assess in a study; however, this instrument may not capture a study’s DEI complexity. Additionally, the lack of a user manual for the study participants was another limitation that may have impacted the usability and effectiveness of the RCI. Our findings suggest that further refinement of item terms, a user manual, and additional training may be necessary for study teams to effectively use the RCI. These will be included in the next phase of instrument development.

Conclusion

This paper presents the development and initial psychometric properties of the RCI, which demonstrates early validity and reliability. While this instrument is still in its initial stages, the potential to assist in study planning, resource allocation, and personnel management is valuable. Further construct refinement and additional psychometric testing, including factor analyses, will allow for the evaluation of construct validity.

Acknowledgments

The authors thank Marissa Chiodo, MPH; Susanne Heininger; Rebecca Laird, MBA, M.Div.; and Abby Spike from CLIC and Gallya Gannot, PhD, from the National Institutes of Health for their assistance with this study. The authors thank Helaine Lerner and Joan Rechnitz, who endow the Heilbrunn Family Center for Research Nursing at Rockefeller University, where this project was conceived.

Author contributions

AAN (Primary Author with responsibility for manuscript as a whole): Conception and Design of Work; Collection of Data; Data Analysis; Interpretation; Manuscript drafting and Revisions. BC: Conception and Design of Work; Collection of Data; Data Analysis; Interpretation; Manuscript drafting and Revisions. CK: Conception and Design of Work; Collection of Data; Data Analysis; Interpretation; Manuscript drafting and Revisions. OJ: Conception and Design of Work; Interpretation; Manuscript drafting and Revisions. LV: Interpretation; Manuscript drafting and Revisions. SM: Collection of Data; Data Analysis; Interpretation; Manuscript drafting and Revisions. JA: Collection of Data; Data Analysis; Interpretation; Manuscript drafting and Revisions. BA: Interpretation; Manuscript drafting and Revisions. KS: Interpretation; Manuscript drafting and Revisions. AB: Interpretation; Manuscript drafting and Revisions. MBB: Conception and Design of Work; Collection of Data; Data Analysis; Interpretation; Manuscript drafting and Revisions.

Funding statement

This work was supported in part by the National Institutes of Health National Center for Advancing Translational Sciences grant numbers: UL1TR001873, U24TR002260, UL1TR003017, UL1TR002369, UM1TR004406, UL1TR001445, and TL1TR001875, along with the University of Rochester CLIC. Dr Norful’s effort was supported in part by the National Institute of Mental Health (grant number: K08MH130652).

Competing interests

None.

References

1. Smuck, B, Bettello, P, Berghout, K, et al. Ontario protocol assessment level: clinical trial complexity rating tool for workload planning in oncology clinical trials. J Oncol Pract. 2011;7(2):80–84.

2. Roche, K, Paul, N, Smuck, B, et al. Factors affecting workload of cancer clinical trials: results of a multicenter study of the National Cancer Institute of Canada Clinical Trials Group. J Clin Oncol. 2002;20(2):545–556.

3. Good, MJ, Lubejko, B, Humphries, K, Medders, A. Measuring clinical trial-associated workload in a community clinical oncology program. J Oncol Pract. 2013;9(4):211–215.

4. Getz, KA, Campo, RA, Kaitin, KI. Variability in protocol design complexity by phase and therapeutic area. Drug Inform J. 2011;45(4):413–420.

5. Makanju, E, Lai, K. Measuring clinical trial set-up complexity: development and content validation of a pharmacy scoring tool to support workload planning. Int J Pharm Pract. 2022;30(Supplement_2):ii31–ii32.

6. Ross, J, Tu, S, Carini, S, Sim, I. Analysis of eligibility criteria complexity in clinical trials. Summit Transl Bioinform. 2010;2010:46–50.

7. Getz, KA, Campo, RA. Trial watch: trends in clinical trial design complexity. Nat Rev Drug Discov. 2017;16(5):307–308.

8. Richie, A, Gamble, D, Tavlarides, A, Strok, K, Griffin, C. Establishing the link between trial complexity and coordinator capacity. Clin Res. 2020;34(2):8–16.

9. Donabedian, A. The quality of care: how can it be assessed? JAMA. 1988;260(12):1743–1748.

10. Hinkin, TR. A brief tutorial on the development of measures for use in survey questionnaires. Organ Res Methods. 1998;1(1):104–121.

11. Drost, EA. Validity and reliability in social science research. Educ Res Perspect. 2011;38(1):105–123.

12. Lynn, MR. Determination and quantification of content validity. Nurs Res. 1986;35(6):382–386.

13. Geranpayeh, A, Taylor, LB (Eds). Examining Listening: Research and Practice in Assessing Second Language Listening (Volume 35). Cambridge, UK: Cambridge University Press, 2013.

14. Sandelowski, M. Sample size in qualitative research. Res Nurs Health. 1995;18(2):179–183.

15. Fowler, FJ Jr. Survey Research Methods. Los Angeles: Sage Publications, 2013.

16. Johanson, GA, Brooks, GP. Initial scale development: sample size for pilot studies. Educ Psychol Meas. 2010;70(3):394–400.

17. Sadler, GR, Lee, HC, Lim, RSH, Fullerton, J. Recruitment of hard-to-reach population subgroups via adaptations of the snowball sampling strategy. Nurs Health Sci. 2010;12(3):369–374.

18. Frohlich, MT. Techniques for improving response rates in OM survey research. J Oper Manag. 2002;20(1):53–62.

19. McPeake, J, Bateson, M, O’Neill, A. Electronic surveys: how to maximise success. Nurse Res. 2014;21(3):24–26.

20. Nunnally, J, Bernstein, I. The assessment of reliability. Psychomet Theory. 1994;3(1):248–292.

21. Fleiss, JL, Levin, B, Paik, MC. The measurement of interrater agreement. Stat Methods Rates Proportions. 1981;2(212-236):22–23.

22. Braun, V, Clarke, V. Conceptual and design thinking for thematic analysis. Qual Psychol. 2022;9(1):3–26.

23. Dao, B, Kermanshachi, S, Shane, J, Anderson, S, Hare, E. Identifying and measuring project complexity. Procedia Eng. 2016;145:476–482.

24. Jacobs, MA. Complexity: toward an empirical measure. Technovation. 2013;33(4-5):111–118.

25. Nutter, DO, Bond, JS, Coller, BS, et al. Measuring faculty effort and contributions in medical education. Acad Med. 2000;75(2):200–207.

26. Morrow, JS, Gershkovich, P, Gibson, J, et al. Measuring faculty effort: a quantitative approach that aligns personal and institutional goals in pathology at Yale. Acad Pathol. 2021;8:23742895211047985.

27. Amara, N, Rhaiem, M, Halilem, N. Assessing the research efficiency of Canadian scholars in the management field: evidence from the DEA and fsQCA. J Bus Res. 2020;115:296–306.

28. Jiang, J, Lee, SK, Rah, M-J. Assessing the research efficiency of Chinese higher education institutions by data envelopment analysis. Asia Pac Educ Rev. 2020;21:423–440.

29. Gralka, S, Wohlrabe, K, Bornmann, L. How to measure research efficiency in higher education? Research grants vs. publication output. J High Educ Policy Manag. 2019;41(3):322–341.

30. Resnik, DB. Stewardship of research resources. Account Res. 2019;26(3):246–251.
