Introduction
Running clinical studies is time-consuming and requires constant attention to data quality. The limited time physicians can dedicate to non-care-related activities makes it difficult to conduct studies in parallel to patient care. This is especially true for large studies that depend on the input of multiple investigators with varying personal interest in the study. From the point of view of medical informatics, the growing amount of electronically available patient data enables the reuse of the electronic medical record for clinical research, e.g. to support the two key activities in such studies: the recruitment of patients and the documentation of study data [1].
The general idea behind routine-data-based recruitment support is to encode the eligibility criteria of a study, thus enabling a computer to regularly compare them with the electronic profiles of all patients within a hospital or outpatient clinic. Both very simple algorithms based on manual translation of eligibility criteria into SQL scripts [2], [3] and more sophisticated approaches such as semantic web techniques [4], probability calculation [5], [6] and natural language processing [7], [8], [9] have been applied to compare eligibility criteria with routinely documented patient data. Integration into the physician's workflow ranges from passive screening lists [10] to active systems that notify a predefined person about new matches via pagers, emails and pop-ups [11], [12]. While a retrospective review by Cuggia et al. [13] identified 28 different recruitment support systems described in the literature up to 2008, only one of these compared the quality of the system's propositions against those of manual recruiters: based on three years of patient data, Fink et al. [14] retrospectively evaluated the eligibility of 261 patients for 14 cancer trials and compared their results with the trial inclusions actually achieved by clinicians in that period. They report a potential 250% increase in the number of trial participants. A more recent study by Weng et al. [15] compared an SQL-based screening system for post-Acute Coronary Syndrome with two simpler methods. The specificity of the screening system was 19%, compared to 8% and 9% for the latter two. Despite its low specificity, the screening tool contributed to a 66% increase in enrolled patients.
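As a minimal sketch of the SQL-based screening approach mentioned above (all table names, diagnosis codes and thresholds are invented for illustration and not drawn from any cited system), eligibility criteria can be translated into a query that is run regularly against the patient database:

```python
import sqlite3

# Hypothetical patient database; schema and sample data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (id INTEGER PRIMARY KEY, age INTEGER, diagnosis TEXT);
CREATE TABLE labs (patient_id INTEGER, test TEXT, value REAL);
INSERT INTO patients VALUES (1, 67, 'I21.0'), (2, 45, 'I21.0'), (3, 71, 'E11.9');
INSERT INTO labs VALUES (1, 'creatinine', 1.1), (2, 'creatinine', 2.4),
                        (3, 'creatinine', 0.9);
""")

# Eligibility criteria encoded as a query: age >= 50, an ICD-10 code for
# acute myocardial infarction (I21.*), and creatinine below 2.0 mg/dl.
eligible = conn.execute("""
    SELECT p.id
    FROM patients p
    JOIN labs l ON l.patient_id = p.id AND l.test = 'creatinine'
    WHERE p.age >= 50 AND p.diagnosis LIKE 'I21%' AND l.value < 2.0
""").fetchall()
eligible_ids = [row[0] for row in eligible]
print(eligible_ids)  # only patient 1 satisfies all three criteria
```

A production system would run such a query on a schedule and feed the matches into a screening list or a notification mechanism, as described for the systems above.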
The rationale for routine-data-based data acquisition support is the assumption that some of the trial data is already available as part of the routine clinical documentation. This assumption was recently confirmed by El Fadly et al. [16], who found 13.4% of 232 data elements required for a trial on hypertension to be readily available from the electronic health record (EHR). Direct transferral of this data from the EHR into the study database would eliminate the need for redundant documentation of similar data and might thus decrease the amount of time an investigator needs to dedicate to each trial. The reuse of EHR data for data acquisition is becoming increasingly popular, as Dean et al. [17] state in a review for the case of outcomes research. Though we were unable to find a similar review for clinical trials, individual reports do exist, such as the point-of-care clinical trial (POCCT) on insulin therapy for diabetes [18] and the interventional STARBRITE trial on advanced heart failure [19]. The risk of poor data quality is, however, discussed much more intensely than the potential benefits [20], [21]. In fact, we could not find any publication that measured the time savings achieved through EHR-based data acquisition in a running trial.
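The transferral idea can be sketched as a simple field mapping that pre-fills a case report form (CRF) with whatever the EHR already holds, leaving the remaining items for manual documentation; all field names below are hypothetical and not taken from any cited trial:

```python
# Hypothetical mapping from EHR export fields to CRF fields.
EHR_TO_CRF = {
    "patient.birth_date": "crf.date_of_birth",
    "labs.hba1c.latest": "crf.baseline_hba1c",
    "vitals.systolic_bp.latest": "crf.baseline_systolic_bp",
}

def prefill_crf(ehr_record: dict) -> dict:
    """Copy EHR values that are already available into the CRF;
    fields absent from the EHR record stay empty for manual entry."""
    return {
        crf_field: ehr_record[ehr_field]
        for ehr_field, crf_field in EHR_TO_CRF.items()
        if ehr_field in ehr_record
    }

record = {"patient.birth_date": "1956-03-14", "labs.hba1c.latest": 7.2}
crf = prefill_crf(record)
print(crf)  # two of three CRF fields pre-filled; blood pressure left manual
```

Real single-source implementations face the mapping and data-quality issues discussed in [20], [21]; this sketch only illustrates the time-saving mechanism itself.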
We thus see a significant lack of reports that quantify the benefits of applying secondary use in running clinical trials. The objective of this research was to quantify the benefit of both single-source measures by (1) comparing the sensitivity and specificity of manual and electronically supported patient recruitment and (2) comparing the documentation times needed for manual and semi-automatic data acquisition.