Yearb Med Inform 2017; 26(01): 28-37
DOI: 10.15265/IY-2017-008
Special Section: Learning from Experience: Secondary Use of Patient Data
Working Group Contributions
Georg Thieme Verlag KG Stuttgart

Secondary Use and Analysis of Big Data Collected for Patient Care

Contribution from the IMIA Working Group on Data Mining and Big Data Analytics
F. J. Martin-Sanchez (1), V. Aguiar-Pulido (2), G. H. Lopez-Campos (3), N. Peek (4), L. Sacchi (5)

1   Weill Cornell Medicine, Department of Healthcare Policy and Research, Division of Health Informatics, New York, USA
2   Weill Cornell Medicine, Brain and Mind Research Institute, New York, USA
3   The University of Melbourne, Health & Biomedical Informatics Centre, Melbourne, Australia
4   MRC Health e-Research Centre, Division of Informatics, Imaging and Data Science, The University of Manchester, Manchester, UK
5   Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy

Publication History

Publication Date: 11 September 2017 (online)

Summary

Objectives: To identify common methodological challenges and review relevant initiatives related to the re-use of patient data collected in routine clinical care, as well as to analyze the economic benefits derived from the secondary use of this data. Using several examples, this article provides a glimpse into the different areas of application, namely clinical research, genomic research, the study of environmental factors, and population and health services research. It describes some of the informatics methods and Big Data resources developed in this context, such as electronic phenotyping, clinical research networks, biorepositories, screening data banks, and wide association studies. Lastly, it discusses some of the potential limitations of these approaches, focusing on confounding factors and data quality.

Methods: A series of literature searches was conducted in the main bibliographic databases to assess the extent to which existing patient data has been repurposed for research. This contribution from the IMIA working group on “Data mining and Big Data analytics” focuses on the literature published during the last two years, covering the timeframe since the working group’s last survey.

Results and Conclusions: Although most of the examples of secondary use of patient data lie in the arena of clinical and health services research, we have started to witness other important applications, particularly in the area of genomic research and the study of health effects of environmental factors. Further research is needed to characterize the economic impact of secondary use across the broad spectrum of translational research.

 