Methods Inf Med 2011; 50(05): 397-407
DOI: 10.3414/ME10-01-0020
Original Articles
Schattauer GmbH

Effectiveness of Lexico-syntactic Pattern Matching for Ontology Enrichment with Clinical Documents

K. Liu (1), W. W. Chapman (1, 2), G. Savova (3), C. G. Chute (3), N. Sioutos (4), R. S. Crowley (1, 2, 5)

1   Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
2   Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
3   Department of Health Services Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
4   Lockheed Martin Corporation, Fairfax, Virginia, USA
5   Department of Pathology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

Publication History

received: 03 March 2010

accepted: 06 October 2010

Publication Date:
18 January 2018 (online)

Summary

Objective: To evaluate the effectiveness of a lexico-syntactic pattern (LSP) matching method for ontology enrichment using clinical documents.

Methods: Two domains were studied separately using the same methodology. We used radiology documents to enrich RadLex and pathology documents to enrich the National Cancer Institute Thesaurus (NCIT). Several known LSPs were used for semantic knowledge extraction. We first retrieved all sentences that contained LSPs across two large clinical repositories and examined the frequency of the LSPs. From this set, we randomly sampled LSP instances, which were examined by human judges. We used a two-step method to determine the utility of these patterns for enrichment. In the first step, domain experts annotated medically meaningful terms (MMTs) in each sentence that matched an LSP. In the second step, RadLex and NCIT curators evaluated how many of these MMTs could be added to the resource. To quantify the utility of this LSP method, we defined two evaluation metrics: suggestion rate (SR) and acceptance rate (AR). We used these measures to estimate the yield of concepts and relationships for each of the two domains.
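The retrieval, frequency-counting and sampling steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not list the LSPs that were used, so the three Hearst-style patterns below are assumptions chosen for illustration, and the function names, the naive sentence splitter and the example sentences are all hypothetical.

```python
import random
import re
from collections import Counter

# Hypothetical LSPs (Hearst-style cue phrases); the abstract does not
# say which patterns the study actually used.
LSPS = {
    "such as": re.compile(r"\bsuch as\b", re.IGNORECASE),
    "including": re.compile(r"\bincluding\b", re.IGNORECASE),
    "and other": re.compile(r"\b(?:and|or) other\b", re.IGNORECASE),
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_lsp_instances(documents):
    """Return (pattern name, sentence) pairs for every sentence containing an LSP."""
    instances = []
    for doc in documents:
        for sentence in split_sentences(doc):
            for name, pattern in LSPS.items():
                if pattern.search(sentence):
                    instances.append((name, sentence))
    return instances


def lsp_frequencies(instances):
    """Count how many retrieved instances each pattern produced."""
    return Counter(name for name, _ in instances)


def sample_instances(instances, k, seed=0):
    """Draw a reproducible random sample of instances for human review."""
    rng = random.Random(seed)
    return rng.sample(instances, min(k, len(instances)))


# Invented example documents, not taken from the study corpora.
docs = [
    "There are benign lesions such as cysts and lipomas. No fracture is seen.",
    "Findings include atelectasis, effusion, and other abnormalities.",
]
instances = find_lsp_instances(docs)
frequencies = lsp_frequencies(instances)
review_sample = sample_instances(instances, k=1)
```

In this sketch the sampled sentences would then go to human annotators. The MMT annotation and curator review steps are manual and are not modelled here.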

Results: For NCIT, the concept SR was 24%, and the relationship SR was 65%. The concept AR was 21%, and the relationship AR was 14%. For RadLex, the concept SR was 37%, and the relationship SR was 55%. The concept AR was 11%, and the relationship AR was 44%.

Conclusion: The LSP matching method is an effective method for concept and concept relationship discovery in biomedical domains.

 