Effectiveness of Lexico-syntactic Pattern Matching for Ontology Enrichment with Clinical Documents

K. Liu; W. W. Chapman; G. Savova; C. G. Chute; N. Sioutos; R. S. Crowley

doi:10.3414/ME10-01-0020

Methods Inf Med 2011; 50(05): 397-407
DOI: 10.3414/ME10-01-0020

Original Articles

Schattauer GmbH


K. Liu¹, W. W. Chapman¹,², G. Savova³, C. G. Chute³, N. Sioutos⁴, R. S. Crowley¹,²,⁵

¹Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
²Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
³Department of Health Services Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
⁴Lockheed Martin Corporation, Fairfax, Virginia, USA
⁵Department of Pathology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA


Further Information

Publication History

received: 03 March 2010

accepted: 06 October 2010

Publication Date: 18 January 2018 (online)

Summary

Objective: To evaluate the effectiveness of a lexico-syntactic pattern (LSP) matching method for ontology enrichment using clinical documents.

Methods: Two domains were studied separately using the same methodology. We used radiology documents to enrich RadLex and pathology documents to enrich the National Cancer Institute Thesaurus (NCIT). Several known LSPs were used for semantic knowledge extraction. We first retrieved all sentences containing LSPs from two large clinical repositories and examined the frequency of the LSPs. From this set, we randomly sampled LSP instances, which were examined by human judges. We used a two-step method to determine the utility of these patterns for enrichment. In the first step, domain experts annotated medically meaningful terms (MMTs) in each sentence within the LSP. In the second step, RadLex and NCIT curators evaluated how many of these MMTs could be added to the resource. To quantify the utility of the LSP method, we defined two evaluation metrics: suggestion rate (SR) and acceptance rate (AR). We used these measures to estimate the yield of concepts and relationships for each of the two domains.
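The retrieval, frequency, and sampling steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the summary does not list the LSPs actually used, so the three patterns, the naive sentence splitter, and all function names here are assumptions chosen for the example.

```python
import random
import re
from collections import Counter

# Hypothetical pattern set: the summary does not name the LSPs it used.
# These are simple surface cues of the kind commonly used for hyponymy.
LSP_PATTERNS = {
    "such_as": re.compile(r"\bsuch as\b", re.IGNORECASE),
    "and_other": re.compile(r"\band other\b", re.IGNORECASE),
    "including": re.compile(r"\bincluding\b", re.IGNORECASE),
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_lsp_instances(documents):
    """Return (pattern_name, sentence) pairs for each sentence containing an LSP."""
    instances = []
    for doc in documents:
        for sentence in split_sentences(doc):
            for name, pattern in LSP_PATTERNS.items():
                if pattern.search(sentence):
                    instances.append((name, sentence))
    return instances


def lsp_frequencies(instances):
    """Count how many retrieved instances each pattern produced."""
    return Counter(name for name, _ in instances)


def sample_instances(instances, k, seed=0):
    """Draw up to k instances at random for review by human judges."""
    rng = random.Random(seed)
    return rng.sample(instances, min(k, len(instances)))
```

In such a sketch, the sampled `(pattern_name, sentence)` pairs would be what the domain experts see in the first annotation step; real clinical text would need a more robust sentence splitter than the one assumed here.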

Results: For NCIT, the concept SR was 24%, and the relationship SR was 65%. The concept AR was 21%, and the relationship AR was 14%. For RadLex, the concept SR was 37%, and the relationship SR was 55%. The concept AR was 11%, and the relationship AR was 44%.

Conclusion: The LSP matching method is effective for discovering concepts and concept relationships in biomedical domains.

Keywords

Ontology learning from text - knowledge acquisition - ontology enrichment - natural language processing - lexico-syntactic pattern

