Presented at the 2000 APIII Conference
A REUSABLE SOFTWARE PLUG-IN ENABLING EXPANSION OF BIOMEDICAL DATABASE QUERIES USING THE UMLS
University of Alabama at Birmingham
Birmingham, Alabama
Peter G. Anderson, DVM, PhD
Dwain E. Woode, Kristopher N. Jones, Peter G. Anderson
Department of Pathology, University of Alabama at Birmingham
Background: Information retrieval from computerized biomedical information sources (such as MEDLINE and other online databases) is hampered by the lack of a standard vocabulary used by authors, indexers, and searchers. The Unified Medical Language System (UMLS) of the National Library of Medicine is a comprehensive medical vocabulary designed for use by developers in creating software to aid in biomedical information retrieval. The purpose of this project was to create a freestanding online medical thesaurus software component to aid in the retrieval of data from online information databases by expanding user queries to include similar terms.
Design: Using the MetamorphoSys Java extraction tool, Microsoft Access, and text editors, multiple vocabulary subsets of varying size and complexity were selected from the many possible subsets of the massive UMLS data set (2.8 gigabytes of data representing over 730,000 concepts). These subsets were imported into tables in a MySQL database (a platform-independent, Web-compatible database system) on a Web server. Database querying, administration, and benchmarking were performed with tools written in Perl (Practical Extraction and Report Language).
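As an illustration of the design described above, the following is a minimal sketch of thesaurus-based query expansion. It is not the authors' code: it uses Python and an in-memory SQLite database in place of Perl and MySQL, and the table name `thesaurus`, its columns, the concept identifiers, and the sample terms are all hypothetical toy values rather than UMLS content. It assumes only that terms sharing a concept identifier are treated as synonyms.

```python
import sqlite3


def build_thesaurus(rows):
    """Load (concept_id, term) pairs into an in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE thesaurus (concept_id TEXT, term TEXT)")
    conn.executemany("INSERT INTO thesaurus VALUES (?, ?)", rows)
    return conn


def expand(conn, user_term):
    """Return the user's term followed by every other term sharing one of its concepts."""
    cur = conn.execute(
        """
        SELECT DISTINCT t2.term
        FROM thesaurus AS t1
        JOIN thesaurus AS t2 ON t1.concept_id = t2.concept_id
        WHERE lower(t1.term) = lower(?) AND lower(t2.term) <> lower(?)
        ORDER BY t2.term
        """,
        (user_term, user_term),
    )
    return [user_term] + [row[0] for row in cur]


# Toy data for illustration only; identifiers and terms are not taken from the UMLS.
TOY_ROWS = [
    ("X001", "myocardial infarction"),
    ("X001", "heart attack"),
    ("X001", "cardiac infarction"),
    ("X002", "neoplasm"),
    ("X002", "tumor"),
]

conn = build_thesaurus(TOY_ROWS)
print(expand(conn, "Heart Attack"))
# ['Heart Attack', 'cardiac infarction', 'myocardial infarction']
```

A term with no match in the table is returned unchanged, so in this sketch the expansion step never removes anything from the user's original query.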
Results: A previously reported subset of UMLS terms useful for describing surgical pathology images was queried against each of the thesauri. Across the test thesauri, the number of synonyms returned per term ranged from a mean of 0.31 (mode and median of 0) to a mean of 124.28 (mode = 3, median = 49). Weighing search specificity against sensitivity, a thesaurus returning a mean of 39.41 results (mode = 2, median = 13) was selected as the best general-purpose dataset. Benchmarks indicated that the query expansion process added less than one second (0.74 seconds, n = 20) to the overall search time.
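The summary statistics reported above (mean, median, and mode of synonyms returned per term, plus an averaged timing) could be computed along the following lines. This is a sketch only: the function names are hypothetical, the counts are invented toy values and not the study's data, and the timing helper is an assumed way to average repeated runs rather than the authors' benchmarking tool.

```python
import statistics
import time


def summarize(counts):
    """Mean, median, and mode of the number of synonyms returned per query term."""
    return {
        "mean": round(statistics.fmean(counts), 2),
        "median": statistics.median(counts),
        "mode": statistics.mode(counts),
    }


def mean_seconds(fn, repeats=20):
    """Average wall-clock time of calling fn() over a number of repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# Invented counts for illustration; not results from the abstract.
toy_counts = [0, 2, 2, 5, 13, 40, 120]
print(summarize(toy_counts))
```

In the toy data, as in the reported figures, a few terms with very many synonyms pull the mean well above the median and mode, which is why all three are worth reporting together.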
Conclusion: The query expansion engine developed for this project is compatible with almost all Web-accessible search engines. While adding very little to the user’s wait, expanded queries offer much greater sensitivity in returned results by including additional search terms similar to those specified by the user. This thesaurus tool has been used successfully for expansions against in-house image databases and existing Web search engines.
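One way an expanded term list might be handed to a Web search engine is sketched below. The base URL `https://search.example.org/find`, the parameter name `q`, and the use of `OR` with double-quoted phrases are all assumptions for illustration; real search engines differ in their query syntax, and the abstract does not describe how the authors formatted their expanded queries.

```python
from urllib.parse import urlencode


def build_search_url(base_url, terms, param="q"):
    """Join terms with OR (quoting multi-word phrases) and URL-encode the result."""
    quoted = ['"{}"'.format(t) if " " in t else t for t in terms]
    query = " OR ".join(quoted)
    return base_url + "?" + urlencode({param: query})


# Hypothetical endpoint and terms, for illustration only.
url = build_search_url("https://search.example.org/find", ["heart attack", "MI"])
print(url)
# https://search.example.org/find?q=%22heart+attack%22+OR+MI
```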
