Presented at the 1999 APIII Conference
AUTOMATIC INDEXING OF A PATHOLOGY IMAGE ARCHIVE USING UMLS
Baltimore Veteran's Administration Medical Center
Pathology and Laboratory Medicine Service
Baltimore, Maryland
G. William Moore, MD, PhD
G. William Moore, MD, PhD (1,2,3); David S. Brenner, MD (2); Jules J. Berman, PhD, MD (3,4)
(1) Pathology and Laboratory Medicine Service, Veterans Affairs Maryland Health Care System, Baltimore, Maryland
(2) Department of Pathology, University of Maryland School of Medicine, Baltimore, Maryland
(3) Department of Pathology, The Johns Hopkins Medical Institutions, Baltimore, Maryland
(4) Resources Development Branch, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
Background: The value of any large image archive resides in the ability to select and retrieve images based upon features of interest in the images. Images can be automatically encoded from descriptive text (image-legends) into concept codes of the Unified Medical Language System (UMLS) of the U.S. National Library of Medicine. The technique permits powerful image categorization and retrieval, and is generalizable to image archives of enormous size.
Design: A collection of 5,465 pathology image-legends was encoded into UMLS concept codes via a computer translation program that parses plain-text image-legends and maps them into lists of UMLS terms. Indexing software was written in M-language (formerly MUMPS), and display software was written in the Practical Extraction and Report Language (Perl).
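The parse-and-map step can be illustrated with a minimal sketch. It is written in Python rather than the M-language used for the actual indexing software, and it assumes a simple longest-phrase-first lookup, which may differ from the translation program's real matching rules. The lexicon, the function name index_legend, and the concept codes are hypothetical placeholders, not real UMLS identifiers.

```python
import re

# Hypothetical phrase-to-concept table. The codes are placeholders,
# not real UMLS concept identifiers.
TOY_LEXICON = {
    "small blue cell tumor": "X0001",
    "electron microscopy": "X0002",
    "kidney": "X0003",
    "eosinophilic": "X0004",
}


def index_legend(legend, lexicon=TOY_LEXICON):
    """Map a plain-text image-legend to a list of concept codes.

    Assumed strategy: at each word position, try the longest candidate
    phrase first, so multi-word terms win over their component words.
    """
    longest = max(len(phrase.split()) for phrase in lexicon)
    words = re.findall(r"[a-z0-9]+", legend.lower())
    codes = []
    i = 0
    while i < len(words):
        for n in range(min(longest, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in lexicon:
                code = lexicon[phrase]
                if code not in codes:  # record each concept once per legend
                    codes.append(code)
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starts at this word; move on
    return codes


legend_codes = index_legend(
    "Eosinophilic small blue cell tumor of the kidney, electron microscopy."
)
```

In this sketch, words with no lexicon entry (such as "of" and "the") are simply skipped, and each concept is recorded once per legend.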
Results: The image-legends yielded an average of 15.6 index-terms per legend, ranging from five terms in the least-indexed legend to 58 terms in the most-indexed legend. The program assigned 3,016 distinct UMLS concepts to the entire image-legend text-file. Of the 3,016 concepts, 875 were each assigned to only a single image-legend; the remaining 2,141 concepts were each assigned to multiple image-legends. In a manual survey of image-legends, 3.1% of the UMLS concepts that should have been assigned by the system were not assigned (false-negative rate).
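The summary figures above are the kind that can be computed directly from a finished index. The following sketch assumes the index is held as a mapping from legend identifier to a list of concept codes; that layout, the function names, and the toy data are illustrative assumptions, not the authors' data structures.

```python
from collections import Counter


def index_statistics(index):
    """Summarize an index given as {legend_id: [concept_code, ...]}."""
    terms_per_legend = [len(set(codes)) for codes in index.values()]
    # For each concept, count how many legends it was assigned to.
    legend_counts = Counter(
        code for codes in index.values() for code in set(codes)
    )
    single = sum(1 for n in legend_counts.values() if n == 1)
    return {
        "mean_terms": sum(terms_per_legend) / len(terms_per_legend),
        "min_terms": min(terms_per_legend),
        "max_terms": max(terms_per_legend),
        "distinct_concepts": len(legend_counts),
        "single_legend_concepts": single,
        "multi_legend_concepts": len(legend_counts) - single,
    }


def false_negative_rate(missed, expected):
    """Fraction of concepts that should have been assigned but were not."""
    return missed / expected


# Toy index with placeholder codes, for illustration only.
toy_index = {
    "img1": ["A", "B", "C"],
    "img2": ["B", "C"],
    "img3": ["C", "D", "E", "F"],
}
stats = index_statistics(toy_index)
```

For the toy index, the single-legend and multi-legend counts sum to the number of distinct concepts, just as 875 and 2,141 sum to 3,016 in the reported results.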
Conclusion: Because the image-legends contain image descriptors (e.g., eosinophilic, small blue, green, electron microscopy) as well as pathologic terms (i.e., body-site and disease names), this UMLS index can be used to retrieve images by a wide variety of concepts. Because UMLS includes over one million internal links among synonymous and related terms, image retrieval via a UMLS-encoded index may succeed even when the chosen query term does not appear in the image-legend.
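Retrieval through linked concepts can be sketched as a one-step query expansion over an inverted index. This is a simplified illustration under stated assumptions: the link table, the inverted index, the function name retrieve, and all codes are hypothetical, and real UMLS relationships are richer than a single set of neighbors per concept.

```python
def retrieve(query_term, lexicon, links, inverted_index):
    """Return the sorted image ids indexed under the query concept
    or under any concept directly linked to it."""
    code = lexicon.get(query_term.lower())
    if code is None:
        return []
    # Expand the query concept with its directly linked concepts.
    candidates = {code} | links.get(code, set())
    hits = set()
    for concept in candidates:
        hits |= inverted_index.get(concept, set())
    return sorted(hits)


# Placeholder data: "renal" never appears in any legend, but its
# concept is linked to the concept assigned to legends mentioning "kidney".
toy_lexicon = {"renal": "X0010", "kidney": "X0003"}
toy_links = {"X0010": {"X0003"}}
toy_inverted_index = {"X0003": {"img7", "img12"}}

renal_hits = retrieve("renal", toy_lexicon, toy_links, toy_inverted_index)
```

Here a query for "renal" returns the images indexed under the linked "kidney" concept, even though the query term itself is absent from every legend.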
Related URL: http://www.netautopsy.org/apep99im.htm
