Presented at the 1999 APIII Conference
JAPANESE LANGUAGE ANNOTATION OF AN INTERNET PATHOLOGY IMAGE ARCHIVE
Baltimore Veterans Administration Medical Center
Pathology and Laboratory Medicine Service
Baltimore, Maryland
G. William Moore, MD, PhD
Daisuke Nonaka, MD (1); G. William Moore, MD, PhD (1,2,3); Yoichi Satomura, MD (4)
(1) Department of Pathology, University of Maryland School of Medicine, Baltimore, Maryland
(2) Pathology and Laboratory Medicine Service, Veterans Affairs Maryland Health Care System, Baltimore, Maryland
(3) Department of Pathology, The Johns Hopkins Medical Institutions, Baltimore, Maryland
(4) Department of Medical Informatics, Chiba University School of Medicine, Chiba, Japan
Background: Anatomic pathology images in a large archive must be retrievable both by pathologic diagnosis and by descriptive content. The Image Archive of The Johns Hopkins Autopsy Resource website (JHAR-IA) consists of over five thousand uncopyrighted anatomic pathology images from the Armed Forces Institute of Pathology Electronic Fascicles (AFIP-EF). The images have been computer-indexed to the Unified Medical Language System (UMLS), based upon the corresponding English-language legend-texts. For Japanese speakers who use English as a second language, it is helpful to annotate this text in Japanese, so that images may be retrieved by Japanese keywords. Japanese is a particularly challenging language for Internet annotation, since text may be written in any of three scripts (Katakana, Hiragana, Kanji), and the Kanji system is ambiguous and non-phonetic (a single ideogram can have several readings).
Design: All words and UMLS concepts in the pathology image legend-texts of the AFIP-EF posted on the JHAR-IA were mapped to phonetic Japanese transliterations in Katakana. Some words and concepts were also mapped to Hiragana words or to Kanji ideograms, displayed using the Shift JIS (Shift Japanese Industrial Standards, SJIS) character encoding, supported on most computers marketed in Japan. Indexing software was written in M (formerly MUMPS), and display software was written in Perl (Practical Extraction and Report Language). Both software systems employed a unique English name to display each Kanji ideogram, as well as phonetic On-readings and Kun-readings, as specified by Japanese Government Ministry of Education publications.
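The mapping described in the Design paragraph can be illustrated with a minimal Python sketch. It is not the authors' M or Perl software. The `KanjiEntry` record, the `build_index`, `lookup`, and `to_sjis` helpers, and the two sample entries are all hypothetical, chosen only to show how a unique English name, On-readings, Kun-readings, and a Katakana transliteration might point back to one English term, and how the text could be encoded as Shift JIS.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class KanjiEntry:
    """One Kanji ideogram with a unique English name and its readings."""
    kanji: str
    english_name: str        # unique English display name (assumed layout)
    on_readings: tuple = ()  # On-readings, written in Katakana
    kun_readings: tuple = () # Kun-readings, written in Hiragana


# Hypothetical sample data, not taken from the JHAR-IA tables.
KANJI_TABLE = [
    KanjiEntry("心", "heart", on_readings=("シン",), kun_readings=("こころ",)),
    KanjiEntry("骨", "bone", on_readings=("コツ",), kun_readings=("ほね",)),
]

# Hypothetical Katakana transliterations of the English words themselves.
KATAKANA = {"heart": "ハート", "bone": "ボーン"}


def build_index(entries, katakana):
    """Map every Japanese keyword (Kanji, reading, Katakana) to English names.

    Values are sets because one reading can belong to several ideograms.
    """
    index = defaultdict(set)
    for entry in entries:
        index[entry.kanji].add(entry.english_name)
        for reading in entry.on_readings + entry.kun_readings:
            index[reading].add(entry.english_name)
    for english, kana in katakana.items():
        index[kana].add(english)
    return dict(index)


def lookup(index, keyword):
    """Return the sorted English names reachable from a Japanese keyword."""
    return sorted(index.get(keyword, ()))


def to_sjis(text):
    """Encode text as Shift JIS bytes using Python's built-in codec."""
    return text.encode("shift_jis")


INDEX = build_index(KANJI_TABLE, KATAKANA)
```

Under these assumptions, `lookup(INDEX, "ほね")` and `lookup(INDEX, "骨")` both lead to the same English name, which could then be used to retrieve the English-indexed images. Storing sets rather than single values leaves room for readings shared by more than one ideogram.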
Results: There were 5,465 pathology images posted on the JHAR-IA, whose legend-texts contained 5,364 distinct words and 3,016 distinct UMLS concepts, ranging in frequency from four UMLS terms with 5,465 occurrences to 875 UMLS terms with one occurrence apiece. The Japanese annotations included 632 Kanji terms, each assigned a unique English name.
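Frequency figures of this kind could be tallied along the following lines. This is a sketch only: the three sample legends are invented, the function names are hypothetical, and it assumes that a term is counted at most once per image legend, which the abstract does not state.

```python
from collections import Counter

# Invented legend-texts for illustration; not from the AFIP-EF.
SAMPLE_LEGENDS = [
    "liver with fatty change",
    "liver with cirrhosis",
    "heart with infarct",
]


def term_frequencies(legends):
    """Count, for each distinct word, the number of legends containing it."""
    counts = Counter()
    for legend in legends:
        # set() ensures a word is counted once per legend (an assumption).
        counts.update(set(legend.lower().split()))
    return counts


def frequency_spectrum(counts):
    """Map each frequency value to the number of terms having that frequency."""
    return Counter(counts.values())


FREQS = term_frequencies(SAMPLE_LEGENDS)
SPECTRUM = frequency_spectrum(FREQS)
```

Here `len(FREQS)` gives the number of distinct words, and `SPECTRUM[1]` gives the number of words that occur in exactly one legend, analogous to the count of terms with one occurrence apiece reported above.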
Conclusion: English is the dominant language of the Internet, but non-native English speakers may need assistance in locating images based upon non-English keywords. The Johns Hopkins Autopsy Resource Image Archive website may be queried on the Internet with either English or Japanese query-words, and provides bilingual annotations.
Related URL: http://www.netautopsy.org/apep99jp.htm
