APIII - Advancing Practice, Instruction & Innovation Through Informatics

Marriott City Center, Pittsburgh, PA | September 20 - 23, 2009

Presented at the 2000 APIII Conference


WEB-BASED FREE-TEXT QUERY SYSTEM FOR SURGICAL PATHOLOGY REPORTS WITH AUTOMATIC CASE DE-IDENTIFICATION

Baltimore VA Medical Center
Baltimore, Maryland
G. William Moore, MD, PhD

Robert E. Miller, MD (1), John K. Boitnott, MD (1), G. William Moore, MD, PhD (1,2,3)

(1) Departments of Pathology, The Johns Hopkins Medical Institutions
(2) Baltimore VA Maryland Health Care System
(3) University of Maryland School of Medicine, Baltimore, Maryland

Background: There is increasing interest in inter-institutional sharing of free-text surgical pathology reports. However, it is necessary to de-identify proper names (providers, institutions) that are sometimes included in the native text of such reports.

Design: Free-text surgical pathology reports at The Johns Hopkins Hospital are indexed and available to hospital staff. Proper names in the free-text database were identified in one of two ways: by matching against available lists of persons, places, and institutions, or by their proximity to keywords such as 'Dr.' or 'hospital'. The free text was parsed, and every proper name was replaced with a suitable token before display on the web-based query system.
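The two identification routes described in the Design can be illustrated with a minimal Python sketch. This is not the system used in the study. The name list, the `[NAME]` token, the keyword set, and the function name `deidentify` are all hypothetical, chosen only to show list lookup followed by keyword-proximity matching.

```python
import re

TOKEN = "[NAME]"  # assumed placeholder; the abstract does not specify the token

# Hypothetical name list standing in for the lists of persons, places,
# and institutions mentioned in the Design.
KNOWN_NAMES = ["Johns Hopkins", "Baltimore", "Smith"]

# Capitalized word(s) directly after a title keyword, e.g. "Dr. Jones".
AFTER_TITLE = re.compile(
    r"\b(Dr\.|Doctor)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"
)
# Capitalized word(s) directly before an institution keyword,
# e.g. "Mercy Hospital".
BEFORE_INSTITUTION = re.compile(
    r"\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\s+(Hospital|hospital)\b"
)


def deidentify(text):
    """Replace proper names in text with TOKEN (illustrative only)."""
    # Pass 1: list lookup, longest entries first so that a multi-word
    # name is replaced before any shorter name it contains.
    for name in sorted(KNOWN_NAMES, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(name) + r"\b", TOKEN, text)
    # Pass 2: keyword proximity; the keyword is kept, the name is replaced.
    text = AFTER_TITLE.sub(lambda m: m.group(1) + " " + TOKEN, text)
    text = BEFORE_INSTITUTION.sub(lambda m: TOKEN + " " + m.group(2), text)
    return text


report = "Slides reviewed by Dr. Jones at Mercy Hospital, Baltimore."
print(deidentify(report))
# Slides reviewed by Dr. [NAME] at [NAME] Hospital, [NAME].
```

A sketch this simple will over-match (for example, any capitalized word that happens to precede "Hospital") and will miss names that appear without a keyword or list entry, so a production system would need broader lists and more careful parsing.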

Results: On June 1, 2000, the Johns Hopkins surgical pathology database index contained 159,071 patients with surgical pathology cases; 361,957 surgical pathology cases; and 694,443 surgical pathology specimens. Age/sex demographics were complete for 99.3% of patients, with 60.1% females and 39.2% males, and a predominance of patients in the fourth (15.3%), fifth (14.2%), sixth (14.7%), and seventh (16.0%) decades. Race/ethnicity data were available on 77.3% of patients, including 50.6% whites and 24.1% African-Americans. In the most recent complete year (1999), there were 29,596 cases and 61,531 specimens. Organ systems represented in the web-indexed database included: gastrointestinal, 28.7%; lymphoreticular, 15.1%; gynecologic, 14.0%; bone, 7.1%; and breast, 5.8%. There were 23,911 cases (6.6%) containing proper names that were tokenized by the de-identification system.

Conclusion: This study demonstrates that free-text surgical pathology reports can be de-identified for proper names (providers, institutions) that are sometimes included in these reports.

Related URL: http://www.netautopsy.org/apep00wb.htm
