2005 Scientific Session Abstracts

Automated Identification and Coding of Cancer Pathology Reports

Peter Brueckner, MD, MBA (pbrueckner@aim.on.ca). Artificial Intelligence In Medicine Inc.; Associate Professor, University of Toronto, Toronto, Ontario.

Context : Cancer pathology reports are required for many purposes, including cancer registration, clinical trials, research projects and quality control activities. An automated system developed by AIM to identify cancer reports can provide consistent results in real time with a significant reduction in labor compared with a manual process. The performance characteristics of the software and its adaptability to new settings have been determined, and the results are presented.

Technology : The system consists of two principal components: a lexical analyzer and a filter. The lexical analyzer operates on unstructured text in one or more sections of the report to identify terms or concepts in relation to a discipline-specific lexicon. Modifiers, variations in terminology, and ambiguous words or phrases are interpreted to place identified terms in the proper context. Terms in the lexicon are categorized for differential processing through the filter. The filter accepts combinatorial and conditional settings based on both the results of the lexical analysis and other data elements in each report.
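
The two-stage arrangement described above can be illustrated with a minimal sketch. It is not AIM's implementation: the lexicon entries, the negation phrases, the `report_type` data element and the rule that excludes autopsy reports are all hypothetical, chosen only to show how categorized terms, modifiers and a conditional filter setting might fit together.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon: term -> category used by the filter.
LEXICON = {
    "adenocarcinoma": "malignant",
    "carcinoma": "malignant",
    "melanoma": "malignant",
    "dysplasia": "premalignant",
    "hyperplasia": "benign",
}

# Hypothetical modifiers that negate a term appearing later in the same sentence.
NEGATIONS = ("no evidence of", "negative for", "without", "no")


@dataclass
class Finding:
    term: str
    category: str
    negated: bool


def analyze(text):
    """Lexical analysis: find lexicon terms and note whether each is negated."""
    findings = []
    for sentence in re.split(r"[.;\n]+", text.lower()):
        for term, category in LEXICON.items():
            match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
            if match is None:
                continue
            preceding = sentence[:match.start()]
            negated = any(
                re.search(r"\b" + re.escape(phrase) + r"\b", preceding)
                for phrase in NEGATIONS
            )
            findings.append(Finding(term, category, negated))
    return findings


def is_reportable(text, report_type="surgical"):
    """Filter: combine lexical results with another data element of the report."""
    findings = analyze(text)
    has_malignant = any(
        f.category == "malignant" and not f.negated for f in findings
    )
    # Hypothetical conditional setting: autopsy reports are never selected.
    return has_malignant and report_type != "autopsy"
```

With these assumed settings, `is_reportable("Invasive adenocarcinoma of the colon.")` selects the report, while `is_reportable("No evidence of carcinoma.")` does not, because the negating modifier precedes the term. A production lexicon would also need to handle synonyms, misspellings and ambiguous phrases, which this sketch omits.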

Design : Performance has been quantified as the sensitivity and specificity of the overall filter output, generally using carefully arbitrated data sets for reference. For any given filter setting, performance is a function of the characteristics of the reports, the lexicon and the filter logic. Improvements are made by changing the lexicon and the filter logic. Field experience with the system over the past five years has provided a basis for estimating the rate of improvement with testing/adjustment cycles. Modifications to the system must take processing speed into account to ensure efficient operation.
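
As an illustration of the two measures named above, the following sketch computes sensitivity and specificity by comparing filter decisions with an arbitrated reference set. The function name and the boolean-list representation are assumptions made for this example; the abstract does not describe how the evaluation was implemented.

```python
def sensitivity_specificity(predicted, reference):
    """Compare filter decisions with arbitrated reference labels.

    Both arguments are equal-length sequences of booleans, where True
    means "cancer report". Returns (sensitivity, specificity); a measure
    is None when its denominator is zero.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # true positives
    tn = sum(1 for p, r in pairs if not p and not r)  # true negatives
    fp = sum(1 for p, r in pairs if p and not r)      # false positives
    fn = sum(1 for p, r in pairs if not p and r)      # false negatives

    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity
```

Sensitivity is the fraction of true cancer reports that the filter selects, and specificity is the fraction of non-cancer reports that it correctly rejects. Recomputing both after each change to the lexicon or filter logic is one way to track a testing/adjustment cycle.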

Results : Where the lexicon is based on a nomenclature associated with a coding system, coding accuracy may be determined and the factors affecting it are being explored. A special application of the coding function will be the automated identification of synoptic data elements to assess conformance with synoptic reporting requirements and to provide operator assistance for the generation of synoptic reports.

Conclusions : Automated identification and coding of pathology reports is feasible and is capable of attaining high levels of accuracy.