APIII - Advancing Practice, Instruction & Innovation Through Informatics

Marriott City Center, Pittsburgh, PA | September 20 - 23, 2009

Presented at the 2000 APIII Conference


FULL-TEXT DIAGNOSIS SEARCHING IS A RAPID, LOW-COST TECHNIQUE THAT COULD OBVIATE UP-FRONT AUTOMATED SNOMED CODING

University of Washington Medical Center
Seattle, Washington
Rodney Schmidt, MD, PhD

Background: Full-text searching in Microsoft (MS) SQL Server 7.0 works by building word-usage indices and searching those indices rather than the text of the diagnoses. Full-text queries are easy to implement because they are structured like string-search queries and differ mainly in the search predicate used (e.g., CONTAINS rather than LIKE). The MS Search service automatically translates queries against the diagnosis table into queries against the full-text indices.
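The mechanism described above, answering queries from a word index rather than by scanning diagnosis text, can be illustrated with a small inverted index. This is a minimal Python sketch under stated assumptions: the tokenizer is a simple lower-case word splitter, all function names are hypothetical, and it does not reproduce how MS Search actually builds or stores its catalogs. `string_search_and` stands in for a `LIKE '%term%'` substring scan for comparison.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-case word tokens (a simplified, assumed word breaker)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(diagnoses):
    """One-time pre-indexing step: map each word to the set of diagnosis IDs containing it."""
    index = defaultdict(set)
    for diag_id, text in diagnoses.items():
        for word in tokenize(text):
            index[word].add(diag_id)
    return index


def full_text_and(index, terms):
    """Return IDs of diagnoses containing every term by intersecting posting sets.

    Posting sets are processed from smallest to largest, so each extra required
    term can only shrink the working set; the loop stops early once it is empty.
    """
    postings = sorted((index.get(t.lower(), set()) for t in terms), key=len)
    if not postings:
        return set()
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
        if not result:
            break
    return result


def string_search_and(diagnoses, terms):
    """Baseline: scan every diagnosis for every term as a substring (like LIKE '%term%')."""
    terms = [t.lower() for t in terms]
    return {d for d, text in diagnoses.items() if all(t in text.lower() for t in terms)}
```

In this sketch, `len(index)` is the number of unique words found during pre-indexing. Note that the two approaches are not strictly equivalent: a substring scan for "carcinoma" also matches "adenocarcinoma", whereas the word index treats these as different words.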

Design: To evaluate the usefulness of full-text searching compared with direct string searches, we constructed a test database containing 502,365 diagnoses from mixed surgical, cytology, and autopsy reports. Pre-indexing the diagnoses identified 237,024 unique words at a one-time cost of approximately 5.5 hours of computing time (dual Pentium Pro 200 MHz, 256 MB RAM) and an 18.8% storage overhead.

Results: String searches executed in 200-250 seconds, with little dependence on the number of search terms, the frequency of occurrence of the terms, or the number of retrieved cases. Single-term full-text searches took 0.4-120 seconds, with duration directly proportional to the number of cases retrieved. Full-text searches executed more quickly as the number of required search terms increased; 4-word searches completed in less than 1 second. Multi-word full-text search duration was proportional to the number of cases retrieved and only weakly related to how frequently the search terms were used. The MS-supplied full-text search tools include adjacency searches, in which TermA is required to be near TermB. We compared the efficiency of case identification by a commercial automated SNOMED coder, which incorporates adjacency into its coding algorithm, with that of full-text adjacency searches. Where multiple synonymous phrases are associated with a single SNOMED code, full-text queries that OR'd together the search criteria for each phrase were easy to construct (a process that could be automated). Both techniques identified similar numbers of appropriate cases, and both generated false-positive and false-negative results. Full-text searches were acceptably fast even for complex queries combining multiple synonymous phrases.
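The adjacency searches and OR'd synonym queries described above can be sketched with a positional index, which records where each word occurs in each diagnosis. This is a minimal Python illustration, not SQL Server's implementation: `max_distance` is an assumed, hypothetical proximity window rather than SQL Server's own NEAR rule, and `contains_condition` is a hypothetical helper showing how OR'd synonym criteria might be generated automatically as a `CONTAINS` search-condition string (it does not escape quotes inside terms).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-case word tokens (a simplified, assumed word breaker)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(diagnoses):
    """Map word -> {diagnosis_id: [token positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for diag_id, text in diagnoses.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][diag_id].append(pos)
    return index


def near(index, term_a, term_b, max_distance=5):
    """IDs where term_a and term_b occur within max_distance tokens, in either order.

    max_distance is a hypothetical window chosen for illustration.
    """
    a_hits = index.get(term_a.lower(), {})
    b_hits = index.get(term_b.lower(), {})
    matches = set()
    # Only diagnoses containing both terms can match; check every position pair.
    for diag_id in a_hits.keys() & b_hits.keys():
        if any(abs(i - j) <= max_distance for i in a_hits[diag_id] for j in b_hits[diag_id]):
            matches.add(diag_id)
    return matches


def synonym_query(index, phrases, max_distance=5):
    """OR together NEAR searches for each synonymous (term_a, term_b) phrase."""
    result = set()
    for term_a, term_b in phrases:
        result |= near(index, term_a, term_b, max_distance)
    return result


def contains_condition(phrases):
    """Build a CONTAINS search-condition string that ORs together NEAR pairs (hypothetical helper)."""
    return " OR ".join(f'("{a}" NEAR "{b}")' for a, b in phrases)
```

For example, with the default window of 5 tokens, "squamous metaplasia without evidence of carcinoma" satisfies "squamous" NEAR "carcinoma", which illustrates how adjacency searches can produce false positives; narrowing the window reduces such matches at the risk of missing true ones. For the phrase pairs ("squamous", "carcinoma") and ("epidermoid", "carcinoma"), `contains_condition` yields `("squamous" NEAR "carcinoma") OR ("epidermoid" NEAR "carcinoma")`.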

Conclusion: Full-text adjacency searches can provide case identification comparable to that of up-front automated SNOMED coding, without the overhead of up-front coding and without necessarily requiring maintenance of a formal lexicon of SNOMED phrases.
