Presented at the 2000 APIII Conference
FULL-TEXT DIAGNOSIS SEARCHING IS A RAPID, LOW-COST TECHNIQUE THAT COULD OBVIATE UP-FRONT AUTOMATED SNOMED CODING
University of Washington Medical Center
Seattle, Washington
Rodney Schmidt, MD, PhD
Background: Full-text searching in Microsoft (MS) SQL Server 7.0 works by creating word usage indices and searching the indices rather than the text of the diagnoses. Free-text queries are easily implemented because they are structured similarly to string search queries with the exception of different search verbs; the MS Search service automatically translates queries against the diagnosis table to queries against the full-text indices.
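The difference between searching a word usage index and scanning the diagnosis text can be illustrated with a minimal Python sketch. It does not use SQL Server or the MS Search service; the function names (`build_index`, `string_search`, `indexed_search`) and the four sample diagnoses are hypothetical and chosen only for illustration. Intersecting the smallest word set first is one plausible reason a search gets cheaper as required terms are added, but it is an assumption here, not a description of how MS Search is implemented.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase a diagnosis and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(diagnoses):
    """One-time pre-indexing: map each word to the ids of diagnoses using it."""
    index = defaultdict(set)
    for case_id, text in diagnoses.items():
        for word in tokenize(text):
            index[word].add(case_id)
    return index

def string_search(diagnoses, terms):
    """Baseline: scan the text of every diagnosis for every term."""
    terms = [t.lower() for t in terms]
    return {cid for cid, text in diagnoses.items()
            if all(t in text.lower() for t in terms)}

def indexed_search(index, terms):
    """Look up each term in the index and intersect, smallest set first."""
    postings = sorted((index.get(t.lower(), set()) for t in terms), key=len)
    if not postings:
        return set()
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
        if not result:
            break  # no case can satisfy the remaining terms
    return result

# Hypothetical sample data, not taken from the study database.
diagnoses = {
    1: "Invasive ductal carcinoma, left breast",
    2: "Benign breast tissue, no carcinoma identified",
    3: "Squamous cell carcinoma, skin of left forearm",
    4: "Chronic gastritis",
}
index = build_index(diagnoses)

print(sorted(indexed_search(index, ["carcinoma"])))           # [1, 2, 3]
print(sorted(indexed_search(index, ["carcinoma", "left"])))   # [1, 3]
print(sorted(string_search(diagnoses, ["carcinoma", "left"])))  # [1, 3]
```

The string search touches every diagnosis regardless of the query, whereas the indexed search touches only the cases listed under the query words.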
Design: To evaluate the usefulness of full-text searching compared to direct string searches, we constructed a test database containing 502,365 diagnoses from mixed surgical, cytology, and autopsy reports. Pre-indexing the diagnoses identified 237,024 unique words at a one-time cost of approximately 5.5 hours of computing time (dual Pentium Pro 200 MHz, 256 MB RAM) and 18.8% storage overhead.
Results: String searches executed in 200 to 250 seconds with little dependence on the number of search terms, the frequency of occurrence of the terms, or the number of retrieved cases. Full-text searches for one term took 0.4 to 120 seconds; the duration was directly proportional to the number of cases retrieved. Full-text searches executed more quickly as the number of required search terms increased; 4-word searches completed in less than 1 second. Multi-word full-text search duration was proportional to the number of cases retrieved and only weakly related to the frequency with which the search terms were used. The MS-supplied full-text search tools include those for adjacency searches, where TermA is required to be near TermB. We compared the efficiency of case identification by a commercial automated SNOMED coder, which incorporates adjacency into its coding algorithm, with that of full-text adjacency searches. In situations where multiple synonymous phrases are associated with a single SNOMED code, it was easy to construct full-text queries that OR'd together the search criteria for each phrase (this process could be automated). Both techniques identified similar numbers of appropriate cases, but both generated false-positive and false-negative results. Full-text searches were acceptably fast even for complex queries combining multiple synonymous phrases.
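An adjacency search that ORs together several synonymous phrases can be sketched in Python with a positional word index. This is an illustration only, not the MS Search implementation or the commercial coder's algorithm. The helper names (`build_positional_index`, `near`, `search_synonyms`), the `max_gap` window of 3 words, the sample diagnoses, and the synonym pairs are all hypothetical; the pairs are not drawn from the SNOMED lexicon.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase a diagnosis and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_positional_index(diagnoses):
    """Map each word to {case_id: [positions of the word in that diagnosis]}."""
    index = defaultdict(lambda: defaultdict(list))
    for case_id, text in diagnoses.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][case_id].append(pos)
    return index

def near(index, term_a, term_b, max_gap=3):
    """Cases where term_a and term_b occur within max_gap words of each other."""
    postings_a = index.get(term_a.lower(), {})
    postings_b = index.get(term_b.lower(), {})
    hits = set()
    # Only cases that contain both words need their positions compared.
    for case_id in postings_a.keys() & postings_b.keys():
        if any(abs(pa - pb) <= max_gap
               for pa in postings_a[case_id]
               for pb in postings_b[case_id]):
            hits.add(case_id)
    return hits

def search_synonyms(index, phrase_pairs, max_gap=3):
    """OR together one adjacency search per synonymous phrase."""
    hits = set()
    for term_a, term_b in phrase_pairs:
        hits |= near(index, term_a, term_b, max_gap)
    return hits

# Hypothetical sample data and synonym list.
diagnoses = {
    1: "Adenocarcinoma of the colon, moderately differentiated",
    2: "Colonic adenocarcinoma invading muscularis propria",
    3: "Colon, biopsy: tubular adenoma. Separate specimen, lung: adenocarcinoma",
}
index = build_positional_index(diagnoses)
pairs = [("adenocarcinoma", "colon"), ("adenocarcinoma", "colonic")]

print(sorted(search_synonyms(index, pairs)))                        # [1, 2]
print(sorted(near(index, "adenocarcinoma", "colon", max_gap=10)))   # [1, 3]
```

With the narrow window, case 3 is excluded because "colon" and "adenocarcinoma" belong to different specimens; widening the window to 10 words admits it as a false positive. Case 2 is found only through the second synonym pair, which shows how a missing synonym would produce a false negative.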
Conclusion: Adjacency full-text searches can deliver case identification functionality similar to that of up-front automated SNOMED coding, without the overhead of up-front coding and without necessarily requiring maintenance of a formal lexicon of SNOMED phrases.
