2005 Scientific Session Abstracts

Information Specification Development for Disease-Specific Tissue Phenotype Annotation

David Aronow, MD, MPH (drdata@ardais.com); Bonnie Zeigler, PhD; Leona Belanger, MA, RN; Kristel Hackett, BS; Sunny Worel, MLIS; James Flynn, PhD, Clinical Informatics Group, Ardais Corporation, Lexington, MA

Context: Researchers require high-quality annotation to derive the greatest benefit from biospecimens. The annotation must be sufficiently granular, tissue-specific, and diagnosis-relevant, and it must employ standardized information representations. Ardais Corporation was awarded a National Institute of Standards and Technology Advanced Technology Program grant to develop such an annotation system.

Technology: The Clinical Genomics Database Project comprises two groups: the clinical informatics group, responsible for information and functional requirements, and the software engineering group, responsible for toolkit development. Together, these groups have created a metadata-driven system that automatically generates Java code for disease-specific research data collection systems.
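As an illustration of what metadata-driven code generation can mean, the following Python sketch turns a list of data element descriptions into the source text of a simple Java data-holder class. The `ElementSpec` record, the `generate_java_class` function, and the example class and field names are hypothetical; the abstract does not describe the project's actual metadata schema or generator.

```python
from dataclasses import dataclass


# Hypothetical metadata record for one data element; the project's real
# metadata schema is not described in the abstract.
@dataclass(frozen=True)
class ElementSpec:
    name: str       # Java field name, e.g. "tumorSize"
    java_type: str  # Java type, e.g. "Double" or "String"


def generate_java_class(class_name: str, elements: list[ElementSpec]) -> str:
    """Emit Java source for a simple data-holder class from element metadata."""
    lines = [f"public class {class_name} {{"]
    # One private field per element.
    for e in elements:
        lines.append(f"    private {e.java_type} {e.name};")
    # One getter/setter pair per element.
    for e in elements:
        cap = e.name[0].upper() + e.name[1:]
        lines.append("")
        lines.append(f"    public {e.java_type} get{cap}() {{ return {e.name}; }}")
        lines.append(f"    public void set{cap}({e.java_type} value) {{ this.{e.name} = value; }}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical element list for an illustrative disease-specific class.
    specs = [ElementSpec("tumorSize", "Double"), ElementSpec("histologicGrade", "String")]
    print(generate_java_class("BreastCancerCase", specs))
```

Changing the metadata list, rather than hand-editing Java, is what changes the generated class in this sketch.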

Design: The informatics group identified three critical risks:

  • Lack of relevance to researchers
  • Slow response to changing requirements
  • Isolation from the informatics and research communities

These risks were mitigated by:

  • Recruiting and cross-training a multidisciplinary clinical informatics group
  • Using current reference materials
  • Appropriately defining the granularity of Information Model components
  • Creating Information Specifications (I-Specs) in Excel
  • Having independent Curators define I-Specs, with harmonization and external consultant review
  • Empowering Curators to manage I-Spec metadata
  • Building in-group capacity to develop I-Spec analyzers and processors
  • Revising the Information Model as understanding of the information evolves
  • Participating in professional activities
  • Aligning vocabulary extensions with standard terminologies
  • Submitting vocabulary extensions to standards organizations

Results: Consultant review of 75 Diagnosis Group I-Specs found a tendency to retain low-relevance details, whereas high-relevance items were rarely missed. Most reviewers recommended adding 0% to 1% new content and removing 2% to 15% of existing content.

Conclusions: Working in Excel proved successful for creating, reviewing, and manipulating specifications containing multiple many-to-many relationships between Information Model components.
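One way to picture a many-to-many relationship held in a spreadsheet is a link table in which each row pairs one Diagnosis Group with one data element. The minimal Python sketch below assumes a hypothetical CSV export with `diagnosis_group` and `data_element` columns and made-up rows, and builds lookups in both directions; it is not the project's actual I-Spec layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export of one worksheet: each row links one Diagnosis Group
# to one data element (a many-to-many link table). Rows are illustrative.
SHEET = """diagnosis_group,data_element
Breast carcinoma,Tumor size
Breast carcinoma,Estrogen receptor status
Colon carcinoma,Tumor size
Colon carcinoma,Lymph node status
"""


def build_index(text: str):
    """Return two dicts: group -> set of elements, element -> set of groups."""
    by_group = defaultdict(set)
    by_element = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text)):
        group, element = row["diagnosis_group"], row["data_element"]
        by_group[group].add(element)
        by_element[element].add(group)
    return by_group, by_element


by_group, by_element = build_index(SHEET)
print(sorted(by_element["Tumor size"]))  # ['Breast carcinoma', 'Colon carcinoma']
```

A shared element such as "Tumor size" appears once per linked group in the sheet, and the reverse index recovers every group that uses it.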

Defining customer needs was not trivial. For example, researchers accepted clinical diagnoses and did not need most details of the diagnostic journey, such as the History and Physical Exam.

Using a process of discovery, we let the information we sought inform its representation. For example, Ancillary Data Elements were created as predefined bundles of questions that are asked only when specific values are entered for primary questions.
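The triggering behavior of Ancillary Data Elements can be sketched as a mapping from primary-question answers to question bundles. In this Python sketch, the `PrimaryQuestion` and `AncillaryDataElement` classes and the tobacco-use example are hypothetical illustrations, not the project's data model.

```python
from dataclasses import dataclass, field


@dataclass
class AncillaryDataElement:
    """A predefined bundle of follow-up questions (hypothetical structure)."""
    name: str
    questions: list[str]


@dataclass
class PrimaryQuestion:
    name: str
    # Maps a triggering answer value to the bundle it activates.
    triggers: dict[str, AncillaryDataElement] = field(default_factory=dict)

    def follow_ups(self, answer: str) -> list[str]:
        """Return the extra questions to ask for this answer, if any."""
        ade = self.triggers.get(answer)
        return ade.questions if ade is not None else []


# Hypothetical example: only one answer value triggers the bundle.
smoking = PrimaryQuestion(
    "Tobacco use",
    triggers={"Current smoker": AncillaryDataElement(
        "Smoking history", ["Packs per day", "Years smoked"])},
)
print(smoking.follow_ups("Current smoker"))  # ['Packs per day', 'Years smoked']
print(smoking.follow_ups("Never smoker"))    # []
```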

Modeling diagnostic test concepts as values proved excessively complex. We remodeled them as elements, with the test results as their values. The appropriate Ancillary Data Elements are then triggered according to the results.
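The remodeled approach can be sketched by making each diagnostic test a data element whose permitted values are its possible results: a recorded result is validated against that value set and then selects any Ancillary Data Elements to present. The `TestElement` class and the HER2 example values below are hypothetical and for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class TestElement:
    """A diagnostic test modeled as a data element whose values are its results."""
    name: str
    allowed_results: tuple[str, ...]
    # Result value -> names of the Ancillary Data Elements it triggers.
    ade_by_result: dict[str, list[str]] = field(default_factory=dict)

    def record(self, result: str) -> list[str]:
        """Validate a result against the value set and return the ADEs to present."""
        if result not in self.allowed_results:
            raise ValueError(f"{result!r} is not a valid result for {self.name}")
        return self.ade_by_result.get(result, [])


# Hypothetical element; names and values are illustrative only.
her2_ihc = TestElement(
    "HER2 immunohistochemistry",
    allowed_results=("0", "1+", "2+", "3+"),
    ade_by_result={"2+": ["HER2 in situ hybridization"]},
)
print(her2_ihc.record("2+"))  # ['HER2 in situ hybridization']
print(her2_ihc.record("0"))   # []
```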