Web-based Data Annotation and Query Tools to Enhance Translational Research for Head and Neck Malignancies
Sambit K. Mohanty MD; University of Pittsburgh Medical Center; Amita T. Mistry MD; University of Pittsburgh Medical Center; Harpreet Singh MS; University of Pittsburgh Medical Center; Ashokkumar A. Patel MD; Case Western Reserve University; Jennifer L. Hetrick CTR; University of Pittsburgh Medical Center; Ann M. Egloff PhD; University of Pittsburgh Medical Center; Sharon B. Winters MS; University of Pittsburgh Medical Center; Jennifer R. Grandis MD; University of Pittsburgh Medical Center; Michael J. Becich MD; University of Pittsburgh Medical Center; Anil V. Parwani MD; University of Pittsburgh Medical Center;
Content:
The Head and Neck Neoplasm Data Annotation and Query Tools are a developing, bioinformatics-driven system that integrates data from clinical, questionnaire, tumor registry, and pathology systems into a single architecture built on a set of common data elements (CDEs). Their purpose is to expedite translational research, particularly for the SPORE (Specialized Programs of Research Excellence) program. These tools are designed to support semantic interoperability in the development of data elements, to keep the data flexible within the system, and ultimately to make the data understandable to end users.
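To illustrate what a common data element contributes, here is a minimal Python sketch, assuming that each CDE carries one shared name, a definition, its source standard, and (optionally) a set of permissible values against which incoming records are checked. The class name CommonDataElement, the element names, and the value sets are hypothetical illustrations, not the project's actual CDE definitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CommonDataElement:
    """Hypothetical, minimal CDE: one shared name, definition and value set."""
    name: str
    definition: str
    source_standard: str  # e.g. "CAP Checklist" or "NAACCR"
    permissible_values: frozenset = field(default_factory=frozenset)

    def validate(self, value: str) -> bool:
        # An empty value set means free text is accepted for this element.
        return not self.permissible_values or value in self.permissible_values


# Illustrative elements only; names and value sets are assumptions.
HISTOLOGIC_GRADE = CommonDataElement(
    name="histologic_grade",
    definition="Degree of differentiation of the primary tumor",
    source_standard="CAP Checklist",
    permissible_values=frozenset({"well", "moderate", "poor", "not assessable"}),
)
PRIMARY_SITE = CommonDataElement(
    name="primary_site",
    definition="Anatomic site of the primary tumor",
    source_standard="NAACCR",
)


def validate_record(record: dict, elements: list) -> list:
    """Return names of elements whose value in `record` is missing or not permitted."""
    errors = []
    for cde in elements:
        value = record.get(cde.name)
        if value is None or not cde.validate(value):
            errors.append(cde.name)
    return errors
```

For example, `validate_record({"histologic_grade": "poor", "primary_site": "larynx"}, [HISTOLOGIC_GRADE, PRIMARY_SITE])` returns an empty list, while a record containing `"grade 2"` and no site is flagged on both elements. Because every source system is checked against the same definitions, data from different systems can be compared element by element.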
Technology:
This Web-based data annotation and query tool was designed as part of the Organ Specific Data-warehouse. The annotation system uses a three-tier architecture and is implemented on an Oracle Application Server running on a Compaq DL360 server under Win2K with SP. The application also uses the Oracle HTTP Server and its mod_plsql extension to generate dynamic pages from the database for users. The database is Oracle 9.2.0.1 Enterprise Edition, running on a SunFire V880 server under Solaris 2.8.
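In mod_plsql, a URL of the form /pls/<DAD>/<package>.<procedure>?<args> invokes a stored PL/SQL procedure, which writes the HTML page through the htp package; the DAD (Database Access Descriptor) selects the database connection. The following is a Python analogue of that request flow, not Oracle code: SQLite stands in for the Oracle database, a list stands in for htp.p, and the DAD segment is ignored. The table, the procedure name hn_query.list_cases, and the helper names are hypothetical.

```python
import html
import sqlite3
from urllib.parse import parse_qs, urlsplit

# Stand-in for the database tier: an in-memory SQLite table (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cases (case_id TEXT, primary_site TEXT)")
db.executemany("INSERT INTO cases VALUES (?, ?)",
               [("HN-001", "larynx"), ("HN-002", "oral cavity")])

# Registry of "procedures", playing the role of stored PL/SQL procedures.
PROCEDURES = {}


def procedure(name):
    """Register a function under a dotted 'package.procedure' name."""
    def register(fn):
        PROCEDURES[name] = fn
        return fn
    return register


@procedure("hn_query.list_cases")
def list_cases(out, site):
    # `out` is an analogue of htp.p: each append writes one chunk of the HTML page.
    out.append("<table>")
    rows = db.execute(
        "SELECT case_id, primary_site FROM cases WHERE primary_site = ? ORDER BY case_id",
        (site,))
    for case_id, primary_site in rows:
        out.append(f"<tr><td>{html.escape(case_id)}</td>"
                   f"<td>{html.escape(primary_site)}</td></tr>")
    out.append("</table>")


def handle(url):
    """Dispatch /pls/<dad>/<package.procedure>?args to a registered procedure."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 3 or segments[0] != "pls" or segments[2] not in PROCEDURES:
        return "404 Not Found"
    # Query-string parameters become keyword arguments, as in mod_plsql.
    args = {key: values[0] for key, values in parse_qs(parts.query).items()}
    out = []
    PROCEDURES[segments[2]](out, **args)
    return "".join(out)
```

Calling `handle("http://localhost/pls/hn_dad/hn_query.list_cases?site=larynx")` returns `<table><tr><td>HN-001</td><td>larynx</td></tr></table>`. The point of the design is that page generation lives next to the data, so the middle tier only routes requests.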
Design:
The annotation warehouse system uses a three-tier architecture on an Oracle Application Server, with query front ends rendered in XML by Java servlets. The annotation tool has three components:
1. Common data elements, developed from the College of American Pathologists (CAP) Checklist and the North American Association of Central Cancer Registries (NAACCR) standards;
2. A Data Entry Tool, a portable, flexible, Oracle-based web application that is easy to learn; and
3. A Data Query Tool, which lets researchers search de-identified information in the warehouse through a point-and-click interface. Only the selected data elements are then copied from the warehouse's relational structure into a data mart with a dimensionally modeled structure.
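The copy step of the Data Query Tool can be sketched as follows, assuming a small relational warehouse, a whitelist of selectable (non-identifying) data elements, and a data mart shaped as a minimal star schema: one dimension table of distinct attribute combinations and one fact table of cases. SQLite stands in for Oracle, and every table, column, and function name here is hypothetical rather than taken from the actual system.

```python
import sqlite3

# Hypothetical warehouse schema; direct identifiers (mrn, name) exist only here.
wh = sqlite3.connect(":memory:")
wh.executescript("""
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, mrn TEXT, name TEXT,
                      sex TEXT, race TEXT);
CREATE TABLE pathology_case (case_id INTEGER PRIMARY KEY, patient_id INTEGER,
                             primary_site TEXT, histologic_grade TEXT);
INSERT INTO patient VALUES (1, 'MRN1', 'Doe, J', 'M', 'White'),
                           (2, 'MRN2', 'Roe, A', 'F', 'White');
INSERT INTO pathology_case VALUES (10, 1, 'larynx', 'moderate'),
                                  (11, 1, 'larynx', 'poor'),
                                  (12, 2, 'oral cavity', 'well');
""")

# Elements offered in the point-and-click interface, mapped to warehouse columns.
# Identifying columns are deliberately absent, so they cannot be selected.
SELECTABLE = {
    "sex": "p.sex",
    "race": "p.race",
    "primary_site": "c.primary_site",
    "histologic_grade": "c.histologic_grade",
}


def build_data_mart(selected):
    """Copy only the selected elements into a one-fact, one-dimension star schema."""
    columns = [SELECTABLE[name] for name in selected]  # KeyError for anything else
    rows = wh.execute(
        f"SELECT {', '.join(columns)} FROM pathology_case c "
        "JOIN patient p ON p.patient_id = c.patient_id ORDER BY c.case_id"
    ).fetchall()

    mart = sqlite3.connect(":memory:")
    mart.execute("CREATE TABLE dim_profile (profile_key INTEGER PRIMARY KEY, "
                 f"{', '.join(selected)})")
    mart.execute("CREATE TABLE fact_case (case_key INTEGER PRIMARY KEY, "
                 "profile_key INTEGER REFERENCES dim_profile)")
    profile_keys = {}  # distinct attribute combination -> surrogate key
    # Surrogate case keys replace warehouse case IDs in the mart.
    for case_key, row in enumerate(rows, start=1):
        if row not in profile_keys:
            profile_keys[row] = len(profile_keys) + 1
            placeholders = ", ".join("?" * (len(row) + 1))
            mart.execute(f"INSERT INTO dim_profile VALUES ({placeholders})",
                         (profile_keys[row], *row))
        mart.execute("INSERT INTO fact_case VALUES (?, ?)",
                     (case_key, profile_keys[row]))
    mart.commit()
    return mart
```

With `mart = build_data_mart(["sex", "primary_site"])`, counting cases per site with `SELECT d.primary_site, COUNT(*) FROM fact_case f JOIN dim_profile d USING (profile_key) GROUP BY d.primary_site ORDER BY d.primary_site` gives `[("larynx", 2), ("oral cavity", 1)]`. Restricting the query to a whitelist means identifiers never leave the warehouse, and the star shape keeps researcher queries to simple joins and aggregates.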
Results:
Researchers can use tissue specimens effectively because of the standardized subsets of clinical, pathological, and outcome descriptors developed through the database. More than 6090 cases, with more than 9400 pathology case accessions, are maintained in the Head and Neck Database along with follow-up information. Most primary disease cases are from Caucasian males. Among all disease sites in the UPMC Network Registry, head and neck-specific data have received the highest volume of requests from researchers (both SPORE and non-SPORE), and these data have grown in number every year for the past 4 years.
Conclusion:
Biorepositories such as the head and neck malignancies virtual biorepository will act as a central resource through which researchers can identify well-characterized biospecimens. The Annotation and Query Tool has the power and versatility to provide investigators with richly annotated data coupled with outcome analysis. It will also protect confidentiality and privacy by providing only de-identified patient data, and tissues are made available only to researchers with IRB and Scientific Review Committee approval. Finally, integrating multimodal data sets into the annotated tissue repository and creating efficient, specialized query environments will give the research community the critical information it needs to support its research.
