A STUDY OF PATHOLOGY RESEARCHERS AND THE DATA ELEMENTS AND QUERIES USED IN INTERACTIONS WITH A DATA WAREHOUSE FOR TISSUE BANKING

Kristina T. C. Panizzi, MAE, Masood Siddiqui, MD, Kristopher N. Jones, BS, BA, and Peter G. Anderson, DVM, PhD
Sujin Kim, MS
UPMC Health System
Pittsburgh, PA 15216

Introduction: Tissue banking, as a central repository of rich information resources, plays an increasingly critical role in pathology informatics as well as in other medical disciplines. However, tissue-related information has not been systematically identified, and heterogeneous databases have not been organized, integrated, or made accessible. It is therefore necessary to investigate who uses this information, what information is needed, and which information should be provided. It is also important to examine how an integrated data warehouse can efficiently support various user needs, and to provide a framework of tissue-related resources based on users, queries, and data elements.

Objectives: This study is designed (1) to characterize user groups, user queries, and tissue-related data elements; (2) to investigate the relationships among users, queries, and data elements; (3) to model a dimensional data warehouse for a tissue bank; and (4) to investigate the relationships among users, queries, data elements, dimensions, facts, levels of hierarchy, and pre-computation.
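As a rough illustration of the terms in objectives (3) and (4), the sketch below builds a toy star schema in an in-memory SQLite database: two dimension tables, one fact table, a two-level hierarchy (organ system > organ), and a pre-computed summary table. All table names, column names, and rows are hypothetical and invented for this example; they are not taken from the study's warehouse design.

```python
import sqlite3


def build_demo_warehouse():
    """Build a toy dimensional model. Every name and row here is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension: who issues queries
        CREATE TABLE dim_user (
            user_id INTEGER PRIMARY KEY,
            user_group TEXT
        );
        -- Dimension with a two-level hierarchy: organ_system > organ
        CREATE TABLE dim_tissue (
            tissue_id INTEGER PRIMARY KEY,
            organ_system TEXT,
            organ TEXT
        );
        -- Fact: number of queries per user and tissue type
        CREATE TABLE fact_query (
            user_id INTEGER REFERENCES dim_user(user_id),
            tissue_id INTEGER REFERENCES dim_tissue(tissue_id),
            query_count INTEGER
        );
        """
    )
    # Made-up rows, only to make the example runnable
    conn.executemany(
        "INSERT INTO dim_user VALUES (?, ?)",
        [(1, "clinician"), (2, "non-clinician")],
    )
    conn.executemany(
        "INSERT INTO dim_tissue VALUES (?, ?, ?)",
        [
            (1, "genitourinary", "prostate"),
            (2, "genitourinary", "kidney"),
            (3, "respiratory", "lung"),
        ],
    )
    conn.executemany(
        "INSERT INTO fact_query VALUES (?, ?, ?)",
        [(1, 1, 5), (1, 3, 2), (2, 1, 1), (2, 2, 4)],
    )
    # Pre-computation: roll facts up to (user_group, organ_system) once,
    # so later queries at that level read the summary table directly.
    conn.execute(
        """
        CREATE TABLE agg_group_system AS
        SELECT u.user_group, t.organ_system,
               SUM(f.query_count) AS total_queries
        FROM fact_query AS f
        JOIN dim_user AS u ON u.user_id = f.user_id
        JOIN dim_tissue AS t ON t.tissue_id = f.tissue_id
        GROUP BY u.user_group, t.organ_system
        """
    )
    return conn


def summary_rows(conn):
    """Read the pre-computed summary, ordered for stable output."""
    return conn.execute(
        "SELECT user_group, organ_system, total_queries "
        "FROM agg_group_system ORDER BY user_group, organ_system"
    ).fetchall()
```

Calling summary_rows(build_demo_warehouse()) reads the rolled-up counts from the summary table without touching the fact table again, which is the trade-off pre-computation makes: extra storage in exchange for cheaper queries at a chosen hierarchy level.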

Materials & Methods: A total of 183 research faculty, staff, fellows, and residents from the Department of Pathology at UPMC Health System will be recruited. A self-reporting survey (Web and paper) will be administered, and databases such as Medline, SCI, and Mars will be used to collect the participants' publication records. The Oracle SQL/data dictionary will be used to collect data elements, along with DICOM, TBIS, Mars, CPCTR, the CAP and ADASP structured reports, and GATC data elements.

Results: A pilot study was conducted (1) to validate the four given categories of users, queries, and data elements and (2) to refine the survey questionnaire and some experimental procedures. The pilot showed that the initial survey, designed to capture individual research interests as well as demographic information, was too complex and not suitable for some user groups, such as non-clinicians.

Conclusions: Once the survey is redesigned, the survey data analysis will be based on publication records, transaction logs, search screens, and content analysis of grant proposals. In the publication record analysis, publications will be identified from a bibliographic database (Medline), the Science Citation Index (SCI), and personal records (homepage, post-interview). Variables to be collected include the article title, MeSH terms, journal title, and article indexing.