Data Management and Exchange in the Cancer Grid
http://bmi.osu.edu/areas_and_projects/cabig
Scott Oster (oster@bmi.osu.edu) MS; Bioinformatics, Ohio State University; Shannon Hastings MS; Bioinformatics, Ohio State University; Stephen Langella MS; Bioinformatics, Ohio State University; Tahsin Kurc PhD; Bioinformatics, Ohio State University; Joel Saltz MD, PhD; Bioinformatics, Ohio State University
Content:
The National Cancer Institute has recently started an initiative, called the cancer Biomedical Informatics Grid (caBIG) (http://cabig.nci.nih.gov), to develop applications and the underlying system architecture that will facilitate multi-institutional collaborative studies. The caBIG will bring together numerous disparate data sources, each with many different data types. It can be anticipated that some data types will overlap in structure and semantics and will evolve over time. Thus, standard mechanisms are required to manage the structural definition of datasets and the way data definitions and corresponding data are exchanged. In this abstract, we describe a framework, Mobius, designed to support management of metadata definitions and metadata instances in a Grid environment.
Technology:
The caBIG is envisioned to be heavily model-driven. That is, all data objects will be defined in a series of interrelated object models, as defined by the application domains using controlled vocabularies and common data elements. These models will then be augmented with semantic information and models. The NCI caBIO, caDSR, and EVS are potentially viable solutions to these issues. However, a unifying system is needed to enable the discovery, exchange, and management of these models in a distributed environment, and the on-demand creation of databases from these models. In the caBIG Architecture Workspace group, there is agreement that XML is a viable format for data and metadata exchange. Furthermore, XML data instances should conform to Grid-published XML schema definitions. Mobius is a framework of services and protocols that collectively provide support for distributed management and integration of both data definitions (via XML schemas) and data instances (as databases of XML instances).
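To make the idea of instances conforming to a published schema concrete, the following is a minimal sketch in Python. It is not part of Mobius and not a real XML Schema validator: it only checks that the root element and its direct children match the names and namespace declared in a very simple schema. The schema, the namespace `urn:example:specimen`, and the functions `expected_shape` and `conforms` are all hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema, standing in for one published to the Grid.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:specimen"
           elementFormDefault="qualified">
  <xs:element name="specimen">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:string"/>
        <xs:element name="tissue" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def expected_shape(schema_text):
    """Return (namespace, root name, ordered child names) for a simple schema.

    Assumes one top-level element declared with an inline complexType/sequence.
    """
    schema = ET.fromstring(schema_text)
    root_decl = schema.find(XS + "element")
    child_path = f"{XS}complexType/{XS}sequence/{XS}element"
    children = [e.get("name") for e in root_decl.findall(child_path)]
    return schema.get("targetNamespace"), root_decl.get("name"), children


def conforms(instance_text, schema_text):
    """Shallow structural check: root tag and ordered child tags must match."""
    ns, root_name, children = expected_shape(schema_text)
    root = ET.fromstring(instance_text)
    if root.tag != f"{{{ns}}}{root_name}":
        return False
    return [c.tag for c in root] == [f"{{{ns}}}{name}" for name in children]
```

A production system would delegate this check to a full XML Schema validator; the sketch only illustrates why a shared, published definition lets independent parties agree on whether an instance is acceptable.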
Design:
Mobius consists of three core services: Global Model Exchange (GME), Data Translation Service (DTS), and Metadata and Data Instance Management (Mako). Its design is motivated by the requirements of Grid-wide data access and integration and by our interactions with the OGSA-DAIS and OGSA-DAI groups (www.ogsadai.org.uk). The GME is a distributed service that provides a protocol for publishing, versioning, and discovering XML schemas. It allows a newly inserted schema to reference entities that already exist in other schemas and in the global schema defined by a researcher. The DTS provides schema-to-schema translation of instance data, enabling interaction between services whose data formats would otherwise be incompatible. This allows data definitions and Grid services to evolve over time while still maintaining baseline interoperability. Mako is a distributed data storage service that lets users create databases on demand, store instance data, query instance data, and organize instance data into collections. Mako exposes data resources as XML data services through a set of well-defined interfaces. Mobius is built on a well-defined XML-based protocol, which permits alternate service definitions, and defines its interfaces in Java, which permits alternate service implementations.
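The division of labor among the three services can be sketched as follows. This is an in-memory toy under stated assumptions, not the Mobius protocol or its Java interfaces: the class names `SchemaRegistry` and `InstanceStore`, the function `translate_v1_to_v2`, the rule that a schema's imports must already be published, and the element names are all hypothetical.

```python
import xml.etree.ElementTree as ET


class SchemaRegistry:
    """Toy stand-in for the GME: schemas keyed by (namespace, version)."""

    def __init__(self):
        self._schemas = {}

    def publish(self, namespace, version, schema_text, imports=()):
        # Assumption: a schema may only reference namespaces already published.
        missing = [ns for ns in imports if not self.versions(ns)]
        if missing:
            raise LookupError(f"unpublished imports: {missing}")
        self._schemas[(namespace, version)] = schema_text

    def versions(self, namespace):
        """Discover which versions of a namespace have been published."""
        return sorted(v for ns, v in self._schemas if ns == namespace)

    def get(self, namespace, version):
        return self._schemas[(namespace, version)]


class InstanceStore:
    """Toy stand-in for Mako: named collections of XML instances."""

    def __init__(self):
        self._collections = {}

    def create_collection(self, name):
        # Collections are created on demand; creating one twice is harmless.
        self._collections.setdefault(name, [])

    def store(self, collection, xml_text):
        self._collections[collection].append(ET.fromstring(xml_text))

    def query(self, collection, path):
        # ElementTree supports only a limited subset of XPath.
        docs = self._collections[collection]
        return [hit for doc in docs for hit in doc.findall(path)]


def translate_v1_to_v2(doc):
    """Toy stand-in for a DTS translation: here v2 renames <tissue> to <site>."""
    out = ET.Element("specimen", {"schemaVersion": "2"})
    ET.SubElement(out, "id").text = doc.findtext("id")
    ET.SubElement(out, "site").text = doc.findtext("tissue")
    return out
```

In this sketch a client would publish a schema to the registry, create a collection in the store, insert instances, and apply a translation when a consumer expects a different schema version. In a real deployment each of these would be a distributed service reached over the network rather than a local object.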
Results:
Mobius has been used in the implementation of several distributed applications. The Mobius team, the caBIG caGRID team, and the OGSA-DAI team are currently preparing a comprehensive plan for developing an integrated prototype.
Conclusion:
Efficient integration of data in a Grid environment will be key to successful collaborations between disparate groups of cancer researchers. We expect that frameworks such as Mobius, which build on standards and offer services and protocols for distributed management of data and metadata definitions, will play a significant role in achieving this goal.
