APIII - Advancing Practice, Instruction & Innovation Through Informatics

Marriott City Center, Pittsburgh, PA | September 20 - 23, 2009

Grid PACS: A Grid-enabled System for Management and Analysis of Large Image Datasets

http://www.projectmobius.org

Shannon Hastings (langella@bmi.osu.edu) MS; Bioinformatics, Department of Biomedical Informatics, Ohio State University; Scott Oster MS; Bioinformatics, Department of Biomedical Informatics, Ohio State University; Stephen Langella MS; Bioinformatics, Department of Biomedical Informatics, Ohio State University; Tahsin Kurc PhD; Bioinformatics, Department of Biomedical Informatics, Ohio State University; Joel Saltz PhD; Bioinformatics, Department of Biomedical Informatics, Ohio State University;

Content:

Biomedical image data can provide rich information about the morphological and functional characteristics of biological systems. This information can help researchers better understand disease mechanisms and predict, explain, and extrapolate potential treatment approaches and outcomes. Advances in data acquisition technologies have improved the resolution and speed at which 2D, 3D, and time-dependent image data can be collected. However, major challenges remain to more effective use of biomedical imaging in basic and clinical research: the efficient management of large image datasets and image processing workflows, and the effective sharing of data and workflows in a collaborative environment. In this poster, we present a Grid-enabled framework, referred to as Grid PACS, that is designed to address the storage, querying, and processing requirements of large-scale image databases in a distributed environment.


Technology:

The distributed data storage and management components of the Grid PACS system were built using the Mobius Project. For distributed image analysis, we employed the DataCutter project.
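DataCutter organizes an application as filters connected by streams, so processing stages can be placed on different storage and compute nodes. The sketch below only imitates that filter-stream idea in a single Python process using generators. The filter names (`read_tiles`, `threshold`, `count_foreground`) and the tile data are hypothetical, and this is not the DataCutter API.

```python
from typing import Iterable, Iterator, List

# A "tile" here is a small 2D list of pixel intensities (hypothetical data).
Tile = List[List[int]]


def read_tiles(tiles: Iterable[Tile]) -> Iterator[Tile]:
    """Source filter: emit image tiles one at a time onto the stream."""
    for tile in tiles:
        yield tile


def threshold(stream: Iterable[Tile], cutoff: int) -> Iterator[Tile]:
    """Processing filter: binarize each tile (1 if pixel >= cutoff, else 0)."""
    for tile in stream:
        yield [[1 if px >= cutoff else 0 for px in row] for row in tile]


def count_foreground(stream: Iterable[Tile]) -> int:
    """Sink filter: total number of foreground pixels across all tiles."""
    return sum(sum(row) for tile in stream for row in tile)


def run_pipeline(tiles: Iterable[Tile], cutoff: int) -> int:
    # Filters are chained through their input/output streams. In a real
    # filter-stream runtime each filter could run on a different node;
    # here they all run lazily in one process.
    return count_foreground(threshold(read_tiles(tiles), cutoff))


if __name__ == "__main__":
    tiles = [
        [[10, 200], [250, 30]],
        [[128, 127], [0, 255]],
    ]
    print(run_pipeline(tiles, cutoff=128))  # prints 4
```

Because each tile flows through the chain one at a time, no stage needs the whole image in memory, which is the property that makes this style attractive for images several gigabytes in size.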


Design:

The Grid PACS system implements a set of core services that extend traditional picture archiving and communication systems (PACS) in the following ways:
1) The architecture supports invocation of procedures on ensembles of images. These procedures can carry out post-processing, error correction, image processing, and visualization, and the system can distribute their execution across storage and compute nodes.
2) The system can employ compute clusters for computationally intensive processing tasks and storage clusters for large image datasets. An image server can be set up on a single node of a cluster or across an entire cluster.
3) Application-specific data types (e.g., the results of an image analysis workflow) can be registered in the system as new schemas. The system supports on-the-fly creation of distributed databases conforming to a given schema.
4) Data can be stored and accessed at distributed sites. Multiple image servers can be grouped into a collective that can be queried as if it were a single central repository.
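Items 1 and 4 describe running operations where the images are stored and querying several image servers as one collective. The following is a minimal, single-process sketch of that scatter-gather pattern. `ImageServer`, `Collective`, `ImageRecord`, and its fields (`modality`, `size_mb`) are hypothetical illustrations, not the Grid PACS or Mobius interfaces. Real servers would be reached over the network rather than called directly.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ImageRecord:
    image_id: str
    modality: str
    size_mb: float


Predicate = Callable[[ImageRecord], bool]
Procedure = Callable[[ImageRecord], Any]


@dataclass
class ImageServer:
    """Hypothetical image server holding metadata for its locally stored images."""
    name: str
    records: List[ImageRecord] = field(default_factory=list)

    def query(self, predicate: Predicate) -> List[ImageRecord]:
        # Evaluate the query locally, next to the data.
        return [r for r in self.records if predicate(r)]

    def run_procedure(self, predicate: Predicate, procedure: Procedure) -> Dict[str, Any]:
        # Apply an operation to the matching local images and ship back only
        # the results, so the image data itself never leaves the server.
        return {r.image_id: procedure(r) for r in self.query(predicate)}


class Collective:
    """Groups servers so they can be queried as if they were one repository."""

    def __init__(self, servers: List[ImageServer]) -> None:
        self.servers = servers

    def _scatter(self, task: Callable[[ImageServer], Any]) -> List[Any]:
        # Send the same task to every server concurrently and gather the replies.
        with ThreadPoolExecutor(max_workers=max(1, len(self.servers))) as pool:
            return list(pool.map(task, self.servers))

    def query(self, predicate: Predicate) -> List[ImageRecord]:
        parts = self._scatter(lambda s: s.query(predicate))
        merged = [r for part in parts for r in part]
        return sorted(merged, key=lambda r: r.image_id)

    def run_procedure(self, predicate: Predicate, procedure: Procedure) -> Dict[str, Any]:
        results: Dict[str, Any] = {}
        for part in self._scatter(lambda s: s.run_procedure(predicate, procedure)):
            results.update(part)
        return results


if __name__ == "__main__":
    a = ImageServer("site-a", [ImageRecord("mri-001", "MR", 12.0),
                               ImageRecord("slide-07", "SM", 2048.0)])
    b = ImageServer("site-b", [ImageRecord("mri-002", "MR", 11.5)])
    grid = Collective([a, b])
    print([r.image_id for r in grid.query(lambda r: r.modality == "MR")])
    # prints ['mri-001', 'mri-002']
    # Flag gigabyte-scale images without moving them off their servers.
    print(grid.run_procedure(lambda r: True, lambda r: r.size_mb >= 1024))
```

The caller sees one merged answer and never needs to know which site holds which image. Pushing the procedure to each server, rather than pulling images to the client, is what keeps the pattern practical for large datasets.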


Results:

N/A


Conclusion:

Using this system, we have developed support for storing and analyzing large ensembles of MRI images (consisting of thousands of images obtained) and large digitized microscopy images (each up to a few gigabytes in size). The poster will describe the overall Grid PACS architecture and the implementation of support for MRI and digitized microscopy images, and will present performance results on terabyte-scale datasets using a deployment of the system on three PC clusters connected over local- and wide-area networks.

