APIII - Advancing Practice, Instruction & Innovation Through Informatics

Marriott City Center, Pittsburgh, PA | September 20 - 23, 2009

High-Throughput Analysis of Cancer Tissue Microarrays using a Worldwide Community Grid

Wenjin Chen PhD, The Cancer Institute of New Jersey; David J. Foran PhD, UMDNJ, Cancer Institute of New Jersey; Michael Feldman PhD, Univ. of Pennsylvania; Lin Yang MSEd, Rutgers University, Cancer Institute of New Jersey; Gratian Salaru MD, UMDNJ

Content:

Breast cancer accounts for about 30% of all cancers and 15% of cancer deaths in women in the United States. Recent advances in computer-assisted analysis provide promising new directions for investigating the underlying mechanisms of disease progression and for improving prognostic accuracy and therapy planning for certain patient populations. In this paper, we introduce a Grid-enabled decision support prototype for performing automated analysis of imaged breast tissue microarrays. As of this writing, more than 100,000 digitized specimens (1200 × 1200 pixels) have been processed on IBM's World Community Grid as part of the Help Defeat Cancer project, which was launched in July 2006. As part of a retrospective study, our team is investigating the relationship between the growing library of protein expression signatures being generated for imaged specimens and the corresponding surgical diagnosis of record, histologic type, tumor grade and response to therapy.

Technology:

The mixed set of breast tissue microarrays used in our experiments was prepared at the Cancer Institute of New Jersey, Yale University, University of Pennsylvania and Imgenex Corporation, San Diego, CA. Immunostained specimens were digitized using a 40X volume scan on a Trestle/CLARiENT MedMicro whole slide scanner system. Our team has worked out the details of imaging and managing tissue microarrays using both standard robotic and high-throughput whole slide scanners. Software has been developed to automatically delineate the tissue discs comprising the arrays, decompose those discs into their constituent staining maps, package the arrays and staining maps into work units, and process those units on a grid. The total number of computers currently participating in the World Community Grid efforts is approximately 250,000 worldwide and is still growing. From the start of the project in July 2006 through April 2007, approximately 2,909 years of computation were executed.
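The packaging step described above can be pictured with a minimal sketch. It is not the project's actual software: the `StainingMap` and `WorkUnit` types, their fields, the identifiers "TMA-01" and "stain-A", and the batch size are all hypothetical names chosen for illustration. The sketch only shows the general idea of grouping staining maps into fixed-size batches that could be dispatched independently to grid nodes.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class StainingMap:
    """One staining map derived from a tissue disc (all fields are hypothetical)."""
    array_id: str    # identifier of the source microarray
    disc_index: int  # position of the tissue disc within the array
    stain: str       # label of the stain this map represents


@dataclass(frozen=True)
class WorkUnit:
    """A batch of staining maps that one grid node could process on its own."""
    unit_id: int
    maps: Tuple[StainingMap, ...]


def package_work_units(maps: List[StainingMap], maps_per_unit: int) -> Iterator[WorkUnit]:
    """Split the staining maps into consecutive batches of at most maps_per_unit."""
    if maps_per_unit < 1:
        raise ValueError("maps_per_unit must be at least 1")
    for unit_id, start in enumerate(range(0, len(maps), maps_per_unit)):
        yield WorkUnit(unit_id, tuple(maps[start:start + maps_per_unit]))


# Toy input: ten staining maps from a single hypothetical array.
maps = [StainingMap("TMA-01", i, "stain-A") for i in range(10)]
units = list(package_work_units(maps, maps_per_unit=4))
print([len(u.maps) for u in units])  # [4, 4, 2]
```

Because each unit carries everything it needs, units can be processed in any order and on any participating machine, which is the property a volunteer grid relies on.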

Design:

To determine the most salient image features from among the spectrum of possibilities, each breast microarray was represented by a vector in a d = 4000 dimensional space, which required the implementation of a non-linear dimension reduction method referred to as ISOMAP. Based on this analysis, each specimen was represented by a feature vector in a reduced subspace with d = 500. Maximal margin classifiers, including Support Vector Machines (SVM) and boosting, were shown to be appropriate for these experiments since the number of training vectors was comparable with their dimensionality.
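ISOMAP is usually described in three stages: build a nearest-neighbor graph over the samples, estimate geodesic distances as shortest paths through that graph, and embed the samples with classical multidimensional scaling. The sketch below covers only the first two stages, on toy 2-D points rather than 4000-dimensional feature vectors, and omits the embedding step. The function names, the choice of k = 2, and the semicircle data are assumptions for illustration, not the authors' implementation.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Graph = Dict[int, List[Tuple[int, float]]]


def knn_graph(points: Sequence[Sequence[float]], k: int) -> Graph:
    """Connect each point to its k nearest neighbors with symmetric, weighted edges."""
    graph: Graph = {i: [] for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in nearest[:k]:
            # An edge may be stored twice; that is harmless for shortest paths.
            graph[i].append((j, d))
            graph[j].append((i, d))
    return graph


def geodesic_distances(graph: Graph, source: int) -> List[float]:
    """Dijkstra shortest-path lengths from source to every node in the graph."""
    dist = [math.inf] * len(graph)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy data: five points on a unit semicircle, a curved 1-D manifold in 2-D.
points = [(math.cos(t), math.sin(t)) for t in (i * math.pi / 4 for i in range(5))]
graph = knn_graph(points, k=2)
geo = geodesic_distances(graph, source=0)
straight = math.dist(points[0], points[4])
print(f"straight-line: {straight:.3f}, graph geodesic: {geo[4]:.3f}")
```

On the toy data, the graph distance between the two ends of the arc is longer than the straight-line distance because paths must follow the curve. Preserving such along-the-manifold distances is what distinguishes ISOMAP from a linear projection.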

Results:

Certified surgical pathologists provided independent confirmation of all labels for all breast tissues originating from each of the collaborating hospitals and institutions. The dataset used in these preliminary feasibility experiments consisted of 611 normal and 3133 cancer tissue specimens. One at a time, four different algorithms (KNN, Bayesian, soft margin SVM and boosting) were applied to the data 10 times, using different parts of the training images drawn by random sampling. Because there are more positive than negative samples, we obtained higher false positive errors and lower false negative errors than the average error. It was clear from our studies that Gentle Boosting and Soft Margin SVM performed better than the other methods because the training set was small. Gentle Boosting implemented with an eight-node CART decision tree produced a classification accuracy of 86.16% even when only 20% of the dataset was used for training.
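The evaluation protocol (repeated random splits, with false positive and false negative errors reported separately) can be sketched roughly as follows. This is an illustration under stated assumptions, not the study's code: the data are synthetic 1-D features with an arbitrary class imbalance, a 1-nearest-neighbor rule stands in for the four algorithms compared above, and `repeated_holdout` and its parameters are hypothetical names.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label); 1 = cancer, 0 = normal


def one_nn_predict(train: List[Sample], x: Sequence[float]) -> int:
    """Return the label of the training sample closest to x (squared Euclidean)."""
    nearest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    return nearest[1]


def repeated_holdout(data: List[Sample], train_fraction: float,
                     n_repeats: int, seed: int = 0) -> List[Dict[str, float]]:
    """Draw n_repeats random train/test splits and score each one."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        n_train = max(1, int(len(shuffled) * train_fraction))
        train, test = shuffled[:n_train], shuffled[n_train:]
        fp = fn = pos = neg = 0
        for x, y in test:
            pred = one_nn_predict(train, x)
            if y == 1:
                pos += 1
                fn += int(pred == 0)  # cancer specimen called normal
            else:
                neg += 1
                fp += int(pred == 1)  # normal specimen called cancer
        results.append({
            "accuracy": 1 - (fp + fn) / len(test),
            "false_positive_rate": fp / neg if neg else 0.0,
            "false_negative_rate": fn / pos if pos else 0.0,
        })
    return results


# Synthetic, imbalanced toy data: 20 "normal" and 100 "cancer" samples.
gen = random.Random(42)
data: List[Sample] = [((gen.gauss(0.0, 1.0),), 0) for _ in range(20)]
data += [((gen.gauss(2.0, 1.0),), 1) for _ in range(100)]

results = repeated_holdout(data, train_fraction=0.2, n_repeats=10)
mean_acc = sum(r["accuracy"] for r in results) / len(results)
print(f"mean accuracy over {len(results)} splits: {mean_acc:.3f}")
```

Reporting the two error rates separately matters with imbalanced classes, since overall accuracy alone can hide poor performance on the minority class.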

Conclusion:

In this paper, we have presented a Grid-enabled framework which uses image-based features to perform high-throughput analysis of imaged breast cancer tissue microarrays. Gentle Boosting using an eight-node CART decision tree as the weak learner provided the best results during the course of the feasibility studies. While these preliminary results are promising, we plan to expand the scope of these experiments to allow a systematic investigation of the usefulness of the reference expression library for performing automated scoring and interpretation of a much larger set of cancer tissue specimens, and to conduct experiments using a much broader ensemble of antibodies and stains.
