In recent years, microarrays have been the most widely used technology for monitoring the expression of thousands of genes in parallel. The huge amount of data produced by such experiments requires new computational techniques to manage and evaluate the data, and even to formulate new biological hypotheses. To this end, co-clustering methods have been widely used. Because they identify groups of genes that show similar activity patterns under a specific subset of the experimental conditions, co-clustering techniques provide a first insight into gene expression data and make it possible to identify potential transcription modules (genes regulated by the same transcription factors), which are more likely to involve genes that are coexpressed in the same group of conditions.
However, in many applications, distance metrics based only on expression levels fail to capture biologically meaningful clusters. For this reason, several works have proposed distance metrics based on different sources of information. Such additional information can help resolve ambiguities and avoid erroneous links based on spurious similarities.

We propose a methodology in which a standard expression-based co-clustering algorithm is enhanced by sets of constraints that take into account the similarity/dissimilarity (inferred from Gene Ontology information) between pairs of genes.
Moreover, choosing an adequate number of clusters is not trivial, and a bad choice may negatively affect the quality of the co-clustering results. We therefore adopt a preprocessing method that automatically determines a suitable number of clusters for both rows and columns.
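
To make the idea of Gene Ontology derived constraints more concrete, the following is a minimal sketch in Python, not the GOClust implementation. It assumes that each gene is described by a set of GO term identifiers, uses plain Jaccard overlap as a stand-in similarity measure (the measure actually used by GOClust is not specified here), and turns highly similar pairs into "must-link" constraints and fully dissimilar pairs into "cannot-link" constraints. The function names, thresholds, and gene names are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard(terms_a, terms_b):
    """Jaccard similarity between two sets of GO term identifiers."""
    if not terms_a and not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def derive_constraints(annotations, must_threshold=0.5, cannot_threshold=0.0):
    """Derive pairwise gene constraints from GO annotations.

    annotations: dict mapping a gene name to a set of GO term IDs.
    Pairs with similarity >= must_threshold become must-link constraints;
    pairs with similarity <= cannot_threshold become cannot-link constraints.
    Both thresholds are illustrative assumptions.
    """
    must_link, cannot_link = [], []
    # Iterate over gene pairs in a deterministic (sorted) order.
    for g1, g2 in combinations(sorted(annotations), 2):
        sim = jaccard(annotations[g1], annotations[g2])
        if sim >= must_threshold:
            must_link.append((g1, g2))
        elif sim <= cannot_threshold:
            cannot_link.append((g1, g2))
    return must_link, cannot_link


# Toy annotations (hypothetical genes, for illustration only).
annotations = {
    "geneA": {"GO:0006412", "GO:0003735"},
    "geneB": {"GO:0006412", "GO:0003735", "GO:0005840"},
    "geneC": {"GO:0006281"},
}

must_link, cannot_link = derive_constraints(annotations)
print("must-link:", must_link)
print("cannot-link:", cannot_link)
```

In a constrained co-clustering setting, pairs in the must-link list would be encouraged to fall in the same gene cluster and pairs in the cannot-link list in different ones; how the constraints enter the objective function depends on the co-clustering algorithm and is not shown here.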


Availability

GOClust is available upon request at

References:

Visconti A., Cordero F., Ienco D., and Pensa R.G., Coclustering under Gene Ontology Derived Constraints for Pathway Identification. In Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data, Mourad Elloumi and Albert Y. Zomaya (Eds.), John Wiley & Sons, 2014, pp. 625-642.

Cordero F., Pensa R.G., Visconti A., Ienco D., and Botta M., Ontology-driven Co-clustering of Gene Expression Data. In Proceedings of AI*IA 2009: Emergent Perspectives in Artificial Intelligence, XI International Conferences of the Italian Association for Artificial Intelligence, Reggio Emilia, December 9-12, 2009, volume 5883 of Lecture Notes in Artificial Intelligence (LNAI), Springer, pp. 426-435, ISSN: 0302-9743.


Developers:

Visconti A., Ienco D., Cordero F., Pensa R.G.