Fig. 3. Distribution of gene and miRNA counts in normal and cancer biclusters.
TopGO annotation results of top six ranked cancer-specific biclusters.
BiclusterID GO Term Annotated Expected p
GO:0061324 canonical Wnt signaling pathway involved in positive regulation of cardiac outflow tract cell proliferation 2 0.06 0.00103
Top 20 ranked genes and miRNAs related to breast cancer.
Rank Symbol Name PubMed
5 AKAP14 A-Kinase Anchoring Protein 14 NA
6 OLFML3 Olfactomedin Like 3 NA
7 TMEM164 Transmembrane Protein 164 NA
8 MRPL21 Mitochondrial Ribosomal Protein NA
15 TPCN2 NA NA
16 ALLC Allantoicase NA
17 ACAP1 ArfGAP With Coiled-Coil, Ankyrin NA
‘NA’ indicates that no literature support was found for its involvement in breast cancer.
3.4. Breast cancer-related coding gene-miRNA interactions
Among the 529 breast cancer-specific biclusters, 79 have a p value < 0.05. If miRNAs and coding genes were in the same bicluster and gene-miRNA interactions between them were recorded in the miRTarBase database, we selected these interactions as potential breast cancer-related gene-miRNA interactions. To show the most important breast cancer-related gene-miRNA interactions, the coding genes and miRNAs that appeared most frequently in the top 3 most significant biclusters (bicluster numbers 27, 61 and 81) were selected for gene-miRNA interaction analysis, as shown in Fig. 4. Genes BRIP1, FGFR2, PTEN and TP53 are all well-known breast cancer-related genes, and miRNAs hsa-mir-19b-1, hsa-mir-138 and hsa-mir-378 have direct literature support for their relationships with breast cancer [52–54]. Given that all interactions in Fig. 4 are known gene-miRNA interactions in miRTarBase, these interactions most likely play important roles in breast cancer.
Fig. 4. Gene-miRNA interactions between common genes and miRNAs in the top 3 significant breast cancer-related biclusters.
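The selection rule described in this section (keep a gene-miRNA pair only if both members fall in the same bicluster and the pair is recorded in miRTarBase) can be sketched as a simple set filter. This is a minimal illustration, not the authors' implementation: the function name `candidate_interactions`, the dictionary layout of the biclusters, and the toy identifiers are all assumptions made for the example, and the toy pairs are placeholders rather than real interactions.

```python
from itertools import product


def candidate_interactions(biclusters, known_pairs):
    """Return known (miRNA, gene) pairs whose two members share a bicluster.

    biclusters  : dict mapping bicluster id -> (set of genes, set of miRNAs)
                  (hypothetical layout chosen for this sketch)
    known_pairs : set of (miRNA, gene) tuples, e.g. parsed from a database export
    """
    hits = {}
    for bic_id, (genes, mirnas) in biclusters.items():
        # Test every miRNA-gene combination inside this bicluster
        found = {(m, g) for m, g in product(mirnas, genes) if (m, g) in known_pairs}
        if found:
            hits[bic_id] = found
    return hits


# Placeholder data, invented purely for illustration
biclusters = {
    27: ({"GENE_A", "GENE_B"}, {"mir-x"}),
    61: ({"GENE_C"}, {"mir-y"}),
}
known_pairs = {("mir-x", "GENE_B"), ("mir-z", "GENE_C")}

result = candidate_interactions(biclusters, known_pairs)
print(result)
```

Only bicluster 27 is returned here, because the pair involving GENE_C uses a miRNA that is not in the same bicluster as that gene.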
3.5. Comparison with other methods
To the best of our knowledge, no biclustering method has been used directly to detect cancer-related coding genes and miRNAs and their interactions. The commonly used methods can be divided into three categories: single gene, gene module and network based methods. For comparison, we selected one representative method from each of the three categories, i.e., Endeavour for the single gene based methods, ToppNet for the network based methods, and MGOGP for the gene module based methods. These three methods have input requirements similar to those of our method; for example, they all need training and testing gene sets. To make the methods more comparable, for Endeavour we chose HPRD and Bio-molecular pathways as the data sources; for MGOGP we used gene sets from the GSEA website and the same expression datasets as input. We tuned the parameters of each method so that the reported results reflect its best performance. All four methods used the same datasets and were evaluated against the same 115 known breast cancer-related genes. More specifically, for all four methods, we used all 604 known cancer-related genes as the known cancer genes, and all genes in the preprocessed TCGA RNA-Seq expression datasets as the candidate genes. We counted how many of the 115 known breast cancer genes appeared among the top 50 and top 100 genes selected by each method. As shown in Fig. 5, our method detected more known breast cancer genes than the other three methods. Furthermore, our method detects not only cancer-related genes but also cancer-related miRNAs, as well as their interactions. Among the top 100 ranked genes, there are also 12 miRNAs, of which 10 are known breast cancer-related miRNAs. In addition, our method was much faster than any of the other methods.
Fig. 5. Performance comparison between our method and Endeavour, ToppNet and MGOGP in terms of the number of identified genes among the 115 known breast cancer genes.
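The evaluation metric used in this comparison, the number of known breast cancer genes recovered among each method's top 50 and top 100 candidates, amounts to a top-k overlap count. The sketch below shows one way to compute it; the function name `known_in_top_k` and the toy ranking are assumptions for illustration and do not reproduce any of the reported results.

```python
def known_in_top_k(ranked, known, k):
    """Count how many of the first k ranked candidates are in the known set.

    ranked : list of candidate identifiers, best-ranked first
    known  : set of reference identifiers (e.g. known disease genes)
    k      : cutoff, such as 50 or 100
    """
    return sum(1 for candidate in ranked[:k] if candidate in known)


# Placeholder ranking and reference set, invented purely for illustration
ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]
known = {"G2", "G5", "G9"}

top3 = known_in_top_k(ranked, known, 3)
top6 = known_in_top_k(ranked, known, 6)
print(top3, top6)
```

Applying the same function to each method's ranked list with an identical reference set and identical cutoffs keeps the comparison like for like.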