A feature selection and classification framework is offered by Valavanis, Pilalis, Georgiadis, Kyrtopoulos, and Chatziioannou (2015). The proposed framework uses evolutionary algorithms and the Gene Ontology (GO) tree, and is applied to 450k human methylation data of breast cancer and B-cell lymphoma.
Le, Uy, Dung, Binh, and Kwon (2013) tried to identify associations between diseases and protein complexes. First, a protein complex network is constructed in which two protein complexes are connected if they share genes. Then, the random walk with restart (RWR) algorithm is applied to that network in order to rank the protein complexes by their relative importance to known disease protein complexes. The performance of the method is evaluated with leave-one-out cross-validation, and the method is applied to a breast cancer dataset.
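The ranking step above can be sketched as follows. This is a minimal RWR illustration on a toy adjacency matrix, not the authors' implementation: the network, restart probability, and seed choice are all illustrative.

```python
# Random walk with restart (RWR) on a small protein-complex network.
# The walker repeatedly follows edges but returns to the seed (known
# disease) nodes with probability `restart`; the steady-state visiting
# probabilities rank nodes by relevance to the seeds.
import numpy as np

def rwr_scores(adj, seed_idx, restart=0.5, tol=1e-10, max_iter=1000):
    # Column-normalize the adjacency matrix into a transition matrix.
    W = adj / adj.sum(axis=0, keepdims=True)
    p0 = np.zeros(adj.shape[0])
    p0[seed_idx] = 1.0 / len(seed_idx)       # restart distribution
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:   # L1 convergence check
            break
        p = p_next
    return p

# Toy network of 4 complexes; complexes sharing genes are connected.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
scores = rwr_scores(adj, seed_idx=[0])       # complex 0 is disease-linked
ranking = np.argsort(-scores)                # most relevant complexes first
```

Since the transition matrix is column-stochastic, the probability vector keeps summing to one across iterations, so the resulting scores are directly comparable across nodes.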
González and Belanche (2013) applied an algorithm for feature selection using simulated annealing and the discrete multivariate joint entropy to five public-domain microarray gene expression datasets, aiming to find small subsets of highly relevant genes.
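The general shape of simulated-annealing feature selection can be sketched as below. For brevity the sketch scores a subset with a simple class-separation criterion rather than the multivariate joint entropy used by the authors, and the data, cooling schedule, and parameters are illustrative.

```python
# Simulated-annealing gene selection: flip one gene in or out of the
# current subset, accept improvements always, and accept worse subsets
# with a Boltzmann probability that shrinks as the temperature cools.
import numpy as np

rng = np.random.default_rng(0)

def subset_score(X, y, mask):
    """Mean absolute difference of class means over the selected genes
    (a stand-in criterion; the paper uses a joint-entropy measure)."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    return np.abs(Xs[y == 0].mean(axis=0) - Xs[y == 1].mean(axis=0)).mean()

def sa_select(X, y, n_iter=500, t0=1.0, cooling=0.995):
    n_genes = X.shape[1]
    mask = rng.random(n_genes) < 0.5          # random initial subset
    best_mask, best = mask.copy(), subset_score(X, y, mask)
    score, t = best, t0
    for _ in range(n_iter):
        cand = mask.copy()
        cand[rng.integers(n_genes)] ^= True   # flip one gene in/out
        cand_score = subset_score(X, y, cand)
        if cand_score >= score or rng.random() < np.exp((cand_score - score) / t):
            mask, score = cand, cand_score
            if score > best:
                best_mask, best = mask.copy(), score
        t *= cooling                           # geometric cooling schedule
    return best_mask

# Toy expression matrix: 40 samples x 20 genes, genes 0-2 informative.
X = rng.normal(size=(40, 20))
y = np.repeat([0, 1], 20)
X[y == 1, :3] += 2.0
selected = sa_select(X, y)
```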
Luque-Baena, Urda, Subirats, Franco, and Jerez (2013) compared a genetic algorithm with constructive neural networks against the classical Stepwise Forward Selection (SFS) algorithm for predicting cancer outcome. A Welch t-test filtering method is embedded into both algorithms, which are applied to six cancer gene expression datasets.
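A Welch t-test filter of this kind ranks genes by the p-value of an unequal-variance two-sample t-test between the outcome classes. The sketch below uses toy data and an illustrative cutoff:

```python
# Welch t-test gene filter: rank genes by the p-value of a two-sample
# t-test that does not assume equal class variances.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 50))            # 30 samples x 50 genes
y = np.repeat([0, 1], 15)
X[y == 1, :5] += 3.0                     # make the first 5 genes differential

# equal_var=False selects Welch's t-test.
_, pvals = ttest_ind(X[y == 0], X[y == 1], axis=0, equal_var=False)
top_genes = np.argsort(pvals)[:5]        # keep the 5 most significant genes
```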
A method for predictive biomarker mining is introduced by Popovic, Sifrim, Pavlopoulos, Moreau, and De Moor (2012). A genetic algorithm with a novel fitness function and a bagging-like model-averaging scheme is applied to three independent, publicly available microarray datasets for colon cancer: one for training, one for testing, and one for external validation. Ingenuity Pathway Analysis (IPA) is used as a functional analysis to estimate the biological relevance of the resulting gene signature.
Prostate cancer biomarker genes are identified by Raza and Jaiswal (2013) by constructing a gene regulatory network using a two-stage filtering approach: a t-test and a fold-change measure. After identifying significant genes with the two filters, the Pearson correlation coefficient is used to compute regulatory relationships between the identified genes.
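The two filters followed by pairwise correlation can be sketched as below. The thresholds, data, and correlation cutoff are illustrative, not the values used in the paper:

```python
# Two-stage gene filtering (Welch t-test + fold change), then pairwise
# Pearson correlation between the surviving genes as candidate
# regulatory relationships.
import numpy as np
from scipy.stats import ttest_ind, pearsonr

rng = np.random.default_rng(2)
X = rng.normal(loc=5.0, scale=0.5, size=(20, 30))   # 20 samples x 30 genes
y = np.repeat([0, 1], 10)
X[y == 1, :4] += 2.0                                # 4 differential genes

# Stage 1: t-test filter on p-values.
_, pvals = ttest_ind(X[y == 0], X[y == 1], axis=0, equal_var=False)
# Stage 2: fold-change filter (difference of class means).
fold = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
significant = np.where((pvals < 0.01) & (fold > 1.0))[0]

# Edges between significant genes whose expression profiles are
# strongly correlated across samples.
edges = []
for i_pos, i in enumerate(significant):
    for j in significant[i_pos + 1:]:
        r, _ = pearsonr(X[:, i], X[:, j])
        if abs(r) > 0.7:
            edges.append((i, j, r))
```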
Jirapech-Umpai and Aitken (2005) used a genetic algorithm as a wrapper feature selection method for predicting gene markers for leukemia. They assessed the performance using a low-variance estimation technique and presented an analysis of the predicted genes. They concluded that the choice of feature selection criteria has a significant effect on classification accuracy.
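A wrapper GA of this kind evolves binary gene masks whose fitness is the accuracy of a classifier trained on the selected genes. The sketch below is a toy version: the nearest-centroid classifier, population size, rates, and data are illustrative stand-ins, not the authors' setup.

```python
# Genetic-algorithm wrapper for gene selection: chromosomes are binary
# gene masks; fitness is the training accuracy of a nearest-centroid
# classifier restricted to the selected genes.
import numpy as np

rng = np.random.default_rng(3)

def fitness(X, y, mask):
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    c0, c1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    pred = (np.linalg.norm(Xs - c1, axis=1)
            < np.linalg.norm(Xs - c0, axis=1)).astype(int)
    return (pred == y).mean()

def ga_select(X, y, pop_size=20, gens=30, p_mut=0.05):
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5
    for _ in range(gens):
        fit = np.array([fitness(X, y, ind) for ind in pop])
        # Tournament selection: the better of two random parents survives.
        i, j = rng.integers(pop_size, size=(2, pop_size))
        parents = pop[np.where(fit[i] >= fit[j], i, j)]
        # One-point crossover between consecutive parents.
        cut = rng.integers(1, n, size=pop_size)
        children = parents.copy()
        for k in range(0, pop_size - 1, 2):
            children[k, cut[k]:], children[k + 1, cut[k]:] = (
                parents[k + 1, cut[k]:].copy(), parents[k, cut[k]:].copy())
        # Bit-flip mutation.
        pop = children ^ (rng.random((pop_size, n)) < p_mut)
    fit = np.array([fitness(X, y, ind) for ind in pop])
    return pop[fit.argmax()]

# Toy data: 40 samples x 25 genes, three informative genes.
X = rng.normal(size=(40, 25))
y = np.repeat([0, 1], 20)
X[y == 1, :3] += 2.5
best_mask = ga_select(X, y)
```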
Ooi and Tan (2003) applied a genetic algorithm to the problem of multi-class prediction. A GA-based gene selection scheme is presented to predict the marker gene group, as well as the optimal group size, that maximizes classification success using a maximum likelihood (MLHD) classification method. The GA/MLHD-based approach is applied to the NCI60 gene expression dataset, which contains the gene expression profiles of 64 cancer cell lines. The approach achieved higher classification accuracies than other published predictive methods on the same multi-class test dataset.
García and Sánchez (2015) presented a two-stage classification model that combines feature selection with the dissimilarity-based representation paradigm. The ReliefF algorithm is used in the first stage to generate a subset of top-ranked genes, whereas, in the second stage, a dissimilarity space formed by the samples of the selected genes is used to construct a classifier. The performance of the dissimilarity-based models was analyzed through a collection of experiments classifying eight microarray gene expression datasets with an artificial neural network, a support vector machine, and Fisher's linear discriminant classifier, each built both on the gene space and on the dissimilarity space. The experimental results showed that the dissimilarity-based classifiers outperform the feature-based models.
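The two stages can be sketched as below. The sketch substitutes a simple class-mean-gap ranking for ReliefF and a nearest-centroid rule for the ANN/SVM/Fisher classifiers, so everything here is an illustrative stand-in for the paper's components.

```python
# Dissimilarity-based representation: after a filter picks top-ranked
# genes, each sample is re-represented by its Euclidean distances to
# every training sample (the prototypes), and a classifier is trained
# on those distance vectors instead of the raw gene values.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 40))            # 30 samples x 40 genes
y = np.repeat([0, 1], 15)
X[y == 1, :5] += 2.0                     # five informative genes

# Stage 1 (stand-in for ReliefF): rank genes by absolute class-mean gap.
gap = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
top = np.argsort(-gap)[:5]
Xt = X[:, top]

# Stage 2: build the dissimilarity space. Entry (i, j) is the distance
# from sample i to prototype j, so row i is sample i's new feature vector.
D = np.linalg.norm(Xt[:, None, :] - Xt[None, :, :], axis=2)  # (30, 30)

# Nearest-centroid classifier operating on the dissimilarity vectors.
c0, c1 = D[y == 0].mean(axis=0), D[y == 1].mean(axis=0)
pred = (np.linalg.norm(D - c1, axis=1)
        < np.linalg.norm(D - c0, axis=1)).astype(int)
accuracy = (pred == y).mean()
```

The design point is that the classifier's input dimensionality becomes the number of prototypes rather than the number of genes, which is what makes the representation attractive for small-sample, high-dimensional microarray data.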