Intelligent Computing Lab.
Bioinformatics in NCTU, Taiwan.
S.-Y. Ho*, L.-S. Shu, and J.-H. Chen,
"Intelligent Evolutionary Algorithms for Large Parameter Optimization
Problems," IEEE Trans. Evolutionary Computation, vol. 8, no. 6, pp.
522-541, Dec. 2004. (SCI, EI) (impact factor: 3.688, rank: 3/78, highly cited paper)
Ubiquitylation plays an important role in regulating protein functions. Recently, experimental methods have been developed for the effective identification of ubiquitylation sites. To efficiently explore undiscovered ubiquitylation sites, this study first evaluates promising sequence-based features and classifiers for the prediction of ubiquitylation sites by assessing three kinds of features (amino acid identity, evolutionary information, and physicochemical property) and three classifiers (support vector machine, k-nearest neighbor, and Naïve Bayes). An informative physicochemical property mining algorithm (IPMA) is then proposed to select an informative subset from 531 physicochemical properties. The prediction system UbiPred was implemented using an SVM with the feature set of 31 informative physicochemical properties selected by IPMA, which improves the accuracy from 72.19% to 84.44%. UbiPred reports each predicted ubiquitylation site together with a prediction score, helping biologists identify promising sites for experimental verification.
This study proposes an efficient sequence-based method (named ProLoc-GO) that mines informative GO terms for predicting protein subcellular localization. For each protein, BLAST is used to find a homolog of the protein with a known accession number, which is then used to retrieve the GO annotation. A novel genetic-algorithm-based method (named GOmining) combined with a support vector machine (SVM) classifier is proposed to simultaneously identify a small number m out of the n GO terms as input features to the SVM, where m << n. Two existing data sets, SCL12 (human proteins with 12 locations) and SCL16 (eukaryotic proteins with 16 locations), both with <25% sequence identity, are used to evaluate ProLoc-GO, which has been implemented using a single SVM classifier with m=44 and m=60 informative GO terms, respectively. With input sequences, ProLoc-GO yields test accuracies of 88.1% and 83.3% for SCL12 and SCL16, respectively.
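The idea of choosing m out of n features with a genetic algorithm can be sketched as below. This is only an illustration under stated assumptions, not GOmining itself: `toy_fitness` is a hypothetical stand-in for the cross-validated SVM accuracy that a real system would compute, and the operators, population size, and generation count are arbitrary choices made for the example.

```python
import random

def toy_fitness(subset, informative):
    """Hypothetical stand-in for SVM accuracy: number of informative features chosen."""
    return len(subset & informative)

def crossover(a, b, m, rng):
    """Child draws m features from the union of both parents."""
    return frozenset(rng.sample(sorted(a | b), m))

def mutate(subset, n, rng):
    """Swap one selected feature for an unselected one, keeping exactly m features."""
    child = set(subset)
    child.remove(rng.choice(sorted(child)))
    child.add(rng.choice(sorted(set(range(n)) - child)))
    return frozenset(child)

def select_features(n, m, fitness, generations=200, pop_size=30, seed=0):
    """Evolve subsets of size m drawn from feature indices 0..n-1."""
    rng = random.Random(seed)
    pop = [frozenset(rng.sample(range(n), m)) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism), refill with offspring.
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, m, rng), n, rng))
        pop = elite + children
    return max(pop, key=fitness)

# Toy problem: features 0..4 are the "informative" ones among n = 50.
informative = frozenset(range(5))
best = select_features(n=50, m=5, fitness=lambda s: toy_fitness(s, informative))
print(sorted(best))
```

In a real setting the fitness call dominates the cost, since each evaluation would train and validate a classifier on the candidate feature subset.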
Modeling the antigen-processing pathway, including major histocompatibility complex (MHC) binding, and predicting the immunogenicity of the resulting MHC-binding peptides are both essential to developing a computer-aided system for peptide-based vaccine design, which is one goal of immunoinformatics. Numerous studies have dealt with modeling the immunogenic pathway, but not with the intractable problem of immunogenicity prediction, owing to the complex effects of many intrinsic and extrinsic factors. This study proposes a computational method to mine a feature set of informative physicochemical properties from MHC class I binding peptides to design a support vector machine (SVM) based system (named POPI) for the prediction of peptide immunogenicity. POPI, utilizing the m = 23 selected properties, is the first computational system for prediction of peptide immunogenicity based on physicochemical properties.
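One simple way to turn a peptide into physicochemical features is to average a per-residue property over the sequence, giving one number per property. The sketch below assumes that encoding for illustration only; the text above does not state how POPI represents peptides or which 23 properties it selects. The Kyte-Doolittle hydropathy scale is used as a single example property, and `mean_property` and `encode` are hypothetical helper names.

```python
# Kyte-Doolittle hydropathy values, used here as one example property scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_property(peptide, scale):
    """Average a per-residue property over a peptide (KeyError on unknown residues)."""
    values = [scale[res] for res in peptide.upper()]
    if not values:
        raise ValueError("empty peptide")
    return sum(values) / len(values)

def encode(peptide, scales):
    """Feature vector with one averaged value per property scale."""
    return [mean_property(peptide, scale) for scale in scales]

# Arbitrary example peptide; a real system would pass many scales, not one.
print(encode("ACDEFGHIK", [KYTE_DOOLITTLE]))
```

A feature-selection step would then decide which of the available scales to keep as classifier inputs.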
Accurate prediction of protein subnuclear localization relies on the cooperation between informative features and classifier design. This study proposes an evolutionary support vector machine (ESVM) based classifier with automatic selection from a large set of physicochemical composition (PCC) features to design an accurate system for predicting protein subnuclear localization, named ProLoc. ESVM, which combines an inheritable genetic algorithm with SVM, can automatically determine the best number m of PCC features and simultaneously identify m out of 526 PCC features. Using the selected m=33 and m=28 PCC features, ProLoc achieves accuracies of 56.37% on SNL6 and 72.82% on SNL9, respectively. These are better than 51.4% for the SVM-based system using k-peptide composition features on SNL6, and 64.32% for an optimized evidence-theoretic k-nearest neighbor classifier utilizing pseudo amino acid composition on SNL9.
SODOCK is an optimization algorithm based on particle swarm optimization (PSO) for solving flexible protein-ligand docking problems. PSO is a simple and efficient population-based search algorithm. At present, SODOCK works within the environment of AutoDock 3.05. Computer simulation results show that SODOCK is superior to the default optimization algorithm of AutoDock, the Lamarckian genetic algorithm (LGA), in terms of docked energy and convergence performance.
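For readers unfamiliar with PSO, a generic version of the search loop can be sketched as follows. This is not SODOCK: the `sphere` objective is a toy stand-in for a docking energy function, and the inertia weight `w` and acceleration coefficients `c1`, `c2` are commonly used illustrative values chosen for this example, not parameters taken from the paper.

```python
import random

def sphere(x):
    """Toy objective (stand-in for docked energy); minimum 0 at the origin."""
    return sum(v * v for v in x)

def pso(objective, dim, bounds, n_particles=20, iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best, best_val = pso(sphere, dim=3, bounds=(-5.0, 5.0))
print(best, best_val)
```

In a docking setting the position vector would instead hold the ligand's translation, orientation, and torsion angles, and the objective would be the scoring function of the docking environment.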