AxPEP Server

Antimicrobial peptides (AMPs) are promising candidates in the fight against multidrug-resistant pathogens due to their broad range of activities and low ecotoxicity. Some AMPs also display antitumor and antiviral functions, making them alternative drug candidates for these important diseases. To facilitate the discovery of AMPs and their functions, we provide this one-stop server for predicting antimicrobial and other activities of uncharacterized peptide sequences. The following methods are currently available:

AmPEP: Predict antimicrobial activity
Deep-AmPEP30: Convolutional neural network model for short sequences (<=30 residues)
RF-AmPEP30: Random forest model for short sequences (<=30 residues)
BERT-AmPEP60: Peptide minimum inhibitory concentration (MIC) prediction against E. coli and S. aureus for sequences between 5 and 60 residues
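The length limits in the list above can be summarized in a small helper. This is only a sketch: the function name `applicable_methods` is hypothetical, the rejection of non-standard residues is an assumption, and AmPEP is treated as having no length limit simply because none is stated here. The server itself may apply different checks.

```python
# Hypothetical helper, not part of the AxPEP server code.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def applicable_methods(sequence):
    """Return the methods whose stated length range covers the sequence."""
    seq = sequence.strip().upper()
    # Assumption: only the 20 standard amino acids are accepted.
    if not seq or not set(seq) <= STANDARD_AMINO_ACIDS:
        return []
    n = len(seq)
    methods = ["AmPEP"]  # assumption: no length limit is stated for AmPEP
    if n <= 30:
        methods += ["Deep-AmPEP30", "RF-AmPEP30"]
    if 5 <= n <= 60:
        methods.append("BERT-AmPEP60")
    return methods


if __name__ == "__main__":
    print(applicable_methods("GLFDIVKKVVGALGSL"))
```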

Our methods and server are in constant development. How often is our server accessed? See the statistics page.

Deep-AmPEP30: Short Antimicrobial Peptide Prediction

Short AMPs are considered better drug candidates because they have enhanced antimicrobial activity, higher stability, and lower manufacturing cost. Existing AMP prediction methods often mix long and short sequences in both the training and validation of the prediction model, and we found that their prediction accuracies are surprisingly low (60-77%) for short AMPs. To meet the need for short AMP prediction, we developed Deep-AmPEP30, a sequence-based classification method that uses selected types of PseKRAAC reduced amino acid composition as features (see Figure 3) and a convolutional neural network as the learning algorithm. Deep-AmPEP30 was tuned to optimize the prediction of short AMPs of 30 amino acids or fewer, and in testing it achieved an accuracy of 83%, an AUC-ROC of 0.92, and an AUC-PR of 0.94.

Figure 3. Steps to generate the feature vector of an example peptide sequence using PseKRAAC feature Type 7-Cluster 15.
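The general idea behind a reduced amino acid composition feature can be sketched as follows: each residue is mapped to a cluster of similar amino acids, and the normalized frequency of every k-tuple of cluster labels forms the feature vector. The grouping `EXAMPLE_CLUSTERS` below is made up for illustration and is not the PseKRAAC Type 7-Cluster 15 alphabet. The function name is also hypothetical, and the sketch omits the gap and correlation parameters that PseKRAAC supports.

```python
from collections import Counter
from itertools import product

# Hypothetical reduced alphabet for illustration only; this is NOT the
# PseKRAAC Type 7 / Cluster 15 grouping used by Deep-AmPEP30.
EXAMPLE_CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def reduced_composition(sequence, clusters=EXAMPLE_CLUSTERS, k=2):
    """Normalized k-tuple composition over a reduced amino acid alphabet."""
    residue_to_cluster = {
        aa: idx for idx, group in enumerate(clusters) for aa in group
    }
    # Replace every residue by the index of its cluster.
    reduced = [residue_to_cluster[aa] for aa in sequence.upper()]
    # Count overlapping k-tuples of cluster indices.
    n_windows = len(reduced) - k + 1
    counts = Counter(tuple(reduced[i:i + k]) for i in range(n_windows))
    # Emit frequencies in a fixed order so vectors are comparable.
    keys = product(range(len(clusters)), repeat=k)
    total = max(n_windows, 1)
    return [counts[key] / total for key in keys]


if __name__ == "__main__":
    vector = reduced_composition("GLFDIVKKVV")
    print(len(vector))  # number of clusters raised to the power k
```

With six clusters and k = 2 the vector has 36 entries. A real PseKRAAC configuration would yield a different dimension depending on the chosen type, cluster count, and tuple length.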

Figure 4. The architecture of our CNN-based classifier for short AMP prediction.

Reference

Yan, J.; Bhadra, P.; Li, A.; Sethiya, P.; Qin, L.; Tai, H. K.; Wong, K. H.; and Siu, Shirley W. I.*

Deep-AmPEP30: Improve short antimicrobial peptides prediction with deep learning.

Molecular Therapy - Nucleic Acids 2020, 20, 882-894.

AmPEP: Antimicrobial Peptide Prediction

AmPEP is a sequence-based classification method for AMP using random forest. The prediction model is based on the distribution patterns of amino acid properties along the sequence:

Figure 1: Encoding a peptide sequence into distribution patterns of seven types of physicochemical properties, each divided into three classes.
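A minimal sketch of a distribution descriptor for a single property is given below. It assumes the commonly used three-class hydrophobicity grouping from the composition-transition-distribution (CTD) family of descriptors, and it assumes a ceiling-based convention for locating the 25%, 50%, and 75% occurrences. Both choices, and the function name, are assumptions and may differ from the AmPEP implementation.

```python
import math

# Assumed three-class grouping for hydrophobicity (standard CTD grouping);
# AmPEP's exact class definitions may differ.
HYDROPHOBICITY_CLASSES = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}


def distribution_descriptor(sequence, classes=HYDROPHOBICITY_CLASSES):
    """For each class, report where (in % of sequence length) the first,
    25%, 50%, 75% and last residue of that class occurs."""
    seq = sequence.upper()
    length = len(seq)
    features = []
    for members in classes.values():
        # 1-based positions of residues belonging to this class.
        positions = [i + 1 for i, aa in enumerate(seq) if aa in members]
        n = len(positions)
        for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
            if n == 0:
                features.append(0.0)  # class absent from the sequence
                continue
            # Assumed convention: round the occurrence rank up, minimum 1.
            rank = max(math.ceil(fraction * n), 1)
            features.append(100.0 * positions[rank - 1] / length)
    return features


if __name__ == "__main__":
    print(distribution_descriptor("GLFDIVKKVVGALGSL"))
```

One property yields 3 classes x 5 positions = 15 values, so seven properties would give 105 values, which is consistent with the full feature set of 105 described for AmPEP.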

Using our large and diverse collection of AMP/non-AMP data (3268/166791 sequences), we evaluated 19 random forest classifiers with different positive:negative data ratios by 10-fold cross-validation. Our optimal model, AmPEP with a 1:3 data ratio, achieved an accuracy of 96%, an MCC of 0.9, an AUC-ROC of 0.99, and a Kappa statistic of 0.9. Descriptor analysis using Pearson correlation coefficients of AMP/non-AMP distributions revealed that reduced feature sets (from the full set of 105 features down to a minimal set of 23) achieve comparable performance in all aspects except for some reduction in precision. Furthermore, on a benchmark dataset AmPEP achieved high performance in terms of AUC-ROC (0.995), AUC-PR (0.957), MCC (0.921), and kappa (0.962). This performance is 1-5% better than that of two published methods, iAMPpred and iAMP-2L.
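For readers unfamiliar with the MCC and Kappa statistics quoted above, the sketch below computes both from a binary confusion matrix using their standard definitions. The function name is hypothetical, and the counts in the usage line are made-up numbers chosen to mimic a 1:3 positive:negative ratio. They are not results from the AmPEP evaluation.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, Matthews correlation coefficient and Cohen's kappa."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    # MCC: correlation between predicted and true binary labels.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: agreement corrected for chance agreement p_e.
    p_pos = ((tp + fp) / total) * ((tp + fn) / total)
    p_neg = ((tn + fn) / total) * ((tn + fp) / total)
    p_e = p_pos + p_neg
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return {"accuracy": accuracy, "mcc": mcc, "kappa": kappa}


if __name__ == "__main__":
    # Made-up counts: 100 positives and 300 negatives (1:3 ratio).
    print(binary_metrics(tp=90, fp=10, tn=290, fn=10))
```

Unlike plain accuracy, both statistics account for class imbalance, which matters when negatives outnumber positives as they do in the AMP/non-AMP data.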

Figure 2: The prediction model of AmPEP is based on random forest (originally implemented in MATLAB, now in R for the online server).

This online prediction model has been reimplemented in R and tested to achieve accuracy very close to that of the original MATLAB implementation used for publication. If you want to run the MATLAB code yourself, feel free to download it from here. A Python re-implementation of AmPEP is also available here.

Reference

Bhadra, P.; Yan, J.; Li, J.; Fong, S.; Siu, Shirley W. I.*

AmPEP: Sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest.

Scientific Reports 2018, 8, 1697.

BERT-AmPEP60 - ProtBERT fine-tuned MIC AMP prediction models for Escherichia coli and Staphylococcus aureus

ProtBERT fine-tuned AMP regressors: Antimicrobial Peptide Activity Prediction for sequences with 5 to 60 residues

We proposed a deep learning model based on the fine-tuned Bidirectional Encoder Representations from Transformers (BERT) architecture to extract embedding features from input sequences and predict minimum inhibitory concentrations (MICs) for target bacterial species. Using the transfer learning strategy, we built regression models for Escherichia coli (EC) and Staphylococcus aureus (SA) using data curated from DBAASP. In five independent experiments with 10% leave-out sequences as test sets, the optimal EC and SA models achieved an average mean squared error of 0.2664 and 0.7530 (log µM), respectively. They also showed a Pearson’s correlation coefficient of 0.7955 and 0.7530, and a Kendall’s tau coefficient of 0.5797 and 0.5222, respectively.
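The three evaluation measures reported above can be sketched in plain Python as follows. The function names are hypothetical, the example MIC values are made up, and the use of base-10 logarithms is an assumption, since the text only states that values are in log µM. Kendall's tau is computed here as the tau-b variant, which may differ from the variant used in the original evaluation.

```python
import math


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def kendall_tau_b(x, y):
    """Rank correlation from concordant and discordant pairs (tau-b)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from tau-b
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom


if __name__ == "__main__":
    # Made-up MIC values in µM; log10 is an assumed choice of base.
    measured = [math.log10(v) for v in (2.0, 8.0, 32.0, 128.0)]
    predicted = [math.log10(v) for v in (4.0, 6.0, 40.0, 90.0)]
    print(mean_squared_error(measured, predicted))
    print(pearson_r(measured, predicted))
    print(kendall_tau_b(measured, predicted))
```

The mean squared error measures how far predictions are from the measured values, while the two correlation coefficients measure whether the model ranks peptides by potency in the right order.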

Figure 5: Overview of the proposed model. Each amino acid sequence is first tokenized for data representation, and the BERT encoder layers derived from the pre-trained ProtBERT model are fine-tuned for the downstream AMP regression task.