Jiarui Chen, Hong-Hin Cheong, and Shirley Weng In Siu.
xDeep-AcPEP: Deep Learning Method for Anticancer Peptide Activity Prediction based on Convolutional Neural Network and Multi-Task Learning.

Journal xxx, 2020 (under review)

What are Anticancer Peptides (ACP)?

Cancer is one of the leading causes of death worldwide. Conventional cancer treatment relies on radiotherapy and chemotherapy, but both cause severe side effects because they damage normal cells as well as cancer cells. Anticancer peptides (ACPs) are a promising alternative: therapeutic agents that are efficient and selective against tumor cells. Several mechanisms of action of ACPs are known. They can attack cancer cells by disrupting their cell membranes. They can penetrate the mitochondria, causing release of cytochrome c and apoptosis. They may also target certain membrane receptors, modulating signal transduction and the cell cycle.

AcPEP: Method to Classify ACPs and non-ACPs

(Work in progress)

xDeep-AcPEP: Method to Predict the Biological Activity of ACPs against Cancers

xDeep-AcPEP is a novel regression method, based on a convolutional neural network and multi-task learning, that predicts the bioactivity of anticancer peptides. A set of cancer-specific models was trained on the CancerPPD data sets to make predictions for six tumor types: breast, colon, cervix, lung, skin, and prostate.

As shown in the workflow figure (Figure 1), we chose four descriptors to encode a peptide sequence in numerical form: AAINDEX (AAI), BLOSUM62 (BLO), Z-scale descriptor (ZSC) and Binary profile (BIN). The encoder contains two 1D-convolutional layers with ReLU, two average pooling layers, two batch normalization layers and one max pooling layer. The regressor contains three fully connected layers with one final output neuron. We define the applicability domain (AD) of each model so that the uncertainty of a prediction for an unknown instance can be estimated. To do so, we measure the Euclidean distance between the instance and the centroid of the training data in the feature space. If this distance is within a pre-defined cutoff (Z), the prediction can be made with confidence.
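The applicability-domain check described above can be illustrated with a minimal sketch. The text does not state how the cutoff Z is converted into a distance threshold, so this sketch assumes one common formulation: the threshold is the mean training distance to the centroid plus Z times the standard deviation of those distances. The function names (`fit_domain`, `in_domain`) and the toy feature vectors are hypothetical and are not taken from the xDeep-AcPEP code.

```python
import math
import statistics


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_domain(train_features, z=1.0):
    """Return the training centroid and a distance threshold.

    ASSUMPTION: threshold = mean + z * std of the training distances.
    The source only says that a pre-defined cutoff Z is used.
    """
    c = centroid(train_features)
    dists = [euclidean(row, c) for row in train_features]
    threshold = statistics.mean(dists) + z * statistics.pstdev(dists)
    return c, threshold


def in_domain(features, c, threshold):
    """True if the instance lies within the applicability domain."""
    return euclidean(features, c) <= threshold


# Toy 2-D features (made up for illustration only).
train = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [1.0, 1.0]]
c, thr = fit_domain(train, z=1.0)
near = in_domain([1.2, 0.9], c, thr)   # close to the centroid
far = in_domain([6.0, 6.0], c, thr)    # far from all training data
```

Under this assumption, a smaller Z gives a tighter threshold, so fewer instances are accepted but those that remain lie closer to the training data.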

Figure 1. The development workflow of xDeep-AcPEP.

Model Performance

Using repeated five-fold cross validation, we assessed the performance of our models over a range of AD cutoffs (Z = 0.5 to 2.0), i.e. four domains with increasing coverage were defined. The results in Figure 2 show:

  1. For all tissue types, model performance tends to improve as the scope of the AD shrinks (decreasing Z).

  2. As the AD shrinks, a large amount of data is dropped, which may lead to unstable changes in the resulting model (increasing standard deviation).

Switching from Z = 1.0 to Z = 0.5, a large amount of data is dropped, which led to a substantial change in the resulting model. We want a balance between data coverage and model performance, i.e. we want to include as much data as possible while reducing the noisy data or outliers that degrade performance. Because of the unstable performance of the AD models using Z = 0.5, we eventually selected 1.0 as the default Z value.

Overall, the optimal models with an AD cutoff of Z = 1.0 achieve an average mean squared error (MSE) of 0.24 (-log M) and a Pearson correlation coefficient (PCC) of 0.74.
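For reference, the two metrics reported above can be computed as in the following minimal sketch. The activity values are made up for illustration and are not results from the paper; the function names are hypothetical.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted activities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pcc(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    norm_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    return cov / (norm_t * norm_p)


# Illustrative activities only (assumed to be on a -log M scale).
observed = [4.0, 5.0, 6.0]
predicted = [4.5, 5.0, 5.5]

error = mse(observed, predicted)
correlation = pcc(observed, predicted)
```

A lower MSE indicates smaller prediction errors, while a PCC closer to 1 indicates that predicted and observed activities rise and fall together. The two need not agree: here the predictions are perfectly correlated with the observations yet still have a non-zero error.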

Figure 2. Multi-task models with different AD cutoffs.