AxPEP

EcoTox-GCN

Graph convolutional neural network (GCN) models for predicting the ecotoxicity of chemicals.

Overview

Regulating chemicals to protect the environment on the basis of ecotoxicological assessments remains a major challenge, because experimental ecotoxicity tests are time-consuming and expensive. Accurate prediction methods are therefore needed. In this study, we conducted a comprehensive analysis of machine learning and graph-based learning techniques for predicting the ecotoxicity of chemicals. A total of 84 models were constructed using combinations of three molecular representations (Morgan, MACCS, Mol2vec), six machine learning algorithms (KNN, NB, RF, SVM, XGB, DNN) and five graph neural networks (GAT, GCN, MPNN, Attentive FP and FPGNN). In predicting the ecotoxicity of three aquatic taxonomic groups (fish, crustaceans and algae), GCN achieved the best overall performance.

Figure 1. Model construction pipeline.

Model Performance

We used scaffold splitting with a ratio of 0.8:0.1:0.1 to divide the dataset into training, validation and test sets. Each experiment was repeated five times, and test-set performance was reported as the average ROC-AUC across runs, which gives a more reliable estimate of model performance and shows the run-to-run variability.
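Scaffold splitting keeps every molecule that shares a core scaffold in the same subset, so the test set measures how well a model generalizes to chemotypes it has not seen during training. The sketch below is a minimal illustration, assuming each molecule's scaffold key (for example its Bemis-Murcko scaffold SMILES) has already been computed with a cheminformatics toolkit. The greedy "largest scaffold group first" assignment is a common convention, not necessarily the exact procedure used for EcoTox-GCN, and the function name scaffold_split is hypothetical.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Return (train, valid, test) index lists with no scaffold shared across subsets.

    `scaffolds` holds one precomputed scaffold key per molecule (assumption:
    computed beforehand). The remaining fraction goes to the test set.
    """
    # Group molecule indices by scaffold key.
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest groups first; ties broken by first appearance, for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cap = frac_train * n
    valid_cap = (frac_train + frac_valid) * n
    train, valid, test = [], [], []
    for group in ordered:
        # A whole scaffold group goes to the first subset it still fits into.
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cap:
            valid.extend(group)
        else:
            test.extend(group)
    return sorted(train), sorted(valid), sorted(test)


# Toy example: ten molecules whose scaffolds were computed beforehand.
scaffolds = ["c1ccccc1"] * 5 + ["C1CCCCC1"] * 3 + ["c1ccncc1", ""]  # "" = acyclic molecule
train, valid, test = scaffold_split(scaffolds)
print(train, valid, test)  # [0, 1, 2, 3, 4, 5, 6, 7] [8] [9]
```

Because whole scaffold groups are assigned at once, the realized subset sizes only approximate 0.8:0.1:0.1; that is the price of keeping scaffolds from leaking between training and test data.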

Figure 2. Test performance of the fingerprint-, embedding- and graph-based ecotoxicity prediction models. (A) AUC scores of the Morgan-based models. (B) AUC scores of the MACCS-based models. (C) AUC scores of the Mol2vec-based models. (D) AUC scores of the graph-based models.

Overall, molecular graph-based models outperformed the ML and DL models built on molecular fingerprints and molecular embeddings. This suggests that learning directly from the molecular graph captures complex relationships and features within chemical structures more effectively, which translates into better predictive performance.
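To make the idea of learning from the molecular graph concrete, the sketch below applies one layer of the widely used Kipf and Welling graph-convolution rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), to a toy heavy-atom graph and then mean-pools the atom vectors into a single molecule vector. It is an illustration under assumptions only: the atom features, weights, number of layers and readout used by EcoTox-GCN are not described here, and the names gcn_layer and mean_readout are hypothetical.

```python
import math


def gcn_layer(adjacency, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adjacency: n x n 0/1 bond matrix (A); features: n x f atom features (H);
    weights: f x c weight matrix (W). Returns an n x c list of lists.
    """
    n = len(adjacency)
    out_dim = len(weights[0])
    # Add self-loops so each atom also keeps its own features.
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Linear transform of every atom's features: H W.
    hw = [[sum(h[k] * weights[k][c] for k in range(len(h))) for c in range(out_dim)]
          for h in features]
    # Symmetric-normalized aggregation over each atom and its bonded neighbours.
    out = []
    for i in range(n):
        row = []
        for c in range(out_dim):
            total = sum(a_hat[i][j] * hw[j][c] / math.sqrt(deg[i] * deg[j]) for j in range(n))
            row.append(max(0.0, total))  # ReLU
        out.append(row)
    return out


def mean_readout(node_vectors):
    """Average atom vectors into one fixed-length molecule vector."""
    n = len(node_vectors)
    return [sum(v[c] for v in node_vectors) / n for c in range(len(node_vectors[0]))]


# Ethanol heavy atoms C-C-O; features are one-hot [is_carbon, is_oxygen].
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity W, for readability only
atoms = gcn_layer(adjacency, features, weights)
molecule = mean_readout(atoms)
print(atoms)
print(molecule)
```

After one layer the central carbon already carries an oxygen signal from its bonded neighbour, which is the kind of structural context a fixed fingerprint bit cannot adapt during training. In a trained model W is learned, several layers are stacked, and the molecule vector feeds a classifier.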

Prediction Models

This web server offers three ecotoxicological prediction models: A2A for algae, C2C for crustaceans and F2F for fish. In our evaluations, these models achieved ROC-AUC scores of 0.989 for algae, 0.984 for crustaceans and 0.982-0.992 for fish.
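ROC-AUC, the metric behind these scores and behind the five-run averages described under Model Performance, equals the probability that a randomly chosen toxic compound receives a higher predicted score than a randomly chosen non-toxic one. The sketch below computes it directly from that definition (ties count as one half) and averages it over repeated runs. The function names and toy labels and scores are hypothetical; this pairwise version is O(positives x negatives) and is meant for clarity, not speed.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """ROC-AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


def summarize_runs(runs):
    """Mean and sample standard deviation of ROC-AUC over (labels, scores) runs."""
    aucs = [roc_auc(labels, scores) for labels, scores in runs]
    spread = stdev(aucs) if len(aucs) > 1 else 0.0
    return mean(aucs), spread, aucs


# Two hypothetical repeated runs (1 = toxic, 0 = non-toxic).
runs = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 0, 1, 0], [0.8, 0.2, 0.7, 0.3]),
]
mean_auc, std_auc, per_run = summarize_runs(runs)
print(per_run)   # [0.75, 1.0]
print(mean_auc)  # 0.875
```

Reporting the spread alongside the mean makes it easier to judge whether small differences between models, such as those in Figure 2, exceed run-to-run noise.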