CYPstrate consists of a collection of machine learning classifiers (random forest and support vector machines) for the prediction of substrates and non-substrates of the nine most important human CYP isozymes in the metabolism of xenobiotics (i.e. CYPs 1A2, 2A6, 2B6, 2C8, 2C9, 2C19, 2D6, 2E1 and 3A4). The models are trained on a high-quality data set of 1831 substrates and non-substrates compiled from public sources.
Two distinct prediction modes are available to cover different use cases (see below). Computation is currently limited to 500,000 compounds per query; a query of that size takes approximately 14 hours to compute.
For more details, see [manuscript to be available soon].
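As a rough guide for planning queries, the two figures quoted above (500,000 compounds, approximately 14 hours) imply an average of about 0.1 seconds per compound. The sketch below turns that into a runtime estimate. It assumes that runtime scales linearly with the number of compounds, which is not stated by the tool and may not hold in practice. The function and constant names are hypothetical and are not part of CYPstrate.

```python
# Figures quoted in the description above.
MAX_COMPOUNDS_PER_QUERY = 500_000
HOURS_AT_MAX = 14.0

# Average cost per compound derived from those two figures (about 0.1 s).
SECONDS_PER_COMPOUND = HOURS_AT_MAX * 3600 / MAX_COMPOUNDS_PER_QUERY


def estimate_runtime_hours(n_compounds):
    """Rough runtime estimate in hours.

    Assumption: runtime scales linearly with the number of compounds.
    """
    if n_compounds < 0:
        raise ValueError("n_compounds must be non-negative")
    if n_compounds > MAX_COMPOUNDS_PER_QUERY:
        raise ValueError("query exceeds the 500,000 compound limit")
    return n_compounds * SECONDS_PER_COMPOUND / 3600


if __name__ == "__main__":
    for n in (1_000, 50_000, 500_000):
        print(f"{n} compounds: about {estimate_runtime_hours(n):.2f} h")
```

Under this assumption, a query of 50,000 compounds would take roughly 1.4 hours. Actual times depend on server load and on the molecules submitted.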
CYPstrate offers two prediction modes:
In best performance mode, several models are combined for each CYP isozyme using a hard voting strategy (majority voting). This approach yields the highest accuracy, but some compounds of interest may not be covered.
In full coverage mode, the tool uses one classification model per CYP and guarantees full coverage of the input space for all molecules that are successfully preprocessed by the tool. These models still achieve high prediction performance, but lower than that of the models used in best performance mode.
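To illustrate what hard voting means, here is a minimal sketch in plain Python. It is not the CYPstrate implementation. The function name `hard_vote` and the label strings are hypothetical. The description above does not say why a compound may be left uncovered in best performance mode, so the two cases that return `None` here (every model abstains, or no label wins a strict majority) are assumptions made only for illustration.

```python
from collections import Counter


def hard_vote(votes):
    """Combine per-model class labels by majority (hard) voting.

    `votes` holds one label per model; None marks a model that gives no
    prediction for the compound (an assumption for this sketch).
    Returns the winning label, or None if the compound is not covered.
    """
    cast = [v for v in votes if v is not None]
    if not cast:
        return None  # no model produced a prediction
    label, count = Counter(cast).most_common(1)[0]
    # Require a strict majority of the votes that were actually cast.
    return label if count > len(cast) / 2 else None


if __name__ == "__main__":
    print(hard_vote(["substrate", "substrate", "non-substrate"]))
    print(hard_vote(["substrate", "non-substrate"]))
```

The first call returns `substrate` because two of the three models agree. The second returns `None` because the vote is tied. A single-model setup, as in full coverage mode, never ties, which is consistent with the coverage guarantee described above.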
If you are using CYPstrate for your research, please cite all of the following publications:
[Manuscript: available soon]
Stork, C.; Embruch, G.; Šícho, M.; de Bruyn Kops, C.; Chen, Y.; Svozil, D.; Kirchmair, J. NERDD: a web portal providing access to in silico tools for drug discovery. Bioinformatics 2020. doi:10.1093/bioinformatics/btz695