Predicting substrates in the CYP-mediated phase I metabolism of xenobiotics


1. Prediction modes:

Best performance
Reports predictions from a consensus model (hard voting classifier). Use this prediction mode if the accuracy of predictions is of primary importance and coverage is not a key concern (predictions are only reported for compounds for which a consensus is reached).

Full coverage
Reports predictions from the best-performing single classifier. Use this prediction mode in cases where coverage is an important factor.

2. Molecular similarity:

Enable the calculation of the molecular similarity between each query molecule and its nearest neighbor among the training instances for each CYP:

Provide input molecule(s):


Example: CCOC(=O)N1CCN(CC1)C2=C(C(=O)C2=O)N3CCN(CC3)C4=CC=C(C=C4)OC

or upload a file with a list of SMILES
or upload an sdf file
or draw your own molecule


CYPstrate consists of a collection of machine learning classifiers (random forest and support vector machines) for the prediction of substrates and non-substrates of the nine most important human CYP isozymes in the metabolism of xenobiotics (i.e. CYPs 1A2, 2A6, 2B6, 2C8, 2C9, 2C19, 2D6, 2E1 and 3A4). The models are trained on a high-quality data set of 1831 substrates and non-substrates compiled from public sources.

Two distinct prediction modes are available to cover different use cases (see below). Computation is currently limited to 500,000 compounds per query, which takes approximately 14 hours to calculate.
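As a rough guide to planning a query, the figures above (500,000 compounds in approximately 14 hours) correspond to about 0.1 seconds per compound. The sketch below turns that into an estimate for smaller queries. It assumes that runtime scales linearly with the number of compounds, which the text above does not state; the function name `estimated_hours` is hypothetical and not part of CYPstrate.

```python
# Figures quoted in the text above.
MAX_COMPOUNDS = 500_000   # current per-query limit
MAX_HOURS = 14.0          # approximate runtime at the limit


def estimated_hours(n_compounds: int) -> float:
    """Estimate runtime in hours, ASSUMING linear scaling with query size."""
    if n_compounds < 0:
        raise ValueError("n_compounds must be non-negative")
    if n_compounds > MAX_COMPOUNDS:
        raise ValueError(f"queries are limited to {MAX_COMPOUNDS} compounds")
    return n_compounds * MAX_HOURS / MAX_COMPOUNDS


if __name__ == "__main__":
    for n in (1_000, 50_000, 500_000):
        print(f"{n:>7} compounds: about {estimated_hours(n):.2f} h")
```

Actual runtimes depend on server load and on the molecules submitted, so treat the result as an order-of-magnitude estimate only.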

For more details, see [manuscript to be available soon].

Prediction modes

CYPstrate offers two prediction modes:

In best performance mode, several models are combined for each CYP isozyme by a hard voting strategy (majority voting). This approach yields maximum accuracy, but some compounds of interest may not be covered because no prediction is reported when the models do not reach a consensus.

In full coverage mode, the tool uses a single classification model per CYP and guarantees a prediction for every molecule that is successfully preprocessed by the tool. These single models still achieve a high prediction performance, but it is lower than that of the consensus models used in best performance mode.
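The difference between the two modes can be illustrated with a minimal sketch. It is not the CYPstrate implementation: the function names, the label strings, and the rule that a consensus means a strict majority of votes are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def best_performance(votes: Sequence[str]) -> Optional[str]:
    """Hard voting: return the majority label, or None if there is no consensus.

    ASSUMPTION: a consensus is a strict majority (more than half of the votes).
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def full_coverage(votes: Sequence[str], best_index: int = 0) -> str:
    """Always report the vote of one designated classifier.

    ASSUMPTION: best_index marks the best-performing single model.
    """
    return votes[best_index]


if __name__ == "__main__":
    # Hypothetical per-model votes for one compound and one CYP isozyme.
    agreeing = ["substrate", "substrate", "non-substrate"]
    split = ["substrate", "non-substrate"]

    print(best_performance(agreeing))  # majority label is reported
    print(best_performance(split))     # None: compound is not covered
    print(full_coverage(split))        # a prediction is always reported
```

Returning `None` for a split vote mirrors the trade-off described above: the consensus mode gives up coverage for accuracy, while the single-model mode always answers.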

How to cite

If you are using CYPstrate for your research, please cite all of the following publications:

[Manuscript: available soon]

Stork, C.; Embruch, G.; Šícho, M.; de Bruyn Kops, C.; Chen, Y.; Svozil, D.; Kirchmair, J. NERDD: a web portal providing access to in silico tools for drug discovery. Bioinformatics 2020.