USP30 library
Key steps:
- Removal of compounds with undesirable properties:
  - Substructure filters for removal of PAINS [1].
  - Filters using non-MedChem-friendly (REOS) SMARTS [2].
- Virtual screening guided by machine learning models and docking with AutoDock Vina:
  - Machine learning models on ECFP fingerprints, built using USP30 activity data from the ChEMBL32 database, to select 10-20K compounds.
  - QED [3] and diversity picking (MaxMin algorithm [4]) to pass diverse, higher-quality structures on to docking.
  - Docking of the 7K best compounds using AutoDock Vina.
  - Final choice based on estimated binding energy and ligand efficiency (LE).
Discovery of non-covalent USP30 inhibitors with novel chemotypes.
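The diversity-picking step named in the key steps can be illustrated with a minimal sketch of greedy MaxMin selection. This is not the implementation used in the project: fingerprints are represented here as plain Python sets of on-bit indices, the distance is assumed to be 1 minus Tanimoto similarity, and the names `tanimoto_distance` and `maxmin_pick` are hypothetical.

```python
from typing import List, Set


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """Return 1 - Tanimoto similarity for two fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def maxmin_pick(fps: List[Set[int]], n_pick: int, first: int = 0) -> List[int]:
    """Greedy MaxMin: repeatedly add the candidate whose closest
    already-picked compound is farthest away. Returns indices into fps."""
    if not fps or n_pick <= 0:
        return []
    n_pick = min(n_pick, len(fps))
    picked = [first]
    # min_dist[i] = distance from compound i to its nearest picked compound
    min_dist = [tanimoto_distance(fp, fps[first]) for fp in fps]
    min_dist[first] = -1.0  # negative value marks "already picked"
    while len(picked) < n_pick:
        nxt = max(range(len(fps)), key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, fp in enumerate(fps):
            if min_dist[i] >= 0.0:
                min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fps[nxt]))
        min_dist[nxt] = -1.0
    return picked


if __name__ == "__main__":
    # Toy fingerprints (made-up bit indices, for illustration only)
    toy = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}, {7, 8, 10}]
    print(maxmin_pick(toy, 3))
```

This naive version recomputes distances in pure Python and is only meant to show the selection rule. For libraries of the size discussed here, a cheminformatics toolkit with a built-in MaxMin picker would be the practical choice.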
Key decisions/detail:
- The entire available ChemDiv small-molecule library (1.6M compounds) was filtered for PAINS, REOS and MedChem criteria (including Lipinski Ro5), resulting in 950K structures.
- A machine learning model for initial screening was built using USP30 activity data from ChEMBL32: a logistic regression model built on 13 descriptors (e.g. cLogP, NumHDonors, NumHAcceptors, MolWt, MaxPartialCharge, topological index Kappa3). These descriptors were the ones that appeared most significant in a CatBoost model trained on all available descriptors using the training set (the CatBoost model was likely overfitted).
- The logistic regression model was used to select the 100K structures with the highest predicted probability of being USP30-active.
- Structures with QED [3] > 0.5 were kept for further processing.
- Finally, a set of the 10K most diverse compounds was chosen using the MaxMin algorithm (ECFP2048).
- Structure-based screening:
  - Two PDB structures were chosen for modeling:
    - 5OHP, with a diubiquitin substrate and the catalytic C77A replacement.
    - 8D0A, with a small-molecule covalent inhibitor (released in 2023).
  - The structures differ appreciably in the conformations of the loops that define the binding pocket.
  - AutoDock Vina docking was run against each of the receptor models.
  - Finally, compounds with a Vina-estimated binding energy below -7 kcal/mol and LE >= 0.3 for at least one receptor model were retained, resulting in 2680 prospective compounds.
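The final retention rule can be sketched as a small filter. Several assumptions are made here that the text does not spell out: the energy cutoff is read as a Vina score below -7 kcal/mol (Vina reports favourable poses as negative values), LE is taken in its common form of minus the estimated binding energy divided by the number of heavy atoms, and the class and function names (`DockedCompound`, `ligand_efficiency`, `is_retained`) as well as the example scores are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

ENERGY_CUTOFF = -7.0  # kcal/mol; assumes the cutoff is meant as a score below -7
LE_CUTOFF = 0.3       # kcal/mol per heavy atom


@dataclass
class DockedCompound:
    compound_id: str
    heavy_atoms: int
    # receptor model name -> best Vina score in kcal/mol (more negative = better)
    scores: Dict[str, float]


def ligand_efficiency(score: float, heavy_atoms: int) -> float:
    """Assumed definition: LE = -(estimated binding energy) / heavy-atom count."""
    return -score / heavy_atoms


def is_retained(c: DockedCompound) -> bool:
    """Keep a compound if at least one receptor model satisfies both cutoffs."""
    return any(
        s < ENERGY_CUTOFF and ligand_efficiency(s, c.heavy_atoms) >= LE_CUTOFF
        for s in c.scores.values()
    )


if __name__ == "__main__":
    # Made-up docking results, for illustration only
    docked = [
        DockedCompound("cmpd-A", 20, {"5OHP": -7.5, "8D0A": -6.0}),
        DockedCompound("cmpd-B", 30, {"5OHP": -8.0, "8D0A": -7.2}),
        DockedCompound("cmpd-C", 20, {"5OHP": -6.5, "8D0A": -6.8}),
    ]
    print([c.compound_id for c in docked if is_retained(c)])
```

Requiring both conditions on the same receptor model is one reading of "for at least one receptor model". A looser reading would let the energy and LE cutoffs be met on different models, although with a fixed heavy-atom count the two readings select the same compounds.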