Ion Channels Targeted Library
Medicinal and Computational Chemistry Dept., ChemDiv, Inc., 6605 Nancy Ridge Drive, San Diego, CA 92121 USA, Service: +1 877 ChemDiv, Tel: +1 858-794-4860, Fax: +1 858-794-4931, Email: [email protected]
Due to poor efficiency of the mass random bioscreening concept in drug discovery, the current paradigm holds that target-specific properties of small-molecule compound libraries must be addressed as early as possible. In general, the existing target and ligand structure-based technologies cannot adequately address all the problems of rational drug design, particularly those connected with virtual screening of large compound databases for novel active chemotypes. An alternative design for target-specific libraries is based on the similarity of molecular physicochemical properties of active compounds for certain protein families. We applied this approach in the design of our ion-channel (IC) focused library using several neural network (NN) QSAR methods, particularly Kohonen and Sammon maps for data analysis and visualization.
1. Ion channels as promising drug targets
Modulation of ion transmembrane channels is the basis of therapy for a variety of illnesses. The development of drugs that modulate the entry of ions into cells may provide clinically significant benefits in the treatment of cardiovascular diseases, cerebral and peripheral vascular disorders, male and female sexual dysfunctions, diabetes, asthma, drug-induced ulcers of the gastrointestinal tract, epilepsy, and several types of neuropathic pain.
Voltage-gated ion channels (VGICs) play an important role in numerous cell types and occur as large families of related genes with cell-specific expression patterns. VGICs are transmembrane proteins that mediate the flux of ions (Ca2+, Na+, K+) in response to membrane depolarization and thereby initiate multiple cellular activities. The phylogenetic trees of the VGIC families and subfamilies can be retrieved from several databases. Several drugs already marketed have generated substantial revenues.
With the publication of the human genome and progress in transcription profiling that reveals the tissue-specific distribution of ion channels, these proteins are expected to play a much more important role as therapeutic drug targets in the future. These advances, combined with cell-based assays that add biologically relevant information to genomic and proteomic data, will make ion channels a favorable class of selective, tissue-specific drug targets.
2. Potassium channel openers
Potassium channels are encoded by a substantial multi-gene family in higher organisms. Potassium channel openers (PCOs) are a structurally heterogeneous group of compounds that relax vascular smooth muscle and reduce cardiac muscle contractility by increasing membrane conductance to potassium. PCOs have therapeutic potential in a number of disease states: hypertension, irritable bladder syndrome, male and female sexual dysfunctions, diabetes, asthma, drug-induced ulcers of the gastrointestinal tract, and ischemic heart disease, where they act as cardioprotectants. Blockade of the hERG potassium channel can in rare cases lead to drug-induced arrhythmias, which has led to a number of drugs being withdrawn from the market.
Potassium channels share a common structure: a tetramer of four alpha-subunits, each contributing one P-domain to an ion-selective pore (Fig. 1). The properties of the channel can be modified by auxiliary subunits. The human genome contains 77 genes encoding pore-forming subunits. The genome of the fruit fly Drosophila melanogaster contains 21, but surprisingly the simple nematode worm Caenorhabditis elegans has 65.
The transmembrane pore of K+ channels is composed of four identical subunits, of which two are shown. The ion pathway contains a narrow selectivity filter (yellow) and a wide central cavity (asterisk). Three helical elements include the outer helix (M1), pore helix (P), and inner helix (M2). The gate is formed by the inner helix bundle.
Analysis of X-ray crystallography data showed that the potassium channel from S. lividans is shaped like a cone or “inverted teepee.” According to the published data, the structure helps explain one of the great biophysical mysteries: the chemical nature of the pore’s main ion conduction pathway. Potassium ions are normally surrounded by water, which they must shed as they slip into the channel. For this to happen, the pore must offer a surrogate for water. Ion discrimination takes place in a region of the pore called the selectivity filter, so named because it is narrower than the rest of the channel. When a potassium ion enters the filter, its water is stripped away and oxygen atoms from the protein surround the ion instead, stabilizing it. Scientists have also wondered why the sodium ion, which is smaller than the potassium ion, does not pass through the potassium channel. Again, the structure may provide insight: the selectivity filter, which is held in a very precise conformation, appears to be tuned to the larger potassium ion.
Phylogenomics uses phylogenetic trees to visualize the relationships between gene family members in genomes. Using the trees, it is possible to predict the properties of uncharacterised family members and to see how the families might have evolved in different organisms. Phylogenetic trees constructed for K+ transporters are depicted in Fig. 2 and Fig. 3. The first figure shows a tree of all K+ transporters obtained from A. thaliana. The tree has five major branches: a) KUP/HAK/KT transporters (13 genes), b) Trk/HKT transporters (1 gene), c) KCO (2P/4TM) K+ channels (6 genes), d) Shaker-type (1P/6TM) K+ channels (9 genes), and e) K+/H+ antiporter homologues (6 genes). Predicted membrane topologies for each branch are shown in the figure. The apparent absence of K+ channels of the 2P/8TM family is remarkable, as is the diversity of the AtKUP/HAK/KT transporters. Proteins for which a complete cDNA sequence is available are indicated by bold letters and lines. AGI genome codes are given except for AtKUP3, AtKUP4, AtHAK5, AtHKT1, GORK, KAT2 and AKT2 (GenBank accessions) because of errors in the sequences predicted by AGI. Programs used were HMMTOP for topology predictions of the KEA and AtKUP/HAK/KT families, ClustalX for alignments, and TreeView for graphical output.
A non-rooted tree (Fig. 3) reflects the structural and functional properties of A. thaliana K+ channels. The two major branches are the 2P/4TM-type and the 1P/6TM (Shaker)-type channels, as depicted by the sketches. For KAT1 the proposed topology has been confirmed experimentally. The 1P/6TM (Shaker-type) channels are further subdivided into the depolarization-activated GORK and SKOR and the KATs and AKTs. All 1P/6TM channels possess a putative cyclic nucleotide-binding site (CNB), and AKT channels also have an ankyrin repeat consensus site (AR, see sketches). P-loops are labeled with asterisks. Proteins for which a complete cDNA sequence is available are indicated by bold letters and lines.
More than 1,000 known potassium channel openers (PCOs) were used as a reference database. All compounds were selected from the Ensemble database, which is a licensed database of known pharmaceutical agents compiled from the patent and scientific literature. This database was used as a source of structural information in the stage of morphing active molecules. Several representative active PCOs that entered preclinical trials were used as compound prototypes (Fig. 4).
3. Sodium channels
Sodium channels (SCs) play an important role in the neural network by transmitting electrical impulses rapidly throughout cells and cell networks, thereby coordinating higher processes ranging from locomotion to cognition. These channels are large transmembrane proteins that switch between different states to enable selective permeability for sodium ions. Opening requires depolarization of the membrane, as occurs during an action potential, and hence these channels are voltage-gated. The voltage-gated sodium channels could be targeted, either selectively or in combination with other cellular processes, for the treatment of stroke, epilepsy, and several types of neuropathic pain.
More than 190 known sodium channel inhibitors (SCIs) were used as a reference database. All compounds were selected from the Prous Science Integrity Database (see ref above), which is a licensed database of known pharmaceutical agents compiled from the patent and scientific literature. This database was used as a source of structural information in the stage of morphing active molecules. Representative examples of active SCIs that entered clinical trials and were used as compound prototypes are shown in Fig. 5.
4. Voltage-dependent calcium channels
Voltage-dependent calcium channels (VDCCs) were first identified in crustacean muscle by Paul Fatt and Bernard Katz (1953). These muscles showed action potentials in the absence of external Na+ that were dependent on calcium (Ca2+) entry. The first VDCC to be cloned was α1S (skeletal), following purification of the DHP receptor from skeletal muscle, where it is concentrated in the T tubules, providing a rich source of starting material. The purified oligomeric complex from muscle consisted of five proteins: α1 (~200 kD), α2 (~140 kD), β (~50 kD), δ (~20 kD) and γ (~30 kD). cDNA clones were obtained using primers derived from the amino acid sequences of proteolytic fragments of the individual proteins. The skeletal muscle α1S subunit is unique in its properties; it activates slowly with relatively large gating charge movements. In addition to the skeletal muscle α1S subunit, three further subtypes of L type calcium channel α1 subunit have been identified: C, D and F; together with S they form the SCDF family of calcium channels. The α1F gene, whose expression is restricted to the retina, is thought on the basis of homology to encode an L type channel, and a mutation in this gene has been identified as responsible for one type of congenital night blindness. In addition, there are the ABE family of calcium channels, comprising α1A, the neuron-specific B type clone (α1B) and α1E, and the GHI family. For example, voltage-dependent presynaptic inhibition can be reconstituted with cloned and expressed calcium channels; it is shown by all three members of the ABE family, with α1B showing the greatest ability to be modulated by G proteins and α1E the least. There are many different ion channels that fall under the general umbrella of non-selective cation channels. This simply means that they show little selectivity for Ca2+, K+ or Na+.
The ion flux that occurs depends on the membrane potential and the concentration of each ion on either side of the membrane.
Ca2+ is generally present at a concentration of a few mM in the extracellular space, but inside the cell, the cytoplasmic concentration is about 0.1 μM. This is kept low by a number of different pumps and buffering systems, as well as the general impermeability of the plasma membrane to the entry of Ca2+. VDCCs have subsequently been found in all types of excitable cell, in vertebrates, invertebrates, and even plants. They fulfill numerous functions depending on the tissue, and it is thus not surprising that a number of subclasses of VDCC have been identified. Examination of the biophysical properties of VDCCs required the advent of voltage-clamp and subsequently patch-clamp technology. VDCCs are normally closed at resting membrane potentials and open upon depolarization, due partly to the channel structure sensing the change in transmembrane voltage. The resultant current through the cell membrane can be characterized by a number of properties, including the membrane potential range over which the channel opens and the kinetics or time-dependent properties of the current. Different single channel currents can also be identified with varying properties; the task of matching these single channel types with the currents observed in entire cells is a difficult one, but has been made easier by the cloning of the cDNAs for a number of VDCCs and the use of selective drugs and toxins to identify specific current components that correspond to particular channel types.
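The steep Ca2+ gradient described above can be turned into an equilibrium potential with the Nernst equation, E = (RT/zF) ln([Ca2+]out/[Ca2+]in). The short sketch below is illustrative only: the extracellular value of 2 mM (one reading of "a few mM") and the temperature of 37 °C are assumptions, not values taken from this text.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol

def nernst_potential_mV(conc_out_molar, conc_in_molar, valence, temp_kelvin=310.15):
    """Equilibrium (Nernst) potential in millivolts for a single ion species."""
    ratio = conc_out_molar / conc_in_molar
    return 1000.0 * (R * temp_kelvin) / (valence * F) * math.log(ratio)

# Assumed illustrative values: 2 mM outside, 0.1 uM inside, 37 C.
e_ca = nernst_potential_mV(2e-3, 1e-7, valence=2)
print(f"E_Ca = {e_ca:.0f} mV")
```

Under these assumptions the equilibrium potential is strongly positive, which is why opening a VDCC at any physiological membrane potential drives Ca2+ into the cell.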
Low threshold and high threshold voltage-gated calcium channels
In a number of tissues, including certain cardiac muscle cells, neurons, and other excitable cells, it became apparent that there are two types of calcium current. One is activated by small depolarizations and shows rapid voltage-dependent inactivation; this is termed low voltage-activated (LVA), or T for transient. This type is relatively insensitive to dihydropyridines. The second is activated by large depolarizations and is termed high voltage-activated (HVA). HVA channels are responsible for the contraction of smooth, skeletal, and cardiac muscle and mediate hormone release. The single calcium channels underlying these currents are also clearly distinct: T type channels are of small conductance (5-9 pS in 110 mM Ba2+) and show rapid inactivation during a voltage step, whereas HVA channels are of larger conductance (13-24 pS). HVA currents have been further subdivided; in skeletal and cardiac muscle, the HVA current was termed L for long lasting, and was found to be sensitive to a number of calcium channel antagonist drugs including the 1,4-dihydropyridines (DHPs) such as nifedipine, phenylalkylamines, and benzothiazepines. Furthermore, L type current could be enhanced by another drug in the DHP class, called BayK8644, which has proved very useful as a diagnostic tool for the presence of L type channels. Subsequent studies by Tsien and colleagues in sensory neurons showed the presence not only of L-type currents, but also of a second HVA component of current that was termed N (for neuronal). This was found to have an intermediate single channel conductance (13-18 pS) and was not sensitive to DHPs but was irreversibly inhibited by ω-conotoxin GVIA (ω-CTX GVIA), a peptide toxin from the cone shell mollusc Conus geographus. These channels mediate neurotransmitter release at some synapses and represent a specific target for various neurotransmitters and hormones.
Another subgroup of calcium currents, insensitive to both ω-CTX GVIA and DHPs, has now been reported in many tissues, indicating the presence of further current components. An extreme example is the cerebellar Purkinje cell, where only a small proportion of the calcium current corresponds to N and L current, and the major calcium current in these cells has been termed P type. A selective blocker for the Purkinje cell calcium current has been found in a peptide toxin from the venom of the American funnel web spider Agelenopsis aperta, called ω-Agatoxin IVA (ω-Aga IVA). At higher concentrations it also blocks a current component that has been termed Q (the letter after P), although the distinction between P and Q current is not always clear. In many neurons, despite the application of all three blockers, there often remains a substantial proportion that cannot be classified as L, N or P/Q; this residual current has been termed R (for resistant). Thus in native neurons and other cell types, biophysical properties and selective drugs and toxins allow the identification of 5 distinct current components: T, L, N, P/Q and R. The more recent challenge has been to marry these components with the recently cloned VDCC classes.
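The pharmacological scheme for separating the five current components can be written as a simple decision rule. This is a deliberately simplified sketch: the function and argument names are hypothetical, and real recordings call for graded sensitivities and concentration ranges rather than yes/no flags.

```python
def classify_calcium_current(low_voltage_activated, blocked_by_dhp,
                             blocked_by_ctx_gvia, blocked_by_aga_iva):
    """Assign a current component to T, L, N, P/Q or R from simplified criteria."""
    if low_voltage_activated:
        return "T"      # activated by small depolarizations, transient
    if blocked_by_dhp:
        return "L"      # HVA, sensitive to 1,4-dihydropyridines
    if blocked_by_ctx_gvia:
        return "N"      # HVA, inhibited by omega-conotoxin GVIA
    if blocked_by_aga_iva:
        return "P/Q"    # HVA, blocked by omega-agatoxin IVA
    return "R"          # HVA current resistant to all three blockers

residual = classify_calcium_current(False, False, False, False)
print(residual)
```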
Channels that can be classed as non-selective cation channels encompass a number of receptor channels, including nicotinic and 5-HT3 receptors and the glutamate receptor subclasses termed NMDA and AMPA receptors. These receptors are all thought to form pentamers of subunits, each of which has 4 transmembrane segments and an extracellular N and C terminal (Fig. 6). While this structure is well accepted for the nicotinic acetylcholine receptor, it is not proven for all members of the group. For example, the ATP (P2X) receptor channels appear to have two transmembrane segments and a P region. Of interest, certain subtypes of receptor channels are more Ca2+-permeable than others; their temporal or tissue-specific expression may play a role in a number of functional switches, for example during development. Other non-selective cation channels include trp channels, which have the putative topology of 6 transmembrane α helices and a P loop between S5 and S6.
The inositol trisphosphate (IP3) and ryanodine receptors are present in the membranes of the endoplasmic and sarcoplasmic reticulum, and are involved in the release of Ca2+ into the cytoplasm from these intracellular stores. They have a very similar structure, each consisting of 4 subunits, with an estimated 12 transmembrane segments at the C terminal end and a very large cytoplasmic N terminal domain. This forms a vestibule for drug binding and allosteric effects associated with Ca2+-dependent Ca2+ release.
The trigger that opens the IP3 receptor channel is IP3, generated by the activation of receptors that stimulate phospholipase C (PLC). The IP3 receptor is thus an integral part of a number of pathways involving G protein coupled receptors, linked to the Gq/11 subclass of GTP binding protein and stimulating PLCβ, or growth factor receptors coupled to PLCγ. Their effect is to increase cytoplasmic Ca2+ from intracellular stores via elevation of IP3, rather than by direct entry across the plasma membrane. At least three IP3 receptor isoforms are known.
The skeletal muscle ryanodine receptor (RyR1) is one of the largest cloned proteins. Each monomer has over 5000 amino acids; thus, the tetrameric channel has a molecular weight of over 2 million. There are also two other ryanodine receptor isoforms (RyR2 and 3) in cardiac muscle, brain, and other tissues. In skeletal muscle, ryanodine receptors are activated by direct mechanical coupling to skeletal muscle L type Ca2+ channels brought about by juxtaposition of the T tubules and the sarcoplasmic reticulum; this causes Ca2+ release from the sarcoplasmic reticulum without prior Ca2+ entry through the L type channels.
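The quoted molecular weight follows from simple arithmetic. The sketch below assumes the common rule-of-thumb average of about 110 Da per amino acid residue, which is not a figure from this text, and uses 5000 residues as the lower bound given above.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0   # assumed rule-of-thumb average mass per residue
residues_per_monomer = 5000       # lower bound from the text ("over 5000 amino acids")
subunits = 4                      # the channel is a tetramer

tetramer_mass_da = residues_per_monomer * AVERAGE_RESIDUE_MASS_DA * subunits
print(f"{tetramer_mass_da / 1e6:.1f} MDa")
```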
5. The core approach and computational methods for IC-targeted library design
In the present study we have effectively applied several advanced methods for in silico evaluation of the specific activity of compounds against particular voltage-gated ion channels (calcium, potassium, and sodium channels).
In the new millennium, pharmaceutical drug discovery is undergoing tremendous changes due to progress in genome research, the massive advent of combinatorial synthesis, and high-throughput biological screening. Although these important modern technologies now provide incredible opportunities to pharmaceutical researchers, there are some serious problems associated with the effect of combinatorial explosion. The costs of high-throughput screening or parallel synthesis per sample may be very low, but they become considerable when multiplied by millions of compounds. Moreover, several papers report that the large number of compounds synthesized and screened did not result in an increase in viable drug candidates; therefore, there is a vital need for special technologies that make combinatorial synthesis and library design cost-effective.
The main objective of rational library design is the selection of synthetic candidates that possess desirable properties. The cornerstones of this process are depicted in Fig. 7. Initially, efforts were focused on maximizing diversity, sometimes with the introduction of biased pharmacophoric structural motifs. A medicinal chemistry component has subsequently been introduced, resulting in drug-like and lead-like libraries that reflect the need for soluble molecules with an optimized in vitro pharmacokinetic profile. Further interest in a concise screening campaign yielded biased libraries that are focused against a single biological target or a family of related targets (ICs, kinases, GPCRs, NHRs and so forth).
Various ligand and target structure-based design strategies can be implemented in focused library design when a set of known active ligands or the 3D structure of the target is available. Additional design elements include cost, synthetic feasibility, and physicochemical, PK/PD and toxicity properties. Knowledge-based approaches take these parameters into account, using the relevant experimental and calculated information to guide rational library design. Moreover, modern computational approaches allow for the simultaneous optimization of several variables, so that a library designer can 1) control the relative significance of the various objectives and 2) intelligently select compounds for synthesis.
Currently, several advanced computational approaches are widely used to rationally select molecules for synthesis and further biological evaluation. Specifically, the following conceptually and algorithmically diverse methods can be effectively applied:
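One simple way to control the relative significance of several objectives is a weighted desirability score. The sketch below is an illustration under stated assumptions, not the scheme used in this study: the compound names, objective names and scores are hypothetical, and each objective is assumed to be pre-scaled to the range 0 to 1.

```python
def desirability(scores, weights):
    """Weighted mean of objective scores, each assumed to be scaled to 0..1."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight

def rank(candidates, weights):
    """Return compound names ordered from most to least desirable."""
    return sorted(candidates, key=lambda name: desirability(candidates[name], weights),
                  reverse=True)

# Hypothetical candidates and objective scores.
candidates = {
    "cpd_A": {"predicted_activity": 0.9, "synthetic_feasibility": 0.4, "adme_tox": 0.7},
    "cpd_B": {"predicted_activity": 0.6, "synthetic_feasibility": 0.9, "adme_tox": 0.6},
}
activity_first = rank(candidates, {"predicted_activity": 2.0,
                                   "synthetic_feasibility": 1.0, "adme_tox": 1.0})
equal_weights = rank(candidates, {"predicted_activity": 1.0,
                                  "synthetic_feasibility": 1.0, "adme_tox": 1.0})
print(activity_first, equal_weights)
```

Doubling the weight on predicted activity puts cpd_A first, whereas equal weights favor cpd_B, which shows how the choice of weights steers the selection.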
- ligand structure-based design;
- target structure-based approaches;
- chemogenomics approaches;
- design based on special data mining algorithms;
- optimization of ADME/Tox properties.
The current study combines several of these approaches within a multi-step design concept.
5.1. Ligand structure-based design
Historically, ligand structure-based design is the most widely used approach to the design of target-directed chemical libraries. Methods that start from hits or leads are among the most diverse, ranging from 2D substructure search and similarity-based techniques to analysis of 3D pharmacophores and molecular interaction fields (Fig. 8).
Specific structural fragments of biologically active molecules can be used as the core elements for generating targeted libraries. The most straightforward approach is a 2D substructure search for analogs of known ligands. Such “privileged” substructures have been applied successfully in the framework of the ligand-based strategy. Target-directed libraries based on privileged substructures can be effectively designed without any prior knowledge of the structure of the endogenous ligand, which in turn means that even orphan receptors can be addressed as potential drug targets. Limitations of this approach include the rather restricted availability of privileged substructures for known target families and related IP issues.
Another group of methods addresses molecular similarity. Similarity methods involve two independent aspects: the representation of molecules and the assessment of their similarity. For example, calculation of 2D molecular fingerprint similarity is a relatively simple yet practical library design principle; it is frequently used to select molecules that have diverse structures but similar activity. Alternatively, individual library compounds are represented by Kier-Hall topological descriptors, and molecular similarities between compounds are evaluated quantitatively by modified pairwise Euclidean distances in multidimensional descriptor space. This method, called Focus-2D, represents a useful approach to the rational design of targeted combinatorial libraries.
Going beyond the analysis of 2D structural representations, virtual libraries can be searched using 3D molecular queries. 3D pharmacophore fingerprints detect the presence of pre-defined pharmacophores in a molecule using a systematic conformational search. Researchers at Tripos have developed topomer-shape similarity searching, an algorithm that identifies similar compounds by comparing steric interactions between a given query and molecules in a virtual library. This patented technology can effectively generate target-specific libraries around the known ligands used as input queries.
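The two similarity measures mentioned above can be sketched in a few lines. This is a minimal illustration, assuming fingerprints are stored as sets of on-bit indices and descriptors as plain numeric vectors; the compound names, bit patterns and threshold are hypothetical, and the plain Euclidean distance shown here is not the modified distance used by Focus-2D.

```python
import math

def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 1.0

def euclidean(desc_a, desc_b):
    """Plain Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(desc_a, desc_b)))

# Hypothetical reference ligand and library fingerprints.
reference_fp = {1, 4, 7, 9, 15}
library_fps = {
    "cpd_1": {1, 4, 7, 9, 20},
    "cpd_2": {2, 3, 15},
    "cpd_3": {1, 4, 7, 9, 15},
}
threshold = 0.6  # assumed similarity cut-off
selected = sorted(name for name, fp in library_fps.items()
                  if tanimoto(reference_fp, fp) >= threshold)
print(selected)
```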
The computational ligand-based strategies are currently progressing to advanced field-fit based methods. In general, such methods remain indispensable in those cases where the structure of binding site of the target protein is unknown.
5.2. Target structure-based approaches
Due to the rapidly increasing availability of structures of target proteins which can be used as templates for virtual screening, combinatorial synthesis and target structure-based design have begun to converge in the process of drug discovery. Many lead generation programs include analysis of X-ray structures of therapeutic biotargets to prioritize compounds for high-throughput screening or to establish a tractable collection for lower throughput assays. A natural trend recognized in the past few years is the application of similar techniques for increasing the likelihood of including active compounds in a focused combinatorial library. There are many examples from the literature in which combinatorial library synthesis successfully complemented structure-based design techniques in drug discovery.
In the past few years, we have witnessed rapid progress in the development of powerful computational technologies that combine elements of structure-based design and combinatorial chemistry. Computational programs developed on the basis of these approaches generally start from a synthetically accessible combinatorial template that is complementary to a target binding site. A database of available building blocks for each point of randomization is then considered. The substituents are selected on the basis of their ability to 1) interact with specific residue(s) in the active site and 2) couple with the template through accessible synthetic reactions compatible with the combinatorial protocol (synthetic feasibility). The generated list of accessible virtual ligands is then computationally screened against the active site and ranked on the basis of the available scoring function. For example, starting with a combinatorial template positioned in the active site of the target protein, the SurflexDock program (Tripos) uses a special scoring function to rank potential substituents at each position on the template. Based on the calculated score, a target-specific library of synthetically accessible molecules is then generated, which may then be prioritized for synthesis and assay.
Alternatively, knowledge of the active site parameters can be used for the generation of pharmacophore hypotheses, which are then applied to library design. The pharmacophores define a design space that can be used to select compounds using an informative library design tool. The method was used in prioritizing molecules biased against a cyclin-dependent kinase target, CDK-2. Researchers at Vernalis developed sets of strategies to address receptor flexibility (CDK-2 and HSP90) in virtual screening experiments using multiple crystallographic structures. Based on their assessment, the combination of a flexible receptor docking algorithm and a robust scoring scheme for hits resulted in a significant improvement of binding affinities.
Customized algorithms, which combine combinatorial library design tools with structure-based design techniques, are viewed by both scientific and business communities as a serious competitive advantage. Despite this fact, there are several key questions about these products:
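The template-based workflow described above (score the substituents at each position, keep the best, enumerate the products, rank them) can be sketched as follows. Everything here is hypothetical: the substituent names, the scores, and the assumption that per-position scores are additive and that lower is better. It is not a description of how any commercial program works.

```python
from itertools import product

# Hypothetical per-position substituent scores (lower = better predicted fit),
# standing in for the output of a docking scoring function.
substituent_scores = {
    "R1": {"methyl": -1.0, "phenyl": -2.5, "benzyl": -2.0},
    "R2": {"amine": -1.5, "amide": -3.0},
}

def top_substituents(scores, keep):
    """Names of the best-scoring substituents at one position."""
    return sorted(scores, key=scores.get)[:keep]

# Keep the two best substituents at every point of randomization.
shortlist = {pos: top_substituents(scores, keep=2)
             for pos, scores in substituent_scores.items()}

# Enumerate all combinations and rank them by the (assumed additive) total score.
library = [
    (combo, sum(substituent_scores[pos][sub] for pos, sub in zip(shortlist, combo)))
    for combo in product(*shortlist.values())
]
library.sort(key=lambda item: item[1])
print(library[0])
```

In practice the enumerated products would be re-docked as whole molecules, since substituent contributions are rarely strictly additive.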
- What are the performance and limitations of the approach?
- Is the method properly validated? Is the user interface convenient?
- Are the programs compatible with other industry-standard chemoinformatics platforms?
Questions such as these should be taken into consideration when implementing these programs for target-directed research. It should also be noted that most of these technologies are still in their infancy, and future practical work will clarify their role in contemporary drug discovery. The practical utility of the target structure-based approach in the design of chemical libraries is still limited by the requirement for high-quality crystallographic data, detailed knowledge of the ligand binding mode, and inherent issues concerning scoring functions. A stepwise procedure of selection and filtering using simpler ligand-based technologies can reduce the virtual databases to a manageable size. Such a pre-screening procedure leaves the high-ranking molecules for further analysis by biostructure-based docking and scoring, and thus provides both activity enrichment and structural novelty.
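The stepwise procedure can be pictured as a two-stage funnel: a cheap ligand-based filter first, then a more expensive score applied only to the survivors. The sketch below uses hypothetical compound records in which higher values are assumed to be better for both fields; the docking score is precomputed here only to keep the example self-contained.

```python
def funnel(compounds, cheap_filter, expensive_score, keep_top):
    """Apply a fast filter first, then rank only the survivors by a costly score."""
    survivors = [c for c in compounds if cheap_filter(c)]
    ranked = sorted(survivors, key=expensive_score, reverse=True)
    return ranked[:keep_top]

# Hypothetical records: "similarity" is a ligand-based measure, "dock_score" a docking result.
compounds = [
    {"id": "c1", "similarity": 0.82, "dock_score": 6.1},
    {"id": "c2", "similarity": 0.35, "dock_score": 9.0},
    {"id": "c3", "similarity": 0.71, "dock_score": 7.4},
    {"id": "c4", "similarity": 0.64, "dock_score": 5.2},
]
hits = funnel(compounds,
              cheap_filter=lambda c: c["similarity"] >= 0.6,
              expensive_score=lambda c: c["dock_score"],
              keep_top=2)
hit_ids = [c["id"] for c in hits]
print(hit_ids)
```

Note that c2 is discarded by the pre-filter despite its high docking score, which is the trade-off such a funnel accepts in exchange for a manageable database size.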
5.3. Library design based on special data mining algorithms
Pharmaceutical lead discovery and optimization have historically followed a sequential process in which relatively small sets of individual compounds are synthesized and tested for bioactivity. The information obtained from such experiments is then used for the selection of further molecules. With the advent of high-throughput synthesis and screening technologies, relatively simple statistical techniques of data analysis have been largely replaced by a massively parallel mode of processing, in which many thousands of molecules are synthesized and tested. As a result, the complete analysis of large sets of diverse molecules and their structure-activity patterns has become an emerging problem. Hence, there is considerable interest in novel computational approaches that may be applied to the extraction and utilization of useful information from such data sets. Alongside 2D and 3D clustering approaches, the leading dimensionality reduction and machine learning techniques include Sammon maps, various neural network (NN) methods such as back-propagation neural networks (BPNNs), feed-forward neural networks (FFNNs) and self-organizing maps (SOMs), support vector machines (SVMs), genetic algorithms (GAs), principal component analysis (PCA), and factor analysis (FA). In the current study we have successfully applied BPNNs, FFNNs, Kohonen-based SOMs and the Sammon mapping technique to the IC-focused library design.
Visual analysis of multivariate data sets has established itself as a powerful means of detecting non-obvious and relevant information for further exploitation, in particular through topology-preserving and distance-preserving mappings. Kohonen-based SOMs and distance-preserving Sammon nonlinear maps (NLMs), for example, are well suited for data visualization and data mining purposes.
The general idea of Kohonen-based SOMs is to map a set of vectorial samples onto a two-dimensional lattice in a way that preserves the topology of the original space. Kohonen maps have been actively used for the analysis and visualization of large datasets originating from screening campaigns. In particular, they appeared to be effective in the analysis of large databases created and hosted by the National Cancer Institute (NCI). Kohonen maps were used by Gasteiger et al. for the analysis and visualization of HTS data; the developed structure-activity model was further utilized to design candidates for new sweeteners. The same group of researchers used SOMs for the analysis of structure-activity relationships for 5,513 compounds from a combinatorial library. Based on the results of these studies, the authors suggest that self-organizing maps can serve not only as an indicator of structure-activity relationships, but also as the basis of a classification system allowing for the predictive modeling of combinatorial libraries.
In contrast to SOMs, NLMs represent the relative distances between all pairs of compounds in descriptor space on a 2D map. The distance between two points on the map directly reflects the similarity of the compounds. NLMs have previously been used for the visualization of protein sequence relationships in two dimensions and for comparisons between large compound collections represented by a set of molecular descriptors. However, for large data sets, NLM computation becomes increasingly intractable. In addition, the approach may generate a 2D mapping that poorly approximates the original distances when the number of compounds is large. Several heuristic variants have been introduced to alleviate the NLM complexity problem and make it useful for mapping large data sets. Usually, a significant speed gain can be achieved by these modified approaches as compared to NLM. At the same time, they provide better distance and topology preservation as compared to Kohonen maps.
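To make the mapping idea concrete, here is a minimal pure-Python sketch of a Kohonen map on a small rectangular lattice. It is a teaching example under assumptions (Gaussian neighborhood, linearly decaying learning rate and radius, toy three-dimensional descriptor vectors), not the software or settings used in this study.

```python
import math
import random

def best_matching_unit(weights, x):
    """Lattice coordinates (row, col) of the node whose weight vector is closest to x."""
    best, best_dist = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best

def train_som(samples, rows=4, cols=4, epochs=200, lr0=0.5, seed=0):
    """Train a small Kohonen map; returns weights[row][col] as descriptor vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)   # neighborhood shrinks over time
        for x in rng.sample(samples, len(samples)): # present samples in random order
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_dist2 / (2.0 * radius ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])  # pull node toward the sample
    return weights

# Toy descriptor vectors forming two well-separated groups.
samples = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.9, 1.0, 1.0], [1.0, 0.9, 0.9]]
som = train_som(samples)
node_a = best_matching_unit(som, [0.0, 0.0, 0.0])
node_b = best_matching_unit(som, [1.0, 1.0, 1.0])
print(node_a, node_b)
```

After training, dissimilar compounds land on different nodes while similar ones share a node or neighboring nodes, which is the property exploited when active and inactive compounds are overlaid on the map.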
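The quantity that a Sammon map minimizes is the stress E = (1/c) Σ (d*ij − dij)² / d*ij, summed over all pairs i < j with c = Σ d*ij, where d*ij is the distance in descriptor space and dij the distance on the 2D map. The sketch below evaluates this stress and reduces it by plain gradient descent. That choice is a simplification made here for brevity (Sammon's original procedure uses a second-order update), and the step size, iteration count and data are assumptions. The all-pairs loops also illustrate why the cost grows quadratically with the number of compounds.

```python
import math
import random

def pairwise(points):
    """Full matrix of Euclidean distances between points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def sammon_stress(d_high, layout):
    """Sammon stress between original distances and distances in the layout."""
    d_low = pairwise(layout)
    n = len(layout)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    scale = sum(d_high[i][j] for i, j in pairs)
    return sum((d_high[i][j] - d_low[i][j]) ** 2 / d_high[i][j] for i, j in pairs) / scale

def sammon_map(points, iterations=500, step=0.3, seed=0):
    """Project distinct points to 2D by gradient descent on the Sammon stress."""
    rng = random.Random(seed)
    n = len(points)
    d_high = pairwise(points)
    scale = sum(d_high[i][j] for i in range(n) for j in range(i + 1, n))
    layout = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(iterations):
        d_low = pairwise(layout)
        grads = []
        for i in range(n):
            g = [0.0, 0.0]
            for j in range(n):
                if i == j:
                    continue
                dl = max(d_low[i][j], 1e-9)  # guard against coincident map points
                coeff = -2.0 * (d_high[i][j] - dl) / (d_high[i][j] * dl * scale)
                g[0] += coeff * (layout[i][0] - layout[j][0])
                g[1] += coeff * (layout[i][1] - layout[j][1])
            grads.append(g)
        layout = [[y[0] - step * g[0], y[1] - step * g[1]] for y, g in zip(layout, grads)]
    return layout

# Four toy compounds in a three-dimensional descriptor space.
descriptors = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
layout = sammon_map(descriptors)
stress = sammon_stress(pairwise(descriptors), layout)
print(f"stress = {stress:.4f}")
```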
The computing tools described above provide interactive, fast, and flexible visualizations of chemical data that support and even enhance human thought processes. However, visualization alone is often inadequate when multiple data points need to be considered. A number of data mining methods, which seek to identify significant relationships in large multidimensional databases, are now being used for library design.
Partitioning methods occasionally struggle to provide the accuracy associated with more powerful, albeit less informative, techniques such as machine learning and statistical approaches. For these reasons, there is a continuing need to apply more accurate and informative classification techniques to quantitative structure-activity relationship (QSAR) analysis. The goal of a classifier is to produce a model that can separate new, untested compounds into classes using a training set of already classified compounds.
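As a deliberately simple illustration of that goal, the sketch below trains a nearest-centroid classifier on labelled descriptor vectors and assigns an untested compound to the closest class. Nearest-centroid classification is used here only because it is short; it is not one of the methods named in this document, and all function names are assumptions.

```python
import math
from collections import defaultdict


def fit_centroids(descriptors, labels):
    """Average the descriptor vectors of each class in the training set."""
    groups = defaultdict(list)
    for x, y in zip(descriptors, labels):
        groups[y].append(x)
    return {y: [sum(col) / len(rows) for col in zip(*rows)]
            for y, rows in groups.items()}


def predict(centroids, x):
    """Assign a compound to the class whose centroid is nearest in descriptor space."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def accuracy(centroids, descriptors, labels):
    """Fraction of compounds whose predicted class matches the known class."""
    hits = sum(predict(centroids, x) == y for x, y in zip(descriptors, labels))
    return hits / len(labels)
```

In practice the accuracy function would be applied to compounds held out from training, so that the reported figure reflects performance on genuinely new structures.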
It is important that QSAR methods are quick, give unambiguous models, do not rely on subjective decisions about the functional relationships between structure and activity, and are easy to validate. In the past 10-15 years, methods based on artificial neural networks (ANNs) have been shown to overcome some of these problems. For example, they can handle both the linear and nonlinear SARs observed in practice. There are reports describing the successful application of neural network algorithms to cluster compounds in large datasets with low signal-to-noise values. A recent review of the concepts behind neural networks applied to QSAR analysis points out problems that may be encountered, suggests ways of avoiding the pitfalls, and introduces several new neural network methods developed during the last decade. Besides ANNs, there are a number of other classification methods with high prediction accuracy, including support vector machines (SVMs) and genetic algorithms.
5.4. Prediction and optimization of ADME/Tox properties based on in silico methods
Poor pharmacokinetics and toxicity are important causes of costly late-stage failures in drug development. It is generally recognized that, in addition to optimized potency and specificity, chemical libraries should also possess favorable ADME/Tox and drug-like properties. Assessment of drug-like character is an attempt to decipher molecular features that are likely to lead to a successful in vivo and, ultimately, clinical candidate. Many of these properties can be predicted before molecules are synthesized, purchased, or even tested, which helps improve overall lead quality.
Considerable research effort has focused on novel machine learning algorithms that predict ADME/Tox properties of new chemical entities. Computer-aided techniques now permit enhancement of the latter strategy with an additional set of in silico filters (Fig. 9).
These calculations can be performed on very large numbers of molecules and act as a form of multidimensional selection filter. For example, comparative molecular field analysis (CoMFA) and pharmacophore approaches have been used to model the binding modes of metabolizing cytochrome P450 (CYP) enzymes, transporters such as P-glycoprotein, nuclear hormone receptors, and ion channels, all of which are important for drug-drug interactions. Recursive partitioning methods have been used extensively with these large sets of molecules and either continuous or binary data. Kohonen self-organizing and Sammon maps have been successfully applied to model various ADME/Tox properties, including cytochrome P450-mediated drug metabolism, blood-brain barrier (BBB) permeability, human intestinal absorption (HIA), plasma protein binding affinity (PPB), volume of distribution (Vd), plasma half-life (T1/2), and specific cell toxicity. Many of the ADME/Tox models reported to date are rule-based. For example, some research groups have used relatively simple filters such as the rule of 5 and others to limit the types of molecules evaluated with in silico methods and to focus libraries for high-throughput screening. However, because they were designed as rapid “computational alert” tools aimed at a single property of interest, such filters cannot offer a comprehensive picture of a compound's ADME profile.
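A rule-based filter of this kind is easy to express in code. The sketch below counts violations of the rule of 5 using its commonly quoted thresholds (molecular weight over 500, calculated logP over 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The dictionary keys, the function names, and the default tolerance of one violation are assumptions; descriptor values are taken as already computed by whatever cheminformatics toolkit is in use.

```python
def rule_of_five_violations(mol):
    """Count rule-of-5 violations from precomputed descriptors (hypothetical keys)."""
    checks = [
        mol["mol_weight"] > 500,   # molecular weight above 500
        mol["logp"] > 5,           # calculated logP above 5
        mol["h_donors"] > 5,       # more than 5 hydrogen-bond donors
        mol["h_acceptors"] > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


def passes_rule_of_five(mol, max_violations=1):
    """Keep a compound if it breaks no more than max_violations criteria (assumed default)."""
    return rule_of_five_violations(mol) <= max_violations


def filter_library(library, max_violations=1):
    """Return only the compounds that pass the filter."""
    return [mol for mol in library if passes_rule_of_five(mol, max_violations)]
```

Because each check looks at one descriptor in isolation, the filter is fast enough for very large libraries, but, as noted above, it says nothing about how the properties interact.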
Multivariate data mining techniques can serve as the basis for advanced ADME filters. Accordingly, we have developed a method for the early evaluation of several important pharmacokinetic parameters, including Vd and T1/2. These two parameters determine the dose regimen of a drug, so early prediction of both would be of great benefit. It was demonstrated that such complex properties can be modeled effectively using non-linear mapping algorithms based on a pre-selected set of electronic, topological, spatial, and structural descriptors. The generated models demonstrated good predictive power in internal and external validation experiments, with up to 80-90% of compounds classified accurately. Models of this accuracy can be used as a guide in modifying and optimizing these pharmacokinetic properties in chemical libraries.
A collection of algorithms for predicting a number of ADME/Tox-related properties is now integrated in our SmartMining/ADMET software suite, available from ChemDiv. To date, these algorithms have been validated on human intestinal absorption, blood-brain barrier permeability, plasma half-life, volume of distribution, plasma protein binding, CYP450 substrate/non-substrate potential, and binding affinity models. They were further extended to the evaluation of important physicochemical properties, such as DMSO solubility, and of target-specific activity. While other software tools for ADME modeling are available, the SmartMining-based collection of predictive classification tools is both extensive and well validated in multiple library design projects. These methods are particularly suited to the rapid evaluation of both large and medium-sized compound libraries in connection with early ADME/Tox profiling.
6. Concept and applications
IC-targeted library design at CDL involves:
• A combined profiling methodology that provides a consensus score and decision based on various advanced computational tools:
- Unique bioisosteric morphing and funneling procedures for designing novel potential IC ligands with high IP value, together with 2D/3D-structure similarity and compound diversity analysis. We apply CDL’s proprietary Chemosoft™ software and commercially available solutions from Accelrys, MOE, Daylight and other platforms.
- Neural network tools for target-library profiling, in particular self-organizing Kohonen maps, implemented in SmartMining software. We have also used Sammon mapping and support vector machine (SVM) methodology as more accurate computational tools to create our IC-focused library.
- 3D-pharmacophore modeling/searching as well as 3D molecular docking studies for the individual classes of IC agents.
- “Rapid Elimination of Swill” (REOS) filters. Computational in silico ADME/Tox assessment of novel compounds includes prediction of human CYP450-mediated metabolism and toxicity as well as many pharmacokinetic parameters, such as blood-brain barrier (BBB) permeability, human intestinal absorption (HIA), plasma protein binding (PPB), plasma half-life (T1/2), volume of distribution in human plasma (Vd), etc.
The fundamentals of these applications are described in a series of our recent articles on the design of exploratory small-molecule chemistry for bioscreening [for related data, visit the ChemDiv, Inc. online source: www.chemdiv.com]. Our multi-step in silico approach to IC-focused library design is schematically illustrated in Fig. 10.
This common approach was effectively applied in developing our IC-focused library, in particular for calcium, potassium and sodium channels.
• Synthesis, biological evaluation and SAR study for the selected structures:
- High-throughput synthesis with multiple parallel library validation. Synthetic protocols, building blocks and chemical strategies are available.
- Library activity validation via bioscreening; the resulting SAR is incorporated into the next library generation.
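The combined profiling step in the first bullet above yields a consensus score and a decision. The document does not state how the individual tool outputs are combined, so the sketch below makes its own assumptions: each tool returns a score between 0 and 1, the consensus is their unweighted mean, hard filters such as REOS act as a gate, and the threshold is 0.5. All names are hypothetical.

```python
def consensus_decision(scores, passes_filters, threshold=0.5):
    """Combine per-tool scores (each in [0, 1]) into one score and a keep/reject flag.

    scores         -- mapping of tool name to score, e.g. {"som": 0.8, "svm": 0.6}
    passes_filters -- True if the compound survived the hard filters (assumed gate)
    threshold      -- minimum mean score needed to keep the compound (assumed value)
    """
    if not scores:
        raise ValueError("at least one tool score is required")
    consensus = sum(scores.values()) / len(scores)
    keep = passes_filters and consensus >= threshold
    return consensus, keep
```

A weighted mean or a voting scheme would fit the same interface; the unweighted mean is simply the smallest choice that shows how several tools can feed one decision.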
6.1. Bioisosteric transformations, 2D-structure similarity/diversity and topological pharmacophore
The methods named in the heading are crucial in drug design and development. For example, bioisosteric morphing exploits compounds or substructures that share similar shapes, volumes, electronic distributions, and physicochemical properties and therefore have similar biological activity. The bioisosteric approach is useful for morphing marginal chemotypes; several key bioisosteric transformations, topological similarities, and pharmacophores within IC agents are illustrated in Fig. 11(a,b).
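2D-structure similarity between a parent compound and a morphed analogue is commonly quantified with the Tanimoto coefficient on structural fingerprints. The document does not name the similarity measure it uses, so this choice is an assumption, as are the function name and the representation of a fingerprint as a set of "on" bit positions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    # Shared bits divided by the bits set in either fingerprint.
    return shared / (len(fp_a) + len(fp_b) - shared)
```

Values close to 1 indicate nearly identical 2D substructure content, while a bioisosteric replacement that changes the scaffold will typically lower the score even when activity is retained.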