Molecular structure and dynamics underlie all biological processes. Most research in biomedical fields is directly or indirectly associated with molecular behavior. In biochemistry, experimental approaches directly target these molecular properties, albeit within a specific “window” of observable length and time scales. Computational approaches can offer much wider windows, but present only hypotheses about the system rather than “real” observables. Integrated computational and experimental approaches, therefore, present unparalleled strategies for exploring biological systems, a philosophy that has guided our laboratory.

Perhaps the most outstanding challenge in computational biology is sampling of molecular conformational states. Most research in our laboratory has been dedicated to solving this critical problem. We have pursued this goal by developing rapid discrete molecular dynamics simulations and a molecular docking algorithm of unique accuracy, which we have already used to find novel lead compounds to treat cystic fibrosis (CF) and to reduce chronic and acute pain (in both cases these compounds are being pursued by biotech companies). We are currently working to understand other human diseases and to develop novel pharmaceutical strategies to combat them, searching for compounds using a combination of virtual drug screening and experiment. Application of these tools also allows rational design of proteins that can control other proteins in living cells and organisms. These tools are becoming invaluable resources that enable decoupling of cellular networks, as well as understanding the impact of a specific protein on the phenotype of an organism. Below, we describe ongoing projects in the lab.

METHODOLOGIES

Molecular Modeling and Simulations

Many biological processes occur at time scales not accessible to traditional simulation approaches. To reach biologically-relevant time scales, simplified (coarse-grained) models of macromolecules have been developed, in which groups of atoms are replaced by effective atoms (beads), thereby reducing the number of degrees of freedom. We have developed one of the first simplified protein models for a rapid simulation approach known as Discrete Molecular Dynamics (DMD). Unlike molecular mechanics simulations driven by physical forces, DMD simulations are driven by collision events (between atoms/beads). The DMD algorithm gains efficiency over traditional molecular dynamics (MD) because of two characteristics: on average a larger time step and a faster searching/updating algorithm. Therefore, DMD permits rapid and accurate sampling of the conformational space of proteins and protein complexes and is known for its rapid performance in simulating polymer fluids, single homopolymers, proteins, protein aggregates, and gases and liquids.
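
The event-driven idea can be illustrated with a minimal sketch. It assumes the simplest possible interaction, hard spheres of equal diameter, and a brute-force search over all pairs; the function names are hypothetical, and this is neither the lab's DMD engine nor the fast searching/updating algorithm mentioned above. Instead of integrating forces over small fixed time steps, the code solves analytically for the time at which two beads first touch and jumps directly to the earliest such event.

```python
import math

def collision_time(r1, v1, r2, v2, diameter):
    """Time until two hard spheres first touch, or None if they never do.

    Solves |dr + dv * t| = diameter for the smallest positive t, where
    dr and dv are the relative position and velocity of the two beads.
    """
    dr = [b - a for a, b in zip(r1, r2)]
    dv = [b - a for a, b in zip(v1, v2)]
    b = sum(x * y for x, y in zip(dr, dv))
    if b >= 0:
        return None  # beads are not approaching each other
    dv2 = sum(x * x for x in dv)
    dr2 = sum(x * x for x in dr)
    disc = b * b - dv2 * (dr2 - diameter ** 2)
    if disc < 0:
        return None  # beads pass each other without touching
    return (-b - math.sqrt(disc)) / dv2

def next_event(positions, velocities, diameter):
    """Earliest collision among all pairs as (time, i, j), or None.

    Illustrative O(N^2) search; production codes use far faster schemes.
    """
    best = None
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            t = collision_time(positions[i], velocities[i],
                               positions[j], velocities[j], diameter)
            if t is not None and (best is None or t < best[0]):
                best = (t, i, j)
    return best

# Two beads approaching head-on along x; a third bead is a bystander.
positions = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
velocities = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(next_event(positions, velocities, diameter=1.0))
```

Because the system advances from event to event, the effective time step adapts to how often collisions occur, which is one source of the efficiency described above.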

We have developed a multiscale simulation engine that integrates a variety of simplified models to span time scales ranging between sub-nanoseconds (side-chain motions) and hours (protein aggregation). For reaching sub-nanosecond time scales, we coupled DMD to quantum mechanical simulations. Such coupling allows studies of complex phenomena like the influence of protein dynamics on its enzymatic activity. Key steps to integration are methodologies for transferring between coarse-grained and all-atom models, which are often highly non-trivial since a unique solution to a specific transformation may not exist.
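
The non-uniqueness of the reverse transformation can be seen in a small sketch. It assumes one common mapping choice, placing each bead at the center of mass of its atom group; the text does not state which mapping the lab's models use, and all names here are hypothetical. Two different all-atom arrangements collapse onto the same bead, so going from beads back to atoms needs additional information.

```python
def center_of_mass(coords, masses):
    """Mass-weighted average position of a group of atoms."""
    total = sum(masses)
    return tuple(
        sum(m * c[k] for c, m in zip(coords, masses)) / total
        for k in range(3)
    )

def coarse_grain(atoms, groups):
    """Map atoms to beads.

    atoms:  list of (mass, (x, y, z)) tuples
    groups: list of lists of atom indices, one list per bead
    """
    beads = []
    for indices in groups:
        coords = [atoms[i][1] for i in indices]
        masses = [atoms[i][0] for i in indices]
        beads.append(center_of_mass(coords, masses))
    return beads

# Two distinct all-atom arrangements of the same two-atom group...
along_x = [(1.0, (0.0, 0.0, 0.0)), (1.0, (2.0, 0.0, 0.0))]
along_y = [(1.0, (1.0, -1.0, 0.0)), (1.0, (1.0, 1.0, 0.0))]

# ...map onto the same bead position.
print(coarse_grain(along_x, [[0, 1]]))
print(coarse_grain(along_y, [[0, 1]]))
```

The forward mapping is many-to-one, which is why reconstructing an all-atom model from a coarse-grained one generally requires extra restraints or energy-based refinement.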

Figure 1. Ab initio folding using DMD simulations.

To reach biologically-relevant time scales not reachable by traditional simulation approaches, we developed molecular models for the rapid simulation approach, Discrete Molecular Dynamics (DMD). These models differ in the time scales they make accessible. Despite their simplicity, these models have striking predictive power, as shown by our recent studies. We built a web-based platform, iFold (http://ifold.dokhlab.org), for performing DMD protein simulations. iFold is the first molecular dynamics engine available via a web interface. To perform protein design and structure prediction, we developed a protein modeling suite, MEDUSA, that can account for conformational rearrangements during design. Using MEDUSA, we developed a methodology, Eris (eris.dokhlab.org), to efficiently and accurately compute the stability changes of proteins upon mutations.

Drug Discovery

Uncovering structures of molecular complexes via computational docking is at the heart of virtual drug screening and many structural modeling efforts. Modeling both receptor and ligand flexibility is important in order to capture conformational changes induced by ligand binding, but is a major challenge in computational drug discovery. This challenge stems from the vast conformational space sampled by both ligand and receptor and by their relative positioning: upon ligand binding, the system loses translational entropy and a significant number of configurations (configurational entropy) of the ligand, which depends strongly on its number of degrees of freedom (e.g. the number of rotatable bonds). The complexity of configurational entropy calculation requires an approach that incorporates both receptor and ligand flexibility in a computationally efficient manner.

Figure 2. Small ligand and peptide docking.

Small molecule docking. Existing flexible docking approaches model ligand and receptor flexibilities either separately or in a loosely-coupled manner, which captures the conformational changes inefficiently. The principal limitation is that no current docking algorithm directly models the “induced fit” phenomenon, whereby both ligand and receptor undergo synergistic conformational changes. We have developed a truly flexible docking approach, MedusaDock, which models both ligand and receptor flexibilities simultaneously and synergistically (Figure 2). It is distinct from ensemble-docking algorithms, which perform rigid docking of ensembles of separately generated ligand and target conformations. MedusaDock benchmarks demonstrate high sampling efficiency and high prediction accuracy in both self-docking (to the co-crystallized state) and cross-docking (to a state crystallized with a different ligand), the latter of which mimics the virtual screening procedure in computational drug discovery. We also demonstrated remarkable performance in the recent Community Structure-Activity Resource (CSAR) docking competition, showing in a blind test the ability to predict the binding site and pose (root-mean-square distance < 2.5 Å) of drugs in 80% of test cases.
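
For readers unfamiliar with the pose criterion, the following sketch shows how a root-mean-square distance and a success rate at the 2.5 Å cutoff could be computed. It is an illustration under stated assumptions, not the CSAR evaluation procedure: atoms are assumed to be matched one-to-one in the same order, both poses are assumed to be in the receptor frame (no superposition), ligand symmetry is ignored, and the function names and example numbers are hypothetical.

```python
import math

def pose_rmsd(predicted, reference):
    """Root-mean-square distance between matched ligand atoms (in Å).

    predicted, reference: equal-length lists of (x, y, z) coordinates,
    given in the same receptor frame and the same atom order.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must be non-empty and of equal length")
    squared = sum(
        (p - q) ** 2
        for a, b in zip(predicted, reference)
        for p, q in zip(a, b)
    )
    return math.sqrt(squared / len(predicted))

def success_rate(rmsds, cutoff=2.5):
    """Fraction of test cases whose pose RMSD is below the cutoff."""
    return sum(r < cutoff for r in rmsds) / len(rmsds)

# A predicted pose shifted rigidly by 3 Å along z relative to the reference.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(pose_rmsd(predicted, reference))

# Hypothetical RMSD values for five test cases (made-up numbers).
print(success_rate([1.0, 2.0, 3.0, 0.5, 2.4]))
```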

The high predictive power of MedusaDock comes from several innovations, including generation of a stochastic rotamer library of ligands, an efficient docking protocol, and a novel ligand pose-ranking method. We expect broad adoption of these novel methodologies and application of MedusaDock in ligand-receptor interaction predictions and drug discovery. Our laboratory has successfully used MedusaDock for virtual drug screening against two targets: cystic fibrosis transmembrane conductance regulator (CFTR) and μ-opioid receptor. In both cases, we identified and experimentally validated leads that are now under further optimization in a commercial setting. Currently, we are engaged in several virtual screening projects using MedusaDock.

Peptide docking. Understanding molecular mechanisms of protein-peptide recognition is vital for predicting and designing novel protein-peptide interactions, and is applicable to systems in biology, medicine, and pharmaceutical sciences. Modeling protein-peptide associations is challenging due to the many peptide degrees of freedom, ill-defined boundaries of the peptide-binding site, and “induced fit” conformational changes of protein and peptide upon binding. Prediction of binding sites and poses of peptides has been one of the most outstanding problems in computational structural biology. Until recently, no single method simultaneously predicted the binding pocket on the protein surface and the experimental pose of the peptide using only the unbound conformation of the protein and the sequence of the peptide. We recently proposed a DMD-based approach that allows direct observation of atomistic details of protein-peptide interactions. Using our new method, we revealed the critical contribution of electrostatics and conformational flexibility in peptide recognition.

We are now extending our peptide docking methodology to develop peptide-based sensors, such as a recently designed dye-conjugated peptide that senses cysteine-rich intestinal protein (CRIP1), which is overexpressed in 90% of breast cancers. Among the other targets we pursue is the ricin A-chain, a potential bioterrorism threat, for which we seek competitive inhibitors.

Protein Design of Tools for Sensing and Controlling Proteins

Mutagenesis is a valuable exploratory tool in molecular biology and biotechnology. The extent to which mutations affect the thermodynamic stability and structure of proteins is often vital to experimental design, yet estimating protein stability remains challenging in computational molecular biology. We developed a novel methodology, Eris, for accurately predicting mutation-induced protein stability changes. The distinguishing innovation that allows Eris to outperform most computational techniques is a physical force-field combined with a fast conformation-sampling algorithm. Since our introduction of the web platform (eris.dokhlab.org) to the community, we have had >1,400 subscribers performing >20,000 jobs. Eris, combined with a search algorithm for a sequence with targeted stability, enables protein design, including the design of mutations that modulate protein stability and of peptides that bind to a protein. Perhaps the most challenging design we have performed was of a novel unstructured protein, uniRapR, that transitions to a distinct structure upon binding of rapamycin. We delineated the allosteric pathway to the active site of kinases, and used uniRapR to control kinases with rapamycin. Without rapamycin, uniRapR allosterically destabilizes the active site, inactivating the kinase (Figure 3). Upon addition of rapamycin, uniRapR gains structure and stabilizes the active site, activating the kinase. We demonstrated this methodology in living cells and in zebrafish, and showed its transferability to various kinases. This successful design was possible because we combined DMD simulations with protein design.

Figure 3. Control of Src activity with uniRapR domain.
Figure 4. Scheme for allosteric modulation.

The ability to design proteins whose activities can be controlled with specificity and precise timing has broad implications for the study of protein, cell, and tissue dynamics. Our objective is to develop a widely applicable computational methodology to characterize the allosteric network in a protein and to identify the allosteric regulation hot spots (Figure 4). These hot spots can potentially be used in both rational drug design and protein engineering. The applicability of the developed methodology is underscored by the large number of proteins with three-dimensional structures known in one or more functional states. In our laboratory, we will focus on utilizing protein allostery to engineer regulatable proteins using both drug-activatable (RapR) and photo-activatable molecular switches. Allosteric regulation by light allows both spatial and temporal control of protein activity, and will be tremendously valuable for studies of cellular dynamics. The advantage of the drug-activatable molecular switches is their ability to be used in live animal tissues not accessible to light sources. Although drug-mediated activation allows only temporal control of protein activity, a “caged” RapR approach recently developed in collaboration with Drs. Alex Deiters and Klaus Hahn can add spatial control. This approach relies on a photoactivated rapamycin that is uniquely responsive to the iFKBP we use for Src regulation. Upon successful development of the methodology, we will have the means to computationally predict allosteric regulation hot spots in proteins, and a set of molecular imaging tools that benefit several fields of research, including cancer, developmental biology, and pharmacology.

RNA Structure and Dynamics
Figure 5. Ab initio RNA structure prediction.

RNA molecules are known for their central roles in gene expression, splicing, and translation. However, only ~2% of transcribed RNA is protein-coding in humans, with the remainder fulfilling crucial roles in many cellular processes. RNA function is often achieved via specific ensembles of 3D conformations. Hence, knowledge of the underlying 3D structure is often a fundamental prerequisite to completely understanding RNA function. RNA 3D structure determination remains a paramount challenge in structural biology, with no universal method for solving these structures. Only a couple hundred structures of small RNA molecules are currently solved, compared to >80,000 protein structures. High-resolution methods such as X-ray crystallography and NMR spectroscopy offer unparalleled atomic-level insight into RNA structure. However, many RNAs are not amenable to structural characterization by these methods because of conformational flexibility or large size. Recently, novel computational methods to determine RNA 3D structure have emerged, but because of sampling problems they are severely limited in size, typically to short fragments of 10-15 nucleotides. We have pioneered ab initio DMD simulations for determination of RNA structures of up to 50 nucleotides. Our iFoldRNA webserver (ifoldrna.dokhlab.org) has been used widely since its introduction (~2,000 subscribers worldwide, >20,000 jobs computed).

Perhaps the most daunting problem in 3D structure prediction is RNA flexibility, which requires significant sampling for accurate structure determination. While the DMD algorithm offers a significant advantage in sampling conformational space, we are still limited by computational power and molecular size. One approach to limit conformational space is to use experimental data to bias conformational ensembles toward those observed experimentally. Constraints reduce the entropic contribution to the free energy of the molecular unfolded state, thereby stabilizing the native state(s). Hence, with a sufficient number of experimental constraints, we can in principle obtain accurate RNA structural models. In fact, NMR structure determination is based on these same principles: NOE constraints are determined using NMR, and these constraints are “satisfied” using computational algorithms (e.g. CYANA). Use of DMD with a force-field developed to fold RNA molecules permits the use of significantly fewer and much more diverse types of constraints. We have made a significant effort to utilize various types of experimental constraints to determine 3D structures of large RNA (Figure 3). Our goal is to develop an automated, robust computational protocol for 3D structure determination using various types of experimental constraints and their combinations, and to make this protocol widely available via iFoldRNA.
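
The biasing idea can be sketched as a penalty term that favors conformations satisfying experimentally derived distance constraints. This is an illustration only: the step-like penalty, the constraint format (pairs of bead indices with a maximum distance), and all names are assumptions made here, since the text does not specify how constraints enter the force-field.

```python
import math

def restraint_energy(coords, restraints, penalty=1.0):
    """Step-like penalty: add `penalty` for every violated restraint.

    coords:     list of (x, y, z) bead positions
    restraints: list of (i, j, max_distance) tuples
    """
    energy = 0.0
    for i, j, max_distance in restraints:
        if math.dist(coords[i], coords[j]) > max_distance:
            energy += penalty
    return energy

def rank_by_restraints(ensemble, restraints):
    """Order conformations from fewest to most restraint violations."""
    return sorted(ensemble, key=lambda c: restraint_energy(c, restraints))

# Hypothetical three-bead conformations and two distance restraints.
compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
restraints = [(0, 2, 3.0), (0, 1, 2.0)]

print(restraint_energy(compact, restraints))
print(restraint_energy(extended, restraints))
```

Each added restraint removes conformations from the accessible ensemble, which is the sense in which constraints reduce the entropy of the unfolded state.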

We plan to use the iFoldRNA platform to generate experimentally-testable hypotheses of RNA structure based on sequence and a diverse range of experimental constraints, including NMR and FRET. We have integrated each of these techniques with DMD to build structures of a fluoride riboswitch and free and bound Tetrahymena telomerase RNA (manuscripts in preparation). We plan to extend iFoldRNA so that the user will be able to input any type of constraints and the sequence of an RNA molecule, and obtain its 3D structure.

DISEASE ETIOLOGY

Protein misfolding is increasingly implicated in many diverse human diseases, including amyotrophic lateral sclerosis (ALS), cystic fibrosis (CF), anemia, phenylketonuria, hyperammonemia, Lesch-Nyhan syndrome, Fabry disease, and others. Protein misfolding results either in loss of function (e.g. in CF) or in toxic gain of function (e.g. in ALS). The fundamental significance of protein misfolding is difficult to overestimate, yet little is known at the atomic level about the molecular mechanisms that result in protein misfolding. Understanding protein misfolding and aggregation will greatly impact modern medicine and biology, and will allow development of novel pharmaceutical strategies.

We use a multidisciplinary approach that uniquely combines biophysics, biochemistry, and structural and computational biology to uncover mechanisms of protein misfolding and aggregation. We then use our knowledge of these mechanisms to uncover etiologies of human disease and develop novel therapeutic strategies. We study a broad range of diseases associated with protein misfolding, but focus on two very distinct ones: ALS and CF, described below.

Amyotrophic Lateral Sclerosis
Figure 6. Minimal pathway for SOD1 aggregation.

The emerging view of processes leading to neuronal death in ALS is that transient oligomeric states of disease-relevant proteins that appear during aggregation are cytotoxic. Specifically, misfolded non-native SOD1, mutations of which are associated with ALS, is found before onset of symptoms in mouse models of the disease and participates in numerous aberrant interactions with cellular components. These aberrant interactions are believed to cause cellular pathologies associated with ALS, such as loss of mitochondrial membrane potential and errors in misfolded protein clearance, supporting a primary role for soluble misfolded SOD1 in ALS pathogenesis. Soluble non-native SOD1 is found in sporadic (non-heritable) ALS patients, suggesting that SOD1 misfolding and aggregation are noxious factors, even in the absence of disease-associated mutations. Blocking surface patches that facilitate formation or deleterious interactions of soluble SOD1 oligomers holds therapeutic potential, but requires detailed structural and mechanistic insight that is not yet available. Our ultimate goal is to determine causes, mechanisms, and consequences of SOD1 misfolding and aggregation in ALS, particularly in the initial stages, while SOD1 is still soluble and therapeutic intervention is more tractable. We have identified a metastable oligomeric state common to both wild type and mutant SOD1. We also found that a physiological oxidative post-translational modification triggers dissociation of wild type and mutant SOD1 dimers, the first step in SOD1 aggregation (Figure 6). We hypothesize that both mutations and oxidative modifications induce structural changes in SOD1 that promote oligomerization and expose surfaces that participate in non-native cellular interactions. To test this hypothesis, we are currently characterizing structural changes that occur as SOD1 transitions from its native state into soluble, potentially toxic misfolded species. 
Since the stability of SOD1 dimers plays the most important role in SOD1 aggregation, we plan to perform drug screening to identify compounds that stabilize SOD1, for which we are currently developing cellular assays using the neuron cell line NSC-34.

Cystic Fibrosis
Figure 7. Theoretical model of CFTR structure.

Misfolding of CFTR due to deletion of Phe508 (90% of CF cases) results in loss of mature functional CFTR and manifests in disruption of the ion and fluid homeostasis essential for human lung function. Although great progress has been made in understanding the molecular basis of CF, the nature of the defect in the CFTR ion channel remains a mystery. One of the most significant challenges hampering our understanding of CFTR is the inability to experimentally determine its structure due to structural disorder. We have computationally built the structure of CFTR and validated many of its elements in >30 cysteine crosslinking experiments. We discovered that the ΔF508 mutation (Figure 7, red) disrupts interactions between the nucleotide-binding domain (NBD1) and the transmembrane region (CL4, blue), thereby destabilizing tertiary structure. To restore critical NBD1:CL4 interactions, we performed virtual screening of the ZINC library of drug-like compounds (1.3×10⁶ compounds) and identified several potential leads. One in particular, designated I16, shows significant rescue of CFTR-ΔF508 (U.S. Patent Application No. 61/636,945, April 23, 2012).

We are further studying allosteric sites for potential drug docking that are expected to have a similar effect to I16. We have previously determined that the regulatory insertion domain (RI) has a pronounced effect on protein stability, and its reengineering fully rescues CFTR. We are planning to perform virtual screening and experimental testing of compounds that stabilize RI.

Our work on ALS and CF represents two main examples of our endeavors. We are also pursuing studies in other areas pertaining to human health. The diverse and unique set of computational and experimental techniques developed in our lab opens up new horizons for understanding and fighting human disease.

PROTEIN-BASED THERAPEUTICS AND NANOMEDICINE

Organophosphate Scavenging

Protein-based prophylactic or immediate post-exposure treatments with organophosphate bioscavengers offer significant promise as medical countermeasures for organophosphate toxicity. We are developing both catalytic and non-catalytic bioscavengers to neutralize organophosphate nerve agents.

HIV Vaccine

Despite enormous scientific efforts, development of an effective vaccine for HIV-1 remains a significant challenge. In collaboration with Dr. Ron Swanstrom, we use protein engineering to graft epitopes from the HIV viral protein gp120 onto bacterial proteins, which will then be used to elicit an immune response to the epitopes in rabbits.
