Summary: Protein-coding messenger RNAs (mRNAs) are produced by extensive processing of gene transcripts (pre-mRNAs), including splicing of introns and 3'-end cleavage and polyadenylation. Pre-mRNA processing occurs co- and post-transcriptionally and is mediated by RNA-binding proteins and specialized particles called snRNPs. Our research focuses on these components because of their critical roles in regulating gene expression and because their dysfunction causes many human diseases.
Keywords: RNA-protein complexes (RNPs) | post-transcriptional gene regulation | splicing | mRNA length regulation (Telescripting) | the SMN complex, an RNP assembly machine | spinal muscular atrophy (SMA) | motor neuron degenerative disease | high-throughput technologies | drug discovery
RNA-binding proteins bind nascent transcripts and profoundly influence each step of the post-transcriptional gene expression pathway, including splicing, 5'- and 3'-end processing, mRNA transport, localization, translation, and stability.
Overview: The transcripts of all genes exist as RNPs and are regulated by an enormous assortment of RNA-binding proteins and often also by small noncoding RNAs. Cotranscriptionally, nascent messenger RNA precursors (pre-mRNAs) associate with the hnRNP proteins, RNA-binding proteins that profoundly influence pre-mRNA processing, including 5' capping, splicing of introns, and 3'-end polyadenylation. The protein composition and arrangement of these RNP complexes are highly dynamic and specific to each transcript.
Many of these hnRNP proteins, and others that associate later as a consequence of the processing reactions that form the mature mRNAs, provide a means to integrate and regulate the steps in the gene expression pathway, including downstream transport, localization, translation, and stability of mRNAs. An additional class of components, called snRNPs, the subunits of the spliceosome, bind pre-mRNAs and play critical roles in their splicing into mRNAs. Each snRNP is composed of a noncoding small nuclear RNA (snRNA) bound by seven RNA-binding proteins (the Sm proteins) organized as a ring around the snRNA (‘Sm core’), and one or more snRNP-specific proteins. Considering this complexity, understanding both how cells form specific RNPs and the consequences of disruption of their exquisitely precise choreography is of fundamental importance in biology.
RNPs were previously believed to form by self-assembly. Indeed, Sm proteins readily assemble Sm cores on RNAs, but they do so promiscuously, which undoubtedly would be deleterious. We described previously the SMN (survival of motor neurons) complex and showed that it is an assembly machine crucial for preventing illicit Sm core assembly, and we determined how it identifies the snRNAs and restricts Sm proteins to associate only with them. Much of our current research focuses on the biochemistry of the SMN complex, which is essential for biogenesis of snRNPs and plays a central role in RNA metabolism, and on SMA (spinal muscular atrophy). SMA, a common motor neuron degenerative disease that affects about 1 in 6,000 births and is a leading hereditary cause of infant mortality, results from reduced SMN levels. As expected for such a fundamental function, SMN is essential for viability of all animal cells.
The SMN Complex, an RNP Assembly Machine
The SMN complex includes SMN, Gemins 2–8, and unrip. The assembly of heptameric Sm cores by the SMN complex on the short 7-nucleotide Sm site of each snRNA (U1, U2, U4, U5, U11, U12, U4atac) is an architectural feat requiring many components and great specificity. Because of the SMN complex's fundamental importance for producing the key elements of the splicing apparatus and its potential to reveal new targets and biomarkers for SMA therapy, we are studying its function, mechanism, regulation, and structure. Furthermore, SMN's function in snRNP assembly is a critical process, akin to chaperone-mediated protein folding, and is rich in opportunities to gain fundamental insights into RNA-protein interactions.
The SMN complex specifically binds Sm proteins and the snRNAs' distinguishing snRNP code, a common structural feature (~50 nucleotides) that contains the Sm site and an adjacent 3' stem loop. However, none of the SMN complex components contains any known RNA-binding motifs. Our studies revealed that Gemin5 is the snRNA-specificity factor of the SMN complex. Gemin5 binds the snRNP code via its WD-repeat domain. This domain, a common scaffold found in numerous proteins of diverse pathways, typically mediates protein-protein interactions and was not previously known to bind RNA independently. The identification of the WD-repeat domain as a new RNA-binding motif potentially increases the repertoire of RNA-binding proteins.
An understanding of physiological processes often depends on quantitative biochemical assays and small-molecule modulators, both of which were lacking for the SMN pathway and for RNPs in general. We developed a sensitive high-throughput screening (HTS) assay to measure snRNP assembly in cell extracts and found that sub-toxic reactive oxygen species (ROS) rapidly induce SMN intermolecular disulfide bridging and concomitantly inactivate the SMN complex both in vitro and in cells, mimicking SMN deficiency. This points to a potential mechanistic convergence of SMA with other neurodegenerative diseases, such as amyotrophic lateral sclerosis (ALS) and Parkinson's disease, which have also been linked to oxidative stress.
To identify modulators of the SMN complex, we used automated immunofluorescence microscopy screening and took advantage of SMN's distinct subcellular localization in the cytoplasm and in discrete nuclear bodies we call Gems. This assay revealed that protein synthesis inhibitors rapidly dissociate the SMN complex into distinct subunits and impair snRNP assembly. To capture the dynamics of the SMN complex in cells and define the points of action of the SMN complex modulators, we used a riboproteomics strategy, combining formaldehyde crosslinking with mass spectrometry and high-throughput sequencing of bound RNAs. Unlike UV crosslinking, developed earlier in our laboratory to identify RNA-binding proteins in cells, the riboproteomics strategy also captures protein-protein interactions within the same complex. With this approach, we captured an snRNA precursor (pre-snRNA)-Gemin5 intermediate and identified precursors of all the snRNAs, which had gone undetected since snRNAs were discovered in the 1960s. We also captured an ROS-stalled complex, containing pre-snRNAs and all the SMN complex components, which is likely the active intermediate poised for snRNP assembly. These findings suggested a stepwise pathway of SMN complex formation and snRNP biogenesis, highlighting Gemin5's role in delivering pre-snRNAs to the SMN subunit as the substrates for snRNP assembly and processing.
U1 snRNP Protects Pre-mRNAs from Premature Termination
Stimulated by the observations of snRNP abundance changes in SMA, we asked if this could be the cause of the splicing abnormalities. To explore this, we used antisense oligonucleotides to interfere systematically with the function of each individual snRNP, and monitored global transcriptome changes by genomic tiling arrays and high-throughput sequencing. Unexpectedly, in addition to inhibiting splicing, U1 snRNP knockdown caused premature termination of pre-mRNAs, typically in the first few introns of the majority of genes. This results from premature cleavage and polyadenylation (PCPA) at cryptic polyadenylation signals present throughout introns. PCPA suppression is a novel U1 snRNP-specific function independent of its known role in splicing, as knockdown of other snRNPs does not cause PCPA. PCPA is thus a default state. Its suppression by U1 is essential for protecting the transcriptome, having major implications for gene regulation. We envisage that pre-mRNAs are under constant threat cotranscriptionally from the cleavage and polyadenylation machinery that normally processes the 3' ends of the transcripts.
Because snRNPs were previously thought of as splicing's "hardware," like ribosomal subunits for translation, the role of their abundance and stoichiometry was unexplored, and their dynamics, revealed by this work, were unexpected. We are pursuing these new directions, including the PCPA mechanism, its consequences for alternative splicing, and its potential to regulate the proteome and cell physiology.
Humans have two SMN genes, SMN1 and SMN2, with identical protein-encoding capacity. Most SMA patients (>97 percent) have homozygous SMN1 deletions and are sustained by SMN2. However, a single nucleotide synonymous substitution in SMN2 exon 7 compromises its splicing, causing most of the SMN2 mRNA (~80 percent) to lack exon 7 (SMN∆7). The resulting SMN∆7 protein is extremely unstable and rapidly degraded, leaving SMA patients with SMN deficiency, the degree of which correlates with SMA severity. Considering SMN's ubiquitous expression and indispensable function, SMA's selective motor neuron pathology remains unexplained. Importantly, there is currently no therapy for this devastating disease. Our current research projects, which are intertwined and synergistic, include elucidating fundamental mechanisms of gene regulation and developing approaches for drug discovery.
We have been studying the cause and molecular consequences of SMN deficiency. SMN∆7, the major product of SMN2, is extremely unstable and virtually undetectable. We found that the C-terminal 15-amino acid peptide, resulting from exon 7 skipping, is a potent protein-destabilizing signal (degron). Degron-inactivating mutations yield a stable SMN∆7 protein that can rescue viability of SMN-depleted cells and restore snRNP assembly activity. SMN∆7 is therefore at least partially functional. Inhibition of its degradation could potentially ameliorate SMA severity. We are identifying SMN∆7 degradation mediators and using small-molecule HTS in search of selective SMN∆7 degron inhibitors.
Given the role of the SMN complex in snRNP biogenesis, we profiled the transcriptome (snRNAs and mRNAs) changes in SMN-deficient cells and an SMA mouse model. This revealed different and tissue-specific effects on each snRNP's abundance, and widespread mRNA-splicing abnormalities in numerous transcripts of diverse genes in all tested tissues. These findings demonstrate a key role for the SMN complex in splicing regulation and RNA metabolism and indicate that SMA is a general splicing disease that is not restricted to motor neurons, providing a new perspective on this disease. Although clinical experience argues that motor units are the primary disease target, increasing evidence shows that other tissues are also affected. We continue to investigate SMA pathogenesis. This investigation includes extensive transcriptome profiling of motor neurons and other neurons isolated by laser-capture microdissection from presymptomatic SMA mice. These studies illustrate how a lesion in a ubiquitous housekeeping factor can impair a specific neuronal population (a common theme in neurodegenerative diseases), highlight the critical role of RNA metabolism perturbations in neuropathology, and could provide targets and biomarkers for SMA therapy.
High-Throughput Screens: Discovery of Compounds That Increase SMN in SMA Cells
We also developed a comprehensive discovery strategy, including a battery of secondary assays for hit validation and characterization. Based on these, we established a collaboration with a pharmaceutical partner for a large-scale HTS effort. The screening campaign yielded selective hits, active compounds that are currently being studied in our laboratory for research and potential drug development for SMA. The active compounds provide proof of principle for the highly versatile CIA and discovery strategy, which is applicable to numerous other diseases. Additional HTS assays we developed for SMA, based on SMN∆7 stabilization, increasing SMN complex activity, and other modalities, have already produced insights and will be implemented on a large scale.