Identify the splicing impact of variation
CHALLENGE WITHDRAWN
This challenge is withdrawn because the results were published unexpectedly. If you know of another splicing dataset possibly suitable for CAGI, please contact the organizers at cagi@genomeinterpretation.org.
Dataset description: public
Dataset: public
Background
Accurate precursor mRNA (pre-mRNA) splicing is required for the expression of protein-coding genes from the human genome. In this process, intervening sequences (introns) are removed from pre-mRNA and coding/regulatory sequences (exons) are ligated together, generating a mature mRNA. A large ribonucleoprotein machine called the spliceosome assembles de novo upon every nascent intron and catalyzes the chemical steps of splicing. Numerous auxiliary cis-acting elements guide the spliceosome to correct pairs of splice sites. Exonic sequences are densely packed with regulatory elements such as exonic splicing enhancers and exonic splicing silencers (ESEs and ESSs, respectively). Thus, many exon sequences are multifunctional and contain overlapping information required to specify accurate pre-mRNA splicing and to dictate the primary structure of polypeptides. To better understand genotype-phenotype relationships, it is critical to determine whether polymorphisms influence the function of exon sequences in pre-mRNA splicing, mRNA translation, or potentially both steps.
For the past several years, the group of Jeremy Sanford at the University of California, Santa Cruz has been working to identify splicing-sensitive disease mutations using the Human Gene Mutation Database. In an initial study, his group identified thousands of putative splicing-sensitive disease mutations and validated a handful of aberrant splicing events.
Prediction challenge
Predictors are asked to compare exons from wild type and disease-associated alleles of four different disease genes and then predict which exons will exhibit aberrant pre-mRNA splicing. The submitted prediction should be the change in percentage inclusion of the exon, expressed as delta "percent spliced in" (ΔPSI) of the mutant relative to the wild type. In addition, we ask predictors to describe the mechanism by which splicing is affected. The predictions will be compared to experimental results.
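The ΔPSI arithmetic can be sketched as follows. This is a minimal illustration, assuming PSI is estimated as the share of exon-inclusion events among all inclusion and skipping events; the function names and the counts are hypothetical and are not part of the challenge data or the organizers' scoring procedure.

```python
def psi(inclusion_count, skipping_count):
    """Percent spliced in: share of transcripts that include the exon (0-100).

    Assumption: PSI is estimated from counts of exon-inclusion and
    exon-skipping events (for example, reads or RT-PCR band intensities).
    """
    total = inclusion_count + skipping_count
    if total == 0:
        raise ValueError("no inclusion or skipping events observed")
    return 100.0 * inclusion_count / total


def delta_psi(wild_type_psi, mutant_psi):
    """Change in exon inclusion of the mutant relative to the wild type.

    A negative value means the mutant exon is included less often.
    """
    return mutant_psi - wild_type_psi


# Hypothetical counts, for illustration only.
wt_psi = psi(inclusion_count=80, skipping_count=20)
mut_psi = psi(inclusion_count=30, skipping_count=70)
print(f"wild type PSI = {wt_psi:.1f}, mutant PSI = {mut_psi:.1f}, "
      f"delta PSI = {delta_psi(wt_psi, mut_psi):+.1f}")
```

With this sign convention, a mutation that promotes exon skipping gives a negative ΔPSI and one that increases inclusion gives a positive ΔPSI.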
Dataset: The dataset is composed of 4 pairs of exons from 4 different genes. Each pair contains a wild type sequence and a mutant sequence differing by only a single nucleotide. Each pair of exons was assayed experimentally for splicing efficiency.
1. Disease: Optic Neuron Atrophy
Gene: OPA1
Chr 3 +strand 194843856-194843927
Wild Type Exon Sequence:
ACCATATCCTTAAATGTAAAAGGCCCTGGACTACAGAGGATGGTGCTTGTTG
ACTTACCAGGTGTGATTAAT
Mutant Exon Sequence:
ACCATATCCTTAAATGTAAAAGGCCCTGGACTACAGAGGATGGTGCTTGTTG
ACTTACTAGGTGTGATTAAT
2. Disease: Hemochromatosis
Gene: TFR2
Chr 7 -strand 100068560-100068682
Wild Type Exon Sequence:
GGAGAGCTGGTGTACGCCCACTACGGGCGGCCCGAAGACCTGCAGGACCT
GCGGGCCAGGGGCGTGGATCCAGTGGGCCGCCTGCTGCTGGTGCGCGTGG
GGGTGATCAGCTTCGCCCAGAAG
Mutant Exon Sequence:
GGAGAGCTGGTGTACGCCCACTAGGGGCGGCCCGAAGACCTGCAGGACCT
GCGGGCCAGGGGCGTGGATCCAGTGGGCCGCCTGCTGCTGGTGCGCGTGG
GGGTGATCAGCTTCGCCCAGAAG
3. Disease: McArdle Disease
Gene: PYGM
Chr 11 -strand 64278301-64278393
Wild Type Exon Sequence:
GTGGCCATCCAGCTCAATGACACCCACCCCTCCCTGGCCATCCCCGAGCT
GATGAGGATCCTGGTGGACCTGGAACGGATGGACTGGGACAAG
Mutant Exon Sequence:
GTGGCCATCCAGCTCAATGACACCCACCCCTCCCTGGCCATCCCCGAGCT
GATGAGGATCCTGGTGGACCTGGAACGGATGGACTAGGACAAG
4. Disease: Cardiomyopathy
Gene: MYH7
Chr 14 -strand 22968004-22968153
Wild Type Exon Sequence:
GTGATATATGCCACTGGGGCACTGGCCAAGGCAGTGTATGAGAGGATGTT
CAACTGGATGGTGACGCGCATCAATGCCACCCTGGAGACCAAGCAGCCAC
GCCAGTACTTCATAGGAGTCCTGGACATCGCTGGCTTCGAGATCTTCGAT
Mutant Exon Sequence:
GTGATATATGCCACTAGGGCACTGGCCAAGGCAGTGTATGAGAGGATGTT
CAACTGGATGGTGACGCGCATCAATGCCACCCTGGAGACCAAGCAGCCAC
GCCAGTACTTCATAGGAGTCCTGGACATCGCTGGCTTCGAGATCTTCGAT
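Because each pair differs by a single nucleotide, the substitution can be located by a position-wise comparison of the two sequences. The sketch below is illustrative only: the helper name `find_substitution` and the `EXON_PAIRS` layout are hypothetical, the sequences and coordinates are copied from the listing above, and positions are counted from the first base of the exon sequence as printed (no mapping back to genomic coordinates is attempted). It also checks that each sequence length matches its coordinate span.

```python
def find_substitution(wild_type, mutant):
    """Return (1-based position, wild type base, mutant base) for a pair
    of equal-length sequences that differ at exactly one position."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have the same length")
    diffs = [(i + 1, w, m)
             for i, (w, m) in enumerate(zip(wild_type, mutant))
             if w != m]
    if len(diffs) != 1:
        raise ValueError(f"expected 1 difference, found {len(diffs)}")
    return diffs[0]


# gene -> (start, end, wild type exon, mutant exon), copied from the dataset.
EXON_PAIRS = {
    "OPA1": (194843856, 194843927,
             "ACCATATCCTTAAATGTAAAAGGCCCTGGACTACAGAGGATGGTGCTTGTTG"
             "ACTTACCAGGTGTGATTAAT",
             "ACCATATCCTTAAATGTAAAAGGCCCTGGACTACAGAGGATGGTGCTTGTTG"
             "ACTTACTAGGTGTGATTAAT"),
    "TFR2": (100068560, 100068682,
             "GGAGAGCTGGTGTACGCCCACTACGGGCGGCCCGAAGACCTGCAGGACCT"
             "GCGGGCCAGGGGCGTGGATCCAGTGGGCCGCCTGCTGCTGGTGCGCGTGG"
             "GGGTGATCAGCTTCGCCCAGAAG",
             "GGAGAGCTGGTGTACGCCCACTAGGGGCGGCCCGAAGACCTGCAGGACCT"
             "GCGGGCCAGGGGCGTGGATCCAGTGGGCCGCCTGCTGCTGGTGCGCGTGG"
             "GGGTGATCAGCTTCGCCCAGAAG"),
    "PYGM": (64278301, 64278393,
             "GTGGCCATCCAGCTCAATGACACCCACCCCTCCCTGGCCATCCCCGAGCT"
             "GATGAGGATCCTGGTGGACCTGGAACGGATGGACTGGGACAAG",
             "GTGGCCATCCAGCTCAATGACACCCACCCCTCCCTGGCCATCCCCGAGCT"
             "GATGAGGATCCTGGTGGACCTGGAACGGATGGACTAGGACAAG"),
    "MYH7": (22968004, 22968153,
             "GTGATATATGCCACTGGGGCACTGGCCAAGGCAGTGTATGAGAGGATGTT"
             "CAACTGGATGGTGACGCGCATCAATGCCACCCTGGAGACCAAGCAGCCAC"
             "GCCAGTACTTCATAGGAGTCCTGGACATCGCTGGCTTCGAGATCTTCGAT",
             "GTGATATATGCCACTAGGGCACTGGCCAAGGCAGTGTATGAGAGGATGTT"
             "CAACTGGATGGTGACGCGCATCAATGCCACCCTGGAGACCAAGCAGCCAC"
             "GCCAGTACTTCATAGGAGTCCTGGACATCGCTGGCTTCGAGATCTTCGAT"),
}

SUBSTITUTIONS = {}
for gene, (start, end, wild_type, mutant) in EXON_PAIRS.items():
    # The inclusive coordinate span should equal the exon length.
    span_matches = (end - start + 1) == len(wild_type)
    position, wt_base, mut_base = find_substitution(wild_type, mutant)
    SUBSTITUTIONS[gene] = (position, wt_base, mut_base)
    print(f"{gene}: exon position {position} {wt_base}>{mut_base} "
          f"(length {len(wild_type)}, matches coordinates: {span_matches})")
```

The function raises an error when a pair differs in length or at more than one position, which guards against copy errors when the sequences are transcribed from the listing.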
Dataset provided by
Tim Sterne-Weiler and Jeremy Sanford, University of California, Santa Cruz