Identify the splicing impact of variation

CHALLENGE WITHDRAWN

This challenge has been withdrawn because the results were unexpectedly published. If you know of another splicing dataset that may be suitable for CAGI, please contact the organizers at cagi@genomeinterpretation.org.

Dataset description: public

Dataset: public

Background 

Accurate precursor mRNA (pre-mRNA) splicing is required for the expression of protein-coding genes from the human genome. In this process, intervening sequences (introns) are removed from pre-mRNA, and coding/regulatory sequences (exons) are ligated together, generating a mature mRNA. A large ribonucleoprotein machine called the spliceosome assembles de novo upon every nascent intron and catalyzes the chemical steps of splicing. Numerous auxiliary cis-acting elements guide the spliceosome to correct pairs of splice sites. Exonic sequences are densely packed with regulatory elements such as exonic splicing enhancers and exonic splicing silencers (ESE and ESS, respectively). Thus, many exon sequences are multifunctional and contain overlapping information required both to specify accurate pre-mRNA splicing and to dictate the primary structure of polypeptides. To better understand genotype-phenotype relationships, it is critical to determine whether polymorphisms influence the function of exon sequences in pre-mRNA splicing, in mRNA translation, or potentially in both steps.

For the past several years, the group of Jeremy Sanford at the University of California, Santa Cruz has been working to identify splicing-sensitive disease mutations using the Human Gene Mutation Database. In an initial study, the group identified thousands of putative splicing-sensitive disease mutations and validated a handful of aberrant splicing events.

Prediction challenge

Predictors are asked to compare exons from wild type and disease-associated alleles of four different disease genes and then predict which exons will exhibit aberrant pre-mRNA splicing. The submitted prediction should be the change in percentage inclusion of the exon relative to the wild type, expressed as delta "percent spliced in" (ΔPSI). In addition, we ask predictors to describe the mechanism by which splicing is affected. The predictions will be compared to experimental results.
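As an illustration of the quantity being requested, here is a minimal Python sketch of ΔPSI. It assumes the common convention that PSI is the percentage of transcripts that include the exon, estimated from counts of exon-inclusion and exon-skipping products; the challenge text does not specify how PSI was measured. The function names and the counts below are hypothetical and are not challenge data.

```python
def psi(inclusion: float, skipping: float) -> float:
    """Percent spliced in: share of products that include the exon, in percent.

    Assumed convention (not stated in the challenge text):
    PSI = 100 * inclusion / (inclusion + skipping).
    """
    total = inclusion + skipping
    if total <= 0:
        raise ValueError("at least one observed product is needed to estimate PSI")
    return 100.0 * inclusion / total


def delta_psi(wild_type_counts, mutant_counts) -> float:
    """Change in exon inclusion of the mutant relative to the wild type."""
    return psi(*mutant_counts) - psi(*wild_type_counts)


# Hypothetical (inclusion, skipping) counts, for illustration only.
wild_type = (90, 10)   # PSI = 90.0
mutant = (30, 70)      # PSI = 30.0

print(delta_psi(wild_type, mutant))  # -60.0
```

Under this convention a negative ΔPSI means the mutant exon is included less often than the wild type, and a positive value means it is included more often.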

Dataset: The dataset is composed of four pairs of exons from four different genes. Each pair contains a wild type sequence and a mutant sequence that differ by only a single nucleotide. Each pair of exons was assayed experimentally for splicing efficiency.

1. Disease: Optic Neuron Atrophy

Gene: OPA1

Chr 3 +strand 194843856-194843927

Wild Type Exon Sequence:

ACCATATCCTTAAATGTAAAAGGCCCTGGACTACAGAGGATGGTGCTTGTTG

ACTTACCAGGTGTGATTAAT

Mutant Exon Sequence:

ACCATATCCTTAAATGTAAAAGGCCCTGGACTACAGAGGATGGTGCTTGTTG

ACTTACTAGGTGTGATTAAT

2. Disease: Hemochromatosis

Gene: TFR2

Chr 7 -strand 100068560-100068682

Wild Type Exon Sequence:

GGAGAGCTGGTGTACGCCCACTACGGGCGGCCCGAAGACCTGCAGGACCT

GCGGGCCAGGGGCGTGGATCCAGTGGGCCGCCTGCTGCTGGTGCGCGTGG

GGGTGATCAGCTTCGCCCAGAAG

Mutant Exon Sequence:

GGAGAGCTGGTGTACGCCCACTAGGGGCGGCCCGAAGACCTGCAGGACCT

GCGGGCCAGGGGCGTGGATCCAGTGGGCCGCCTGCTGCTGGTGCGCGTGG

GGGTGATCAGCTTCGCCCAGAAG

3. Disease: McArdle Disease

Gene: PYGM

Chr 11 -strand 64278301-64278393

Wild Type Exon Sequence:

GTGGCCATCCAGCTCAATGACACCCACCCCTCCCTGGCCATCCCCGAGCT

GATGAGGATCCTGGTGGACCTGGAACGGATGGACTGGGACAAG

Mutant Exon Sequence:

GTGGCCATCCAGCTCAATGACACCCACCCCTCCCTGGCCATCCCCGAGCT

GATGAGGATCCTGGTGGACCTGGAACGGATGGACTAGGACAAG

4. Disease: Cardiomyopathy

Gene: MYH7

Chr 14 -strand 22968004-22968153

Wild Type Exon Sequence:

GTGATATATGCCACTGGGGCACTGGCCAAGGCAGTGTATGAGAGGATGTT

CAACTGGATGGTGACGCGCATCAATGCCACCCTGGAGACCAAGCAGCCAC

GCCAGTACTTCATAGGAGTCCTGGACATCGCTGGCTTCGAGATCTTCGAT

Mutant Exon Sequence:

GTGATATATGCCACTAGGGCACTGGCCAAGGCAGTGTATGAGAGGATGTT

CAACTGGATGGTGACGCGCATCAATGCCACCCTGGAGACCAAGCAGCCAC

GCCAGTACTTCATAGGAGTCCTGGACATCGCTGGCTTCGAGATCTTCGAT
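Because each wild type/mutant pair differs by a single nucleotide, a useful first step is to locate that nucleotide within the exon. The Python sketch below does this for the OPA1 pair listed above. The helper name is hypothetical. The sketch also checks that the exon length equals the span of the listed coordinates, which holds only under the assumption that the coordinates are 1-based and inclusive. The genome assembly is not stated in the dataset, so no attempt is made to map the exon position back to a genomic coordinate.

```python
def single_nucleotide_difference(wild_type: str, mutant: str):
    """Return (1-based exon position, wild type base, mutant base).

    Whitespace is stripped first, so sequences wrapped over several
    lines can be passed in as they appear in the dataset.
    """
    wt = "".join(wild_type.split()).upper()
    mt = "".join(mutant.split()).upper()
    if len(wt) != len(mt):
        raise ValueError("sequences differ in length; not a simple substitution")
    diffs = [(i + 1, a, b) for i, (a, b) in enumerate(zip(wt, mt)) if a != b]
    if len(diffs) != 1:
        raise ValueError(f"expected exactly one difference, found {len(diffs)}")
    return diffs[0]


# OPA1 exon pair, copied from the dataset above (wrapped over two lines).
opa1_wild_type = """
ACCATATCCTTAAATGTAAAAGGCCCTGGACTACAGAGGATGGTGCTTGTTG
ACTTACCAGGTGTGATTAAT
"""
opa1_mutant = """
ACCATATCCTTAAATGTAAAAGGCCCTGGACTACAGAGGATGGTGCTTGTTG
ACTTACTAGGTGTGATTAAT
"""

# Listed coordinates; treated here as 1-based and inclusive (an assumption).
start, end = 194843856, 194843927

position, wt_base, mut_base = single_nucleotide_difference(opa1_wild_type, opa1_mutant)
exon_length = len("".join(opa1_wild_type.split()))

print(position, wt_base, mut_base)           # 59 C T
print(exon_length == end - start + 1)        # True
```

The same helper can be applied unchanged to the TFR2, PYGM and MYH7 pairs.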

Dataset provided by

Tim Sterne-Weiler and Jeremy Sanford, University of California, Santa Cruz