Challenges

Challenges about to be released:

Regulatory variants

  • Regulation saturation: Predict the effect of all variants in 10 disease-associated promoter and 11 enhancer elements in an MPRA.

    17,500 single nucleotide variants and small indels in 11 human disease-associated enhancers (including IRF4, IRF6, MYC, SORT1) and 10 promoters (including TERT, LDLR, F9, HBG1) were assessed in a saturation mutagenesis massively parallel reporter assay (MPRA). Promoters were cloned into a plasmid upstream of a barcoded reporter, whose expression was measured relative to the plasmid DNA to determine the impact of promoter variants. Enhancers were assayed similarly, placed upstream of a minimal promoter. The challenge is to predict the functional effects of these variants in the regulatory regions upon barcoded reporter expression.

    Data provided by: Martin Kircher, Translational Genomics Center, Berlin Institute of Health, Berlin, Germany & Department of Genome Sciences, University of Washington
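
    As a rough illustration of "expression measured relative to the plasmid DNA", the sketch below computes a log2 RNA/DNA ratio from barcode counts and takes the variant effect as the difference from the reference sequence. The log2 scale, the pseudocount, the function names, and the counts are all assumptions made for illustration; the data provider's actual analysis is not described here.

```python
import math

def log2_rna_dna_ratio(rna_counts, dna_counts, pseudocount=1.0):
    """Log2 ratio of summed RNA to summed DNA barcode counts.

    Hypothetical helper: the pseudocount avoids log(0) and is an assumption.
    """
    rna = sum(rna_counts) + pseudocount
    dna = sum(dna_counts) + pseudocount
    return math.log2(rna / dna)

def variant_effect(alt_rna, alt_dna, ref_rna, ref_dna):
    """Difference in log2 expression between variant and reference constructs."""
    return (log2_rna_dna_ratio(alt_rna, alt_dna)
            - log2_rna_dna_ratio(ref_rna, ref_dna))

# Made-up barcode counts: the variant doubles reporter output.
effect = variant_effect(alt_rna=[7], alt_dna=[3], ref_rna=[3], ref_dna=[3])
print(effect)
```

    A positive value in this sketch means the variant increases reporter expression relative to the reference, and a negative value means it decreases it.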

Nonsynonymous variants

  • CALM1: Predict the effect of calmodulin variants in a yeast growth assay.

    Calmodulin is a calcium-sensing protein that modulates the activity of a large number of proteins in the cell. It is involved in many cellular processes, and is especially important for neuron and muscle cell function. Variants that affect calmodulin function have been found to be causally associated with cardiac arrhythmias. A large library of calmodulin missense variants was assessed with respect to their effects on protein function using a high-throughput yeast complementation assay. The challenge is to predict the functional effects of these calmodulin variants on competitive growth in a high-throughput yeast complementation assay.

    Data provided by: Frederick "Fritz" Roth, University of Toronto

  • PCM1: Predict whether missense mutations within the PCM1 gene impact zebrafish ventricular area development.

    The PCM1 (Pericentriolar Material 1) gene is a component of centriolar satellites occurring around centrosomes in vertebrate cells. Several studies have implicated PCM1 variants as a risk factor for schizophrenia. Ventricular enlargement is one of the most consistent abnormal structural brain findings in schizophrenia. Therefore, 38 transgenic human PCM1 missense mutations implicated in schizophrenia were assayed in a zebrafish model to determine their impact on the posterior ventricle area. The challenge is to predict whether variants implicated in schizophrenia impact zebrafish ventricular area.

    Data provided by: Nicholas Katsanis, Duke University

  • Frataxin: Predict the impact of variants of Frataxin protein on thermodynamic stability.

    Frataxin is a highly conserved protein found in prokaryotes and eukaryotes that is required for efficient regulation of cellular iron homeostasis. Humans with a frataxin deficiency have the cardio- and neurodegenerative disorder Friedreich's ataxia. A library of eight missense variants was assessed by near- and far-UV circular dichroism and intrinsic fluorescence spectra to determine thermodynamic stability at different concentrations of denaturant. These measurements were used to calculate a ΔΔG_H2O value for each variant, the difference in unfolding free energy (ΔG_H2O) between the mutant and wild-type proteins. The challenge is to predict ΔΔG_H2O for each frataxin variant.

    Data provided by: Roberta Chiaraluce and Valerio Consalvi, Sapienza University, Rome

  • TPMT and p10: Predict the effect of variants on TPMT and p10 protein stability.

    The gene p10 encodes PTEN (Phosphatase and TEnsin Homolog), an important secondary messenger molecule promoting cell growth and survival through signaling cascades including those controlled by AKT and mTOR. Thiopurine S-methyl transferase (TPMT) is a key enzyme involved in the metabolism of thiopurine drugs and functions by catalyzing the S-methylation of aromatic and heterocyclic sulfhydryl groups. A library of thousands of PTEN and TPMT mutations was assessed to measure the stability of the variant protein using a multiplexed variant stability profiling (VSP) assay, which detects the presence of EGFP fused to the mutated PTEN or TPMT protein, respectively. The stability of the variant protein dictates the abundance of the fusion protein and thus the EGFP level of the cell. The challenge is to predict the effect of each variant on TPMT and/or PTEN protein stability.

    Data provided by: Kenneth Matreyek, Lea Starita, and Doug Fowler, University of Washington

  • Annotate all nonsynonymous variants: Predict the impact of all nonsynonymous variants in the genome.

    dbNSFP describes 81,084,849 possible protein-altering variants in the human genome. The challenge is to predict the functional effect of every such variant. For the vast majority of these missense variants, the functional impact is not currently known, but experimental and clinical evidence is accruing rapidly. Rather than drawing upon a single discrete dataset, as is typical with CAGI, predictions will be assessed on an ongoing basis by comparing them with experimental or clinical annotations made available after the prediction submission date. If predictors assent, predictions will also be incorporated into dbNSFP.

    Data provided by: Xiaoming Liu from the University of Texas School of Public Health
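
    For the Frataxin challenge above, the target quantity is a simple difference of two unfolding free energies. The sketch below spells that out. The sign convention (mutant minus wild-type, so negative values mean destabilization), the function name, and the example energies are assumptions for illustration, not values from the dataset.

```python
def ddg_h2o(dg_mutant, dg_wild_type):
    """Difference in unfolding free energy between mutant and wild-type.

    Assumed convention: mutant minus wild-type, in the same units as the
    inputs (for example kcal/mol). Negative means the variant is less stable.
    """
    return dg_mutant - dg_wild_type

# Made-up unfolding free energies, for illustration only.
wild_type_dg = 9.0
mutant_dg = 7.5
print(ddg_h2o(mutant_dg, wild_type_dg))
```

    Predictors should check the sign convention stated in the released challenge data before submitting, since the opposite convention (wild-type minus mutant) is also in common use.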

Classification of variants in breast cancer cases and controls

  • CHEK2: Predict the probability of an individual with a given CHEK2 gene variant being in the case (breast cancer) or control cohort.

    Variants in the CHEK2 gene are associated with breast cancer. This challenge includes CHEK2 gene variants from approximately 1200 Latino breast cancer cases and 1200 ethnically matched controls. This challenge is to estimate the probability of each gene variant occurring in an individual from the cancer affected cohort.

    Data provided by: Elad Ziv, University of California, San Francisco
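
    To make the target quantity concrete, the sketch below estimates, from carrier counts, the probability that a carrier of a given variant belongs to the case cohort. It assumes cohorts of roughly equal size, as described above, and adds a small pseudocount so that variants seen in only one cohort do not produce probabilities of exactly 0 or 1. The function name, the pseudocount, and the counts are illustrative assumptions, not the assessors' scoring method.

```python
def case_probability(case_carriers, control_carriers, pseudocount=0.5):
    """Smoothed fraction of variant carriers who are in the case cohort.

    Assumes case and control cohorts of similar size; the pseudocount
    is an assumption added for numerical stability with rare variants.
    """
    total = case_carriers + control_carriers + 2 * pseudocount
    return (case_carriers + pseudocount) / total

# Made-up counts: a variant seen in 3 cases and 1 control.
print(case_probability(3, 1))
```

    With no observations at all, this sketch returns 0.5, which reflects no evidence in either direction.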

Splicing

  • MaPSy: Identify the alleles causing splicing defects and estimate their effects on splicing in a Massively Parallel Splicing Assay.

    The Massively Parallel Splicing Assay (MaPSy) approach was used to screen 797 reported exonic disease mutations using a mini-gene system, assaying both in vivo via transfection in tissue culture, and in vitro via incubation in cell nuclear extract. The challenge is to predict the degree to which a given variant causes changes in splicing.

    Data provided by: Will Fairbrother, Brown University

  • Vex-seq: Predict the effect of variants on exon splicing in a high-throughput assay.

    A barcoding approach called Variant exon sequencing (Vex-seq) was applied to assess the effect of 2,059 natural single nucleotide variants and short indels on splicing of a globin mini-gene construct transfected into HepG2 cells. The effect is reported as ΔΨ (delta PSI, where PSI is Percent Spliced In), the difference between the variant Ψ and the reference Ψ. The challenge is to predict ΔΔ for each variant.

    Data provided by: Brenton R. Graveley, UConn Health, Farmington
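
    The ΔΨ quantity in the Vex-seq challenge can be sketched as follows. Ψ is computed here as the percentage of reads supporting exon inclusion, and ΔΨ as variant Ψ minus reference Ψ. The read-count definition of Ψ, the 0 to 100 scale, the function names, and the counts are assumptions for illustration; the released data defines the actual values.

```python
def psi(inclusion_reads, exclusion_reads):
    """Percent Spliced In: share of reads that include the exon (0 to 100)."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("Psi is undefined without any reads")
    return 100.0 * inclusion_reads / total

def delta_psi(variant_psi, reference_psi):
    """Variant Psi minus reference Psi; negative means more exon skipping."""
    return variant_psi - reference_psi

# Made-up read counts, for illustration only.
reference = psi(inclusion_reads=80, exclusion_reads=20)
variant = psi(inclusion_reads=30, exclusion_reads=70)
print(delta_psi(variant, reference))
```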

Clinical genomes

  • SickKids clinical genomes: Match patients’ genomes to their clinical descriptions and predict the causal pathogenic variants.

    This challenge involves 30 children with suspected genetic disorders who were referred for clinical genome sequencing. Predictors are given the 30 genome sequences, and are also provided with the phenotypic descriptions as shared with the diagnostic laboratory. The challenge is to predict what class of disease is associated with each genome, and which genome corresponds to which clinical description. Predictors may additionally identify the diagnostic variant(s) underlying the predictions, and identify predictive secondary variants conferring high risk of other diseases whose phenotypes are not reported in the clinical descriptions.

    Data provided by: Stephen Meyn & colleagues, SickKids