
CAGI6 Challenge

Summer-Winter 2021

Thirteen challenges announced. All challenges closed.

Clinical Genomes, Transcriptomes and Gene Panels

1. Rare Genomes Project closed

Identify diagnostic variants in children with rare disease from the Rare Genomes Project

The Rare Genomes Project (RGP) is a direct-to-participant research study on the utility of genome sequencing for rare disease diagnosis and gene discovery. The study is led by genomics experts and clinicians at the Broad Institute of MIT and Harvard. Research subjects are consented for genomic sequencing and the sharing of their sequence and phenotype information with researchers working to understand the molecular causes of rare disease. When a candidate disease variant believed to be related to the phenotype is identified, the variant is confirmed with Sanger sequencing in a clinical setting and returned to the participant via his or her local physician. In this challenge, whole genome sequence data and phenotype data from a subset of the solved and unsolved RGP families will be provided. Participants in the challenge will try to identify the causative variant(s) in each case. For the unsolved cases, prioritized variants from the participating teams will be examined to see if additional diagnoses can be made.

Data provided by: Heidi Rehm, Anne O’Donnell-Luria, Melanie O’Leary, Broad Institute of MIT and Harvard

2. Intellectual Disability Panel closed

Predict patients’ clinical descriptions and pathogenic variants from gene panel sequences

The objective in this challenge is to predict a patient’s clinical phenotype and the causal variant(s) based on their gene panel sequences. Sequence data for 74 genes from a cohort of 500 patients with a range of neurodevelopmental presentations (intellectual disability, autistic spectrum disorder, epilepsy, microcephaly, macrocephaly, hypotonia, ataxia) has been made available for this challenge. Additional data from 150 patients from the same clinical study is available for training and validation.

Data provided by: Alessandra Murgia, Emanuela Leonardi, Maria Cristina Aspromonte, University of Padova

3. SickKids clinical genomes and transcriptomes closed

Predict patients’ pathogenic variants from their genome and transcriptome

This challenge involves data from 79 children who were referred to The Hospital for Sick Children’s (SickKids) Genome Clinic for genome sequencing because of suspected but undiagnosed genetic disorders. Research subjects are consented for sharing of their sequence data and phenotype information with researchers working to understand the molecular causes of rare disease. When a candidate disease variant believed to be related to the phenotype is identified, the variant is adjudicated and confirmed in a clinical setting. In this challenge, transcriptomic and phenotype data from a subset of the “solved” (diagnosed) and “unsolved” SickKids patients will be provided, along with corresponding genomic sequence data. The challenge is to use a transcriptome-driven approach to identify the gene(s) and molecular mechanisms underlying the phenotypic descriptions in each case. For the unsolved cases, prioritized variants from the participating teams will be examined to see if additional diagnoses can be made based on the predictions from the CAGI community.

Data provided by: Stephen Meyn, Christian Marshall, Gregory Costain, Michael Wilson, Lianna G. Kyriakopoulou, Kyoko Yuki and Huayun Hou on behalf of the SickKids Genome Clinic, a project of the Centre for Genetic Medicine of the Hospital for Sick Children, Toronto.

Polygenic Risk Scores

1. Polygenic Risk Scores closed

Predict common disease phenotypes from individuals’ genotypes

Polygenic risk scores (PRS) have potential clinical utility for risk surveillance, prevention and personalized medicine. Participants will be provided with datasets of four real phenotypes (Type 2 Diabetes, Breast Cancer, Inflammatory Bowel Disease and Coronary Artery Disease) and of thirty simulated phenotypes representing a range of genetic architectures of common polygenic diseases. The challenge is to predict the disease outcomes of individuals in held-out validation cohorts.

Data provided by: Sung Chun and Shamil Sunyaev, Harvard Medical School
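
As a rough illustration of the kind of score this challenge concerns, a polygenic risk score is commonly computed as a weighted sum of effect-allele dosages. The sketch below assumes that additive form; the function name, weights, and genotype values are hypothetical and are not taken from the challenge data or from any participant's method.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Return the additive score sum(dosage_i * weight_i).

    dosages: effect-allele counts per variant (0, 1, or 2, or imputed fractions).
    weights: per-variant effect sizes, e.g. log odds ratios (an assumption here).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical example: three variants, one individual.
example_weights = [0.10, -0.05, 0.20]
example_dosages = [2, 1, 0]
example_score = polygenic_risk_score(example_dosages, example_weights)
```

In practice the raw score is usually standardized against a reference cohort before being turned into a disease prediction; that step is omitted here.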

Nonsynonymous Variants

1. HMBS closed

Predict missense variant effects on hydroxymethylbilane synthase as measured by a yeast complementation assay

Hydroxymethylbilane synthase (HMBS), also known as porphobilinogen deaminase (PBGD) or uroporphyrinogen I synthase, is an enzyme involved in heme production. In humans, variants that affect HMBS function result in acute intermittent porphyria (AIP), an autosomal dominant genetic disorder caused by a build-up of porphobilinogen in the cytoplasm. A large library of HMBS missense variants was assessed with respect to their effects on protein function using a high-throughput yeast complementation assay. The challenge is to predict the functional effects of these variants.

Data provided by: Warren van Loggerenberg, Jochen Weile, Song Sun, and Fritz Roth, University of Toronto

2. CaM closed

Predicting the effects of disease-associated variants on the stability of calmodulin

Calmodulin (CaM) is a ubiquitous calcium (Ca2+) sensor protein that interacts with more than 200 molecular partners, thereby regulating a variety of biological processes. Missense point mutations in the genes encoding CaM have been associated with ventricular tachycardia and sudden cardiac death. A library encompassing up to 17 point mutations was assessed by far-UV circular dichroism (CD), measuring the melting temperature (Tm) and percentage of unfolding (%unfold) upon thermal denaturation at a pH and salt concentration that mimic physiological conditions. The challenge is to predict: (1) the Tm and %unfold values for isolated CaM variants under Ca2+-saturating conditions (Ca2+-CaM) and in the Ca2+-free (apo) state; (2) whether the point mutation stabilizes or destabilizes the protein (based on Tm and %unfold).

Data provided by: Giuditta Dal Cortivo and Daniele Dell’Orco, University of Verona, Italy

3. Annotate All Missense closed

Predict pathogenicity of all nonsynonymous variants in the genome

dbNSFP currently describes 81,782,923 possible protein-altering variants in the human genome. The challenge is to predict the functional effect of every such variant. For the vast majority of these missense and nonsense variants, the functional impact is not currently known, but experimental and clinical evidence is accruing rapidly. Rather than drawing upon a single discrete dataset as typical with CAGI, predictions will be assessed by comparing with experimental or clinical annotations made available after the prediction submission date, on an ongoing basis. If predictors assent, predictions will also be incorporated into dbNSFP.

Data provided by: Xiaoming Liu, University of South Florida

4. STK11 closed

Predict the effect of somatic missense mutations on the activity and function of Serine/Threonine Kinase 11

Serine/Threonine Kinase 11 (STK11) is considered a master kinase that functions as a tumor suppressor and nutrient sensor within a heterotrimeric complex with pseudo-kinase STRAD-alpha and structural protein MO25. Germline variants resulting in loss of STK11 define Peutz-Jeghers syndrome, an autosomal dominant cancer predisposition syndrome marked by gastrointestinal hamartomas and freckling of the oral mucosa. Somatic loss-of-function variants, both nonsense and missense, occur in 15-30% of non-small cell lung adenocarcinomas, where they correlate clinically with insensitivity to anti-PD1 monoclonal antibody therapy. The challenge is to predict the impact on STK11 function for each missense variant in relation to wildtype STK11.

Data provided by: David Seward, University of Vermont

5. MAPK1 closed

Stability and catalytic efficiency of mitogen-activated protein kinase 1

MAPK1 (ERK2) is active as a serine/threonine kinase in the Ras-Raf-MEK-ERK signal transduction cascade that regulates cell proliferation, transcription, differentiation, and cell cycle progression. MAPK1 is activated by phosphorylation, which occurs with strict specificity by MEK1/2 on Thr185 and Tyr187, and may also act as a transcriptional repressor independent of its kinase activity. A library of eleven missense variants selected from the COSMIC database was assessed by near- and far-UV circular dichroism and intrinsic fluorescence spectra to determine thermodynamic stability at different concentrations of denaturant. These data were used to calculate a ΔΔG(H2O) value, i.e., the difference in unfolding free energy ΔG(H2O) between each variant and the wildtype protein, in both the phosphorylated and unphosphorylated forms. The challenge is to predict these two ΔΔG(H2O) values and the catalytic efficiency (kcat/Km)mut/(kcat/Km)wt, as determined by a fluorescence assay, of the phosphorylated form for each MAPK1 variant.

Data provided by: Roberta Chiaraluce and Valerio Consalvi, Sapienza University, Rome, Italy.
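
The two quantities to be predicted in this challenge can be written out as simple arithmetic. The sketch below is illustrative only: the sign convention (variant minus wildtype) is an assumption, since the description states only that ΔΔG is the difference between variant and wildtype, and the function names and numeric inputs are hypothetical rather than challenge data.

```python
def delta_delta_g(dg_variant: float, dg_wildtype: float) -> float:
    """Difference in unfolding free energy, variant minus wildtype.

    Assumed convention: a negative result means the variant is less
    stable than wildtype. Both inputs must share the same units.
    """
    return dg_variant - dg_wildtype


def relative_catalytic_efficiency(
    kcat_mut: float, km_mut: float, kcat_wt: float, km_wt: float
) -> float:
    """Return (kcat/Km) of the variant divided by (kcat/Km) of wildtype."""
    if km_mut <= 0 or km_wt <= 0 or kcat_wt <= 0:
        raise ValueError("Km values and wildtype kcat must be positive")
    return (kcat_mut / km_mut) / (kcat_wt / km_wt)


# Hypothetical values, not measurements from the challenge.
example_ddg = delta_delta_g(dg_variant=5.0, dg_wildtype=7.5)
example_ratio = relative_catalytic_efficiency(
    kcat_mut=10.0, km_mut=2.0, kcat_wt=20.0, km_wt=2.0
)
```

A ratio below 1 indicates a variant with lower catalytic efficiency than wildtype; a ΔΔG is computed separately for the phosphorylated and unphosphorylated forms.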

6. MAPK3 closed

Stability and catalytic efficiency of mitogen-activated protein kinase 3

MAPK3 (ERK1) is active as a serine/threonine kinase in the Ras-Raf-MEK-ERK signal transduction cascade that regulates cell proliferation, transcription, differentiation, and cell cycle progression. MAPK3 is activated by phosphorylation, which occurs with strict specificity by MEK1/2 on Thr202 and Tyr204, and may also act as a transcriptional repressor independent of its kinase activity. A library of twelve missense variants selected from the COSMIC database was assessed by near- and far-UV circular dichroism and intrinsic fluorescence spectra to determine thermodynamic stability at different concentrations of denaturant. These data were used to calculate a ΔΔG(H2O) value, i.e., the difference in unfolding free energy ΔG(H2O) between each variant and the wildtype protein, in both the phosphorylated and unphosphorylated forms. The challenge is to predict these two ΔΔG(H2O) values and the catalytic efficiency (kcat/Km)mut/(kcat/Km)wt, as determined by a fluorescence assay, of the phosphorylated form for each MAPK3 variant.

Data provided by: Roberta Chiaraluce and Valerio Consalvi, Sapienza University, Rome, Italy.

Nonsynonymous Variants: Epistasis

1. MTHFR closed

Predict effects of missense variants and their A222V dependence for methylenetetrahydrofolate reductase 

Methylenetetrahydrofolate reductase (MTHFR) catalyzes the production of 5-methyltetrahydrofolate, which is needed for conversion of homocysteine to methionine. Humans with variants affecting MTHFR function present with a wide range of phenotypes, including homocystinuria, homocysteinemia, developmental delay, severe mental retardation, psychiatric disturbances, and late-onset neurodegenerative disorders. A further complication to interpretation of variants in this gene is a common variant, Ala222Val, carried by a large fraction of the human population. A large library of MTHFR missense variants was assessed with respect to their effects on protein function using a high-throughput yeast complementation assay. The challenge is to predict the functional effects of these variants in two different settings: (1) for the wildtype protein, and (2) for the protein with the common Ala222Val variant.

Data provided by: Jochen Weile, Song Sun, Warren van Loggerenberg, and Fritz Roth, University of Toronto

Splicing

1. Splicing VUS closed

Predicting splicing disruption from variants of unknown significance 

Variants causing aberrant splicing have been implicated in a range of common and rare disorders, including retinitis pigmentosa, autism spectrum disorder, amyotrophic lateral sclerosis, and a variety of cancers. However, such variants are frequently overlooked by diagnostic sequencing pipelines, leading to missed diagnoses for patients. Clinically ascertained variants of unknown significance underwent whole-blood based RT-PCR to test for impact on splicing. The challenge is to predict which of the tested variants disrupt splicing.

Data provided by: Diana Baralle, Jenny Lord, Andrew Douglas, Htoo Wai, and David Bunyan, University of Southampton

Variants encountered in a laboratory for clinical testing 

1. Sherloc clinical classification closed

Predict the classification of novel variants encountered across genetic tests conducted by Invitae

Invitae is a genetic testing company that publishes its variant interpretations to ClinVar. In this challenge, over 122,000 previously uncharacterized variants are provided, spanning the range of effects seen in the clinic. Following the close of this challenge, Invitae will submit its interpretations for these variants to ClinVar. Predictors are asked to interpret the pathogenicity of these variants, and the clinical utility of predictions will be assessed across multiple categories by Invitae.

Data provided by: Rachel Hovde, Naomi Fox, Alex Colavin, Kathryn Hatchell, John Garcia, Yuya Kobayashi, Rebecca Truty, and Keith Nykamp, Invitae

Last updated: December 31, 2021