☀️ CAGI6 Challenge ☀️

Summer 2021

☀️ Six challenges announced. One challenge open. Other challenges to be released soon! ☀️

Clinical Genomes and Gene Panels

1. Rare Genomes Project

Identify diagnostic variants in children with rare disease from the Rare Genomes Project

The Rare Genomes Project (RGP) is a direct-to-participant research study on the utility of genome sequencing for rare disease diagnosis and gene discovery. The study is led by genomics experts and clinicians at the Broad Institute of MIT and Harvard. Research participants consent to genome sequencing and to the sharing of their sequence and phenotype information with researchers working to understand the molecular causes of rare disease. When a candidate disease variant believed to be related to the phenotype is identified, the variant is confirmed with Sanger sequencing in a clinical setting and returned to the participant via their local physician. In this challenge, whole genome sequence data and phenotype data from a subset of the solved and unsolved RGP families will be provided. Participants in the challenge will try to identify the causative variant(s) in each case. For the unsolved cases, prioritized variants from the participating teams will be examined to see if additional diagnoses can be made.

Data provided by: Heidi Rehm, Anne O’Donnell-Luria, Melanie O’Leary, Broad Institute of MIT and Harvard

2. Intellectual Disability Panel

Predict patients’ clinical descriptions and pathogenic variants from gene panel sequences

The objective in this challenge is to predict a patient’s clinical phenotype and the causal variant(s) based on their gene panel sequences. Sequence data for 74 genes from a cohort of 500 patients with a range of neurodevelopmental presentations (intellectual disability, autism spectrum disorder, epilepsy, microcephaly, macrocephaly, hypotonia, ataxia) have been made available for this challenge. Additional data from 150 patients from the same clinical study are available for training and validation.

Data provided by: Alessandra Murgia, Emanuela Leonardi, Maria Cristina Aspromonte, University of Padova

Nonsynonymous Variants

1. HMBS

Predict missense variant effects on hydroxymethylbilane synthase as measured by a yeast complementation assay

Hydroxymethylbilane synthase (HMBS), also known as porphobilinogen deaminase (PBGD) or uroporphyrinogen I synthase, is an enzyme involved in heme production. In humans, variants that affect HMBS function result in acute intermittent porphyria (AIP), an autosomal dominant genetic disorder caused by a build-up of porphobilinogen in the cytoplasm. A large library of HMBS missense variants was assessed with respect to their effects on protein function using a high-throughput yeast complementation assay. The challenge is to predict the functional effects of these variants.

Data provided by: Warren van Loggerenberg, Jochen Weile, Song Sun, and Fritz Roth, University of Toronto

2. CaM

Predict the effects of disease-associated variants on the stability of calmodulin

Calmodulin (CaM) is a ubiquitous calcium (Ca2+) sensor protein that interacts with more than 200 molecular partners, thereby regulating a variety of biological processes. Missense point mutations in the genes encoding CaM have been associated with ventricular tachycardia and sudden cardiac death. A library encompassing up to 17 point mutations was assessed by far-UV circular dichroism (CD), measuring the melting temperature (Tm) and percentage of unfolding (%unfold) upon thermal denaturation at a pH and salt concentration that mimic physiological conditions. The challenge is to predict: (1) the Tm and %unfold values for isolated CaM variants under Ca2+-saturating conditions (Ca2+-CaM) and in the Ca2+-free (apo) state; (2) whether the point mutation stabilizes or destabilizes the protein (based on Tm and %unfold).
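
As a rough illustration of part (2), the sketch below labels a variant from its melting temperature relative to wild type. The function name, the 1 °C tolerance, and the example temperatures are assumptions made for illustration only. The challenge text does not specify a decision rule, and how %unfold should be combined with Tm is deliberately left out here.

```python
def classify_stability(tm_variant: float, tm_wild_type: float,
                       tolerance: float = 1.0) -> str:
    """Label a variant by its Tm shift relative to wild type.

    All inputs are in degrees Celsius. The tolerance is a hypothetical
    cutoff, not a value taken from the challenge description.
    """
    delta_tm = tm_variant - tm_wild_type
    if delta_tm > tolerance:
        return "stabilizing"      # variant melts at a higher temperature
    if delta_tm < -tolerance:
        return "destabilizing"    # variant melts at a lower temperature
    return "unchanged"            # shift is within the assumed tolerance


# Hypothetical temperatures, not measured values.
print(classify_stability(tm_variant=45.0, tm_wild_type=50.0))
```

The same comparison would be made separately for the Ca2+-saturated and apo states, since the challenge asks for predictions in both.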

Data provided by: Giuditta Dal Cortivo and Daniele Dell'Orco, University of Verona, Italy

3. Annotate All Missense

Predict pathogenicity of all nonsynonymous variants in the genome

dbNSFP currently describes 81,782,923 possible protein-altering variants in the human genome. The challenge is to predict the functional effect of every such variant. For the vast majority of these missense and nonsense variants, the functional impact is not currently known, but experimental and clinical evidence is accruing rapidly. Rather than drawing upon a single discrete dataset, as is typical for CAGI, predictions will be assessed on an ongoing basis by comparing them with experimental or clinical annotations made available after the prediction submission date. If predictors assent, predictions will also be incorporated into dbNSFP.

Data provided by: Xiaoming Liu, University of South Florida

Nonsynonymous Variants: Epistasis

1. MTHFR

Predict effects of missense variants and their A222V dependence for methylenetetrahydrofolate reductase

Methylenetetrahydrofolate reductase (MTHFR) catalyzes the production of 5-methyltetrahydrofolate, which is needed for conversion of homocysteine to methionine. Humans with variants affecting MTHFR function present with a wide range of phenotypes, including homocystinuria, homocysteinemia, developmental delay, severe mental retardation, psychiatric disturbances, and late-onset neurodegenerative disorders. Interpreting variants in this gene is further complicated by a common variant, Ala222Val, carried by a large fraction of the human population. A large library of MTHFR missense variants was assessed with respect to their effects on protein function using a high-throughput yeast complementation assay. The challenge is to predict the functional effects of these variants in two different settings: (1) for the wildtype protein, and (2) for the protein with the common Ala222Val variant.
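
One way to think about the two settings is as a pair of predicted scores per variant, whose difference indicates how strongly the variant's effect depends on the Ala222Val background. The sketch below is illustrative only: the variant names, the scores, and the idea of ranking by score difference are all assumptions, and the actual submission format and assessment are not described here.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: variant name -> (score in the wildtype background,
#                                       score in the Ala222Val background)
Predictions = Dict[str, Tuple[float, float]]


def rank_by_background_dependence(preds: Predictions) -> List[Tuple[str, float]]:
    """Return (variant, a222v_score - wildtype_score) pairs,
    sorted with the largest absolute difference first."""
    diffs = [(name, a222v - wt) for name, (wt, a222v) in preds.items()]
    return sorted(diffs, key=lambda item: abs(item[1]), reverse=True)


# Placeholder variants and made-up scores, not experimental data.
example: Predictions = {
    "variant_a": (0.9, 0.4),
    "variant_b": (0.2, 0.2),
    "variant_c": (0.7, 0.8),
}
ranked = rank_by_background_dependence(example)
for name, diff in ranked:
    print(f"{name}: {diff:+.2f}")
```

A difference near zero would suggest a variant that behaves the same in both backgrounds, while a large difference would flag a candidate for A222V dependence.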

Data provided by: Jochen Weile, Song Sun, Warren van Loggerenberg, and Fritz Roth, University of Toronto

Last updated: May 3, 2021