Welcome to the CAGI experiment!

The Critical Assessment of Genome Interpretation (CAGI, \'kā-jē\) is a community experiment to objectively assess computational methods for predicting phenotypic impacts of genomic variation and to inform future research directions. In this assessment, participants are provided genetic variants and make predictions of resulting phenotype. These predictions are evaluated against experimental characterizations by independent assessors. The CAGI experiment culminates with a community workshop and publications to disseminate results.

The CAGI Goals are:

  1. To evaluate the capability of state-of-the-art methods to make useful predictions of molecular, cellular, or organismal phenotypes from genomic data.
  2. To identify bottlenecks in genome interpretation that suggest especially critical areas of future research.
  3. To highlight innovations.
  4. To standardize the field by suggesting appropriate assessment methods and defining what is required for accurate prediction.
  5. To engage and connect researchers from the diverse disciplines whose expertise is essential to methods for genome interpretation.

We are very pleased to announce that CAGI 5 will launch very soon, rolling out an initial half-dozen challenges, with more challenges to follow.

We currently anticipate releasing the following challenges imminently:

Nonsynonymous variants

  • CALM1: Predict the effect of protein variants in a yeast growth assay.
  • Frataxin: Predict the impact of variants of Frataxin protein on thermodynamic stability.
  • TPMT and p10: Predict the effect of variants on TPMT and p10 protein stability.
  • Annotate the whole genome: Predict impact of all nonsynonymous variants in the genome.
  • PCM1: Predict the effect of missense mutations within the PCM1 gene on zebrafish development.

Regulatory variants

  • Expression variants: Predict effects of non-coding variants in disease-associated promoter and enhancer elements in a massively parallel reporter assay.
  • MaPSy: Identify causal variants and estimate their effects on splicing in a Massively Parallel Splicing Assay.
  • Vex-seq: Predict the change in splicing relative to reference exons, using data from a high-throughput assay.

CAGI publications and presentations
There will be a CAGI special issue with papers in the journals Human Mutation and the Annals of Human Genetics. A flagship manuscript is under preparation. A list of past and future presentations about CAGI is available here with downloadable posters and slides.

The acquisition of large numbers of personal genomes has long been the aspiration of genomics researchers, and sequencing technologies promise to make this affordable within the next several years. Already, large-scale genotyping arrays are widely used in research, and retail DNA tests of genetic markers have captured the public’s imagination. Unfortunately, personal genomes presently have limited research or medical value due to a variety of scientific, technical, legal, sociological, and ethical challenges. Yet whole genomes are also providing tremendous breakthroughs in basic science, such as revealing the genetic basis of Mendelian diseases that had proven refractory to traditional genetics for decades, and helping to unravel the mechanisms by which cancer emerges and evolves.

The CAGI experiment is timely and of wide relevance because of the burgeoning availability of individuals’ genomes, and the desire to interpret them for research and clinical applications. Currently, the field lacks a consensus on the absolute and relative suitability of the panoply of different methods for prediction of the phenotypic impact of genomic variation. The results from CAGI will help the broader community understand the appropriate level of confidence they should have in variant prediction methods, and which classes of approaches are most suitable to a particular application.

CAGI follows in the spirit of the long-running Critical Assessment of Structure Prediction (CASP). Organizers are in the process of collecting unpublished genomic data with associated phenotype characteristics. During the prediction season, participating groups will submit predictions in these areas based on the data provided. Prediction accuracy will be evaluated by assessors, and the results will be revealed at the CAGI conference.