Critical Assessment of Genome Interpretation

Distinguish between exomes of Crohn's disease patients and healthy individuals

Challenge: eQTL-causal SNPs

Dataset description: public

Variant data: registered users only, limited by CAGI Data Use Agreement

Last updated: 14 April 2016

This challenge closed at 9:00 PM PST (Pacific Standard Time) on 9 December 2015.

Download answer key, predictions, and assessment: registered users only, limited by CAGI Data Use Agreement. Please log in to access the files.

Presentations from the CAGI 4 conference: registered users only, limited by CAGI Data Use Agreement. Please log in to access the files.

Background

Crohn’s disease (CD [MIM 266600]) is a chronic inflammatory bowel disease (IBD) characterized by relapsing inflammation that can involve any part of the gastrointestinal tract and may be accompanied by extra-intestinal manifestations. It is caused by a complex interplay between an overly active immune system and environmental triggers in genetically susceptible individuals. Results from twin and familial aggregation studies [1], as well as evidence from genome-wide association studies [2,3], have shown that genetic factors play an important role in CD etiology. The disease burden of CD is high, and the disease is often associated with complications such as intestinal ulcers, fistulas, and an increased risk of colon cancer. Accumulating evidence suggests that immune tolerance to normal intestinal bacteria is disturbed in genetically susceptible individuals [4], leading to a pathogenic inflammatory response. NOD2 (nucleotide-binding oligomerization domain 2; alias, CARD15) was the first susceptibility gene identified for Crohn's disease and is involved in the innate immune response directed against components of the bacterial cell wall [5,6]. To date, 163 genetic susceptibility loci have been identified for IBD: 30 exclusive to CD, 23 exclusive to ulcerative colitis (UC), and 110 shared by the two [3]. Early-onset cases of IBD, with onset before age 10, often show a more severe disease course with a higher risk of complications, and genetic factors likely play a larger role in these individuals.

The exome sequence variants provided here were collected in Germany from unrelated CD patients and unrelated healthy individuals. Some of the CD patients developed the disease before their 10th year of life, corresponding to early-onset IBD [7].

Prediction challenge

Using the provided exome data, identify which individuals have Crohn's disease and which are healthy. In addition, identify which of the patients developed CD before age 10. Data from the Crohn's disease challenges in CAGI 2011 and CAGI 2013 may be used for training. Age-of-onset data for those earlier challenges are provided below.

Dataset description

Exome sequencing was performed using the TruSeq exome enrichment kit (Illumina) and the Illumina HiSeq2000 instrument. Reads were mapped to the human genome build hg19, and variants were called jointly for all 111 exomes using the HaplotypeCaller of the Genome Analysis Toolkit (GATK version 3.3-0). Variant calls were restricted to the TruSeq exome target. GATK was also used for variant quality score recalibration, and only high-quality variants passing the filters were retained. Further details of data processing are available in the header of the provided VCF file.
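Because the variants were called jointly, the data can be read as a single multi-sample VCF. The following is a minimal sketch, not part of the challenge materials, of how per-individual non-reference allele counts might be tallied from such a file. It assumes the standard VCF layout (nine fixed columns followed by one column per sample, with a GT entry in FORMAT); the function name alt_allele_counts is hypothetical.

```python
from collections import Counter


def alt_allele_counts(vcf_lines):
    """Count non-reference alleles per sample over variants that pass the filters.

    Assumes standard VCF text lines: '##' meta lines, one '#CHROM' header line,
    then one tab-separated record per variant with sample columns from index 9.
    """
    samples = []
    counts = Counter()
    for line in vcf_lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample IDs follow the nine fixed columns
            continue
        if cols[6] not in ("PASS", "."):
            continue  # keep only records that passed (or carry no) filter
        gt_index = cols[8].split(":").index("GT")
        for sample, cell in zip(samples, cols[9:]):
            genotype = cell.split(":")[gt_index]
            alleles = genotype.replace("|", "/").split("/")
            # "0" is the reference allele and "." is a missing call
            counts[sample] += sum(1 for a in alleles if a not in ("0", "."))
    return dict(counts)
```

A call such as alt_allele_counts(open(path)) with a path to the downloaded file would return a dictionary keyed by sample ID. Such counts are only one possible starting feature and say nothing about which variants are relevant to disease.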

Download dataset: This dataset file is available only to registered users. Please log in to access the file.

Download age of onset training data: This dataset file is available only to registered users. Please log in to access the file.

Download submission template: This submission template file is available only to registered users. Please log in to access the file.

Download validation script: This submission validation script is available only to registered users. Please log in to access the file.

Prediction submission format

The prediction submission is a tab-delimited text file. The organizers provide a file template, which should be used for submission, and a validation script; predictors should use the script to check the format before submitting their predictions.

In the submitted file, each row should include the following tab-separated fields:

individual - The ID number of the individual
disease_status - The probability that this individual has Crohn's disease. The probability should be a value between 0 and 1, 0 meaning healthy and 1 meaning Crohn's patient.
standard_deviation - This defines the confidence of the prediction in column 2. Large SD means low confidence, while small SD means that the predictor is confident about the submitted prediction.
disease_onset - The probability that this individual had early-onset Crohn's disease. The probability should be a value between 0 and 1, 0 meaning late-onset or healthy, and 1 meaning early-onset Crohn's disease.
standard_deviation - This defines the confidence of the prediction in column 4.
comments - Optional brief comments on the basis of the predictions in columns 2 and 4

In the template file, cells in columns 2-6 are marked with a "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission. You may enter predictions for the disease status subchallenge, the disease onset subchallenge, or both. If you choose not to enter predictions for a subchallenge, leave the "*" in those cells. Within a subchallenge, you must enter a prediction and a standard deviation for every individual; if you are not confident in a prediction for an individual, enter a large standard deviation for that prediction. Optionally, enter brief comments indicating the basis of the predictions; otherwise, leave the "*" in these cells. Please follow the submission guidelines strictly.
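The rules above can be checked locally before running the official validation script. The sketch below is not that script; it is an illustrative check written under stated assumptions: only data rows are passed in (any header row in the template is excluded), each row has exactly six tab-separated fields, and a standard deviation must be non-negative. The names validate_rows and _check_pair are hypothetical.

```python
PLACEHOLDER = "*"
N_COLUMNS = 6  # individual, disease_status, SD, disease_onset, SD, comments


def _check_pair(prob, sd, label, line_no, errors):
    """Check one filled-in (probability, standard deviation) pair."""
    try:
        p, s = float(prob), float(sd)
    except ValueError:
        errors.append(f"line {line_no}: {label} values must be numeric")
        return
    if not 0.0 <= p <= 1.0:
        errors.append(f"line {line_no}: {label} probability {prob} is outside [0, 1]")
    if s < 0.0:
        errors.append(f"line {line_no}: {label} standard deviation {sd} is negative")


def validate_rows(lines):
    """Return a list of error messages; an empty list means no problem was found."""
    errors = []
    # For each subchallenge, record whether rows were answered (True) or skipped (False).
    answered = {"disease_status": set(), "disease_onset": set()}
    for line_no, line in enumerate(lines, start=1):
        cells = line.rstrip("\r\n").split("\t")
        if len(cells) != N_COLUMNS:
            errors.append(f"line {line_no}: expected {N_COLUMNS} fields, found {len(cells)}")
            continue
        if any(cell.strip() == "" for cell in cells):
            errors.append(f"line {line_no}: empty cells are not allowed")
            continue
        pairs = (("disease_status", cells[1:3]), ("disease_onset", cells[3:5]))
        for label, (prob, sd) in pairs:
            skipped = prob == PLACEHOLDER and sd == PLACEHOLDER
            answered[label].add(not skipped)
            if skipped:
                continue
            if PLACEHOLDER in (prob, sd):
                errors.append(f"line {line_no}: {label} needs both a prediction and a standard deviation")
                continue
            _check_pair(prob, sd, label, line_no, errors)
    # Within a subchallenge, predictions must cover every individual or none.
    for label, states in answered.items():
        if states == {True, False}:
            errors.append(f"{label}: predictions given for some individuals but not all")
    return errors
```

For example, validate_rows(["1\t0.9\t0.1\t*\t*\t*"]) returns an empty list, since the disease status subchallenge is answered and the disease onset subchallenge is consistently skipped. A clean result here does not replace running the official validation script.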

In addition, your submission should include a detailed description of the method used to make the predictions, similar in style to the Methods section in a scientific article. This information will be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions through the "All submission forms" link on the front page of your group. For more details, please read the FAQ page.

References

[1] Halme, L. et al. Family and twin studies in inflammatory bowel disease. World J. Gastroenterol. 12, 3668–72 (2006).
[2] Franke, A. et al. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn’s disease susceptibility loci. Nat. Genet. 42, 1118–25 (2010).
[3] Jostins, L. et al. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature 491, 119–24 (2012).
[4] Cho, J. H. The genetics and immunopathogenesis of inflammatory bowel disease. Nat. Rev. Immunol. 8, 458–66 (2008).
[5] Hugot, J. P. et al. Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn’s disease. Nature 411, 599–603 (2001).
[6] Ogura, Y. et al. A frameshift mutation in NOD2 associated with susceptibility to Crohn’s disease. Nature 411, 603–6 (2001).
[7] Uhlig, H. H. et al. The Diagnostic Approach to Monogenic Very Early Onset Inflammatory Bowel Disease. Gastroenterology 147, 990–1007.e3 (2014).

Dataset provided by