Using exome sequencing data from a family, identify which individual has hypoalphalipoproteinemia (HA)

Dataset description: public
Dataset: registered users only, limited by CAGI Data use agreement

This challenge closed on 25 April 2013.

HA Challenge answer key (23 KB, docx): registered users only, limited by CAGI Data use agreement

Assessor summary (98 KB, docx): registered users only, limited by CAGI Data use agreement

Slides from the CAGI conference: registered users only, limited by CAGI Data use agreement
      Angel Mak: Data Provider (2.5 MB, remixable ppt)
      Shamil Sunyaev: Assessment (5 MB, remixable ppt)
      Lipika Ray: Predictor Talk (1.8 MB, remixable ppt)
      Emanuela Leonardi: Predictor Talk (4.3 MB, remixable ppt)
      Yanay Ofran: Predictor Talk (1.8 MB, remixable ppt)
      Nathaniel Pearson: Predictor Talk (5 MB, remixable ppt)

Predictions (594.5 KB, zip): registered users only, limited by CAGI Data use agreement

Background: Hypoalphalipoproteinemia (HA; http://omim.org/entry/604091) is characterized by severely decreased serum high-density lipoprotein cholesterol (HDL-C) and low apolipoprotein A-I (APOA1) levels. Low HDL-C is a risk factor for coronary artery disease.

Prediction challenge: In a family where one person has HA, predict which individual has HA, as characterized by extremely low serum HDL-C. It may be helpful to know that the affected individual also has additional phenotypes of hepatosplenomegaly, lymphadenopathy, and short stature.

Dataset Information: The dataset contains variant information for the four subjects in the family, extracted from exome sequencing data. The exome sequencing data were generated on the Illumina HiSeq2000 platform with the TruSeq exome enrichment protocol and processed with an automated pipeline for next-generation sequencing data (Chapman, https://github.com/chapmanb/bcbb/tree/master/nextgen). Reads were aligned to hg19 with BWA; SNPs and indels were called with GATK and Dindel.

Variant information for the four subjects is given in separate files: HA_13.vcf (daughter), HA_14.vcf (father), HA_15.vcf (son), HA_16.vcf (mother)
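As a starting point for exploring the four files, the following is a minimal Python sketch that reads each VCF and lists the variants seen in only one family member. It assumes standard tab-delimited VCF columns (CHROM, POS, ID, REF, ALT, ...) and ignores genotype, quality, and filter fields. The helper names (read_variants, private_variants, load_family) are hypothetical and are not part of the challenge materials. A private variant is only a candidate for closer inspection, not evidence of HA by itself.

```python
from typing import Dict, Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def read_variants(lines: Iterable[str]) -> Set[Variant]:
    """Collect (chrom, pos, ref, alt) keys from the lines of a VCF file."""
    variants: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information, and the header row
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # multi-allelic sites list several ALT alleles
            variants.add((chrom, pos, ref, alt))
    return variants


def private_variants(family: Dict[str, Set[Variant]]) -> Dict[str, Set[Variant]]:
    """For each member, return the variants seen in no other family member."""
    private: Dict[str, Set[Variant]] = {}
    for member, own in family.items():
        others = set().union(*(v for m, v in family.items() if m != member))
        private[member] = own - others
    return private


def load_family(paths: Dict[str, str]) -> Dict[str, Set[Variant]]:
    """Read one VCF per family member; `paths` maps a label to a file path."""
    family: Dict[str, Set[Variant]] = {}
    for member, path in paths.items():
        with open(path, encoding="utf-8") as handle:
            family[member] = read_variants(handle)
    return family


# Example call, assuming the four files are in the working directory:
# family = load_family({
#     "daughter": "HA_13.vcf", "father": "HA_14.vcf",
#     "son": "HA_15.vcf", "mother": "HA_16.vcf",
# })
# for member, variants in private_variants(family).items():
#     print(member, len(variants))
```

Because the sketch compares variants by position and allele only, a recessive candidate (homozygous in one child, heterozygous in both parents) would not show up as private; checking that pattern would require parsing the genotype column as well.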

The Dataset file is available only to registered users; please log in to access the file.

Prediction submission format: In the template submission file, the first line (row) contains header information, followed by a separate line for each individual. The first column identifies the individual. In column 2, please provide the probability P(abnormal) that the individual has the phenotype of an extreme HDL-C level together with hepatosplenomegaly, lymphadenopathy, and short stature. In column 3, provide the standard deviation of that probability, which expresses your confidence in the predicted P. Each probability P should be in the range 0 to 1.

Format of Prediction Submission Template
         P (abnormal)    Standard deviation
HA-13    *               *
HA-14    *               *
HA-15    *               *
HA-16    *               *

Please use the submission file template provided for your submission. In addition, a validation script is provided, and predictors should check the correctness of the format before submitting their predictions.
Download submission template
Download validation script

The submission template file is a tab-delimited, plain text file in which every blank cell is marked with an "*". Enter your predictions by replacing each "*" with your value. No empty cells are allowed in the submission; if you cannot submit a prediction for an individual, leave the "*" in those cells. Please follow these submission guidelines strictly.
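The format rules above can be sketched as a small pre-submission check in Python. This is not the official validation script, whose behavior is not described here. It only encodes the rules stated on this page (tab-delimited text, one header line, one line per individual, no empty cells, "*" for a missing value, P between 0 and 1). Two points are assumptions: that each line has exactly three columns, and that the standard deviation must be non-negative. The function name check_submission is hypothetical.

```python
from typing import List

EXPECTED_IDS = ["HA-13", "HA-14", "HA-15", "HA-16"]


def check_submission(text: str) -> List[str]:
    """Return a list of format problems; an empty list means none were found."""
    problems: List[str] = []
    lines = text.splitlines()
    if len(lines) != len(EXPECTED_IDS) + 1:
        problems.append(
            f"expected {len(EXPECTED_IDS) + 1} lines, found {len(lines)}"
        )
    seen: List[str] = []
    for number, line in enumerate(lines[1:], start=2):  # line 1 is the header
        cells = [cell.strip() for cell in line.split("\t")]
        if len(cells) != 3:  # assumption: exactly three columns per line
            problems.append(f"line {number}: expected 3 tab-separated cells")
            continue
        if any(cell == "" for cell in cells):
            problems.append(f"line {number}: empty cell (use '*' instead)")
            continue
        individual, prob, sd = cells
        seen.append(individual)
        if individual not in EXPECTED_IDS:
            problems.append(f"line {number}: unknown individual {individual!r}")
        for name, value in (("P", prob), ("standard deviation", sd)):
            if value == "*":
                continue  # "*" marks a prediction that was not submitted
            try:
                parsed = float(value)
            except ValueError:
                problems.append(f"line {number}: {name} is not a number")
                continue
            if name == "P" and not 0.0 <= parsed <= 1.0:
                problems.append(f"line {number}: P is outside the range 0 to 1")
            if name == "standard deviation" and parsed < 0.0:
                # assumption: a standard deviation cannot be negative
                problems.append(f"line {number}: standard deviation is negative")
    if sorted(seen) != sorted(EXPECTED_IDS):
        problems.append("each individual must appear exactly once")
    return problems
```

Running the official validation script remains the authoritative check; a sketch like this is only useful for catching obvious slips, such as an empty cell or a probability above 1, before that step.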

Method description
In addition, your submission should include a detailed description of the method used to make the predictions (similar to the style of the Methods section in a scientific article). This information will be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions through the "All submission forms" link on the front page of your group. For more details, please read the FAQ page.

Dataset Providers
Cardiovascular Research Institute, University of California San Francisco
Nina Gonzaludo, Paul L.F. Tang, Clive R. Pullinger, Mary J. Malloy, John P. Kane, Pui-Yan Kwok


Assessment: This challenge is being assessed by Shamil Sunyaev, Harvard University.