Using exome sequencing data from a family, identify which individual has hypoalphalipoproteinemia (HA)
Dataset description: public
Exome sequence data: registered users only, limited by CAGI Data Use Agreement
This challenge closed on 25 April 2013.
Assessor summary (98 KB, doc): registered users only, limited by CAGI Data Use Agreement
Slides from the CAGI conference: registered users only, limited by CAGI Data Use Agreement
Angel Mak: Data Provider (2.5 MB, remixable ppt)
Shamil Sunyaev: Assessment (5 MB, remixable ppt)
Lipika Ray: Predictor Talk (1.8 MB, remixable ppt)
Emanuela Leonardi: Predictor Talk (4.3 MB, remixable ppt)
Yanay Ofran: Predictor Talk (1.8 MB, remixable ppt)
Nathaniel Pearson: Predictor Talk (5 MB, remixable ppt)
Hypoalphalipoproteinemia (HA; OMIM 604091, http://omim.org/entry/604091) is characterized by severely decreased serum high-density lipoprotein cholesterol (HDL-C) levels and low apolipoprotein A-1 (APOA1). Low HDL-C is a risk factor for coronary artery disease.
In a family where one person has HA, predict which individual has HA, as characterized by extremely low serum HDL-C. It may be helpful to know that the affected individual also has additional phenotypes of hepatosplenomegaly, lymphadenopathy, and short stature.
The dataset contains variant information for the four subjects in the family, extracted from exome sequencing data. The exome sequencing data were generated on the Illumina HiSeq2000 platform with the TruSeq exome enrichment protocol and processed using an automated pipeline for next-generation sequencing data (Chapman, https://github.com/chapmanb/bcbb/tree/master/nextgen). Reads were aligned to hg19 with BWA, and SNPs and indels were called with GATK and Dindel.
Variant information for the four subjects is given in separate files: HA_13.vcf (daughter), HA_14.vcf (father), HA_15.vcf (son), HA_16.vcf (mother)
The datasets are available only to registered users; please log in to access the data.
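As a starting point for working with the four per-individual files, the following is a minimal Python sketch that reads the site records of each VCF file and lists the variants seen in only one family member. It is an illustration under stated assumptions, not the data provider's or any predictor's method: the function names `parse_vcf_lines`, `load_family`, and `private_variants` are hypothetical, the files are assumed to sit in the working directory, and only the first five standard VCF columns (CHROM, POS, ID, REF, ALT) are used.

```python
from typing import Dict, Iterable, Set, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str]) -> Set[Variant]:
    """Collect (CHROM, POS, REF, ALT) keys from the lines of a VCF file."""
    variants: Set[Variant] = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip meta-information and the column header line
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a complete record
        chrom, pos, _id, ref, alt = fields[:5]
        for allele in alt.split(","):  # multi-allelic sites list several ALTs
            if allele != ".":
                variants.add((chrom, int(pos), ref, allele))
    return variants


def load_family(files: Dict[str, str]) -> Dict[str, Set[Variant]]:
    """Parse one VCF file per family member (paths are assumptions)."""
    family = {}
    for person, path in files.items():
        with open(path) as handle:
            family[person] = parse_vcf_lines(handle)
    return family


def private_variants(by_person: Dict[str, Set[Variant]]) -> Dict[str, Set[Variant]]:
    """For each person, return the variants seen in no other family member."""
    result = {}
    for person, own in by_person.items():
        others = set().union(
            *(v for p, v in by_person.items() if p != person)
        )
        result[person] = own - others
    return result


# Hypothetical usage with the file names listed above:
# family = load_family({
#     "daughter": "HA_13.vcf", "father": "HA_14.vcf",
#     "son": "HA_15.vcf", "mother": "HA_16.vcf",
# })
# for person, variants in private_variants(family).items():
#     print(person, len(variants))
```

This sketch ignores genotype (zygosity), quality filters, and functional annotation, all of which a real analysis would likely need; for example, a recessive variant carried heterozygously by both parents would not appear as private to the affected child.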
Prediction submission format
In the template submission file, the first line (row) contains header information and is followed by one line for each individual. The first column identifies the individual. In column 2, please provide the probability P(abnormal) that the individual has the phenotype of an extreme HDL-C level together with hepatosplenomegaly, lymphadenopathy, and short stature. In column 3, please provide the standard deviation of that probability (your confidence in the prediction of P). The probabilities P should be in the range 0 to 1.
Format of Prediction Submission Template
Columns, in order: individual, P (abnormal), Standard deviation
Please use the submission file template provided for your submission. In addition, a validation script is provided, and predictors should check the correctness of the format before submitting their predictions.
Download validation script (not available)
The submission template file is a tab-delimited, plain text file in which every cell to be filled in is marked with an "*". Please submit your predictions by replacing each "*" with your value. No empty cells are allowed in the submission; if you cannot submit predictions for an individual, please leave the "*" in those cells. Please follow these submission guidelines strictly.
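The format rules above can be checked locally before submitting. The following Python sketch is not the official CAGI validation script; it is an illustration that assumes exactly three tab-separated columns per line, treats the range for P as inclusive of 0 and 1, and assumes the standard deviation must be a non-negative finite number. The function name `validate_submission` is hypothetical.

```python
import math
from typing import List


def validate_submission(text: str) -> List[str]:
    """Return a list of format problems; an empty list means all checks passed."""
    problems: List[str] = []
    lines = text.splitlines()
    if len(lines) < 2:
        return ["expected a header line followed by at least one individual"]

    # Line 1 is the header, so data lines are numbered from 2.
    for number, line in enumerate(lines[1:], start=2):
        cells = line.split("\t")
        if len(cells) != 3:
            problems.append(
                f"line {number}: expected 3 tab-separated columns, found {len(cells)}"
            )
            continue
        if any(cell.strip() == "" for cell in cells):
            problems.append(
                f"line {number}: empty cell (leave '*' where no prediction is given)"
            )
            continue

        _individual, prob, sd = cells
        for name, value in (("P", prob), ("standard deviation", sd)):
            if value == "*":
                continue  # placeholder kept: no prediction for this cell
            try:
                x = float(value)
            except ValueError:
                problems.append(f"line {number}: {name} is not a number: {value!r}")
                continue
            if not math.isfinite(x):
                problems.append(f"line {number}: {name} must be finite")
            elif name == "P" and not 0.0 <= x <= 1.0:
                problems.append(f"line {number}: P must be in the range 0 to 1")
            elif name == "standard deviation" and x < 0.0:
                problems.append(f"line {number}: standard deviation must not be negative")
    return problems


# Hypothetical usage (the file name is an assumption):
# with open("my_submission.txt") as handle:
#     for problem in validate_submission(handle.read()):
#         print(problem)
```

Returning a list of messages instead of raising on the first error lets a predictor fix every formatting issue in one pass.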
In addition, your submission should include a detailed description of the method used to make the predictions (similar to the style of the Methods section in a scientific article). This information will be submitted as a separate file.
To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions through the "All submission forms" link on the front page of your group. For more details, please read the FAQ page.
Cardiovascular Research Institute, University of California San Francisco
Nina Gonzaludo, Clive R. Pullinger, Paul L.F. Tang, Mary J. Malloy, John P. Kane and Pui-Yan Kwok
This challenge is being assessed by Shamil Sunyaev, Harvard University.