RAD50 variants in breast cancer patients and controls

Dataset description: public
Dataset: public

The CAGI submission deadline for the RAD50 challenge passed on October 7, 2011, at 3 PM EDT. We welcome the upload of predictions after the deadline for archival and comparison purposes, but these post-deadline submissions are not part of the CAGI experiment.

Background: RAD50 is a candidate intermediate-risk breast cancer susceptibility gene. The RAD50 data provided for the CAGI challenge include a list of potentially interesting sequence variants observed by sequencing the RAD50 gene in about 1,400 breast cancer cases and 1,200 ethnically matched controls. Each variant in the list was observed between 1 and 20 times.

Challenge: For each variant, predict the probability that an individual carrying it is a case. The prediction should be a numeric value P(case) with a standard deviation.
Note that in the 2010 experiment with CHEK2 variants, CAGI participants tended not to account properly for the likely distribution of neutral mutations. A probability of 0.5 indicates that the mutation is neutral (equally frequent in both populations), while a probability of less than 0.5 indicates a variant that is actually protective.
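
To make the P(case) scale concrete, here is a minimal Python sketch, not the organizers' scoring method. It assumes that "neutral" means equal carrier frequency in cases and controls, so hypothetical carrier counts are normalized by group size before being compared. The function name, the example counts, and the rounded group sizes of 1,400 and 1,200 are illustrative assumptions.

```python
def p_case(case_carriers, control_carriers, n_cases=1400, n_controls=1200):
    """Probability that a carrier is a case, normalised by group size.

    Assumption: 0.5 corresponds to equal carrier frequency in both groups,
    so counts are divided by the (approximate) number of subjects sequenced.
    """
    if case_carriers < 0 or control_carriers < 0:
        raise ValueError("carrier counts must be non-negative")
    if case_carriers + control_carriers == 0:
        raise ValueError("variant must be observed at least once")
    case_freq = case_carriers / n_cases
    control_freq = control_carriers / n_controls
    return case_freq / (case_freq + control_freq)


# Hypothetical counts, for illustration only.
neutral = p_case(7, 6)       # same carrier frequency in both groups
risk = p_case(9, 2)          # enriched in cases, so above 0.5
protective = p_case(1, 6)    # enriched in controls, so below 0.5
```

With these made-up counts, a variant seen 7 times in cases and 6 times in controls has the same carrier frequency in both groups and lands at 0.5, even though the raw counts differ.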

Download dataset



Added 30 Sep 2011: The file above includes the number of times each variant was observed.

The RAD50 gene ID: NM_005732.3 (Isoform 1, NP_005723, 1312 aa)

Additional information: The set of subjects sequenced, and the planned analysis, is similar to that described in:
Le Calvez-Kelm F, Lesueur F, Damiola F, Vallée M, Voegele C, Babikyan D, Durand G, Forey N, McKay-Chopin S, Robinot N, Nguyen-Dumont T, Thomas A, Byrnes GB; Breast Cancer Family Registry, Hopper JL, Southey MC, Andrulis IL, John EM, Tavtigian SV. Rare, evolutionarily unlikely missense substitutions in CHEK2 contribute to breast cancer susceptibility: results from a breast cancer family registry case-control mutation-screening study. Breast Cancer Res. 2011 Jan 18;13(1):R6. doi: 10.1186/bcr2810.

A related manuscript on the ATM gene might also be useful:
Tavtigian SV, Oefner PJ, Babikyan D, et al. Rare, evolutionarily unlikely missense substitutions in ATM confer increased risk of breast cancer. Am J Hum Genet. 2009;85(4):427-446. doi: 10.1016/j.ajhg.2009.08.018.

Prediction submission format: The prediction submission is a tab-delimited text file. Organizers provide a file template, which should be used for submission. In addition, a validation script is provided, and predictors should check the correctness of the format before submitting their predictions.

Download RAD50 submission template
Download RAD50 submission validation script

In the submitted file, each row should include the following columns:

  1. Variant - The variant as listed in the prediction dataset file. Keep the order provided in the template file.
  2. P(case) - The probability that an individual with the given variant is in the case set (range: 0-1). A probability of 0.5 indicates that the mutation is neutral (equally frequent in both populations), while a probability of less than 0.5 indicates a variant that is actually protective.
  3. Standard deviation - This defines the confidence of the prediction in column 2. High SD means low confidence, while small SD means that the predictor is confident about the submitted prediction.

In the template file, cells in columns 2-3 are marked with an "*". Submit your predictions by replacing each "*" with your value. No empty cells are allowed in the submission; if you cannot submit a prediction for a variant, leave the "*" in these cells. Please follow the submission guidelines strictly.
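
The row format described above can be sketched in Python as follows. This is an illustration only and does not replace the provided validation script. The helper names, the placeholder variant labels, and the choice of three decimal places are assumptions; the actual variant identifiers and their order come from the template file.

```python
def format_row(variant, p_case=None, sd=None):
    """Build one tab-delimited row: Variant, P(case), Standard deviation.

    A variant without a prediction keeps "*" in columns 2-3, so that no
    cell is ever left empty.
    """
    if p_case is None or sd is None:
        return "\t".join([variant, "*", "*"])
    if not 0.0 <= p_case <= 1.0:
        raise ValueError(f"P(case) for {variant} must lie in the range 0-1")
    if sd < 0.0:
        raise ValueError(f"standard deviation for {variant} must not be negative")
    # Three decimals is an arbitrary choice made for this sketch.
    return "\t".join([variant, f"{p_case:.3f}", f"{sd:.3f}"])


def fill_template(template_variants, predictions):
    """Return rows in template order; predictions maps variant -> (p_case, sd)."""
    return [
        format_row(variant, *predictions.get(variant, (None, None)))
        for variant in template_variants
    ]


# Placeholder labels standing in for the variants listed in the template.
rows = fill_template(
    ["variant_A", "variant_B"],
    {"variant_A": (0.62, 0.10)},
)
```

Here `variant_B` has no prediction, so its row keeps the "*" markers. The resulting file should still be checked with the organizers' validation script before submission.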

In addition, your submission should include a detailed description of the method used to make the predictions (similar to the style of the Methods section in a scientific article). This information will be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions through the "All submission forms" link on the front page of your group. For more details, please read the FAQ page.

Dataset provided by

Sean Tavtigian, University of Utah