Predict the pathogenicity of rare variants in MRE11 and NBS1, two proteins in the MRN complex

This challenge closed on 25 April 2013. An optional Extra challenge was open for one week only, from 8 May to 15 May 2013 (see below).

Roland Dunbrack: Predictor Talk

Background: Genomes are constantly threatened by damaging agents that generate DNA double-strand breaks (DSBs). The Mre11–Rad50–Nbs1 (MRN) complex plays important roles in the detection and signaling of DSBs, as well as in the homologous recombination and non-homologous end-joining repair pathways. The importance of the MRN complex in the cellular response to DSBs was initially revealed by two human disorders: ataxia telangiectasia-like disorder, caused by mutations in MRE11, and Nijmegen breakage syndrome, caused by mutations in NBS1.

Prediction challenge: Predict the probability of pathogenicity (a number between 0 and 1) for individual rare variants of MRE11 and NBS1.

Dataset Information: Mutation screening of the MRE11 and NBS1 genes in a series of approximately 1300 breast cancer cases and 1100 controls. The dataset comprises two files, one per protein, listing 42 variants of MRE11 and 44 variants of NBS1. In each file the columns are as follows:

  1. HGVS variant designation
  2. Protein variant (where applicable)
  3. Number of carriers observed with the variant

Note: Some of these variants may fail sequence variant verification; these will be dropped from the analysis. In addition, some variants may have been observed in ethnic minorities and may have allele frequencies above 1% in a continental-level (non-CEU) population. Some of the variants that may fall into this category are marked with ** in the rightmost column. Any variant that does fall into this category will also be dropped.
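The exclusion rules above amount to a simple filter. The sketch below is illustrative only: the `Variant` record, the `filter_variants` function, and the two exclusion sets (`failed_verification`, `high_freq_confirmed`) are hypothetical names, and the sets are assumed to be supplied after the organizers' verification. The ** mark by itself does not exclude a variant; only confirmation does, and an unmarked variant can still be excluded if it is confirmed.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    hgvs: str        # column 1: HGVS variant designation
    protein: str     # column 2: protein variant ("" where not applicable)
    carriers: int    # column 3: number of carriers observed
    flagged: bool    # True if marked ** in the rightmost column


def filter_variants(variants, failed_verification, high_freq_confirmed):
    """Keep variants that passed sequence verification and were not confirmed
    to have a non-CEU continental allele frequency above 1%.

    The ** flag alone does not exclude a variant; only membership in
    high_freq_confirmed does (whether flagged or not).
    """
    excluded = set(failed_verification) | set(high_freq_confirmed)
    return [v for v in variants if v.hgvs not in excluded]


# Hypothetical variants for illustration only, not from the challenge data
variants = [
    Variant("c.100A>G", "p.Lys34Glu", 2, False),
    Variant("c.200C>T", "p.Thr67Met", 1, False),
    Variant("c.301+5G>A", "", 5, True),
]
kept = filter_variants(variants, {"c.200C>T"}, {"c.301+5G>A"})
print([v.hgvs for v in kept])  # ['c.100A>G']
```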

Prediction submission format:
Two submission templates are provided, one for each protein. Each is a tab-delimited text file containing the same three columns as the dataset files plus two additional columns, 4 and 5:

    4. Probability (P-case): the probability that an individual carrying the variant is in the cancer case set (range 0 to 1). Note that a probability of 0.5 indicates a neutral variant, one equally likely to appear in cases and controls.
    5. Standard deviation (SD): the confidence of the prediction in column 4. A high SD means low confidence; a small SD means the predictor is confident in the submitted prediction.

In the template file, all blank cells are marked with "*". Submit your predictions by replacing the "*" with your values in columns 4 and 5. No empty cells are allowed in the submission; if you cannot submit a prediction for a variant, leave "*" in those cells. Please follow these submission guidelines strictly.
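A minimal sketch of filling a template, assuming each row has at least five tab-separated fields with columns 4 and 5 initially set to "*", and assuming predictions are keyed by the HGVS designation in column 1. The function name `fill_template` and the three-decimal formatting are hypothetical choices, not part of the official instructions. Rows without a prediction are passed through unchanged, so they keep "*" as required.

```python
def fill_template(template_lines, predictions):
    """Replace the "*" placeholders in columns 4 and 5 with predictions.

    predictions maps an HGVS designation (column 1) to (p_case, sd).
    Rows without a prediction are left unchanged, so they keep "*".
    """
    filled = []
    for line in template_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5 and fields[0] in predictions:
            p_case, sd = predictions[fields[0]]
            fields[3] = f"{p_case:.3f}"   # column 4: P-case
            fields[4] = f"{sd:.3f}"       # column 5: SD
        filled.append("\t".join(fields))
    return filled


# Hypothetical template rows and predictions for illustration only
template = [
    "c.100A>G\tp.Lys34Glu\t2\t*\t*",
    "c.200C>T\tp.Thr67Met\t1\t*\t*",
]
for row in fill_template(template, {"c.100A>G": (0.72, 0.15)}):
    print(row)
```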

A validation script is provided; predictors should use it to check the format of their files before submitting.
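The official validation script should be used for the actual check. As a rough illustration of the rules stated above, the hypothetical `validate_submission` function below checks that each row has at least five tab-separated fields, that no cell is empty, that columns 4 and 5 are either both "*" or both numbers, that P-case lies in [0, 1], and that SD is non-negative. It assumes the file has no header row, and requiring both columns to be "*" together is an assumption rather than a stated rule.

```python
def validate_submission(lines):
    """Return a list of error messages; an empty list means the rows passed.

    Hypothetical checks inferred from the stated submission rules; this is
    not the official validation script.
    """
    errors = []
    for i, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            errors.append(f"row {i}: expected at least 5 columns, found {len(fields)}")
            continue
        if any(f.strip() == "" for f in fields[:5]):
            errors.append(f"row {i}: empty cell")
            continue
        p_raw, sd_raw = fields[3].strip(), fields[4].strip()
        p_star, sd_star = p_raw == "*", sd_raw == "*"
        if p_star and sd_star:
            continue  # no prediction submitted for this variant
        if p_star != sd_star:
            errors.append(f"row {i}: columns 4 and 5 must both be '*' or both be numbers")
            continue
        try:
            p, sd = float(p_raw), float(sd_raw)
        except ValueError:
            errors.append(f"row {i}: columns 4 and 5 must be numbers or '*'")
            continue
        if not 0.0 <= p <= 1.0:  # also rejects NaN
            errors.append(f"row {i}: P-case {p} outside [0, 1]")
        if not sd >= 0.0:
            errors.append(f"row {i}: SD {sd} is not non-negative")
    return errors


# Hypothetical rows for illustration only
print(validate_submission(["c.100A>G\tp.Lys34Glu\t2\t1.2\t0.1"]))
```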

Your submission should also include a detailed description of the method used to make the predictions, written in the style of the Methods section of a scientific article and submitted as a separate file.

Prediction assessment:
The assessment is planned to use a logistic regression likelihood ratio test of each subject's status (case/control) against the predicted probability of pathogenicity of the variant(s) that the subject carries. Predictions may also be assessed by odds ratios and areas under the ROC curve.
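A minimal sketch of such a likelihood ratio test and ROC area, assuming each subject is summarized by a single predicted probability (how carriers of several variants are combined is not specified here) and that the data are not perfectly separable. The logistic model logit P(case) = b0 + b1·x is fit by Newton-Raphson in pure Python; the statistic LR = 2(ℓ_full − ℓ_null) is compared with a chi-square distribution with one degree of freedom, whose upper tail equals erfc(sqrt(LR/2)). The ROC area is computed as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half. All function names are hypothetical, and this is not the assessor's code.

```python
import math


def _log1pexp(eta):
    """Numerically stable log(1 + exp(eta))."""
    if eta > 0:
        return eta + math.log1p(math.exp(-eta))
    return math.log1p(math.exp(eta))


def fit_logistic(xs, ys, iters=50, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # score (gradient of the log-likelihood)
            g1 += (y - p) * x
            h00 += w             # entries of the observed information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system I * delta = g
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def log_likelihood(b0, b1, xs, ys):
    """Sum of y*eta - log(1 + exp(eta)) over subjects."""
    return sum(y * (b0 + b1 * x) - _log1pexp(b0 + b1 * x) for x, y in zip(xs, ys))


def likelihood_ratio_test(xs, ys):
    """xs: predicted probability per subject; ys: 1 = case, 0 = control.
    Returns (slope b1, LR statistic, p-value with 1 degree of freedom)."""
    n = len(ys)
    n1 = sum(ys)
    n0 = n - n1
    b0, b1 = fit_logistic(xs, ys)
    ll_full = log_likelihood(b0, b1, xs, ys)
    # Intercept-only model: the MLE of P(case) is the observed case fraction
    ll_null = n1 * math.log(n1 / n) + n0 * math.log(n0 / n)
    lr = 2.0 * (ll_full - ll_null)
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return b1, lr, p_value


def roc_auc(xs, ys):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison."""
    cases = [x for x, y in zip(xs, ys) if y == 1]
    controls = [x for x, y in zip(xs, ys) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data, not from the challenge
xs = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.4, 0.1]
ys = [1, 1, 1, 1, 0, 0, 0, 0]
b1, lr, p = likelihood_ratio_test(xs, ys)
print(f"slope={b1:.2f} LR={lr:.2f} p={p:.3g} AUC={roc_auc(xs, ys):.3f}")
```

A positive slope with a small p-value indicates that higher predicted probabilities are associated with case status; the ROC area summarizes discrimination without assuming a logistic model.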

Optional Extra challenge (8 May to 15 May 2013 only): New variants of MRE11 and NBS1

Additional information:
Dataset Provider:
Sean V Tavtigian

Assessment: This challenge is being assessed by Sean V Tavtigian, University of Utah.