Predict the pathogenicity of rare variants in MRE11 and NBS1, two proteins in the MRN complex

Dataset description: public

Exome sequence data: registered users only, limited by CAGI Data Use Agreement

This challenge closed on 31 October 2012. There was an optional extra challenge open for just one week: 8 May to 15 May, 2013 (see below).

MRN Challenge answer key (43.5 KB, xlsx): registered users only, limited by CAGI Data Use Agreement

Slides from the CAGI conference: registered users only, limited by CAGI Data Use Agreement

      Roland Dunbrack: Predictor Talk (5 MB, remixable ppt)

Predictions (3.9 MB, zip): registered users only, limited by CAGI Data Use Agreement


Genomes are under constant threat from damaging agents that generate DNA double-strand breaks (DSBs). The Mre11–Rad50–Nbs1 (MRN) complex plays important roles in the detection and signaling of DSBs, as well as in the repair pathways of homologous recombination and non-homologous end-joining. The importance of the MRN complex in the cellular response to DSBs was first revealed by ataxia telangiectasia-like disorder and Nijmegen breakage syndrome.

Prediction challenge

Predict probability of pathogenicity (a number between 0 and 1) for individual rare variants of MRE11 and NBS1.

Dataset Information

The data come from mutation screening of the MRE11 and NBS1 genes in a series of approximately 1300 breast cancer cases and 1100 controls. The dataset comprises two separate files, one per protein, listing 42 mutations for MRE11 and 44 mutations for NBS1. In each file the columns are as follows:

Note: Some of these variants may fail sequence variant verification. These will be dropped from the analysis. Also, some of these variants may have been observed in ethnic minorities and may have continental-level (non-CEU) ethnic group allele frequencies of >1%. Some of the variants that may fall into this category are marked by ** in a rightmost column. Any that do fall into the category will also be dropped.

The datasets are available only to registered users; please log in to access the data.

Prediction submission format 

Two submission templates are provided, one for each of the two proteins. Each tab-delimited text file contains the same three columns as the dataset files, plus two additional columns, 4 and 5:

4. Probability (P-case) - The probability that an individual carrying a given variant is in the cancer case set (range: 0-1). It may be helpful to note that a probability of 0.5 indicates that the mutation is neutral, that is, equally likely to appear in both populations.

5. Standard deviation (SD) - This defines the confidence of the prediction in column 4. High SD means low confidence, while small SD means that the predictor is confident about the submitted prediction.

In the template file, all blank cells are marked with an "*". Submit your predictions by replacing the "*" with your value in columns 4 and 5. No empty cells are allowed in the submission; if you cannot submit a prediction for a variant, leave the "*" in these cells. Please follow these submission guidelines strictly.

In addition, a validation script is provided, and predictors should check the correctness of the format before submitting their predictions. 
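The official validation script is not reproduced here. As a rough illustration only, the format rules described in this section (five tab-delimited columns, no empty cells, P-case between 0 and 1 or "*", SD as a number or "*") could be checked along the following lines. The function names, the assumption of a single header line, the requirement that SD be finite and non-negative, and the column labels in the usage note are all hypothetical and are not taken from the CAGI template.

```python
import math

def parse_value(cell):
    """Return None for the "*" placeholder, else the cell as a float.

    Raises ValueError when the cell is neither "*" nor a number.
    """
    cell = cell.strip()
    if cell == "*":
        return None
    return float(cell)

def check_submission(lines, has_header=True):
    """Return a list of problems found in a tab-delimited submission.

    `has_header` is an assumption: set it to False if the file has no header line.
    """
    problems = []
    for line_no, line in enumerate(lines, start=1):
        if has_header and line_no == 1:
            continue
        cells = line.rstrip("\r\n").split("\t")
        if len(cells) != 5:
            problems.append(f"line {line_no}: expected 5 columns, got {len(cells)}")
            continue
        if any(cell.strip() == "" for cell in cells):
            problems.append(f"line {line_no}: empty cell (use '*' instead)")
            continue
        try:
            p_case = parse_value(cells[3])
            sd = parse_value(cells[4])
        except ValueError:
            problems.append(f"line {line_no}: columns 4 and 5 must be numbers or '*'")
            continue
        if p_case is not None and not (0.0 <= p_case <= 1.0):
            problems.append(f"line {line_no}: P-case {p_case} is outside 0-1")
        # Assumed rule: a standard deviation should be finite and non-negative.
        if sd is not None and not (math.isfinite(sd) and sd >= 0.0):
            problems.append(f"line {line_no}: SD must be a finite non-negative number")
    return problems
```

Calling `check_submission(open("my_predictions.txt"))` (a hypothetical file name) returns an empty list when no problem is found. A check of this kind does not replace the provided validation script, which remains the reference for the accepted format.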

The template file is available only to registered users; please log in to access the file.

In addition, your submission should include a detailed description of the method used to make the predictions (similar to the style of the Methods section in a scientific article). This information will be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions by accessing the link: "All submission forms" from the front page of your group. For more details, please read the FAQ page.

Prediction Assessment 

The assessment of predictions is planned to use a logistic regression likelihood ratio test of the status of each subject (case/control) against the predicted probability of pathogenicity of the variant(s) that they carry. Predictions may also be assessed by calculated odds ratios and ROC areas.
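To make the planned assessment more concrete, here is a minimal sketch of a logistic regression likelihood ratio test and of a ROC area, written with the Python standard library only. It is not the assessor's code. It assumes that each subject has already been given a single numeric score derived from the submitted predictions; how carriers of several variants, or non-carriers, are scored is not specified on this page. All function names are hypothetical.

```python
import math

def log_likelihood(y, x, b0, b1):
    """Bernoulli log-likelihood of labels y under logit(p) = b0 + b1 * x."""
    total = 0.0
    for yi, xi in zip(y, x):
        z = b0 + b1 * xi
        # yi*z - log(1 + exp(z)), written in a numerically stable form
        total += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total

def null_intercept(y):
    """Intercept of the model with no predictor: the logit of the case fraction."""
    mean_y = sum(y) / len(y)
    if not 0.0 < mean_y < 1.0:
        raise ValueError("need at least one case and one control")
    return math.log(mean_y / (1.0 - mean_y))

def fit_logistic(y, x, max_iter=50, tol=1e-10):
    """Fit intercept and slope by Newton-Raphson, starting from the null model.

    No safeguard against perfectly separated data is included in this sketch.
    """
    b0, b1 = null_intercept(y), 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, xi in zip(y, x):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient, intercept
            g1 += (yi - p) * xi       # gradient, slope
            h00 += w                  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:               # score has no usable variation
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def likelihood_ratio_test(y, x):
    """Return (statistic, p_value, slope) for score x against status y (1 = case)."""
    b0, b1 = fit_logistic(y, x)
    ll_full = log_likelihood(y, x, b0, b1)
    ll_null = log_likelihood(y, x, null_intercept(y), 0.0)
    stat = max(0.0, 2.0 * (ll_full - ll_null))
    # Survival function of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value, b1

def roc_auc(y, x):
    """ROC area as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    cases = [xi for yi, xi in zip(y, x) if yi == 1]
    controls = [xi for yi, xi in zip(y, x) if yi == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))
```

With `y` holding case/control status (1 or 0) and `x` the per-subject scores, `likelihood_ratio_test(y, x)` compares the model with the score against the intercept-only model, and `roc_auc(y, x)` summarizes how well the score ranks cases above controls. In practice a statistics package would normally be used instead of a hand-written fit.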

Optional extra challenge (8 May to 15 May 2013 only): New variants of MRE11 and NBS1

The optional extra challenge is available only to registered users; please log in to access the file.


Additional information 

The set of subjects sequenced, and the planned analysis, is similar to that described in: Le Calvez-Kelm F, Lesueur F, Damiola F, Vallée M, Voegele C, Babikyan D, Durand G, Forey N, McKay-Chopin S, Robinot N, Nguyen-Dumont T, Thomas A, Byrnes GB; Breast Cancer Family Registry, Hopper JL, Southey MC, Andrulis IL, John EM, Tavtigian SV. Rare, evolutionarily unlikely missense substitutions in CHEK2 contribute to breast cancer susceptibility: results from a breast cancer family registry case-control mutation-screening study. Breast Cancer Res. 2011 Jan 18;13(1):R6. doi: 10.1186/bcr2810.

A related manuscript on the ATM gene might also be useful: Tavtigian SV, Oefner PJ, Babikyan D, et al. Rare, evolutionarily unlikely missense substitutions in ATM confer increased risk of breast cancer. Am J Hum Genet. 2009;85(4):427-446. doi: 10.1016/j.ajhg.2009.08.018

The Rad50 challenge of CAGI 2011 is available here. Results from the Rad50 challenge discussion are available in the CAGI 2011 meeting presentations.

Data Provider

Sean V Tavtigian, University of Utah


This challenge is being assessed by Sean V Tavtigian, University of Utah