
Variants of BRCA1 and BRCA2: predict which variants are associated with increased risk of breast cancer by ENIGMA

Challenge: ENIGMA

Dataset availability: registered users only

Last updated: 27 April 2018

This challenge is closed.

Make sure you understand our Data Use Agreement and Anonymity Policy


Breast cancer is the most prevalent cancer among women worldwide. The association between germline mutations in the BRCA1 and BRCA2 genes and the development of cancer is well established: the most common high-risk mutations associated with breast cancer are those in these two autosomal dominant genes, and such mutations are found in 1-3% of breast cancer cases. The challenge is to predict which variants are associated with increased risk for breast cancer.


In normal cells, the BRCA1 and BRCA2 genes are involved in homologous recombination for double-strand break repair. Mutations in these genes have been linked to the development of breast and ovarian cancer (Rehm et al., 2015). ENIGMA is an international consortium focused on determining the clinical significance of sequence variants in BRCA1, BRCA2 and other known or suspected breast cancer related genes, providing expert input to global database and classification initiatives, and exploring optimal avenues for communicating such information at the provider and patient level.

Variants included in the dataset have been classified according to the IARC 5-tier classification scheme using multifactorial likelihood analysis. The procedure assesses clinically calibrated bioinformatics information and clinical information (pathology, segregation, co-occurrence, family history, case-control) for each variant to produce a likelihood of pathogenicity. Likelihood values were calibrated against the features of known high-risk cancer-causing variants in BRCA1/2 (Goldgar et al., 2008; Plon et al., 2008). Each variant is assigned to one of five classes depending on the pathogenicity likelihood, as shown in the table below. A combination of public and unpublished information has been used to arrive at the final classifications, and all the classifications provided in the dataset for this challenge are either new or improved compared to what is in the public domain.

These data will be included in a publication being developed by the ENIGMA consortium.

Class                       Probability of pathogenicity
5: Pathogenic               >0.99
4: Likely pathogenic        0.95-0.99
3: Uncertain                0.05-0.949
2: Likely not pathogenic    0.001-0.049
1: Not pathogenic           <0.001
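
As an illustration of how the table reads, the sketch below maps a probability of pathogenicity to its class number. It is not ENIGMA code. The function name iarc_class is hypothetical, and the handling of values that fall between the printed ranges (for example 0.9495) is an assumption: each such value is placed in the Uncertain or Likely not pathogenic class directly below it.

```python
def iarc_class(p: float) -> int:
    """Map a probability of pathogenicity to an IARC class number (1-5).

    Assumption: the printed ranges leave small gaps (0.949-0.95 and
    0.049-0.05); values in a gap are assigned to the lower class.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    if p > 0.99:
        return 5  # Pathogenic
    if p >= 0.95:
        return 4  # Likely pathogenic
    if p >= 0.05:
        return 3  # Uncertain
    if p >= 0.001:
        return 2  # Likely not pathogenic
    return 1      # Not pathogenic
```

For example, iarc_class(0.97) returns 4 and iarc_class(0.0005) returns 1.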

Prediction challenge

For each variant, participants are asked to submit a prediction of the probability that the variant is pathogenic according to the ENIGMA consortium classifications. Optionally, a comment on the basis of the prediction may be given.

Prediction submission format 

The prediction submission is a tab-delimited text file. Organizers provide a template file, which must be used for submission. Five columns are designated for the probability of assignment to each class (Not pathogenic, Likely not pathogenic, Uncertain, Likely pathogenic, and Pathogenic). Each column must contain a probability between 0 and 1.0, and the five probabilities must sum to 1.0. Optionally, a comment on the basis of the prediction may be given in the last column. In addition, a validation script is provided, and predictors must use it to check the format before submitting their predictions.

Note: we are asking participants to predict the probability (and confidence) of only the Pathogenic class, as indicated in the submission template file.

In the template file, cells in columns 4-6 are marked with a "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission: you must enter a probability for every variant. Optionally, enter a brief comment on the basis of the prediction; otherwise, leave the "*" in the comment cell. Please follow the submission guidelines strictly.
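
The rules above (no empty cells, probabilities between 0 and 1.0, and a sum of 1.0 where several class probabilities are given) can be sketched as a quick pre-check. This is a minimal illustration only and does not replace the validation script provided by the organizers. The function name check_row and the prob_columns argument are hypothetical; the real column positions must be taken from the official template file.

```python
import math

def check_row(cells, prob_columns):
    """Return a list of problems found in one submission row.

    cells        -- the row's cells, already split on tabs
    prob_columns -- zero-based indices of the probability cells
                    (hypothetical; read the real positions from the template)
    """
    problems = []
    if any(cell.strip() == "" for cell in cells):
        problems.append("empty cell")
    probs = []
    for i in prob_columns:
        try:
            value = float(cells[i])
        except (ValueError, IndexError):
            # An unreplaced "*" placeholder also ends up here.
            problems.append(f"column {i + 1}: not a number")
            continue
        if not 0.0 <= value <= 1.0:
            problems.append(f"column {i + 1}: outside [0, 1]")
        probs.append(value)
    # The sum rule only applies when several class probabilities are given.
    if len(prob_columns) > 1 and len(probs) == len(prob_columns):
        if not math.isclose(sum(probs), 1.0, abs_tol=1e-6):
            problems.append("probabilities do not sum to 1.0")
    return problems
```

A row such as "variant_1", "0.7", "*" checked with prob_columns=[1] yields no problems, while a row whose probability cell still holds "*" is reported as not a number.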

In addition, your submission must include a detailed description of the method used to make the predictions, similar to the style of the Methods section in a scientific article. This information must be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions by accessing the link "All submission forms" from the front page of your group. For more details, please read the FAQ page.


Predictors will not be assessed directly against the numeric probability of the ENIGMA classification. Rather, the ENIGMA classification will be used as a weight to assess CAGI predictions. Based on the ENIGMA classification, variants will receive the following weights in the assessment:

Class                       Weight
5: Pathogenic               High weight
4: Likely pathogenic        Medium weight
3: Uncertain                No weight
2: Likely not pathogenic    Medium weight
1: Not pathogenic           High weight
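
To make the idea of class-based weighting concrete, here is one possible sketch. The organizers state neither numeric weights nor a scoring metric, so everything specific below is an assumption for illustration: the values 1.0 (High) and 0.5 (Medium), the treatment of classes 4-5 as pathogenic targets and classes 1-2 as non-pathogenic targets, the choice of a weighted Brier score, and the name weighted_brier.

```python
# Hypothetical numeric weights; the challenge text gives only High / Medium / No.
WEIGHTS = {5: 1.0, 4: 0.5, 3: 0.0, 2: 0.5, 1: 1.0}

def weighted_brier(predictions, enigma_classes):
    """Weighted mean squared error between predicted P(pathogenic) and a 0/1 target.

    predictions    -- predicted probabilities that each variant is pathogenic
    enigma_classes -- ENIGMA class numbers (1-5) for the same variants
    """
    total = 0.0
    weight_sum = 0.0
    for p, cls in zip(predictions, enigma_classes):
        w = WEIGHTS[cls]
        if w == 0.0:
            continue  # class 3 (Uncertain) carries no weight
        target = 1.0 if cls >= 4 else 0.0  # assumed: classes 4-5 count as pathogenic
        total += w * (p - target) ** 2
        weight_sum += w
    if weight_sum == 0.0:
        raise ValueError("no weighted variants to score")
    return total / weight_sum
```

Under these assumptions a lower score is better, a confident error on a class 5 or class 1 variant costs twice as much as the same error on a class 4 or class 2 variant, and class 3 variants do not affect the score at all.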



Goldgar DE, et al. Genetic evidence and integration of various data sources for classifying uncertain variants into a single model. Hum Mutat (2008) 29(11):1265-1272.

Plon SE, et al. Sequence variant classification and reporting: Recommendations for improving the interpretation of cancer susceptibility genetic test results. Hum Mutat (2008) 29(11):1282-1291.

Rehm HL, et al. ClinGen—the clinical genome resource. New Engl J Med (2015) 372(23):2235-2242.

Data provided by 

Amanda Spurdle, QIMR Berghofer Medical Research Institute (Australia), and the ENIGMA consortium.

Revision history 

20 December 2017: Initial release 

26 January 2018: Note added 

1 May 2018: Challenge closed

24 September 2018: Dataset availability added