Critical Assessment of Genome Interpretation

Variants of BRCA1 and BRCA2: predict which variants are associated with increased risk of breast cancer by ENIGMA

Challenge: ENIGMA

Dataset availability: registered users only

Last updated: 27 April 2018

This challenge is closed.

Make sure you understand our Data Use Agreement and Anonymity Policy

Summary

Breast cancer is the most prevalent cancer among women worldwide. The association between germline mutations in the BRCA1 and BRCA2 genes and the development of cancer is well established, and the most common high-risk mutations associated with breast cancer are those in these two autosomal dominant genes. Mutations in BRCA1 and BRCA2 are found in 1-3% of breast cancer cases. The challenge is to predict which variants are associated with increased risk of breast cancer.

Background

In normal cells, the BRCA1 and BRCA2 genes are involved in homologous recombination for double-strand break repair. Mutations in these genes have been linked to the development of breast and ovarian cancer (Rehm et al., 2015). The ENIGMA consortium (https://enigmaconsortium.org/) is an international consortium focused on determining the clinical significance of sequence variants in BRCA1, BRCA2 and other known or suspected breast cancer related genes, providing expert input to global database and classification initiatives, and exploring optimal avenues of communication of such information at the provider and patient level.

Variants included in the dataset have been classified according to the IARC 5-tier classification scheme using multifactorial likelihood analysis. The procedure assesses clinically calibrated bioinformatics information and clinical information (pathology, segregation, co-occurrence, family history, case-control) for each variant to produce a likelihood of pathogenicity. Likelihood values were calibrated against the features of known high-risk cancer-causing variants in BRCA1/2 (Goldgar et al., 2008; Plon et al., 2008). Each variant is assigned to one of five classes depending on the pathogenicity likelihood, as shown in the table. A combination of public and unpublished information has been used to arrive at the final classifications, and all the classifications provided in the dataset for this challenge are either new or improved compared to what is in the public domain.
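As a rough illustration of how a multifactorial likelihood analysis can combine evidence, the sketch below applies the general Bayesian form: posterior odds equal prior odds multiplied by the likelihood ratios of the individual evidence sources, assuming those sources are independent. The function name and the example numbers are hypothetical, and this is not the ENIGMA consortium's actual pipeline or calibration.

```python
from math import prod
from typing import Sequence


def posterior_probability(prior: float, likelihood_ratios: Sequence[float]) -> float:
    """Combine a prior probability with independent likelihood ratios.

    Illustrative only: the prior stands in for the bioinformatics-based
    estimate, and each likelihood ratio stands in for one clinical evidence
    source (e.g. segregation or co-occurrence).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if any(lr <= 0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    # Convert odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical example: a neutral prior and one evidence source with LR = 9.
example = posterior_probability(0.5, [9.0])
```

With a prior of 0.5 the prior odds are 1, so a single likelihood ratio of 9 gives posterior odds of 9, or a probability of 0.9.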

These data will be included in a publication being developed by the ENIGMA consortium.

Class                      Probability of pathogenicity
5: Pathogenic              >0.99
4: Likely pathogenic       0.95-0.99
3: Uncertain               0.05-0.949
2: Likely not pathogenic   0.001-0.049
1: Not pathogenic          <0.001
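The class boundaries in the table can be expressed as a small lookup. The sketch below is one possible reading: the table leaves small gaps between ranges (for example between 0.949 and 0.95), so the handling of values at or near a boundary is an assumption, and the function name is hypothetical.

```python
def iarc_class(probability: float) -> int:
    """Map a probability of pathogenicity to an IARC class (1-5).

    Boundary handling is an assumption: each range is taken to start at
    its lower bound and extend up to the next range's lower bound.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability > 0.99:
        return 5  # Pathogenic
    if probability >= 0.95:
        return 4  # Likely pathogenic
    if probability >= 0.05:
        return 3  # Uncertain
    if probability >= 0.001:
        return 2  # Likely not pathogenic
    return 1      # Not pathogenic
```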

Prediction challenge

For each variant, participants are asked to submit a prediction of the probability that the variant is pathogenic according to the ENIGMA consortium classifications. Optionally, a comment on the basis of the prediction may be given.

Prediction submission format

The prediction submission is a tab-delimited text file. Organizers provide a template file, which must be used for submission. Five columns are designated for the probability of assignment to each class (Not pathogenic, Likely not pathogenic, Uncertain, Likely pathogenic, and Pathogenic). Each column must contain a probability between zero and 1.0, and the five probabilities must sum to 1.0. Optionally, a comment on the basis of the prediction may be given in the last column. In addition, a validation script is provided, and predictors must check the correctness of the format before submitting their predictions.

Note: we are asking participants to predict the probability (and confidence) of only the Pathogenic class, as indicated in the submission template file.

In the submitted file, each row includes the following columns:

Column 1: Gene - BRCA1 or BRCA2
Column 2: DNA Variant - The DNA variant as listed in the prediction dataset file, relative to the cDNA. The set comprises 324 variants (146 in BRCA1 (NM_007294.3); 178 in BRCA2 (NM_000059.3)).
Column 3: Protein Variant - The amino acid substitution where appropriate. UniProt IDs: BRCA1 (P38398) and BRCA2 (P51587).
Column 4: The probability (P) of the variant being pathogenic (0-1). 1 means pathogenic, while 0 means non-pathogenic.
Column 5: The confidence of each prediction probability in the form of a standard deviation (SD). A high SD means low confidence, while a low SD means that the predictor is confident about the submitted prediction.
Column 6: Optional comment

In the template file, cells in columns 4-6 are marked with a "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission. You must enter a probability of pathogenicity for every variant. Optionally, enter a brief comment on the basis of the prediction; otherwise, leave the "*" in the comment cell. Please make sure you follow the submission guidelines strictly.
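Before running the organizers' validation script, the six-column layout described above can be checked locally. The sketch below is not the official validation script: the function names are hypothetical, it assumes the file has no header row, and it assumes the SD must be a non-negative number. The variant names in the usage example are made up for illustration.

```python
import csv
import io
import math
from typing import List, Optional

GENES = {"BRCA1", "BRCA2"}


def _parse_number(cell: str) -> Optional[float]:
    """Return the cell as a finite float, or None (e.g. for a leftover "*")."""
    try:
        value = float(cell)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def validate_submission(text: str) -> List[str]:
    """Return a list of format problems found in tab-delimited submission text."""
    errors = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if len(row) != 6:
            errors.append(f"line {line_no}: expected 6 columns, found {len(row)}")
            continue
        gene, dna, protein, prob, sd, comment = [cell.strip() for cell in row]
        if not all((gene, dna, protein, prob, sd, comment)):
            errors.append(f"line {line_no}: empty cells are not allowed")
        if gene not in GENES:
            errors.append(f"line {line_no}: gene must be BRCA1 or BRCA2")
        p = _parse_number(prob)
        if p is None or not 0.0 <= p <= 1.0:
            errors.append(f"line {line_no}: P must be a number between 0 and 1")
        s = _parse_number(sd)
        if s is None or s < 0.0:
            errors.append(f"line {line_no}: SD must be a non-negative number")
    return errors


# Hypothetical rows: the first is well formed, the second is not.
good_row = "BRCA1\tc.1A>G\tp.Met1Val\t0.97\t0.05\t*\n"
bad_row = "BRCA3\tc.1A>G\tp.Met1Val\t1.5\t*\t*\n"
```

Calling `validate_submission(good_row)` returns an empty list, while `validate_submission(bad_row)` reports the unknown gene, the out-of-range probability, and the unfilled SD cell.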

In addition, your submission must include a detailed description of the method used to make the predictions, similar to the style of the Methods section in a scientific article. This information must be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions by accessing the link "All submission forms" from the front page of your group. For more details, please read the FAQ page.

Assessment

Predictors will not be assessed directly against the numeric probability of the ENIGMA classification. Rather, the ENIGMA classification will be used as a weight to assess CAGI predictions. Based on the ENIGMA classification, variants will receive the following weights in the assessment:

Class                      Weight
5: Pathogenic              High weight
4: Likely pathogenic       Medium weight
3: Uncertain               No weight
2: Likely not pathogenic   Medium weight
1: Not pathogenic          High weight
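The page does not state numeric weights or the scoring metric, so the following is only a sketch of how class-based weighting could enter an assessment. The numeric weights (1.0 for high, 0.5 for medium, 0.0 for none), the choice of a weighted squared-error score, and the function name are all assumptions made for illustration, not the assessors' method.

```python
from typing import Sequence

# Assumed numeric stand-ins for "High", "Medium" and "No" weight.
CLASS_WEIGHTS = {5: 1.0, 4: 0.5, 3: 0.0, 2: 0.5, 1: 1.0}


def weighted_squared_error(predictions: Sequence[float], classes: Sequence[int]) -> float:
    """Weighted mean squared error between predicted P and a 0/1 target.

    Classes 4 and 5 are treated as target 1 (pathogenic), classes 1 and 2
    as target 0; class 3 variants carry no weight and are skipped.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for p, variant_class in zip(predictions, classes):
        weight = CLASS_WEIGHTS[variant_class]
        if weight == 0.0:
            continue
        target = 1.0 if variant_class >= 4 else 0.0
        weighted_sum += weight * (p - target) ** 2
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no weighted variants to score")
    return weighted_sum / total_weight
```

Under these assumptions a lower score is better, and a prediction for a class 3 variant has no effect on the result.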

Download dataset: This dataset file is available only to registered users. Please log in to access the file.

References

Goldgar DE, et al. Genetic evidence and integration of various data sources for classifying uncertain variants into a single model. Hum Mutat (2008) 29(11):1265-1272.

Plon SE, et al. Sequence variant classification and reporting: Recommendations for improving the interpretation of cancer susceptibility genetic test results. Hum Mutat (2008) 29(11):1282-1291.

Rehm HL, et al. ClinGen—the clinical genome resource. New Engl J Med (2015) 372(23):2235-2242.

Data provided by

Amanda Spurdle, QIMR Berghofer Medical Research Institute (Australia), and the ENIGMA consortium.

Revision history

20 December 2017: Initial release

26 January 2018: Note added

1 May 2018: Challenge closed

24 September 2018: Dataset availability added

Center for Critical Assessment of Genome Interpretation
