Critical Assessment of Genome Interpretation

Annotate all loss-of-function variants

Challenge: Annotate all LOFs

Variant data: public

Last updated: 27 June 2025

This challenge is open. The challenge closes on September 15, 2025.

How to participate in CAGI7? Download data & submit predictions on Synapse

Make sure you understand our Data Use Agreement and Anonymity Policy

Summary

Loss-of-function variants represent critical classes of genetic variation that can significantly alter protein function and are implicated in a variety of genetic diseases. Two major sources contribute to these variants: (1) frameshifting mutations from out-of-frame insertions or deletions that shift the reading frame and create new downstream stop codons and (2) single nucleotide substitutions that directly create premature stop codons or eliminate natural stop codons. These variants often result in complete loss of function, dominant-negative effects, or gain of toxic function. The challenge is to predict the functional impact of all 1 bp and 2 bp frameshift insertions/deletions and single nucleotide substitutions that result in stop gain or stop loss across human protein-coding, Mendelian disease-associated genes. For frameshifting variants leading to the same stop gain or stop loss, only the one located closest to the 5’ end is included. Predictions will be assessed against new experimental or clinical annotations as they become available, with regular evaluations in line with CAGI standards.
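The two mechanisms described above can be illustrated with a toy sketch. The 15-nucleotide coding sequence and the helper names below are hypothetical and chosen only for illustration; a real analysis would use the challenge transcripts and, for stop loss, would also scan the 3' UTR for the next in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Toy coding sequence (an assumption, not challenge data): ATG GCT GAA CTG TAA
REF_CDS = "ATGGCTGAACTGTAA"


def first_stop_codon(cds):
    """Return the 1-based index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


def insert_after(cds, pos, bases):
    """Insert bases after 1-based coding position pos (like c.<pos>_<pos+1>ins)."""
    return cds[:pos] + bases + cds[pos:]


def substitute(cds, pos, alt):
    """Replace the base at 1-based coding position pos with alt."""
    return cds[:pos - 1] + alt + cds[pos:]


# Reference: the natural stop is codon 5.
print(first_stop_codon(REF_CDS))
# A 1 bp insertion after position 3 shifts the frame and exposes TGA at codon 3.
print(first_stop_codon(insert_after(REF_CDS, 3, "A")))
# Changing the first base of the stop codon (TAA to CAA) removes the stop within this CDS.
print(first_stop_codon(substitute(REF_CDS, 13, "C")))
```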

Background

The functional annotation of protein-altering variants remains a critical challenge in genomics and precision medicine. While significant progress has been made in developing computational tools for missense variants (Lin et al., 2024), stop gain and stop loss variants remain significantly underserved by current prediction methods. Most existing approaches treat these variants simplistically by either uniformly predicting nonsense variants as deleterious without considering context, or providing only basic consequence annotation without functional impact assessment (Singer-Berk et al., 2023). However, these variants present unique prediction challenges due to their mechanistic diversity (arising from both frameshift mutations and direct substitutions), position-dependent effects (early vs. late stop events may have different consequences), and context-dependent molecular outcomes including nonsense-mediated decay, non-stop decay, and domain disruption (Frischmeyer et al., 2002; Lindeboom et al., 2019). This challenge addresses the critical gap in computational tools for stop gain/stop loss variant impact prediction, focusing specifically on disease-associated genes where accurate functional assessment has the highest clinical impact.

Prediction challenge

Participants are provided with a comprehensive list of out-of-frame insertions and deletions (leading to a +1 or +2 reading-frame shift) and single-nucleotide substitutions that result in stop gain or stop loss across human protein-coding, Mendelian disease-associated genes. For each variant, participants must predict its functional effect, expressed as a score from 0 (benign/no effect) to 1 (deleterious/complete loss of function), along with a standard deviation indicating confidence. Predictions will be evaluated as new experimental or clinical data become available, similar to the Annotate All Missense challenges (Rastogi et al., 2025).

Submission format

The prediction submission is a tab-delimited text file. Organizers provide a file template, which should be used for submission. In addition, predictors should check the correctness of the format before submitting their predictions. The nucleotide sequences used for this challenge can be obtained from RefSeq MANE Select release v1.4. In the submitted file, each row must include the following tab-separated fields:

Variant: nucleotide change using the HGVS nucleotide-level notation; e.g., NM_001385641.1:c.344_345insA (insertion of 1 nucleotide), NM_001385641.1:c.176_177insGA (insertion of 2 nucleotides), NM_000016.6:c.237_238del (deletion of 2 nucleotides), NM_001385641.1:c.487A>T (substitution of 1 nucleotide).
Prediction score: real-valued score for each variant from 0 (benign) to 1 (deleterious).
Standard deviation: standard deviation of the prediction in column 2, indicating confidence (must be a positive number).
Classification: based on the score in column 2, indicate whether the variant is "D(amaging)", "T(olerated)" or "U(nknown)".
Comment: optional brief comment on the basis of the prediction in columns 2-4.

In the template file, some cells in columns 2-5 are marked with a "*". Submit your predictions by including only the variants for which predictions are made and using the "*" if a field is not used. No empty cells are allowed in the submission. Unlike in the Annotate All Missense challenge, here you do not have to submit predictions and standard deviations for all variants. If you are not confident in a prediction for a variant, enter an appropriately large standard deviation for the prediction. Optionally, enter a brief comment on the basis of the prediction. If you do not enter a comment on a prediction, leave the "*" in those cells. Please make sure you follow the submission guidelines strictly.
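Since no validation script is provided, a local format check along the following lines may be useful before uploading. This is a minimal sketch, not an official validator: the function name validate_row is hypothetical, the HGVS pattern covers only the four example shapes quoted above, and it assumes that the classification is given as a single letter (D, T or U) and that score and standard deviation are numeric for every submitted row.

```python
import math
import re

# Simplified pattern for the four example shapes quoted above. Real HGVS is
# richer (for example "dup" forms), so a mismatch is a prompt to double-check,
# not proof of an error.
HGVS_EXAMPLE_SHAPES = re.compile(
    r"NM_\d+\.\d+:c\.(?:"
    r"\d+_\d+ins[ACGT]{1,2}"   # insertion of 1 or 2 nucleotides
    r"|\d+(?:_\d+)?del"        # deletion of 1 or 2 nucleotides
    r"|\d+[ACGT]>[ACGT]"       # single-nucleotide substitution
    r")"
)

# Assumption: single-letter codes, with "*" for an unused field.
ALLOWED_LABELS = {"D", "T", "U", "*"}


def validate_row(line):
    """Return a list of problems found in one submission row (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        return [f"expected 5 tab-separated fields, found {len(fields)}"]
    variant, score, sd, label, _comment = fields
    problems = []
    if any(f.strip() == "" for f in fields):
        problems.append("empty cell; use '*' for an unused field")
    if not HGVS_EXAMPLE_SHAPES.fullmatch(variant):
        problems.append("variant does not match the example HGVS shapes")
    try:
        if not 0.0 <= float(score) <= 1.0:
            problems.append("score outside [0, 1]")
    except ValueError:
        problems.append("score is not a number")
    try:
        sd_value = float(sd)
        if not (math.isfinite(sd_value) and sd_value > 0):
            problems.append("standard deviation must be a positive number")
    except ValueError:
        problems.append("standard deviation is not a number")
    if label not in ALLOWED_LABELS:
        problems.append("classification must be D, T, U or '*'")
    return problems


print(validate_row("NM_001385641.1:c.344_345insA\t0.92\t0.05\tD\t*"))
```

Running the function over every line of a submission file and printing any non-empty result gives a quick pre-upload check; Mutalyzer remains the better tool for confirming the variant nomenclature itself.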

In addition, your submission must include a detailed description of the method used to make the predictions, similar in style to the Methods section in a scientific article. This information should be submitted as a separate file and contain (1) algorithms and features used; e.g., structural modeling, conservation, machine learning; (2) training data sources; e.g., ClinVar, gnomAD; (3) any assumptions or limitations.

File naming

CAGI allows submission of up to six models per team, of which model 1 is considered primary. You can upload predictions for each model multiple times; the last submission before the deadline will be evaluated for each model. If you are submitting a single file with all predictions combined, please use the format below.

Use the following format for your submissions: <teamname>_model_(1|2|3|4|5|6).(tsv|txt)

To include a description of your method, use the following filename: <teamname>_desc.*

Example: if your team’s name is “bestincagi” and you are submitting predictions for your model number 3, your filename should be bestincagi_model_3.txt.

If, however, your files are large, you can split your predictions into four separate files. For example, for single nucleotide deletions, please use <teamname>_del1_model_(1|2|3|4|5|6).(tsv|txt). For deletions of length two, please replace “del1” by “del2”. For insertions, please use “ins1” or “ins2” in place of “del1”.
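The naming rules above can be checked with a short script before upload. The sketch below is an illustration only: the function name parse_submission_name is hypothetical, and it assumes that a team name is any non-empty string without whitespace.

```python
import re

# <teamname>_model_N.(tsv|txt), optionally with a del1/del2/ins1/ins2 part.
MODEL_FILE = re.compile(
    r"(?P<team>\S+?)_(?:(?P<part>del1|del2|ins1|ins2)_)?"
    r"model_(?P<model>[1-6])\.(?:tsv|txt)"
)
# <teamname>_desc.* for the method description.
DESC_FILE = re.compile(r"(?P<team>\S+?)_desc\..+")


def parse_submission_name(filename):
    """Classify a filename as predictions or description; None if neither."""
    m = MODEL_FILE.fullmatch(filename)
    if m:
        return {
            "kind": "predictions",
            "team": m["team"],
            "part": m["part"],  # None for a single combined file
            "model": int(m["model"]),
        }
    m = DESC_FILE.fullmatch(filename)
    if m:
        return {"kind": "description", "team": m["team"]}
    return None


print(parse_submission_name("bestincagi_model_3.txt"))
print(parse_submission_name("bestincagi_del1_model_2.tsv"))
print(parse_submission_name("bestincagi_model_7.txt"))
```

A return value of None signals a name that does not follow either pattern, for example a model number outside 1 to 6.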

Download data

Variant data: cagi7-lofs-files.zip (12MB).

Download submission template file: annotatealllofstemplate.txt (to be provided)

We do not provide a validation script. Participants are encouraged to use Mutalyzer to validate their variant nomenclature.

References

Frischmeyer PA, et al. An mRNA surveillance mechanism that eliminates transcripts lacking termination codons. Science (2002) 295(5563):2258-2261.

Lin YJ, et al. Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors. Hum Genomics (2024) 18(1):90.

Lindeboom RGH, et al. The impact of nonsense-mediated mRNA decay on genetic disease, gene editing and cancer immunotherapy. Nat Genet (2019) 51(11):1645-1651.

Rastogi R, et al. Critical assessment of missense variant effect predictors on disease-relevant variant data. Hum Genet (2025) 144(2-3):281-293.

Singer-Berk M, et al. Advanced variant classification framework reduces the false positive rate of predicted loss-of-function variants in population sequencing data. Am J Hum Genet (2023) 110(9):1496-1508.

Revision history

25 June 2025: challenge preview posted

27 June 2025: challenge open

Center for Critical Assessment of Genome Interpretation
