
Predict the effects of missense mutations on calmodulin function as measured by a high-throughput yeast complementation assay

Challenge: CALM1

Variant data: public

Last updated: 20 October 2017

This challenge is closed.

Make sure you understand our Data Use Agreement and Anonymity Policy


Calmodulin is a calcium-sensing protein that modulates the activity of a large number of proteins in the cell. It is involved in many different cellular processes, and is especially important for neuron and muscle cell function. Variants that affect calmodulin function have been found to be causally associated with two cardiac arrhythmias. A large library of calmodulin missense variants was assessed with respect to their effects on protein function using a high-throughput yeast complementation assay. The challenge is to predict the functional effects of these variants.


Calmodulin is a small protein (149 aa) encoded by the human genes Calmodulin 1 (CALM1), Calmodulin 2 (CALM2), and Calmodulin 3 (CALM3), each of which encodes exactly the same calmodulin protein sequence. Calmodulin senses the presence of calcium and communicates that signal to other proteins. It does so using four EF-hand motifs, which bind calcium ions and subsequently trigger a conformational change of the protein (Sarhan et al., 2012). The calcium-bound conformation reveals an interaction interface that allows calmodulin to physically interact with other proteins. Calmodulin has high clinical relevance, as variants of the protein are causally associated with two cardiac arrhythmias: catecholaminergic polymorphic ventricular tachycardia (Nyegaard et al., 2012) and long QT syndrome (Crotti et al., 2013).

A team in Fritz Roth’s Lab at the Donnelly Centre (U. Toronto) and Lunenfeld Tanenbaum Research Institute (Sinai Health Systems), led by Jochen Weile and Song Sun, has assessed a large library of calmodulin variants using a high-throughput yeast complementation assay. This assay reveals the overall impact of each variant on the ability of the protein to function in the cell.


A diverse library of plasmids expressing different human CALM1 variants was generated by a random codon replacement method called POPCode. The Roth lab previously described a complementation assay that assesses the functional impact of human CALM1 variants based on their ability to rescue a yeast strain carrying a temperature-sensitive allele of the yeast calmodulin orthologue CMD1 (Sun et al., 2016). Complementation results for the wild-type version of CALM1 are shown in Supp Table 4 of the Sun et al. (2016) manuscript.

CMD1 encodes a Ca2+-binding protein that regulates Ca2+-independent processes (mitosis, bud growth, actin organization, endocytosis, etc.) and Ca2+-dependent processes (stress-activated pathways). Its targets include Nuf1p, Myo2p and calcineurin. It also binds to the Hog1p MAPK in response to hyperosmotic stress, and potentiates membrane tubulation and constriction mediated by the Rvs161p-Rvs167p complex. CMD1 is an essential gene, and CMD1 temperature-sensitive mutants do not grow at the restrictive temperature.

The pooled library of CALM1 variants was transformed into the temperature-sensitive S. cerevisiae cmd1-1 strain. Two samples were taken from the pooled transformants as pre-selection technical replicates. Two further aliquots were used to start parallel cultures, which were grown to saturation at the selective temperature of 36°C and from which two post-selection technical replicate samples were taken. Meanwhile, the same selection was performed on the temperature-sensitive S. cerevisiae cmd1-1 strain expressing the wild-type allele of CALM1, and two samples were taken as wild-type control replicates. Plasmid DNA was extracted from the six samples and sequenced with TileSEQ, a sequencing method based on the amplification of small tiles across the gene that are short enough for paired-end sequencing to read both strands on each cluster on an Illumina flowcell. A variant is counted only when the reads from both strands agree on its presence.
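The strand-agreement rule used for counting can be illustrated with a minimal sketch. The input format (one pair of variant-call sets per read pair), the function name and the variant labels are hypothetical and chosen only for illustration; this is not the TileSEQ pipeline itself.

```python
from collections import Counter


def count_confirmed_variants(read_pairs):
    """Count variants that are called on both reads of a pair.

    read_pairs: iterable of (forward_calls, reverse_calls), where each
    element is a set of variant labels called on that strand
    (hypothetical input format).
    """
    counts = Counter()
    for forward_calls, reverse_calls in read_pairs:
        # Keep only variants on which both strands agree.
        for variant in forward_calls & reverse_calls:
            counts[variant] += 1
    return counts


# Toy read pairs with placeholder variant labels.
pairs = [
    ({"varA"}, {"varA"}),  # both strands agree: counted
    ({"varA"}, set()),     # seen on one strand only: not counted
    ({"varB"}, {"varB"}),  # both strands agree: counted
]
confirmed = count_confirmed_variants(pairs)
```

Requiring agreement between the two strands means that an error introduced on only one read of a pair does not contribute to a variant's count.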

The yeast-based functional assays were established and validated in a previous study (Sun et al., 2016), and the resulting calmodulin variant effect map has been validated by its ability to separate pathogenic from non-pathogenic variants (Weile et al., 2017).

The methods are described in Weile et al. (2017). Briefly, read counts in the pre-selection, post-selection and wild-type control conditions for each variant were normalized to sequencing depth and then used to calculate allele frequency enrichment. First, the wild-type control counts were subtracted from the pre- and post-selection counts (as they are assumed to represent position-dependent sequencing errors). Then, the log ratio between the post- and pre-selection counts was calculated. Finally, the log ratio distributions of synonymous and nonsense variants (which, for simplicity, are assumed to emulate wild-type-like and null-like behaviour) were used to rescale all other variant log ratios, such that 1 represents full function and 0 represents complete loss of function. The resulting quantities are referred to as fitness scores below. The two replicates for each measurement were used to estimate measurement errors, and these were regularized using an established procedure (Baldi & Long, 2001).
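The scoring steps above can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the function names, the use of log base 10, the small floor that keeps corrected frequencies positive, and the use of medians to summarize the synonymous and nonsense distributions are all assumptions made here. Replicate handling and error regularization are omitted.

```python
import math
from statistics import median


def depth_normalize(counts, depth):
    """Convert raw read counts to frequencies relative to sequencing depth."""
    return {variant: count / depth for variant, count in counts.items()}


def log_ratios(pre, post, wt, floor=1e-9):
    """Subtract wild-type control frequencies, then take log10(post / pre).

    The floor is an assumption that keeps corrected frequencies positive.
    """
    result = {}
    for variant in pre:
        background = wt.get(variant, 0.0)
        pre_corrected = max(pre[variant] - background, floor)
        post_corrected = max(post.get(variant, 0.0) - background, floor)
        result[variant] = math.log10(post_corrected / pre_corrected)
    return result


def rescale(lr, synonymous, nonsense):
    """Map the nonsense median to 0 and the synonymous median to 1.

    Using medians as the summary statistic is an assumption.
    """
    syn = median(lr[v] for v in synonymous)
    non = median(lr[v] for v in nonsense)
    return {variant: (x - non) / (syn - non) for variant, x in lr.items()}


# Toy counts for one synonymous, one nonsense and one missense variant.
pre = depth_normalize({"syn": 100, "stop": 100, "mis": 100}, depth=1000)
post = depth_normalize({"syn": 200, "stop": 2, "mis": 20}, depth=1000)
wt = depth_normalize({"syn": 0, "stop": 0, "mis": 0}, depth=1000)

scores = rescale(log_ratios(pre, post, wt), synonymous={"syn"}, nonsense={"stop"})
```

In this toy example the missense variant lands halfway between the nonsense and synonymous log ratios, so its rescaled score is approximately 0.5.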

Prediction challenge

Participants are asked to submit a predicted competitive-growth fitness score for each variant. Each prediction should be a numeric value on a log scale between 0 (no growth at the restrictive temperature) and 1 (wild-type-like growth fitness). Please note: the experimental scores are a measure of fitness in a competitive growth assay and have not been calibrated to correspond to percent of wild-type protein function. Predictors should also bear in mind that this experiment assays the effect of human protein variants in a yeast system. To help participants calibrate their numeric values appropriately, we provide the experimental distribution of numeric growth fitness scores.

The predictions will be assessed against the numeric values calculated for each mutant clone in the competitive growth assay. Each predicted value must be accompanied by a standard error, and the standard errors will also be assessed. Optionally, a brief comment on the basis of the prediction may be given.

Download dataset: 5-CALM_dataset.txt 

Download experimental distribution: 5-CALM_distribution.tsv 

Prediction submission format 

The prediction submission is a tab-delimited text file. Organizers provide a template file, which must be used for submission. In addition, a validation script is provided, and predictors must check the correctness of the format before submitting their predictions.

Each data row in the submitted file must include the following columns:

In the template file, cells in columns 2-4 are marked with a "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission. For a given subset, you must submit predictions and standard deviations for all or none of the variants; if you are not confident in a prediction for a variant, enter an appropriately large standard deviation for the prediction. Optionally, enter a brief comment on the basis of the prediction. If you do not enter a comment on a prediction, leave the "*" in those cells. Please make sure you follow the submission guidelines strictly. 
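As a rough illustration of these rules, the sketch below checks a tab-delimited submission for empty cells, numeric values, and the all-or-none requirement. It does not replace the official validation script. The four-column layout (variant, prediction, standard deviation, comment), the presence of a header row, the treatment of the whole file as one subset, and the function name are assumptions made here; the template file is authoritative.

```python
import csv
import io

EXPECTED_COLUMNS = 4  # assumed layout: variant, prediction, standard deviation, comment


def check_submission(text):
    """Return a list of problems found in a tab-delimited submission."""
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    body = rows[1:]  # assumes the first row is a header
    predicted = 0
    for line_no, row in enumerate(body, start=2):
        if len(row) != EXPECTED_COLUMNS or any(not cell.strip() for cell in row):
            problems.append(f"line {line_no}: expected {EXPECTED_COLUMNS} non-empty cells")
            continue
        prediction, sd = row[1], row[2]
        if prediction == "*" and sd == "*":
            continue  # variant left unpredicted
        try:
            float(prediction)
            sd_value = float(sd)
        except ValueError:
            problems.append(f"line {line_no}: prediction and standard deviation must be numeric")
            continue
        if sd_value < 0:
            problems.append(f"line {line_no}: standard deviation must be non-negative")
            continue
        predicted += 1
    # All-or-none rule, applied here to the whole file as a single subset.
    if 0 < predicted < len(body):
        problems.append("predictions must cover all variants or none")
    return problems


header = "variant\tprediction\tsd\tcomment\n"
complete = header + "v1\t0.8\t0.1\t*\n" + "v2\t0.2\t0.3\tburied residue\n"
partial = header + "v1\t0.8\t0.1\t*\n" + "v2\t*\t*\t*\n"

complete_problems = check_submission(complete)
partial_problems = check_submission(partial)
```

Here `complete_problems` is empty, while `partial_problems` reports that only some variants were predicted.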

In addition, your submission should include a detailed description of the method used to make the predictions, similar to the style of the Methods section in a scientific article. This information must be submitted as a separate file. 

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions by accessing the link "All submission forms" from the front page of your group. For more details, please read the FAQ page.


Predictions will be assessed by an independent assessor. We anticipate a test of rank correlation (e.g., Kendall’s tau) and RMS deviation of predictions relative to experimental observations. The independent assessor is expected to employ a range of other evaluation approaches.
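For self-assessment against the released data, the two anticipated measures can be computed as in the sketch below. The choice of the tau-b variant (which corrects for ties) and the function names are assumptions; the independent assessor may use different definitions or additional measures.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b), computed over all pairs of items."""
    concordant = discordant = untied_x = untied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign_x = (x1 > x2) - (x1 < x2)
        sign_y = (y1 > y2) - (y1 < y2)
        untied_x += sign_x != 0
        untied_y += sign_y != 0
        if sign_x * sign_y > 0:
            concordant += 1
        elif sign_x * sign_y < 0:
            discordant += 1
    if untied_x == 0 or untied_y == 0:
        raise ValueError("tau-b is undefined when all values in one list are tied")
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


def rms_deviation(predicted, observed):
    """Root-mean-square deviation between predictions and observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Toy values, not taken from the challenge data.
predicted = [0.1, 0.4, 0.9, 1.0]
observed = [0.0, 0.5, 0.7, 1.1]

tau = kendall_tau_b(predicted, observed)
rmsd = rms_deviation(predicted, observed)
```

The two measures are complementary: rank correlation rewards getting the ordering of variants right regardless of scale, whereas RMS deviation penalizes predictions that are poorly calibrated to the experimental score range.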

Dataset citation 

Weile J, et al. A framework for exhaustively mapping functional missense variants. Mol Syst Biol (2017) 13(12):957.


Baldi P, Long AD. A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics (2001) 17(6):509-519.

Crotti L, et al. Calmodulin mutations associated with recurrent cardiac arrest in infants. Circulation (2013) 127(9):1009-1017.

Nyegaard M, et al. Mutations in calmodulin cause ventricular tachycardia and sudden cardiac death. Am J Hum Genet (2012) 91(4):703-712.

Sarhan MF, et al. Crystallographic basis for calcium regulation of sodium channels. Proc Natl Acad Sci U S A (2012) 109(9):3558-3563.

Sun S, et al. An extended set of yeast-based functional assays accurately identifies human disease mutations. Genome Res (2016) 26(5):670-680.

Weile J, et al. A framework for exhaustively mapping functional missense variants. Mol Syst Biol (2017) 13(12):957.

Revision history 

21 October 2017: initial release

8 November 2017: Closing date added

20 December 2017: Challenge closed

24 September 2018: Dataset availability added