Predict the effects of missense mutations on calmodulin function as measured by a high-throughput yeast complementation assay

Challenge: CALM1
Dataset description: public
Variant data: public
Last updated: 20 October 2017
This challenge closes at 6:00 PM PST (Pacific Standard Time) on 20 December 2017.

Although the challenge has closed, late submissions have occasionally been received in CAGI. Our policy is that, out of fairness, these cannot be included in the primary assessment. However, the assessor will have access to late submissions and may, at their discretion, consider them in parallel with the on-time primary submissions. If the assessor chooses to consider them, results for late submissions will always be labeled as 'late' and kept distinct, but might be mentioned in presentations and in the publication.

Summary
Calmodulin is a calcium-sensing protein that modulates the activity of a large number of proteins in the cell. It is involved in many different cellular processes, and is especially important for neuron and muscle cell function. Variants that affect calmodulin function have been found to be causally associated with two cardiac arrhythmias. A large library of calmodulin missense variants was assessed with respect to their effects on protein function using a high-throughput yeast complementation assay. The challenge is to predict the functional effects of these variants.

Background
Calmodulin is a small protein (149 amino acids) encoded by three human genes, Calmodulin 1 (CALM1), Calmodulin 2 (CALM2) and Calmodulin 3 (CALM3), all of which encode exactly the same protein sequence. Calmodulin senses the presence of calcium and communicates that signal to other proteins. It does so using four EF-hand motifs, which bind calcium ions and thereby trigger a conformational change of the protein (Sarhan MF et al. 2012). The calcium-bound conformation exposes an interaction interface that allows calmodulin to physically interact with other proteins. Calmodulin is highly clinically relevant, as variants of the protein are causally associated with two cardiac arrhythmias: catecholaminergic polymorphic ventricular tachycardia (Nyegaard M et al. 2012) and long QT syndrome (Crotti L et al. 2013).

A team in Fritz Roth’s lab at the Donnelly Centre (University of Toronto) and the Lunenfeld-Tanenbaum Research Institute (Sinai Health System), led by Jochen Weile and Song Sun, has assessed a large library of calmodulin variants using a high-throughput yeast complementation assay. This assay reveals the overall impact of each variant on the protein's ability to function in the cell.

Experiment
A diverse library of plasmids expressing different human CALM1 variants was generated by a random codon replacement method called POPCode. The Roth lab previously described a complementation assay that assesses the functional impact of human CALM1 variants by their ability to rescue a yeast strain carrying a temperature-sensitive allele of CMD1, the yeast calmodulin orthologue (Sun S et al., 2016). Complementation results for wild-type CALM1 are shown in Supp Table 4 of the Sun S et al., 2016 manuscript.

CMD1 is a Ca2+-binding protein that regulates both Ca2+-independent processes (mitosis, bud growth, actin organization, endocytosis, etc.) and Ca2+-dependent processes (stress-activated pathways). CMD1 targets include Nuf1p, Myo2p and calcineurin. It also binds to the Hog1p MAPK in response to hyperosmotic stress, and potentiates membrane tubulation and constriction mediated by the Rvs161p-Rvs167p complex. CMD1 is an essential gene, and CMD1 temperature-sensitive mutants do not grow at the restrictive temperature.

The pooled library of CALM1 variants was transformed into the temperature-sensitive S. cerevisiae cmd1-1 strain. Two samples were taken from the pooled transformants as pre-selection technical replicates. Two further aliquots were used to start parallel cultures, which were grown to saturation at the selective temperature of 36°C and from which two post-selection technical replicate samples were taken. In parallel, the same selection was performed on the cmd1-1 strain expressing wild-type CALM1, and two samples were taken as wild-type control replicates.

Plasmid DNA was extracted from all six samples and sequenced with TileSEQ, a method that amplifies small tiles across the gene, each short enough for paired-end sequencing to read both strands of every cluster on an Illumina flow cell. A variant is counted only when the reads from both strands agree on its presence.
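The dual-strand counting rule can be sketched in Python. This is a minimal illustration, not the TileSEQ pipeline: it assumes each read pair has already been reduced to the variant called on each strand (a hypothetical representation), and the variant labels are placeholders chosen for the example.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical summary of one read pair: (variant called on the forward read,
# variant called on the reverse read); None means no variant was called.
ReadPairCalls = Tuple[Optional[str], Optional[str]]


def count_concordant_variants(pairs: Iterable[ReadPairCalls]) -> Counter:
    """Count a variant only when both strands of a read pair report it."""
    counts: Counter = Counter()
    for forward_call, reverse_call in pairs:
        # Pairs where the strands disagree, or where only one strand (or
        # neither) reports a variant, contribute nothing to the counts.
        if forward_call is not None and forward_call == reverse_call:
            counts[forward_call] += 1
    return counts


if __name__ == "__main__":
    example = [
        ("D130G", "D130G"),  # concordant: counted
        ("D130G", None),     # one strand only: treated as a likely error
        ("N98S", "N98I"),    # strands disagree: ignored
        ("N98S", "N98S"),    # concordant: counted
        (None, None),        # no variant in this tile
    ]
    print(count_concordant_variants(example))
```

Requiring agreement between the two strands gives up some sensitivity in exchange for discarding errors that appear on only one read.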

The yeast-based functional assays were established and validated in a previous study (Sun S et al. 2016), and the resulting calmodulin variant effect map has been validated by its ability to separate pathogenic from non-pathogenic variants (Weile J et al. 2017).

The methods are described in Weile J et al. 2017. Briefly, read counts for each variant in the pre-selection, post-selection and wild-type control conditions were normalized to sequencing depth and then used to calculate allele frequency enrichment. First, the wild-type control counts, which are assumed to represent position-dependent sequencing errors, were subtracted from the pre- and post-selection counts. Then, the log ratio of post- to pre-selection counts was calculated. Finally, the log ratio distributions of synonymous and nonsense variants (assumed, for simplicity, to show wild-type-like and null-like behaviour, respectively) were used to rescale all other variant log ratios, such that 1 represents full function and 0 represents complete loss of function. The resulting quantities are referred to below as fitness scores. The two replicates of each measurement were used to estimate measurement errors, which were then regularized using an established Bayesian procedure (Baldi P and Long AD, 2001).
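The scoring steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the published pipeline: the floor value, the use of the medians of the synonymous and nonsense log ratios as the 1 and 0 anchors, and the prior SD and prior weight passed to the regularization step are all hypothetical choices. The regularization uses the Baldi and Long posterior variance form (ν0·σ0² + (n−1)·s²)/(ν0 + n − 2), where s is the replicate SD, σ0 a prior SD and ν0 the prior's weight in pseudo-replicates.

```python
import math
from statistics import median, stdev
from typing import Dict, List


def enrichment(pre: int, post: int, wt: int,
               depth_pre: int, depth_post: int, depth_wt: int,
               floor: float = 1e-6) -> float:
    """log10 ratio of post- to pre-selection frequency after error removal."""
    # The wild-type control frequency is treated as the position's error rate.
    wt_freq = wt / depth_wt
    # Clamp at a small floor (hypothetical value) so the log stays defined.
    pre_freq = max(pre / depth_pre - wt_freq, floor)
    post_freq = max(post / depth_post - wt_freq, floor)
    return math.log10(post_freq / pre_freq)


def rescale(log_ratios: Dict[str, float], synonymous: List[str],
            nonsense: List[str]) -> Dict[str, float]:
    """Map log ratios so the synonymous median is 1 and the nonsense median 0."""
    syn_mid = median(log_ratios[v] for v in synonymous)
    non_mid = median(log_ratios[v] for v in nonsense)
    return {v: (x - non_mid) / (syn_mid - non_mid)
            for v, x in log_ratios.items()}


def regularized_sd(replicates: List[float], prior_sd: float,
                   prior_weight: float) -> float:
    """Shrink a replicate SD toward a prior SD (Baldi & Long 2001 form)."""
    n = len(replicates)
    if prior_weight + n - 2 <= 0:
        raise ValueError("prior_weight + n - 2 must be positive")
    sample_var = stdev(replicates) ** 2 if n > 1 else 0.0
    var = ((prior_weight * prior_sd ** 2 + (n - 1) * sample_var)
           / (prior_weight + n - 2))
    return math.sqrt(var)


if __name__ == "__main__":
    # Toy log ratios, purely illustrative.
    toy = {"syn_a": 0.0, "syn_b": 0.02, "stop_a": -2.0, "stop_b": -1.98,
           "missense_x": -1.0}
    scores = rescale(toy, ["syn_a", "syn_b"], ["stop_a", "stop_b"])
    print(round(scores["missense_x"], 3))
    print(round(regularized_sd([0.48, 0.52], prior_sd=0.1, prior_weight=3), 4))
```

With only two replicates the raw SD is very noisy, which is why shrinking it toward a prior is useful: when both replicates agree exactly, the sketch returns the prior SD rather than zero.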

Prediction challenge
Participants are asked to predict the fitness score of each variant in the competitive growth assay. Each prediction should be a numeric value on the rescaled log scale described above, between 0 (no growth at the restrictive temperature) and 1 (wild-type-like growth fitness). Please note: the experimental scores measure fitness in a competitive growth assay and have not been calibrated to correspond to percent of wild-type protein function. Predictors should also bear in mind that this experiment assays the effect of human protein variants in a yeast system. To help participants calibrate their numeric values appropriately, we provide the experimental distribution of growth fitness scores.
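One way to use the provided distribution is quantile mapping: rank a predictor's raw scores and replace each with the experimental score at the same quantile. The sketch below assumes the experimental distribution has already been loaded as a list of score values; the layout of 5-CALM_distribution.tsv is not described here, and if it is binned, repeating each bin center by its count gives such a list. Quantile mapping is a technique chosen for illustration, not a method prescribed by the challenge, and it keeps only the ranking of the raw scores.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile (0 <= q <= 1) of a sorted sequence."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def calibrate(raw_scores: List[float], reference: List[float]) -> List[float]:
    """Map each raw score to the reference value at the same quantile."""
    ref = sorted(reference)
    ordered = sorted(raw_scores)
    n = len(ordered)
    calibrated = []
    for score in raw_scores:
        # Mid-rank handles ties: equal raw scores get equal calibrated values.
        mid_rank = (bisect_left(ordered, score) + bisect_right(ordered, score) - 1) / 2
        q = mid_rank / (n - 1) if n > 1 else 0.5
        calibrated.append(quantile(ref, q))
    return calibrated


if __name__ == "__main__":
    # Hypothetical raw predictor outputs (higher = more wild-type-like) and a
    # toy stand-in for the experimental score distribution.
    raw = [0.1, 3.2, 1.5, 2.8]
    experimental = [0.0, 0.05, 0.4, 0.8, 0.9, 1.0, 1.0, 1.1]
    print(calibrate(raw, experimental))
```

Because the mapping is monotone, it leaves rank-based metrics unchanged while bringing the predictions onto the same scale as the experimental scores, which matters for deviation-based metrics.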

Predictions will be assessed against the fitness scores measured for each variant in the competitive growth assay. Each predicted value must be accompanied by a standard deviation indicating confidence in the prediction, and these standard deviations will also be assessed. A brief comment on the basis of the prediction may also be given (optional).

Download dataset
5-CALM_dataset.txt (9 KB)

Download experimental distribution
5-CALM_distribution.tsv (446 Bytes)

Download submission template and submission validation script
Available to registered users only.

Prediction submission format
The prediction submission is a tab-delimited text file. Organizers provide a template file, which must be used for the submission. A validation script is also provided, and predictors must use it to check the format of their file before submitting their predictions.
Each data row in the submitted file must include the following columns:

  1. AA substitution – The mutation as listed in the dataset file.
  2. Competitive fitness score – Prediction of competitive growth relative to wild-type: 0 = no growth under selection, 1 = wild-type growth. Participants may use the provided experimental distribution of numeric growth score values to calibrate their prediction scores.
  3. Standard deviation – SD of the prediction in column 2, indicating confidence in the prediction.
  4. Comment – Optional brief comment on the basis of the prediction in column 2.

In the template file, cells in columns 2-4 are marked with a "*". Submit your predictions by replacing each "*" with your value. No empty cells are allowed in the submission. For a given subset, you must submit predictions and standard deviations for either all or none of the variants; if you are not confident in a prediction for a variant, enter an appropriately large standard deviation for it. Optionally, enter a brief comment on the basis of the prediction; if you do not enter a comment, leave the "*" in that cell. Please follow the submission guidelines strictly.
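For illustration, a prediction row can be written and checked in Python as below. These are hypothetical helpers, not the official CAGI validation script, which remains the authoritative check. They assume data rows have exactly the four tab-separated columns listed above and treat a "*" left in the score or SD column as a missing value; they do not check the 0 to 1 range or the all-or-none rule across a subset.

```python
import math
from typing import List, Optional


def format_row(variant: str, score: float, sd: float,
               comment: Optional[str] = None) -> str:
    """Build one tab-delimited row; an absent comment stays as "*"."""
    if comment:
        # Tabs or newlines inside a comment would break the column layout.
        comment = " ".join(comment.split())
    return "\t".join([variant, f"{score:.4f}", f"{sd:.4f}", comment or "*"])


def check_row(line: str) -> List[str]:
    """Return the problems found in one data row (empty list if none)."""
    cells = line.rstrip("\n").split("\t")
    if len(cells) != 4:
        return [f"expected 4 columns, found {len(cells)}"]
    problems = []
    if any(cell.strip() == "" for cell in cells):
        problems.append("empty cell")
    for name, cell in (("score", cells[1]), ("SD", cells[2])):
        try:
            value = float(cell)
        except ValueError:
            # An unreplaced "*" ends up here.
            problems.append(f"{name} is not numeric: {cell!r}")
            continue
        if not math.isfinite(value):
            problems.append(f"{name} is not finite")
        elif name == "SD" and value < 0:
            problems.append("SD is negative")
    return problems


if __name__ == "__main__":
    row = format_row("D130G", 0.25, 0.1)
    print(repr(row), check_row(row))
    print(check_row("D130G\t*\t0.1\t*"))
```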

In addition, your submission should include a detailed description of the method used to make the predictions, written in the style of the Methods section of a scientific article. This description must be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions by accessing the link "All submission forms" from the front page of your group. For more details, please read the FAQ page.

Assessment
Predictions will be assessed by an independent assessor. We anticipate that the assessment will include a rank correlation test (e.g., Kendall's tau) and the root-mean-square (RMS) deviation of predictions from experimental observations. The assessor is expected to employ a range of other evaluation approaches as well.
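The two anticipated metrics can be computed as sketched below. This is an illustrative implementation, not the assessor's code: it uses Kendall's tau-b (the tie-corrected variant, a choice made here) and the root-mean-square deviation between predicted and observed fitness scores. The O(n²) loop over pairs is adequate for a few thousand variants; scipy.stats.kendalltau computes the same tau-b statistic by default.

```python
import math
from itertools import combinations
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between two equally long sequences (O(n^2) pairs)."""
    if len(x) != len(y):
        raise ValueError("inputs must have equal length")
    pairs = ties_x = ties_y = concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        pairs += 1
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    # Tie correction: pairs tied in x (or y) are removed from that factor.
    denom = math.sqrt((pairs - ties_x) * (pairs - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when either input is constant")
    return (concordant - discordant) / denom


def rmsd(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square deviation of predictions from observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                     / len(predicted))


if __name__ == "__main__":
    predicted = [0.9, 0.2, 0.6, 0.1]  # toy values
    observed = [1.0, 0.1, 0.7, 0.3]
    print(kendall_tau_b(predicted, observed), rmsd(predicted, observed))
```

The two metrics are complementary: tau rewards getting the ordering of variants right regardless of scale, while RMS deviation also penalizes predictions that are correctly ordered but poorly calibrated.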

Dataset citation
Weile J*, Sun S*, Cote AG, Knapp J, Verby M, Mellor JC, Wu Y, Pons C, Wong C, van Lieshout N, Yang F, Tasan M, Tan G, Yang S, Fowler DM, Nussbaum R, Bloom JD, Vidal M, Hill DE, Aloy P & Roth FP. 2017. Expanding the atlas of functional missense variation for human genes. bioRxiv 166595
*These authors contributed equally.

References
  • Baldi P, Long AD. 2001. A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 17:509-519. PMID:11395427
  • Crotti L, Johnson CN, Graf E et al. 2013. Calmodulin mutations associated with recurrent cardiac arrest in infants. Circulation 127:1009-1017. PMCID:PMC3834768. doi:10.1161/CIRCULATIONAHA.112.001216
  • Nyegaard M, Overgaard MT, Sondergaard MT et al. 2012. Mutations in calmodulin cause ventricular tachycardia and sudden cardiac death. Am J Hum Genet 91:703-712. PMCID:PMC3484646. doi:10.1016/j.ajhg.2012.08.015
  • Sarhan MF, Tung CC, Van Petegem F, Ahern CA. 2012. Crystallographic basis for calcium regulation of sodium channels. Proc Natl Acad Sci USA 109:3558-3563. PMCID:PMC3295267. doi:10.1073/pnas.1114748109
  • Sun S, Yang F, Tan G et al. 2016. An extended set of yeast-based functional assays accurately identifies human disease mutations. Genome Res 26:670-680. PMCID:PMC4864455. doi:10.1101/gr.192526.115
  • Weile J, Sun S, Cote AG et al. 2017. Expanding the atlas of functional missense variation for human genes. bioRxiv 166595

Revision history
21 Oct 2017 (v01): initial release
8 Nov 2017 (v02): Closing date added