Predict the effects of missense mutations on calmodulin function as measured by a high-throughput yeast complementation assay
Dataset description: public
Variant data: public
Dataset availability: public
Last updated: 20 October 2017
This challenge closes at 6:00 PM PST (Pacific Standard Time) on 20 December 2017.
Download answer key and predictions: registered users only; use is limited by the CAGI Data Use Agreement.
Presentations from the CAGI 5 conference: registered users only; use is limited by the CAGI Data Use Agreement.
Calmodulin is a calcium-sensing protein that modulates the activity of a large number of proteins in the cell. It is involved in many different cellular processes, and is especially important for neuron and muscle cell function. Variants that affect calmodulin function have been found to be causally associated with two cardiac arrhythmias. A large library of calmodulin missense variants was assessed with respect to their effects on protein function using a high-throughput yeast complementation assay. The challenge is to predict the functional effects of these variants.
Calmodulin is a small protein (149 amino acids) encoded by three human genes, Calmodulin 1 (CALM1), Calmodulin 2 (CALM2) and Calmodulin 3 (CALM3), all of which encode exactly the same protein sequence. Calmodulin senses the presence of calcium and relays that signal to other proteins. It does so through four EF-hand motifs, which bind calcium ions and thereby trigger a conformational change of the protein (Sarhan MF et al. 2012). The calcium-bound conformation exposes an interface that allows calmodulin to physically interact with other proteins. Calmodulin is highly clinically relevant, as variants of the protein are causally associated with two cardiac arrhythmias: catecholaminergic polymorphic ventricular tachycardia (Nyegaard M et al. 2012) and long QT syndrome (Crotti L et al. 2013).
A team in Fritz Roth’s Lab at the Donnelly Centre (U. Toronto) and Lunenfeld Tanenbaum Research Institute (Sinai Health Systems), led by Jochen Weile and Song Sun, has assessed a large library of calmodulin variants using a high-throughput yeast complementation assay. This assay reveals the overall impact of each variant on the ability of the protein to function in the cell.
A diverse library of plasmids expressing different human CALM1 variants was generated by a random codon replacement method called POPCode. The Roth lab previously described a complementation assay that assesses the functional impact of human CALM1 variants by their ability to rescue a yeast strain carrying a temperature-sensitive allele of CMD1, the yeast calmodulin orthologue (Sun S et al. 2016). Complementation results for wild-type CALM1 are shown in Supplementary Table 4 of Sun S et al. 2016.
CMD1 is a Ca2+-binding protein that regulates both Ca2+-independent processes (mitosis, bud growth, actin organization, endocytosis, etc.) and Ca2+-dependent processes (stress-activated pathways). CMD1 targets include Nuf1p, Myo2p and calcineurin. It also binds the Hog1p MAPK in response to hyperosmotic stress, and potentiates membrane tubulation and constriction mediated by the Rvs161p-Rvs167p complex. CMD1 is an essential gene, so CMD1 temperature-sensitive mutants do not grow at the restrictive temperature.
The pooled library of CALM1 variants was transformed into the temperature-sensitive S. cerevisiae cmd1-1 strain. Two samples were taken from the pooled transformants as pre-selection technical replicates. Two further aliquots were used to start parallel cultures, which were grown to saturation at the selective temperature of 36°C and from which two post-selection technical replicate samples were taken. In parallel, the same selection was performed on the cmd1-1 strain expressing the wild-type allele of CALM1, and two samples were taken as wild-type control replicates.
Plasmid DNA was extracted from the six samples and sequenced by TileSEQ, a method that amplifies small tiles across the gene, each short enough for paired-end sequencing to read both strands of every cluster on an Illumina flow cell. A variant is counted only when the reads from both strands agree on its presence.
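The strand-agreement counting rule can be illustrated with a small sketch. It assumes a hypothetical, simplified input in which each sequencing cluster has already been reduced to the set of variant calls made from each read of the pair; real TileSEQ processing starts from aligned reads and is not reproduced here. The variant labels ("v1", "v2") are placeholders.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical simplified input: one (read 1 calls, read 2 calls) pair per cluster.
ReadPair = Tuple[Set[str], Set[str]]


def count_concordant_variants(clusters: Iterable[ReadPair]) -> Counter:
    """Count each variant once per cluster, only if both strands report it."""
    counts: Counter = Counter()
    for forward_calls, reverse_calls in clusters:
        # Calls seen on only one strand are treated as likely sequencing errors
        # and are not counted.
        counts.update(forward_calls & reverse_calls)
    return counts


clusters = [({"v1"}, {"v1"}), ({"v1", "v2"}, {"v1"}), (set(), {"v2"})]
print(count_concordant_variants(clusters))  # Counter({'v1': 2})
```

Here "v2" is never counted, because it is never reported by both reads of the same cluster.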
The yeast-based functional assays were established and validated in a previous study (Sun S et al. 2016), and the resulting calmodulin variant effect map has been validated by its ability to separate pathogenic from non-pathogenic variants (Weile J et al. 2017).
The methods are described in Weile J et al. 2017. Briefly, the read counts for each variant in the pre-selection, post-selection and wild-type control conditions were normalized to sequencing depth and then used to calculate allele frequency enrichment. First, the wild-type control counts were subtracted from the pre- and post-selection counts, as they are assumed to represent position-dependent sequencing errors. Next, the log ratio of post- to pre-selection counts was calculated. Finally, the log ratio distributions of synonymous and nonsense variants (which, for simplicity, are assumed to behave like wild type and null alleles, respectively) were used to rescale all other log ratios so that 1 represents full function and 0 represents complete loss of function. These rescaled values are referred to below as fitness scores. The two replicates of each measurement were used to estimate measurement errors, which were regularized using an established Bayesian procedure (Baldi P and Long AD, 2001).
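The scoring steps above can be sketched as follows. This is an illustrative reconstruction, not the published pipeline. It assumes a single sequencing depth per condition (the real data are normalized per tile), uses the medians of the synonymous and nonsense log ratios as the 1 and 0 anchors, and floors background-corrected frequencies at a small positive value so the logarithm is defined. The regularization function uses the Baldi and Long posterior variance estimate, (nu0*sigma0^2 + (n-1)*s^2) / (nu0 + n - 2); the prior parameters prior_sd (sigma0) and prior_df (nu0) are hypothetical inputs, since how the prior was chosen is not described here.

```python
import math
from statistics import median
from typing import Dict, Iterable


def log_ratios(pre: Dict[str, int], post: Dict[str, int], wt: Dict[str, int],
               depth_pre: int, depth_post: int, depth_wt: int,
               floor: float = 1e-6) -> Dict[str, float]:
    """Depth-normalize counts, subtract the wild-type control, return log10(post/pre)."""
    ratios = {}
    for variant, n_pre in pre.items():
        # Wild-type control frequency, treated as position-dependent sequencing error.
        background = wt.get(variant, 0) / depth_wt
        # The floor (an assumption) keeps both frequencies positive for the logarithm.
        f_pre = max(n_pre / depth_pre - background, floor)
        f_post = max(post.get(variant, 0) / depth_post - background, floor)
        ratios[variant] = math.log10(f_post / f_pre)
    return ratios


def rescale(ratios: Dict[str, float], synonymous: Iterable[str],
            nonsense: Iterable[str]) -> Dict[str, float]:
    """Affine rescaling: nonsense median maps to 0, synonymous median maps to 1."""
    syn = median(ratios[v] for v in synonymous)
    non = median(ratios[v] for v in nonsense)
    return {v: (r - non) / (syn - non) for v, r in ratios.items()}


def regularized_sd(sample_sd: float, n: int, prior_sd: float, prior_df: float) -> float:
    """Baldi & Long (2001) regularized SD from n replicates and a prior (sigma0, nu0)."""
    var = (prior_df * prior_sd ** 2 + (n - 1) * sample_sd ** 2) / (prior_df + n - 2)
    return math.sqrt(var)
```

Because the rescaling is affine, the choice of log base (10 here) cancels out and does not affect the final fitness scores.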
Participants are asked to predict the fitness score of each variant in the competitive growth assay. Each prediction should be a numeric value on the same rescaled log scale, between 0 (no growth at the restrictive temperature) and 1 (wild-type-like growth fitness). Please note that the experimental scores measure fitness in a competitive growth assay and have not been calibrated to correspond to percent of wild-type protein function. Predictors should also bear in mind that this experiment assays the effects of human protein variants in a yeast system. To help participants calibrate their numeric values, we provide the experimental distribution of growth fitness scores.
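One simple way to use the provided distribution, offered here only as an illustration, is quantile mapping: rank the raw scores from your own predictor and replace each with the value at the same quantile of the experimental distribution. The sketch assumes the distribution file has already been expanded into a list of reference scores; its exact format is not described here. Ties among raw scores are broken arbitrarily.

```python
from typing import List, Sequence


def quantile_map(raw_scores: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Replace each raw score by the reference value at the same rank quantile."""
    ref = sorted(reference)
    # Indices of raw_scores in increasing order of score.
    order = sorted(range(len(raw_scores)), key=lambda i: raw_scores[i])
    mapped = [0.0] * len(raw_scores)
    for rank, i in enumerate(order):
        q = (rank + 0.5) / len(raw_scores)  # mid-rank quantile in (0, 1)
        mapped[i] = ref[min(int(q * len(ref)), len(ref) - 1)]
    return mapped


print(quantile_map([0.9, 0.1, 0.5], [1.0, 0.0, 0.5]))  # [1.0, 0.0, 0.5]
```

Quantile mapping keeps the ordering of the raw scores (up to ties) while moving the values onto the experimental scale, so it mainly affects value-based measures such as RMS deviation rather than rank correlation.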
The predictions will be assessed against the numeric values calculated for each mutant clone in the competitive growth assay. Each predicted value must be accompanied by a standard error, and the standard errors will also be assessed. A brief comment on the basis of each prediction may optionally be given.
Download dataset: 5-CALM_dataset.txt (9 KB)
Download experimental distribution: 5-CALM_distribution.tsv (446 Bytes)
Download submission template: registered users only.
Download submission validation script: registered users only.
Prediction submission format
The prediction submission is a tab-delimited text file. Organizers provide a template file, which must be used for submission. In addition, a validation script is provided, and predictors must check the correctness of the format before submitting their predictions.
Each data row in the submitted file must include the following columns:
In the template file, cells in columns 2-4 are marked with a "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission. For a given subset, you must submit predictions and standard deviations for all or none of the variants; if you are not confident in a prediction for a variant, enter an appropriately large standard deviation for the prediction. Optionally, enter a brief comment on the basis of the prediction. If you do not enter a comment on a prediction, leave the "*" in those cells. Please make sure you follow the submission guidelines strictly.
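Filling in the template can be scripted. The sketch below assumes, hypothetically, that the template has a single header line, that column 1 holds the variant identifier, and that columns 2-4 hold the prediction, its standard deviation and the optional comment; check these assumptions against the actual template. It enforces the all-or-none rule by raising an error for any variant without a prediction, and keeps the "*" when no comment is given. The header names in the usage example are placeholders.

```python
from typing import Dict, Optional, Tuple

# (prediction, standard deviation, optional comment)
Prediction = Tuple[float, float, Optional[str]]


def fill_template(template_text: str, predictions: Dict[str, Prediction]) -> str:
    """Replace the '*' placeholders in columns 2-4 of each data row."""
    lines = template_text.splitlines()
    header, body = lines[0], lines[1:]  # assumes exactly one header line
    out = [header]
    for line in body:
        if not line.strip():
            continue
        variant = line.split("\t")[0]
        if variant not in predictions:
            # All-or-none: a missing variant invalidates the whole subset.
            raise ValueError(f"missing prediction for {variant}")
        score, sd, comment = predictions[variant]
        # Tabs would break the column layout; no empty cells are allowed.
        note = comment.replace("\t", " ") if comment else "*"
        out.append("\t".join([variant, f"{score:.3f}", f"{sd:.3f}", note]))
    return "\n".join(out) + "\n"


template = "Variant\tScore\tSD\tComment\nv1\t*\t*\t*\nv2\t*\t*\t*\n"
print(fill_template(template, {"v1": (0.9, 0.1, "conserved"), "v2": (0.2, 0.3, None)}))
```

The filled file should still be checked with the official validation script before submission.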
In addition, your submission should include a detailed description of the method used to make the predictions, similar to the style of the Methods section in a scientific article. This information must be submitted as a separate file.
To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions by accessing the link "All submission forms" from the front page of your group. For more details, please read the FAQ page.
Predictions will be assessed by an independent assessor. We anticipate using rank correlation (e.g., Kendall's tau) and the root-mean-square (RMS) deviation of predictions from the experimental observations. The independent assessor is expected to employ a range of other evaluation approaches as well.
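To check a method on held-out data before submitting, the two anticipated measures can be computed as below. This is a sketch, not the assessor's code: it implements the tie-corrected tau-b variant of Kendall's tau with a straightforward O(n^2) pairwise loop, which is adequate for a few thousand variants; scipy.stats.kendalltau computes tau-b by default and is faster for large inputs.

```python
import math
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b, counting concordant/discordant/tied pairs directly."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1  # includes pairs tied in both x and y
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one input is entirely tied")
    return (concordant - discordant) / denom


def rms_deviation(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root-mean-square deviation of predictions from observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


print(kendall_tau_b([1, 2, 3], [1, 3, 2]))  # 0.333... (2 concordant, 1 discordant)
print(rms_deviation([0.0, 0.0], [1.0, 1.0]))  # 1.0
```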
Weile J*, Sun S*, Cote AG, Knapp J, Verby M, Mellor JC, Wu Y, Pons C, Wong C, van Lieshout N, Yang F, Tasan M, Tan G, Yang S, Fowler DM, Nussbaum R, Bloom JD, Vidal M, Hill DE, Aloy P, Roth FP. 2017. Expanding the atlas of functional missense variation for human genes. bioRxiv 166595. https://www.biorxiv.org/content/early/2017/11/01/166595
*These authors contributed equally.
Baldi P, Long AD. 2001. A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 17:509-519. PMID: 11395427
Crotti L, Johnson CN, Graf E et al. 2013. Calmodulin mutations associated with recurrent cardiac arrest in infants. Circulation 127:1009-1017. PMCID:PMC3834768. doi:10.1161/CIRCULATIONAHA.112.001216
Nyegaard M, Overgaard MT, Sondergaard MT et al. 2012. Mutations in calmodulin cause ventricular tachycardia and sudden cardiac death. Am J Hum Genet 91:703-712. PMCID:PMC3484646. doi:10.1016/j.ajhg.2012.08.015
Sarhan MF, Tung CC, Van Petegem F, Ahern CA. 2012. Crystallographic basis for calcium regulation of sodium channels. Proc Natl Acad Sci USA 109:3558-3563. PMCID:PMC3295267. doi:10.1073/pnas.1114748109
Sun S, Yang F, Tan G et al. 2016. An extended set of yeast-based functional assays accurately identifies human disease mutations. Genome Res 26:670-680. PMCID:PMC4864455. doi:10.1101/gr.192526.115
21 Oct 2017 (v01): initial release
8 Nov 2017 (v02): Closing date added
24 Sep 2018 (v03): Dataset availability added