
Cystathionine beta-Synthase (CBS) single amino acid mutations observed in patients with homocystinuria

Dataset description: public

Dataset: public

The CAGI submission deadline for the CBS challenge has passed as of October 3, 2011, 3PM EDT.


CBS is a vitamin-dependent enzyme involved in cysteine biosynthesis via the transsulfuration pathway. The human CBS requires two cofactors for function, vitamin B6 (in the active form of pyridoxal 5’-phosphate [PLP], supplemented in the soluble form of pyridoxine) and heme.

Homocystinuria due to CBS deficiency (OMIM 236200) is a recessive inborn error of sulfur amino acid metabolism. More than 90 different disease-associated mutations have been identified in the CBS gene (Kraus et al. 1999). About one-half of homocystinuric patients respond to high doses of pyridoxine and several alleles are clearly pyridoxine remediable: A114V, R266K, R366H, K384E, L539S and the frequent I278T which accounts for 20% of all CBS mutant alleles.

Jasper Rine’s lab at UC Berkeley collected 84 single amino acid variants that had been observed in patients with homocystinuria. The functionality of the variants was tested in an in vivo yeast complementation assay where the human CBS clone is expressed and functionally complements a yeast cell that has had the yeast gene for the orthologous enzyme, CYS4, removed from the chromosome. In the assay, growth is dependent upon the level of mutant human CBS function, and the rates are expressed as a percentage relative to wild type (human) grown with the same amount of exogenous pyridoxine supplementation, plus and minus the standard deviation. Two concentrations of pyridoxine, high (400 ng/ml) and low (2 ng/ml), were used. For more information, please see documentation for the CAGI 2010 CBS challenge, and especially the file containing background information: CAGI_description_and_data.pdf.
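As a rough illustration of how a relative growth value of this kind can be derived, the sketch below converts replicate growth rates of a mutant strain into a percentage of the wild-type mean, with a standard deviation. The function name, the use of replicate lists, and the way the standard deviation is computed are assumptions made for illustration; the data providers' actual procedure is described in their own documentation and may differ.

```python
from statistics import mean, stdev


def relative_growth(mutant_rates, wild_type_rates):
    """Return (percent of wild type, SD in percent points).

    Hypothetical helper: both arguments are lists of replicate growth
    rates measured at the same pyridoxine concentration.
    """
    wt_mean = mean(wild_type_rates)
    # Express every mutant replicate as a percentage of the wild-type mean.
    percents = [100.0 * rate / wt_mean for rate in mutant_rates]
    # A sample SD needs at least two replicates; report 0.0 otherwise.
    sd = stdev(percents) if len(percents) > 1 else 0.0
    return mean(percents), sd
```

Note that this simple version ignores the variability of the wild-type measurements themselves; a fuller treatment would propagate that uncertainty as well.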

Prediction challenge

Predictors are asked to submit predictions of the effect of each variant on the function of CBS at both high cofactor (pyridoxine) concentration (400 ng/ml) and low cofactor concentration (2 ng/ml). Each submitted prediction should be a numeric value with a standard deviation. The predictions will be assessed against the numeric values actually measured for each mutation in the yeast assay.

Please note: In CAGI 2010 we offered the first challenge on CBS variants. These synthetic variants were assayed with the same yeast assay as above, and therefore, the example dataset provided for the 2010 challenge is also useful for the 2011 challenge. However, this year's challenge uses actual mutations observed in the human population.

A presentation given by assessor Pauline Ng at the ISMB 2011 conference discusses the CBS 2010 dataset and the submitted predictions. Go to the CAGI 2010 Results page to download the presentation (available to registered users).

Download dataset: 84 single amino acid variants within the coding region of the CBS gene

Additional information

The reference sequence within NCBI Entrez database is: L14577.1.

The consensus sequence provided by data providers was obtained from sequencing of the plasmid containing the published cDNA, and it differs at one position (C909T) compared to the reference sequence.

Database of CBS alleles maintained by the Kraus Lab:

Kraus JP, Janosík M, Kozich V, Mandell R, Shih V, Sperandeo MP, Sebastio G, de Franchis R, Andria G, Kluijtmans LA, Blom H, Boers GH, Gordon RB, Kamoun P, Tsai MY, Kruger WD, Koch HG, Ohura T, Gaustadnes M. Cystathionine beta-synthase mutations in homocystinuria. Hum Mutat. 1999;13(5):362-75. doi: 10.1002/(SICI)1098-1004(1999)13:5<362::AID-HUMU4>3.0.CO;2-K

CBS gene in OMIM:

Prediction submission format

The prediction submission is a tab-delimited text file. Organizers provide a file template, which should be used for submission. In addition, a validation script is provided, and predictors should check the correctness of the format before submitting their predictions.

Download CBS submission template

Download CBS submission validation script

In the submitted file, each row should include the following columns:

  • Substituted residue - The mutation as listed in the prediction dataset file; keep the order provided in the template file
  • Prediction (high pyridoxine conc.) - Prediction of relative growth rate in high pyridoxine concentration (400 ng/ml)
  • Standard deviation - SD of the prediction in column 2
  • Prediction (low pyridoxine conc.) - Prediction of relative growth rate in low pyridoxine concentration (2 ng/ml)
  • Standard deviation - SD of the prediction in column 4

In the template file, cells in columns 2-5 are marked with an "*". Submit your predictions by replacing each "*" with your value. No empty cells are allowed in the submission; if you cannot submit a prediction for a substitution, leave the "*" in those cells. Please make sure you follow the submission guidelines strictly.
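The column rules above can be sketched as a small format check. This is not the organizers' validation script, which remains the authoritative check; it is a minimal illustration that assumes the file contains only data rows (no header line) and that every prediction cell holds either a number or the "*" placeholder. The names `check_row` and `check_submission` are hypothetical.

```python
import csv
import io

N_COLUMNS = 5        # substitution, high prediction, SD, low prediction, SD
PLACEHOLDER = "*"    # marks a cell with no prediction


def check_row(fields):
    """Return a list of problems found in one row (empty list means OK)."""
    if len(fields) != N_COLUMNS:
        return [f"expected {N_COLUMNS} columns, found {len(fields)}"]
    problems = []
    if not fields[0].strip():
        problems.append("column 1 (substituted residue) is empty")
    # Columns 2-5 must each hold a number or the placeholder.
    for column, value in enumerate(fields[1:], start=2):
        value = value.strip()
        if value == "":
            problems.append(f"column {column} is empty; use '*' instead")
        elif value != PLACEHOLDER:
            try:
                float(value)
            except ValueError:
                problems.append(f"column {column} is not numeric: {value!r}")
    return problems


def check_submission(text):
    """Check tab-delimited text; return {line_number: [problems]}."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    report = {}
    for line_number, fields in enumerate(reader, start=1):
        problems = check_row(fields)
        if problems:
            report[line_number] = problems
    return report
```

For example, `check_submission("I278T\t40\t5\t10\t3\n")` returns an empty dictionary, while a row with a blank cell in column 3 is reported under its line number.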

In addition, your submission should include a detailed description of the method used to make the predictions (similar to the style of the Methods section in a scientific article). This information will be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions through the "All submission forms" link on the front page of your group. For more details, please read the FAQ page.

Dataset provided by

Jasper Rine, Jacob Mayfield, and Meara Davies, University of California, Berkeley