Predict missense variant effects on hydroxymethylbilane synthase as measured by a yeast complementation assay

Challenge: HMBS

Variant data: public

Last updated: 03 May 2021

This challenge will open on 01 June 2021 and will tentatively close on 31 August 2021.


Make sure you understand our Data Use Agreement and Anonymity Policy


Hydroxymethylbilane synthase (HMBS), also known as porphobilinogen deaminase (PBGD) or uroporphyrinogen I synthase, is an enzyme involved in heme production. In humans, variants that affect HMBS function result in acute intermittent porphyria (AIP), an autosomal dominant genetic disorder caused by a build-up of porphobilinogen in the cytoplasm. A large library of HMBS missense variants was assessed with respect to their effects on protein function using a high-throughput yeast complementation assay. The challenge is to predict the functional effects of these variants.


HMBS is a 40-42 kDa (344-361 aa) protein involved in the third step of heme biosynthesis. There are two isoforms: one expressed ubiquitously across tissues and one restricted to erythrocytes. The ubiquitous isoform is generated by alternative splicing of exon 1 and contains an additional 17 amino acid residues at the N terminus. The HMBS structure comprises three domains and a single catalytic site, with interdomain flexibility contributing to elongation of the polypyrrole product (Louie et al., 1992).

Enzyme deficiency caused by mutations in the HMBS gene, in combination with environmental factors, can trigger acute intermittent porphyria (AIP), a condition characterized by tachycardia, arrhythmias, hypertension, seizures, and damage to nerves and muscles (peripheral neuropathy) that can lead to paralysis. While the exact mechanisms underlying the development of symptomatic AIP are unknown, measurement of low HMBS activity in erythrocytes facilitates detection of AIP during latent periods and acute episodes. However, detection of normal activity does not exclude the 'non-erythroid variant' of AIP (Linenberger & Fertrin, 2020). For this reason, molecular genetic diagnosis is the most sensitive and preferred method for diagnosing classical and variant AIP. It also permits identification of latent carriers within a family, allowing these individuals to be aware of factors that may precipitate acute attacks.

A team in Fritz Roth’s Lab at the Donnelly Centre (University of Toronto) and Lunenfeld-Tanenbaum Research Institute (Sinai Health Systems), led by Warren van Loggerenberg, has assessed a large library of HMBS variants using a high-throughput yeast complementation assay. This assay reveals the overall impact of each variant on the ability of the protein to function in the cell.


A diverse library of plasmids expressing different human HMBS (UniProtKB: P08397) variants was generated by a random codon replacement method called POPCode (Weile et al., 2017). The Roth lab developed a yeast-based functional complementation assay that is amenable to variant effect mapping of HMBS in two steps: (1) confirming a complementation relationship, in which HMBS rescues the phenotypic defect of a loss-of-function mutation in the orthologous essential yeast gene HEM3; and (2) assessing the loss of rescue for a test set of likely damaging and likely neutral variants. The assay was validated for the human HMBS gene by measuring the impact of four variants, of which two (50%) were detected at a stringency yielding 100% precision, a performance on par with previous human disease gene complementation assays (Sun et al., 2016).

HEM3 catalyzes the head-to-tail condensation of four units of porphobilinogen (PBG) to form the linear tetrapyrrole hydroxymethylbilane (HMB) (Bogorad, 1958; Anderson & Desnick, 1980). Catalysis requires that the apoprotein spontaneously assemble the unique cofactor dipyrromethane (DPM) from two molecules of PBG to generate the active holoenzyme. DPM initiates the polymerization reaction, and release of the unstable tetrapyrrole product by hydrolysis restores the holoenzyme with covalently bound DPM, which continues to act as a primer for HMB formation (Jordan et al., 1988). HEM3 (UniProtKB: P28789) is an essential gene, and HEM3 temperature-sensitive mutants do not grow at the restrictive temperature.

Pooled libraries of ubiquitous and erythroid-specific HMBS variants were transformed into the temperature-sensitive S. cerevisiae HEM3 strain. For each isoform, two samples were taken from the pooled transformants as pre-selection technical replicates. Two further aliquots were used to start parallel cultures, which were grown to saturation at the selective temperature of 36°C and from which two post-selection technical replicate samples were taken. In parallel, the same selection was performed on the temperature-sensitive S. cerevisiae HEM3 strain expressing the corresponding wildtype HMBS isoform, and two samples were taken as wildtype control replicates. Plasmid DNA was extracted from the six samples and sequenced using TileSEQ, a sequencing method based on the amplification of small tiles across the gene that are short enough for paired-end sequencing to read both strands of each cluster on an Illumina flowcell. A variant is counted only when the reads from both strands agree on its presence.
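The both-strand agreement rule can be illustrated with a minimal Python sketch. This is not the TileSEQ pipeline itself: the input format (one set of variant calls per read in a pair) and the variant labels are hypothetical and chosen only for illustration.

```python
from collections import Counter

def count_confirmed_variants(read_pairs):
    """Count variants that are called on both reads of a pair.

    read_pairs: iterable of (forward_calls, reverse_calls), where each
    element is a collection of variant labels (hypothetical format).
    """
    counts = Counter()
    for forward_calls, reverse_calls in read_pairs:
        # A variant is counted only if both strands report it.
        for variant in set(forward_calls) & set(reverse_calls):
            counts[variant] += 1
    return counts

# Illustrative labels only; they do not refer to real HMBS variants.
example_pairs = [
    ({"var_A"}, {"var_A"}),           # both strands agree: counted
    ({"var_A"}, set()),               # forward strand only: not counted
    ({"var_B", "var_C"}, {"var_B"}),  # only var_B is confirmed
]
confirmed = count_confirmed_variants(example_pairs)
```

Requiring agreement between strands discards calls that appear on only one read, which are more likely to be sequencing errors than true variants.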

The yeast-based functional assays were established and validated, and the maps for both HMBS isoforms have been validated by their ability to separate pathogenic from non-pathogenic variants (van Loggerenberg et al., in preparation). The methods are described in Weile et al. (2017). Briefly, read counts in the pre-selection, post-selection and wildtype-control conditions for each variant were normalized to sequencing depth and then used to calculate allele frequency enrichment. First, the wildtype control counts were subtracted from the pre- and post-selection counts (as they are assumed to represent position-dependent sequencing errors). Then, the log ratio between the post- and pre-selection counts was calculated. Finally, the log ratio distributions of synonymous and nonsense variants (which, for simplicity, are assumed to emulate wildtype- and null-like behavior) were used to rescale all other variant log ratios, such that 1 represents full function and 0 represents complete loss of function. The two replicates for each measurement were used to estimate measurement errors, and these were regularized using an established procedure (Baldi & Long, 2001). For each variant, a weighted average score across maps was then calculated, with each map's score weighted by the inverse of its squared standard error estimate. The resulting quantities in the combined HMBS map are referred to as fitness scores below.
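The scoring steps can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the published pipeline: the base-10 logarithm, the small floor applied after background subtraction, and the use of medians to summarize the synonymous and nonsense distributions are all assumptions made here, and the function names are hypothetical. Error regularization (Baldi & Long, 2001) is omitted.

```python
import math
from statistics import median

def log_ratio(pre_freq, post_freq, wt_freq, floor=1e-9):
    """Log enrichment of a variant from depth-normalized frequencies.

    The wildtype-control frequency is subtracted from both conditions.
    The floor (an assumption) avoids taking the log of a value <= 0.
    """
    pre_corrected = max(pre_freq - wt_freq, floor)
    post_corrected = max(post_freq - wt_freq, floor)
    return math.log10(post_corrected / pre_corrected)

def rescale(lr, synonymous_lrs, nonsense_lrs):
    """Rescale a log ratio so nonsense-like maps to 0 and synonymous-like to 1.

    Medians are used as summaries of the two distributions (an assumption).
    """
    syn = median(synonymous_lrs)
    non = median(nonsense_lrs)
    return (lr - non) / (syn - non)

def combine_scores(scores, std_errors):
    """Inverse-variance weighted average of per-map scores.

    Returns the combined score and its standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    combined = sum(w * s for w, s in zip(weights, scores)) / total
    return combined, math.sqrt(1.0 / total)
```

With inverse-variance weighting, the map with the smaller standard error dominates the combined score, so a noisy measurement in one isoform map has limited influence on the final fitness score.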

Prediction challenge

Participants are asked to submit a predicted competitive-growth fitness score for each variant. The submitted prediction should be a numeric value on a log scale between 0 (no growth at the restrictive temperature) and 1 (wildtype-like growth fitness). Please note that the experimental scores are a measure of fitness in a competitive growth assay and have not been calibrated to correspond to percent of wildtype protein function. Predictors should also bear in mind that this experiment assays the effect of human protein variants in a yeast system. To help participants calibrate their numeric values appropriately, we provide the experimental distribution of numeric growth fitness scores.
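One possible way to use the experimental distribution for calibration is rank-based quantile mapping, sketched below in Python. This is only a suggestion and not a procedure prescribed by the organizers. It assumes that the distribution is available as a plain list of numbers (the actual file format is not yet published) and that a higher raw score means a more functional variant; the function name is hypothetical.

```python
import bisect

def quantile_map(raw_scores, reference_scores):
    """Map raw predictor scores onto a reference distribution by rank.

    Each raw score is replaced by the reference value at the same
    quantile, so the output preserves the ordering of the raw scores
    while taking values from the reference distribution.
    """
    reference = sorted(reference_scores)
    ordered = sorted(raw_scores)
    n = len(ordered)
    mapped = []
    for x in raw_scores:
        # Mid-rank quantile of x among the raw scores (ties share a quantile).
        lo = bisect.bisect_left(ordered, x)
        hi = bisect.bisect_right(ordered, x)
        q = (lo + hi) / (2 * n)
        index = min(int(q * len(reference)), len(reference) - 1)
        mapped.append(reference[index])
    return mapped

# Toy values for illustration only.
calibrated = quantile_map([0.9, 0.1, 0.5], [0.0, 0.2, 0.8, 1.0])
```

Quantile mapping changes only the scale of the predictions, not their ranking, so it affects measures such as R-square but leaves rank correlation unchanged.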

The predictions will be assessed against the numeric values calculated for each mutant clone in the competitive growth assay. Each predicted value must be accompanied by a standard deviation indicating confidence in the prediction, and predictions will also be assessed on these values. A brief comment on each prediction may also be given.

Prediction submission format 

The prediction submission is a tab-delimited text file. Organizers provide a template file, which must be used for submission. In addition, a validation script is provided, and predictors must check the correctness of the format before submitting their predictions.

Each data row in the submitted file must include the following columns:

  • 1. AA substitution – The mutation as listed in the dataset file.
  • 2. Competitive fitness score – Prediction of competitive growth relative to wildtype: 0 = no growth under selection, 1 = wildtype growth. Participants may use the provided experimental distribution of numeric growth score values to calibrate their prediction scores.
  • 3. Standard deviation – SD of the prediction in column 2, indicating confidence in the prediction.
  • 4. Comment – Optional brief comment on the prediction in column 2.

In the template file, cells in columns 2-4 are marked with a "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission. For a given subset, you must submit predictions and standard deviations for all or none of the variants; if you are not confident in a prediction for a variant, enter an appropriately large standard deviation for the prediction. Optionally, enter a brief comment on the basis of the prediction. If you do not enter a comment on a prediction, leave the "*" in those cells. Please make sure you follow the submission guidelines strictly. 
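The row format described above can be sketched in Python as follows. This is a hedged illustration, not a substitute for the official template and validation script: the function name and variant labels are hypothetical, the four-decimal number formatting is an arbitrary choice, and any header row in the official template should be kept as provided.

```python
import csv
import io

def format_submission(predictions):
    """Format predictions as tab-delimited rows with four columns.

    predictions: list of (aa_substitution, score, sd, comment) tuples,
    where comment may be None. Empty comments are written as "*".
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for variant, score, sd, comment in predictions:
        if score is None or sd is None:
            # No empty cells are allowed for the score and SD columns.
            raise ValueError(f"{variant}: score and SD are both required")
        if sd < 0:
            raise ValueError(f"{variant}: SD must be non-negative")
        writer.writerow([variant, f"{score:.4f}", f"{sd:.4f}", comment or "*"])
    return buffer.getvalue()

# Placeholder labels; real rows must use the mutations from the dataset file.
text = format_submission([
    ("VARIANT_1", 0.85, 0.10, None),
    ("VARIANT_2", 0.10, 0.30, "buried residue"),
])
```

The returned text can be written to a file and then checked with the organizers' validation script before submission.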

In addition, your submission should include a detailed description of the method used to make the predictions, similar to the style of the Methods section in a scientific article. This information must be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions by accessing the link "All submission forms" from the front page of your group. For more details, please read the FAQ page.

Download data 

Download dataset: Currently not available

Download experimental distribution: Currently not available

Download submission template: Currently not available

Download submission validation script: Currently not available

Training data 

No training data is provided. The participants may wish to use known variants from ClinVar, gnomAD, HGMD, and UniProtKB to calibrate their models.


Predictions will be assessed by an independent assessor. We anticipate a range of evaluation scenarios, including the R-square, correlation, and rank correlation between predictions and experimental observations. The independent assessor is expected to emphasize certain performance measures over others and to employ other evaluation approaches as well. We also anticipate separate assessments of amino acid substitutions that are accessible by a single-nucleotide change and those that require more than one nucleotide change to the codon. Finally, the assessor may rescale submitted predictions so that their score distribution matches that of the observations.
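For self-evaluation, the anticipated measures can be sketched in Python using only the standard library. The exact definitions that the assessor will use are not specified, so the choices here are assumptions: R-square is computed as the coefficient of determination of the raw predictions, rank correlation as Spearman's coefficient with average ranks for ties, and single-nucleotide accessibility is checked against the standard genetic code. All function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    return pearson(ranks(x), ranks(y))

def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observations."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Standard genetic code in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def snv_accessible(codon, target_aa):
    """True if one nucleotide change in codon can yield target_aa."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == target_aa:
                    return True
    return False
```

For example, `snv_accessible("GCT", "V")` is true because GCT (Ala) to GTT (Val) is a single change, whereas `snv_accessible("GCT", "W")` is false because the only tryptophan codon, TGG, differs from GCT at all three positions.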

Related challenges


References

Anderson P, Desnick R. Purification and properties of uroporphyrinogen I synthase from human erythrocytes. Identification of stable enzyme-substrate intermediates. J Biol Chem (1980) 255(5): 1993-1999. PubMed 

Baldi P, Long AD. A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics (2001) 17(6):509-519. PubMed 

Bogorad L. The enzymatic synthesis of porphyrins from porphobilinogen. I. Uroporphyrin I. J Biol Chem (1958) 233(2): 501-509. PubMed 

Jordan PM, et al. Purification, crystallization and properties of porphobilinogen deaminase from a recombinant strain of Escherichia coli K12. Biochem J (1988) 254(2): 427-435. PubMed 

Linenberger M, Fertrin KY. Updates on the diagnosis and management of the most common hereditary porphyrias: AIP and EPP. Hematology Am Soc Hematol Educ Program (2020) 2020(1): 400-410. PubMed 

Louie GV, et al. Structure of porphobilinogen deaminase reveals a flexible multidomain polymerase with a single catalytic site. Nature (1992) 359: 33-39. PubMed 

Pierach CA, et al. Red blood cell porphobilinogen deaminase in the evaluation of acute intermittent porphyria. JAMA (1987) 257(1): 60-61. PubMed 

Sun S, et al. An extended set of yeast-based functional assays accurately identifies human disease mutations. Genome Res (2016) 26(5):670-680. PubMed 

Weile J, et al. A framework for exhaustively mapping functional missense variants. Mol Syst Biol (2017) 13(12): 957. PubMed 

Dataset provided by 

Warren van Loggerenberg, Jochen Weile, Song Sun, and Fritz Roth, University of Toronto

Revision history 

03 May 2021: initial release