Acid alpha-glucosidase (GAA): predict the effect of naturally occurring missense mutations on enzymatic activity

Challenge: GAA missense
Dataset description: public
Variant data: public
Last updated: 9 Nov 2017
This challenge closes at 9:00 PM PST (Pacific Standard Time) on 10 January 2018.


Summary
Acid alpha-glucosidase (GAA) is a lysosomal alpha-glucosidase. Some mutations in GAA cause Pompe disease (Glycogen Storage Disease II), a rare disorder. Rare GAA missense variants found in a human population sample have been assayed for enzymatic activity in transfected cell lysates. The challenge is to predict the fractional enzyme activity of each mutant protein compared to the wild-type enzyme. The assessment of this challenge will include evaluations that recognize novelty of approach.

Background
Acid alpha-glucosidase (GAA, NP_000143.2) is a lysosomal enzyme that hydrolyzes terminal, non-reducing end α1-4 and α1-6 linkages in glycogen. Deficiency of GAA causes Pompe disease (Glycogen Storage Disease II, GSD-II; OMIM #232300), an autosomal recessive lysosomal storage disorder in which lysosomal glycogen accumulation results in myopathy (Hirschhorn et al., 2010). GAA is a member of glycoside hydrolase family 31 in the Carbohydrate-Active Enzymes database (Cantarel et al., 2009). GAA is synthesized as a 952 amino acid polypeptide that contains an N-terminal signal peptide. The mature, lysosomal form of GAA is produced by extensive proteolytic processing at both the N- and C-terminal ends. The fully processed, mature protein consists of 4 polypeptides derived from the precursor protein: a catalytic polypeptide of 70 kDa, as well as polypeptides of 19.4, 10.3, and 3.9 kDa (Moreland et al., 2005). The precursor form of GAA, lacking the N-terminal signal peptide, is enzymatically active (Van Hove et al., 1996). GAA functions as a monomer, and a recently released 2.0 Å resolution structure (PDB ID 5KZW) is available in the Protein Data Bank.

Deficiency of GAA results in lysosomal glycogen accumulation in multiple tissues, with cardiac and skeletal muscle tissues most seriously affected. Pompe disease is a spectrum disorder with a broad range of clinical manifestations. The most severe form is infantile-onset Pompe disease, which presents with prominent cardiomegaly, myocardial failure, generalized muscle hypotonia without muscle wasting, and death prior to 1 to 2 years of age. Infantile patients have little to no detectable GAA activity. At the other extreme is the slowly progressive adult-onset form of the disease. The late-onset form is generally characterized by slowly progressive proximal muscle weakness and respiratory insufficiency, and can present at any time from childhood until as late as the 2nd to 6th decade of life. It is distinguished from the infantile-onset form by the absence of severe cardiac involvement. Late-onset patients may have residual GAA activity up to 30% of normal (Van der Ploeg & Reuser, 2008). It is estimated that approximately one third of those with Pompe disease have the rapidly fatal infantile-onset form, while the majority of patients present with late-onset Pompe disease (Hirschhorn et al., 2010). While life expectancy can vary, death generally occurs due to respiratory failure (Hirschhorn et al., 2010). The incidence of Pompe disease is believed to be approximately 1:40,000 births (Martiniuk et al., 1998).

BioMarin Pharmaceutical is interested in estimating rare disease incidence using the allele frequencies of pathogenic variants present in publicly available exome datasets. The fundamental challenge with this approach is that there are a large number of variants of uncertain significance (VUS) whose contribution to disease is difficult to assess. Furthermore, as in the case of Pompe disease, where there is a spectrum of disease severity, estimating disease incidence can become a combinatorics problem. It is not sufficient to simply know whether a variant is pathogenic. One must know its impact on activity and which combinations of mutations lead to each disease phenotype. While such information might be obtained by the careful examination of patient phenotypes and genotypes (Kroos et al., 2012), for uncommon variants or those of unknown significance this might not be possible due to insufficient data. Quantitation of the residual enzymatic activity associated with VUS can help provide the missing information. As an illustration of this problem, there are 450 GAA mutations reported by HGMD (V.2016.1) (Stenson et al., 2003) to be associated with Pompe disease, and 235 of these are missense mutations. The ExAC dataset (release 0.3), which comprises ~60,000 individual sequenced exomes, contains 433 rare (MAF < 5%) and common (MAF > 5%) missense mutations in GAA, of which 75 are known to be disease associated according to HGMD. Thus, most of the known disease-associated alleles are absent from the ExAC dataset, and the ExAC dataset contains a large number of missense mutations whose contribution to disease incidence is unknown.
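
As a rough illustration of how allele frequencies translate into an incidence estimate for an autosomal recessive disorder, the sketch below assumes Hardy-Weinberg equilibrium and treats every listed variant as fully pathogenic. Both are simplifying assumptions, and the allele frequencies are made-up numbers rather than ExAC values. The function name `recessive_incidence` is hypothetical.

```python
def recessive_incidence(pathogenic_allele_freqs):
    """Expected fraction of births carrying two pathogenic alleles.

    Assumes Hardy-Weinberg equilibrium and that any pair of the listed
    alleles (homozygous or compound heterozygous) causes disease.
    """
    q = sum(pathogenic_allele_freqs)  # combined pathogenic allele frequency
    return q * q


# Made-up allele frequencies for three hypothetical pathogenic variants.
example_freqs = [0.002, 0.001, 0.0005]
incidence = recessive_incidence(example_freqs)
print(f"about 1 in {round(1 / incidence):,} births")
```

Under these assumptions, a combined pathogenic allele frequency of 0.005 corresponds to an incidence of 1:40,000. The estimate is sensitive to which variants are counted as pathogenic, and a severity-aware estimate would also need the residual activity of each allele pair, which is the information the assay is meant to supply.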

Experiment
BioMarin has functionally assessed the enzymatic activity of the 357 novel missense mutations in the ExAC dataset. Plasmids containing cDNAs encoding each of the mutant proteins were transfected into an immortalized Pompe patient fibroblast cell line. This cell line carries two null mutations and has no GAA activity. After 72 hours, cells were lysed, and GAA activity in the lysate was assessed using the fluorogenic substrate 4-methylumbelliferyl α-D-glucoside. The activity units are pmol/min/µg protein. The background-subtracted enzyme activity for each mutant was normalized to the background-subtracted activity in a lysate from cells transfected with the wild-type cDNA and reported as percent wild-type GAA activity. Each mutant was assayed in at least three independent transfection experiments. The results from these determinations were averaged, and the standard deviation was calculated.
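
The normalization described above can be sketched as follows. This is a minimal illustration, not the organizers' analysis code: the readings are invented, the function name `percent_wild_type` is hypothetical, and the sketch assumes that each transfection is normalized to its own wild-type control and that the sample standard deviation is used, neither of which the description specifies.

```python
from statistics import mean, stdev


def percent_wild_type(mutant_raw, wild_type_raw, background):
    """Background-subtract both readings and express the mutant as % of wild-type."""
    return 100.0 * (mutant_raw - background) / (wild_type_raw - background)


# Invented readings (pmol/min/µg protein) from three independent transfections:
# (mutant lysate, wild-type lysate, background)
replicates = [(12.0, 52.0, 2.0), (9.0, 42.0, 2.0), (14.0, 62.0, 2.0)]

values = [percent_wild_type(m, w, b) for m, w, b in replicates]
average = mean(values)
spread = stdev(values)  # sample standard deviation (assumption)
print(f"{average:.1f}% of wild-type, SD {spread:.1f}")
```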

Prediction challenge
Participants are asked to submit predictions on the effect of the variants on GAA enzymatic activity. The submitted prediction should be a numeric value ranging from 0 (no activity) to 1 (wild-type level of activity), or >1 if the predicted activity is greater than wild-type activity (e.g., 0.7 means 70% of wild-type and 1.3 means 130% of wild-type activity). Each predicted activity must include a standard deviation. Optionally, a comment on the basis of the prediction may be given. The predictions will be assessed against the numeric values actually measured for each mutation in the enzyme assay. In previous challenges, it has been observed that predictions often cluster more with other predictions than with the experimental values. Assessment will include metrics that recognize prediction sets that differ substantially from results provided by standard methods such as PolyPhen-2 and SIFT.
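
The assessment metrics are not specified here. As one possible way for predictors to check their own output against held-out measurements, the sketch below converts percent wild-type values to the 0-to-1 prediction scale and computes two common agreement measures, root-mean-square error and Pearson correlation. The choice of metrics, the function names, and all numbers are assumptions for illustration only.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between paired predictions and measurements."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented values: measured activity in percent wild-type, predictions as fractions.
measured_percent = [5.0, 70.0, 130.0]
measured = [v / 100.0 for v in measured_percent]  # 70% -> 0.7
predicted = [0.1, 0.6, 1.1]

print(f"RMSE {rmse(predicted, measured):.3f}, r {pearson(predicted, measured):.3f}")
```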

Download dataset
5-GAA_dataset.txt (4.2 KB)

Prediction submission format
The prediction submission is a tab-delimited text file. Organizers provide a template file, which should be used for submission. In addition, a validation script is provided, and predictors should check the correctness of the format before submitting their predictions. In the submitted file, each row includes the following columns:

  1. AA substitution - The mutation found in ExAC as listed in the prediction dataset file. Use the order as provided in the template file.
  2. Prediction (relative activity) - Prediction of relative GAA activity compared to wild-type GAA on a continuous scale where 0 = no activity, 1 = wild-type activity, >1 = higher activity than wild-type.
  3. Standard deviation - SD of the prediction in column 2, indicating confidence.
  4. Comment - Optional brief comment on the basis of the prediction in column 2.

In the template file, cells in columns 2-4 are marked with a "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission. You must enter a prediction and standard deviation for every mutant; if you are not confident in a prediction for a mutant, enter a large standard deviation for that prediction. Optionally, enter a brief comment on the basis of the prediction; otherwise, leave the "*" in the comment cell. Please make sure you follow the submission guidelines strictly.
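
The row rules above can be sketched as a small helper that builds one tab-delimited line. This is not the organizers' validation script; the function name `format_row`, the three-decimal formatting, and the variant labels are hypothetical, and the actual template (including any header line and the exact variant notation) should be taken from the downloaded file.

```python
def format_row(substitution, prediction, sd, comment=""):
    """Build one tab-delimited submission row with no empty cells."""
    if prediction < 0:
        raise ValueError("relative activity cannot be negative")
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    # Tabs or newlines inside a comment would break the column layout.
    cleaned = " ".join(comment.split())
    # An unused comment cell keeps the template's "*" placeholder.
    return "\t".join([substitution, f"{prediction:.3f}", f"{sd:.3f}", cleaned or "*"])


# Placeholder variant labels; real labels come from the template file.
rows = [
    format_row("VARIANT_1", 0.7, 0.2, "buried residue, moderate destabilization"),
    format_row("VARIANT_2", 1.0, 0.8),  # low confidence: large SD, no comment
]
print("\n".join(rows))
```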

In addition, your submission must include a detailed description of the method used to make the predictions, similar to the style of the Methods section in a scientific article. This information will be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions by accessing the link "All submission forms" from the front page of your group. For more details, please read the FAQ page.


References
  1. Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, Henrissat B. 2009. The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res 37:D233–D238. PMCID: PMC2686590. doi: 10.1093/nar/gkn663
  2. Hirschhorn R. 2010. Glycogen Storage Disease Type II: Acid Alpha-Glucosidase (Acid Maltase) Deficiency, in The Metabolic and Molecular Basis of Inherited Disease. New York: McGraw-Hill. p. 3389–3420.
  3. Van Hove JL, Yang HW, Wu JY, Brady RO, Chen YT. 1996. High-level production of recombinant human lysosomal acid alpha-glucosidase in Chinese hamster ovary cells which targets to heart muscle and corrects glycogen accumulation in fibroblasts from patients with Pompe disease. Proc Natl Acad Sci U S A 93:65–70. PMID: 8552676. PMCID: PMC40179
  4. Kroos M, Hoogeveen-Westerveld M, van der Ploeg A, Reuser AJJ. 2012. The genotype-phenotype correlation in Pompe disease. Am J Med Genet Part C Semin Med Genet 160C:59–68. PMID: 22253258. doi: 10.1002/ajmg.c.31318
  5. Martiniuk F, Chen A, Mack A, Arvanitopoulos E, Chen Y, Rom WN, Codd WJ, Hanna B, Alcabes P, Raben N, Plotz P. 1998. Carrier frequency for glycogen storage disease type II in New York and estimates of affected individuals born with the disease. Am J Med Genet 79:69–72. PMID: 9738873.
  6. Moreland RJ, Jin X, Zhang XK, Decker RW, Albee KL, Lee KL, Cauthron RD, Brewer K, Edmunds T, Canfield WM. 2005. Lysosomal Acid α-Glucosidase Consists of Four Different Peptides Processed from a Single Chain Precursor. J Biol Chem 280:6780–6791. doi: 10.1074/jbc.M404008200
  7. van der Ploeg AT, Reuser AJ. 2008. Pompe’s disease. Lancet 372:1342–1353. doi: 10.1016/S0140-6736(08)61555-X
  8. Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NST, Abeysinghe S, Krawczak M, Cooper DN. 2003. Human Gene Mutation Database (HGMD): 2003 update. Hum Mutat 21:577–81. PMID: 12754702. doi: 10.1002/humu.10212

Data provided by
Wyatt Clark, Kevin Ru, Karen Yu, Jonathan H. LeBowitz
BioMarin Pharmaceutical, Inc. 105 Digital Drive, Novato CA 94949

Revision history
9 Nov 2017 (v01): initial release
13 Nov 2017 (v02): revised description of enzyme assay