
Acid alpha-glucosidase (GAA): predict the effect of naturally occurring missense mutations on enzymatic activity

Challenge: GAA missense

Variant data: public

Last updated: 24 April 2018

This challenge is closed.

Make sure you understand our Data Use Agreement and Anonymity Policy


Acid alpha-glucosidase (GAA) is a lysosomal alpha-glucosidase. Some mutations in GAA cause a rare disorder, Pompe disease (Glycogen Storage Disease II). Rare GAA missense variants found in a human population sample have been assayed for enzymatic activity in transfected cell lysates. The assessment of this challenge will include evaluations that recognize novelty of approach. The challenge is to predict the fractional enzyme activity of each mutant protein compared to the wild-type enzyme.


Acid alpha-glucosidase (GAA, NP_000143.2) is a lysosomal enzyme that hydrolyzes terminal, non-reducing end α1-4 and α1-6 linkages in glycogen. Deficiency of GAA causes Pompe disease (Glycogen Storage Disease II; GSD-II, OMIM #232300), an autosomal recessive lysosomal storage disorder in which lysosomal glycogen accumulation results in myopathy (Hirschhorn et al., 2010). GAA is a member of glycoside hydrolase family 31; see the Carbohydrate-Active enZYmes Database (Cantarel et al., 2009). GAA is synthesized as a 952 amino acid polypeptide that contains an N-terminal signal peptide. The mature, lysosomal form of GAA is produced by extensive proteolytic processing at both the N- and C-terminal ends. The fully processed, mature protein consists of four polypeptides derived from the precursor protein: a catalytic polypeptide of 70 kDa, as well as polypeptides of 19.4, 10.3, and 3.9 kDa (Moreland et al., 2005). The precursor form of GAA, lacking the N-terminal signal peptide, is enzymatically active (Van Hove et al., 1996). GAA functions as a monomer, and a recently released 2.0 Å resolution structure (PDB ID 5KZW) can be found in the PDB.

Deficiency of GAA results in lysosomal glycogen accumulation in multiple tissues, with cardiac and skeletal muscle tissues most seriously affected. Pompe disease is a spectrum disorder with a broad range of clinical manifestations. The most severe form is infantile-onset Pompe disease which presents with prominent cardiomegaly, myocardial failure, generalized muscle hypotonia without muscle-wasting, and death prior to 1 to 2 years of age. Infantile patients have little to no detectable GAA activity. At the other extreme is the slowly progressive adult-onset form of the disease. The late-onset form is generally characterized by slowly progressive proximal muscle weakness and respiratory insufficiency, and can present anytime from childhood until as late as the 2nd to 6th decade of life. It is distinguished from the infantile-onset form by the absence of severe cardiac involvement. Late-onset patients may have residual GAA activity up to 30% of normal (van der Ploeg & Reuser, 2008). It is estimated that approximately one third of those with Pompe disease have the rapidly fatal infantile-onset form, while the majority of patients present with late-onset Pompe disease (Hirschhorn et al., 2010). While life expectancy can vary, death generally occurs due to respiratory failure (Hirschhorn et al., 2010). The incidence of Pompe disease is believed to be approximately 1:40,000 births (Martiniuk et al., 1998).


BioMarin Pharmaceutical is interested in estimating rare disease incidence using the allele frequencies of pathogenic variants present in publicly available exome datasets. The fundamental challenge with this approach is that there are a large number of variants of uncertain significance (VUS) whose contribution to disease is difficult to assess. Furthermore, as in the case of Pompe disease, where there is a spectrum of disease severity, estimating disease incidence can become a combinatorics problem. It is not sufficient to simply know whether a variant is pathogenic. One must know its impact on activity and which combinations of mutations lead to each disease phenotype. While such information might be obtained by the careful examination of patient phenotypes and genotypes (Kroos et al., 2012), this might not be possible for uncommon variants, or those of unknown significance, due to insufficient data. Quantitation of the residual enzymatic activity associated with VUS can help provide the missing information. As an illustration of this problem, there are 450 GAA mutations reported by HGMD (V.2016.1) (Stenson et al., 2003) to be associated with Pompe disease, and 235 of these are missense mutations. BioMarin selected 357 missense mutations in GAA from the ExAC dataset (release 0.3), which comprises ~60,000 individual sequenced exomes. Of these, 355 have MAF < 5% and two have MAF > 5%; 75 are reported as disease-associated by HGMD. Thus, most of the alleles reported to be pathogenic by HGMD are absent from the ExAC dataset, and the ExAC dataset contains a large number of missense mutations whose contribution to disease incidence is unknown.
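
As a rough illustration of how allele frequencies relate to incidence for an autosomal recessive disorder, the sketch below applies the Hardy-Weinberg approximation: if q is the combined frequency of all pathogenic alleles, the expected birth incidence is about q squared. This is a simplification and not BioMarin's method. It assumes random mating, treats every listed allele as fully pathogenic, and ignores the spectrum of severity discussed above. The function names and the example frequencies are hypothetical and chosen only for illustration.

```python
import math


def recessive_incidence(pathogenic_allele_freqs):
    """Expected birth incidence under Hardy-Weinberg (assumption).

    q is the combined frequency of all pathogenic alleles; affected
    individuals carry two pathogenic alleles, so incidence is q * q.
    """
    q = sum(pathogenic_allele_freqs)
    if not 0.0 <= q <= 1.0:
        raise ValueError("combined allele frequency must lie in [0, 1]")
    return q * q


def carrier_frequency(incidence):
    """Heterozygous carrier frequency 2pq implied by a given incidence."""
    q = math.sqrt(incidence)
    return 2.0 * q * (1.0 - q)


# Hypothetical allele frequencies, for illustration only (not real GAA data).
example_freqs = [0.002, 0.0015, 0.001, 0.0005]
incidence = recessive_incidence(example_freqs)
print(f"expected incidence: 1 in {1.0 / incidence:,.0f}")
print(f"carrier frequency: {carrier_frequency(incidence):.4f}")
```

With these made-up frequencies the combined q is 0.005, which corresponds to an incidence of about 1 in 40,000 births, the figure cited above for Pompe disease. A variant with partial residual activity would not fit this all-or-nothing model, which is why measured activities are needed.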

BioMarin has functionally assessed the enzymatic activity of the 357 novel missense mutations in the ExAC dataset. Plasmids containing cDNAs encoding each of the mutant proteins were transfected into an immortalized Pompe patient fibroblast cell line. This cell line has no GAA activity. After 72 hours, cells were lysed, and GAA activity in the lysate was assessed using the fluorogenic substrate 4-methylumbelliferyl α-D-glucoside. The activity units are pmol/min/µg protein. Background-subtracted enzyme activity for each mutant was normalized to the background-subtracted activity in a lysate from cells transfected with the wild-type cDNA and reported as percent wild-type GAA activity. Each mutant was assayed in at least three independent transfection experiments. The results from these determinations were averaged, and the standard deviation was calculated.
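
The normalization described above can be sketched as follows. This is a minimal illustration, not the data providers' analysis code. It assumes that each transfection experiment yields one raw mutant reading, one raw wild-type reading, and one background reading in the same units, and that the sample (n - 1) standard deviation is used; the text does not state either detail. All names are hypothetical.

```python
from statistics import mean, stdev


def percent_wild_type(mutant_raw, wild_type_raw, background_raw):
    """Background-subtracted mutant activity as a percentage of wild-type."""
    wild_type = wild_type_raw - background_raw
    if wild_type <= 0:
        raise ValueError("wild-type activity must exceed background")
    return 100.0 * (mutant_raw - background_raw) / wild_type


def summarize(replicates):
    """Mean and sample standard deviation over independent transfections.

    replicates: list of (mutant_raw, wild_type_raw, background_raw) tuples,
    one tuple per experiment (an assumed layout).
    """
    if len(replicates) < 3:
        raise ValueError("expected at least three independent experiments")
    values = [percent_wild_type(*r) for r in replicates]
    return mean(values), stdev(values)


# Hypothetical readings for one mutant in three experiments.
readings = [(5.0, 11.0, 1.0), (6.0, 11.0, 1.0), (7.0, 11.0, 1.0)]
average, spread = summarize(readings)
print(f"{average:.1f}% of wild-type, SD {spread:.1f}")
```

Normalizing within each experiment before averaging, as done here, keeps day-to-day variation in transfection efficiency from inflating the spread; whether the original analysis normalized per experiment or after pooling is not stated.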

Prediction challenge

Participants are asked to submit predictions on the effect of the variants on GAA enzymatic activity. The submitted prediction should be a numeric value ranging from 0 (no activity) to 1 (wild-type level of activity), or >1 if the predicted activity is greater than wild-type activity (e.g., 0.7 means 70% of wild-type and 1.3 means 130% of wild-type activity). Each predicted activity must include a standard deviation. Optionally, a comment on the basis of the prediction may be given. The predictions will be assessed against the numeric values actually measured for each mutation in the enzyme assay. In previous challenges, it has been observed that predictions often cluster more with other predictions than with the experimental value. Assessment will include metrics that recognize prediction sets that differ substantially from results provided by standard methods such as PolyPhen-2 and SIFT.
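
The specific assessment metrics are not listed here. Purely as an illustration of comparing predictions with measured values on the 0 to 1 scale, the sketch below converts percent wild-type activity to a fraction and computes two common agreement measures, root-mean-square error and Pearson correlation. The choice of metrics, the function names, and the example numbers are assumptions and do not represent the official assessment.

```python
import math


def to_fraction(percent_wild_type):
    """Convert percent wild-type activity to the 0-1 prediction scale."""
    return percent_wild_type / 100.0


def rmse(predicted, measured):
    """Root-mean-square error between paired predictions and measurements."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measured activities (percent of wild-type) and predictions.
measured = [to_fraction(v) for v in (0.0, 70.0, 130.0)]
predicted = [0.1, 0.6, 1.2]
print(f"RMSE: {rmse(predicted, measured):.3f}")
print(f"Pearson r: {pearson(predicted, measured):.3f}")
```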

Download dataset: 5-GAA_dataset.txt 

Prediction submission format 

The prediction submission is a tab-delimited text file. Organizers provide a template file, which should be used for submission. In addition, a validation script is provided, and predictors should check the correctness of the format before submitting their predictions. In the submitted file, each row includes the following columns:

In the template file, cells in columns 2-4 are marked with a "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission. You must enter a prediction and a standard deviation for every mutant; if you are not confident in a prediction for a mutant, enter a large standard deviation for it. Optionally, enter a brief comment on the basis of the prediction; otherwise, leave the "*" in the comment cell. Please make sure you follow the submission guidelines strictly.
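
Before running the official validation script, the rules above can be checked locally with a sketch like the one below. It is not the organizers' validator. It assumes that each data row has exactly four tab-separated columns in the order variant, predicted activity, standard deviation, comment, and that the rows passed in contain no header line; the template's actual layout may differ. The variant labels in the example are placeholders.

```python
import math


def validate_row(line, line_no):
    """Return a list of problems found in one tab-delimited row."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 4:
        return [f"line {line_no}: expected 4 columns, found {len(fields)}"]
    errors = []
    if any(field.strip() == "" for field in fields):
        errors.append(f"line {line_no}: empty cells are not allowed")
    # Columns 2 and 3 must be numbers; a leftover "*" fails this check.
    for name, value in (("prediction", fields[1]),
                        ("standard deviation", fields[2])):
        try:
            number = float(value)
        except ValueError:
            errors.append(f"line {line_no}: {name} is not numeric: {value!r}")
            continue
        if not math.isfinite(number) or number < 0:
            errors.append(f"line {line_no}: {name} must be finite and >= 0")
    # Column 4 may hold a comment or the untouched "*" placeholder.
    return errors


def validate_submission(lines):
    """Collect problems across all rows of a submission."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        problems.extend(validate_row(line, line_no))
    return problems


# Placeholder rows: one valid, one with an unreplaced "*", one with a gap.
rows = [
    "variant_A\t0.7\t0.2\t*\n",
    "variant_B\t*\t0.1\t*\n",
    "variant_C\t1.3\t\tconserved active-site residue\n",
]
for problem in validate_submission(rows):
    print(problem)
```

To check a real file, the rows can be read with `open(path, encoding="utf-8")` and passed to `validate_submission`, skipping any header line the template contains.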

In addition, your submission must include a detailed description of the method used to make the predictions, similar to the style of the Methods section in a scientific article. This information will be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions by accessing the link "All submission forms" from the front page of your group. For more details, please read the FAQ page.


Cantarel BL, et al. The Carbohydrate-Active EnZymes database (CAZy): an expert resource for glycogenomics. Nucleic Acids Res (2009) 37(Database issue):D233-D238.

Hirschhorn R, et al. Glycogen storage disease type II: acid alpha-glucosidase (acid maltase) deficiency, in The Metabolic and Molecular Basis of Inherited Disease. New York: McGraw-Hill. pp. 3389-3420 (2010).

Kroos M, et al. The genotype-phenotype correlation in Pompe disease. Am J Med Genet C Semin Med Genet (2012) 160C(1):59-68.

Martiniuk F, et al. Carrier frequency for glycogen storage disease type II in New York and estimates of affected individuals born with the disease. Am J Med Genet (1998) 79(1):69-72.

Moreland RJ, et al. Lysosomal acid α-glucosidase consists of four different peptides processed from a single chain precursor. J Biol Chem (2005) 280(8):6780-6791.

Stenson PD, et al. Human Gene Mutation Database (HGMD): 2003 update. Hum Mutat (2003) 21(6):577-581.

van der Ploeg AT, Reuser AJJ. Pompe's disease. Lancet (2008) 372(9646):1342-1353.

Van Hove JL, et al. High-level production of recombinant human lysosomal acid alpha-glucosidase in Chinese hamster ovary cells which targets to heart muscle and corrects glycogen accumulation in fibroblasts from patients with Pompe disease. Proc Natl Acad Sci U S A (1996) 93(1):65-70.

Data provided by 

Wyatt Clark, Kevin Ru, Karen Yu, Jonathan H. LeBowitz 

BioMarin Pharmaceutical, Inc. 105 Digital Drive, Novato CA 94949

Revision history 

9 November 2017: initial release 

13 November 2017: revised description of enzyme assay 

30 November 2017: more details added on the submission template and the variants chosen

11 January 2018: closing date extended to 25 April 2018

16 April 2018: UniProtKB/Swiss-Prot: P10253.4 and closing date added

24 September 2018: Dataset availability added