Critical Assessment of Genome Interpretation

N-acetyl-glucosaminidase (NAGLU): predict the effect of naturally occurring missense mutations on cellular enzymatic activity

Challenge: NAGLU

Dataset description: public

Variant data: public

Last updated: 7 April 2016

This challenge closed at 9:00 PM PST (Pacific Standard Time) on 8 December 2015.

Download answer key, predictions, and assessment: The answer key, predictions, and assessment files are accessible to registered users only, and their use is limited by the CAGI Data Use Agreement. Please log in to access the files.

Presentations from the CAGI 4 conference: Presentations are accessible to registered users only, and their use is limited by the CAGI Data Use Agreement. Please log in to access the file.

Summary

NAGLU is a lysosomal glycohydrolase. Deficiency of NAGLU causes the rare disorder Mucopolysaccharidosis IIIB, also called Sanfilippo B disease. Naturally occurring NAGLU mutants have been assayed for enzymatic activity in transfected cell lysates. The challenge is to predict the fractional activity of each mutant protein compared to the wild-type enzyme.

Background

N-acetyl-glucosaminidase (NAGLU, NP_000254.2) is a lysosomal enzyme that hydrolyzes N-acetyl-D-glucosamine from the nonreducing end of heparan sulfate (HS). The human protein is a homotrimer composed of 720 amino acids in each subunit (not including the 23 amino acid signal peptide) (1-3). The predicted molecular weight of each subunit is 80,345 daltons. Each subunit has seven potential glycosylation sites. NAGLU is proteolytically processed in the lysosome to a mature form, but the precursor protein is enzymatically active. NAGLU is a member of the family 89 glycohydrolases (Carbohydrate Active Enzymes database: http://www.cazy.org/) (4). Some structural information exists for this family, including a recent patent that describes a 2.9 Å resolution structure of NAGLU (5,6). Coordinates of the human protein structures are not in the PDB but are available on the USPTO web site.

Deficiency of NAGLU causes Mucopolysaccharidosis IIIB (MPS IIIB), or Sanfilippo B disease (7,8) (OMIM #252920), an autosomal recessive lysosomal storage disorder. Lysosomal HS accumulation causes a neurodegenerative disease whose clinical presentation includes intellectual disability that progresses to dementia, behavioral disturbances, and death in the second or third decade (reviewed in (9)). BioMarin is currently developing an enzyme replacement therapy to treat MPS IIIB patients.

MPS IIIB is an orphan indication with a birth incidence that varies substantially in European populations, with higher incidences found in southern European countries: 0.08/100,000 in France, 0.21/100,000 in the UK, 0.36/100,000 in Germany, 0.42/100,000 in the Netherlands, 0.78/100,000 in Greece, 2.6/100,000 in Germans of Turkish descent, and 0.78/100,000 in Portugal (10-13). One of the issues in working with such rare disorders is the difficulty of developing accurate assessments of disease incidence. Given the large and ever-expanding amount of publicly available whole-exome sequencing data, it should be possible to extract incidence information from allele frequencies in such datasets, provided one knows which mutations are associated with the disease phenotype. There are 153 NAGLU mutations reported by HGMD (14) to be associated with MPS IIIB, and 90 of these are missense mutations. The ExAC dataset (http://exac.broadinstitute.org, release 0.3), which comprises ~60,000 individual sequenced exomes, contains 189 missense mutations in NAGLU, of which 24 are known to be disease associated. Thus, most of the known disease-associated alleles are absent from the ExAC dataset, and the ExAC dataset contains a large number of missense mutations whose contribution to disease incidence is unknown.

This illustrates the fundamental problem with efforts to extract incidence information from allele frequency data: given a rare disease caused by a large number of ultra-rare mutations, there will be a large number of variants of unknown significance whose contribution to disease is difficult to assess.
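To make the link between allele frequencies and incidence concrete, here is a minimal sketch that assumes Hardy-Weinberg equilibrium, full penetrance, and that every listed allele abolishes function. None of these assumptions is stated in the challenge text, and the frequencies in the usage lines are hypothetical placeholders, not ExAC values. Under these assumptions, the expected birth incidence of an autosomal recessive disorder is the square of the summed pathogenic allele frequency.

```python
def recessive_incidence(allele_freqs):
    """Expected birth incidence of an autosomal recessive disorder.

    Assumes Hardy-Weinberg equilibrium: affected individuals carry two
    pathogenic alleles, so incidence = q**2, where q is the combined
    frequency of all pathogenic alleles.
    """
    for freq in allele_freqs:
        if not 0.0 <= freq <= 1.0:
            raise ValueError(f"allele frequency out of range: {freq}")
    q = sum(allele_freqs)
    return q * q


# Hypothetical allele frequencies, for illustration only.
hypothetical_freqs = [1e-4, 5e-5, 2e-5]
incidence = recessive_incidence(hypothetical_freqs)
print(f"{incidence * 100_000:.5f} per 100,000 births")
```

Because the result depends on the square of the summed frequency, misclassifying even a few variants of unknown significance can change the estimate noticeably, which is why measured activities for the novel variants matter.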

BioMarin is functionally assessing the enzymatic activity of each of the 165 novel missense mutations in the ExAC dataset (the 189 ExAC missense mutations minus the 24 already known to be disease associated). Plasmids containing cDNAs encoding each of the mutant proteins are being transfected into HEK293 cells. After 72 hours, cells are lysed, and NAGLU activity in the lysate is assessed using the fluorogenic substrate 4-Methylumbelliferyl N-acetyl-α-D-glucosaminide. The activity units are pmol/min/µg protein. The background activity arising from endogenous NAGLU is subtracted using activity levels obtained from a mock transfection with empty vector. Background-subtracted enzyme activity for each mutant is normalized to the background-subtracted activity in a lysate from cells transfected with the wild-type cDNA and reported as percent wild-type NAGLU activity. Each mutant is being assayed in at least three independent transfection experiments; the results from these determinations will be averaged, and the standard deviation will be calculated.
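The normalization described above can be sketched as follows. This is an illustration, not the organizers' analysis code: it assumes each transfection experiment supplies its own wild-type and mock (empty vector) readings in the same units, and it uses the sample standard deviation. The function names and the numbers in the usage lines are hypothetical.

```python
from statistics import mean, stdev


def percent_wild_type(mutant_raw, wild_type_raw, mock_raw):
    """Background-subtract, then express mutant activity as % of wild type."""
    mutant_net = mutant_raw - mock_raw
    wild_type_net = wild_type_raw - mock_raw
    if wild_type_net <= 0:
        raise ValueError("wild-type activity must exceed the mock background")
    return 100.0 * mutant_net / wild_type_net


def summarize_replicates(replicates):
    """Mean and sample SD over (mutant, wild_type, mock) triples.

    Needs at least two replicates for the standard deviation; the assay
    described above uses at least three.
    """
    values = [percent_wild_type(m, w, b) for m, w, b in replicates]
    return mean(values), stdev(values)


# Hypothetical raw activities (pmol/min/µg protein) from three transfections.
replicates = [(60.0, 110.0, 10.0), (40.0, 60.0, 10.0), (58.0, 130.0, 10.0)]
avg, sd = summarize_replicates(replicates)
print(f"{avg:.1f}% of wild type (SD {sd:.1f})")
```

Dividing the percentage by 100 gives the fractional activity scale used for predictions, where 1 corresponds to wild-type activity.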

Prediction challenge

Participants are asked to submit predictions on the effect of the variants on NAGLU enzymatic activity. The submitted prediction should be a numeric value ranging from 0 (no activity) to 1 (wild-type level of activity) or >1 if the predicted activity is greater than wild-type activity (e.g. 0.7 means 70% of wild-type and 1.3 means 130% of wild-type activity). Each predicted activity must include a standard deviation. Optionally, a comment on the basis of the prediction may be given. The predictions will be assessed against the numeric values actually measured for each mutation in the enzyme assay.

Download dataset: 4-NAGLU_dataset.txt (4.2 KB)

Download submission template: This submission template file is available only to registered users. Please log in to access the file.

Download submission validation script: This submission validation script is available only to registered users. Please log in to access the file.

Prediction submission format

The prediction submission is a tab-delimited text file. Organizers provide a template file, which should be used for submission. In addition, a validation script is provided, and predictors should check the correctness of the format before submitting their predictions. In the submitted file, each row includes the following columns:

AA substitution - The mutation as listed in the prediction dataset file. Use the order provided in the template file.
Prediction (relative activity) - Prediction of relative NAGLU activity compared to wild-type NAGLU: 0 = no activity, 1 = wild-type activity, >1 = more activity than wild-type NAGLU
Standard deviation - SD of the prediction in column 2 indicating confidence
Comment - optional brief comment on the basis of the prediction in column 2

In the template file, cells in columns 2-4 are marked with a "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission. You must enter a prediction and standard deviation for every mutant; if you are not confident in a prediction for a mutant, enter a large standard deviation for the prediction. Optionally, enter a brief comment on the basis of the prediction; otherwise, leave the "*" in the comment cell. Please make sure you follow the submission guidelines strictly.
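As an illustration of the four-column layout described above, the sketch below writes tab-delimited rows and keeps the "*" placeholder when no comment is given. It is not the official validation script, which should still be run before submitting. The function names, the three-decimal number formatting, and the absence of a header row are assumptions; follow the template file where it differs. R737C is a variant named in the update log, but the values shown for it are hypothetical.

```python
import csv
import io


def format_row(substitution, activity, sd, comment=None):
    """Build one submission row; every cell must be non-empty."""
    if not substitution:
        raise ValueError("AA substitution is required")
    if activity < 0:
        raise ValueError("relative activity cannot be negative")
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    if comment and ("\t" in comment or "\n" in comment):
        raise ValueError("comment must not contain tabs or newlines")
    # Keep the template's "*" placeholder when no comment is supplied.
    return [substitution, f"{activity:.3f}", f"{sd:.3f}", comment or "*"]


def write_submission(rows, handle):
    """Write (substitution, activity, sd, comment) tuples as tab-delimited text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow(format_row(*row))


# Hypothetical predictions; keep rows in the order given by the template.
buffer = io.StringIO()
write_submission(
    [("R737C", 0.7, 0.2, None), ("R737C", 1.3, 0.5, "conservation-based")],
    buffer,
)
print(buffer.getvalue(), end="")
```

A large standard deviation, as in the second row, is how low confidence is expressed, since leaving the prediction cell empty is not allowed.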

In addition, your submission should include a detailed description of the method used to make the predictions, similar to the style of the Methods section in a scientific article. This information will be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions by accessing the link "All submission forms" from the front page of your group. For more details, please read the FAQ page.

References

1. Di Natale P, et al. Biosynthesis of alpha-N-acetylglucosaminidase in cultured human kidney carcinoma cells. Enzyme (1985) 33(2):75-83.
2. Kan SH, et al. Delivery of an enzyme-IGFII fusion protein to the mouse brain is therapeutic for mucopolysaccharidosis type IIIB. Proc Natl Acad Sci U S A (2014) 111(41):14870-14875.
3. Weber B, et al. Cloning and expression of the gene involved in Sanfilippo B syndrome (mucopolysaccharidosis III B). Hum Mol Genet (1996) 5(6):771-777.
4. Lombard V, et al. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res (2014) 42(Database issue):D490-D495.
5. Ficko-Blean E, et al. Structural and mechanistic insight into the basis of mucopolysaccharidosis IIIB. Proc Natl Acad Sci U S A (2008) 105(18):6560-6565.
6. Meiyappan M, et al. Crystal structure of human alpha-N-acetylglucosaminidase. Shire Human Genetic Therapies, Inc., USA. US Patent 8,775,146 B2 (2014).
7. O'Brien JS. Sanfilippo syndrome: profound deficiency of alpha-acetylglucosaminidase activity in organs and skin fibroblasts from type-B patients. Proc Natl Acad Sci U S A (1972) 69(7):1720-1722.
8. von Figura K and Kresse H. The sanfilippo B corrective factor: a N-acetyl-alpha-D-glucosamindiase. Biochem Biophys Res Commun (1972) 48(2):262-269.
9. Valstar MJ, et al. Sanfilippo syndrome: a mini-review. J Inherit Metab Dis (2008) 31(2):240-252.
10. Baehner F, et al. Cumulative incidence rates of the mucopolysaccharidoses in Germany. J Inherit Metab Dis (2005) 28(6):1011-1017.
11. Heron B, et al. Incidence and natural history of mucopolysaccharidosis type III in France and comparison with United Kingdom and Greece. Am J Med Genet A (2011) 155A(1):58-68.
12. Pinto R, et al. Prevalence of lysosomal storage diseases in Portugal. Eur J Hum Genet (2004) 12(2):87-92.
13. Poorthuis BJ, et al. The frequency of lysosomal storage diseases in The Netherlands. Hum Genet (1999) 105(1-2):151-156.
14. Stenson PD, et al. The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum Genet (2014) 133(1):1-9.

Data provided by

Jonathan H. LeBowitz, Wyatt T. Clark, G. Karen Yu

BioMarin Pharmaceutical, Inc. 105 Digital Drive, Novato CA 94949

Dataset citation

Clark WT, et al. Utilizing ExAC to assess the hidden contribution of variants of unknown significance to Sanfilippo Type B incidence. PLoS One (2018) 13(7):e0200008.

Updates

3 Aug 2015 (v01): initial release

4 Aug 2015 (v02): revised description of enzyme assay

5 Aug 2015 (v03): repaired link to dataset file

4 Sep 2015 (v04): challenge close date added

28 Oct 2015 (v05): submission instructions and template updated; validation script provided

7 Nov 2015 (v06): submission deadline extended

12 Nov 2015 (v07): improved validation script provided

23 Dec 2015 (v08): answer key provided

25 Jan 2016 (v09): updated answer key provided with corrected value for R737C

18 Mar 2016 (v10): predictions provided

7 Apr 2016 (v11): conference presentations provided

12 June 2023: dataset citation added

Center for Critical Assessment of Genome Interpretation
