Predicting the effect of missense mutations on protein stability in Arylsulfatase A

Challenge: ARSA

Variant data: public

Last updated: 15 September 2025

This challenge is open. The challenge closes on September 30, 2025.

How to participate in CAGI7? Download data & submit predictions on Synapse.

Make sure you understand our Data Use Agreement and Anonymity Policy

Summary 

Metachromatic Leukodystrophy (MLD) is an autosomal recessive lysosomal storage disease caused by mutations in Arylsulfatase A (ARSA) and the resulting toxic accumulation of sulfatide substrate. Genome sequencing has revealed hundreds of protein-altering ARSA missense variants, but the functional effect of most of them remains unknown. ARSA protein stability was measured for a large set of variants using a high-throughput cellular degradation assay. The challenge is to predict the fractional protein stability of each of the 8,867 missense mutant proteins at 48 hours post-expression.

Background 

Arylsulfatase A (ARSA, E.C. 3.1.6.8, ENST00000216124.5), also known as cerebroside sulfatase, is a lysosomal enzyme that hydrolyzes galactose-3-sulfate residues in a number of lipids, as well as ascorbate-2-sulfate and many phenol sulfates. ARSA is synthesized as a 507 amino acid polypeptide that contains an N-terminal signal peptide. Mature arylsulfatase A is a 51.1 kDa protein with 489 amino acids and three glycosylation sites (Asn 158, 184, and 350) that forms an octamer at lysosomal pH (Stütz & Wrodnigg, 2016). The first reported crystal structure of human ARSA (PDB 1AUK) shows a homooctamer composed of a tetramer of dimers (Lukatela et al., 1998). An unusual structural feature is a formylglycine residue in the hydrate form at position 69, which in concert with an octahedrally coordinated Mg2+ is required for enzymatic activity.

Deficiency of ARSA causes Metachromatic Leukodystrophy (MLD, OMIM #250100), an autosomal recessive lysosomal storage disorder in which sulfatide buildup in cells, particularly in the brain, spinal cord, and peripheral nerves, leads to progressive demyelination, resulting in a variety of neurological symptoms and ultimately death (Greene et al., 1967). Patients are generally categorized into four subtypes based on age of onset: late-infantile (0 to 2.5 years, most severe form), early-juvenile (2.5 to 7 years), late-juvenile (7 to 16 years), and adult (16 years and older).

Preliminary results suggest that early intervention in pre-symptomatic patients is imperative, underscoring the importance of newborn screening (NBS) for MLD. NBS is complicated by pseudodeficiency variants (Patil & Maegawa, 2013) that decrease ARSA activity to 10-15% of the wildtype range without causing MLD, implying that this level of ARSA activity is sufficient for physiological hydrolysis of sulfatides.

The incidence of MLD is estimated to range between 1:40,000 and 1:160,000 births according to the National Organization for Rare Disorders. The most severe form is infantile/late-infantile onset MLD, characterized by rapid progression of psychomotor regression resulting in ataxia and weakness with areflexia. Some children show only signs of a progressive peripheral neuropathy for several months before central nervous system involvement becomes apparent. Death occurs within a few years after the onset of symptoms. The late-infantile form is genetically characterized by homozygosity or compound heterozygosity for alleles that result in complete or near-complete ablation of enzymatic activity of ARSA, resulting in rapid accumulation of sulfatides and rapid disease progression.

In the early-juvenile and late-juvenile forms of MLD, disease progression is slower than in the late-infantile form. However, once neurological signs become more evident, decline tends to be rapid, and patients eventually lose all acquired skills. Spasticity becomes prominent, and many patients also develop epilepsy. The end stage of the disease can last several years, with variable duration. Patients with early- or late-juvenile onset mostly carry one allele that allows for the expression of low levels of residual enzyme activity. 

In the adult form of MLD, intellectual and behavioral changes, such as memory deficits or emotional instability, are typically the first presenting symptoms. Mild polyneuropathy often emerges at a later stage, and disease progression is generally slower than in the late-infantile, early-juvenile, or late-juvenile forms. Death usually occurs decades after disease onset. Patients with the adult form of MLD often carry one mild mutation, allowing for the expression of small amounts of functional enzyme, which delays the process of sulfatide accumulation and thus postpones symptom onset.

Experiment

Protein Stability Assay Methodology:

ARSA protein stability was assessed using a novel high-throughput cellular degradation assay. The experimental system uses a doxycycline-inducible expression system with ARSA-EGFP fusion proteins in mammalian cells.

Cell System Design:

  • Bicistronic mRNA construct with an IRES between the ARSA cDNA and the mCherry cDNA ensures every cell expresses both proteins
  • Single genomic integration site ensures each cell expresses only one ARSA variant
  • iCasp9 selection system eliminates non-integrated cells
  • Red fluorescence (mCherry) confirms protein expression in all cells

Construct Design:

The ARSA cDNA construct contains the native N-terminus, including the signal peptide, because ARSA is a lysosomal enzyme that requires proper subcellular localization. The EGFP reporter is fused to the C-terminus of ARSA through a small linker positioned immediately after the native C-terminal amino acid of ARSA.

Assay Principle:

The assay measures protein degradation over time by monitoring the loss of green fluorescence (EGFP) relative to red fluorescence (mCherry internal control). Misfolded proteins are preferentially degraded by cellular quality control mechanisms, leading to decreased green/red fluorescence ratios.

Experimental Workflow:

  • Cells are induced with doxycycline to express ARSA-EGFP variants
  • FACS analysis measures green/red fluorescence ratios at 0, 24, and 48 hours post-induction
  • Cells in the lowest green/red ratio bin are collected and analyzed
  • Critical timepoint: 48 hours post-induction shows clear discrimination between stable and unstable variants (24 hours shows minimal differences)

Data Characteristics:

  • The dataset comprises 9,215 missense variants in the ARSA gene — 8,867 variants are included in the challenge for participants to predict, and 348 are provided as sample data in the cagi7arsasample.tsv file; see below.
  • Protein stability range: 
    • At 0 hours: 58.1-99.9% protein stability (wildtype ARSA 97.4%)
    • At 24 hours: 0.5-99.7% protein stability (wildtype ARSA 88.8%)
    • At 48 hours: 4.6-95.9% protein stability (wildtype ARSA 78.4%)
  • Validation: Results correlate with genotype-phenotype relationships in 90 confirmed MLD patients from Children's Hospital of Philadelphia (CHOP) cohort
  • Complete variant coverage: PacBio sequencing confirmed all possible missense variants were present in the initial library

Clinical Correlation and Newborn Screening Context:

This assay was developed specifically to support clinical decision-making in newborn screening programs for MLD. Current newborn screening achieves no false positives or false negatives in over 500 million tests, but determining disease severity and treatment urgency requires understanding variant pathogenicity.

Clinical Interpretation Guidelines:

  • Protein stability <58.2%: Strong predictor of severe MLD requiring immediate intervention
  • Protein stability >58.2%: Requires additional enzymatic activity assessment
  • Caveat: Some variants may be stable but enzymatically inactive (e.g., A214V folds normally but has near-zero enzymatic activity, still causing severe MLD)
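
The guidelines above amount to a single-threshold triage rule. The following is a minimal sketch of that rule, assuming stability is supplied as a percentage. The function name and the returned labels are illustrative only and are not part of the assay or of any clinical protocol. The guidelines do not say how a value of exactly 58.2% is handled, so the sketch groups it with the higher category as an assumption.

```python
SEVERE_THRESHOLD_PCT = 58.2  # threshold quoted in the guidelines above


def triage_by_stability(stability_pct: float) -> str:
    """Map a 48-hour stability percentage to a guideline category.

    Assumption: a value exactly at the threshold is treated as needing
    activity assessment; the guidelines do not define the boundary case.
    """
    if not 0.0 <= stability_pct <= 100.0:
        raise ValueError("stability must be a percentage between 0 and 100")
    if stability_pct < SEVERE_THRESHOLD_PCT:
        return "likely severe MLD"
    # Stable variants can still be enzymatically inactive (e.g., A214V),
    # so stability alone cannot rule out disease.
    return "needs enzymatic activity assessment"
```

Because of the A214V caveat, the second label is deliberately not "benign": a stable variant still requires an activity measurement.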

Prediction challenge

Participants are asked to submit predictions on the impact of each of the 8,867 missense variants on ARSA protein stability. The submitted protein stability prediction should be a numeric value between 0 and 1 representing the fraction of protein remaining at 48 hours post-expression. The scale is:

  • 0 = complete protein degradation (0% remaining)
  • 1 = perfect protein stability (100% remaining, no degradation)
  • 0.784 = wildtype ARSA stability level at 48 hours post-expression (78.4% remaining)
  • Values between 0 and 1 = fraction of protein remaining (e.g., 0.4 means 40% of protein remains)
  • Based on experimental observations, the range is approximately 0.046-0.959 at 48 hours post-expression (4.6% to 95.9% protein remaining)

Each predicted protein stability must include a standard deviation that indicates the confidence in the provided prediction score; i.e., low spread suggests confident predictions and high spread suggests lack of confidence. 
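
One possible way to obtain both a score and a standard deviation is to summarize several model outputs for the same variant. The sketch below assumes the individual predictions are percentages and that the spread across models is an acceptable proxy for confidence; the challenge does not prescribe how the standard deviation is derived, and the function name is hypothetical.

```python
from statistics import mean, stdev


def summarize_ensemble(predictions_pct: list[float]) -> tuple[float, float]:
    """Return (score, sd) on the 0-1 scale from percentage predictions.

    Assumption: the sample standard deviation across models is reported
    as the confidence value; this is one choice among many.
    """
    if len(predictions_pct) < 2:
        raise ValueError("need at least two predictions to estimate a spread")
    # Convert percentages to fractions and clip to the 0-1 scale.
    fractions = [min(max(p / 100.0, 0.0), 1.0) for p in predictions_pct]
    return mean(fractions), stdev(fractions)
```

For example, three models predicting 70%, 80%, and 90% remaining would be summarized as a score of 0.8 with a standard deviation of 0.1.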

Optionally, a comment on the basis of the prediction may be given.

Submission format 

The prediction submission is a tab-delimited text file. Organizers provide a template file, which must be used for submission. In addition, a validation script is provided, and predictors must check the correctness of the format before submitting their predictions.

Each data row in the submitted file must include the following columns:

  • Column 1 aa_substitution: AA substitution – The mutation as listed in the variant file; e.g., P6L. Variants are numbered relative to RefSeq NP_000478.3. The same sequence is the main isoform of UniProtKB entry P15289, except that the first two amino acids are removed there, so P6L relative to NP_000478.3 corresponds to P4L relative to P15289-1.
  • Column 2 stability_score_48hr: Protein stability score –  Non-negative prediction of relative ARSA protein stability on a continuous scale where 0 = complete degradation (0% remaining), 1 = perfect stability (100% remaining), 0.784 = wildtype level (78.4% remaining).
  • Column 3 sd: Standard deviation – Non-negative number indicating the confidence of the prediction in column 2 (low SD = high confidence, high SD = low confidence)
  • Column 4 comment: Comment – optional brief comment on the basis of the prediction in column 2.
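
The numbering offset described for Column 1 matters when combining the variant list with resources indexed on UniProtKB P15289-1. A minimal sketch of the conversion follows; the helper name is hypothetical, and it assumes single-letter substitutions written like P6L.

```python
import re

# Offset stated above: NP_000478.3 has two extra N-terminal residues
# relative to UniProtKB P15289-1.
REFSEQ_TO_UNIPROT_OFFSET = 2
_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def refseq_to_uniprot(aa_substitution: str) -> str:
    """Renumber a substitution from NP_000478.3 to P15289-1 coordinates."""
    match = _SUBSTITUTION.fullmatch(aa_substitution)
    if match is None:
        raise ValueError(f"unrecognized substitution: {aa_substitution!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    new_pos = pos - REFSEQ_TO_UNIPROT_OFFSET
    if new_pos < 1:
        raise ValueError("position is not present in P15289-1")
    return f"{ref}{new_pos}{alt}"
```

Submissions themselves should keep the NP_000478.3 numbering used in the variant file; the conversion is only for looking up external annotations.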

In the template file, cells in columns 2-4 are marked with a "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission. You must enter a prediction and a standard deviation for every variant; if you are not confident in a prediction for a variant, enter a large standard deviation for it. Optionally, enter a brief comment on the basis of the prediction; otherwise, leave the "*" in the comment cell. Please follow the submission guidelines strictly.
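
As an illustration of these rules, the sketch below fills a template held in memory. It assumes the template has one header row followed by one row per variant with "*" placeholders; the actual layout of cagi7arsasubmissiontemplate.tsv should be checked, and the official validation script remains the authority on format. The function name and the four-decimal formatting are choices made for this example.

```python
def fill_template(template_text: str,
                  predictions: dict[str, tuple[float, float]]) -> str:
    """Replace the "*" placeholders in columns 2-3; leave the comment as "*".

    Assumption: the first line is a header and column 1 holds the variant.
    """
    lines = [line for line in template_text.splitlines() if line.strip()]
    filled = [lines[0]]  # keep the header unchanged
    for row in lines[1:]:
        variant = row.split("\t")[0]
        if variant not in predictions:
            # Every variant needs a score and an SD; no empty cells allowed.
            raise ValueError(f"no prediction for {variant}")
        score, sd = predictions[variant]
        if score < 0 or sd < 0:
            raise ValueError(f"negative value for {variant}")
        filled.append("\t".join([variant, f"{score:.4f}", f"{sd:.4f}", "*"]))
    return "\n".join(filled) + "\n"
```

Raising an error for a missing variant, rather than silently writing a default, keeps the requirement of a prediction for every variant explicit.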

In addition, your submission must include a detailed description of the method used to make the predictions, similar to the style of the Methods section in a scientific article. This information will be submitted as a separate file.

File naming

CAGI allows submission of up to six models per team, of which model 1 is considered primary. You can upload predictions for each model multiple times; for each model, the last submission before the deadline will be evaluated. If you are submitting a single file with all predictions combined, please use the format below.

Use the following format for your submissions: <teamname>_model_(1|2|3|4|5|6).(tsv|txt)

To include a description of your method, use the following filename: <teamname>_desc.*

Example: if your team’s name is “bestincagi” and you are submitting predictions for your model number 3, your filename should be bestincagi_model_3.txt.
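
The naming pattern can be checked before upload. The following sketch encodes the format given above as a regular expression; the function name is hypothetical, and it assumes a team name may contain any characters, since the page does not restrict them.

```python
import re

# <teamname>_model_(1|2|3|4|5|6).(tsv|txt)
_MODEL_FILE = re.compile(r"(?P<team>.+)_model_(?P<model>[1-6])\.(?:tsv|txt)")


def parse_submission_filename(filename: str) -> tuple[str, int]:
    """Return (team name, model number) or raise ValueError."""
    match = _MODEL_FILE.fullmatch(filename)
    if match is None:
        raise ValueError(f"filename does not follow the format: {filename!r}")
    return match.group("team"), int(match.group("model"))
```

With the example above, "bestincagi_model_3.txt" parses to the team "bestincagi" and model 3, while a model number outside 1-6 is rejected.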

Sample data

We provide sample data for 348 variants in cagi7arsasample.tsv (see below), which includes protein stability scores from this challenge (first five columns) and enzymatic activity measurements from the previous CAGI6 ARSA challenge (remaining columns, as reported in Supplementary Table S3 of Trinidad et al., 2023). The file contains 349 rows because the missense variant G261R is represented by two different DNA-level SNPs, each with corresponding database information. Enzymatic activity does not always correlate with protein stability: some variants may be stable but catalytically inactive. Although the sample file includes additional annotations, participants are only required to submit predictions for the columns specified in the submission format.
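
Because G261R appears twice in the sample file, users who train on protein-level variants may want to collapse duplicate rows first. The sketch below keeps the first row per variant; the column name "aa_substitution" is an assumption borrowed from the submission format, so consult cagi7arsasamplelegend.txt for the actual header names.

```python
import csv
import io


def unique_variants(tsv_text: str,
                    key: str = "aa_substitution") -> list[dict[str, str]]:
    """Keep the first row for each protein-level variant.

    Assumption: `key` names the column holding the amino acid substitution.
    """
    seen: set[str] = set()
    unique: list[dict[str, str]] = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique
```

Keeping the first row is an arbitrary choice; the two G261R rows should carry the same protein-level measurements and differ only in DNA-level annotations, which is worth confirming in the file itself.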

The participants may wish to use variants from ClinVar, gnomAD, HGMD, MaveDB, and UniProtKB to train and calibrate their models.

Download data 

The variants are provided using the HGVS variant nomenclature for protein sequences.

Download the list of 8,867 missense variants: cagi7arsavariantlist.txt

Download the submission template for these 8,867 missense variants: cagi7arsasubmissiontemplate.tsv 

Download the sample data of 348 missense variants: cagi7arsasample.tsv and the corresponding legend cagi7arsasamplelegend.txt

Download submission validation script: cagi7arsavalidation.py 

Assessment

This challenge follows the tradition of CAGI challenges that assess the predictions of biochemical effects for missense variants. The evaluation protocols and metrics will follow those described by The Critical Assessment of Genome Interpretation Consortium (2024). Predictions will be assessed by an independent assessor, Yu-Jen (Jennifer) Lin, UC Berkeley. 

Potential Evaluation Approaches:

  • Binary Classification Analysis: We may evaluate how well predicted stability scores can distinguish between variants above and below clinically and biologically important thresholds. This would assess whether computational methods can identify variants that fall into meaningful functional categories based on experimental measurements.
  • Continuous Score Comparison: We may assess how well the continuous predicted scores relate to the experimental stability measurements. This could measure the overall agreement between predicted and observed stability values across the range of variant effects.
  • Additional Considerations: Assessment may include metrics that recognize prediction sets that differ substantially from results provided by standard computational predictor methods. This is because, in previous challenges, predictions have often clustered more closely with other predictions than with the experimental values (Clark et al., 2019).
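
For self-assessment on the sample data, the first two approaches can be approximated with standard metrics. The sketch below computes a Pearson correlation for the continuous comparison and a rank-based AUC for the binary analysis. These metrics and the 0.582 threshold (taken from the clinical guideline of 58.2%) are assumptions for illustration only; the final evaluation metrics have not been released.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between predicted and observed scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def auc_unstable(predicted: list[float], observed: list[float],
                 threshold: float = 0.582) -> float:
    """Probability that an unstable variant is predicted below a stable one.

    A variant is called unstable when its observed score is below
    `threshold`; ties in predicted score count as half.
    """
    unstable = [p for p, o in zip(predicted, observed) if o < threshold]
    stable = [p for p, o in zip(predicted, observed) if o >= threshold]
    if not unstable or not stable:
        raise ValueError("both classes must be present")
    wins = sum((u < s) + 0.5 * (u == s) for u in unstable for s in stable)
    return wins / (len(unstable) * len(stable))
```

An AUC of 1.0 means every unstable variant received a lower predicted score than every stable one, and 0.5 corresponds to random ranking.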

We hope to give participants a sense of how predictions might be assessed without restricting their approach to the problem or predetermining their methodological decisions. Hence, the final evaluation metrics will be released only after the challenge ends. 

Dataset provided by

Michael H. Gelb, University of Washington

References 

Clark WT, et al. Assessment of predicted enzymatic activity of α-N-acetylglucosaminidase variants of unknown significance for CAGI 2016. Hum Mutat (2019) 40(9):1519-1529.

Greene H, et al. Arylsulfatase A in the urine and metachromatic leukodystrophy. J Pediatr (1967) 71(5):709-711.

Lukatela G, et al. Crystal structure of human arylsulfatase A: the aldehyde function and the metal ion at the active site suggest a novel mechanism for sulfate ester hydrolysis. Biochemistry (1998) 37(11):3654-3664.

Patil SA, Maegawa GH. Developing therapeutic approaches for metachromatic leukodystrophy. Drug Des Devel Ther (2013) 7:729-745.

Stütz AE, Wrodnigg TM. Carbohydrate-processing enzymes of the lysosome: diseases caused by misfolded mutants and sugar mimetics as correcting pharmacological chaperones. Adv Carbohydr Chem Biochem (2016) 73:225-302.

The Critical Assessment of Genome Interpretation Consortium. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol (2024) 25(1):53.

Trinidad M, et al. Predicting disease severity in metachromatic leukodystrophy using protein activity and a patient phenotype matrix. Genome Biol (2023) 24(1):172.

Revision history 

25 June 2025: challenge preview posted

27 June 2025: minor edits in the description, challenge open

15 September 2025: submission deadline extended from September 15 to September 30