Human SUMO ligase (UBE2I): predict the effects of missense mutations on competitive growth in a high-throughput yeast complementation assay

Challenge: SUMO ligase

Dataset description: public

Variant data: public

Last updated: 7 April 2016

This challenge closed at 9:00 PM PST (Pacific Standard Time) on 18 November 2015.

Download answer key, predictions, and assessment: these files are accessible to registered users only, and their use is limited by the CAGI Data Use Agreement. Please log in to access the files.

Presentations from the CAGI 4 conference: the presentations are accessible to registered users only, and their use is limited by the CAGI Data Use Agreement. Please log in to access the files.


SUMO ligase identifies target proteins and covalently attaches SUMO to them, thereby modulating the functions of hundreds of proteins including proteins implicated in cancer, neurodegeneration, and other diseases. A large library of missense mutations in human SUMO ligase has been assessed for competitive growth in a high-throughput yeast-based complementation assay. The challenge is to predict the effect of mutations on function, as measured by the change in fractional representation of each mutant SUMO ligase clone, relative to wild-type clones, in a competitive yeast growth assay.


The human genome encodes several small ubiquitin-like modifier proteins (SUMOs) that collectively 'tag' and modulate the functions of hundreds of proteins, including proteins implicated in cancer, neurodegeneration, and other diseases. As the only human SUMO-conjugating protein (SUMO E2 ligase), UBE2I is solely responsible for identifying target proteins and covalently attaching SUMO [1]. UBE2I, also known as UBE9, is the human counterpart of the S. cerevisiae gene UBC9. UBE2I is reportedly targeted by multiple viruses, including HIV and the DNA tumor viruses EBV and HPV [2]. The biological role of Ube2I and the wealth of structural data make it an ideal test case for predicting the effects of mutation. See the additional information below for a list of structure datasets.

The Roth Lab has generated a library of over 6,000 UBE2I clones, each tagged with a DNA barcode (i.e., a unique 25 bp sequence). These clones collectively express nearly 2,000 unique amino acid changes in various combinations. They have also implemented a yeast-based complementation assay in which expression of human UBE2I in S. cerevisiae rescues a temperature-sensitive mutant version of yeast UBC9 (Sun et al, submitted). Thus, a library expressing mutant human UBE2I clones in yeast can be grown competitively and quantified via DNA barcode sequencing to assess the functional impact of individual UBE2I variants.


The complementation screen has been conducted as follows. The library of mutant clones was pooled and transformed into a mutant yeast strain carrying the temperature-sensitive allele of UBC9. The pool was then split into six replicate plates; three replicates were grown at the permissive temperature (25°C; growth not dependent on human protein; no selection for mutant clone function), and three replicates were grown at the restrictive temperature (37°C; growth dependent on human protein; selection for mutant clone function). After 48 hours, the confluent plates were scraped and the barcode loci amplified in preparation for next-generation sequencing. Barcodes were then counted and used to calculate the relative abundance of each clone in the pool for each condition and replicate. The ratio of relative abundance of a given barcode in the restrictive (selective) temperature sample to that of the permissive (non-selective) temperature sample is a measure of clone-dependent fitness. It is therefore also a measure of the function of the corresponding mutant protein.

The competitive growth scores are calculated as follows. For each clone in the assay, the average barcode read count at the restrictive temperature is divided by the average barcode read count at the permissive temperature. These ratios are then normalized to the ratio observed for an aggregated collection of clones that are considered to be ‘wild-type’. Thus, the growth score is a ratio of ratios: it will be zero if no barcode reads are observed for a mutant clone at the restrictive temperature, and it will be 1 for the wild-type clones by definition.
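As a rough illustration of the ratio of ratios, the following Python sketch computes a growth score per clone from replicate read counts. The function name and the clone identifiers are hypothetical. The text does not say how the wild-type clones are aggregated, so pooling their averaged counts is an assumption made here.

```python
from statistics import mean


def growth_scores(restrictive, permissive, wild_type_ids):
    """Return a growth score per clone as a ratio of ratios.

    restrictive, permissive: dicts mapping clone id -> list of replicate
    barcode read counts. wild_type_ids: ids of clones treated as wild type.
    """
    # Per-clone ratio: mean restrictive count / mean permissive count.
    raw = {
        clone: mean(restrictive[clone]) / mean(permissive[clone])
        for clone in permissive
        if mean(permissive[clone]) > 0
    }
    # Reference ratio from the pooled wild-type clones (assumed aggregation).
    wt_restrictive = sum(mean(restrictive[c]) for c in wild_type_ids)
    wt_permissive = sum(mean(permissive[c]) for c in wild_type_ids)
    wt_ratio = wt_restrictive / wt_permissive
    # Normalize so that the wild-type collection scores 1.
    return {clone: ratio / wt_ratio for clone, ratio in raw.items()}
```

With a single wild-type clone the normalization returns exactly 1 for that clone, and a clone with no reads at the restrictive temperature scores 0, matching the two boundary cases described above.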

To determine whether changes in relative barcode abundance between permissive and selective temperature samples are significant, a Student’s t-test was used. Benjamini-Hochberg corrected q-values were then derived from the t-test p-values and used to filter the results (q < 0.05). Clones for which replicate experiments at the permissive temperature yielded fewer than ten barcode counts on average were not considered to have been well measured and were excluded from the downstream analysis. Thus, some clones and mutations represented in the challenge set may not be represented in the final results. 
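The correction and filtering steps can be sketched in Python as follows. This is an illustration under assumptions, not the organizers' code: the p-values are taken as already computed by a t-test elsewhere, the function names are hypothetical, and the order in which the count filter and the q-value filter are applied is not specified in the text.

```python
from statistics import mean


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that q-values
    # never increase as p-values decrease.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        q_values[index] = running_min
    return q_values


def significant_clones(p_by_clone, q_cutoff=0.05):
    """Return clone id -> q-value for clones with q below the cutoff."""
    clones = list(p_by_clone)
    q_values = benjamini_hochberg([p_by_clone[c] for c in clones])
    return {c: q for c, q in zip(clones, q_values) if q < q_cutoff}


def well_measured(permissive, min_mean_count=10):
    """Return ids of clones averaging at least min_mean_count permissive reads."""
    return [c for c, counts in permissive.items() if mean(counts) >= min_mean_count]
```

For example, the p-values 0.01, 0.03, 0.04, and 0.5 give q-values of 0.04, about 0.053, about 0.053, and 0.5, so only the first passes the q < 0.05 filter.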

Prediction challenge

Participants are asked to submit predictions of the effect of the variants on competitive growth. The submitted prediction should be a numeric value between 0 (no growth at the restrictive temperature) and 1 (wild-type growth), or >1 (more growth than wild type). Please note: the experimental scores are a measure of fitness in a competitive growth assay and have not been calibrated to correspond to percent of wild-type protein function. To help participants calibrate their numeric values appropriately, we provide the experimental distribution of numeric growth scores.

Where multiple mutations are listed for a mutant protein, the challenge is to predict the combined effect of those mutations. The predictions will be assessed against the numeric values actually measured for each mutant clone in the competitive growth assay. Each predicted value must include a standard deviation indicating confidence. Optionally, a brief comment on the basis of the prediction may be given.

The dataset is divided into three subsets of mutants. Participants may submit predictions on any or all of these subsets:

Download dataset: 4-SUMO_ligase_dataset.txt (82 KB)

Download experimental distribution: 4-SUMO_ligase_distribution_v2.tsv (50 KB). Click HERE for details of the experimental distribution of growth scores.

Download submission template: This submission template file is available only to registered users. Please log in to access the file.

Download submission validation script: This submission validation script is available only to registered users. Please log in to access the file.

Prediction submission format 

The prediction submission is a tab-delimited text file. Organizers provide a template file, which must be used for submission. In addition, a validation script is provided, and predictors should check the correctness of the format before submitting their predictions.

In the submission template file, the three subsets of mutants are separated by header rows beginning with a “#”. These headers must be retained in the submitted file. Each data row in the submitted file must include the following columns:

In the template file, cells in columns 2-4 are marked with a "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission. For a given subset, you must submit predictions and standard deviations for all or none of the variants; if you are not confident in a prediction for a variant, enter a large standard deviation for the prediction. Optionally, enter a brief comment on the basis of the prediction. If you do not enter predictions for a subset of variants, or if you do not enter a comment on a prediction, leave the "*" in those cells. Please make sure you follow the submission guidelines strictly.
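The rules above can be pre-checked with a small Python sketch before running the official validation script, which remains the authoritative check. The layout assumed here is hypothetical: four tab-separated columns per data row, with the variant in column 1, the prediction in column 2, the standard deviation in column 3, and the comment in column 4.

```python
def check_submission(text):
    """Return a list of problems found in the text of a submission file."""
    problems = []
    subset = None
    filled = {}  # subset header -> set of "prediction entered?" flags
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("#"):
            subset = line  # header rows separate the subsets
            filled[subset] = set()
            continue
        if not line.strip():
            continue  # blank lines are ignored here (an assumption)
        cells = line.split("\t")
        if len(cells) != 4:
            problems.append(f"line {number}: expected 4 columns, found {len(cells)}")
            continue
        if any(cell.strip() == "" for cell in cells):
            problems.append(f"line {number}: empty cell")
            continue
        if subset is None:
            problems.append(f"line {number}: data row before any '#' header")
            continue
        prediction, deviation = cells[1], cells[2]
        if (prediction == "*") != (deviation == "*"):
            problems.append(f"line {number}: prediction and standard deviation must be given together")
            continue
        has_prediction = prediction != "*"
        if has_prediction:
            try:
                value, spread = float(prediction), float(deviation)
            except ValueError:
                problems.append(f"line {number}: non-numeric prediction or standard deviation")
                continue
            if value < 0 or spread < 0:
                problems.append(f"line {number}: negative prediction or standard deviation")
                continue
        filled[subset].add(has_prediction)
    # All-or-none rule: a subset may not mix predicted and unpredicted variants.
    for header, states in filled.items():
        if len(states) > 1:
            problems.append(f"{header}: predictions given for only some variants")
    return problems
```

An empty list means no problem was found under these assumptions; it does not guarantee that the official script will accept the file.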

In addition, your submission should include a detailed description of the method used to make the predictions, similar to the style of the Methods section in a scientific article. This information will be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions by accessing the link "All submission forms" from the front page of your group. For more details, please read the FAQ page.


Predictions will be assessed by an independent assessor. A test of rank correlation (e.g., Kendall's tau) is anticipated to be an important part of the assessment, as is the RMS deviation of predictions from experimental observations. The independent assessor is expected to employ a range of other tests.
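To get a feel for the two anticipated measures, predictors could score their own predictions against any available measurements with a sketch like the one below. The assessor's actual procedure is not described here, so the tie handling (the tau-b variant) and the function names are assumptions.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both sequences: excluded from every term
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denominator if denominator else float("nan")


def rms_deviation(predicted, observed):
    """Root-mean-square deviation of predictions from observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))
```

The two measures are complementary: rank correlation rewards ordering the variants correctly regardless of scale, while the RMS deviation penalizes predictions that are poorly calibrated to the experimental score distribution.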

Additional information 

Entrez Gene ID: 7329

UniprotKB: P63279

PDB ID of Ube2I crystal structure: 1A3S

PDB IDs of Ube2I complex co-crystal structures: 1KPS, 2GRN, 2O25, 2PE6, 2UYZ, 2VRR, 3UIN, 3UIO, 3UIP, 4Y1L


Dataset citation

Weile J, Cote AG, Sun S, Knapp J, Verby M, Yang F, Tan G, Mellor J, Andrews B, Vidal M, Hill D, Boone C, Roth FP. An atlas of functional amino acid changes in human SUMO and SUMO ligase. (in preparation).


4 Sep 2015 (v01): initial release

28 Oct 2015 (v02): submission instructions and template updated, validation script provided

4 Nov 2015 (v03): experimental distribution of growth scores provided 

12 Nov 2015 (v04): updated growth score distribution with estimated measurement errors provided for each subset of mutants, improved validation script provided 

18 Dec 2015 (v05): answer key provided 

18 Mar 2016 (v06): predictions provided 

7 Apr 2016 (v11): conference presentations provided