
The fitness of tagged transposon insertions in Shewanella oneidensis strain MR-1 under stress conditions

Dataset description: public

Prediction dataset: public

The CAGI submission deadline for the MR-1 challenge has passed as of October 4, 2011, 3PM EDT. We welcome the upload of predictions after the deadline for archival and comparison purposes, but these post-deadline submissions are not part of the CAGI experiment.

Background 

Shewanella oneidensis strain MR-1 (formerly known as S. putrefaciens) is a model organism for studying metal reduction, as MR-1 can utilize a wide range of metal ions and solid metals as electron acceptors and also grows aerobically. MR-1 is in the same division of bacteria as E. coli (the γ-Proteobacteria), but they are not closely related. Of the ~4,500 proteins in MR-1, only about a third have orthologs in E. coli. The MR-1 genome sequence was published in 2002 (Heidelberg et al., 2002; doi:10.1038/nbt749), and the annotation has been curated since. A few hundred papers have been published on MR-1, and hundreds of gene expression experiments are publicly available.

The Adam Arkin Lab at UC Berkeley has created a large number of S. oneidensis MR-1 transposon insertions with known location and with a known tag or barcode. These insertions are pooled together into two pools, and the pools are grown under a given (stress) condition for ~6-8 generations. Typically, the stress experiments are performed in LB media with the stressor in well-shaken (aerobic) flasks, and a concentration of the stressor that reduces the growth rate about 2-fold is used.

The abundance of each tagged strain is measured with a microarray at the beginning and at the end of the experiment. The fitness of the strain is the log2 ratio of these abundances. (This is not the same scale as fitness in population genetics.) The data is normalized so that the median strain has a fitness of 0. The fitness value of a gene is computed as the average of the values for the insertions in that gene. In this experiment it is assumed that an insertion in a given gene inactivates that gene.
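The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the Arkin Lab pipeline: the function names and the toy abundances are hypothetical, the ratio is assumed to be end over start (so that sick strains come out negative, as in the text), and the real normalization of microarray signals is likely more involved.

```python
import math
from collections import defaultdict
from statistics import mean, median


def strain_fitness(start_abundance, end_abundance):
    """log2 ratio of end to start abundance for one tagged strain (assumed direction)."""
    return math.log2(end_abundance / start_abundance)


def gene_fitness(strains):
    """strains: list of (gene, start_abundance, end_abundance) tuples.

    Returns {gene: average fitness of its insertions}, after shifting all
    strain values so that the median strain has a fitness of 0.
    """
    raw = [(gene, strain_fitness(start, end)) for gene, start, end in strains]
    center = median(value for _, value in raw)
    per_gene = defaultdict(list)
    for gene, value in raw:
        per_gene[gene].append(value - center)
    return {gene: mean(values) for gene, values in per_gene.items()}


# Hypothetical toy data: two insertions in geneA, one in geneB, two in geneC.
example = [
    ("geneA", 100.0, 100.0),
    ("geneA", 200.0, 200.0),
    ("geneB", 400.0, 100.0),
    ("geneC", 100.0, 200.0),
    ("geneC", 100.0, 200.0),
]
print(gene_fitness(example))
```

In this toy pool the median strain is unchanged, so geneA stays at 0, geneB (a 4-fold drop) gets a fitness of -2, and geneC (a 2-fold gain) gets a fitness of 1.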

The reliability of these per-gene fitness values is estimated by looking at consistency across different insertions in the same gene and at consistency across the two pools. In a typical experiment, some strains are very sick (fitness < -2 implies little or no growth), some strains are moderately but significantly sick (fitness ~ -1), most strains have fitness near 0 (are neutral), and a handful of strains are advantaged (fitness ~ 1).
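The text does not say which statistic is used to judge consistency. As one possible illustration only, the sketch below reports the standard deviation across a gene's insertions and the gap between the two pool means. The function name, the input layout, and the choice of statistics are all assumptions made for this example.

```python
from statistics import mean, stdev


def gene_consistency(insertions):
    """insertions: list of (pool, fitness) pairs for one gene, with pool 1 or 2.

    Returns (spread, pool_gap):
      spread   - sample standard deviation across all insertions, or None
                 if the gene has fewer than two insertions
      pool_gap - absolute difference between the mean fitness in pool 1 and
                 in pool 2, or None if the gene is missing from either pool
    """
    values = [fitness for _, fitness in insertions]
    spread = stdev(values) if len(values) >= 2 else None
    pool1 = [fitness for pool, fitness in insertions if pool == 1]
    pool2 = [fitness for pool, fitness in insertions if pool == 2]
    pool_gap = abs(mean(pool1) - mean(pool2)) if pool1 and pool2 else None
    return spread, pool_gap


# Hypothetical gene with one insertion in each pool, both moderately sick.
print(gene_consistency([(1, -1.0), (2, -1.0)]))
```

A small spread and a small pool gap would suggest that the per-gene value is trustworthy. Genes with a single insertion cannot be checked this way, which is why the function returns None for them.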

A study of MR-1 gene-phenotype relationships for 121 conditions has already been published (Deutschbauer et al., 2011). The 2012 challenge is to predict results under eight more conditions.

Prediction challenge

Predictors are asked to predict how insertions in a given MR-1 gene affect fitness under a given condition (stressor). Each submitted prediction should be a numeric value with a standard deviation. The predictions will be assessed against the numeric fitness values (log2 ratios) actually measured for each gene in each condition.

Download: list of genes in MR-1

Tab-delimited format, column descriptions:

Descriptions of 8 challenge experiments (the conditions)

Tab-delimited format, column descriptions:

Prediction Dataset

The CAGI challenge consists of 8 fitness experiments over 4632 MR-1 genes. Of these, we will reveal the fitness scores of 1732 genes in 4 conditions. Thus the prediction dataset will be the remaining 2900 genes in those 4 conditions, and the whole gene set (4632 genes) in the remaining 4 conditions.
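The size of the prediction task follows directly from these numbers. The short calculation below only restates the arithmetic; the variable names are chosen for this example.

```python
TOTAL_GENES = 4632
REVEALED_GENES = 1732
PARTLY_REVEALED_CONDITIONS = 4  # conditions where some fitness scores are revealed
FULLY_HIDDEN_CONDITIONS = 4     # conditions where no fitness scores are revealed

hidden_genes = TOTAL_GENES - REVEALED_GENES
revealed_values = REVEALED_GENES * PARTLY_REVEALED_CONDITIONS
values_to_predict = (
    hidden_genes * PARTLY_REVEALED_CONDITIONS
    + TOTAL_GENES * FULLY_HIDDEN_CONDITIONS
)

print(hidden_genes)       # 2900
print(revealed_values)    # 6928
print(values_to_predict)  # 30128
```

Together the revealed and hidden values account for all 4632 x 8 = 37056 gene-condition pairs.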

Fitness data for half the genes in half the challenge experiments

Tab-delimited format, column descriptions:

Example Results

The dataset providing example results contains 195 fitness experiments on MR-1.

Descriptions of 195 fitness experiments

For file format definition, please see the description for the Prediction dataset conditions above.

Fitness data for 195 experiments

For file format definition, please see the description for the Prediction dataset conditions above. 

Additional data files on the 195 fitness experiments:

Viewing the data in MeV:

Other resources:

Prediction submission format 

The prediction submission is a tab-delimited text file. Organizers provide a file template, which should be used for submission. In addition, a validation script is provided, and predictors should check the correctness of the format before submitting their predictions.

Download MR-1 submission template

Download MR-1 submission validation script (not available).

In the submitted file, each row should include the following columns:

In the template file, some cells in columns 2-9 have pre-filled numbers. These are derived from the revealed experiments (see "Fitness data for half the genes in half the challenge experiments" above). The rest of the cells in columns 2-9 are marked with an "*". Submit your predictions by replacing each "*" with your value. No empty cells are allowed in the submission; if you cannot submit a prediction for a gene, leave the "*" in that cell. Please make sure you follow the submission guidelines strictly. Each prediction cell should contain the prediction and its standard deviation in the following format: Prediction Value (Standard Deviation).
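The cell rules above can be sketched as a small helper. This is not the organizers' validation script. It assumes that a prediction cell looks like "-0.500 (0.250)" with a single space before the parenthesis, that pre-filled cells are plain decimal numbers, and that the first column identifies the gene. The names `format_cell`, `is_valid_cell`, and `fill_row`, and the row shown, are hypothetical.

```python
import re

# Assumed cell shapes: "value (sd)" for predictions, a bare number for pre-filled cells.
PREDICTION_RE = re.compile(r"-?\d+(?:\.\d+)? \(\d+(?:\.\d+)?\)")
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def format_cell(prediction, std_dev):
    """Render one prediction as 'Prediction Value (Standard Deviation)'."""
    if std_dev < 0:
        raise ValueError("standard deviation must be non-negative")
    return f"{prediction:.3f} ({std_dev:.3f})"


def is_valid_cell(cell):
    """Accept '*', a pre-filled number, or a 'value (sd)' prediction; reject empty cells."""
    return (
        cell == "*"
        or NUMBER_RE.fullmatch(cell) is not None
        or PREDICTION_RE.fullmatch(cell) is not None
    )


def fill_row(template_row, predictions):
    """Replace '*' cells with predictions.

    template_row: list of cell strings, gene identifier first (assumed layout).
    predictions:  {zero-based column index: (value, std_dev)}.
    Cells without a prediction keep their '*'.
    """
    filled = list(template_row)
    for index, (value, std_dev) in predictions.items():
        if filled[index] != "*":
            raise ValueError(f"column {index + 1} is not an open '*' cell")
        filled[index] = format_cell(value, std_dev)
    if not all(is_valid_cell(cell) for cell in filled[1:]):
        raise ValueError("row contains an empty or malformed cell")
    return filled


# Hypothetical template row: gene identifier, then columns 2-9.
row = ["gene_1", "0.12", "*", "*", "0.40", "*", "*", "*", "*"]
print("\t".join(fill_row(row, {2: (-0.5, 0.25), 5: (0.1, 0.3)})))
```

The helper refuses to overwrite pre-filled cells, since those hold revealed measurements rather than predictions.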

In addition, your submission should include a detailed description of the method used to make the predictions (similar to the style of the Methods section in a scientific article). This information will be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions by accessing the link: "All submission forms" from the front page of your group. For more details, please read the FAQ page.

References 

The Arkin Lab method for making and assaying pools of transposon-tagged mutants in bacteria: Oh J, Fung E, Price MN, Dehal PS, Davis RW, Giaever G, Nislow C, Arkin AP, Deutschbauer A. A universal TagModule collection for parallel genetic analysis of microorganisms. Nucleic Acids Res. 2010 Aug;38(14):e146. doi: 10.1093/nar/gkq419

Fitness data in yeast using a similar approach (although these were tagged clean deletions, not transposon insertions): Giaever G et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002;418:387-91. doi: 10.1038/nature00935

Hillenmeyer ME, Fung E, Wildenhain J, Pierce SE, Hoon S, Lee W, Proctor M, St Onge RP, Tyers M, Koller D, Altman RB, Davis RW, Nislow C, Giaever G. The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science. 2008 Apr 18;320(5874):362-5. doi: 10.1126/science.1150021

Fitness data in E. coli (using 1000-well plates and cameras, not pools and microarrays): Nichols RJ, Sen S, Choo YJ, Beltrao P, Zietek M, Chaba R, Lee S, Kazmierczak KM, Lee KJ, Wong A, Shales M, Lovett S, Winkler ME, Krogan NJ, Typas A, Gross CA. Phenotypic landscape of a bacterial cell. Cell. 2011 Jan 7;144(1):143-56. doi: 10.1016/j.cell.2010.11.052

Data provided by

Adam M. Deutschbauer, Morgan N. Price, Kelly Wetmore, Wenjun Shao, Jason Baumohl, and Adam P. Arkin from UC Berkeley, and Michelle Nguyen, Raquel Tamse, and Ronald W. Davis from Stanford University.