Distinguish between exomes of bipolar disorder patients and unaffected individuals

Challenge: Bipolar exomes
Dataset description: public
Exome sequence data: registered users only, limited by the CAGI Data Use Agreement and a Data Transfer Agreement
Last updated: 14 Apr 2016
This challenge closed at 9:00 PM PST (Pacific Standard Time) on 25 January 2016.

Download answer key, predictions, and assessment: registered users only, limited by CAGI Data use agreement

Presentations from the CAGI 4 conference: registered users only, limited by CAGI Data use agreement

Bipolar disorder (BD) is a serious mental illness characterized by recurrent episodes of mania and depression, which are syndromes of abnormal mood, thinking and behavior. It affects 1.0-4.5% of the population [1], and it is among the major causes of disability worldwide [2]. Notably, up to 15% of individuals with BD die from suicide [3]. There is overwhelming evidence that genetic factors play a leading role in the etiology of BD [4-9]. Twin studies of BD have yielded estimates of heritability that reach up to 90% [10] and are among the highest for any mental disorder [11,12]. Moreover, there is little evidence that non-genetic factors contribute significantly to the risk of BD [13]. However, progress in explaining the genetic contribution to BD has been limited. Nearly two decades of research with linkage and candidate gene studies failed to identify a single confirmed susceptibility gene for BD [10,14].

More recent efforts with genome-wide association studies (GWAS) have had greater success. The most recent report on BD from the Psychiatric Genomics Consortium (PGC), an international consortium of investigators carrying out meta-analyses of existing GWAS on five different psychiatric disorders [15], identified 19 genome-wide significant loci in an analysis of 20,352 BD cases and 31,358 controls. The findings emerging from the PGC provide, for the first time, credible evidence implicating specific genetic loci in the risk of BD. However, the number of genome-wide significant findings reported for BD is considerably smaller than for schizophrenia, for which over 100 genome-wide significant loci have been identified [16]. The difference is notable even considering the larger sample sizes available for analysis with schizophrenia. Moreover, estimates of the total variance in liability to BD that can be explained by all SNPs in the GWAS carried out by the PGC do not exceed 25% [17]. These observations support the hypothesis that rarer variants, which are poorly tagged and largely missed by GWAS, contribute a meaningful proportion of the risk for BD [18-20], and they motivate sequencing studies to identify such variants.

Dataset description
The data providers have carried out an exome sequencing study of BD and provide for this challenge data from 1,000 individuals, including cases and ancestry- and sex-matched controls. NimbleGen SeqCap EZ v2.0 Exome arrays with ~3.4 Mb of additional custom target covering promoter, UTR, and intronic regions of 1,422 synaptic genes and 60 genes previously associated with BD were used for target capture (see bed file, Additional information), and samples were sequenced using the Illumina HiSeq2000. Variants were called using GATK UnifiedGenotyper with all samples together, following best practices. Only high-quality pass variants were retained. We further excluded variants with greater than 10% missing data or in Hardy-Weinberg disequilibrium at p < 1×10^-6, as well as specific genotype calls with read depth < 10 or genotype quality < 20.
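
As an illustration of the exclusion thresholds listed above, the following is a minimal sketch and not the data providers' pipeline. It assumes biallelic genotypes coded as alternate-allele counts (0, 1, 2, or None for a missing call), assumes that missingness is computed after low-depth and low-quality calls have been masked, and substitutes a one-degree-of-freedom chi-square test for whichever Hardy-Weinberg test the providers actually used. All function names are hypothetical.

```python
import math

# Thresholds taken from the dataset description.
MIN_DEPTH = 10          # genotype calls with read depth < 10 are excluded
MIN_GQ = 20             # genotype calls with genotype quality < 20 are excluded
MAX_MISSING = 0.10      # variants with > 10% missing data are excluded
HWE_P_THRESHOLD = 1e-6  # variants with HWE p < 1e-6 are excluded


def mask_genotype(gt, depth, gq):
    """Return the genotype, or None if the call fails the depth/quality thresholds."""
    if gt is None or depth < MIN_DEPTH or gq < MIN_GQ:
        return None
    return gt


def hwe_chi2_pvalue(genotypes):
    """Chi-square (1 df) Hardy-Weinberg p-value; an assumed stand-in test."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    p = sum(called) / (2 * n)  # alternate allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: no departure can be tested
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    observed = [called.count(0), called.count(1), called.count(2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def keep_variant(genotypes):
    """Apply the variant-level missingness and HWE exclusions."""
    missing = sum(g is None for g in genotypes) / len(genotypes)
    if missing > MAX_MISSING:
        return False
    return hwe_chi2_pvalue(genotypes) >= HWE_P_THRESHOLD
```

An exact Hardy-Weinberg test is usually preferred for rare variants, where the chi-square approximation is poor; the chi-square form is used here only to keep the sketch short.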

Prediction challenge
With the provided exome data, identify which individuals have BD and which individuals are unaffected. The organizers have divided the dataset into halves: 500 exomes for training, and 500 exomes for the prediction challenge. The 1,000 exomes are provided in a single dataset file. A training set file provides the BD status of 500 individuals: it contains the individual's ID and their disease status (0=unaffected, 1=BD).
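
The split described above can be sketched as follows. This is a minimal illustration that assumes the sample IDs have already been read from the dataset file and the training file has been parsed into (ID, status) pairs; the function name and the toy IDs in the usage note are hypothetical.

```python
def split_individuals(all_ids, training_rows):
    """Separate labeled training individuals from those to be predicted.

    all_ids: IDs of every individual in the dataset file.
    training_rows: iterable of (individual_id, status) pairs,
                   where status is 0 (unaffected) or 1 (BD).
    Returns (labels, to_predict): a dict of training labels and the
    list of IDs that have no label, in their original order.
    """
    known = set(all_ids)
    labels = {}
    for individual, status in training_rows:
        status = int(status)
        if status not in (0, 1):
            raise ValueError(f"unexpected status {status!r} for {individual!r}")
        if individual not in known:
            raise ValueError(f"training ID {individual!r} is not in the dataset")
        labels[individual] = status
    to_predict = [i for i in all_ids if i not in labels]
    return labels, to_predict
```

For example, `split_individuals(["A", "B", "C"], [("A", "1"), ("C", "0")])` returns the labels for A and C and leaves B to be predicted. With the challenge data, both parts would be expected to contain 500 individuals.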

Download dataset, training data, and DTA
This dataset file is available only to registered users. Please log in to access the file.

Download submission template and validation script
This dataset file is available only to registered users. Please log in to access the file.

Prediction submission format
The prediction submission is a tab-delimited text file. Organizers provide a file template, which should be used for submission. In addition, a validation script is provided, and predictors should check the correctness of the format before submitting their predictions.

In the submitted file, each row should include the following tab-separated fields:

  1. individual - the ID number of the individual
  2. disease_status - the probability that this individual has bipolar disorder. The probability should be a value between 0 and 1, 0 meaning unaffected and 1 meaning BD patient.
  3. SD - standard deviation defining the confidence of the prediction in column 2. Large SD means low confidence, while small SD means that the predictor is confident about the submitted prediction.
  4. comment - an optional brief comment on the basis of the prediction in column 2

In the template file, cells in columns 2-4 are marked with a "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission. You must enter a prediction and standard deviation for every individual; if you are not confident in a prediction for an individual, enter a large standard deviation for the prediction. Optionally, enter a brief comment indicating the basis of the prediction; otherwise, leave the "*" in these cells. Please make sure you follow the submission guidelines strictly.
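
The row format described above can be sketched as follows. This is a minimal illustration and not the organizers' validation script, which remains the authoritative check. The function names, the three-decimal formatting, and the rejection of non-finite values are assumptions.

```python
import math


def format_row(individual, probability, sd, comment="*"):
    """Build one tab-separated submission row; "*" marks an unused comment."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("disease_status must be a probability between 0 and 1")
    if sd < 0.0 or not math.isfinite(sd):
        raise ValueError("SD must be a finite, non-negative number")
    comment = comment.strip() or "*"  # no empty cells are allowed
    if "\t" in comment or "\n" in comment:
        raise ValueError("comment must not contain tabs or newlines")
    return "\t".join([str(individual), f"{probability:.3f}", f"{sd:.3f}", comment])


def check_row(line):
    """Return True if a row has four non-empty fields with valid numbers."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4 or any(f == "" for f in fields):
        return False
    try:
        probability = float(fields[1])
        sd = float(fields[2])
    except ValueError:
        return False  # e.g. a "*" left in column 2 or 3
    return 0.0 <= probability <= 1.0 and sd >= 0.0 and math.isfinite(sd)
```

A row left as it appears in the template, with "*" in columns 2 and 3, fails `check_row`, which reflects the requirement that every individual receive a prediction and a standard deviation.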

In addition, your submission should include a detailed description of the method used to make the predictions, similar in style to the Methods section in a scientific article. This information will be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions by accessing the link: "All submission forms" from the front page of your group. For more details, please read the FAQ page.

Additional information
Download the target intervals used for extended target capture:
This dataset file is available only to registered users. Please log in to access the file.


  1. Merikangas, K.R., Akiskal, H.S., Angst, J., Greenberg, P.E., Hirschfeld, R.M., Petukhova, M., and Kessler, R.C. (2007). Lifetime and 12-month prevalence of bipolar spectrum disorder in the National Comorbidity Survey replication. Archives of general psychiatry 64, 543-552.
  2. Degenhardt, L., Whiteford, H.A., Ferrari, A.J., Baxter, A.J., Charlson, F.J., Hall, W.D., Freedman, G., Burstein, R., Johns, N., Engell, R.E., Flaxman, A., Murray, C.J., and Vos, T. (2013). Global burden of disease attributable to illicit drug use and dependence: findings from the Global Burden of Disease Study 2010. Lancet 382, 1564-1574.
  3. Guze, S.B., and Robins, E. (1970). Suicide and primary affective disorders. The British journal of psychiatry : the journal of mental science 117, 437-438.
  4. McGuffin, P., Rijsdijk, F., Andrew, M., Sham, P., Katz, R., and Cardno, A. (2003). The heritability of bipolar affective disorder and the genetic relationship to unipolar depression. Archives of general psychiatry 60, 497-502.
  5. Smoller, J.W., and Finn, C.T. (2003). Family, twin, and adoption studies of bipolar disorder. American journal of medical genetics Part C, Seminars in medical genetics 123C, 48-58.
  6. Craddock, N., and Jones, I. (1999). Genetics of bipolar disorder. Journal of medical genetics 36, 585-594.
  7. Goodwin, F.K., and Jamison, K.R. (2007). Manic-depressive illness: bipolar disorders and recurrent depression. (Oxford University Press).
  8. Rice, J., Reich, T., Andreasen, N.C., Endicott, J., Van Eerdewegh, M., Fishman, R., Hirschfeld, R.M., and Klerman, G.L. (1987). The familial transmission of bipolar illness. Archives of general psychiatry 44, 441-447.
  9. Goldin, L.R., Gershon, E.S., Targum, S.D., Sparkes, R.S., and McGinniss, M. (1983). Segregation and linkage analyses in families of patients with bipolar, unipolar, and schizoaffective mood disorders. American journal of human genetics 35, 274-287.
  10. Craddock, N., and Sklar, P. (2013). Genetics of bipolar disorder. Lancet 381, 1654-1662.
  11. Shih, R.A., Belmonte, P.L., and Zandi, P.P. (2004). A review of the evidence from family, twin and adoption studies for a genetic contribution to adult psychiatric disorders. International review of psychiatry 16, 260-283.
  12. Sullivan, P.F., Daly, M.J., and O'Donovan, M. (2012). Genetic architectures of psychiatric disorders: the emerging picture and its implications. Nature reviews Genetics 13, 537-551.
  13. Alloy, L.B., Abramson, L.Y., Urosevic, S., Walshaw, P.D., Nusslock, R., and Neeren, A.M. (2005). The psychosocial context of bipolar disorder: environmental, cognitive, and developmental risk factors. Clinical psychology review 25, 1043-1075.
  14. Seifuddin, F., Mahon, P.B., Judy, J., Pirooznia, M., Jancic, D., Taylor, J., Goes, F.S., Potash, J.B., and Zandi, P.P. (2012). Meta-analysis of genetic association studies on bipolar disorder. American journal of medical genetics Part B, Neuropsychiatric genetics : the official publication of the International Society of Psychiatric Genetics 159B, 508-518.
  15. Psychiatric GWAS Consortium Coordinating Committee, Cichon, S., Craddock, N., Daly, M., Faraone, S.V., Gejman, P.V., Kelsoe, J., Lehner, T., Levinson, D.F., Moran, A., et al. (2009). Genomewide association studies: history, rationale, and prospects for psychiatric disorders. The American journal of psychiatry 166, 540-556.
  16. Schizophrenia Working Group of the Psychiatric Genomics Consortium (2014). Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421-427.
  17. Cross-Disorder Group of the Psychiatric Genomics Consortium, Lee, S.H., Ripke, S., Neale, B.M., Faraone, S.V., Purcell, S.M., Perlis, R.H., Mowry, B.J., Thapar, A., Goddard, M.E., et al. (2013). Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nature genetics 45, 984-994.
  18. Manolio, T.A., Collins, F.S., Cox, N.J., Goldstein, D.B., Hindorff, L.A., Hunter, D.J., McCarthy, M.I., Ramos, E.M., Cardon, L.R., Chakravarti, A., et al. (2009). Finding the missing heritability of complex diseases. Nature 461, 747-753.
  19. Cirulli, E.T., and Goldstein, D.B. (2010). Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nature reviews Genetics 11, 415-425.
  20. Bodmer, W., and Bonilla, C. (2008). Common and rare variants in multifactorial susceptibility to common diseases. Nature genetics 40, 695-701.

Dataset provided by
Mehdi Pirooznia and Peter Zandi, The Johns Hopkins University
Richard McCombie, Cold Spring Harbor Laboratory
James B. Potash, University of Iowa

4 Nov 2015 (v01): initial release
11 Nov 2015 (v02): DTA provided as a pdf form with fields
12 Nov 2015 (v03): improved validation script provided
17 Dec 2015 (v04): challenge close date extended to 25 Jan 2016
29 Jan 2016 (v05): answer key provided
18 Mar 2016 (v06): predictions provided
14 Apr 2016 (v07): assessment and conference presentations provided