
Estimate patients’ therapeutic warfarin doses from their exome sequences

Challenge: Warfarin exomes

Dataset description: public

Dataset availability: encrypted

Exome sequence data: registered users only, limited by CAGI Data Use Agreement

Last updated: 14 April 2016

This challenge will tentatively close at 8:00 PM PST (Pacific Standard Time) on 8 December 2015. 

Download answer key, predictions, and assessment: The answer key, predictions, and assessment files are accessible to registered users only, and their use is limited by the CAGI Data Use Agreement. Please log in to access the files.

Presentations from the CAGI 4 conference: Presentations are accessible to registered users only, and their use is limited by the CAGI Data Use Agreement. Please log in to access the files.


With over 33 million prescriptions in 2011, warfarin is the most commonly used anticoagulant for preventing thromboembolic events [1]. Warfarin has a twenty-fold inter-individual dose variability and a narrow therapeutic index, and it is responsible for a third of adverse drug event hospitalizations in older Americans [2]. Alternatives to warfarin, such as direct thrombin inhibitors and factor Xa inhibitors, are now available. However, these are more expensive, irreversible, and may cause a higher rate of acute coronary events compared to warfarin [3,4]. Thus, warfarin remains a mainstay of anticoagulant therapy, and better methods of dosing warfarin will lead to fewer adverse events due to overcoagulation.

Both clinical modifiers and genetic polymorphisms are known to affect an individual’s stable therapeutic warfarin dose [5]. Warfarin dose prediction algorithms have been published, but they are less predictive in diverse populations [6].

Prediction challenge

With the provided exome data and clinical covariates, predict the therapeutic warfarin dose for 53 individuals.

Dataset description

The data set contains the following components:

  • A jointly called VCF file of genotypes for exomes of 103 African Americans on warfarin.
  • A file of clinical covariates for the same individuals.
  • A file of warfarin doses for 50 of these individuals, for use in training if you wish.
  • The distribution of warfarin doses for the other 53 individuals, whose doses are to be predicted.

A description of how the genomic data were collected is available in the Methods section of [7]. Reference [7] also contains a large amount of other relevant information, including an analysis and prediction model developed by the dataset providers.
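As a starting point for working with the jointly called VCF file, the sketch below converts each genotype call into a count of non-reference alleles per individual, a common numeric encoding for dose models. It relies only on the standard VCF column layout; the function name `alt_allele_counts`, the sample IDs, and the variant in the example are hypothetical and do not come from the challenge data.

```python
import io

def alt_allele_counts(vcf_handle):
    """Yield (chrom, pos, ref, alt, {sample: count or None}) for each VCF record.

    The count is the number of non-reference alleles in the GT field;
    None marks a missing call such as "./.".
    """
    samples = []
    for line in vcf_handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the 9 fixed columns
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        counts = {}
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                counts[sample] = None
            else:
                counts[sample] = sum(1 for a in alleles if a != "0")
        yield chrom, int(pos), ref, alt, counts

# Hypothetical three-sample record, for illustration only.
example = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
    "1\t100\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/1:20\t1/1:15\t./.:0\n"
)
records = list(alt_allele_counts(example))
print(records[0])
```

For the real file, an open file handle can be passed in place of the `io.StringIO` object. Multi-allelic sites are collapsed here into a single non-reference count, which may or may not suit a given model.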

Download dataset: This dataset file is available only to registered users. Please log in to access the file.

Download submission template: This submission template file is available only to registered users. Please log in to access the file.

Download validation script: This submission validation script is available only to registered users. Please log in to access the file.

Prediction submission format 

The prediction submission is a tab-delimited text file. Organizers provide a file template, which should be used for submission. A validation script is provided, and predictors should check the correctness of the format before submitting their predictions.

In the submitted file, each of the 53 rows includes the following columns:

  • subject_id - the ID number of the individual
  • dose - the predicted warfarin dose for the individual. Dose predictions must fall within the distribution provided (16-105 mg/week).
  • standard deviation (mg/week) - SD defining the confidence of the prediction in column 2
  • comment - an optional brief comment on the basis of the prediction in column 2

In the template file, cells in columns 2-4 are marked with an "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission. You must enter a prediction and standard deviation for every individual; if you are not confident in a prediction for an individual, enter a large standard deviation for that prediction. Optionally, enter a brief comment indicating the basis of each prediction; otherwise, leave the "*" in these cells. Please follow the submission guidelines strictly.
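The rules above can be checked locally before running the organizers' validation script. The following is a minimal sketch, not the official script: it assumes the file has no header row and exactly four tab-separated columns, and the function name `check_submission` is hypothetical.

```python
import math

DOSE_MIN, DOSE_MAX = 16.0, 105.0  # mg/week, range stated in the challenge text

def check_submission(text, expected_rows=53):
    """Return a list of problems found in tab-delimited submission text."""
    problems = []
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    for n, row in enumerate(rows, start=1):
        if len(row) != 4:
            problems.append(f"row {n}: expected 4 columns, found {len(row)}")
            continue
        subject_id, dose, sd, comment = (cell.strip() for cell in row)
        if not all((subject_id, dose, sd, comment)):
            problems.append(f"row {n}: empty cell")
            continue
        try:
            # An unreplaced "*" in column 2 or 3 fails here.
            dose_value, sd_value = float(dose), float(sd)
        except ValueError:
            problems.append(f"row {n}: dose and SD must be numbers")
            continue
        if not (DOSE_MIN <= dose_value <= DOSE_MAX):
            problems.append(
                f"row {n}: dose {dose_value:g} outside "
                f"{DOSE_MIN:g}-{DOSE_MAX:g} mg/week"
            )
        if not (math.isfinite(sd_value) and sd_value >= 0):
            problems.append(f"row {n}: SD must be a finite, non-negative number")
    return problems

# Hypothetical subject IDs and values, for illustration only.
good = "\n".join(f"S{i}\t35.0\t10.0\t*" for i in range(1, 54))
bad = good.replace("S1\t35.0", "S1\t120.0", 1)
print(check_submission(good))
print(check_submission(bad))
```

A comment cell left as "*" passes the check, which matches the instruction to leave the "*" in place when no comment is given.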

Note that although a numerical dose prediction is required and will be assessed, the evaluation will likely also include an assessment based on a binary prediction of high or low dose. Values at or below 44 will be considered low, and values above 44 will be considered high.
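The high/low split can be applied to a set of predictions as in the sketch below. The cutoff of 44 is taken from the text and is assumed to be in mg/week, like the doses. The text does not say how the binary assessment will be scored, so the simple agreement fraction computed here is an assumption for illustration, and both function names are hypothetical.

```python
HIGH_DOSE_CUTOFF = 44.0  # assumed to be mg/week, as for the dose column

def dose_class(dose_mg_per_week):
    """Label a dose 'low' if at or below the cutoff, otherwise 'high'."""
    return "low" if dose_mg_per_week <= HIGH_DOSE_CUTOFF else "high"

def binary_agreement(predicted, actual):
    """Fraction of individuals whose predicted and actual classes match."""
    pairs = list(zip(predicted, actual))
    matches = sum(dose_class(p) == dose_class(a) for p, a in pairs)
    return matches / len(pairs)

# Made-up doses, for illustration only.
predicted = [30.0, 44.0, 50.0, 60.0]
actual = [35.0, 50.0, 45.0, 40.0]
print([dose_class(d) for d in predicted])
print(binary_agreement(predicted, actual))
```

Because the boundary is inclusive on the low side, a predicted dose of exactly 44 is classified as low.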

In addition, your submission should include a detailed description of the method used to make the predictions, similar to the style of the Methods section in a scientific article. This information will be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions by accessing the link: "All submission forms" from the front page of your group. For more details, please read the FAQ page.


References

  [1] IMS Institute for Healthcare Informatics. 2012. The Use of Medicines in the United States: Review of 2011.
  [2] Budnitz DS, Lovegrove MC, Shehab N, Richards CL. 2011. Emergency hospitalizations for adverse drug events in older Americans. N Engl J Med 365:2002-2012. doi:10.1056/NEJMsa1103053
  [3] Bauer KA. 2011. Recent progress in anticoagulant therapy: oral direct inhibitors of thrombin and factor Xa. J Thromb Haemost 9:12-19. doi:10.1111/j.1538-7836.2011.04321.x
  [4] Uchino K, Hernandez AV. 2012. Dabigatran association with higher risk of acute coronary events: meta-analysis of noninferiority randomized controlled trials. Arch Intern Med 172:397-402. doi:10.1001/archinternmed.2011.1666
  [5] Klein TE, Altman RB, Eriksson N, et al. 2009. Estimation of the warfarin dose with clinical and pharmacogenetic data. N Engl J Med 360:753-764. doi:10.1056/NEJMoa0809329
  [6] Daneshjou R, Klein TE, Altman RB. 2014. Genotype-guided dosing of vitamin K antagonists. N Engl J Med 370:1763-1764. doi:10.1056/NEJMc1402521
  [7] Daneshjou R, et al. 2014. Genetic variant in folate homeostasis is associated with lower warfarin dose in African Americans. Blood 124:2298-2305. doi:10.1182/blood-2014-04-568436

Dataset provided by

Roxana Daneshjou and Russ Altman, Stanford University School of Medicine


6 Aug 2015 (v01): initial release 

4 Sep 2015 (v02): challenge close date added 

28 Oct 2015 (v03): submission instructions and template updated, validation script provided 

7 Nov 2015 (v04): submission deadline extended 

12 Nov 2015 (v05): improved validation script provided

18 Dec 2015 (v06): answer key provided

18 Mar 2016 (v06): predictions provided 

14 Apr 2016 (v07): assessment and conference presentations provided