With over 33 million prescriptions in 2011, warfarin is the most commonly used anticoagulant for preventing thromboembolic events [1]. Warfarin has a twenty-fold inter-individual dose variability and a narrow therapeutic index, and it is responsible for a third of adverse drug event hospitalizations in older Americans [2]. Alternatives to warfarin, such as direct thrombin inhibitors and factor Xa inhibitors, are now available. However, these are more expensive, irreversible, and may cause a higher rate of acute coronary events compared to warfarin [3,4]. Thus, warfarin remains a mainstay of anticoagulant therapy, and better methods of dosing warfarin will lead to fewer adverse events due to overcoagulation.

Both clinical modifiers and genetic polymorphisms are known to affect an individual’s stable therapeutic warfarin dose [5]. Warfarin dose prediction algorithms have been formulated previously; however, they are less predictive in diverse populations [6].

Prediction challenge
With the provided exome data and clinical covariates, predict the therapeutic warfarin dose for 53 individuals.

Dataset description
The data set contains the following components:

  1. A jointly called VCF file of genotypes for exomes of 103 African Americans on warfarin.
  2. A file of clinical covariates for the same individuals.
  3. A file of warfarin doses for 50 of these individuals, for use in training if you wish.
  4. The distribution of warfarin doses for the other 53 individuals, whose doses are to be predicted.

A description of how the genomic data were collected is available in the Methods section of [7]. Reference [7] also contains a large amount of other relevant information, including an analysis and prediction model developed by the dataset providers.

Prediction submission format
The prediction submission is a tab-delimited text file. Organizers provide a file template, which should be used for submission. A validation script is provided, and predictors should check the correctness of the format before submitting their predictions.

In the submitted file, each of the 53 rows includes the following columns:

  1. subject_id - the ID number of the individual
  2. dose - the predicted warfarin dose (mg/week) for the individual. Dose predictions must fall within the provided distribution (16-105 mg/week).
  3. standard deviation (mg/week) - the SD expressing the confidence of the prediction in column 2
  4. comment - an optional brief comment on the basis of the prediction in column 2

In the template file, cells in columns 2-4 are marked with an "*". Submit your predictions by replacing each "*" with your value. No empty cells are allowed in the submission. You must enter a prediction and a standard deviation for every individual; if you are not confident in a prediction for an individual, enter a large standard deviation for that prediction. Optionally, enter a brief comment indicating the basis of each prediction; otherwise, leave the "*" in these cells. Please make sure you follow the submission guidelines strictly.
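The format rules above can be checked locally before running the organizers' validation script. The following Python function is a minimal sketch of such a pre-check and is not the official script: the function name `check_submission` and the `has_header` switch are hypothetical, and whether the template starts with a header line is an assumption you should verify against the actual template.

```python
import csv
import io
import math

N_SUBJECTS = 53                    # rows required by the challenge
DOSE_MIN, DOSE_MAX = 16.0, 105.0   # mg/week, range of the provided distribution


def check_submission(text, has_header=False):
    """Return a list of problems found in a tab-delimited submission.

    `has_header` is an assumption: set it to True if the first line of the
    template holds column names rather than a prediction.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    if has_header:
        rows = rows[1:]

    problems = []
    if len(rows) != N_SUBJECTS:
        problems.append(f"expected {N_SUBJECTS} rows, found {len(rows)}")

    for i, row in enumerate(rows, start=1):
        if len(row) != 4:
            problems.append(f"row {i}: expected 4 columns, found {len(row)}")
            continue
        cells = [cell.strip() for cell in row]
        if "" in cells:
            problems.append(f"row {i}: empty cell")
            continue
        _subject_id, dose, sd, _comment = cells
        # A leftover "*" in the dose or SD column fails the float conversion;
        # "*" is acceptable only in the optional comment column.
        try:
            dose_value, sd_value = float(dose), float(sd)
        except ValueError:
            problems.append(f"row {i}: dose and SD must be numbers")
            continue
        if not DOSE_MIN <= dose_value <= DOSE_MAX:
            problems.append(f"row {i}: dose {dose_value} is outside 16-105 mg/week")
        if not math.isfinite(sd_value) or sd_value < 0:
            problems.append(f"row {i}: SD must be a finite, non-negative number")
    return problems
```

An empty list means the sketch found nothing wrong; it does not replace the provided validation script, which remains the reference for format correctness.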

Note that although a numerical dose prediction is required and will be assessed, the evaluation will likely also include an assessment based on a binary prediction of high or low dose. Values at or below 44 mg/week will be considered low, and values above 44 mg/week will be considered high.
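To see how a numerical prediction maps onto the binary assessment, the threshold rule can be written out as a short Python sketch. The function name `dose_class` is hypothetical, and the threshold is taken to be in mg/week, the unit used for doses elsewhere in this description.

```python
HIGH_DOSE_THRESHOLD = 44.0  # assumed to be in mg/week, like the dose column


def dose_class(dose):
    """Label a predicted dose: at or below the threshold is low, above it is high."""
    return "low" if dose <= HIGH_DOSE_THRESHOLD else "high"


# Hypothetical predictions, used only to illustrate the boundary behaviour.
labels = [dose_class(d) for d in (16, 44, 44.5, 105)]
```

A prediction of exactly 44 falls in the low class, so predictions near the threshold can change class with a small change in value.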

In addition, your submission should include a detailed description of the method used to make the predictions, similar to the style of the Methods section in a scientific article. This information will be submitted as a separate file.

To submit predictions, you need to create or be part of a CAGI User group. Submit your predictions through the "All submission forms" link on the front page of your group. For more details, please read the FAQ page.


Dataset provided by

Roxana Daneshjou and Russ Altman, Stanford University School of Medicine

