Data Use Agreement

Essential information about how you can use CAGI data. Please read carefully, because it might contain data use restrictions that you did not expect.

CAGI aims to advance phenotypic interpretation of genomic variation. The CAGI experiments depend on the interrogation of data from people whose information has been collected as part of clinical care, following participation in a research project or biorepository, or from healthy volunteers. Some of these data (incorporating both genotypes and phenotypes) are sensitive and personal and therefore must be handled with the utmost respect, integrity, and care. This includes maintaining them with the highest standards of data security and confidentiality.

The success of CAGI also hinges critically on the generous contribution of pre-publication datasets and the participation of predictors and assessors. Many datasets affect individuals’ careers.

To protect unpublished and sensitive data that have been shared with CAGI, and as a condition of participation in CAGI, CAGI participants must agree to the following dataset dissemination rules. Here we define CAGI “participants” as those who have any role in CAGI, including predictors, assessors, dataset providers, organizers, and advisors.

  • All datasets (including genotypes and phenotypes) are confidential until released by the dataset provider. Release may take the form of (a) datasets that are posted on the CAGI website and explicitly labeled as open public access, (b) explicit written permission from the dataset provider to use the data for a limited set of applications, and/or (c) publication of the full contents of the dataset for unrestricted public use (publication of partial or restricted datasets constitutes release of only that partial or access-restricted dataset).
  • CAGI participants agree not to share unreleased datasets with anyone except other registered and approved predictors who have agreed to these terms.
  • It is the responsibility of each CAGI participant to maintain the privacy and security of datasets obtained from CAGI. CAGI participants must keep the files on secure systems and may not submit confidential data for predictions on third-party webservers.
  • CAGI participants agree not to use an unreleased dataset for any purpose other than that described in the CAGI challenge for the dataset.
  • CAGI participants agree not to use unreleased datasets for any commercial purpose.
  • CAGI participants agree not to use any unreleased datasets in any publication, for example as a test case (even if the source of the data is not disclosed) or for reporting a discovery that the CAGI participant might have made when analyzing the data.
  • Following dataset release, CAGI participants may use the data with the same freedom and constraints as others who obtained the data via public mechanisms (i.e., without participation in CAGI). Even after release, dataset use may be constrained (e.g., due to privacy issues), and participation in CAGI does not release a CAGI participant from those constraints.
  • Any requests for early release of dataset contents for a specific purpose must be submitted via the CAGI organizers, rather than directly to the dataset provider.

All CAGI participants, including those within the same research group, must individually sign the data use agreement. To register for CAGI dataset access, you must read, understand, and agree to these data use rules. If you agree to these rules, please indicate this by providing your initials.

Last updated: May 3, 2021