Data Use Agreement

Essential information about how you can use CAGI data. Please read this agreement carefully, because it may contain data restrictions that you do not expect.

CAGI aims to advance phenotypic interpretation of genomic variation. The CAGI experiments depend on the interrogation of data from people whose information has been collected as part of clinical care, following participation in a research project or biorepository, or from healthy volunteers. Some of these data (incorporating both genotypes and phenotypes) are highly sensitive and personal, and therefore must be handled with the utmost respect, integrity, and care, including being maintained under the highest standards of data security and confidentiality.

The success of CAGI also hinges critically on the generous contribution of pre-publication datasets and the participation of predictors and assessors. Many datasets affect individuals’ careers.

To protect unpublished and sensitive data that have been shared with CAGI, and as a condition of participation in CAGI, CAGI participants must agree to the following dataset dissemination rules. We define CAGI “participants” as those who have any role in the CAGI experiment, including predictors, assessors, dataset providers, organizers, and advisors.

  • All datasets (including genotypes and phenotypes) are confidential until released by the dataset provider. Release may take the form of (a) datasets that are posted on the CAGI website and explicitly labeled as open public access, (b) explicit written permission from the dataset provider to use the data for a limited set of applications, and/or (c) publication of the full contents of the dataset for unrestricted public use (publication of partial or restricted datasets constitutes release of only that partial or access-restricted dataset).
  • CAGI participants agree not to share unreleased datasets with anyone except other registered and approved predictors who have agreed to these terms.
  • CAGI participants agree to be responsible for maintaining the privacy and security of unreleased datasets that they obtained from CAGI. As one example, CAGI participants must keep the files on secure systems and may not submit confidential data for predictions on third-party webservers.
  • CAGI participants agree not to use an unreleased dataset for any purpose other than that described in the CAGI challenge for the dataset.
  • CAGI participants agree not to use unreleased datasets for any commercial purpose.
  • CAGI participants agree not to use any unreleased datasets in any publication, for example as a test case (even if the identity of the data is not disclosed) or for reporting a discovery that the CAGI participant might have made when analyzing the data.
  • Following dataset release, CAGI participants may use the data with the same freedom and constraints as others who obtained the data without participation in CAGI via public mechanisms. Even after release, dataset use may be constrained (e.g., due to privacy issues), and participation in CAGI does not release CAGI participants from those constraints.
  • Any requests for early release of dataset contents for a specific purpose must be submitted via the CAGI organizers, rather than directly to the dataset provider.

To register for CAGI dataset access, you must read, understand, and agree to these data use rules. If you agree to these rules, please register by providing your initials.

Last updated: November 30, 2020