ASM file structure and definitions

This page describes the file structure of the sequence dataset for the SickKids clinical genomes challenge. Further information on the Complete Genomics file structure is available here.

CGI files and Structure

ASM directory:

The files in the ASM directory describe and annotate the genome assembly with respect to the reference genome. The directory should contain these files:

CAGI: The total space would typically be ~35 GB per genome, but we did not transfer the REF and EVIDENCE folders (which are very large), so the files for each genome should total ~2 GB.
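To check that a transferred genome is roughly the expected size, the total size of a directory tree can be summed. The following is a minimal Python sketch and not part of the Complete Genomics tooling; the function names and the example path in the usage note are hypothetical.

```python
import os


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so linked files are not counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def size_in_gb(num_bytes):
    """Convert a byte count to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 1e9
```

For example, size_in_gb(directory_size_bytes("path/to/genome")) could be compared against the ~2 GB figure above, where "path/to/genome" stands in for wherever a genome was actually downloaded.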

In addition to the variations file, the ASM directory includes annotations of the assembled sequence with respect to the SNP database (dbSNP), RefSeq transcripts, and protein sequences. These annotations take up negligible space. The ASM directory includes the following subdirectories: