About the Collaboratory

The Cancer Genome Collaboratory is an academic research cloud that contains the raw and interpreted data from the International Cancer Genome Consortium, an international project that aims to sequence the genomes of 25,000 tumours and matching normal tissues.


The Cancer Genome Collaboratory (or simply the Collaboratory) is an academic research cloud being built by the Ontario Institute for Cancer Research (OICR). This unique cloud computing resource enables research on the world’s largest and most comprehensive cancer genome dataset.

With the Collaboratory’s facilities, researchers can run complex data mining and analysis operations across a large repository of cancer genome sequences and their associated donor clinical information. Advanced metadata tagging, provenance tracking, and workflow management software let researchers execute sophisticated analytic pipelines, create reproducible traces of each computational step, and share methods and results. Instead of spending weeks to months downloading hundreds of terabytes of data from a central repository before computations can begin, researchers can upload their analytic software into the Collaboratory cloud, run it there, and securely download the computed results.

The Collaboratory is home to the data holdings of the International Cancer Genome Consortium, a global collaboration involving more than 70 projects and 40 countries/jurisdictions to sequence the genomes of 25,000 tumours and their matched normal tissues across 50 major cancer types. Users of the Collaboratory have fast and easy access to this unique data set.

Collaboratory Benefits

Benefits to Cancer Research

Cancer is a leading cause of morbidity and mortality, responsible for over 70,000 deaths per year in Canada and over 8 million worldwide. Cancer is a disease of the genome in which an accumulation of genomic alterations leads to unregulated cell growth. Researchers need a comprehensive catalogue of the molecular alterations that arise during the formation of malignant tumours, along with models of how these alterations interact to give rise to the tumour phenotype.

The International Cancer Genome Consortium (ICGC) is the largest worldwide coordinated effort to produce this catalogue. The ICGC already represents the world's largest collection of cancer genomes, and the Collaboratory currently hosts a growing repository of ICGC alignment and variant data, allowing researchers to test important ideas about the role of genomics in cancer.

Benefits to Computer Science Research

This project will develop and implement application programming interfaces (APIs) that allow large shared data sets to be accessed in an efficient and backward-compatible manner, using new methods for tracking provenance and workflows. It will also accelerate research in indexing, search, compression, and cryptography, benefiting other data-driven fields in biology and the natural sciences.

Benefits to the Ethics and Law of Personal Health Information (PHI) Privacy

The placement of PHI in a shared cloud resource poses significant but not insurmountable challenges to the established legal and ethical frameworks for protection of PHI. The ICGC data set is an ideal test case for enhancing these frameworks because of the careful and uniform way in which informed consent was obtained from its donors.

