Research Statement
- A compressed, distributed reference genome index that supports fast search
- A stream mapper on a distributed index for real-time variant calling
- An index on short read data collections that supports similarity search
- Indexed search techniques for unaligned reads

Management Team
- Cenk Sahinalp, Principal Investigator
Collaborators
- Jared Simpson, Collaborator
- Bonnie Berger, Collaborator
Outcomes
This core enables efficient compression and fast searching of cancer genome sequences. It will benefit the Collaboratory immediately by allowing more sequence data to be stored in the same physical capacity. It will also benefit the wider genomics community, where the volume of NGS data is growing much faster than raw storage capacity.
Its novelty lies in the integration of computational methods for handling large-scale sequence data through:
- parallelization
- streaming and on-line computing
- I/O efficiency (in the context of data structures)
- compression
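
To make the storage argument concrete, here is a minimal sketch of the general idea behind reference-based compression: a read that aligns to a reference is stored as a position, a length, and its mismatches rather than as the full sequence. This illustrates the general technique only and is not the method used by this core's software. The function names, record layout, and toy sequences are all assumptions made for the example.

```python
# Illustrative sketch only: function names and record layout are hypothetical.

def encode_read(reference, read, pos):
    """Encode a read aligned at `pos` as (pos, length, mismatches)."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    # Keep only the offsets where the read differs from the reference.
    mismatches = [(i, b) for i, (a, b) in enumerate(zip(window, read)) if a != b]
    return pos, len(read), mismatches


def decode_read(reference, record):
    """Rebuild the original read from the reference and its record."""
    pos, length, mismatches = record
    bases = list(reference[pos:pos + length])
    for offset, base in mismatches:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTTAGCCGATAGGC"   # toy reference, assumed for the example
read = "CGTTAGACGA"                  # toy read with one substitution
record = encode_read(reference, read, 5)
assert decode_read(reference, record) == read
```

A read with few differences from the reference reduces to a handful of integers and bases, so storage cost depends on how much a read diverges from the reference and not only on its length.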
Software
- Reference-based compression by local assembly
- Ultra-Sensitive Detection of Single Nucleotide Variants and Indels in Circulating Tumour DNA
- Compact, SNP-aware mapper for high performance sequencing applications
- Exact genotyping of CYP2D6 using high-throughput sequencing data
- Clonality inference from low coverage single-sample tumors
Latest Publications & Presentations
Comparison of high-throughput sequencing data compression tools.
Nature Methods, 2016
Compression - State of the art
Date: February 2016
Meeting: MPEG-114, San Diego, USA
DeeZ: reference-based compression by local assembly
Date: April 2015
Meeting: RECOMB-Seq 2015, highlight talks