Our group uses bulk and single-cell omics approaches to investigate the mechanisms behind complex human phenotypes, ranging from common diseases, including cancer, to ageing. An overarching theme is the formation and selection of germline and somatic genetic variation in health and disease, in particular genomic structural variation (SV). Our bioinformatics approaches encompass deep learning and statistical methodology for processing high-dimensional, large-scale data sets. The omics techniques we employ range from whole-genome sequencing and epigenomic techniques to single-cell, single-strand DNA sequencing (Strand-seq; see Figure 1), which enables haplotype-resolved studies of genetic variation and genome instability. Scientists in our group combine methods development, data generation and analysis with hypothesis generation and experimental testing to obtain insights into the biological mechanisms of disease.

Figure 1: Strand-seq preserves the identity and structure of each homologue in a cell (Porubsky et al. Nat Commun 2017).

Previous and current research

Our laboratory has also been among the pioneers in using cloud computing to share and process large-scale omics data. For example, the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) project, co-led by our group, is leveraging cloud solutions to standardize and analyze cancer genomics data globally, with the aim of uncovering commonalities and differences between molecular disease mechanisms across cancer types. By studying recurrent somatic SVs affecting intergenic regions, we recently demonstrated that “enhancer hijacking”, the juxtaposition of active enhancers near proto-oncogenes, often across topologically associating domain (TAD) boundaries, is a frequent oncogene activation mechanism in solid tumors (Figure 2).

Figure 2: A. Identification of enhancer hijacking mediating oncogenic overexpression in cancer genomes; B. IGF2 activation via de novo 3D contact domain (‘neo-TAD’) formation is mediated by recurrent somatic tandem duplications (Weischenfeldt et al. Nat Genet 2017; Northcott et al. Nature 2017).

Future projects and goals

  • Identifying determinants of the formation and selection of genetic variation in cancer and during ageing (which includes developing deep learning and statistical methodologies).
  • Developing methodology to facilitate single-cell studies of structural variation, and integrating state-of-the-art microscopy methods with single-cell sequencing.
  • Completing human genome variation maps using strand-specific and single-molecule DNA sequencing techniques.
  • Deciphering the basis of genomic instability using cell-based models.