Speeding up Single-Cell Tumor Mutation Tree Inference

Deep Dive into Computational Statistics in Cancer Genomics

Feb 2023 - Oct 2023

For my second Master's thesis, in Biotechnology at ETH Zürich, I wanted to learn more about computational and Bayesian statistics applied to genomics. This is the project I am currently working on:


Cancer progression is an evolutionary process in which cells with different characteristics compete with each other. Understanding this cell heterogeneity is important for prognosis and for suggesting the right treatment. Single-cell DNA sequencing allows one to measure mutations in selected cells at a specific time point. The tumor phylogeny problem aims to reconstruct the whole evolutionary history of the tumor from this single snapshot, in the form of a mutation tree.

Although principled methods exist, such as SCITE [1], which traverse the space of all possible mutation histories to suggest the most likely ones, they can be too slow and computationally intensive to run. Hence, there is growing interest in fast heuristic methods that approximate a single most likely tree.

[1] Jahn et al., Tree inference for single-cell data, Genome Biology, 2016
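
To make the idea of "the most likely tree" concrete, here is a minimal sketch of how a candidate mutation tree can be scored against noisy single-cell data. It is not SCITE or PYggdrasil code: the tree encoding (a parent list), the function names, and the error rates alpha (false positive) and beta (false negative) are all assumptions chosen for illustration. Each cell is attached to the node that explains its observed mutations best, and a cell attached to a node is expected to carry every mutation on the path from that node to the root.

```python
import math

ROOT = -1  # hypothetical marker for the mutation-free root of the tree


def mutations_on_path(parent, node):
    """Return the set of mutations carried by a cell attached at `node`."""
    carried = set()
    while node != ROOT:
        carried.add(node)
        node = parent[node]
    return carried


def cell_log_likelihood(observed, expected, alpha, beta):
    """Log-probability of one observed profile given the expected genotype."""
    ll = 0.0
    for obs, exp in zip(observed, expected):
        if exp == 1:
            # True mutation: seen with prob. 1 - beta, missed with prob. beta.
            ll += math.log(1 - beta) if obs == 1 else math.log(beta)
        else:
            # No mutation: falsely called with prob. alpha.
            ll += math.log(alpha) if obs == 1 else math.log(1 - alpha)
    return ll


def tree_log_likelihood(parent, data, alpha=0.01, beta=0.2):
    """Score a mutation tree, attaching every cell at its best-fitting node.

    parent[m] is the parent mutation of mutation m (or ROOT).
    data is a list of cells, each a 0/1 list over the mutations.
    """
    n_mut = len(parent)
    attachment_points = [ROOT] + list(range(n_mut))
    total = 0.0
    for observed in data:
        best = -math.inf
        for node in attachment_points:
            carried = mutations_on_path(parent, node)
            expected = [1 if m in carried else 0 for m in range(n_mut)]
            best = max(best, cell_log_likelihood(observed, expected, alpha, beta))
        total += best
    return total


# Toy data: three cells whose mutations are nested (0, then 0+1, then 0+1+2).
data = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
chain = [ROOT, 0, 1]       # root -> 0 -> 1 -> 2
star = [ROOT, ROOT, ROOT]  # all three mutations hang directly off the root

chain_score = tree_log_likelihood(chain, data)
star_score = tree_log_likelihood(star, data)
print(chain_score > star_score)
```

On this toy data the chain-shaped tree scores higher than the star-shaped one, because it explains the nested mutation pattern without invoking any sequencing errors. Exhaustive methods evaluate a score like this for a very large number of candidate trees, which is where the computational cost comes from and why fast heuristics are attractive.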

Why this?

Learn advanced computational statistics.

Working on single-cell genomic data sounds cool.

Throw in some high-performance computing.

Escape the burden of feeding cells in the wet lab.


The code for the project, to which I'll be a contributor, will be available here:
GitHub: cbg-ethz/PYggdrasil