Johannes Köster is a computer scientist with a focus on algorithm engineering and data analysis in bioinformatics. Johannes studied computer science at the University of Dortmund (diploma thesis 2010, with the Max Planck Institute of Molecular Physiology Dortmund). He then did his PhD in the group of Prof. Sven Rahmann (TU Dortmund, 2015). Afterwards, Johannes was a postdoc in the groups of Prof. Shirley Liu and Prof. Myles Brown at Dana-Farber Cancer Institute and Harvard University (2015-2016). In 2016, Johannes moved to the lab of Prof. Alexander Schönhuth for a brief second postdoc at Centrum Wiskunde & Informatica Amsterdam, Netherlands (CWI), where he quickly received a VENI grant for an independent position. In 2017, Johannes became the head of the group "Algorithms for reproducible bioinformatics" at the University of Duisburg-Essen in Germany.

Johannes' research is focused on reproducibility in three ways. First, he is the author of the popular workflow management system Snakemake and the founder of the Bioconda project for sustainably distributing bioinformatics software as easily installable packages. Together, these projects form the base for a large fraction of currently performed scalable and reproducible data analysis in bioinformatics.
Second, Johannes is the author of the Rust-Bio library, enabling the use of the Rust programming language for bioinformatics by providing standard bioinformatics algorithms and data structures. Using Rust promotes reproducibility by guaranteeing thread and memory safety at compile time.
Third, Johannes is working in the field of Bayesian statistics (e.g., for variant calling and single cell transcriptomics) in order to provide algorithms for analysis of high-throughput data while capturing and quantifying all known sources of uncertainty, thereby providing more reproducible predictions.

Period | Position | Group
Since August 2017 | Group leader | Algorithms for reproducible bioinformatics, Institute of Human Genetics, University of Duisburg-Essen
August 2016 - July 2017 | Researcher | Life Sciences, CWI Amsterdam
Since May 2016 | Consultant | Myles Brown, Division of Molecular and Cellular Oncology, Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School
May 2016 - July 2016 | Postdoc | Alexander Schönhuth, Life Sciences, CWI Amsterdam
April 2015 - April 2016 | Postdoc | Shirley Liu, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health
April 2015 - April 2016 | Postdoc | Myles Brown, Division of Molecular and Cellular Oncology, Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School
January 2011 - March 2015 | PhD student | Sven Rahmann, Genome Informatics, University Duisburg-Essen
January 2011 - March 2015 | Guest member | Eli Zamir, Systems Biology of Cell Matrix Adhesion, Max-Planck-Institute of Molecular Physiology Dortmund
See here for a full CV.




Snakemake is a workflow engine and language. It aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean and modern domain-specific language (DSL) in Python style.



A distribution of bioinformatics software realized as a channel for the versatile package manager Conda.



Flexible, uncertainty-aware variant calling with parameter-free filtration via FDR control. The project aims to provide a unified theory of variant calling across all size ranges (small, structural) and calling scenarios (germline, somatic, arbitrary).



A Bayesian model for single-cell transcript (differential) expression analysis on MERFISH data. The model overcomes systematic biases that occur with MERFISH and provides measures of uncertainty and control of the false discovery rate in a strictly Bayesian way. MERFISHtools is the corresponding command line client and analysis library, written in Rust and Python. MERFISHtools is also available via Bioconda.
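
As a rough illustration of what controlling the false discovery rate "in a Bayesian way" can mean in general, the sketch below accepts the largest set of findings whose average posterior error probability stays at or below a chosen threshold. It is a generic textbook-style technique, not the MERFISHtools implementation; the function name `select_at_fdr` and the example probabilities are hypothetical.

```python
def select_at_fdr(posterior_error_probs, alpha):
    """Return indices of accepted findings with expected FDR <= alpha.

    posterior_error_probs: posterior probability that each finding is false
    (hypothetical input; any model producing such probabilities would do).
    alpha: the desired bound on the expected false discovery rate.
    """
    # Consider the most confident findings first.
    order = sorted(range(len(posterior_error_probs)),
                   key=lambda i: posterior_error_probs[i])
    accepted = []
    cumulative = 0.0
    for rank, i in enumerate(order, start=1):
        cumulative += posterior_error_probs[i]
        # The mean error probability of the accepted set estimates its FDR.
        # With ascending order this mean never decreases, so we can stop
        # at the first violation.
        if cumulative / rank > alpha:
            break
        accepted.append(i)
    return accepted


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    probs = [0.01, 0.5, 0.02, 0.2, 0.03]
    print(select_at_fdr(probs, alpha=0.05))
```

The only tuning knob in this sketch is the target FDR itself; no score cutoff has to be chosen by hand.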



A bioinformatics library written in the Rust language. The implementation provides state-of-the-art solutions for common tasks in bioinformatics, focusing on stability through comprehensive unit tests and continuous integration.



ALPACA is a variant caller for next-generation sequencing data that incorporates sample-based filtering into the calling. This allows intuitive control of the false discovery rate with generic sample filtering scenarios. Further, it uses preprocessing and merging of BCF files to solve the N+1 problem: an existing study can be extended with new samples without redundant computations. After the preprocessing, the actual calling takes only seconds.



PEANUT is a read mapper for DNA or RNA sequence reads. By exploiting the massive parallelism of modern graphics processors and a novel index data structure, PEANUT achieves superior speed compared to current state-of-the-art read mappers like BWA MEM, Bowtie2 and RazerS3, while maintaining their accuracy. It can report either only the best hits or all hits of a read. When reporting all hits, PEANUT is four to ten times faster than competitors.



LibModalLogic is a Java implementation of Modal Logic K and Propositional Logic. Logic formulas can be built in memory, saved to and read from MathML, and formatted in human-readable form. Reasoning is implemented by the (modal) logic tableau algorithm, including dynamic backtracking for maximum performance.

visit at google code


TRMiner is a Python tool aimed at scientific data curators. It rapidly prunes large collections of scientific publications down to the sentences relevant for a given mining goal, using a linear-time matching algorithm.

visit at google code

Protein Hypernetworks

Protein Hypernetworks are an approach for endowing protein networks with interaction dependencies using propositional logic. This allows refined network-based predictions of protein complexes, functional importance and functional similarity.

visit at google code




Awards and Grants

  • NWO Veni grant (€ 250,000) for the project "Fully reproducible workflows scaling from workstations to the cloud"
  • Uhde-Award 2011 for my diploma thesis "Propagating Interaction Logic towards Predictive Protein Hypernetworks".
  • Honorable Mention at the Doktorandenkolleg Ruhr 2011 for my poster "Protein Hypernetworks".
  • Poster Award of the University Hospital Essen at the Forschungstag 2011 for my poster "Protein Hypernetworks".
  • Travel Award of the 9th International Conference on Pathways, Networks and Systems Medicine 2011 for my talk on protein hypernetworks.


2013 Guest lecture "Detecting SNVs with Next-generation-Sequencing" in the course "Statistik in der Genetik", Faculty of Statistics, TU Dortmund.
2012 Co-supervised bachelor thesis "Rekonstruktion von Protein-Interaktionsabhängigkeiten mit dem Quine-McCluskey-Algorithmus", Bianca Patro, TU Dortmund.
2011 Teaching assistant for "Datenstrukturen Algorithmen und Programmierung" (DAP1), Faculty of Computer Science, TU Dortmund.
Co-supervised bachelor thesis "Konstruktion von Protein-Hypernetzwerken durch Text-Mining in der PubMed Datenbank", Michael Nimbs, TU Dortmund.
Co-supervised diploma thesis "Entwurf einer Datenstruktur für Pangenome", Christiane Küch, TU Dortmund.

phone +49 (0)201 723 1908
office Room 1.13 University Hospital Essen Virchowstr. 183 45147 Essen
postal Dr. rer. nat. Johannes Köster
Institute of Human Genetics
University of Duisburg-Essen
Hufelandstr. 55
45147 Essen