Johannes Köster is a computer scientist with a focus on algorithm engineering and data analysis in bioinformatics. Johannes studied computer science at the University of Dortmund (diploma thesis 2010, with Max Planck Institute of Molecular Physiology Dortmund). He then did his PhD in the group of Prof. Sven Rahmann (TU Dortmund, 2015). Afterwards, Johannes was a postdoc in the groups of Prof. Shirley Liu and Prof. Myles Brown at Dana-Farber Cancer Institute and Harvard University (2015-2016). In 2016, Johannes moved to the lab of Prof. Alexander Schönhuth for a brief second postdoc at Centrum Wiskunde & Informatica Amsterdam, Netherlands (CWI), where he quickly received a VENI grant for an independent position. In 2017, Johannes became the head of the group "Algorithms for reproducible bioinformatics" at the University of Duisburg-Essen in Germany.
Johannes' research is focused on reproducibility in three ways.
First, he is the author of the popular workflow management system Snakemake and the founder of the Bioconda project for sustainably distributing bioinformatics software as easily installable packages. Together, these projects form the basis of a large fraction of the scalable and reproducible data analysis currently performed in bioinformatics.
Second, Johannes is the author of the Rust-Bio library, which enables the use of the Rust programming language for bioinformatics by providing standard bioinformatics algorithms and data structures. Using Rust promotes reproducibility by guaranteeing thread and memory safety at compile time.
Third, Johannes is working in the field of Bayesian statistics (e.g., for variant calling and single cell transcriptomics) in order to provide algorithms for analysis of high-throughput data while capturing and quantifying all known sources of uncertainty, thereby providing more reproducible predictions.
|Since August 2017||Group leader||Algorithms for reproducible bioinformatics, Institute of Human Genetics, University of Duisburg-Essen|
|August 2016 - July 2017||Researcher||Life Sciences, CWI Amsterdam|
|Since May 2016||Consultant||Myles Brown, Division of Molecular and Cellular Oncology, Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School|
|May 2016 -||Postdoc||Alexander Schönhuth, Life Sciences, CWI Amsterdam|
|April 2015 -||Postdoc||Shirley Liu, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health|
|Postdoc||Myles Brown, Division of Molecular and Cellular Oncology, Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School|
|January 2011 -||PhD student||Sven Rahmann, Genome Informatics, University of Duisburg-Essen|
|Guest member||Eli Zamir, Systems Biology of Cell Matrix Adhesion, Max-Planck-Institute of Molecular Physiology Dortmund|
Snakemake is a workflow engine and language. It aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean and modern domain-specific language (DSL) in Python style.
A distribution of bioinformatics software realized as a channel for the versatile package manager Conda.
Flexible, uncertainty-aware variant calling with parameter-free filtration via FDR control. The project aims to provide a unified theory of variant calling across all size ranges (small, structural) and calling scenarios (germline, somatic, arbitrary).
A Bayesian model for single-cell transcript (differential) expression analysis on MERFISH data. The model makes it possible to overcome systematic biases occurring with MERFISH and provides measures of uncertainty and control of the false discovery rate in a strictly Bayesian way. MERFISHtools is a corresponding command-line client and analysis library written in Rust and Python. MERFISHtools is also available via Bioconda.
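As an illustration of what controlling the false discovery rate "in a Bayesian way" can look like, the following is a minimal sketch of one common approach: given a posterior error probability (PEP) for each candidate, report the largest set of candidates whose mean PEP, i.e. the expected FDR of that set, stays at or below a threshold. This is an assumption about the general technique, not the implementation used by the tools described here; the function name `bayesian_fdr_select` and its parameters are hypothetical.

```python
from typing import List, Sequence


def bayesian_fdr_select(peps: Sequence[float], alpha: float) -> List[int]:
    """Return indices of the largest candidate set with expected FDR <= alpha.

    peps[i] is the posterior probability that candidate i is a false
    discovery (hypothetical input; how it is computed depends on the model).
    The expected FDR of a set is the mean of its PEPs.
    """
    # Consider candidates from most to least certain.
    order = sorted(range(len(peps)), key=lambda i: peps[i])
    selected: List[int] = []
    total = 0.0
    for i in order:
        total += peps[i]
        # PEPs are visited in ascending order, so the running mean never
        # decreases; once it exceeds alpha, no larger set can satisfy it.
        if total / (len(selected) + 1) > alpha:
            break
        selected.append(i)
    return selected


if __name__ == "__main__":
    example_peps = [0.01, 0.5, 0.02, 0.2, 0.03]
    print(bayesian_fdr_select(example_peps, alpha=0.05))
```

The threshold `alpha` is the only value the user has to choose, and it has a direct interpretation as the tolerated expected fraction of false discoveries.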
A bioinformatics library written in the Rust language. The implementation provides state-of-the-art solutions for common tasks in bioinformatics, focusing on stability through comprehensive unit tests and continuous integration.
ALPACA is a variant caller for next-generation sequencing data that incorporates sample-based filtering into the calling. This allows intuitive control of the false discovery rate with generic sample filtering scenarios. Further, it uses preprocessing and merging of BCF files to solve the N+1 problem: an existing study can be extended with new samples without redundant computations. After the preprocessing, the actual calling is a matter of seconds.
PEANUT is a read mapper for DNA or RNA sequence reads. By exploiting the massive parallelism of modern graphics processors and a novel index data structure, PEANUT achieves superior speed compared to current state-of-the-art read mappers like BWA MEM, Bowtie2 and RazerS3, while maintaining their accuracy. It can report either only the best hits or all hits of a read. When reporting all hits, PEANUT is four to ten times faster than competitors.
LibModalLogic is a Java implementation of Modal Logic K and Propositional Logic. Logic formulas can be built in memory, saved to and read from MathML, and formatted in human-readable form. Reasoning is implemented by the (modal) logic tableau algorithm, including dynamic backtracking for maximum performance.
TRMiner is a Python tool aimed at scientific data curators. It rapidly prunes large collections of scientific publications down to the sentences relevant for a given mining goal, using a linear-time matching algorithm.
Protein Hypernetworks are an approach for endowing protein networks with interaction dependencies using propositional logic. This allows refined network-based predictions of protein complexes, functional importance and functional similarity.
|2013||Guest lecture "Detecting SNVs with Next-generation-Sequencing" in the course "Statistik in der Genetik", Faculty of Statistics, TU Dortmund.|
|2012||Co-supervised bachelor thesis "Rekonstruktion von Protein-Interaktionsabhängigkeiten mit dem Quine-McCluskey-Algorithmus", Bianca Patro, TU Dortmund.|
|2011||Teaching assistant for "Datenstrukturen Algorithmen und Programmierung" (DAP1), Faculty of Computer Science, TU Dortmund|
Co-supervised bachelor thesis "Konstruktion von Protein-Hypernetzwerken durch Text-Mining in der PubMed Datenbank", Michael Nimbs, TU Dortmund.
Co-supervised diploma thesis "Entwurf einer Datenstruktur für Pangenome", Christiane Küch, TU Dortmund.
|phone||+49 (0)201 723 1908|
|office||Room 1.13, University Hospital Essen, Virchowstr. 183, 45147 Essen|
Dr. rer. nat. Johannes Köster
Institute of Human Genetics
University of Duisburg-Essen