I’m an Associate Professor in HPC/Scientific Computing at the Department of Computer Science (formerly the School of Engineering and Computing Sciences) at Durham University. Before I joined Durham, I graduated with a PhD from Technische Universität München (TUM) in Germany and served there as PostDoc Scientific Project Manager for international research consortia in collaboration with KAUST and the Munich Centre of Advanced Computing. I also obtained a habilitation (venia legendi) from TUM.
My objective in research is to find novel algorithms and clever implementations for applications from scientific computing that are too hard to solve today because we don’t have the right software. Often, existing solvers lack the required computational efficiency, hardware-awareness, anticipation of data structuredness and distribution, ability to handle the required data cardinality, or software maturity. I want to change this: My work should enable others to simulate physical phenomena or study engineering challenges with unprecedented speed, accuracy and detail.
Within the multifaceted world of computer-enabled sciences, I focus on algorithms and implementation patterns that help us to deliver faster code. Where possible, I also try to prove the correctness and efficiency of the implementations I propose, and to suggest how to program the ideas more efficiently. All in all, I work primarily on methodological/theoretical aspects of the computer science side of scientific computing and HPC, and I work quite generically from an application point of view. Yet, I anticipate ideas from hardware-aware performance engineering, pick up the latest trends in mathematics and consider application knowledge, too. When I search for applications of my ideas, I am particularly interested in dynamically adaptive multiscale methods based upon spacetrees that interact with multigrid solvers for partial differential equations, that host particle systems with particles of varying cut-off radii or size, or that carry Finite Volume-like discretisations. Besides the core algorithm and data challenges, I am also fascinated by on-the-fly visualisation and in-situ postprocessing.
Whenever possible, I turn my research into open source software. My two major research codes are the Peano PDE solver framework and the C++ precompiler DaStGen, which is an example of a (very simple) HPC-specific language extension. Both are used, for example, in the ExaHyPE project. Another code we currently work on is the Δ code (pronounced Delta), a triangle-based contact detection toolkit.
Three credos shape my work:
It is important to use state-of-the-art mathematics plus state-of-the-art hardware technology when we study algorithms and their implementation. It does not make sense to invest effort into suboptimal mathematics, and neither does it make sense to test things on old hardware.
It is important to find an efficient implementation. The best algorithm and the best quality data are of limited value if there is no efficient implementation and processing.
It is important to deliver plain and verifiably correct implementations which are made freely available. Otherwise, others cannot adopt ideas, and results are difficult to reproduce and to validate. Wherever possible, we have to provide performance, robustness and data access models for our codes.
Recent areas of research
Communication-avoiding algorithms and implementations
I’ve proposed multiple algorithms, implementation patterns and techniques that I would classify as communication-avoiding. My notion of communication-avoiding is a little more generic than what you find in the linear algebra or MPI literature: I consider both data transfers on a node (between memory, caches and CPU, for example) and transfers between nodes as communication, and I don’t think that communication per se is a problem. The problem is flaws tied to communication, such as algorithmic latency (we have to wait for data to arrive in our registers) or a lack of memory (we exceed the caches because we hold too much data). The major breakthroughs in this area are our work on single-touch multiscale algorithms (arXiv:1607.00648 and arXiv:1508.03954), and on Finite-Volume/ADER-DG implementation techniques that are single-touch, hide data transfer behind computation and are memory-modest (arXiv:1806.07984 and arXiv:1801.08682). I summarised my key insights on communication-avoiding in this overview article.
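To make the single-touch idea a little more concrete, here is a toy sketch. It is not taken from the cited papers. It assumes that "single-touch" means each unknown is read from memory only once per sweep, and it uses a deliberately trivial relaxation step (the system matrix is the identity) so that the access pattern stays visible. The function names `two_sweep_update` and `single_touch_update` and the parameter `omega` are made up for this illustration.

```python
def two_sweep_update(u, rhs, omega):
    """Baseline: several separate passes over the data.

    Every entry of u is read twice (once for the residual, once for the
    update), and the residual array is materialised and read twice.
    """
    residual = [r - x for r, x in zip(rhs, u)]                    # pass 1 over u
    updated = [x + omega * res for x, res in zip(u, residual)]    # pass 2 over u
    norm_sq = sum(res * res for res in residual)                  # pass 3 over residual
    return updated, norm_sq


def single_touch_update(u, rhs, omega):
    """Fused variant: one pass, each entry of u is read exactly once.

    The residual is computed, used for the update and accumulated into
    the squared norm while the entry is still at hand, so no temporary
    residual array is stored.
    """
    updated = []
    norm_sq = 0.0
    for x, r in zip(u, rhs):
        res = r - x                 # residual of the toy identity system
        norm_sq += res * res        # accumulate norm on the fly
        updated.append(x + omega * res)
    return updated, norm_sq


if __name__ == "__main__":
    u = [1.0, 2.0, 3.0]
    rhs = [2.0, 2.0, 5.0]
    print(two_sweep_update(u, rhs, 0.5))
    print(single_touch_update(u, rhs, 0.5))
```

Both variants return the same result; the fused one simply moves less data. In a real multiscale solver the fusion is far more involved, since residuals depend on neighbouring unknowns and on several grid levels, which is what the papers above address.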
How to write high-quality large-scale software
My two flagship codes are definitely Peano and ExaHyPE, and we’ve written various papers that describe their software design. I think many ideas therein can be used as general patterns for various application domains, and thus can stimulate a discussion of how to write high-quality codes. The most recent and perhaps best papers summarising the key ideas are arXiv:1905.07987 for ExaHyPE and arXiv:1506.04496 for Peano. The latter is the paper I recommend first.
Code that adapts to the machine
We spend a significant part of our lives tuning and tailoring our software to machine characteristics. We try to load balance work, to allow compilers to select the best machine instruction set, or to identify code parts that fit best to special-purpose accelerators. I’m not sure that this is, in the long term, the end of the story. We may assume that future machines are by construction not time-invariant (we already see that performance fluctuates for energy reasons), or that algorithms are not intrinsically well-suited to exploit a whole machine. As a result, we may ask ourselves whether we can achieve a reasonable tuning/tailoring ourselves at all, or whether we have to equip our codes with the capability to adapt to the (changing) machine characteristics. First steps in this direction are a very light-weight task-based load balancing and a task parallelisation which can cope with partial machine breakdowns.