The projects below are suggestions for student projects at MSc level. Some of them can be developed into proper PhD proposals. If you are interested in pursuing a PhD, feel free to contact me.
SPH (Smoothed Particle Hydrodynamics) is a popular technique in many application fields to simulate fluids (e.g. the matter distribution in galaxies). It is convenient, as the fluid is not simulated over a grid but through distributions which move around in space. However, developers typically add a grid on top of these “particles” (the centres of the distributions) to speed up steps such as finding nearby distributions. In the Peano code, we have successfully written particle management routines where the grid hosts the particles: the grid is not mere metadata but actually owns all particles. While we have only used this for Particle-in-Cell (PIC) and Discrete Element Methods (DEM), there is no reason why these data structures should not work for SPH as well. In this project, we will demonstrate this idea by implementing a real SPH solver within Peano4.
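To illustrate the idea, here is a minimal sketch (in Python, purely for illustration; Peano itself is C++) of an SPH density estimate where a grid whose cells own the particles restricts the neighbour search to adjacent cells. The cubic spline kernel and the flat dictionary-of-cells layout are simplifying assumptions, not Peano's actual data structures:

```python
import math
from collections import defaultdict

def cubic_spline_kernel(r, h):
    """Standard 2d cubic spline SPH kernel with smoothing length h and support 2h."""
    q = r / h
    sigma = 10.0 / (7.0 * math.pi * h * h)  # 2d normalisation constant
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q * q * (1.0 - 0.5 * q))
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0

def sph_density(particles, h, mass=1.0):
    """Estimate the density at every particle via cell-owned particle lists.

    The cell size equals the kernel support (2h), so only the 3x3 block of
    neighbouring cells has to be searched - the same reason the grid-owns-
    particles storage pays off.
    """
    cell = 2.0 * h
    cells = defaultdict(list)            # the grid "owns" the particles
    for p in particles:
        cells[(int(p[0] // cell), int(p[1] // cell))].append(p)

    densities = []
    for (x, y) in particles:
        i, j = int(x // cell), int(y // cell)
        rho = 0.0
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for (qx, qy) in cells.get((i + di, j + dj), ()):
                    rho += mass * cubic_spline_kernel(math.hypot(x - qx, y - qy), h)
        densities.append(rho)
    return densities
```

A full solver would obviously also evaluate kernel gradients for the momentum equation, but the neighbour search pattern stays the same.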
Space-filling brute force
Space-filling curves are a popular tool to realise domain decomposition on regular and adaptive Cartesian grids. They define a ‘curve’ that runs recursively through all cells of a grid or voxel field. Popular curves are the Hilbert curve, the Lebesgue curve and the Peano curve. All of them serialise a 2d or 3d space. When we cut this curve into equally sized fragments, we also divide the space into equally sized fragments and obtain a grid/domain partition. If the underlying grid is regular, this process is known to yield quasi-optimal partitions, i.e. segments whose surface-to-volume ratio is, up to a constant, as good as that of a sphere. The ratio is important as it determines the amount of local work relative to the communication that we have to do on a parallel computer. Unfortunately, the constant is not known quantitatively (all existing bounds are rather inaccurate), and we do not even know whether such a constant exists for adaptive grids. In this project, we want to create a simulator that starts from very simple curves and then advances to more and more complicated curve patterns. For each pattern, it determines the ratios or, more accurately, all constants determining the ratio. We hope to find, with such a brute-force approach, that the constant is bounded (and converges towards a magic number once the partitions become big enough), which would give the first quantitative formula for the partitions’ surface-to-volume ratio.
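The brute-force measurement could start from the simplest of the three curves, the Lebesgue (Z-order) curve on a regular 2d grid. The sketch below (an illustration of the measurement, not the intended simulator) cuts the curve into equally sized fragments and reports each fragment's surface-to-volume ratio, counting cell faces shared with another fragment or the domain boundary as surface:

```python
def morton_index(x, y, bits):
    """Interleave the bits of x and y: the Lebesgue (Z-order) curve."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b) | ((y >> b) & 1) << (2 * b + 1)
    return z

def surface_to_volume(bits, fragments):
    """Cut the Z-order curve over a 2^bits x 2^bits grid into equally sized
    fragments and return each fragment's surface/volume ratio."""
    n = 1 << bits
    owner = {}
    for x in range(n):
        for y in range(n):
            owner[(x, y)] = morton_index(x, y, bits) * fragments // (n * n)
    ratios = []
    for f in range(fragments):
        cells = [c for c, o in owner.items() if o == f]
        surface = 0
        for (x, y) in cells:
            for (dx, dy) in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if owner.get((x + dx, y + dy), -1) != f:
                    surface += 1
        ratios.append(surface / len(cells))
    return ratios
```

Replacing `morton_index` with a Hilbert or Peano index, refining `bits`, and tracking the worst ratio over all fragment counts is exactly the kind of sweep the simulator would automate.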
The ExaHyPE project is an EU-funded international project developing an engine to solve hyperbolic differential equations on supercomputers. The engine is used to run long-range seismic risk assessment (the impact of earthquakes on critical infrastructure such as dams) and to simulate the dynamics of binary star systems that are possible sources of gravitational waves. ExaHyPE works with a so-called Discontinuous Galerkin approach and dynamically adaptive meshes, i.e. the problem of interest (the earthquake, e.g.) is represented on a mesh. For many applications, it would be more convenient (also because of the size of these meshes) if only a few tracer particles were inserted into the simulation to illustrate how the underlying waves evolve over time. The goal of this project is to fuse an existing particle administration approach into ExaHyPE. Once this is done, some larger simulations are to be run, and one can start to discuss how the particles in turn can affect the simulation (e.g. through some added random noise).
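The core of the tracer idea fits into a few lines. The sketch below (Python for illustration; the `velocity` callback is a hypothetical stand-in for sampling the DG polynomial of the cell that hosts the tracer) advances passive tracers with Heun's second-order method:

```python
def advect_tracers(tracers, velocity, dt, steps):
    """Advance passive tracer particles through a velocity field.

    velocity(x, y) -> (u, v) is a placeholder: in ExaHyPE it would
    evaluate the Discontinuous Galerkin solution within the hosting cell.
    """
    for _ in range(steps):
        new = []
        for (x, y) in tracers:
            u1, v1 = velocity(x, y)
            u2, v2 = velocity(x + dt * u1, y + dt * v1)   # predictor step
            new.append((x + 0.5 * dt * (u1 + u2),         # Heun corrector
                        y + 0.5 * dt * (v1 + v2)))
        tracers = new
    return tracers
```

The interesting engineering is not this update rule but keeping the tracers inside the dynamically adaptive mesh, i.e. reusing the existing particle administration to find the hosting cell after every step.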
Michael Griebel and others have written an excellent book on Navier-Stokes solvers. The book is tutorial-like and was actually written for postgraduate and undergraduate lab courses. The goal of this project is to follow the steps from the book and to realise them within Peano4, a massively parallel software framework for adaptive Cartesian meshes, and thus to deliver a solver for incompressible fluids.
DaStGen is a simple, nice tool which I use quite intensively for all of my projects. You can find a description on my homepage. In its current form, it allows me to model C++ classes which are very memory-modest and supported by MPI. The biggest “problem” is that DaStGen is written in Java. In this project, you are supposed to redesign DaStGen in Python. DaStGen 2.0 will still be usable as a command line tool, i.e. parse a domain-specific C++ extension, but there will also be a variant where users can assemble a DaStGen data model within their Python code and then ask the code to generate plain C++ code. Finally, the new DaStGen version will support novel compressed floating-point formats (e.g. bfloat16).
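To give a flavour of the Python-embedded variant, here is a minimal sketch of what such a front end could look like. All class and method names here are hypothetical; designing the actual API is part of the project:

```python
class DataModel:
    """Hypothetical sketch of a Python-embedded DaStGen 2 front end:
    users assemble a data model in Python and emit plain C++ from it."""

    _cpp_types = {"double": "double", "int": "int", "bool": "bool"}

    def __init__(self, name):
        self.name = name
        self.attributes = []          # list of (name, type) pairs

    def add_attribute(self, name, dtype):
        self.attributes.append((name, dtype))
        return self                   # allow chained calls

    def generate_cpp(self):
        lines = [f"struct {self.name} {{"]
        for attr, dtype in self.attributes:
            lines.append(f"  {self._cpp_types[dtype]} {attr};")
        lines.append("};")
        return "\n".join(lines)

# usage: assemble a model in Python, then emit C++
particle = DataModel("Particle").add_attribute("x", "double") \
                                .add_attribute("moved", "bool")
```

The real generator would of course not emit a plain `struct` but the memory-modest, MPI-aware classes DaStGen is known for (bit-packed booleans, MPI datatype definitions, and the new compressed float formats).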
Today’s schedulers typically ask the user to specify how many ranks (processes) run on a node concurrently. The scheduler then splits the available cores equally among the ranks. This does not take into account that the load in simulations changes quickly; we might thus run into situations where some ranks could effectively use quite a lot of cores, while others have little to do. The goal of this project is to write a new TBB-based library where the individual ranks bid against each other for how many cores each rank may use, and then quickly migrate core ownership if it suits the code. One rank invades the cores of another rank if it is very compute-heavy and can make use of more cores. Low-workload processes retreat from their cores in return.
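A toy arbitration rule illustrates the bidding idea (Python for illustration only; the project's actual work is the TBB mechanics of migrating core ownership, which this sketch deliberately ignores). Each rank bids its current load, every rank keeps at least one core, and the remaining cores go to the highest bids:

```python
def allocate_cores(loads, total_cores):
    """Toy bidding round: redistribute cores proportionally to the
    ranks' current loads, guaranteeing one core per rank."""
    n = len(loads)
    assert total_cores >= n and sum(loads) > 0
    cores = [1] * n                       # nobody retreats below one core
    spare = total_cores - n
    shares = [load / sum(loads) * spare for load in loads]
    for i in range(n):
        cores[i] += int(shares[i])        # integral part of each bid
    leftover = total_cores - sum(cores)
    # hand the remaining cores to the largest fractional bids
    by_fraction = sorted(range(n), key=lambda i: shares[i] - int(shares[i]),
                         reverse=True)
    for i in by_fraction[:leftover]:
        cores[i] += 1
    return cores
```

Re-running such a round whenever loads change gives the invade/retreat behaviour: a compute-heavy rank's allocation grows at the expense of idle ranks.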
For our flagship codes Peano and ExaHyPE, which are used in multiple EU and UK research projects, we have defined a tailored file format to dump tree-based, dynamically adaptive data which can host both block-structured and Legendre/Lobatto substructures. As it is custom-made, it has a low memory footprint and is relatively simple. Examples of dumped visualisation data can be found in our video gallery.
As the format is custom-made, it is not natively supported by state-of-the-art visualisation software. I thus ship a small C++ command line tool which converts the file (or a file sequence for a video) into VTK, a mainstream visualisation format. For large-scale data postprocessing, such an offline approach is of limited use, as it is too time-consuming and requires/produces too many, too large files. In this project, you will replace the C++ prototype with a Python library. The library will be available as a stand-alone library that can be used within other scripts, but it will also be available as a Paraview Python plugin. It allows researchers to
- load Peano/ExaHyPE data natively within Paraview,
- run the required data conversions on-the-fly (as you load data or as you invoke them),
- filter and project data within the native format, i.e. to select which resolution is displayed where.
With a plugin into Paraview, these processes become available within the Paraview GUI, but it should also be possible to run some steps remotely and to display/render data in a client-server setting. For this, we use Paraview’s client-server architecture. Eventually, we work towards a data exploration environment where certain parts of the domain are postprocessed/displayed at a certain resolution (depending on the level of detail visible to the scientists), are delivered at coarse resolution first to facilitate quick previews, and exploit on-the-fly compression to bring down the memory footprint.
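The innermost conversion step is straightforward once a patch has been parsed. The sketch below (a simplified, hypothetical stand-in for a Peano patch record; the real format also carries the tree structure and higher-order substructures) emits one regular patch as a legacy-VTK ASCII dataset without any external library:

```python
def patch_to_legacy_vtk(origin, spacing, dims, cell_values, field="Q"):
    """Serialise one regular patch as a legacy-VTK STRUCTURED_POINTS
    dataset. dims counts vertices per axis; cell_values holds one
    scalar per cell. A hypothetical, simplified patch layout."""
    nx, ny, nz = dims
    ncells = (nx - 1) * (ny - 1) * (nz - 1)
    assert len(cell_values) == ncells
    lines = [
        "# vtk DataFile Version 3.0",
        "Peano patch (sketch)",
        "ASCII",
        "DATASET STRUCTURED_POINTS",
        f"DIMENSIONS {nx} {ny} {nz}",
        f"ORIGIN {origin[0]} {origin[1]} {origin[2]}",
        f"SPACING {spacing[0]} {spacing[1]} {spacing[2]}",
        f"CELL_DATA {ncells}",
        f"SCALARS {field} double 1",
        "LOOKUP_TABLE default",
    ]
    lines += [str(v) for v in cell_values]
    return "\n".join(lines) + "\n"
```

In the library, such a routine would sit behind the on-the-fly conversion, while the Paraview plugin would hand the same data over as VTK objects rather than ASCII files.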
Once these steps are completed (and if time permits), a couple of challenging research questions arise, from which the student can choose: Which steps of the resulting postprocessing pipeline could be rewritten in C/C++ and run right on the compute nodes producing the data? With such in-situ data production, we can reduce the memory footprint and bandwidth needs further. Can we predict, via ML or heuristics, which data scientists will explore next or first, and trigger the preparation of these data asynchronously while the first pictures are rendered? Can we use pattern matching/ML to automatically identify regions of interest (anomalies) in the data and guide users towards these data regions, or can we use pattern matching to compress the data automatically? All of these questions have to be asked in a 4d context, as we typically work with 3d data over time.