In the first project period, eight dwarfs were extracted from the host forecast model. Four of these have so far been ported to and assessed on Intel Xeon, Intel Xeon Phi and NVIDIA GPU processors. This work was carried out together with Atlas, ESCAPE's flexible, parallel data framework, which provides a generalized interface for the dwarfs and links to the established GridTools library. GridTools is the software layer that contains, among other features, a domain-specific language (DSL) for hiding hardware dependencies from the ESCAPE dwarfs. This DSL concept has been demonstrated for the MPDATA dwarf.
In addition, limited-area prediction models were installed at ECMWF during the first phase. These installations serve as reference standards for performance evaluation once selected dwarfs are reintegrated into the models, and they also use the Atlas library. This capability allows the impact of running optimized dwarfs on novel hardware to be gauged in full-sized forecast systems.
In the second period, the work focused on the following areas.
Nine dwarfs were created and used in the project for hardware adaptation, performance optimisation and energy measurements. The project investigated two forms of multiple resolution: different resolutions for different processes, and multiple resolutions through multigrid preconditioning. Both showed significant potential for speedup and for reducing the number of elliptic solver iterations.
Different dwarfs were ported to different architectures (CPU, GPU and MIC) using directive-based approaches built on vendor-supported standards. This work identified several directive features that are essential for achieving both portability and performance portability of the ESCAPE dwarfs.
A complete DSL definition and implementation was delivered that is capable of representing dynamical cores on both unstructured meshes and structured grids. The use of the language, and the performance obtained on multiple architectures, were demonstrated for the MPDATA unstructured dwarf. Accelerator-capable dwarfs were also delivered. After the project concludes, they will become part of future HPC benchmarks for typical weather applications.
Platform-specific optimization of dwarfs was performed on both single- and multi-node CPU and GPU systems. The selected dwarfs covered the dynamical core as well as column-based physics, with a specific focus on the formulation of spectral transformations used in ECMWF's IFS code.
Measurement-based modelling of the achieved performance was investigated. The models drew on key performance drivers, such as data flow versus locality and communication patterns, together with their dependence on precision, accuracy and clock speed. These drivers allowed dwarf performance to be represented accurately through a simplified parameterization based on meaningful predictors.
For selected dwarfs, the simulator framework DCworms was used to ingest the performance parameterization and to estimate overall computing and energy performance at scale. These simulations were used to derive criteria for distinguishing between hardware/software choices, as well as a possible strategy for selecting a specific processor. Four reference Limited-Area Models (LAMs) were installed at ECMWF for two purposes: to assess the performance and scalability of full-scale models, and to determine how representative the dwarfs are of the full workload. Energy-aware metrics and an energy measurement methodology were also proposed, which together characterize entire models sufficiently well.
Finally, in the second project phase, the project delivered the second dissemination workshop and the final dissemination assembly (held as a webinar), and held the Young Scientists Summer School in Copenhagen.