Periodic Reporting for period 2 - exaFOAM (Exploitation of Exascale Systems for Open-Source Computational Fluid Dynamics by Mainstream Industry)
Reporting period: 2022-10-01 to 2024-03-31
The exaFOAM project aimed to exploit new architectures and hybrid approaches through the development and validation of algorithmic improvements across the entire CFD process chain (preprocessing, simulation, I/O, post-processing). Effectiveness is demonstrated via a suite of HPC Grand Challenges and Industrial Application Challenges across industry sectors (transportation, power generation, disaster prevention, health and safety), with the goal of increasing energy efficiency, comfort and health/safety.
Stated AIMS:
- Demonstrate performance improvements of one or more orders of magnitude on industry-based grand challenges
- Release the technology advances realised during this project via open-source OpenFOAM, respecting the terms of the GPLv3
- Exploit performance gains among European industry partners invested in exaFOAM as supporters/stakeholders, in coordination with the OpenFOAM Governance structure
Society IMPACT:
The Supporters and Stakeholders are original equipment manufacturers whose design improvements, realised through the performance gains of this project, benefit society directly in the form of energy efficiency, pollution reduction and improved health/safety.
Conclusions:
The exaFOAM Consortium Members are proud to report incremental performance improvements, ranging from a few percent to several-fold, which together combine to gains of several factors, as well as individual “one-hit” developments that on their own demonstrate order-of-magnitude performance gains through I/O refactoring, algorithmic changes, and hand-off to external utilities on heterogeneous HPC architectures. This fulfils our first Overall Aim.
The second Overall Aim, to release developments from exaFOAM to the general public as open source, has been achieved through several consecutive OpenFOAM releases and through openly accessible benchmark cases (microbenchmarks, industrial cases and grand challenges) covering applications in several industry sectors.
The third Overall Aim is fulfilled through the active and continuing involvement of Stakeholders and the wider community in OpenFOAM Governance, which ensures the medium-term dissemination of our demonstrated gains. The developments in exaFOAM also form a firm foundation for further exploitation in a world of fast-changing HPC technology.
WP2: Validation and Assessment has successfully provided several industry-strength benchmark cases with free and open public access. These cases are used in WP5 to profile performance and identify potential improvements, which are realised in WPs 3 and 4 and then targeted for implementation and release in WP6.
WP3: Code Refactoring identified two key targets for performance improvement, I/O and a framework for using external solvers on GPUs; both resulted in order-of-magnitude performance improvements.
WP4: Code Evolution has shown that direct use of CUDA code (for GPUs), lossy compression techniques, assessments of coupled solves, use of external I/O utilities, assessments of GGI/AMI implementations for rotating meshes, and dynamic load balancing all provide tangible performance improvements. New semi-implicit coupled solver techniques are enablers that allow physics solutions not previously possible.
WP5: Co-design, Profiling and Performance has developed new metrics for evaluating performance independently of the number of cores used, demonstrated on several microbenchmarks, industrial cases and grand challenges, and its profiling tools are now integrated in the public release code.
WP6: Integration has, within the project timescale, released the I/O enablers, external solver utilities and order-of-magnitude algorithmic improvements arising from the combined WP technical and validation tasks.
WP7: All dissemination and impact target metrics are met for this project. An Open Access book publication containing learnings, findings and best practices is in preparation. Several of the technologies offer room for further exploitation. In particular, GPU deployment gives ample opportunity to continue dialogue with major chip-architecture organisations, separately and collectively. The project beneficiaries have identified several further opportunities through National Funding Calls and will pursue these on the basis of strong foundations laid during exaFOAM.
- Improvement in absolute time-to-solution for steady-state problems (convergence and stability improvement of the coupled solver over the existing segregated solution method). Target = 0.8, Result = <0.5. Coupled solver runs for several steady-state cases exceeded the expected 20% gain.
- Improvement in execution speed of highly parallel linear solvers. The implementation of parallelizable preconditioners and improvements in vector-matrix multiplication reduce the execution time per iteration of existing linear solvers. Target = 0.9, Result = <0.5. Deliverable D4.3 further demonstrates that the semi-coupled solver approach is a technology enabler, in that solutions previously not possible were achieved for certain classes of viscoelastic cases.
- AMI overhead versus non-AMI benchmark. Target = 0.9, Result = <0.5. Before exaFOAM, sliding-interface runs using GGI/AMI showed an overhead of 200% due to rotating meshes compared with the equivalent non-sliding mesh. For rotating cases, GGI/AMI demonstrates a performance gain of up to 63% at low-thousands core counts using the exaFOAM AMI improvements in the release code. Regrettably, the scaling improvement results for GGI are inconclusive, since this is an incremental improvement to a fundamentally non-scalable algorithm of significant complexity.
- Vectorization of current OpenFOAM code. Target = 0.9, Result = 30-100%. Measured in terms of solver time for the vectorizable part of the code, offloading to an external linear solver on GPU achieves gains of between 20% and 200% for steady and transient execution, using the vector/pipeline solver AmgX on Nvidia’s A100 GPU (with higher accelerations on the more recent Grace-Hopper systems just coming onto the market). Measured in terms of accelerator directives (e.g. OpenMP pragmas), 100% of the code has been piloted for vectorized/pipelined execution, using AMD’s approach on its newest MI300A system. Details are given in Deliverables D3.10 and D4.7.
- Increase computational intensity (x-axis of the roofline model). Better data caching raises the theoretical performance limit. Target = x4, Result = x4. A x4 performance benefit was demonstrated by using data directly accessible to the hybrid CPU/GPU chips of AMD’s MI300A shared-memory architecture.
- Increase magnitude of performance (y-axis of the roofline model; hardware-dependent). Provides an overall measure of code efficiency. Target = x4, Result = x4. The work realises performance gains of x3.8-8.4 purely from hardware architectural enhancements, exploited by OpenFOAM utilities developed during exaFOAM and tuned to benefit from high memory bandwidth and more accessible memory (e.g. Nvidia’s Grace-Hopper and AMD’s MI300A).
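The last two metrics refer to the roofline model, in which the attainable performance of a kernel is bounded by the smaller of the peak compute rate and the product of its computational intensity (FLOP per byte moved) and the memory bandwidth. The minimal Python sketch below illustrates why a x4 increase in intensity can yield up to a x4 higher bound for a memory-bound kernel. The device figures and kernel intensities are hypothetical illustration values, not measurements from exaFOAM or from any system named above.

```python
"""Minimal roofline-model sketch (hypothetical, illustrative numbers only)."""


def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: min(compute roof, intensity * memory bandwidth).

    intensity     -- computational intensity in FLOP/byte (x-axis)
    peak_gflops   -- peak compute rate in GFLOP/s (flat part of the roof)
    bandwidth_gbs -- sustained memory bandwidth in GB/s (slope of the roof)
    """
    return min(peak_gflops, intensity * bandwidth_gbs)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Intensity (FLOP/byte) above which a kernel becomes compute-bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical device figures, not those of any real system.
    PEAK_GFLOPS = 10_000.0
    BANDWIDTH_GBS = 1_000.0

    # Hypothetical intensity of a memory-bound sparse kernel, and the same
    # kernel after a x4 intensity increase from better data caching/reuse.
    baseline = 0.25
    improved = 4 * baseline

    before = attainable_gflops(baseline, PEAK_GFLOPS, BANDWIDTH_GBS)
    after = attainable_gflops(improved, PEAK_GFLOPS, BANDWIDTH_GBS)
    print(f"ridge point: {ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS)} FLOP/byte")
    print(f"bound before: {before} GFLOP/s, after: {after} GFLOP/s")
    print(f"ratio: {after / before}")
```

Once a kernel's intensity passes the ridge point, further intensity gains no longer raise the bound; only a higher compute roof does, which is why the y-axis metric depends on the hardware.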
IMPACT:
European original equipment manufacturers (OEMs), particularly project stakeholders, will utilize the technologies developed in this project to accelerate time-to-market and enhance engineering design for energy efficiency, comfort, and health/safety.