Hybrid Parallel Computing in CMPS

November 22, 2025

Hybrid Parallel Computing and HPC Methods in the CMPS Solver: Distributed Memory, Shared Memory, Hybrid, and GPU Computing

Computational Fluid Dynamics (CFD) solvers help us understand and simulate complex flow phenomena in engineering and scientific research. The CMPS CFD solver offers an effective solution in this field with its physics-based modeling and parallel computing approaches.

Scalability Challenges in CFD

CFD simulations must resolve a wide range of spatial and temporal scales, which drives up the demand for computational resources. A solver's ability to use a growing number of processors efficiently, that is, its scalability, is crucial for solving complex problems. CMPS addresses this need with parallelization strategies optimized for HPC environments.
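Scalability is usually quantified through speedup and parallel efficiency. The following is a minimal, generic sketch rather than CMPS code: the function names are hypothetical and the timings are made-up illustration values, not CMPS measurements. It also evaluates Amdahl's law, which bounds the achievable speedup when part of the work stays serial.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_1 / T_p."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """Efficiency E = S / p; a value of 1.0 means ideal (linear) scaling."""
    return speedup(t_serial, t_parallel) / n_procs


def amdahl_speedup(parallel_fraction: float, n_procs: int) -> float:
    """Upper bound on speedup when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, for illustration only.
    t1, t16 = 1600.0, 125.0
    print(f"speedup    = {speedup(t1, t16):.2f}")
    print(f"efficiency = {parallel_efficiency(t1, t16, 16):.2f}")
    print(f"Amdahl bound (95% parallel, 16 procs) = {amdahl_speedup(0.95, 16):.2f}")
```

An efficiency well below 1.0 at high processor counts usually points to communication cost or load imbalance, which is what the decomposition and buffering techniques described later aim to reduce.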

[Figure: Domain Decomposition in CMPS]

Parallel Computing in CMPS

CMPS is designed to be compatible with various platforms, including current supercomputer clusters, desktop systems, shared memory systems, and multi-core architectures. It also integrates effectively with high-speed networks like InfiniBand and Myrinet. During its development, both cluster systems and shared memory architectures were supported.

[Figure: Distributed Architecture]

Distributed Memory Methods in CMPS

Scaling a single shared memory system to a large number of CPU cores is difficult. As a result, modern supercomputers adopt a distributed memory strategy: many computers are linked together over a high-speed network. In today's supercomputers:

Each individual computer is referred to as a node.

Each node operates independently with its own memory, isolated from other nodes.

Each node runs its own operating system.

Data exchange between nodes occurs over a network.
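The node model above can be sketched in a few lines. This is an in-process simulation under stated assumptions, not CMPS code and not real network communication: the `Node` class and `global_sum` function are hypothetical, and a `queue.Queue` stands in for the interconnect. Each node keeps a private copy of its data and contributes only a partial result as a message.

```python
from queue import Queue


class Node:
    """Hypothetical stand-in for one compute node with private memory."""

    def __init__(self, rank: int, local_data: list[float]) -> None:
        self.rank = rank
        self._local_data = list(local_data)  # private copy, not shared

    def local_sum(self) -> float:
        return sum(self._local_data)


def global_sum(nodes: list[Node]) -> float:
    """Combine per-node partial sums by passing messages to a collector."""
    network: Queue = Queue()  # stands in for the network between nodes
    for node in nodes:
        # Each node sends only its partial result, never its raw data.
        network.put((node.rank, node.local_sum()))
    total = 0.0
    for _ in nodes:
        _rank, partial = network.get()
        total += partial
    return total


if __name__ == "__main__":
    cluster = [Node(0, [1.0, 2.0]), Node(1, [3.0, 4.0]), Node(2, [5.0])]
    print(global_sum(cluster))
```

In a real distributed run the same pattern would be expressed with a message-passing library, and the cost of each message would depend on the network.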

Launching a distributed solver in CMPS is straightforward. Users enter the names of the nodes through the GUI. CMPS then splits the computational domain into parts, balances the load across the nodes, and automatically sends each node the data it needs for its share of the computation. This splitting of the problem is known as domain decomposition.

CMPS employs algorithms for domain decomposition and parallel solution strategies to ensure efficient computation.
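As a simple illustration of load-balanced decomposition, the sketch below splits a range of cell indices into contiguous blocks whose sizes differ by at most one. It is an assumption-laden toy, not the algorithm CMPS uses: `partition_cells` and the host names are hypothetical, and solvers working on unstructured meshes typically rely on graph partitioning that also minimizes the interface between parts.

```python
def partition_cells(n_cells: int, n_nodes: int) -> list[range]:
    """Split cell indices 0..n_cells-1 into contiguous, near-equal blocks."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    base, extra = divmod(n_cells, n_nodes)
    blocks = []
    start = 0
    for rank in range(n_nodes):
        # The first `extra` nodes take one additional cell, so block
        # sizes differ by at most one.
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    node_names = ["node01", "node02", "node03"]  # hypothetical host names
    for name, block in zip(node_names, partition_cells(10, len(node_names))):
        print(f"{name}: cells {block.start}..{block.stop - 1} ({len(block)} cells)")
```

Keeping the blocks contiguous means each node only needs boundary data from its immediate neighbors, which limits the amount of data that has to cross the network.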

Hybrid Parallel and Shared Memory Methods in CMPS

Shared memory systems allow multiple processors to use a common memory pool, which accelerates data access and inter-processor communication. However, mechanisms like locks or barriers may be required for processors attempting to access or modify the same data.
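The need for locks can be shown with a small, generic example (not CMPS code; `accumulate` is a hypothetical name). Several threads update one shared counter, and the lock ensures that each read-modify-write step completes before another thread touches the same variable.

```python
import threading


def accumulate(n_threads: int, increments_per_thread: int) -> int:
    """Have several threads add to one shared counter under a lock."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(increments_per_thread):
            # The lock serializes the read-modify-write on `total`.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the result
    return total


if __name__ == "__main__":
    print(accumulate(n_threads=4, increments_per_thread=10_000))
```

The lock keeps the result correct but also forces the threads to take turns, which is why shared memory codes try to keep such contended updates to a minimum.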

[Figure: Solution Algorithm]

[Figure: Shared Memory Method]

In MPI-based models, each process owns its data and shares it only through explicit messages. This requires less synchronization than shared memory programming and reduces errors during data transfers, which makes it a more reliable choice for complex systems.

CMPS adopts a hybrid approach: processors within the same machine communicate through shared memory, while separate physical machines are connected through high-speed networks. This combination provides both scalability and performance improvements.
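The two levels of a hybrid scheme can be sketched as nested partitioning. This is only a structural illustration under loud assumptions: all names are hypothetical, the "nodes" are simulated by a plain loop instead of separate machines, and CPython threads do not speed up CPU-bound work, so the example shows the layout of the computation rather than its performance.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(values: list[float], n_parts: int) -> list[list[float]]:
    """Split a list into n_parts contiguous, near-equal pieces."""
    base, extra = divmod(len(values), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        parts.append(values[start:start + size])
        start += size
    return parts


def node_work(local_values: list[float], n_threads: int) -> float:
    """Inner level: threads inside one node share that node's data."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = pool.map(
            lambda part: sum(v * v for v in part),
            chunk(local_values, n_threads),
        )
        return sum(partials)


def hybrid_sum_of_squares(values: list[float], n_nodes: int, n_threads: int) -> float:
    """Outer level: one partition per node, then combine the partial results."""
    # In a real run each partition would live on a different machine and
    # this final sum would be a message-passing reduction.
    return sum(node_work(part, n_threads) for part in chunk(values, n_nodes))


if __name__ == "__main__":
    print(hybrid_sum_of_squares([1.0, 2.0, 3.0, 4.0, 5.0], n_nodes=2, n_threads=2))
```

The design point is that only the outer level involves network traffic; work inside a node reads the node's data directly.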

MPI Buffer Optimization in CMPS

When non-basic data types are communicated with MPI, the data must be packed into buffer memory before sending and unpacked again on the receiving node, which creates additional work. CMPS reduces this overhead by keeping its buffer allocations persistent and minimizing repetitive operations. This accelerates data transfers and improves overall performance.
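The idea of a persistent pack buffer can be illustrated without MPI. The sketch below is not CMPS code and makes no MPI calls: `PersistentPacker` and its record layout (a cell id followed by two floating-point values) are hypothetical. The buffer is allocated once and overwritten on every exchange, so repeated transfers do not trigger new allocations.

```python
import struct


class PersistentPacker:
    """Pack (cell_id, value_a, value_b) records into one reusable buffer."""

    # Little-endian: one 32-bit int followed by two 64-bit floats.
    RECORD = struct.Struct("<idd")

    def __init__(self, max_records: int) -> None:
        # Allocated once and reused for every exchange.
        self._buffer = bytearray(self.RECORD.size * max_records)
        self._max_records = max_records

    def pack(self, records: list[tuple[int, float, float]]) -> memoryview:
        if len(records) > self._max_records:
            raise ValueError("more records than the buffer was sized for")
        for i, record in enumerate(records):
            self.RECORD.pack_into(self._buffer, i * self.RECORD.size, *record)
        # Return a view of the used part of the buffer without copying it.
        return memoryview(self._buffer)[: len(records) * self.RECORD.size]

    @classmethod
    def unpack(cls, payload) -> list[tuple[int, float, float]]:
        """Receiver side: turn the packed bytes back into records."""
        return list(cls.RECORD.iter_unpack(payload))


if __name__ == "__main__":
    packer = PersistentPacker(max_records=4)
    for step in range(3):
        payload = packer.pack([(1, 101325.0 + step, 2.5), (2, 101300.0, 2.75)])
        print(step, PersistentPacker.unpack(payload))
```

The returned view is only valid until the next call to `pack`, which is the usual trade-off of reusing one buffer: the data must be sent or copied before the buffer is overwritten.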

GPU Computing

GPU (Graphics Processing Unit) computing is used experimentally in the CMPS solver for specific matrix solver steps and other computation-intensive operations, but it will not be available in the initial release. Thanks to their many parallel cores, GPUs can perform computation-heavy, highly parallel tasks faster than CPUs. By utilizing GPUs, the CMPS solver achieves:

Reduction of Heavy Computational Loads: Repetitive tasks like matrix multiplications and linear algebra operations are offloaded to the GPU.

Performance Improvements: GPU computing significantly reduces total processing times.

Efficient Parallelization: GPUs handle multiple threads simultaneously, processing workloads more efficiently.

With GPU optimization, CMPS reduces computation times for large-scale problems while improving energy efficiency.
