|Total run-time efficiency (all solution phases) for the parallel MLFMM solution of a problem with 3.18 million unknowns|
Many modern computer systems make use of multiple processing units in order to improve computing performance. Such systems include:
- simple multicore CPUs (i.e. one computer with one CPU having multiple cores),
- multi-CPU PCs and SMP workstations (symmetric multiprocessor, typically 2 to 8 CPUs),
- large massively parallel distributed systems with typically 128 to 1024 CPUs (which can again be multicore).
To gain the most benefit from the computational hardware, parallel versions of FEKO support state-of-the-art interconnect technologies such as GigE, Myrinet and InfiniBand, as well as vendor-proprietary interconnects such as the SGI NUMAflex technology.
In FEKO all the solution phases for all the various numerical techniques have been parallelised, e.g. the ray-tracing for UTD, the MoM matrix setup and solution, the near- and far-field calculations and also seemingly simple things such as power loss computations.
|MoM factor time decreasing with increasing number of processes|
We are very proud of the parallel efficiency of the MLFMM in FEKO. Even for this mathematically complex technique, all the phases of the solution process (near-field matrix setup, aggregation, translation, disaggregation, pre-conditioning, iterative solution etc.) have been parallelised rigorously. The efficiency of the parallel implementation in FEKO is of the order of 80% to 95%, depending on the problem and the solution phase. At 80% efficiency, a run on a system with 32 cores would therefore be approximately 26 times (0.8*32 = 25.6) faster than a sequential run on a single core.
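The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the usual definition of parallel efficiency (speed-up divided by number of cores); the function names `speedup` and `efficiency` are hypothetical and are not part of FEKO.

```python
def speedup(efficiency: float, cores: int) -> float:
    """Speed-up over a single-core run: efficiency * number of cores."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the range (0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return efficiency * cores


def efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """Parallel efficiency recovered from measured run-times (same units)."""
    return t_serial / (t_parallel * cores)


if __name__ == "__main__":
    # The 80% to 95% range quoted in the text, evaluated for 32 cores.
    for eff in (0.80, 0.95):
        print(f"{eff:.0%} efficiency on 32 cores: {speedup(eff, 32):.1f}x")
```

For the quoted range, 32 cores give a speed-up of between 25.6 and 30.4 times relative to a single core.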
MoM parallelisation is optimized for both shared and distributed memory computers and clusters to maximally utilize RAM and available CPU cores in such systems. Special coding techniques ensure highly efficient scaling of parallel processed MoM solutions with increasing unknowns or number of processes.
Optimised Distributed/Shared Memory Hybridisation
FEKO is designed to use the RAM available in any computing system as optimally as possible.
Distributed memory parallelisation breaks up large blocks of memory so that they can be processed on different nodes in a multi-node cluster computing system. To save processing time, FEKO limits communication between processors by making identical copies of frequently required information and sending such a copy to each node in the cluster.
|Distributed memory parallelisation for a multi-core CPU computer|
In multi-core CPU and/or server architectures where multiple CPUs and cores are located on the same motherboard, sharing a block of RAM that contains the relevant information is the ideal implementation. Each core can address any memory block in the shared RAM, so copies of important information are not needed to speed up computation. FEKO caters for such processing architectures with shared memory parallelisation.
|Shared memory parallelisation for a multi-core CPU computer|
Distributed and shared memory parallelisation are also hybridised in FEKO for multi-node computing clusters made up of multiple multi-core CPUs.
|Hybridised memory parallelisation in a two-CPU multi-core computing system|
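The memory trade-off between the three schemes can be illustrated with a rough sketch. It assumes a simplified model that is not taken from FEKO itself: under pure distributed memory parallelisation every process holds its own copy of the frequently required data, under shared memory parallelisation a single copy serves all cores of one machine, and under the hybrid scheme one copy is kept per node. The function name and the numbers are hypothetical.

```python
def replicated_memory_gb(block_gb: float, nodes: int,
                         cores_per_node: int, scheme: str) -> float:
    """Total RAM used by copies of one frequently required data block.

    Simplified, assumed model (not FEKO's actual memory layout):
      distributed: one copy per process (one process per core)
      shared:      one copy in total (single machine only)
      hybrid:      one copy per node, shared by the cores of that node
    """
    if scheme == "distributed":
        copies = nodes * cores_per_node
    elif scheme == "shared":
        if nodes != 1:
            raise ValueError("shared memory requires a single node")
        copies = 1
    elif scheme == "hybrid":
        copies = nodes
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return block_gb * copies


if __name__ == "__main__":
    # Hypothetical cluster: 4 nodes with 8 cores each, 2 GB replicated block.
    for scheme in ("distributed", "hybrid"):
        total = replicated_memory_gb(2.0, nodes=4, cores_per_node=8, scheme=scheme)
        print(f"{scheme}: {total:.0f} GB")
```

Under these assumptions the hybrid scheme holds the replicated block once per node rather than once per core, which is where the memory saving on multi-core cluster nodes comes from.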
Intel Cluster Ready
The "Intel Cluster Ready" program makes it easier to design, build and deploy cluster computers. Intel works with system and software vendors to provide certified Linux-based cluster systems built on the Intel Cluster Ready architecture, assuring application compatibility and ease of deployment.
We are dedicated to improving the performance of FEKO in cluster computing environments and work closely with Intel engineers in this endeavour. We were first certified as an ISV by Intel in 2007 and have since proudly branded our software with the Intel Cluster Ready logo. This means that FEKO customers can purchase an Intel Cluster Ready certified computer with the confidence that FEKO has been qualified on this computing environment and will work straight out of the box.
More information on this initiative can be found on the Intel Cluster Ready website. These are our preferred Intel Cluster Ready hardware providers:
Windows HPC Server Ready
Windows HPC Server provides a productive, cost-effective high-performance computing (HPC) solution that runs on 64-bit (x64) hardware. In addition to supporting OpenMP, MPI and Web Services, Windows HPC Server also supports third-party numerical library providers, performance optimizers, compilers, and a native parallel debugger for developing and troubleshooting parallel programs.
FEKO has been optimised for use on high-performance parallel processing systems, so maintaining compatibility with Microsoft's HPC Server is important to us.