Parallelization issues

pw.x can run in principle on any number of processors (up to maxproc, presently fixed at 128 in PW/para.f90). The Np processors can be divided into Npk pools of Npr processors each, so that Np = Npk*Npr. The k-points are divided across the Npk pools ("k-point parallelization"), while both R- and G-space grids are divided across the Npr processors of each pool ("PW parallelization"). A third level of parallelization, over bands, is currently confined to the calculation of a few quantities that would otherwise not be parallelized at all. A fourth level of parallelization, over NEB images, is available for NEB calculations only.
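The relation Np = Npk*Npr can be made concrete with a small sketch. The Python below is only an illustration and is not taken from the pw.x sources: the function names are hypothetical, and the round-robin assignment of k-points to pools is an assumption chosen for simplicity, not necessarily the scheme pw.x uses.

```python
def pool_layouts(n_proc):
    """Return every (Npk, Npr) pair such that Npk * Npr == n_proc."""
    return [(npk, n_proc // npk)
            for npk in range(1, n_proc + 1)
            if n_proc % npk == 0]


def distribute_kpoints(n_kpoints, n_pools):
    """Assign k-point indices to pools round-robin.

    Round-robin is an illustrative assumption, not the documented
    pw.x distribution scheme.
    """
    return [list(range(pool, n_kpoints, n_pools))
            for pool in range(n_pools)]


if __name__ == "__main__":
    n_proc, n_kpoints = 8, 10
    for npk, npr in pool_layouts(n_proc):
        # Number of k-points each pool would have to handle.
        sizes = [len(ks) for ks in distribute_kpoints(n_kpoints, npk)]
        print(f"Npk={npk} Npr={npr} k-points per pool: {sizes}")
```

Under these assumptions, 8 processors and 10 k-points split into 2 pools give 5 k-points per pool, whereas 4 pools give an uneven 3, 3, 2, 2 split, so some pools would carry more work than others.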

The effectiveness of parallelization depends on the size and type of the system and on a judicious choice of Npk and Npr.

Note that for each system there is an optimal range of processor counts on which to run the job. Too many processors will degrade performance, or may cause the parallelization algorithm to fail to distribute the R- and G-space grids properly.

Note also that Beowulf-style machines (PC clusters) may show disappointing parallel performance unless they have decent communication hardware (at least Gigabit Ethernet). Do not expect good scaling with cheap hardware: plane-wave calculations are not at all an "embarrassingly parallel" problem. Note as well that multiprocessor motherboards for Intel Pentium CPUs typically have just one memory bus shared by all processors. This dramatically slows down any code that accesses memory heavily (as most codes in the Quantum-ESPRESSO package do) when it runs on several processors of the same motherboard.

The PWSCF Group - 2005-11-18