Almost all problems in PWscf arise from incorrect input data
and result in an error stop. Error messages should be
self-explanatory, but unfortunately this is not always true.
Note that the program may have stopped well after the last output
you see, because buffers may not have been flushed. This is
especially true for parallel execution under a batch queue.
In the latter case, error messages have a nasty habit of
hiding in error files that nobody looks at, or of
disappearing altogether.
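As a minimal sketch of how one might hunt for such hidden messages, the
Python function below scans every file in a job directory for lines
containing a keyword. The function name find_error_lines, the keyword
"error", and the idea that the batch system leaves its files in a single
directory are all assumptions made for illustration; adapt them to your
queue system.

```python
from pathlib import Path


def find_error_lines(directory, keyword="error"):
    """Return (file name, line number, text) for each line containing keyword.

    Hypothetical helper: the keyword and the directory layout are
    assumptions, not something PWscf prescribes.
    """
    hits = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going on binary or oddly encoded files
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if keyword.lower() in line.lower():
                    hits.append((path.name, number, line.rstrip()))
    return hits
```

Calling find_error_lines(".") in the directory where the job ran lists
every candidate line, including those in files the batch system created
on its own.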
Typical pw.x (mis-)behavior:
- pw.x stops with error in reading. There is an error
in the input data. Usually it is a misspelled namelist variable,
or an empty input file.
Note that out-of-bound indices in dimensioned variables read in the
namelist may cause the code to crash with really mysterious error
messages.
- pw.x mumbles something like "cannot recover" or
"error reading recover file". You have a bad restart file
from a preceding failed execution. Remove all files restart*
in outdir.
- pw.x stops in cdiagh or cdiaghg.
Possible reasons: 1) error in data, such as bad atomic positions or bad
crystal structure/supercell; 2) a bad PP; 3) IBM SP3: under some
circumstances (typically a large number of k-points) we get an error
in cdiaghg that is reproducible but disappears if we change anything
in the calculation. We don't know what happens and why. Try to use
conjugate-gradient diagonalization (diagonalization='cg').
- pw.x stops with no error message for no apparent reason.
Possible reasons: 1) the error message has been swallowed
by the operating system, see above.
2) nonexistent or inaccessible outdir.
Note that in parallel execution, outdir must exist and be
accessible to all active processors.
3) too much memory requested. Possible solutions:
- increase the amount of memory you are authorized to use,
if possible (ask your system guru)
- reduce nbnd to the strict minimum
- use conjugate-gradient diagonalization
(diagonalization='cg'):
slower but requires less memory.
- in parallel execution, use more processors, or use the
same number of processors with fewer pools.
Remember that parallelization with respect to k-points (pools)
does not distribute memory: parallelization with respect to
R- and G-space does.
- IBM only: if you need more than 256 MB you must specify it
at link time (option -bmaxdata).
- pw.x runs but nothing happens.
Possible reasons:
1) In parallel execution, the code died on just one processor.
Unpredictable behavior may follow.
2) In scalar execution, the code encountered a floating-point
error and goes on producing NaNs (Not a Number) forever
unless exception handling is on (and usually it isn't).
In both cases, look for one of the reasons given above.
- pw.x yields weird results or crashes for no good reason.
If this happens after a change in the code or in compilation or
precompilation options, try make clean and recompile.
The make command should take care of all dependencies,
but do not rely too heavily on it.
You may also try to reduce the optimization level.
- pw.x does not find all the symmetries you expected.
Increase the number of significant figures in the atomic positions,
or increase the value of variable
accep in PP/checksym.f90. accep is used to
decide whether a rotation is a symmetry operation. Its current
value (10^-5) is quite strict: a rotated atom must coincide
with another atom to 5 significant digits.
- Self-consistency is slow or does not converge.
Reduce the mixing_beta parameter from the default value
(0.7) to
0.3 - 0.1 or smaller, or try a different
mixing_style. You may also try to increase mixing_ndim
to more than 4 (default value).
Specific to US PP: the presence of negative charge density regions
due to either the pseudization procedure of the augmentation part
or to truncation at finite cutoff may give convergence problems.
Raising the ecutrho cutoff for charge density will usually
help, especially in gradient-corrected calculations.
- Structural optimization goes wild after the first or second step.
The algorithm used in structural optimization is not very robust.
If you start too far away from the minimum, it may lead to badly
wrong atomic positions. Restart from a better starting point.
- Structural optimization is slow or does not converge.
Close to convergence the self-consistency error in forces may
become large with respect to the value of forces. The resulting
mismatch between
forces and energies may confuse the line minimization algorithm,
which assumes consistency between the two. The code reduces
the starting self-consistency threshold conv_thr when approaching
the minimum energy configuration, up to a factor defined by
upscale. Reducing conv_thr (or increasing upscale)
yields a smoother structural optimization, but if conv_thr
becomes
too small, electronic self-consistency may not converge. You may also
increase the variables etot_conv_thr and forc_conv_thr,
which determine the
thresholds for convergence (the default values are quite strict).
A limitation to the accuracy of forces comes from the absence of
perfect translational invariance. If we had only the Hartree
potential, our PW calculation would be translationally invariant
to machine precision. The presence of an exchange-correlation
potential introduces Fourier components in the potential that are
not in our basis set. This loss of precision (more serious for
gradient-corrected functionals) translates into a slight but
detectable loss of translational invariance (the energy changes
if all atoms are displaced by the same quantity, not commensurate
with the FFT grid). This puts a limit to the accuracy of forces.
The situation improves somewhat by increasing the ecutrho
cutoff.
Also note that many systems have "floppy" low-energy
modes, which make it very difficult - and of little use anyway
- to reach a well-converged structure, no matter what.
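The symmetry item in the list above can be illustrated with a small
Python sketch. It is not the algorithm in PP/checksym.f90; it only
assumes that a rotation counts as a symmetry when every rotated atom
lands on some atom to within a tolerance accep, comparing crystal
coordinates modulo lattice translations and treating all atoms as the
same species. The function name is_symmetry and the two-atom structure
are hypothetical.

```python
def is_symmetry(rotation, positions, accep=1.0e-5):
    """True if rotation maps every atom onto some atom within accep.

    Positions are crystal (fractional) coordinates; all atoms are
    assumed to be of the same species. Illustrative sketch only.
    """

    def rotate(pos):
        return [sum(rotation[i][j] * pos[j] for j in range(3)) for i in range(3)]

    def coincide(a, b):
        for x, y in zip(a, b):
            d = x - y
            d -= round(d)  # fold the difference onto the nearest periodic image
            if abs(d) >= accep:
                return False
        return True

    return all(
        any(coincide(rotate(p), q) for q in positions) for p in positions
    )


INVERSION = [[-1, 0, 0], [0, -1, 0], [0, 0, -1]]

# Atoms meant to sit at (1/3,1/3,1/3) and (2/3,2/3,2/3):
THREE_DIGITS = [[0.333] * 3, [0.666] * 3]        # truncated input
SIX_DIGITS = [[0.333333] * 3, [0.666667] * 3]    # more significant figures
```

With the three-digit input, inversion maps the first atom to 0.667
instead of 0.666, a mismatch of 10^-3 that fails the 10^-5 test, so
is_symmetry(INVERSION, THREE_DIGITS) is False. The six-digit input
passes. Raising accep to about 2*10^-3 would also accept the three-digit
input, which mirrors the two remedies suggested in the text.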
For the phonon code, most of the above applies as well.
- ph.x mumbles something like "cannot recover" or
"error reading recover file". You have a bad restart file
from a preceding failed execution. Remove all files recover*
in outdir.
- ph.x does not yield acoustic modes with
ω = 0 at q=0.
This may not be an error: the Acoustic Sum Rule (ASR) is never
exactly verified, because the system is never exactly translationally
invariant as it should be (see the discussion above).
The frequency of the acoustic mode should not exceed 50 cm^-1
or so, and if the dynamical matrix is diagonalized with
program dynmat.x imposing the ASR, ω
should go much
closer to 0, with all other modes virtually unchanged.
- ph.x yields really lousy phonons, with bad frequencies
or wrong symmetries or gross ASR violations.
Possible reasons:
1) Wrong data file read. 2) For US PP: insufficient
cutoff for the charge density (increase ecutrho).
3) Convergence threshold for
either scf (conv_thr) or phonon (tr2_ph) too large.
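The effect of imposing the ASR, mentioned in the acoustic-mode item
above, can be sketched on a toy model. The Python code below is not what
dynmat.x does internally; it assumes one simple way of imposing the ASR
at q=0, namely resetting each diagonal force constant so that every row
sums to zero. The two-atom, one-dimensional force constants, the masses,
and the size of the violation are all hypothetical numbers in arbitrary
units.

```python
import math


def impose_asr(force_constants):
    """Reset each diagonal element so that every row sums to zero."""
    n = len(force_constants)
    fixed = [row[:] for row in force_constants]
    for i in range(n):
        fixed[i][i] = -sum(fixed[i][j] for j in range(n) if j != i)
    return fixed


def squared_frequencies(force_constants, masses):
    """Eigenvalues (omega^2) of the 2x2 matrix C_ij / sqrt(m_i m_j)."""
    a = force_constants[0][0] / masses[0]
    c = force_constants[1][1] / masses[1]
    b = force_constants[0][1] / math.sqrt(masses[0] * masses[1])
    mean = 0.5 * (a + c)
    spread = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return mean - spread, mean + spread


# Toy q=0 force constants with a small ASR violation (0.01) on the diagonal.
RAW = [[1.01, -1.0], [-1.0, 1.01]]
MASSES = [1.0, 2.0]
```

Comparing squared_frequencies(RAW, MASSES) with
squared_frequencies(impose_asr(RAW), MASSES) shows the behavior described
in the text: the lower (acoustic) eigenvalue drops to zero within
rounding error, while the upper (optical) one changes by less than one
percent.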
The PWSCF Group - 2003-01-31