|
Almost all problems in PWscf arise from incorrect
input data and result in an error stop. Error messages should be self-explanatory,
but unfortunately this is not always true.
Note that the program may have stopped well
after it stopped writing output, because buffers may not have been flushed.
This is especially true for parallel execution under a batch queue.
In the latter case, error messages have a nasty habit of hiding
in error files that nobody looks at, or of disappearing altogether.
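Since error messages may end up in files that are easy to overlook, it can
help to scan every file in the run directory for them. The following is a
minimal sketch, not part of PWscf: the function name find_error_lines and
the keyword list are assumptions, and the actual wording of error messages
depends on the code version and on the batch system.

```python
from pathlib import Path


def find_error_lines(directory, keywords=("error", "stop")):
    """Return (file name, line number, text) for each line mentioning a keyword.

    The keywords are illustrative assumptions; adapt them to the messages
    your version of the code and your batch system actually produce.
    """
    hits = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file contains stray bytes
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                lowered = line.lower()
                if any(word in lowered for word in keywords):
                    hits.append((path.name, number, line.strip()))
    return hits
```

Calling find_error_lines(".") in the directory where the job was submitted
lists candidate lines from all output and error files found there.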
Typical pw.x (mis-)behavior:
- pw.x stops with error in readin.
There is an error in the input data. Usually it is a misspelled
namelist variable. For IBM machines, or if you really cannot find
anything wrong, see Sec. "Running PWscf". Note that out-of-bound
indices in dimensioned variables read in the namelist may cause
the code to crash with really mysterious error messages.
- pw.x mumbles something like "cannot
recover" or "error reading recover file". You have a bad
restart file from a preceding failed execution. Remove all files
restart* in tmp_dir.
- pw.x stops in cdiagh or cdiaghg.
Possible reasons: 1) error in data, such as bad atomic positions
or bad crystal structure/supercell; 2) a bad PP; 3) IBM SP3: under
some circumstances (typically a large number of k-points) we get
an error in cdiaghg that is reproducible but disappears if we
change anything in the calculation. We don't know what happens
and why. Try to use conjugate-gradient diagonalization (isolve=1).
- pw.x stops with no error message for
no apparent reason. Possible reasons: 1) the error message
has been swallowed by the operating system, see above. 2) nonexistent
or inaccessible tmp_dir. Note that in parallel execution,
tmp_dir must exist and be accessible to all active processors.
3) too much memory requested. Possible solutions:
- increase the amount of memory you are
authorized to use, if possible (ask your system guru)
- reduce nbnd to the strict
minimum
- use conjugate-gradient diagonalization
(isolve=1): slower but requires less memory.
- in parallel execution, use more processors,
or use the same number of processors with fewer pools. Remember
that parallelization with respect to k-points (pools) does
not distribute memory: parallelization with respect to R-
and G-space does.
- IBM only: if you need more than 256
MB you must specify it at link time (option -bmaxdata).
- pw.x runs but nothing happens.
Possible reasons: 1) In parallel execution, the code died on
just one processor. Unpredictable behavior may follow. 2) In scalar
execution, the code encountered a floating-point error and goes
on producing NaNs (Not a Number) forever unless exception handling
is on (and usually it isn't). In both cases, look for one of the
reasons given above.
- pw.x yields weird results or crashes
for no good reason. If this happens after a change in the
code or in compilation or precompilation options, try make
clean and recompile. The make command should take
care of all dependencies, but do not rely too heavily on it. You
may also try to reduce the optimization level.
- pw.x does not find all the symmetries
you expected. Increase the number of significant figures
in the atomic positions, or increase the value of variable accep
in pwlib/checksym.f90. accep is used to decide
whether a rotation is a symmetry operation. Its current value
(10^-5) is quite strict: a rotated
atom must coincide with another atom to 5 significant digits.
- Self-consistency is slow or does not
converge. Reduce the beta parameter from the default
value (0.7) to 0.3 - 0.1, down to as little as 0.01 for difficult
cases. You may also try to increase nmix to more than
4 (default value). Specific to US PP: the presence of negative
charge density regions due to either the pseudization procedure
of the augmentation part or to truncation at finite cutoff may
give convergence problems. Raising the dual parameter
to increase the cutoff for charge density will usually help, especially
in gradient-corrected calculations.
- Structural optimization goes wild after
the first or second step. The algorithm used in structural
optimization is not very robust. If you start too far away from
the minimum, it may lead to badly wrong atomic positions. Restart
from a better starting point.
- Structural optimization is slow or
does not converge. Close to convergence the self-consistency
error in forces may become large with respect to the value of
forces. The resulting mismatch between forces and energies may
confuse the line minimization algorithm, which assumes consistency
between the two. The code reduces the starting self-consistency
threshold tr2 when approaching the minimum energy configuration,
by up to a factor defined by upscale. Reducing tr2
(or increasing upscale) yields a smoother structural
optimization, but if tr2 becomes too small, electronic
self-consistency may not converge. You may also increase variables
epse and epsf that determine the threshold for
convergence (the default values are quite strict).
A limitation to the accuracy of forces
comes from the absence of perfect translational invariance.
If we had only the Hartree potential, our PW calculation would
be exactly (to machine precision) translationally invariant.
The presence of an exchange-correlation potential introduces
Fourier components in the potential that are not in our basis
set. This loss of precision (more serious for gradient-corrected
functionals) translates into a slight but detectable loss of
translational invariance (the energy changes if all atoms are
displaced by the same quantity, not commensurate with the FFT
grid). This puts a limit on the accuracy of forces. The situation
improves somewhat by increasing the dual parameter.
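The loss of translational invariance described above can be reproduced in
a one-dimensional toy model. The sketch below is not PWscf code. It assumes
a periodic cell of length 1, a made-up density with a single cosine
component, a quadratic functional standing in for a Hartree-like term, and
n^(4/3) (the density dependence of LDA exchange, prefactor omitted) standing
in for an exchange-correlation term. The names grid_average, quadratic and
xc_like are hypothetical.

```python
import math


def grid_average(func, n_grid, shift, amplitude=0.5):
    """Average func(n(x)) over a uniform grid on a periodic cell of length 1.

    n(x) = 1 + amplitude * cos(2*pi*(x - shift)) is a toy density with a
    single pair of Fourier components; shift moves the "atom" rigidly.
    """
    total = 0.0
    for j in range(n_grid):
        x = j / n_grid
        density = 1.0 + amplitude * math.cos(2.0 * math.pi * (x - shift))
        total += func(density)
    return total / n_grid


def quadratic(n):
    # Hartree-like stand-in: quadratic in the density
    return n * n


def xc_like(n):
    # Exchange-like stand-in: nonlinear, so it generates Fourier
    # components beyond those the grid can represent
    return n ** (4.0 / 3.0)


N_GRID = 4
HALF_STEP = 0.5 / N_GRID  # shift not commensurate with the grid
FULL_STEP = 1.0 / N_GRID  # shift commensurate with the grid

for name, func in (("quadratic", quadratic), ("xc-like", xc_like)):
    e0 = grid_average(func, N_GRID, 0.0)
    d_half = grid_average(func, N_GRID, HALF_STEP) - e0
    d_full = grid_average(func, N_GRID, FULL_STEP) - e0
    print(f"{name:10s} half-step change: {d_half:+.3e}  "
          f"full-step change: {d_full:+.3e}")
```

Under these assumptions the quadratic term should be unchanged, up to
rounding, by any shift. The n^(4/3) term should change for the half-step
shift but not for the full-step one, which is the commensurability
condition mentioned in the text.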
Also note that in many systems you may
have "floppy" low-energy modes, that make it very difficult -
and of little use anyway - to reach a well-converged structure,
no matter what.
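The advice given above on reducing beta when self-consistency does not
converge can be illustrated with a toy problem. The sketch below uses plain
linear mixing on a single variable. It is an assumption made for
illustration and does not reproduce the mixing scheme implemented in PWscf.
The function name iterate_with_mixing and the response parameter are
hypothetical.

```python
def iterate_with_mixing(beta, response=4.0, start=1.0, tol=1e-8, max_iter=1000):
    """Linear mixing on a toy one-variable self-consistency problem.

    The toy "output" for an input x is F(x) = -response * x, whose
    self-consistent solution is x = 0. Each step mixes a fraction beta of
    the residual F(x) - x into x. Returns the iteration at which
    |F(x) - x| < tol, or None if the iteration diverges or max_iter is hit.
    """
    x = start
    for iteration in range(1, max_iter + 1):
        residual = (-response * x) - x
        if abs(residual) < tol:
            return iteration
        if abs(residual) > 1e12:
            return None  # clearly diverging, give up
        x = x + beta * residual
    return None


for beta in (0.7, 0.3, 0.1, 0.01):
    result = iterate_with_mixing(beta)
    status = "no convergence" if result is None else f"{result} iterations"
    print(f"beta = {beta:4.2f}: {status}")
```

In this toy model each step multiplies the error by 1 - beta * (1 + response),
so with response = 4 the iteration diverges for beta = 0.7 and converges
for smaller values, at the price of more iterations when beta is very small.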
For the phonon code, most of the above applies
as well.
- ph.x mumbles something like "cannot
recover" or "error reading recover file". You have a bad
restart file from a preceding failed execution. Remove all files
recover* in tmp_dir.
- ph.x does not yield an acoustic mode
at q=0 with ω = 0. This may not be an error: the
Acoustic Sum Rule (ASR) is never exactly verified, because the
system is never exactly translationally invariant as it should
be (see the discussion above). The frequency of the acoustic mode
should not exceed 50 cm^-1 or so, and if the dynamical matrix is diagonalized
with program dynmat.x imposing the ASR, it should go much closer to 0, with all
other modes virtually unchanged.
- ph.x yields really lousy phonons, with
bad frequencies or wrong symmetries or gross ASR violations.
Possible reasons: 1) wrong filpun file read; 2) for US
PP: insufficient cutoff for the charge density (increase dual);
3) convergence threshold for either scf (tr2) or phonon
(tr2_ph) too large.
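Removing stale files after a failed run, as suggested above for restart*
(pw.x) and recover* (ph.x), can be scripted. The following is a minimal
sketch, not a tool distributed with the code. The function name
remove_stale_files is hypothetical, and the two glob patterns are taken
from the text above. By default nothing is deleted, so that the list of
matches can be checked first.

```python
from pathlib import Path


def remove_stale_files(tmp_dir, patterns=("restart*", "recover*"), dry_run=True):
    """Delete files in tmp_dir whose names match any of the glob patterns.

    With dry_run=True (the default) nothing is deleted; the function only
    returns the names it would remove. Subdirectories are left untouched.
    """
    matched = []
    for pattern in patterns:
        for path in sorted(Path(tmp_dir).glob(pattern)):
            if path.is_file():
                if not dry_run:
                    path.unlink()
                matched.append(path.name)
    return matched
```

A cautious way to use it is to call remove_stale_files(tmp_dir) once, look
at the returned names, and then call it again with dry_run=False.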
|