
Troubleshooting

 

Almost all problems in PWscf arise from incorrect input data and result in an error stop. Error messages should be self-explanatory, but unfortunately this is not always the case.

Note that the program may have stopped well after the point where its output ends, because output buffers may not have been flushed. This is especially true for parallel execution under a batch queue. In the latter case, error messages have a nasty habit of hiding in error files that nobody looks at, or of disappearing altogether.
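Since error messages may end up in files other than the main output, it can help to scan every file in the job directory for them. The following is a minimal sketch, not part of PWscf: the function name find_error_lines, the search string "error", and the file names in the usage note are all assumptions made for illustration, and the actual wording of the messages depends on the routine that stopped.

```python
import os


def find_error_lines(directory, marker="error"):
    """Return (file name, line number, text) for each line containing marker.

    The comparison is case-insensitive. Only regular files directly inside
    `directory` are scanned; subdirectories are skipped.
    """
    hits = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        # errors="replace" keeps the scan going if a file holds odd bytes.
        with open(path, "r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if marker.lower() in line.lower():
                    hits.append((name, number, line.rstrip("\n")))
    return hits
```

For example, find_error_lines(".") run in the directory where a batch job wrote its files (say, hypothetical job.out and job.err) lists every line mentioning an error, including those in files that are easy to overlook.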

Typical pw.x (mis-)behavior:

  • pw.x stops with an error in readin. There is an error in the input data. Usually it is a misspelled namelist variable. For IBM machines, or if you really cannot find anything wrong, see Sec. "Running PWscf". Note that out-of-bound indices in dimensioned variables read in the namelist may cause the code to crash with really mysterious error messages.
  • pw.x mumbles something like "cannot recover" or "error reading recover file". You have a bad restart file from a preceding failed execution. Remove all files restart* in tmp_dir.
  • pw.x stops in cdiagh or cdiaghg. Possible reasons: 1) error in data, such as bad atomic positions or bad crystal structure/supercell; 2) a bad pseudopotential (PP); 3) IBM SP3: under some circumstances (typically a large number of k-points) we get an error in cdiaghg that is reproducible but disappears if we change anything in the calculation. We don't know what happens or why. Try using conjugate-gradient diagonalization (isolve=1).
  • pw.x stops with no error message for no apparent reason. Possible reasons: 1) the error message has been swallowed by the operating system, see above; 2) nonexistent or inaccessible tmp_dir. Note that in parallel execution, tmp_dir must exist and be accessible to all active processors; 3) too much memory requested. Possible solutions to the memory problem:
    • increase the amount of memory you are authorized to use, if possible (ask your system guru)
    • reduce nbnd to the strict minimum
    • use conjugate-gradient diagonalization (isolve=1): slower but requires less memory.
    • in parallel execution, use more processors, or use the same number of processors with fewer pools. Remember that parallelization with respect to k-points (pools) does not distribute memory: parallelization with respect to R- and G-space does.
    • IBM only: if you need more than 256 MB you must specify it at link time (option -bmaxdata).
  • pw.x runs but nothing happens. Possible reasons: 1) In parallel execution, the code died on just one processor. Unpredictable behavior may follow. 2) In scalar execution, the code encountered a floating-point error and keeps producing NaNs (Not a Number) forever unless exception handling is on (and usually it isn't). In both cases, look for one of the reasons given above.
  • pw.x yields weird results or crashes for no good reason. If this happens after a change in the code or in compilation or precompilation options, try make clean and recompile. The make command should take care of all dependencies, but do not rely too heavily on it. You may also try to reduce the optimization level.
  • pw.x does not find all the symmetries you expected. Increase the number of significant figures in the atomic positions, or increase the value of variable accep in pwlib/checksym.f90. accep is used to decide whether a rotation is a symmetry operation. Its current value ($10^{-5}$) is quite strict: a rotated atom must coincide with another atom to 5 significant digits.
  • Self-consistency is slow or does not converge. Reduce the beta parameter from the default value (0.7) to $\sim$ 0.3 - 0.1, or down to as little as $\sim$ 0.01 for difficult cases. You may also try to increase nmix above its default value of 4. Specific to US PP: the presence of negative charge density regions, due either to the pseudization procedure of the augmentation part or to truncation at finite cutoff, may give convergence problems. Raising the dual parameter to increase the cutoff for charge density will usually help, especially in gradient-corrected calculations.
  • Structural optimization goes wild after the first or second step. The algorithm used in structural optimization is not very robust. If you start too far away from the minimum, it may lead to badly wrong atomic positions. Restart from a better starting point.
  • Structural optimization is slow or does not converge. Close to convergence the self-consistency error in forces may become large with respect to the value of forces. The resulting mismatch between forces and energies may confuse the line minimization algorithm, which assumes consistency between the two. The code reduces the starting self-consistency threshold tr2 when approaching the minimum energy configuration, up to a factor defined by upscale. Reducing tr2 (or increasing upscale) yields a smoother structural optimization, but if tr2 becomes too small, electronic self-consistency may not converge. You may also increase variables epse and epsf that determine the threshold for convergence (the default values are quite strict).

    A limitation to the accuracy of forces comes from the absence of perfect translational invariance. If we had only the Hartree potential, our PW calculation would be exactly (to machine precision) translationally invariant. The presence of an exchange-correlation potential introduces Fourier components in the potential that are not in our basis set. This loss of precision (more serious for gradient-corrected functionals) translates into a slight but detectable loss of translational invariance (the energy changes if all atoms are displaced by the same quantity, not commensurate with the FFT grid). This puts a limit to the accuracy of forces. The situation improves somewhat by increasing the dual parameter.

    Also note that many systems have "floppy" low-energy modes, which make it very difficult - and of little use anyway - to reach a well converged structure, no matter what.
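The role of the accep tolerance in the symmetry item above can be illustrated with a small sketch. This is not the code in pwlib/checksym.f90: the function is_symmetry, its arguments, and the two-atom test structure are hypothetical, positions are assumed to be in crystal (fractional) coordinates, and fractional translations are ignored for simplicity. It only shows why positions typed with too few significant figures can make a rotation fail the test.

```python
def is_symmetry(rotation, atoms, accep=1.0e-5):
    """Check whether `rotation` maps every atom onto an atom of the same species.

    rotation: 3x3 nested list acting on crystal (fractional) coordinates.
    atoms: list of (species, (x, y, z)) tuples in crystal coordinates.
    accep: tolerance on each coordinate, modulo lattice translations.
    """
    def apply(matrix, vec):
        return [sum(matrix[i][j] * vec[j] for j in range(3)) for i in range(3)]

    def same_site(a, b):
        # Two sites coincide if each coordinate difference is within
        # accep of an integer (that is, they differ by a lattice vector).
        return all(abs((x - y) - round(x - y)) < accep for x, y in zip(a, b))

    for species, position in atoms:
        image = apply(rotation, position)
        if not any(s == species and same_site(image, p) for s, p in atoms):
            return False
    return True


inversion = [[-1, 0, 0], [0, -1, 0], [0, 0, -1]]

# Hypothetical two-atom cell, once with exact and once with sloppy positions.
precise = [("Si", (0.25, 0.25, 0.25)), ("Si", (0.75, 0.75, 0.75))]
rounded = [("Si", (0.25, 0.25, 0.25)), ("Si", (0.7499, 0.7499, 0.7499))]

print(is_symmetry(inversion, precise))             # exact positions
print(is_symmetry(inversion, rounded))             # off by 1e-4, strict accep
print(is_symmetry(inversion, rounded, accep=1e-3)) # same input, looser accep
```

With the strict tolerance the inversion is accepted only for the exact positions; the structure that is off by 1e-4 passes only when the tolerance is loosened, which mirrors the two remedies suggested above.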

For the phonon code, most of the above applies as well.

  • ph.x mumbles something like "cannot recover" or "error reading recover file". You have a bad restart file from a preceding failed execution. Remove all files recover* in tmp_dir.
  • ph.x does not yield an acoustic mode at q=0 with $\omega$ = 0. This may not be an error: the Acoustic Sum Rule (ASR) is never exactly verified, because the system is never exactly translationally invariant as it should be (see the discussion above). The frequency of the acoustic mode should not exceed 50 cm$^{-1}$ or so, and if the dynamical matrix is diagonalized with program dynmat.x imposing the ASR, $\omega$ should go much closer to 0, with all other modes virtually unchanged.
  • ph.x yields really lousy phonons, with bad frequencies or wrong symmetries or gross ASR violations. Possible reasons: 1) wrong filpun file read; 2) for US PP: insufficient cutoff for the charge density (increase dual); 3) convergence threshold for either scf (tr2) or phonon (tr2_ph) too large.
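To make the Acoustic Sum Rule more concrete: at q=0 it requires that, for each atom i and each pair of Cartesian directions, the interatomic force constants summed over all atoms j vanish, so that a rigid translation of the crystal costs no energy. The sketch below shows one simple way to impose this, by resetting each diagonal block to minus the sum of the off-diagonal blocks in its row. It is only an illustration under that assumption: the names impose_asr and asr_violation and the toy two-atom data are hypothetical, and this is not claimed to be the procedure dynmat.x actually uses.

```python
def asr_violation(force_constants):
    """Largest absolute value of sum_j C[i][j][a][b] over all i, a, b."""
    natoms = len(force_constants)
    return max(
        abs(sum(force_constants[i][j][a][b] for j in range(natoms)))
        for i in range(natoms)
        for a in range(3)
        for b in range(3)
    )


def impose_asr(force_constants):
    """Return a copy of the q=0 force constants that satisfies the ASR.

    force_constants[i][j] is the 3x3 block coupling atoms i and j
    (not mass-scaled). Only the diagonal blocks i == j are modified.
    """
    natoms = len(force_constants)
    fixed = [[[row[:] for row in block] for block in blocks]
             for blocks in force_constants]
    for i in range(natoms):
        for a in range(3):
            for b in range(3):
                off_diagonal = sum(force_constants[i][j][a][b]
                                   for j in range(natoms) if j != i)
                fixed[i][i][a][b] = -off_diagonal
    return fixed


def diag(value):
    """3x3 block proportional to the identity (toy data helper)."""
    return [[value if a == b else 0.0 for b in range(3)] for a in range(3)]


# Toy two-atom system whose diagonal blocks are slightly too stiff,
# mimicking a small ASR violation.
raw = [[diag(1.01), diag(-1.0)],
       [diag(-1.0), diag(1.01)]]
corrected = impose_asr(raw)
```

In this toy case asr_violation(raw) is about 0.01, while asr_violation(corrected) is zero; the off-diagonal blocks, which carry the coupling between different atoms, are left untouched.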

 
