1. Q: What’s the relationship between DP-GEN and DeePMD-kit?

A: These two packages do different things. DeePMD-kit is for training. DP-GEN is for automatically generating ab initio data for training. If you already have a set of data, you can train on it directly with DeePMD-kit and check the quality of the model. If the results are not good enough, you need to use DP-GEN to generate new data to further improve the quality of the model.

The training data is generated by first-principles software such as VASP, but the choice of which configurations to calculate is not arbitrary. DP-GEN automates the following steps:

  • 1) Sample configurations with the DP model.
  • 2) From the configurations sampled in 1), select those where the model deviation is large and run ab initio calculations on them.
  • 3) Accumulate ab initio data for training and ultimately obtain a uniformly accurate DP model.
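
The selection in step 2) can be pictured with a toy sketch. It assumes a single fixed threshold on a per-configuration model deviation; the function name, the threshold and the numbers are made up for illustration and are not DP-GEN's actual selection criteria.

```python
# Toy sketch of step 2): pick configurations whose model deviation is large.
# The threshold and deviation values below are hypothetical illustration numbers.

def select_candidates(deviations, threshold):
    """Return indices of sampled configurations whose deviation exceeds threshold."""
    return [i for i, d in enumerate(deviations) if d > threshold]

# Hypothetical per-configuration model deviations obtained in step 1).
sampled_deviations = [0.02, 0.31, 0.07, 0.45, 0.12]

# Only these configurations would be sent to the ab initio code for labeling.
candidates = select_candidates(sampled_deviations, threshold=0.15)
print(candidates)
```

Configurations below the threshold are already described well by the current model, so labeling them would add cost without adding much information.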


2. Q: When I used DeePMD-kit to simulate an amorphous alloy, the training set contained both crystalline structures and MD structures. The DP results agreed well with first-principles calculations at high temperature, but after cooling down, the two phases tended to separate. Is there a good way to fix this?

A: This suggests that some configurations reached after the temperature drops are not well covered by the training data, so more training data should be generated with DP-GEN. To verify this assumption, you can estimate the simulation error along the MD trajectories using the model deviation. See the method of showing model deviation in LAMMPS.



3. Q: How do I install the DeePMD-kit?

A: The DeePMD-kit README gives a complete setup procedure from scratch. Meanwhile, here are two convenient ways to install the DeePMD-kit from



4. Q: When training a model with DeePMD-kit, I want to train substances such as Ni3Al and Al3Ni together. How do I set up the calculation, and how do I write type.raw?

A: Systems with different compositions, and systems with different numbers of atoms, should be placed in separate folders.
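
As a minimal sketch of what the type.raw content for each folder could look like, assume both systems share one element ordering (Ni as type 0, Al as type 1) and that each system is a 4-atom cell. The folder names, the atom ordering and the helper function are hypothetical.

```python
# Assumed shared ordering for both systems: Ni -> 0, Al -> 1.
TYPE_MAP = ["Ni", "Al"]

def type_raw_line(symbols, type_map):
    """Map element symbols to space-separated integer type indices."""
    return " ".join(str(type_map.index(s)) for s in symbols)

# Hypothetical 4-atom cells, one folder per system.
systems = {
    "Ni3Al": ["Ni", "Ni", "Ni", "Al"],
    "Al3Ni": ["Al", "Al", "Al", "Ni"],
}

for folder, symbols in systems.items():
    # Shows the content that would go into <folder>/type.raw.
    print(f"{folder}/type.raw: {type_raw_line(symbols, TYPE_MAP)}")
```

Using the same type indices in every folder keeps the meaning of each type consistent across the whole training set.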


5. Q: DeePMD-kit has been installed successfully and works correctly most of the time, but executing dp_frz reports the following error. Why? [error report]

A: Try running `pip install six`. The “six” module is generally already present when Python is installed with Anaconda.


6. Q: Which versions of the dependent software are recommended?


| Software | Purpose | Recommended version |
| --- | --- | --- |
| python | Training | 3.5~3.7 |
| bazel | Compiling tensorflow | 0.24.1; it depends on the version of tensorflow |
| tensorflow | Machine learning tool | 1.12, 1.14, 2.0 |
| git | Compiling Bazel | The newer the better |
| cuda | Supporting the GPU environment of tensorflow | cuda-9.0, cuda-10.0, cuda-10.1 |
| cudnn | Supporting the GPU environment of tensorflow | Depends on the cuda you use; sometimes it is already installed |
| nccl | Supporting the GPU environment of tensorflow | Depends on the cuda you use; sometimes it is already installed |
| cmake | Compiling the DeePMD-kit | >= 3.7 |
| ICC | The system’s own C compiler | The newer the better |
| gcc | The system’s own C compiler | gcc4.8, gcc5 .. |
| lammps | Running the MD | 18 and 19 are both ok |
| mpi | Compiling lammps | Intel impi recommended |


7. Q: The Python version in Ubuntu 18.04 is 3.6.8. Can I change it to Python 3.7.1?

A: It is not recommended to change the default Python version. The installation in the manual uses a virtual environment, so the system default Python is not affected by it. Python 3.6 can be used with gcc5 and above.


8. Q: How do I install DP-GEN?

A: If you are in China, try this.

conda install dpgen -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/


9. Q: When running the water example on a GPU, the utilization is only 20%. Is that acceptable?

A: The test system is relatively small, so the amount of computation is small too; GPU acceleration is therefore not obvious, and this utilization is normal. A speed within 5 s per 100 steps is normal. There is no need to worry about GPU utilization, which depends on the machine type and system size.


10. Q: What’s the recommended relationship between stop_batch and the number of frames?

A: There is no absolute answer to this question. Some training sets have a large amount of data but little information, while others are the opposite. In general, the larger stop_batch is set, the better the quality of the model.

For a first try, we recommend 400,000 steps for training within the DP-GEN process and 2,000,000 steps for a production model.

In addition, we commonly set stop_batch = 200 * decay_steps, so that the learning rate at the end of training is about 3*10^-8.
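
The arithmetic behind this rule of thumb can be checked with a short sketch. It assumes an exponentially decaying learning rate with a starting value of 1e-3 and a decay rate of 0.95; these two values and the decay_steps value are assumptions for illustration and are not stated above.

```python
def final_learning_rate(start_lr, decay_rate, decay_steps, stop_batch):
    """Learning rate after stop_batch steps of exponential decay."""
    return start_lr * decay_rate ** (stop_batch / decay_steps)

decay_steps = 5000               # hypothetical value
stop_batch = 200 * decay_steps   # the rule of thumb: 200 decay periods

# start_lr = 1e-3 and decay_rate = 0.95 are assumed, not taken from the text.
lr_end = final_learning_rate(1e-3, 0.95, decay_steps, stop_batch)
print(f"stop_batch = {stop_batch}, final learning rate = {lr_end:.2e}")
```

Under these assumptions the final value depends only on the ratio stop_batch / decay_steps, and it comes out at a few times 10^-8, in line with the figure quoted above.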


11. Q: How do I “reload” DP-GEN? I have changed the environment variable in .bashrc, but when running `which dpgen`, the path still points to the original directory.

A: 1. If you have administrative rights to the current Python environment, run `pip install .` again to override the installation.

  2. If you are in a cluster environment and use `pip install . --user`, the package is installed in ~/.local.
  3. The situation you describe may arise because you use a package manager such as Anaconda but do not have write permission. Please check, or provide more complete installation information.
  4. If you want to force the original package to be deleted and update it manually, run `which dpgen` to locate it and go into the ‘lib’ folder to manage it by hand.


12. Q: What CPU and GPU parallelism is supported?

A: For training, you can use CPU parallelism and only one GPU. For running MD with a trained model, both CPU parallelism and GPU parallelism are supported.


13. Q: What’s the meaning of the error “nforce” when running init_bulk?

A: The number of frames that dpdata reads from the OUTCAR of a VASP MD run must equal the number of steps set in INCAR. Otherwise the MD run is considered not to have ended normally and the data in that OUTCAR are excluded, which can leave no valid OUTCAR.
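
A quick way to spot such a mismatch before running init_bulk is sketched below. It assumes the step count is set through the NSW tag in INCAR and that the frame count has been obtained separately (for example from dpdata); both helper functions are hypothetical and not part of DP-GEN or dpdata.

```python
import re

def nsw_from_incar(incar_text):
    """Return the NSW value from INCAR text, or None if it is not set."""
    for line in incar_text.splitlines():
        line = re.split(r"[#!]", line, maxsplit=1)[0]  # drop trailing comments
        match = re.search(r"\bNSW\s*=\s*(\d+)", line, flags=re.IGNORECASE)
        if match:
            return int(match.group(1))
    return None

def md_finished(incar_text, n_frames):
    """True if the number of frames read matches the requested MD steps."""
    return nsw_from_incar(incar_text) == n_frames

# Hypothetical INCAR content; n_frames would come from reading the OUTCAR.
incar = "IBRION = 0\nNSW = 10  ! number of MD steps\nPOTIM = 2\n"
print(md_finished(incar, 10))
print(md_finished(incar, 7))
```

If the check fails, the MD run most likely stopped early and needs to be rerun or continued before the data can be used.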


14. Q: Which software is commonly used to visualize dump files?

A: Change the file suffix to lammpstrj and drag the file directly into VMD; OVITO also works.


15. Q: Why are the energies provided at http://www.deepmd.org/database/deeppot-se-data/ so large? They are about 1000 electron volts per atom on average. How is the zero of potential energy chosen here? Why not use the results of the VASP calculation directly?

A: These are presumably QE results. Only changes in energy are meaningful. The absolute energy differs because different software packages use different reference values. For details, please visit https://github.com/deepmodeling/dpdata.


16. Q: How does DeePMD-kit interface with LAMMPS and pass energy and atomic forces to LAMMPS?

A: Please refer to the following two source files.

The first illustrates how to obtain energy and forces from the deep potentials, and the second serves to interface with LAMMPS.




17. Q: In the relative model deviation formula Ef = |Df| / (|f| + level), does |f| mean the norm of <fi>, taken over the ensemble of models?

A: No. Ef is computed for every atom in the system; more precisely, it should be denoted Ef_i. By default, the maximum, minimum and average of Ef_i are printed to the out_file.
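
The per-atom formula and the three printed statistics can be illustrated with a small sketch. The per-atom deviations |Df_i|, the force norms |f_i| and the level value are hypothetical inputs; how DeePMD-kit obtains them internally is not covered here.

```python
def relative_deviation(dev_f, norm_f, level):
    """Per-atom Ef_i = |Df_i| / (|f_i| + level)."""
    return [d / (f + level) for d, f in zip(dev_f, norm_f)]

# Hypothetical per-atom values for a 3-atom system.
dev_f = [0.05, 0.20, 0.10]   # |Df_i|
norm_f = [0.5, 3.0, 0.0]     # |f_i|
level = 1.0

ef = relative_deviation(dev_f, norm_f, level)

# The three summary statistics over all atoms.
print("max:", max(ef))
print("min:", min(ef))
print("avg:", sum(ef) / len(ef))
```

The level term keeps the ratio finite for atoms whose force norm is close to zero, as for the third atom in this example.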


18. Q: How does DeePMD-kit compute the derivatives of the descriptors with respect to the Cartesian coordinates?

A: Please check the implementation in source/lib/include/ComputeDescriptor.h 


19. Q: I hope to simulate some heterogeneous reactions, but BOMD is too slow for this. Is a DeePMD model capable of reproducing reactions based on reaction-involved trajectories produced by BOMD?

A: Sure. Please try it directly. One of the strong advantages of DeePMD is its ability to describe chemical reactions. If you already have some data, you could try to fit it with our training code.


20. Q: We have installed DeePMD-kit with the docker project and obtained the graph.pb file. However, we don’t know how to run MD with LAMMPS in docker.

A: LAMMPS should be installed in docker. Once you have the graph.pb file containing the frozen model, you can follow the instructions to run MD with LAMMPS.

Reply from frankhan91


21. Q: How can I prepare the “raw” files for DeePMD-kit?

A: This is described in the GitHub README: “… In box.raw, the 9 components of the box vectors should be provided on each line …”
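
As a minimal sketch of that rule, the snippet below flattens the three box vectors of one frame into a single line of 9 numbers. The ordering (first vector, then second, then third), the number format and the helper name are assumptions for illustration; the cubic cell is made up.

```python
def box_raw_line(cell):
    """Flatten three box vectors (3 components each) into one line of 9 numbers."""
    flat = [component for vector in cell for component in vector]
    if len(flat) != 9:
        raise ValueError("expected 3 box vectors with 3 components each")
    return " ".join(f"{x:.6f}" for x in flat)

# Hypothetical cubic box with a 10-unit edge for a single frame.
cell = [[10.0, 0.0, 0.0],
        [0.0, 10.0, 0.0],
        [0.0, 0.0, 10.0]]

# Each frame would contribute one such line to box.raw.
print(box_raw_line(cell))
```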


22. Q: I met the following error when using DP-GEN: assert(len(fp_pp_files) == len(type_map)), ‘size of fp_pp_files should be the same as the size of type_map’ AssertionError: size of fp_pp_files should be the same as the size of type_map. How can I solve it?

A: The list given for “fp_pp_files” must have the same length as the list given for “type_map”.

For example:

"type_map": ["H", "C"],

"fp_pp_files": ["POTCAR_H", "POTCAR_C"]
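
The same condition can be checked before launching DP-GEN with a few lines of Python. This is a sketch only: the helper is hypothetical, and pairing each file with the element at the same position is an assumption suggested by the example above.

```python
def check_fp_pp_files(params):
    """Pair each element in type_map with its pseudopotential file."""
    type_map = params["type_map"]
    fp_pp_files = params["fp_pp_files"]
    if len(fp_pp_files) != len(type_map):
        raise ValueError(
            f"fp_pp_files has {len(fp_pp_files)} entries, "
            f"but type_map has {len(type_map)}"
        )
    # Assumed pairing: the i-th file belongs to the i-th element.
    return dict(zip(type_map, fp_pp_files))

params = {
    "type_map": ["H", "C"],
    "fp_pp_files": ["POTCAR_H", "POTCAR_C"],
}
print(check_fp_pp_files(params))
```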

23. Q: Why is “press” in the json file in 01.model_deviation always -1 when using DP-GEN?

A: Because you are running the NVT ensemble, the pressure setting has no effect, so it is always -1. If you use the NPT ensemble, the value will match the one in your parameter settings.


24. Q: I have trained a model and have now collected some new data. Can I continue the training with the previous model, or do I need to retrain?

A: Training a model from scratch is recommended.


25. Q: Is it possible to train on a dataset containing different systems with different atoms? If so, how should the raw files be prepared?

A: Each system should be stored in its own folder. The “systems” key in the parameter file accepts a list of data systems, so you can provide multiple systems there.
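
A minimal sketch of such a list is shown below, built in Python and printed as JSON. The folder paths are hypothetical, and only the “systems” key itself is shown; where it sits inside the full parameter file depends on the DeePMD-kit version and is not assumed here.

```python
import json

# Hypothetical folders, one per data system.
params_fragment = {
    "systems": ["data/Ni3Al", "data/Al3Ni"],
}

# Print the fragment in the form it would take inside the parameter file.
print(json.dumps(params_fragment, indent=4))
```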


26. Q: DeePMD-kit runs normally, but an error is reported when running LAMMPS.

A: The DeePMD-kit version used to train the model may differ from the one used when running LAMMPS. Please try to use the same version for both.