LAMMPS (23 Jun 2022 - Update 3)
  using 1 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962
Created orthogonal box = (0 0 0) to (134.3677 67.183848 67.183848)
  4 by 2 by 2 MPI processor grid
Created 512000 atoms
  using lattice units in orthogonal box = (0 0 0) to (134.3677 67.183848 67.183848)
  create_atoms CPU = 0.008 seconds

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:

- GPU package (short-range, long-range and three-body potentials):

The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

--------------------------------------------------------------------------
- Using acceleration for lj/cut:
- with 8 proc(s) per device.
- Horizontal vector operations: ENABLED
- Shared memory system: No
--------------------------------------------------------------------------
Device 0: NVIDIA L40, 142 CUs, 41/44 GB, 2.5 GHZ (Mixed Precision)
Device 1: NVIDIA L40, 142 CUs, 2.5 GHZ (Mixed Precision)
--------------------------------------------------------------------------

Initializing Device and compiling on process 0...Done.
Initializing Devices 0-1 on core 0...Done.
Initializing Devices 0-1 on core 1...Done.
Initializing Devices 0-1 on core 2...Done.
Initializing Devices 0-1 on core 3...Done.
Initializing Devices 0-1 on core 4...Done.
Initializing Devices 0-1 on core 5...Done.
Initializing Devices 0-1 on core 6...Done.
Initializing Devices 0-1 on core 7...Done.

Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style : lj
  Current step : 0
  Time step : 0.005
Per MPI rank memory allocation (min/avg/max) = 9.532 | 9.552 | 9.589 Mbytes
    Step         Temp       E_pair        E_mol       TotEng        Press
       0         1.44   -6.7733683            0   -4.6133726   -5.0196713
      10    1.1253136   -6.3000033            0   -4.6120362    -2.560464
Loop time of 0.0240014 on 16 procs for 10 steps with 512000 atoms

Performance: 179989.164 tau/day, 416.642 timesteps/s
99.5% CPU use with 16 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.0094606  | 0.011985   | 0.014395   |   1.3 | 49.93
Neigh   | 0          | 0          | 0          |   0.0 |  0.00
Comm    | 0.0056568  | 0.0085354  | 0.010583   |   1.4 | 35.56
Output  | 9.9233e-05 | 0.00086939 | 0.0014893  |   0.0 |  3.62
Modify  | 0.0017854  | 0.0019166  | 0.0021452  |   0.2 |  7.99
Other   |            | 0.0006954  |            |       |  2.90

Nlocal:    32000 ave 32800 max 31200 min
Histogram: 4 0 0 0 0 8 0 0 0 4
Nghost:    19911 ave 20711 max 19111 min
Histogram: 4 0 0 0 0 8 0 0 0 4
Neighs:    0 ave 0 max 0 min
Histogram: 16 0 0 0 0 0 0 0 0 0

Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 0
Dangerous builds not checked

---------------------------------------------------------------------
      Device Time Info (average):
---------------------------------------------------------------------
Data Transfer:   0.0025 s.
Neighbor copy:   0.0001 s.
Neighbor build:  0.0002 s.
Force calc:      0.0043 s.
Device Overhead: 0.0069 s.
Average split:   1.0000.
Lanes / atom:    4.
Vector width:    32.
Max Mem / Proc:  27.82 MB.
CPU Neighbor:    0.0099 s.
CPU Cast/Pack:   0.0091 s.
CPU Driver_Time: 0.0003 s.
CPU Idle_Time:   0.0038 s.
---------------------------------------------------------------------

--------------------------------------------------------------------------
- Using acceleration for lj/cut:
- with 8 proc(s) per device.
- Horizontal vector operations: ENABLED
- Shared memory system: No
--------------------------------------------------------------------------
Device 0: NVIDIA L40, 142 CUs, 41/44 GB, 2.5 GHZ (Mixed Precision)
Device 1: NVIDIA L40, 142 CUs, 2.5 GHZ (Mixed Precision)
--------------------------------------------------------------------------

Initializing Device and compiling on process 0...Done.
Initializing Devices 0-1 on core 0...Done.
Initializing Devices 0-1 on core 1...Done.
Initializing Devices 0-1 on core 2...Done.
Initializing Devices 0-1 on core 3...Done.
Initializing Devices 0-1 on core 4...Done.
Initializing Devices 0-1 on core 5...Done.
Initializing Devices 0-1 on core 6...Done.
Initializing Devices 0-1 on core 7...Done.

Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style : lj
  Current step : 10
  Time step : 0.005
Per MPI rank memory allocation (min/avg/max) = 9.538 | 9.558 | 9.578 Mbytes
    Step         Temp       E_pair        E_mol       TotEng        Press
      10    1.1253136   -6.3000033            0   -4.6120362    -2.560464
     800   0.71124955   -5.6878049            0   -4.6209327   0.64033881
Loop time of 1.8159 on 16 procs for 790 steps with 512000 atoms

Performance: 187939.953 tau/day, 435.046 timesteps/s
99.6% CPU use with 16 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.91185    | 1.0168     | 1.1917     |   7.4 | 56.00
Neigh   | 1.071e-05  | 1.4111e-05 | 2.0134e-05 |   0.0 |  0.00
Comm    | 0.44571    | 0.62733    | 0.74029    |   9.9 | 34.55
Output  | 0.00013866 | 0.0008704  | 0.0016015  |   0.0 |  0.05
Modify  | 0.11247    | 0.11895    | 0.12982    |   1.5 |  6.55
Other   |            | 0.05191    |            |       |  2.86

Nlocal:    32000 ave 32094 max 31896 min
Histogram: 1 0 1 3 3 2 2 2 1 1
Nghost:    19047.5 ave 19135 max 18904 min
Histogram: 1 1 1 0 2 2 2 1 2 4
Neighs:    0 ave 0 max 0 min
Histogram: 16 0 0 0 0 0 0 0 0 0

Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 39
Dangerous builds not checked

---------------------------------------------------------------------
      Device Time Info (average):
---------------------------------------------------------------------
Data Transfer:   0.1505 s.
Neighbor copy:   0.0014 s.
Neighbor build:  0.0239 s.
Force calc:      0.3228 s.
Device Overhead: 0.4906 s.
Average split:   1.0000.
Lanes / atom:    4.
Vector width:    32.
Max Mem / Proc:  25.91 MB.
CPU Neighbor:    0.1181 s.
CPU Cast/Pack:   0.5317 s.
CPU Driver_Time: 0.0135 s.
CPU Idle_Time:   0.3476 s.
---------------------------------------------------------------------

Total wall time: 0:00:06

------------------------------------------------------------
Sender: LSF System
Subject: Job 326114: <#!/bin/bash;#BSUB -n 16;#BSUB -R "span[hosts=1]";#BSUB -W 5;##BSUB -q new_gpu;#BSUB -q gpu;#BSUB -R "select[l40]";#BSUB -gpu "num=2";#BSUB -o out_l40.%J;#BSUB -e err_l40.%J;. /usr/share/Modules/init/bash;module load lammps/2022Jun23/intel/gpu;mpirun lmp_hazel -sf gpu -in in.intel.lj> in cluster Done

Job <#!/bin/bash;#BSUB -n 16;#BSUB -R "span[hosts=1]";#BSUB -W 5;##BSUB -q new_gpu;#BSUB -q gpu;#BSUB -R "select[l40]";#BSUB -gpu "num=2";#BSUB -o out_l40.%J;#BSUB -e err_l40.%J;. /usr/share/Modules/init/bash;module load lammps/2022Jun23/intel/gpu;mpirun lmp_hazel -sf gpu -in in.intel.lj> was submitted from host by user in cluster at Thu May 23 22:59:10 2024
Job was executed on host(s) <16*gpu14>, in queue , as user in cluster at Thu May 23 22:59:11 2024
was used as the home directory.
was used as the working directory.
Started at Thu May 23 22:59:11 2024
Terminated at Thu May 23 22:59:20 2024
Results reported at Thu May 23 22:59:20 2024

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#!/bin/bash
#BSUB -n 16
#BSUB -R "span[hosts=1]"
#BSUB -W 5
##BSUB -q new_gpu
#BSUB -q gpu
#BSUB -R "select[l40]"
#BSUB -gpu "num=2"
#BSUB -o out_l40.%J
#BSUB -e err_l40.%J
. /usr/share/Modules/init/bash
module load lammps/2022Jun23/intel/gpu
mpirun lmp_hazel -sf gpu -in in.intel.lj
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time :               112.00 sec.
    Max Memory :             2 GB
    Average Memory :         2.00 GB
    Total Requested Memory : -
    Delta Memory :           -
    Max Swap :               -
    Max Processes :          7
    Max Threads :            9
    Run time :               37 sec.
    Turnaround time :        10 sec.

The output (if any) is above this job summary.

PS:

Read file for stderr output of this job.
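
The two "Performance:" lines in the LAMMPS output above can be cross-checked against the reported loop times. The following is a minimal Python sketch, assuming that timesteps/s is the step count divided by the loop time and that tau/day is that rate multiplied by the time step (0.005 in lj units) and by 86400 seconds per day. The function name `lj_performance` is hypothetical and not part of LAMMPS. Small differences from the logged values are expected, since the printed loop time is rounded.

```python
def lj_performance(loop_time_s, steps, timestep_tau):
    """Return (tau_per_day, timesteps_per_s) for a run in lj units.

    Assumption: performance is derived from the wall-clock loop time only.
    """
    steps_per_s = steps / loop_time_s
    tau_per_day = steps_per_s * timestep_tau * 86400.0  # seconds per day
    return tau_per_day, steps_per_s


# Loop time and step count of the two runs, copied from the log above.
for loop_time, steps in [(0.0240014, 10), (1.8159, 790)]:
    tau_day, rate = lj_performance(loop_time, steps, 0.005)
    print(f"{steps} steps: {tau_day:.0f} tau/day, {rate:.1f} timesteps/s")
```

Both runs should land within rounding of the logged figures (179989.164 and 187939.953 tau/day), which makes this a quick consistency check when comparing logs from different GPU or MPI configurations.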