LAMMPS (23 Jun 2022 - Update 3)
  using 1 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962
Created orthogonal box = (0 0 0) to (134.3677 67.183848 67.183848)
  4 by 2 by 2 MPI processor grid
Created 512000 atoms
  using lattice units in orthogonal box = (0 0 0) to (134.3677 67.183848 67.183848)
  create_atoms CPU = 0.005 seconds

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:

- GPU package (short-range, long-range and three-body potentials):

The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

--------------------------------------------------------------------------
- Using acceleration for lj/cut:
-  with 4 proc(s) per device.
-  Horizontal vector operations: ENABLED
-  Shared memory system: No
--------------------------------------------------------------------------
Device 0: NVIDIA A30, 56 CUs, 23/23 GB, 1.4 GHZ (Mixed Precision)
Device 1: NVIDIA A30, 56 CUs, 1.4 GHZ (Mixed Precision)
--------------------------------------------------------------------------

Initializing Device and compiling on process 0...Done.
Initializing Devices 0-1 on core 0...Done.
Initializing Devices 0-1 on core 1...Done.
Initializing Devices 0-1 on core 2...Done.
Initializing Devices 0-1 on core 3...Done.

Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 0.005
Per MPI rank memory allocation (min/avg/max) = 9.532 | 9.552 | 9.589 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press
         0   1.44          -6.7733683      0             -4.6133726     -5.0196713
        10   1.1253136     -6.3000033      0             -4.6120362     -2.560464
Loop time of 0.0253179 on 16 procs for 10 steps with 512000 atoms

Performance: 170630.095 tau/day, 394.977 timesteps/s
96.5% CPU use with 16 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.0086808  | 0.010791   | 0.01325    |   1.3 | 42.62
Neigh   | 0          | 0          | 0          |   0.0 |  0.00
Comm    | 0.0088365  | 0.01075    | 0.012877   |   1.2 | 42.46
Output  | 0.00016458 | 0.00069017 | 0.0011771  |   0.0 |  2.73
Modify  | 0.0020755  | 0.0022418  | 0.0024185  |   0.2 |  8.85
Other   |            | 0.000845   |            |       |  3.34

Nlocal:          32000 ave       32800 max       31200 min
Histogram: 4 0 0 0 0 8 0 0 0 4
Nghost:          19911 ave       20711 max       19111 min
Histogram: 4 0 0 0 0 8 0 0 0 4
Neighs:              0 ave           0 max           0 min
Histogram: 16 0 0 0 0 0 0 0 0 0

Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 0
Dangerous builds not checked

---------------------------------------------------------------------
      Device Time Info (average):
---------------------------------------------------------------------
Data Transfer:   0.0030 s.
Neighbor copy:   0.0004 s.
Neighbor build:  0.0010 s.
Force calc:      0.0069 s.
Device Overhead: 0.0095 s.
Average split:   1.0000.
Lanes / atom:    4.
Vector width:    32.
Max Mem / Proc:  27.82 MB.
CPU Neighbor:    0.0042 s.
CPU Cast/Pack:   0.0058 s.
CPU Driver_Time: 0.0002 s.
CPU Idle_Time:   0.0052 s.
---------------------------------------------------------------------

--------------------------------------------------------------------------
- Using acceleration for lj/cut:
-  with 4 proc(s) per device.
-  Horizontal vector operations: ENABLED
-  Shared memory system: No
--------------------------------------------------------------------------
Device 0: NVIDIA A30, 56 CUs, 23/23 GB, 1.4 GHZ (Mixed Precision)
Device 1: NVIDIA A30, 56 CUs, 1.4 GHZ (Mixed Precision)
--------------------------------------------------------------------------

Initializing Device and compiling on process 0...Done.
Initializing Devices 0-1 on core 0...Done.
Initializing Devices 0-1 on core 1...Done.
Initializing Devices 0-1 on core 2...Done.
Initializing Devices 0-1 on core 3...Done.

Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 10
  Time step     : 0.005
Per MPI rank memory allocation (min/avg/max) = 9.538 | 9.558 | 9.578 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press
        10   1.1253136     -6.3000033      0             -4.6120362     -2.560464
       800   0.71124955    -5.6878049      0             -4.6209327      0.64033881
Loop time of 1.95021 on 16 procs for 790 steps with 512000 atoms

Performance: 174996.773 tau/day, 405.085 timesteps/s
94.6% CPU use with 16 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.7282     | 0.89292    | 0.98862    |   7.1 | 45.79
Neigh   | 5.3e-05    | 6.4266e-05 | 7.3575e-05 |   0.0 |  0.00
Comm    | 0.71999    | 0.8092     | 0.98051    |   7.5 | 41.49
Output  | 0.00018794 | 0.00065662 | 0.0010328  |   0.0 |  0.03
Modify  | 0.16626    | 0.17215    | 0.18144    |   1.0 |  8.83
Other   |            | 0.07522    |            |       |  3.86

Nlocal:          32000 ave       32094 max       31896 min
Histogram: 1 0 1 3 3 2 2 2 1 1
Nghost:        19047.5 ave       19135 max       18904 min
Histogram: 1 1 1 0 2 2 2 1 2 4
Neighs:              0 ave           0 max           0 min
Histogram: 16 0 0 0 0 0 0 0 0 0

Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 39
Dangerous builds not checked

---------------------------------------------------------------------
      Device Time Info (average):
---------------------------------------------------------------------
Data Transfer:   0.2171 s.
Neighbor copy:   0.0026 s.
Neighbor build:  0.0352 s.
Force calc:      0.3598 s.
Device Overhead: 0.6766 s.
Average split:   1.0000.
Lanes / atom:    4.
Vector width:    32.
Max Mem / Proc:  25.91 MB.
CPU Neighbor:    0.1415 s.
CPU Cast/Pack:   0.3952 s.
CPU Driver_Time: 0.0110 s.
CPU Idle_Time:   0.3242 s.
---------------------------------------------------------------------

Total wall time: 0:00:04

------------------------------------------------------------
Sender: LSF System
Subject: Job 98388: <#!/bin/bash;#BSUB -n 16;#BSUB -R "span[ptile=8]";#BSUB -W 5;#BSUB -q gpu;#BSUB -R "select[a30]";#BSUB -gpu "num=2";#BSUB -o out.%J;#BSUB -e err.%J;module load lammps/2022Jun23/intel/gpu;mpirun lmp_hazel -sf gpu -in in.intel.lj> in cluster Done

Job <#!/bin/bash;#BSUB -n 16;#BSUB -R "span[ptile=8]";#BSUB -W 5;#BSUB -q gpu;#BSUB -R "select[a30]";#BSUB -gpu "num=2";#BSUB -o out.%J;#BSUB -e err.%J;module load lammps/2022Jun23/intel/gpu;mpirun lmp_hazel -sf gpu -in in.intel.lj> was submitted from host by user in cluster at Wed Dec 20 14:00:28 2023
Job was executed on host(s) <8*gpu08>, in queue , as user in cluster at Wed Dec 20 14:00:30 2023
                            <8*gpu07>
was used as the home directory.
was used as the working directory.
Started at Wed Dec 20 14:00:30 2023
Terminated at Wed Dec 20 14:00:39 2023
Results reported at Wed Dec 20 14:00:39 2023

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#!/bin/bash
#BSUB -n 16
#BSUB -R "span[ptile=8]"
#BSUB -W 5
#BSUB -q gpu
#BSUB -R "select[a30]"
#BSUB -gpu "num=2"
#BSUB -o out.%J
#BSUB -e err.%J
module load lammps/2022Jun23/intel/gpu
mpirun lmp_hazel -sf gpu -in in.intel.lj
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time :                                   77.00 sec.
    Max Memory :                                 1 GB
    Average Memory :                             1.00 GB
    Total Requested Memory :                     -
    Delta Memory :                               -
    Max Swap :                                   -
    Max Processes :                              7
    Max Threads :                                9
    Run time :                                   16 sec.
    Turnaround time :                            11 sec.
The output (if any) is above this job summary.

PS:

Read file for stderr output of this job.
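The two "Performance" lines in the log can be cross-checked against the "Loop time" lines and the 0.005 tau timestep: timesteps/s is the number of steps divided by the loop time, and tau/day is that rate multiplied by the timestep and by 86400 s/day. The Python sketch below recomputes both figures for the 10-step and 790-step runs, assuming this is how LAMMPS derives them for `units lj`; `lammps_performance` is a hypothetical helper name, not part of LAMMPS. Expect agreement only to the last printed digit or so, because the loop times in the log are themselves rounded.

```python
# Minimal sketch: recompute the "Performance:" line of a LAMMPS log (lj units)
# from the "Loop time" line and the timestep. Assumes
#   timesteps/s = nsteps / loop_time
#   tau/day     = timesteps/s * dt * 86400

SECONDS_PER_DAY = 86400.0


def lammps_performance(nsteps: int, loop_time_s: float, dt_tau: float) -> tuple[float, float]:
    """Hypothetical helper: return (tau/day, timesteps/s) for one run."""
    steps_per_s = nsteps / loop_time_s
    tau_per_day = steps_per_s * dt_tau * SECONDS_PER_DAY
    return tau_per_day, steps_per_s


# (nsteps, loop time in seconds, timestep in tau) copied from the log above
runs = {
    "run 1 (steps 0-10)": (10, 0.0253179, 0.005),
    "run 2 (steps 10-800)": (790, 1.95021, 0.005),
}

if __name__ == "__main__":
    for label, (n, t, dt) in runs.items():
        tau_day, sps = lammps_performance(n, t, dt)
        print(f"{label}: {tau_day:.1f} tau/day, {sps:.3f} timesteps/s")
```

Comparing the two runs this way also shows that the 790-step run reaches roughly the same per-step throughput as the 10-step run, which fits the timing breakdown where Pair and Comm dominate both runs.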