LAMMPS (23 Jun 2022 - Update 3)
  using 1 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962
Created orthogonal box = (0 0 0) to (134.3677 67.183848 67.183848)
  4 by 2 by 2 MPI processor grid
Created 512000 atoms
  using lattice units in orthogonal box = (0 0 0) to (134.3677 67.183848 67.183848)
  create_atoms CPU = 0.008 seconds
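A minimal Python sketch that reproduces the geometry reported above. It assumes an fcc lattice with 4 basis atoms per cubic cell, defined in LJ units by a reduced density rho. For that lattice, LAMMPS sets the spacing to (4/rho)**(1/3). The cell counts are not logged; they are inferred from box length divided by spacing.

```python
# ASSUMPTION: fcc lattice (4 atoms per cubic cell) in lj units, where the
# lattice spacing is (4 / rho) ** (1/3). Cell counts are inferred, not logged.
spacing = 1.6795962                       # "Lattice spacing in x,y,z"
box = (134.3677, 67.183848, 67.183848)    # box upper corner (lower corner is 0)

# Number of lattice cells along each box edge.
cells = tuple(round(length / spacing) for length in box)

# fcc: 4 atoms per unit cell.
natoms = 4 * cells[0] * cells[1] * cells[2]

# Reduced number density implied by the spacing.
rho = 4 / spacing**3

# 16 MPI ranks on the 4 x 2 x 2 grid; average atoms per rank.
nprocs = 4 * 2 * 2
atoms_per_rank = natoms // nprocs

print(cells, natoms, round(rho, 4), atoms_per_rank)
```

The inferred 80 x 40 x 40 cells reproduce the 512000 atoms from "Created 512000 atoms". The average of 32000 per rank matches "Nlocal: 32000 ave" below.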

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:
- GPU package (short-range, long-range and three-body potentials):
The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE


--------------------------------------------------------------------------
- Using acceleration for lj/cut:
-  with 8 proc(s) per device.
-  Horizontal vector operations: ENABLED
-  Shared memory system: No
--------------------------------------------------------------------------
Device 0: NVIDIA L40, 142 CUs, 41/44 GB, 2.5 GHZ (Mixed Precision)
Device 1: NVIDIA L40, 142 CUs, 2.5 GHZ (Mixed Precision)
--------------------------------------------------------------------------

Initializing Device and compiling on process 0...Done.
Initializing Devices 0-1 on core 0...Done.
Initializing Devices 0-1 on core 1...Done.
Initializing Devices 0-1 on core 2...Done.
Initializing Devices 0-1 on core 3...Done.
Initializing Devices 0-1 on core 4...Done.
Initializing Devices 0-1 on core 5...Done.
Initializing Devices 0-1 on core 6...Done.
Initializing Devices 0-1 on core 7...Done.

Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 0.005
Per MPI rank memory allocation (min/avg/max) = 9.532 | 9.552 | 9.589 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press
         0   1.44          -6.7733683      0             -4.6133726     -5.0196713
        10   1.1253136     -6.3000033      0             -4.6120362     -2.560464
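A short Python check of how the thermo columns relate. It rests on two assumptions: LAMMPS by default normalizes extensive thermo quantities by atom count in lj units, and the default temperature compute uses dof = 3N - 3. Under those assumptions, TotEng = E_pair + (dof/2)·T/N.

```python
# ASSUMPTIONS: per-atom normalization of energies (lj-units default) and
# dof = 3N - 3 in the default temperature compute.
N = 512000
dof = 3 * N - 3

# (Step, Temp, E_pair, TotEng) copied from the thermo output.
rows = [
    (0, 1.44, -6.7733683, -4.6133726),
    (10, 1.1253136, -6.3000033, -4.6120362),
    (800, 0.71124955, -5.6878049, -4.6209327),  # end of the second run
]

for step, temp, e_pair, tot_eng in rows:
    ke_per_atom = 0.5 * dof * temp / N      # kinetic energy per atom
    predicted = e_pair + ke_per_atom
    print(step, f"{predicted:.7f}", tot_eng)
```

Because the inputs are themselves rounded to the printed precision, agreement is expected only to about 1e-7.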
Loop time of 0.0240014 on 16 procs for 10 steps with 512000 atoms

Performance: 179989.164 tau/day, 416.642 timesteps/s
99.5% CPU use with 16 MPI tasks x 1 OpenMP threads
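The "Performance:" line follows from the loop time and the time step: timesteps/s = steps / loop time, and tau/day = timesteps/s × dt × 86400. A small Python sketch applies this to both runs in this log. The printed loop times are rounded, so the last digits can differ slightly from the logged values.

```python
SECONDS_PER_DAY = 86400
dt = 0.005  # "Time step" from the Verlet setup

runs = {
    "run 1": (10, 0.0240014),   # (steps, loop time in s)
    "run 2": (790, 1.8159),
}

results = {}
for name, (steps, loop_time) in runs.items():
    steps_per_s = steps / loop_time
    tau_per_day = steps_per_s * dt * SECONDS_PER_DAY
    results[name] = (steps_per_s, tau_per_day)
    print(f"{name}: {steps_per_s:.3f} timesteps/s, {tau_per_day:.3f} tau/day")
```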

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.0094606  | 0.011985   | 0.014395   |   1.3 | 49.93
Neigh   | 0          | 0          | 0          |   0.0 |  0.00
Comm    | 0.0056568  | 0.0085354  | 0.010583   |   1.4 | 35.56
Output  | 9.9233e-05 | 0.00086939 | 0.0014893  |   0.0 |  3.62
Modify  | 0.0017854  | 0.0019166  | 0.0021452  |   0.2 |  7.99
Other   |            | 0.0006954  |            |       |  2.90
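A Python sketch showing how the %total column relates to the avg time column: %total = avg time / loop time × 100. It assumes that "Other" is the part of the loop time not attributed to any named section. Under that assumption, the recovered value matches the logged 0.0006954 to within rounding.

```python
# ASSUMPTION: "Other" = loop time minus the sum of the named sections.
loop_time = 0.0240014
avg = {
    "Pair": 0.011985,
    "Neigh": 0.0,
    "Comm": 0.0085354,
    "Output": 0.00086939,
    "Modify": 0.0019166,
}

other = loop_time - sum(avg.values())
avg["Other"] = other

pct = {name: 100 * t / loop_time for name, t in avg.items()}
for name, p in pct.items():
    print(f"{name:7s} {p:6.2f}")
```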

Nlocal:          32000 ave       32800 max       31200 min
Histogram: 4 0 0 0 0 8 0 0 0 4
Nghost:          19911 ave       20711 max       19111 min
Histogram: 4 0 0 0 0 8 0 0 0 4
Neighs:              0 ave           0 max           0 min
Histogram: 16 0 0 0 0 0 0 0 0 0
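A Python sketch of a 10-bin histogram like the ones printed above. It assumes equal-width bins spanning [min, max], with the max value placed in the last bin; it is not taken from the LAMMPS source. The 16 per-rank Nlocal values are not logged. They are inferred: 4 ranks at the min, 4 at the max, and 8 in bin 5, which covers [32000, 32160). The average is 32000, and the min and max ranks already sum to 256000. The remaining 8 ranks must therefore hold 256000 atoms between them, each at least 32000, so each holds exactly 32000.

```python
def histogram(values, nbins=10):
    """Count values into nbins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = hi - lo
    counts = [0] * nbins
    for v in values:
        if width == 0:
            b = 0                       # all values equal: everything in bin 0
        else:
            b = int((v - lo) / width * nbins)
            b = min(b, nbins - 1)       # the max value falls in the top bin
        counts[b] += 1
    return counts

# Inferred per-rank atom counts for the first run (see lead-in).
nlocal = [31200] * 4 + [32000] * 8 + [32800] * 4
print(sum(nlocal), sum(nlocal) / len(nlocal), histogram(nlocal))
```

With all values equal, the sketch puts every rank in bin 0. This matches the "Neighs" histogram, where all 16 ranks report 0.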

Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 0
Dangerous builds not checked


---------------------------------------------------------------------
      Device Time Info (average):
---------------------------------------------------------------------
Data Transfer:   0.0025 s.
Neighbor copy:   0.0001 s.
Neighbor build:  0.0002 s.
Force calc:      0.0043 s.
Device Overhead: 0.0069 s.
Average split:   1.0000.
Lanes / atom:    4.
Vector width:    32.
Max Mem / Proc:  27.82 MB.
CPU Neighbor:    0.0099 s.
CPU Cast/Pack:   0.0091 s.
CPU Driver_Time: 0.0003 s.
CPU Idle_Time:   0.0038 s.
---------------------------------------------------------------------


--------------------------------------------------------------------------
- Using acceleration for lj/cut:
-  with 8 proc(s) per device.
-  Horizontal vector operations: ENABLED
-  Shared memory system: No
--------------------------------------------------------------------------
Device 0: NVIDIA L40, 142 CUs, 41/44 GB, 2.5 GHZ (Mixed Precision)
Device 1: NVIDIA L40, 142 CUs, 2.5 GHZ (Mixed Precision)
--------------------------------------------------------------------------

Initializing Device and compiling on process 0...Done.
Initializing Devices 0-1 on core 0...Done.
Initializing Devices 0-1 on core 1...Done.
Initializing Devices 0-1 on core 2...Done.
Initializing Devices 0-1 on core 3...Done.
Initializing Devices 0-1 on core 4...Done.
Initializing Devices 0-1 on core 5...Done.
Initializing Devices 0-1 on core 6...Done.
Initializing Devices 0-1 on core 7...Done.

Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 10
  Time step     : 0.005
Per MPI rank memory allocation (min/avg/max) = 9.538 | 9.558 | 9.578 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press
        10   1.1253136     -6.3000033      0             -4.6120362     -2.560464
       800   0.71124955    -5.6878049      0             -4.6209327      0.64033881
Loop time of 1.8159 on 16 procs for 790 steps with 512000 atoms

Performance: 187939.953 tau/day, 435.046 timesteps/s
99.6% CPU use with 16 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.91185    | 1.0168     | 1.1917     |   7.4 | 56.00
Neigh   | 1.071e-05  | 1.4111e-05 | 2.0134e-05 |   0.0 |  0.00
Comm    | 0.44571    | 0.62733    | 0.74029    |   9.9 | 34.55
Output  | 0.00013866 | 0.0008704  | 0.0016015  |   0.0 |  0.05
Modify  | 0.11247    | 0.11895    | 0.12982    |   1.5 |  6.55
Other   |            | 0.05191    |            |       |  2.86

Nlocal:          32000 ave       32094 max       31896 min
Histogram: 1 0 1 3 3 2 2 2 1 1
Nghost:        19047.5 ave       19135 max       18904 min
Histogram: 1 1 1 0 2 2 2 1 2 4
Neighs:              0 ave           0 max           0 min
Histogram: 16 0 0 0 0 0 0 0 0 0

Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 39
Dangerous builds not checked


---------------------------------------------------------------------
      Device Time Info (average):
---------------------------------------------------------------------
Data Transfer:   0.1505 s.
Neighbor copy:   0.0014 s.
Neighbor build:  0.0239 s.
Force calc:      0.3228 s.
Device Overhead: 0.4906 s.
Average split:   1.0000.
Lanes / atom:    4.
Vector width:    32.
Max Mem / Proc:  25.91 MB.
CPU Neighbor:    0.1181 s.
CPU Cast/Pack:   0.5317 s.
CPU Driver_Time: 0.0135 s.
CPU Idle_Time:   0.3476 s.
---------------------------------------------------------------------

Total wall time: 0:00:06

------------------------------------------------------------
Sender: LSF System <lsfadmin@gpu14>
Subject: Job 326114: <#!/bin/bash;#BSUB -n 16;#BSUB -R "span[hosts=1]";#BSUB -W 5;##BSUB -q new_gpu;#BSUB -q gpu;#BSUB -R "select[l40]";#BSUB -gpu "num=2";#BSUB -o out_l40.%J;#BSUB -e err_l40.%J;. /usr/share/Modules/init/bash;module load lammps/2022Jun23/intel/gpu;mpirun lmp_hazel -sf gpu -in in.intel.lj> in cluster <Hazel> Done

Job <#!/bin/bash;#BSUB -n 16;#BSUB -R "span[hosts=1]";#BSUB -W 5;##BSUB -q new_gpu;#BSUB -q gpu;#BSUB -R "select[l40]";#BSUB -gpu "num=2";#BSUB -o out_l40.%J;#BSUB -e err_l40.%J;. /usr/share/Modules/init/bash;module load lammps/2022Jun23/intel/gpu;mpirun lmp_hazel -sf gpu -in in.intel.lj> was submitted from host <login03> by user <jpcranfo> in cluster <Hazel> at Thu May 23 22:59:10 2024
Job was executed on host(s) <16*gpu14>, in queue <gpu>, as user <jpcranfo> in cluster <Hazel> at Thu May 23 22:59:11 2024
</home/jpcranfo> was used as the home directory.
</home/jpcranfo/share/benchmarks/lammps> was used as the working directory.
Started at Thu May 23 22:59:11 2024
Terminated at Thu May 23 22:59:20 2024
Results reported at Thu May 23 22:59:20 2024

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#!/bin/bash
#BSUB -n 16
#BSUB -R "span[hosts=1]"
#BSUB -W 5
##BSUB -q new_gpu
#BSUB -q gpu
#BSUB -R "select[l40]"
#BSUB -gpu "num=2"
#BSUB -o out_l40.%J
#BSUB -e err_l40.%J
. /usr/share/Modules/init/bash
module load lammps/2022Jun23/intel/gpu
mpirun lmp_hazel -sf gpu -in in.intel.lj

------------------------------------------------------------
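A Python sketch relating the LSF request in the job script to the LAMMPS output. It assumes that mpirun starts one MPI rank per slot requested with "#BSUB -n 16" and that the GPU package divides those ranks evenly across the 2 GPUs requested with '#BSUB -gpu "num=2"'. Under that assumption, the result agrees with "with 8 proc(s) per device." in the log.

```python
# ASSUMPTION: one MPI rank per LSF slot, ranks split evenly across GPUs.
mpi_ranks = 16      # #BSUB -n 16
gpus = 2            # #BSUB -gpu "num=2"
grid = (4, 2, 2)    # "4 by 2 by 2 MPI processor grid"

# The processor grid must account for every rank.
assert grid[0] * grid[1] * grid[2] == mpi_ranks

procs_per_device = mpi_ranks // gpus
print(f"{procs_per_device} proc(s) per device")
```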

Successfully completed.

Resource usage summary:

    CPU time :                                   112.00 sec.
    Max Memory :                                 2 GB
    Average Memory :                             2.00 GB
    Total Requested Memory :                     -
    Delta Memory :                               -
    Max Swap :                                   -
    Max Processes :                              7
    Max Threads :                                9
    Run time :                                   37 sec.
    Turnaround time :                            10 sec.

The output (if any) is above this job summary.


PS:

Read file <err_l40.326114> for stderr output of this job.