LAMMPS (23 Jun 2022 - Update 3)
  using 1 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962
Created orthogonal box = (0 0 0) to (134.3677 67.183848 67.183848)
  4 by 2 by 2 MPI processor grid
Created 512000 atoms
  using lattice units in orthogonal box = (0 0 0) to (134.3677 67.183848 67.183848)
  create_atoms CPU = 0.005 seconds

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:
- GPU package (short-range, long-range and three-body potentials):
The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE


--------------------------------------------------------------------------
- Using acceleration for lj/cut:
-  with 4 proc(s) per device.
-  Horizontal vector operations: ENABLED
-  Shared memory system: No
--------------------------------------------------------------------------
Device 0: NVIDIA A30, 56 CUs, 23/23 GB, 1.4 GHZ (Mixed Precision)
Device 1: NVIDIA A30, 56 CUs, 1.4 GHZ (Mixed Precision)
--------------------------------------------------------------------------

Initializing Device and compiling on process 0...Done.
Initializing Devices 0-1 on core 0...Done.
Initializing Devices 0-1 on core 1...Done.
Initializing Devices 0-1 on core 2...Done.
Initializing Devices 0-1 on core 3...Done.

Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 0.005
Per MPI rank memory allocation (min/avg/max) = 9.532 | 9.552 | 9.589 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press     
         0   1.44          -6.7733683      0             -4.6133726     -5.0196713    
        10   1.1253136     -6.3000033      0             -4.6120362     -2.560464     
Loop time of 0.0253179 on 16 procs for 10 steps with 512000 atoms

Performance: 170630.095 tau/day, 394.977 timesteps/s
96.5% CPU use with 16 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.0086808  | 0.010791   | 0.01325    |   1.3 | 42.62
Neigh   | 0          | 0          | 0          |   0.0 |  0.00
Comm    | 0.0088365  | 0.01075    | 0.012877   |   1.2 | 42.46
Output  | 0.00016458 | 0.00069017 | 0.0011771  |   0.0 |  2.73
Modify  | 0.0020755  | 0.0022418  | 0.0024185  |   0.2 |  8.85
Other   |            | 0.000845   |            |       |  3.34

Nlocal:          32000 ave       32800 max       31200 min
Histogram: 4 0 0 0 0 8 0 0 0 4
Nghost:          19911 ave       20711 max       19111 min
Histogram: 4 0 0 0 0 8 0 0 0 4
Neighs:              0 ave           0 max           0 min
Histogram: 16 0 0 0 0 0 0 0 0 0

Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 0
Dangerous builds not checked


---------------------------------------------------------------------
      Device Time Info (average): 
---------------------------------------------------------------------
Data Transfer:   0.0030 s.
Neighbor copy:   0.0004 s.
Neighbor build:  0.0010 s.
Force calc:      0.0069 s.
Device Overhead: 0.0095 s.
Average split:   1.0000.
Lanes / atom:    4.
Vector width:    32.
Max Mem / Proc:  27.82 MB.
CPU Neighbor:    0.0042 s.
CPU Cast/Pack:   0.0058 s.
CPU Driver_Time: 0.0002 s.
CPU Idle_Time:   0.0052 s.
---------------------------------------------------------------------


--------------------------------------------------------------------------
- Using acceleration for lj/cut:
-  with 4 proc(s) per device.
-  Horizontal vector operations: ENABLED
-  Shared memory system: No
--------------------------------------------------------------------------
Device 0: NVIDIA A30, 56 CUs, 23/23 GB, 1.4 GHZ (Mixed Precision)
Device 1: NVIDIA A30, 56 CUs, 1.4 GHZ (Mixed Precision)
--------------------------------------------------------------------------

Initializing Device and compiling on process 0...Done.
Initializing Devices 0-1 on core 0...Done.
Initializing Devices 0-1 on core 1...Done.
Initializing Devices 0-1 on core 2...Done.
Initializing Devices 0-1 on core 3...Done.

Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 10
  Time step     : 0.005
Per MPI rank memory allocation (min/avg/max) = 9.538 | 9.558 | 9.578 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press     
        10   1.1253136     -6.3000033      0             -4.6120362     -2.560464     
       800   0.71124955    -5.6878049      0             -4.6209327      0.64033881   
Loop time of 1.95021 on 16 procs for 790 steps with 512000 atoms

Performance: 174996.773 tau/day, 405.085 timesteps/s
94.6% CPU use with 16 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.7282     | 0.89292    | 0.98862    |   7.1 | 45.79
Neigh   | 5.3e-05    | 6.4266e-05 | 7.3575e-05 |   0.0 |  0.00
Comm    | 0.71999    | 0.8092     | 0.98051    |   7.5 | 41.49
Output  | 0.00018794 | 0.00065662 | 0.0010328  |   0.0 |  0.03
Modify  | 0.16626    | 0.17215    | 0.18144    |   1.0 |  8.83
Other   |            | 0.07522    |            |       |  3.86

Nlocal:          32000 ave       32094 max       31896 min
Histogram: 1 0 1 3 3 2 2 2 1 1
Nghost:        19047.5 ave       19135 max       18904 min
Histogram: 1 1 1 0 2 2 2 1 2 4
Neighs:              0 ave           0 max           0 min
Histogram: 16 0 0 0 0 0 0 0 0 0

Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 39
Dangerous builds not checked


---------------------------------------------------------------------
      Device Time Info (average): 
---------------------------------------------------------------------
Data Transfer:   0.2171 s.
Neighbor copy:   0.0026 s.
Neighbor build:  0.0352 s.
Force calc:      0.3598 s.
Device Overhead: 0.6766 s.
Average split:   1.0000.
Lanes / atom:    4.
Vector width:    32.
Max Mem / Proc:  25.91 MB.
CPU Neighbor:    0.1415 s.
CPU Cast/Pack:   0.3952 s.
CPU Driver_Time: 0.0110 s.
CPU Idle_Time:   0.3242 s.
---------------------------------------------------------------------

Total wall time: 0:00:04

------------------------------------------------------------
Sender: LSF System <lsfadmin@gpu08>
Subject: Job 98388: <#!/bin/bash;#BSUB -n 16;#BSUB -R "span[ptile=8]";#BSUB -W 5;#BSUB -q gpu;#BSUB -R "select[a30]";#BSUB -gpu "num=2";#BSUB -o out.%J;#BSUB -e err.%J;module load lammps/2022Jun23/intel/gpu;mpirun lmp_hazel -sf gpu -in in.intel.lj> in cluster <Hazel> Done

Job <#!/bin/bash;#BSUB -n 16;#BSUB -R "span[ptile=8]";#BSUB -W 5;#BSUB -q gpu;#BSUB -R "select[a30]";#BSUB -gpu "num=2";#BSUB -o out.%J;#BSUB -e err.%J;module load lammps/2022Jun23/intel/gpu;mpirun lmp_hazel -sf gpu -in in.intel.lj> was submitted from host <login03> by user <dshah8> in cluster <Hazel> at Wed Dec 20 14:00:28 2023
Job was executed on host(s) <8*gpu08>, in queue <gpu>, as user <dshah8> in cluster <Hazel> at Wed Dec 20 14:00:30 2023
                            <8*gpu07>
</home/dshah8> was used as the home directory.
</share/hpc-support/dshah8/lammps> was used as the working directory.
Started at Wed Dec 20 14:00:30 2023
Terminated at Wed Dec 20 14:00:39 2023
Results reported at Wed Dec 20 14:00:39 2023

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#!/bin/bash
#BSUB -n 16
#BSUB -R "span[ptile=8]"
#BSUB -W 5
#BSUB -q gpu
#BSUB -R "select[a30]"
#BSUB -gpu "num=2"
#BSUB -o out.%J
#BSUB -e err.%J
module load lammps/2022Jun23/intel/gpu
mpirun lmp_hazel -sf gpu -in in.intel.lj

------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time :                                   77.00 sec.
    Max Memory :                                 1 GB
    Average Memory :                             1.00 GB
    Total Requested Memory :                     -
    Delta Memory :                               -
    Max Swap :                                   -
    Max Processes :                              7
    Max Threads :                                9
    Run time :                                   16 sec.
    Turnaround time :                            11 sec.

The output (if any) is above this job summary.


PS:

Read file <err.98388> for stderr output of this job.