
Jobsub ID 14068.4@dunegpschedd02.fnal.gov

Jobsub ID: 14068.4@dunegpschedd02.fnal.gov
Workflow ID: 253
Stage ID: 1
User name: ichong@fnal.gov
HTCondor Group: group_dune
Requested:
  Processors: 1
  GPU: No
  RSS bytes: 4194304000 (4000 MiB)
  Wall seconds limit: 80000 (22 hours)
Submitted time: 2025-08-02 23:05:47
Site: NL_NIKHEF
Entry: VIRGO_NL_NIKHEF_klomp
Last heartbeat: 2025-08-02 23:11:02
From worker node:
  Hostname: wn-pep-004.farm.nikhef.nl
  cpuinfo: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
  OS release: Scientific Linux release 7.9 (Nitrogen)
  Processors: 1
  RSS bytes: 4194304000 (4000 MiB)
  Wall seconds limit: 129600 (36 hours)
  GPU:
  Inner Apptainer?: True
Job state: jobscript_error
Started: 2025-08-02 23:06:31
Input files: fardet-hd:atmnu_max_weighted_randompolicy_dune10kt_1x2x6_74505925_780_20231202T162805Z_gen_g4_detsim_hitreco__20240508T045723Z_reco2.root
Jobscript:
  Exit code: 1
  Real time: 0m (0s)
  CPU time: 0m (0s = 0%)
  Max RSS bytes: 0 (0 MiB)
Outputting started:
Output files:
Finished: 2025-08-02 23:11:02
Saved logs: justin-logs:14068.4-dunegpschedd02.fnal.gov.logs.tgz

Jobscript log (last 10,000 characters)

78093 at 03-Aug-2025 01:10:45 CEST
Analysing.

Warning: there were 14 reconstructed PFParticle clusters; only the first 10 being stored in tree
Warning: there was no track found for track-like PFParticle with ID 14
Begin processing the 94th record. run: 74505925 subRun: 1 event: 78094 at 03-Aug-2025 01:10:46 CEST
Analysing.

Warning: there was no track found for track-like PFParticle with ID 1
Begin processing the 95th record. run: 74505925 subRun: 1 event: 78095 at 03-Aug-2025 01:10:46 CEST
Analysing.

Warning: there was no track found for track-like PFParticle with ID 2
Begin processing the 96th record. run: 74505925 subRun: 1 event: 78096 at 03-Aug-2025 01:10:46 CEST
Analysing.

Warning: there was no track found for track-like PFParticle with ID 2
Begin processing the 97th record. run: 74505925 subRun: 1 event: 78097 at 03-Aug-2025 01:10:46 CEST
Analysing.

Warning: there was no track found for track-like PFParticle with ID 3
Begin processing the 98th record. run: 74505925 subRun: 1 event: 78098 at 03-Aug-2025 01:10:46 CEST
Analysing.

Warning: there were 12 reconstructed PFParticle clusters; only the first 10 being stored in tree
Warning: there was no track found for track-like PFParticle with ID 4
Begin processing the 99th record. run: 74505925 subRun: 1 event: 78099 at 03-Aug-2025 01:10:46 CEST
Analysing.

Warning: there was no track found for track-like PFParticle with ID 3
Begin processing the 100th record. run: 74505925 subRun: 1 event: 78100 at 03-Aug-2025 01:10:46 CEST
Analysing.

Warning: there was no track found for track-like PFParticle with ID 8
03-Aug-2025 01:10:47 CEST  Closed input file "root://otter12.grid.surfsara.nl:21094/pnfs/grid.sara.nl/data/dune/disk/RSE/fardet-hd/69/c6/atmnu_max_weighted_randompolicy_dune10kt_1x2x6_74505925_780_20231202T162805Z_gen_g4_detsim_hitreco__20240508T045723Z_reco2.root"

========================================================================================================================
TimeTracker printout (sec)                Min           Avg           Max         Median          RMS         nEvts   
========================================================================================================================
Full event                             0.0102035     0.0544847     0.747147      0.0217726     0.108016        100    
------------------------------------------------------------------------------------------------------------------------
source:RootInput(read)                0.000514367    0.0009436    0.00137777    0.000926918   0.000196338      100    
end_path:analysistree:AnalysisTree    0.00938894     0.053435      0.746177      0.0207697     0.108002        100    
========================================================================================================================

====================================================================================================
MemoryTracker summary (base-10 MB units used)

  Peak virtual memory usage (VmPeak)  : 7395.96 MB
  Peak resident set size usage (VmHWM): 1042.48 MB
====================================================================================================
Art has completed and will exit with status 0.
=== End last 100 lines of third lar log file ===
=== Start last 100 lines of lar log file ===
* Association Name:  spShowerAssociationsbase * Instance Name:  * Type: art::Assns<recob::Shower, recob::SpacePoint, void>*         *
*************************************************************************************************************************************
03-Aug-2025 01:07:26 CEST  Initiating request to open input file "root://otter12.grid.surfsara.nl:21094/pnfs/grid.sara.nl/data/dune/disk/RSE/fardet-hd/69/c6/atmnu_max_weighted_randompolicy_dune10kt_1x2x6_74505925_780_20231202T162805Z_gen_g4_detsim_hitreco__20240508T045723Z_reco2.root"
03-Aug-2025 01:08:15 CEST  Opened input file "root://otter12.grid.surfsara.nl:21094/pnfs/grid.sara.nl/data/dune/disk/RSE/fardet-hd/69/c6/atmnu_max_weighted_randompolicy_dune10kt_1x2x6_74505925_780_20231202T162805Z_gen_g4_detsim_hitreco__20240508T045723Z_reco2.root"
Begin processing the 1st record. run: 74505925 subRun: 1 event: 78001 at 03-Aug-2025 01:08:16 CEST
0 X, 0 U, 0 V bad channels
Finding XUV coincidences...
C:0 T:2 8 XUs and 5 XVs -> 4 XUVs
C:0 T:6 16 XUs and 22 XVs -> 12 XUVs
C:0 T:7 61 XUs and 41 XVs -> 21 XUVs
C:0 T:11 5937 XUs and 16734 XVs -> 1770 XUVs
C:0 T:12 4 XUs and 4 XVs -> 3 XUVs
C:0 T:13 4 XUs and 4 XVs -> 3 XUVs
1813 XUVs total
470 collection wire objects
1813 potential space points
Neighbour search...
201755 tests to find 73240 neighbours
Iterating with no regularization...
Begin: 5.28301e+08
0 3.22339e+08
1 2.98629e+08
2 2.95879e+08
3 2.95125e+08
4 2.94804e+08
5 2.94641e+08
Now with regularization...
Begin: 2.57717e+08
0 2.57202e+08
1 2.57011e+08
---MC-PARTICLE-MONITORING-----------------------------------------------------------------------

BeamNeutrinos: 

--Primary 0, MCPDG -11, Energy 2.32269, Dist. 56.7617, nMCHits 1787 (406, 871, 510)
MCPDG -11, Energy 2.32269, Dist. 56.7617, nMCHits 1787 (406, 871, 510)
------------------------------------------------------------------------------------------------
Loaded the TorchScript model '/cvmfs/dune.osgstorage.org/pnfs/fnal.gov/usr/dune/persistent/stash//PandoraNetworkData/PandoraNet_Vertex_DUNEFD_HD_Atmos_1_U_v04_03_00.pt'
Loaded the TorchScript model '/cvmfs/dune.osgstorage.org/pnfs/fnal.gov/usr/dune/persistent/stash//PandoraNetworkData/PandoraNet_Vertex_DUNEFD_HD_Atmos_1_V_v04_03_00.pt'
Loaded the TorchScript model '/cvmfs/dune.osgstorage.org/pnfs/fnal.gov/usr/dune/persistent/stash//PandoraNetworkData/PandoraNet_Vertex_DUNEFD_HD_Atmos_1_W_v04_03_00.pt'
Operating in training mode.
The eid is 0
Graph saved to training1_CaloHitListW_graph.data
Size of file training1_CaloHitListW_graph.data is 14036 bytes.
The eid is 0
Graph saved to training1_CaloHitListU_graph.data
Size of file training1_CaloHitListU_graph.data is 12124 bytes.
The eid is 0
Graph saved to training1_CaloHitListV_graph.data
Size of file training1_CaloHitListV_graph.data is 22844 bytes.
Operating in inference mode.
Operating in training mode.
The eid is -1
Graph saved to training2_CaloHitListW_graph.data
Size of file training2_CaloHitListW_graph.data is 14036 bytes.
The eid is -1
Graph saved to training2_CaloHitListU_graph.data
Size of file training2_CaloHitListU_graph.data is 12124 bytes.
The eid is -1
Graph saved to training2_CaloHitListV_graph.data
Size of file training2_CaloHitListV_graph.data is 22844 bytes.
Boundary wire vector sizes: 496, 935, 573
minwire 0: 237
minwire 1: 1237
minwire 2: 178
Used alternate method to get min and max tdcs due to vertex determination failure: 0, 499
Used alternate method to get min and max tdcs due to vertex determination failure: 0, 499
Used alternate method to get min and max tdcs due to vertex determination failure: 0, 499
03-Aug-2025 01:08:19 CEST  Opened output file with pattern "atmnu_max_weighted_randompolicy_dune10kt_1x2x6_74505925_780_20231202T162805Z_gen_g4_detsim_hitreco__20240508T045723Z_reco2_graph_2025-08-02T_230637Z.root"
03-Aug-2025 01:08:28 CEST  Closed input file "root://otter12.grid.surfsara.nl:21094/pnfs/grid.sara.nl/data/dune/disk/RSE/fardet-hd/69/c6/atmnu_max_weighted_randompolicy_dune10kt_1x2x6_74505925_780_20231202T162805Z_gen_g4_detsim_hitreco__20240508T045723Z_reco2.root"
Malformed TimeTracker database.  The TimeEvent table is empty, but
the TimeModule table is not.  This can happen if an exception has
been thrown from a module while processing the first event.  Any
saved database file is suspect and should not be used.

====================================================================================================
MemoryTracker summary (base-10 MB units used)

  Peak virtual memory usage (VmPeak)  : 8586.35 MB
  Peak resident set size usage (VmHWM): 1679.56 MB
====================================================================================================
%MSG-s ArtException:  PostEndJob 03-Aug-2025 01:08:29 CEST ModuleEndJob
---- EventProcessorFailure BEGIN
  EventProcessor: an exception occurred during current event processing
  ---- ScheduleExecutionFailure BEGIN
    Path: ProcessingStopped.
    ---- BadAlloc BEGIN
      A bad_alloc exception was thrown while processing module CVNEvaluator/cvneva run: 74505925 subRun: 1 event: 78001
      The job has probably exhausted the virtual memory available to the process.
    ---- BadAlloc END
    Exception going through path reco
  ---- ScheduleExecutionFailure END
---- EventProcessorFailure END
---- FatalRootError BEGIN
  Fatal Root Error: TTree::SetEntries
  Tree branches have different numbers of entries, eg EventAuxiliary has 0 entries while recob::PCAxisrecob::Showervoidart::Assns_pandoraShower__Reco2. has 100 entries.
  ROOT severity: 2000
---- FatalRootError END
%MSG
Art has completed and will exit with status 1.
=== End last 100 lines of lar log file ===
=== Generated output files ===
14068.4_dunegpschedd02.fnal.gov.logs.tgz
RootOutput-4d43-3cc3-7d4b-30d7.root
ana_tree_hd.root
analysiseid.root
atmnu_max_weighted_randompolicy_dune10kt_1x2x6_74505925_780_20231202T162805Z_gen_g4_detsim_hitreco__20240508T045723Z_reco2_graph_2025-08-02T_230637Z.log
debugprod.log
jobscript.log
reco_hist.root
secondary_atmnu_max_weighted_randompolicy_dune10kt_1x2x6_74505925_780_20231202T162805Z_gen_g4_detsim_hitreco__20240508T045723Z_reco2_graph_2025-08-02T_230637Z.log
third_atmnu_max_weighted_randompolicy_dune10kt_1x2x6_74505925_780_20231202T162805Z_gen_g4_detsim_hitreco__20240508T045723Z_reco2_graph_2025-08-02T_230637Z.log
training1_CaloHitListU.csv
training1_CaloHitListU_graph.data
training1_CaloHitListV.csv
training1_CaloHitListV_graph.data
training1_CaloHitListW.csv
training1_CaloHitListW_graph.data
training2_CaloHitListU.csv
training2_CaloHitListU_graph.data
training2_CaloHitListV.csv
training2_CaloHitListV_graph.data
training2_CaloHitListW.csv
training2_CaloHitListW_graph.data
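
The first lar log above ends with nested art exception blocks (`---- <Category> BEGIN` / `END`), a MemoryTracker summary, and an exit status line. The following is a minimal Python sketch of how those three pieces might be pulled out of such a log for triage. The excerpt is copied from the log above; the function name `summarize` and the comparison against the requested RSS are illustrative assumptions, not part of justIN or art. Note that the requested RSS and the peak virtual memory are different quantities, so the comparison is only indicative.

```python
import re

# Excerpt copied from the lar log above. In practice the text would be read
# from the .log file inside the saved logs tarball (hypothetical workflow).
LOG_EXCERPT = """\
  Peak virtual memory usage (VmPeak)  : 8586.35 MB
  Peak resident set size usage (VmHWM): 1679.56 MB
---- EventProcessorFailure BEGIN
  ---- ScheduleExecutionFailure BEGIN
    ---- BadAlloc BEGIN
      A bad_alloc exception was thrown while processing module CVNEvaluator/cvneva run: 74505925 subRun: 1 event: 78001
    ---- BadAlloc END
  ---- ScheduleExecutionFailure END
---- EventProcessorFailure END
---- FatalRootError BEGIN
  Fatal Root Error: TTree::SetEntries
---- FatalRootError END
Art has completed and will exit with status 1.
"""

REQUESTED_RSS_BYTES = 4194304000  # "RSS bytes" value from the job table


def summarize(log_text):
    """Return exception categories, memory peaks (MB) and the art exit status."""
    # Every "---- Name BEGIN" line opens one exception block, in nesting order.
    categories = re.findall(r"^\s*---- (\w+) BEGIN", log_text, flags=re.M)
    # MemoryTracker reports peaks in base-10 MB.
    peaks = {
        name: float(value)
        for name, value in re.findall(
            r"\((VmPeak|VmHWM)\)\s*:\s*([\d.]+) MB", log_text
        )
    }
    status = re.search(r"exit with status (\d+)", log_text)
    return {
        "categories": categories,
        "peaks_mb": peaks,
        "exit_status": int(status.group(1)) if status else None,
    }


summary = summarize(LOG_EXCERPT)
requested_mb = REQUESTED_RSS_BYTES / 1e6  # base-10 MB, to match MemoryTracker

print("exception categories:", summary["categories"])
print("memory peaks (MB):", summary["peaks_mb"])
print("requested RSS (MB):", requested_mb)
print("art exit status:", summary["exit_status"])
```

For this job the sketch reports a `BadAlloc` raised inside `CVNEvaluator/cvneva` on the first event, followed by a `FatalRootError` from `TTree::SetEntries`; the second error is consistent with the output file being closed after the first event failed, though the log alone does not prove that.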
justIN time: 2025-08-04 14:18:42 UTC       justIN version: 01.04.00