Jobsub ID 20209.0@dunegpschedd01.fnal.gov

Jobsub ID: 20209.0@dunegpschedd01.fnal.gov
Workflow ID: 221
Stage ID: 1
User name: ichong@fnal.gov
HTCondor Group: group_dune
Requested:
  Processors: 1
  GPU: No
  RSS bytes: 4194304000 (4000 MiB)
  Wall seconds limit: 80000 (22 hours)
Submitted time: 2025-08-01 14:12:08
Site: NL_NIKHEF
Entry: VIRGO_NL_NIKHEF_brug
Last heartbeat: 2025-08-01 14:22:32
From worker node:
  Hostname: wn-pep-007.farm.nikhef.nl
  cpuinfo: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
  OS release: Scientific Linux release 7.9 (Nitrogen)
  Processors: 1
  RSS bytes: 4194304000 (4000 MiB)
  Wall seconds limit: 129600 (36 hours)
  GPU:
  Inner Apptainer?: True
Job state: jobscript_error
Started: 2025-08-01 14:12:31
Input files: fardet-hd:atmnu_max_weighted_randompolicy_dune10kt_1x2x6_6452759_211_20231207T071816Z_gen_g4_detsim_hitreco__20240510T062815Z_reco2.root
Jobscript:
  Exit code: 1
  Real time: 0m (0s)
  CPU time: 0m (0s = 0%)
  Max RSS bytes: 0 (0 MiB)
Outputting started:
Output files:
Finished: 2025-08-01 14:22:27
Saved logs: justin-logs:20209.0-dunegpschedd01.fnal.gov.logs.tgz

Jobscript log (last 10,000 characters)

the 91st record. run: 6452759 subRun: 1 event: 21191 at 01-Aug-2025 16:21:58 CEST
Analysing.

Begin processing the 92nd record. run: 6452759 subRun: 1 event: 21192 at 01-Aug-2025 16:21:59 CEST
Analysing.

Warning: there was no track found for track-like PFParticle with ID 1
Begin processing the 93rd record. run: 6452759 subRun: 1 event: 21193 at 01-Aug-2025 16:22:00 CEST
Analysing.

Warning: there was no track found for track-like PFParticle with ID 4
Begin processing the 94th record. run: 6452759 subRun: 1 event: 21194 at 01-Aug-2025 16:22:01 CEST
Analysing.

Warning: there was no track found for track-like PFParticle with ID 8
Begin processing the 95th record. run: 6452759 subRun: 1 event: 21195 at 01-Aug-2025 16:22:03 CEST
Analysing.

Warning: there was no track found for track-like PFParticle with ID 2
Begin processing the 96th record. run: 6452759 subRun: 1 event: 21196 at 01-Aug-2025 16:22:04 CEST
Analysing.

Warning: there were 13 reconstructed PFParticle daughters; only the first 10 being stored in tree
Warning: there was no track found for track-like PFParticle with ID 15
%MSG-e AnalysisTree:limits:  AnalysisTree:analysistree@BeginModule  01-Aug-2025 16:22:06 CEST run: 6452759 subRun: 1 event: 21196
the pandoraTrack track #0 has 3153 hits on calorimetry plane #2, only 2000 stored in tree
%MSG
%MSG-e AnalysisTree:limits:  AnalysisTree:analysistree@BeginModule  01-Aug-2025 16:22:06 CEST run: 6452759 subRun: 1 event: 21196
the pandoraTrack track #0 has 7842 hits on calorimetry plane #1, only 2000 stored in tree
%MSG
%MSG-e AnalysisTree:limits:  AnalysisTree:analysistree@BeginModule  01-Aug-2025 16:22:06 CEST run: 6452759 subRun: 1 event: 21196
the pandoraTrack track #0 has 5711 hits on calorimetry plane #0, only 2000 stored in tree
%MSG
Begin processing the 97th record. run: 6452759 subRun: 1 event: 21197 at 01-Aug-2025 16:22:07 CEST
Analysing.

Warning: there was no track found for track-like PFParticle with ID 2
Begin processing the 98th record. run: 6452759 subRun: 1 event: 21198 at 01-Aug-2025 16:22:08 CEST
Analysing.

Warning: there was no track found for track-like PFParticle with ID 7
Begin processing the 99th record. run: 6452759 subRun: 1 event: 21199 at 01-Aug-2025 16:22:09 CEST
Analysing.

Warning: there was no track found for track-like PFParticle with ID 1
Begin processing the 100th record. run: 6452759 subRun: 1 event: 21200 at 01-Aug-2025 16:22:10 CEST
Analysing.

Warning: there was no track found for track-like PFParticle with ID 1
01-Aug-2025 16:22:11 CEST  Closed input file "root://se1.farm.particle.cz:1094//dune/RSE/fardet-hd/2f/c8/atmnu_max_weighted_randompolicy_dune10kt_1x2x6_6452759_211_20231207T071816Z_gen_g4_detsim_hitreco__20240510T062815Z_reco2.root"

========================================================================================================================
TimeTracker printout (sec)                Min           Avg           Max         Median          RMS         nEvts   
========================================================================================================================
Full event                             0.444347      0.791564       1.45976      0.820127      0.157082        100    
------------------------------------------------------------------------------------------------------------------------
source:RootInput(read)                 0.0144858     0.0216776     0.0296994     0.0150781    0.00712998       100    
end_path:analysistree:AnalysisTree     0.429573      0.769755       1.44488      0.797612      0.157374        100    
========================================================================================================================

====================================================================================================
MemoryTracker summary (base-10 MB units used)

  Peak virtual memory usage (VmPeak)  : 7350.67 MB
  Peak resident set size usage (VmHWM): 998.146 MB
====================================================================================================
Art has completed and will exit with status 0.
=== End last 100 lines of third lar log file ===
=== Start last 100 lines of lar log file ===
9 2.36953e+07
10 2.36746e+07
Now with regularization...
Begin: 8.92595e+06
0 8.38586e+06
1 8.09279e+06
2 7.8737e+06
3 7.71372e+06
4 7.59218e+06
5 7.49977e+06
6 7.4236e+06
7 7.36248e+06
8 7.31038e+06
9 7.26782e+06
10 7.2349e+06
11 7.20588e+06
12 7.18228e+06
13 7.16354e+06
14 7.14761e+06
15 7.13359e+06
16 7.12107e+06
17 7.11011e+06
18 7.10102e+06
19 7.09315e+06
20 7.08625e+06
---MC-PARTICLE-MONITORING-----------------------------------------------------------------------

BeamNeutrinos: 

--Primary 0, MCPDG -11, Energy 1.39196, Dist. 25.1368, nMCHits 1664 (442, 646, 576)
MCPDG -11, Energy 1.39196, Dist. 25.1368, nMCHits 1664 (442, 646, 576)

--Primary 1, MCPDG -211, Energy 0.857537, Dist. 176.623, nMCHits 856 (195, 346, 315)
MCPDG -211, Energy 0.857537, Dist. 176.623, nMCHits 845 (190, 342, 313)
\_ MCPDG 2212, Energy 0.997541, Dist. 3.11248, nMCHits 11 (5, 4, 2)
------------------------------------------------------------------------------------------------
Loaded the TorchScript model '/cvmfs/dune.osgstorage.org/pnfs/fnal.gov/usr/dune/persistent/stash//PandoraNetworkData/PandoraNet_Vertex_DUNEFD_HD_Atmos_1_U_v04_03_00.pt'
Loaded the TorchScript model '/cvmfs/dune.osgstorage.org/pnfs/fnal.gov/usr/dune/persistent/stash//PandoraNetworkData/PandoraNet_Vertex_DUNEFD_HD_Atmos_1_V_v04_03_00.pt'
Loaded the TorchScript model '/cvmfs/dune.osgstorage.org/pnfs/fnal.gov/usr/dune/persistent/stash//PandoraNetworkData/PandoraNet_Vertex_DUNEFD_HD_Atmos_1_W_v04_03_00.pt'
Operating in training mode.
The eid is 0
Graph saved to training1_CaloHitListW_graph.data
Size of file training1_CaloHitListW_graph.data is 23148 bytes.
The eid is 0
Graph saved to training1_CaloHitListU_graph.data
Size of file training1_CaloHitListU_graph.data is 17452 bytes.
The eid is 0
Graph saved to training1_CaloHitListV_graph.data
Size of file training1_CaloHitListV_graph.data is 25932 bytes.
Operating in inference mode.
Operating in training mode.
The eid is -1
Graph saved to training2_CaloHitListW_graph.data
Size of file training2_CaloHitListW_graph.data is 23148 bytes.
The eid is -1
Graph saved to training2_CaloHitListU_graph.data
Size of file training2_CaloHitListU_graph.data is 17452 bytes.
The eid is -1
Graph saved to training2_CaloHitListV_graph.data
Size of file training2_CaloHitListV_graph.data is 25932 bytes.
Boundary wire vector sizes: 720, 1070, 956
minwire 0: 1586
minwire 1: 817
minwire 2: 1528
Used alternate method to get min and max tdcs due to vertex determination failure: 0, 499
Used alternate method to get min and max wires due to vertex determination failure: 2379, 2878
Used alternate method to get min and max tdcs due to vertex determination failure: 0, 499
Used alternate method to get min and max tdcs due to vertex determination failure: 0, 499
01-Aug-2025 16:13:21 CEST  Opened output file with pattern "atmnu_max_weighted_randompolicy_dune10kt_1x2x6_6452759_211_20231207T071816Z_gen_g4_detsim_hitreco__20240510T062815Z_reco2_graph_2025-08-01T_141235Z.root"
01-Aug-2025 16:19:42 CEST  Closed input file "root://se1.farm.particle.cz:1094//dune/RSE/fardet-hd/2f/c8/atmnu_max_weighted_randompolicy_dune10kt_1x2x6_6452759_211_20231207T071816Z_gen_g4_detsim_hitreco__20240510T062815Z_reco2.root"
Malformed TimeTracker database.  The TimeEvent table is empty, but
the TimeModule table is not.  This can happen if an exception has
been thrown from a module while processing the first event.  Any
saved database file is suspect and should not be used.

====================================================================================================
MemoryTracker summary (base-10 MB units used)

  Peak virtual memory usage (VmPeak)  : 8583.23 MB
  Peak resident set size usage (VmHWM): 1680.82 MB
====================================================================================================
%MSG-s ArtException:  PostEndJob 01-Aug-2025 16:19:42 CEST ModuleEndJob
---- EventProcessorFailure BEGIN
  EventProcessor: an exception occurred during current event processing
  ---- ScheduleExecutionFailure BEGIN
    Path: ProcessingStopped.
    ---- BadAlloc BEGIN
      A bad_alloc exception was thrown while processing module CVNEvaluator/cvneva run: 6452759 subRun: 1 event: 21101
      The job has probably exhausted the virtual memory available to the process.
    ---- BadAlloc END
    Exception going through path reco
  ---- ScheduleExecutionFailure END
---- EventProcessorFailure END
---- FatalRootError BEGIN
  Fatal Root Error: TTree::SetEntries
  Tree branches have different numbers of entries, eg EventAuxiliary has 0 entries while recob::PCAxisrecob::Showervoidart::Assns_pandoraShower__Reco2. has 100 entries.
  ROOT severity: 2000
---- FatalRootError END
%MSG
Art has completed and will exit with status 1.
=== End last 100 lines of lar log file ===
=== Generated output files ===
20209.0_dunegpschedd01.fnal.gov.logs.tgz
RootOutput-1be7-5b70-a3b6-c795.root
ana_tree_hd.root
analysiseid.root
atmnu_max_weighted_randompolicy_dune10kt_1x2x6_6452759_211_20231207T071816Z_gen_g4_detsim_hitreco__20240510T062815Z_reco2_graph_2025-08-01T_141235Z.log
debugprod.log
jobscript.log
reco_hist.root
secondary_atmnu_max_weighted_randompolicy_dune10kt_1x2x6_6452759_211_20231207T071816Z_gen_g4_detsim_hitreco__20240510T062815Z_reco2_graph_2025-08-01T_141235Z.log
third_atmnu_max_weighted_randompolicy_dune10kt_1x2x6_6452759_211_20231207T071816Z_gen_g4_detsim_hitreco__20240510T062815Z_reco2_graph_2025-08-01T_141235Z.log
training1_CaloHitListU.csv
training1_CaloHitListU_graph.data
training1_CaloHitListV.csv
training1_CaloHitListV_graph.data
training1_CaloHitListW.csv
training1_CaloHitListW_graph.data
training2_CaloHitListU.csv
training2_CaloHitListU_graph.data
training2_CaloHitListV.csv
training2_CaloHitListV_graph.data
training2_CaloHitListW.csv
training2_CaloHitListW_graph.data
justIN time: 2025-08-04 17:41:55 UTC       justIN version: 01.04.00