Jobsub ID 252812.158@dunegpschedd01.fnal.gov

Jobsub ID: 252812.158@dunegpschedd01.fnal.gov
Workflow ID: 10258
Stage ID: 1
User name: epennacc@fnal.gov
HTCondor Group: group_dune.prod_mcsim
Requested:
  Processors: 1
  GPU: No
  RSS bytes: 4194304000 (4000 MiB)
  Wall seconds limit: 80000 (22 hours)
Submitted time: 2025-11-16 13:33:58
Site: UK_Edinburgh
Entry: DUNE_UK_SGridECDF_ce1_multicore
Last heartbeat: 2025-11-16 14:25:42
From worker node:
  Hostname: node2b07.ecdf.ed.ac.uk
  cpuinfo: Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz
  OS release: Scientific Linux release 7.9 (Nitrogen)
  Processors: 1
  RSS bytes: 4194304000 (4000 MiB)
  Wall seconds limit: 171000 (47 hours)
  GPU:
  Inner Apptainer?: True
Job state: jobscript_error
Started: 2025-11-16 14:12:03
Input files:
  fardet-hd:prodgenie_nue_dune10kt_1x2x6_20251004T000249Z_gen_001507_g4_detsim.root
  fardet-hd:prodgenie_nue_dune10kt_1x2x6_20251003T223137Z_gen_000052_g4_detsim.root
  fardet-hd:prodgenie_nue_dune10kt_1x2x6_20251004T012023Z_gen_003506_g4_detsim.root
  fardet-hd:prodgenie_nue_dune10kt_1x2x6_20251003T231934Z_gen_000648_g4_detsim.root
Jobscript:
  Exit code: 1
  Real time: 0m (0s)
  CPU time: 0m (0s = 0%)
  Max RSS bytes: 0 (0 MiB)
Outputting started:
Output files:
Finished: 2025-11-16 14:25:42
Saved logs: justin-logs:252812.158-dunegpschedd01.fnal.gov.logs.tgz
List job events     (HTCondor job logs unavailable)
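
The parenthesised values in the job summary above (MiB and hours) can be reproduced from the raw byte and second counts. The following is a minimal sketch, not justIN's own code: the function names are hypothetical, and the use of truncation rather than rounding for hours is an assumption inferred only from 171000 seconds being shown as 47 hours.

```python
# Hypothetical helpers; justIN's actual formatting code is not shown on this page.

def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


def seconds_to_hours(seconds: int) -> int:
    """Convert seconds to whole hours, truncating (assumed, see lead-in)."""
    return seconds // 3600


print(bytes_to_mib(4194304000))   # RSS bytes on this page
print(seconds_to_hours(80000))    # requested wall seconds limit
print(seconds_to_hours(171000))   # worker node wall seconds limit
```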

Jobscript log (last 10,000 characters)

OR] Socket error
[2025-11-16 14:21:15.273542 +0000][Debug  ][AsyncSock         ] [ccdcacli447.in2p3.fr:30714.0] Closing the socket
[2025-11-16 14:21:15.273550 +0000][Debug  ][Poller            ] <[::ffff:192.41.105.40]:50404><--><[::ffff:134.158.209.218]:30714> Removing socket from the poller
[2025-11-16 14:21:15.273602 +0000][Debug  ][PostMaster        ] [ccdcacli447.in2p3.fr:30714] Recovering error for stream #0: [ERROR] Socket error.
[2025-11-16 14:21:15.273608 +0000][Debug  ][PostMaster        ] [ccdcacli447.in2p3.fr:30714] Reporting disconnection to queued message handlers.
[2025-11-16 14:21:15.273615 +0000][Debug  ][XRootD            ] [ccdcacli447.in2p3.fr:30714] Handling error while processing kXR_read (handle: 0x00000000, offset: 102070675, size: 179535534): [ERROR] Socket error.
[2025-11-16 14:21:15.273619 +0000][Error  ][XRootD            ] [ccdcacli447.in2p3.fr:30714] Unable to get the response to request kXR_read (handle: 0x00000000, offset: 102070675, size: 179535534)
[2025-11-16 14:21:15.273625 +0000][Debug  ][ExDbgMsg          ] [ccdcacli447.in2p3.fr:30714] Passing to the thread-pool MsgHandler: 0x955d0b0 (message: kXR_read (handle: 0x00000000, offset: 102070675, size: 179535534) ).
[2025-11-16 14:21:15.273747 +0000][Debug  ][ExDbgMsg          ] [ccdcacli447.in2p3.fr:30714] Calling MsgHandler: 0x955d0b0 (message: kXR_read (handle: 0x00000000, offset: 102070675, size: 179535534) ) with status: [ERROR] Socket error.
[2025-11-16 14:21:15.273776 +0000][Debug  ][File              ] [0x6c61920@root://ccxrootdegee.in2p3.fr:1094/pnfs/in2p3.fr/data/dune/disk/fardet-hd/04/fd/prodgenie_nue_dune10kt_1x2x6_20251004T000249Z_gen_001507_g4_detsim.root?xrdcl.requuid=13d8547c-4551-42c4-b6bf-94d5ce644461] Running the recovery procedure
[2025-11-16 14:21:15.273799 +0000][Debug  ][ExDbgMsg          ] [ccxrootdegee.in2p3.fr:1094] MsgHandler created: 0x596f1a0 (message: kXR_open (file: pnfs/in2p3.fr/data/dune/disk/fardet-hd/04/fd/prodgenie_nue_dune10kt_1x2x6_20251004T000249Z_gen_001507_g4_detsim.root, mode: 00, flags: kXR_open_read ) ).
[2025-11-16 14:21:15.273822 +0000][Debug  ][ExDbgMsg          ] [ccdcacli447.in2p3.fr:30714] Destroying MsgHandler: 0x955d0b0.
[2025-11-16 14:21:15.273854 +0000][Debug  ][ExDbgMsg          ] [ccxrootdegee.in2p3.fr:1094] Moving MsgHandler: 0x596f1a0 (message: kXR_open (file: pnfs/in2p3.fr/data/dune/disk/fardet-hd/04/fd/prodgenie_nue_dune10kt_1x2x6_20251004T000249Z_gen_001507_g4_detsim.root, mode: 00, flags: kXR_open_read ) ) from out-queu to in-queue.
[2025-11-16 14:21:18.347912 +0000][Debug  ][ExDbgMsg          ] [msg: 0x5970eb0] Assigned MsgHandler: 0x596f1a0.
[2025-11-16 14:21:18.347957 +0000][Debug  ][ExDbgMsg          ] [handler: 0x596f1a0] Removed MsgHandler: 0x596f1a0 from the in-queue.
[2025-11-16 14:21:18.348130 +0000][Debug  ][XRootD            ] [ccxrootdegee.in2p3.fr:1094] Handling error while processing kXR_open (file: pnfs/in2p3.fr/data/dune/disk/fardet-hd/04/fd/prodgenie_nue_dune10kt_1x2x6_20251004T000249Z_gen_001507_g4_detsim.root, mode: 00, flags: kXR_open_read ): [ERROR] Error response: bad address.
[2025-11-16 14:21:18.348208 +0000][Debug  ][ExDbgMsg          ] [ccxrootdegee.in2p3.fr:1094] Calling MsgHandler: 0x596f1a0 (message: kXR_open (file: pnfs/in2p3.fr/data/dune/disk/fardet-hd/04/fd/prodgenie_nue_dune10kt_1x2x6_20251004T000249Z_gen_001507_g4_detsim.root, mode: 00, flags: kXR_open_read ) ) with status: [ERROR] Error response: bad address.
[2025-11-16 14:21:18.348282 +0000][Debug  ][File              ] [0x6c61920@root://ccxrootdegee.in2p3.fr:1094/pnfs/in2p3.fr/data/dune/disk/fardet-hd/04/fd/prodgenie_nue_dune10kt_1x2x6_20251004T000249Z_gen_001507_g4_detsim.root?xrdcl.requuid=13d8547c-4551-42c4-b6bf-94d5ce644461] Open has returned with status [ERROR] Server responded with an error: [3012] Internal timeout
[2025-11-16 14:21:18.348291 +0000][Debug  ][File              ] [0x6c61920@root://ccxrootdegee.in2p3.fr:1094/pnfs/in2p3.fr/data/dune/disk/fardet-hd/04/fd/prodgenie_nue_dune10kt_1x2x6_20251004T000249Z_gen_001507_g4_detsim.root?xrdcl.requuid=13d8547c-4551-42c4-b6bf-94d5ce644461] Error while opening at ccxrootdegee.in2p3.fr:1094: [ERROR] Server responded with an error: [3012] Internal timeout
[2025-11-16 14:21:18.348353 +0000][Debug  ][ExDbgMsg          ] [ccxrootdegee.in2p3.fr:1094] Destroying MsgHandler: 0x596f1a0.
16-Nov-2025 14:21:18 GMT  Opened output file with pattern "prodgenie_nue_dune10kt_1x2x6_20251004T000249Z_gen_001507_g4_detsim_20251116T141208Z_reco.root"

====================================================================================================================
TimeTracker printout (sec)            Min           Avg           Max         Median          RMS         nEvts   
====================================================================================================================
[ No processed events ]
====================================================================================================================

====================================================================================================
MemoryTracker summary (base-10 MB units used)

  Peak virtual memory usage (VmPeak)  : 1402.94 MB
  Peak resident set size usage (VmHWM): 672.133 MB
  Details saved in: 'mem.db'
====================================================================================================
%MSG-s ArtException:  PostEndJob 16-Nov-2025 14:21:18 GMT ModuleEndJob
---- EventProcessorFailure BEGIN
  EventProcessor: an exception occurred during current event processing
  ---- ScheduleExecutionFailure BEGIN
    Path: ProcessingStopped.
    ---- FileReadError BEGIN
      ---- FatalRootError BEGIN
        Fatal Root Error: TNetXNGFile::ReadBuffer
        [ERROR] Server responded with an error: [3012] Internal timeout
        ROOT severity: 3000
      ---- FatalRootError END
      
      The above exception was thrown while processing module TriggerPrimitiveMakerTPC/tpmakerTPCsimpleThr run: 8528 subRun: 0 event: 15061
    ---- FileReadError END
    Exception going through path makers
  ---- ScheduleExecutionFailure END
---- EventProcessorFailure END
---- FatalRootError BEGIN
  Fatal Root Error: TNetXNGFile::TNetXNGFile
  The remote file is not open
  ROOT severity: 3000
---- FatalRootError END
---- FatalRootError BEGIN
  Fatal Root Error: TNetXNGFile::Close
  [ERROR] Server responded with an error: [3012] Internal timeout
  ROOT severity: 3000
---- FatalRootError END
---- FatalRootError BEGIN
  Fatal Root Error: TNetXNGFile::Close
  [ERROR] Server responded with an error: [3012] Internal timeout
  ROOT severity: 3000
---- FatalRootError END
%MSG
[2025-11-16 14:21:18.800652 +0000][Debug  ][JobMgr            ] Stopping the job manager...
[2025-11-16 14:21:18.801091 +0000][Debug  ][JobMgr            ] Job manager stopped
[2025-11-16 14:21:18.801122 +0000][Debug  ][TaskMgr           ] Stopping the task manager...
[2025-11-16 14:21:18.801197 +0000][Debug  ][TaskMgr           ] Task manager stopped
[2025-11-16 14:21:18.801234 +0000][Debug  ][Poller            ] Stopping the poller...
[2025-11-16 14:21:18.801432 +0000][Debug  ][AsyncSock         ] [ccdcacli447.in2p3.fr:30084.0] Closing the socket
[2025-11-16 14:21:18.801442 +0000][Debug  ][PostMaster        ] [ccdcacli447.in2p3.fr:30084] Destroying stream
[2025-11-16 14:21:18.801474 +0000][Debug  ][AsyncSock         ] [ccdcacli447.in2p3.fr:30084.0] Closing the socket
[2025-11-16 14:21:18.801506 +0000][Debug  ][AsyncSock         ] [ccdcacli447.in2p3.fr:30350.0] Closing the socket
[2025-11-16 14:21:18.801511 +0000][Debug  ][PostMaster        ] [ccdcacli447.in2p3.fr:30350] Destroying stream
[2025-11-16 14:21:18.801515 +0000][Debug  ][AsyncSock         ] [ccdcacli447.in2p3.fr:30350.0] Closing the socket
[2025-11-16 14:21:18.801526 +0000][Debug  ][AsyncSock         ] [ccdcacli447.in2p3.fr:30555.0] Closing the socket
[2025-11-16 14:21:18.801532 +0000][Debug  ][PostMaster        ] [ccdcacli447.in2p3.fr:30555] Destroying stream
[2025-11-16 14:21:18.801535 +0000][Debug  ][AsyncSock         ] [ccdcacli447.in2p3.fr:30555.0] Closing the socket
[2025-11-16 14:21:18.801546 +0000][Debug  ][AsyncSock         ] [ccdcacli447.in2p3.fr:30572.0] Closing the socket
[2025-11-16 14:21:18.801551 +0000][Debug  ][PostMaster        ] [ccdcacli447.in2p3.fr:30572] Destroying stream
[2025-11-16 14:21:18.801555 +0000][Debug  ][AsyncSock         ] [ccdcacli447.in2p3.fr:30572.0] Closing the socket
[2025-11-16 14:21:18.801563 +0000][Debug  ][AsyncSock         ] [ccdcacli447.in2p3.fr:30700.0] Closing the socket
[2025-11-16 14:21:18.801569 +0000][Debug  ][PostMaster        ] [ccdcacli447.in2p3.fr:30700] Destroying stream
[2025-11-16 14:21:18.801573 +0000][Debug  ][AsyncSock         ] [ccdcacli447.in2p3.fr:30700.0] Closing the socket
[2025-11-16 14:21:18.801582 +0000][Debug  ][AsyncSock         ] [ccdcacli447.in2p3.fr:30714.0] Closing the socket
[2025-11-16 14:21:18.801587 +0000][Debug  ][PostMaster        ] [ccdcacli447.in2p3.fr:30714] Destroying stream
[2025-11-16 14:21:18.801591 +0000][Debug  ][AsyncSock         ] [ccdcacli447.in2p3.fr:30714.0] Closing the socket
[2025-11-16 14:21:18.801599 +0000][Debug  ][AsyncSock         ] [ccdcacli447.in2p3.fr:30954.0] Closing the socket
[2025-11-16 14:21:18.801605 +0000][Debug  ][PostMaster        ] [ccdcacli447.in2p3.fr:30954] Destroying stream
[2025-11-16 14:21:18.801609 +0000][Debug  ][AsyncSock         ] [ccdcacli447.in2p3.fr:30954.0] Closing the socket
[2025-11-16 14:21:18.801618 +0000][Debug  ][AsyncSock         ] [ccxrootdegee.in2p3.fr:1094.0] Closing the socket
[2025-11-16 14:21:18.801628 +0000][Debug  ][Poller            ] <[::ffff:192.41.105.40]:60324><--><[::ffff:134.158.209.218]:1094> Removing socket from the poller
[2025-11-16 14:21:18.801755 +0000][Debug  ][PostMaster        ] [ccxrootdegee.in2p3.fr:1094] Destroying stream
[2025-11-16 14:21:18.801761 +0000][Debug  ][AsyncSock         ] [ccxrootdegee.in2p3.fr:1094.0] Closing the socket
Art has completed and will exit with status 1.
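
A log like the one above can be summarised by counting XRootD client lines per log level and collecting the distinct server error codes. The sketch below is illustrative only: the regular expressions are assumptions fitted to the line layout visible in this log, not an official XRootD log grammar, and the `summarize` helper is hypothetical.

```python
import re
from collections import Counter

# Assumed layout of XRootD client lines: [timestamp][Level  ][Component  ] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]\[(?P<component>\w+)\s*\]\s*(?P<msg>.*)$"
)
# Server-side errors as they appear in this log, e.g. "[3012] Internal timeout"
SERVER_ERR_RE = re.compile(r"Server responded with an error: \[(\d+)\] (.+)$")


def summarize(lines):
    """Return (lines per log level, occurrences per (code, text) server error)."""
    levels = Counter()
    server_errors = Counter()
    for line in lines:
        line = line.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            levels[match.group("level")] += 1
        err = SERVER_ERR_RE.search(line)
        if err:
            server_errors[(int(err.group(1)), err.group(2).strip())] += 1
    return levels, server_errors


# Three lines copied from the log above.
sample = [
    "[2025-11-16 14:21:15.273619 +0000][Error  ][XRootD            ] "
    "[ccdcacli447.in2p3.fr:30714] Unable to get the response to request "
    "kXR_read (handle: 0x00000000, offset: 102070675, size: 179535534)",
    "        [ERROR] Server responded with an error: [3012] Internal timeout",
    "[2025-11-16 14:21:18.800652 +0000][Debug  ][JobMgr            ] "
    "Stopping the job manager...",
]

levels, server_errors = summarize(sample)
print(dict(levels))
print(dict(server_errors))
```

To run it over a full log, the same function could be given an open file object, for example the jobscript log extracted from the saved logs archive.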
justIN time: 2025-12-19 12:14:23 UTC       justIN version: 01.05.03