Solved

Indicium - Aborting schedule '...' of process flow - please give information

  • 8 January 2024


In IAM 2023.3, when I create a file named 'app_offline.htm', I see in the Indicium log file that the running process flow is aborted.

Something like:
2024-01-08T14:58:06.7288205+01:00  [WRN] Aborting schedule '01/08/2024 13:58:30' of process flow '"system_flow_send_email_notification"' from application 1 because Indicium has received a shutdown signal. (e8c21c8e)

But what does aborting mean?
- Does it mean it just kills the process flow (possibly creating a problem),
- or does it mean it waits until all the steps of a process flow are done, while preventing new process flows from being started (which would be nice)?
And in the second case, how long will it wait? Indefinitely?

I have been searching for documentation on this topic but have not been able to find any.


Best answer by Vincent Doppenberg 9 January 2024, 11:32



6 replies


Hello @ericbosman,

When Indicium receives a shutdown signal, it will do the following:

  • Prevent new scheduled system flows from being picked up.
  • Abort system flows that were already queued, but had not yet started.
  • Await the completion of system flows that are in progress.

We make an effort to do graceful shutdowns and allow all system flows that are in progress to finish, but in the end it is up to the web server to decide how long the process can be in the ‘shutting down’ state before it will simply end the process.
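The three behaviours above can be illustrated with a small sketch. This is not Indicium's actual code (Indicium is a .NET application); it is a toy Python model under stated assumptions: the class `FlowScheduler`, its methods, and the `time_limit` parameter are all hypothetical names, and the time limit stands in for whatever window the web server grants before it ends the process.

```python
import asyncio
from collections import deque


class FlowScheduler:
    """Toy model of a scheduler with a graceful shutdown (hypothetical names)."""

    def __init__(self, max_parallel: int = 1) -> None:
        self.max_parallel = max_parallel
        self.queue: deque[tuple[str, float]] = deque()  # scheduled, not yet started
        self.running: set[asyncio.Task] = set()         # flows in progress
        self.stopping = False
        self.log: list[str] = []

    def schedule(self, name: str, duration: float) -> bool:
        # 1. After the shutdown signal, new flows are no longer picked up.
        if self.stopping:
            self.log.append(f"rejected {name}")
            return False
        self.queue.append((name, duration))
        self._start_next()
        return True

    def _start_next(self) -> None:
        # Start queued flows while there is a free slot and no shutdown.
        while (not self.stopping and self.queue
               and len(self.running) < self.max_parallel):
            name, duration = self.queue.popleft()
            task = asyncio.create_task(self._run(name, duration), name=name)
            self.running.add(task)
            task.add_done_callback(self._on_done)

    def _on_done(self, task: asyncio.Task) -> None:
        self.running.discard(task)
        self._start_next()

    async def _run(self, name: str, duration: float) -> None:
        await asyncio.sleep(duration)  # stands in for the flow's real work
        self.log.append(f"completed {name}")

    async def shutdown(self, time_limit: float) -> None:
        self.stopping = True
        # 2. Flows that were queued but had not started are aborted.
        while self.queue:
            name, _ = self.queue.popleft()
            self.log.append(f"aborted {name}")
        # 3. Flows in progress are awaited, but only for `time_limit` seconds;
        #    after that the host would end the process regardless.
        if self.running:
            _, pending = await asyncio.wait(set(self.running), timeout=time_limit)
            for task in pending:
                task.cancel()
                self.log.append(f"killed {task.get_name()}")
            await asyncio.gather(*pending, return_exceptions=True)


async def demo() -> list[str]:
    scheduler = FlowScheduler(max_parallel=1)
    scheduler.schedule("flow_a", 0.05)  # starts immediately
    scheduler.schedule("flow_b", 0.05)  # queued behind flow_a
    await asyncio.sleep(0)              # give flow_a a chance to start
    await scheduler.shutdown(time_limit=1.0)
    scheduler.schedule("flow_c", 0.05)  # arrives after the shutdown signal
    return scheduler.log
```

Running `asyncio.run(demo())` returns `['aborted flow_b', 'completed flow_a', 'rejected flow_c']`: the queued flow is aborted, the running flow is allowed to finish, and nothing new is accepted. A flow that outlasts `time_limit` would be logged as killed instead.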

In IIS, the default shutdown time limit is 90 seconds, so this means that system flows will have 90 seconds to finish up before IIS will end the process anyway. If necessary, the shutdownTimeLimit setting on the Application Pool can be used to increase (or decrease) the window of time in which system flows are allowed to finish. More information on that setting can be found here. Despite there being no mention of this in Microsoft's documentation, I do believe there is an upper limit of 5 minutes for this setting.

I hope this helps.


Thank you for this information, it helps.

In our case we sometimes have exceptional system flows that run for hours, and in that case we do not want to terminate the system flow.

Do you know whether, if in this case

  • I set the shutdown time limit to 6 hours and
  • I remove the 'app_offline.htm' file (after 30 minutes),

the abort of the running system flows will be canceled?

Yours sincerely,

Eric

 


Hello @ericbosman,

I would expect the system flow to be terminated after 5 minutes in this case because, as stated earlier, I believe that the upper limit for the shutdownTimeLimit is 5 minutes. In addition to that, once a shutdown signal is sent, it will go through even if you remove the app_offline.htm file before the shutdown is finished.

It is also important to realize that when the app_offline.htm file is present, the web application will no longer accept incoming requests, nor will it start a new instance of the web application until the file is removed. In other words, Indicium would be inaccessible for a long time. So there is a good reason why long shutdownTimeLimits should not be supported.

When you have a very long running system flow that cannot be interrupted, I would not look for a solution where the shutdown signal is sent, but is ignored for long periods of time or even canceled after half an hour. Because if it is acceptable to cancel the shutdown, then apparently the shutdown was not necessary.

So instead, I would look for a solution where the shutdown signal is never sent while this system flow is running by ensuring that the Application Pool sticks to a recycle schedule that cannot overlap with the system flow schedule + x hours. This does require the system flow to have some level of predictability.
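As a rough illustration of the "flow schedule + x hours" check, the sketch below tests whether a daily recycle time falls inside the window a daily system flow may occupy. The function name `recycle_is_safe` and its parameters are hypothetical, and the sketch assumes both the recycle and the flow run once per day and that the flow window is shorter than 24 hours.

```python
from datetime import datetime, time, timedelta


def recycle_is_safe(recycle_at: time, flow_start: time,
                    max_runtime: timedelta, margin: timedelta) -> bool:
    """Return True if a daily recycle at `recycle_at` falls outside the window
    [flow_start, flow_start + max_runtime + margin] of a daily flow."""
    day = datetime(2024, 1, 1)  # arbitrary reference date
    start = datetime.combine(day, flow_start)
    end = start + max_runtime + margin
    recycle = datetime.combine(day, recycle_at)
    # Check the recycle on the reference day and on the next day, because
    # the flow window may run past midnight.
    return not any(start <= recycle + timedelta(days=d) <= end for d in (0, 1))
```

For example, a flow that starts at 22:00, runs for at most 4 hours, and gets a 2-hour margin occupies 22:00 to 04:00. A recycle at 03:00 is then unsafe, while a recycle at 05:00 is safe.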

Alternatively, I would look to make the system flow itself more robust so it can deal with restarts. This would involve some kind of mechanism where you track where a system flow left off and pick it back up when Indicium is restarted, perhaps by means of a second, short interval system flow that simply looks for pending operations and completes them. It might also require the use of transactions to ensure that critical sections are either completed or rolled back.

I hope this helps.


Yes, this helps.

Knowing that a shutdown signal is irreversible means that, before sending it, I will check whether any system flows are active and take all the necessary actions.
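A pre-check like this could be sketched as follows. How the number of active system flows is obtained is not covered in this thread, so `count_active_flows` is a hypothetical callback that the caller has to supply; `take_offline_if_idle` is likewise an illustrative name. Note that a flow could still start between the check and the creation of the file, so this narrows the risk rather than removing it.

```python
from pathlib import Path
from typing import Callable


def take_offline_if_idle(site_root: Path,
                         count_active_flows: Callable[[], int]) -> bool:
    """Create app_offline.htm only when no system flows are active.

    `count_active_flows` is a hypothetical callback; returns True if the
    file was created, False if the shutdown was postponed.
    """
    if count_active_flows() > 0:
        return False  # postpone: the shutdown signal cannot be taken back
    offline_file = site_root / "app_offline.htm"
    offline_file.write_text(
        "<html><body>Maintenance in progress.</body></html>",
        encoding="utf-8",
    )
    return True
```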

Thx


@Vincent Doppenberg We see the error below occurring regularly at the Indicium recycle time, specifically for a scheduled flow that runs every second. It doesn't read as a very ‘graceful’ cancellation, so which of the three bullets you mentioned does this error relate to?

When Indicium receives a shutdown signal, it will do the following:

  • Prevent new scheduled system flows from being picked up.
  • Abort system flows that were already queued, but had not yet started.
  • Await the completion of system flows that are in progress.

{"Timestamp":"2024-01-23T03:00:11.5151829+00:00","Level":"Error","MessageTemplate":"Error scheduling system flow '' for application .","Exception":"System.OperationCanceledException: The operation was canceled.\r\n at System.Threading.CancellationToken.ThrowOperationCanceledException()\r\n at System.Threading.SemaphoreSlim.WaitUntilCountOrTimeoutAsync(TaskNode asyncWaiter, Int32 millisecondsTimeout, CancellationToken cancellationToken)\r\n at Indicium.BackgroundServices.SystemFlowScheduler.scheduleSystemFlow(Int32 guiApplID, String systemFlowID, DateTime scheduledTime, CancellationToken stoppingToken) in Indicium\BackgroundServices\SystemFlowScheduler.cs:line 246","Properties":{"systemFlowID":"rapid_scheduled_process_api_queue","guiApplID":99,"SourceContext":"SystemFlowScheduling","MachineName":"IP-0A92176C"}}

 


Hello @Arie V,

I see that the error handling is not optimal in this scenario. I will make sure that this is addressed in the next release.

This error is related to the second point: aborting system flows that were already scheduled, but had not yet started. Functionally this works as intended; however, the message in the error log should have been friendlier.