
Hi,

 

I have built a system flow that mainly consists of 3 subflows, plus a decision and 2 tasks that handle the value of a flag.

This is the diagram in SF:

The initial decision checks whether the flag value allows the process to start; otherwise the flow goes straight to the end.

In the execute subflows, the main process is based on doing repetitive GET calls on MS Graph endpoints in order to receive data for Microsoft Planner plans, buckets and tasks. For those processes I use some DTO tables to store the fresh data, and at the end of each subflow the data from the DTO table(s) is synced in a transaction with the data from our environment.
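To give an idea of the shape of that sync step, here is a simplified sketch; the table and column names (dto_task, task, graph_task_id) are placeholders for illustration, not the actual model:

```sql
-- Sketch only: push the freshly downloaded Graph data from the staging
-- (DTO) table into the real table inside one transaction.
begin transaction;

    -- Update rows that already exist in the target table.
    update t
    set    t.title     = d.title,
           t.bucket_id = d.bucket_id
    from   task t
    join   dto_task d on d.graph_task_id = t.graph_task_id;

    -- Insert rows that are new.
    insert into task (graph_task_id, title, bucket_id)
    select d.graph_task_id, d.title, d.bucket_id
    from   dto_task d
    where  not exists (select 1 from task t where t.graph_task_id = d.graph_task_id);

    -- Mark rows that no longer exist in Graph (soft delete).
    update t
    set    t.is_deleted = 1
    from   task t
    where  not exists (select 1 from dto_task d where d.graph_task_id = t.graph_task_id);

commit transaction;
```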

 

When I scheduled the flow to run every minute, for example, everything went smoothly and I could see the changes in our environment, but things got a little strange when I changed the scheduler to start the process every 5 seconds.

In my opinion it should check the flag value and, if it has not yet been released by the previous run, stop the system flow immediately. After some checks in IAM, I saw that it still took the process 8-9 seconds to finish in that case, which is bizarre. And yes, the decision and the flag value do work correctly, since I have run some manual tests on them.

I made this screenshot in IAM; the start dates (about 1 minute apart each) actually started after only a couple of seconds, which results in the start date and end date increasing by 1 minute every 5 seconds.

Also, after I scheduled it to run every 5 seconds, I saw that it messed up my data. For reference, I use the same DTO table in those 3 subflows, and at the beginning of each subflow the DTO table(s) are cleared so they are ready for the sync later on. I imagine the data disappears (the clearing is done in a soft-delete manner) because the subflows end up running concurrently instead of one after another, even though they are placed in a sequence in the process flow (the actual system flow).

 

Thanks,

Paul

Hey Paul,

The settings of the process flow are interesting here. Did you enable Multiple running instances? I can imagine the flow takes longer than 5 seconds to run, causing subsequent instances to wait for the previous run to finish before starting.

If Multiple running instances is on, you must ensure these runs do not interfere with each other. You can do this by creating a unique string at the start of the flow and using it throughout the run. It can serve as a reference to the data your flow produced and allows you to distinguish one instance from another.

With Multiple running instances on, multiple instances can run simultaneously.
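As a rough sketch of that unique-string idea in T-SQL (dto_task and run_id are illustrative names, and NEWID() is just one way to generate the identifier):

```sql
-- Sketch only: give every system flow instance its own run id and scope
-- all staging data to that id.
declare @run_id uniqueidentifier = newid();

-- Each subflow writes its downloaded Graph data with the run id attached.
insert into dto_task (run_id, graph_task_id, title)
values (@run_id, 'AAMkAD-example-id', 'Example task');

-- The sync and cleanup steps then only touch this instance's own rows,
-- so parallel instances cannot wipe each other's data.
delete from dto_task
where  run_id = @run_id;
```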


Hi Mark,

In my case, I don't want the process to have multiple instances running at the same time, just one at a time.

Indeed, the entire system flow takes about 8-10 seconds with the amount of data I have at the moment, and the flag value should then not allow the subflows to execute.

The "multiple running instances” was not checked in my chase wich left me thinking that I would not really need the flag configuration anyhow since the startings of the flows should not interfeer with each other. I will enabled it to check how it goes anyway.


@TurcuPaulAndrei did you encounter any odd (dead)locks in your Indicium log files? I can imagine that if the processes are running on/using the same tables, things could get hairy...


Hi,

After more investigations I found out a couple of more things:

  1. There is a chance that the first problem I pointed out was either a failed sync to IAM or bad data being displayed in the system flow run logs, so the problem with the timestamps in the logs can be ignored.
  2. I have set up the system flow to allow multiple instances. That led to two concerns:
    1. If I run a deploy while the system flow is running, it does not terminate and the execution keeps going. If I then manually update the flag value, it allows the next system flow to start and also not finish, causing two system flows to run. At some point the execution of one of them stops and the data resulting from it is deleted (since the next one that started cleared the DTO table values).
    2. The second problem may be a cause-and-effect of the first one, but I need to check that further. Still, if the system flow finishes its execution (its execution includes some queries run as transactions at the end of each subflow) and a new one starts right after, there is a slight chance of a deadlock or of the data getting messed up.
  3. I've set the system flow back to not allowing multiple instances and gave it a cooldown of 30 seconds between executions. Normally the process takes at most 10 seconds with my input data. I let it run for a while, and also over the weekend (with a 1 minute cooldown), with no deploys on the branch, and the data was fine. After I ran a deploy on my branch, the system flow happened to start at the same time (depends on luck here), and my best guess is that this messed up the data, since the output resulted from a wrong execution.

For now I can't tell whether there should be an option in IAM to disable the system flow for a specific application while a deploy is being done, or how else to approach the issue.


Hi @TurcuPaulAndrei,

Trying to help you out with the issues one by one; if I overlook or forget something, please point it out:

IAM system flow instance timings

The system flow instances are started at the interval set up in the schedule. This can be 1 minute or 5 seconds. If the previous instance's execution time exceeds that interval, the next instance will or won't wait for that running instance, depending on the Multiple running instances setting.

The start times differing from the interval is logical if the execution time exceeds the interval (when Multiple running instances is turned off). Indicium can and will only start a new instance once the previous one has finished, either by having an end date time (the system flow ended naturally) or by being abandoned (the system flow was ended by another Indicium instance; this can occur if the current Indicium restarts, for example).

DTO/Staging tables

Using staging tables for the MS Graph data is perfectly fine. However, you must ensure that every run is unique in these tables; otherwise they can only be used and filled by one system flow instance at a time. Either the DTO tables have to be modified to allow multiple instances to use them, or somewhere in your application a checkbox needs to exist to ensure the previous system flow instance has finished. You could even create DTO tables per instance, by creating them during the process itself.
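One way to implement such a checkbox is a single-row state table that an instance claims atomically before doing any work; this is only a sketch under assumed names (system_flow_state, in_progress), not how your model necessarily looks:

```sql
-- Sketch only: claim the "in progress" flag atomically so that at most
-- one instance proceeds past this point.
update system_flow_state
set    in_progress = 1,
       started_on  = sysdatetimeoffset()
where  in_progress = 0;

if @@rowcount = 0
begin
    -- Another instance is still running (or crashed without releasing
    -- the flag): stop here instead of touching the DTO tables.
    return;
end
```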

However, you will also need to deal with errors during the system flow execution that could leave the checkbox checked or fail to remove the data from the DTO tables. Working with try-catch structures in the process procedures of this system flow will help you catch the error and act on it, for instance by deleting the data from the DTO table and/or unchecking the checkbox so the next instance can do its thing.
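A minimal sketch of such a try-catch cleanup, reusing the same illustrative names (dto_task, system_flow_state, run_id) as above:

```sql
declare @run_id uniqueidentifier = newid();  -- in practice: the id created at the start of the run

begin try
    -- ... the download and sync work for this run goes here ...

    delete from dto_task where run_id = @run_id;   -- normal cleanup on success
    update system_flow_state set in_progress = 0;  -- release the "checkbox"
end try
begin catch
    -- On failure, still clean up this run's staging rows and release the
    -- checkbox so the next instance is not blocked, then re-raise the
    -- error so it stays visible in the logs.
    delete from dto_task where run_id = @run_id;
    update system_flow_state set in_progress = 0;

    throw;
end catch
```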

Alternatively, you could opt to move this system flow's logic into a subflow and create a separate system flow that executes that subflow. If anything goes wrong in the subflow, you can act on it and uncheck the checkbox and/or remove the data from the staging tables.

Synchronizing to IAM / Deploying during a run

We currently have no true safeguard for that, I'm afraid. A live model should not be deployed or synced to without understanding what consequences that might have. You could opt to temporarily disable all system flow schedules to ensure no system flow instances are running at the time of deploying/syncing.

In IAM you would typically deploy a new version and make it available to users once it has been configured and deployed successfully. The previous version keeps running until the next version is set online. When an application is turned off, the system flow instances that are running may be stopped, although not immediately, and no new instances will be started.

Deadlocks

Never fun, but deadlocks can occur when one process is using the data another process is trying to delete. This can also happen due to locking, in particular page or table locking. When SQL Server tries to delete all records in a table, it may take a table lock. This gives that process exclusive access to the table, to ensure it can delete all records before other processes can continue. If only a subset of the records is deleted, SQL Server may take a page lock, with the same result.
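If you want to see which granularity SQL Server actually chose during such a delete, you can look at the lock DMV while the transaction is open; this is purely a diagnostic query, not part of the flow:

```sql
-- Shows the locks held per session and their granularity
-- (KEY = row, PAGE = page, OBJECT = whole table).
select request_session_id,
       resource_type,
       request_mode,
       count(*) as lock_count
from   sys.dm_tran_locks
where  resource_database_id = db_id()
group by request_session_id, resource_type, request_mode
order by request_session_id, resource_type;
```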

You can circumvent that by deleting the records one by one, using a cursor or a while loop. That way SQL Server does not have to take a lock on the page or table, as it only needs one record at a time. Never mind this.

 

Does that answer all your questions and address all the issues? Hope it helps!


...

You can circumvent that by deleting the records one by one, using a cursor or a while loop. That way SQL Server does not have to take a lock on the page or table, as it only needs one record at a time.

...

 

AFAIK, that is not how lock escalation works, and you may have just made it worse 😶. There are thresholds, IIRC at about 5k objects, after which the lock will escalate. Since you are using a cursor, which adds the overhead of I/O, you have just made the duration of the complete transaction longer, which in turn will make matters worse. Might as well build a try-catch mechanism around the action to catch the deadlock and retry it when it breaks loose of the lock (which is a 5-second detection cycle on the lock, according to Brent Ozar?).
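For reference, such a retry could look roughly like this in T-SQL; the table name and the retry count of 3 are just placeholders:

```sql
-- Sketch only: retry the statement when this session is chosen as the
-- deadlock victim (error 1205), instead of failing the whole flow.
declare @retries int = 0;

while 1 = 1
begin
    begin try
        delete from dto_task;          -- the statement that can be deadlocked
        break;                         -- success: leave the retry loop
    end try
    begin catch
        -- Only retry genuine deadlocks, and give up after a few attempts.
        if error_number() <> 1205 or @retries >= 3
            throw;

        set @retries += 1;
        waitfor delay '00:00:05';      -- back off before trying again
    end catch
end
```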


Might as well build a try-catch mechanism around the action to catch the deadlock and retry it when it breaks loose of the lock (which is a 5-second detection cycle on the lock, according to Brent Ozar?).

Sounds like a better solution, yes.


Hi,

Thanks for the suggestions. For now I think I have it sorted out.

The deadlock issue I saw in the live error log came from another colleague. For now I have set up all of the subflows, and the system flow itself, to terminate immediately if something goes wrong along the way. The execution order is a linear sequence, so for me it would not make sense to even think about deadlocks.

The subflows I use follow the same basic logic: each gets data in a loop using some web connectors and then, at the end of the subflow, the data is synced in a transaction with the real environment data by joining with the DTO tables. The insert and update are applied to the table right away, while the delete is done in a soft-delete manner. The actual delete happens at the very end of the system flow.
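As a sketch of that last step (task, is_deleted and system_flow_state are illustrative names, not the real model):

```sql
-- Runs once, at the very end of the system flow: purge what the subflow
-- syncs soft-deleted earlier in this run, then release the flag so the
-- next run may start.
begin transaction;

    delete from task
    where  is_deleted = 1;

    update system_flow_state
    set    in_progress = 0;

commit transaction;
```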

Now, a strange thing I observed is what happens to the system flow when the branch it runs from is deployed. The system flow does not have the checkbox for running multiple instances at the same time (which is fine, since I want it to run one at a time), but I still have a flag that is checked and marked for each run. The flag value is handled for both successful and failed runs, so at the end of the system flow the flag should be released. Sometimes, though, I see that my flag remains unreleased, and the only idea I have is that this happens because I was running multiple deploys of the branch while the system flow was running at the same time (the flag value lives in the branch in question), causing the flow to terminate in the middle of its execution without any way to handle it.

For now, apart from the issue with deploying the branch and the live tracing of data for running system flows, it works fine for me.

 

