
We are running our entire Thinkwise platform in Azure.

To keep costs under control, we are looking into automated scaling of the App Service and Azure SQL. For the App Service this is straightforward and has no real downsides, because of how Azure handles scaling there.

Ideally we would scale Azure SQL based on usage: look at the DTU usage in a sliding window and scale up or down based on what we observe. However, this leads to problems. During the scaling operation the connection gets dropped, and we have found that scheduled jobs that IAM is running at that moment are not always recognized as ended/abandoned when the connection is lost. This in turn locks the entire schedule, leading to missed job executions, etc.
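To make the idea concrete, the sliding-window decision could look roughly like the sketch below. It is only an illustration: the tier names, thresholds, window size and the `SlidingWindowScaler` class are assumptions for the example, and it only decides on a target tier; it does not call Azure or solve the dropped-connection problem.

```python
from collections import deque
from statistics import mean

# Illustrative values only; tune tiers and thresholds for your own workload.
TIERS = ["S0", "S1", "S2", "S3"]
SCALE_UP_AT = 80.0    # average DTU % above which we move one tier up
SCALE_DOWN_AT = 30.0  # average DTU % below which we move one tier down


class SlidingWindowScaler:
    """Keeps the last N DTU samples and proposes a target tier."""

    def __init__(self, window_size=6):
        self.samples = deque(maxlen=window_size)

    def observe(self, dtu_percent):
        # The oldest sample falls out automatically once the window is full.
        self.samples.append(dtu_percent)

    def decide(self, current_tier):
        # Do nothing until the window is full, to avoid reacting to a spike.
        if len(self.samples) < self.samples.maxlen:
            return current_tier
        average = mean(self.samples)
        index = TIERS.index(current_tier)
        if average > SCALE_UP_AT and index < len(TIERS) - 1:
            return TIERS[index + 1]
        if average < SCALE_DOWN_AT and index > 0:
            return TIERS[index - 1]
        return current_tier
```

Keeping a gap between the two thresholds should help prevent the database from flapping between tiers, which matters here because every tier change drops the connections.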

Besides IAM not handling this well, it is also a problem for any long-running jobs in the application itself.

 

I am curious how others have done this.

I am also curious whether there are any known issues with correctly ‘abandoning’ running jobs in IAM and, if not, how this was intended to work. Should Indicium already be able to handle these Azure migrations, which can happen at any time, or is this something that is maybe not yet implemented? Scaling drops currently open connections, and if Indicium is not able to handle this correctly, dynamic scaling would be out of the question.


Oh yeah, even without scaling, Azure will drop connections at (for users) random moments because of its internal processes (optimizing hardware usage, maintenance, etc.).

So Indicium SHOULD be able to handle these disconnects somewhat gracefully.


I have a hard time believing we are the only ones facing this challenge.


I have a hard time believing we are the only ones facing this challenge.

@AndreKemmeren I guess you are. It is really hard to scale (or perform deployments for that matter) on a SQL server with zero downtime. 

As you can see in the Azure documentation they don't state ‘zero downtime’ but ‘minimal downtime’, which could be up to a minute as they state in one of the links on the page: Scale resources - Azure SQL Database | Microsoft Learn 

Microsoft does suggest ways for .NET to deal with a SQL Database that's temporarily down. I suppose Indicium could support that, so it might be worthwhile to raise an Idea accordingly: Working with transient errors - Azure SQL Database | Microsoft Learn
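As a rough illustration of the retry approach for transient errors: the sketch below retries an operation with exponential backoff and some jitter. It is written in Python purely for readability (Indicium itself is .NET), and `TransientDatabaseError` and `run_with_retry` are hypothetical names, not part of any Thinkwise or Microsoft API. Which errors are actually safe to retry depends on the driver.

```python
import random
import time


class TransientDatabaseError(Exception):
    """Hypothetical stand-in for a driver error that is safe to retry."""


def run_with_retry(operation, max_attempts=5, base_delay=1.0,
                   max_delay=30.0, sleep=time.sleep):
    """Call operation(); on a transient error wait and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientDatabaseError:
            if attempt == max_attempts:
                raise  # Give up and let the caller see the last error.
            # Delay doubles per attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads out clients that were disconnected together.
            sleep(delay + random.uniform(0, delay / 2))
```

Note that a retry like this only helps if the operation is safe to run again, which is exactly the hard part for long-running jobs that were interrupted halfway.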


Oh yeah, even without scaling, Azure will drop connections at (for users) random moments because of its internal processes (optimizing hardware usage, maintenance, etc.).

So Indicium SHOULD be able to handle these disconnects somewhat gracefully.

You should be able to minimize this impact by configuring maintenance windows and/or a high-availability setup.

NOTE: I don't know exact details for Azure, but we use both of these options for AWS RDS. We do a weekly maintenance window during Sunday-Monday night, which is not typically a time with much user activity.

Also be sure to schedule (large) System flows outside of this Maintenance window.
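If you want to check planned start times against the window, something like the sketch below could work. The window itself (Sunday 23:00, lasting 4 hours) and the function name are made up for the example, and it uses naive datetimes, so it assumes the schedule and the window are in the same time zone.

```python
from datetime import datetime, timedelta

# Hypothetical window: starts Sunday 23:00 and runs 4 hours into Monday.
WINDOW_START_WEEKDAY = 6  # Monday is 0, Sunday is 6
WINDOW_START_HOUR = 23
WINDOW_LENGTH = timedelta(hours=4)


def in_maintenance_window(moment):
    """Return True if `moment` falls inside the weekly window."""
    # Step back to the most recent start weekday (possibly today).
    days_back = (moment.weekday() - WINDOW_START_WEEKDAY) % 7
    start = (moment - timedelta(days=days_back)).replace(
        hour=WINDOW_START_HOUR, minute=0, second=0, microsecond=0)
    # On the start day before the start hour, last week's window applies.
    if start > moment:
        start -= timedelta(days=7)
    return start <= moment < start + WINDOW_LENGTH


# Example: a job planned for Monday 02:00 would collide with the window.
print(in_maintenance_window(datetime(2024, 1, 8, 2, 0)))
```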