Convergence Speed for Cloud Auto-scaling

Convergence is a process in which two or more entities approach each other, getting closer and closer. These entities could be rivers in a field, or lines on a graph. For cloud auto-scaling, convergence is achieved when resource availability matches resource demand. In this blog post, we will look at how quickly auto-scalers can help your cloud deployment converge, and what impact that has on your cloud application.

Convergence speed

Resource demand for your cloud application is constantly changing as users come and go, and your cloud deployment needs to be correctly sized to serve your users. In essence, convergence is the process of having resource availability catch up to resource demand. Ideally, one would like perfect convergence, all the time. In practice, however, most deployments do not achieve this. So what is the problem?

First, let’s conduct a thought experiment. At a single given point in time, we serve X number of users. This can be measured easily through the application logs. Knowing the limitations of our application, we know that X users translate to requiring a deployment of size Y. So we know our resource demand. Say that our resource availability is Z (the current deployment size), and that Z does not equal Y. Assuming we can immediately detect this, what needs to happen to achieve convergence?
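To make this concrete, here is a minimal sketch in Python. The users-per-instance capacity and the sample numbers are hypothetical assumptions, used purely for illustration.

```python
# Hypothetical figures for the thought experiment: X users, a per-instance
# capacity limit, and the gap between demand (Y) and availability (Z).
USERS_PER_INSTANCE = 200  # assumed application limit per instance

def required_instances(current_users: int) -> int:
    """Translate X concurrent users into a required deployment size Y."""
    # Round up: a partially loaded instance still has to exist.
    return -(-current_users // USERS_PER_INSTANCE)

x_users = 1000                             # measured from application logs
y_demand = required_instances(x_users)     # ideal deployment size Y
z_available = 3                            # current deployment size Z

gap = y_demand - z_available
print(f"Demand: {y_demand} instances, available: {z_available}, gap: {gap:+d}")
```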

  • The cloud provider needs to allocate new resources for your deployment. The time this takes varies, but Amazon Web Services states that “[i]t typically takes less than 10 minutes from the issue of the RunInstances call to the point where all requested instances begin their boot sequences.”
  • The resource needs to boot, configure itself (possibly synchronize with other parts of your deployment), and become fully operational. Time consumption for this depends entirely on what software you are running. A new database instance will have significant work to do before it is operational, whereas a new stateless Apache web server will be much quicker. We can assume a few minutes for this, on average.

This means that, once we have detected that there is a problem, we might need up to 15 minutes before a new instance has become available in our cloud application deployment. This is the best a perfect reactive auto-scaling solution can give us. Perfection, in this case, means that it accurately determines the difference between resource availability and demand and immediately makes the entire adjustment as a single operation in response.
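Summing the two illustrative figures in a trivial sketch (both numbers are assumptions carried over from the text above, not guarantees):

```python
# Best-case timing for a single, perfectly sized reactive scaling action.
ALLOCATION_MIN = 10          # AWS: "typically ... less than 10 minutes" for RunInstances
BOOT_AND_CONFIGURE_MIN = 5   # assumed average for boot, configuration, and synchronization

print(f"Best case for a perfect reactive auto-scaler: "
      f"~{ALLOCATION_MIN + BOOT_AND_CONFIGURE_MIN} minutes to converge")
```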

…but it gets trickier

Users face an application that is not performing well, but we will have it sorted in 15 minutes! 15 minutes is not that bad, right?

Wrong.

Or at least partially wrong. Remember that we assumed that we knew how to make our deployment go from its current size Z to its ideal size Y, and that we could start that process immediately. In most auto-scaling systems, this is not the case. We can typically only determine that the system is suffering, and in response, ask the cloud to start a set number of instances (or a percentage of the current deployment size). We then have to wait until the instances are operational, and then investigate whether the effects are what we had hoped for. If not, we have to repeat the process until we have caught up.

Essentially, this means the convergence process is more like:

  1. Detect that there are performance issues and that we need to scale (up or down).
  2. Initiate scaling with a set number of resources.
  3. Wait up to 15 minutes until the effects can be investigated.
  4. Investigate whether the modification had the desired effect. If not, return to Step 1 (a code sketch of this loop follows below).
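As a rough structural sketch, that loop might look like the following in Python. The helper functions are hypothetical placeholders, not the API of any particular auto-scaler.

```python
import time

def under_provisioned() -> bool:
    """Step 1/4: monitoring only tells us *that* we are suffering, not by how much."""
    return False  # stub: plug in your own monitoring check here

def scale_out(step: int) -> None:
    """Step 2: ask the cloud for a fixed number of extra instances."""
    pass  # stub: e.g. a RunInstances-style request

def reactive_autoscale(step: int = 1, iteration_minutes: int = 15) -> None:
    while under_provisioned():               # Step 1: detect the problem
        scale_out(step)                      # Step 2: fixed-size adjustment
        time.sleep(iteration_minutes * 60)   # Step 3: wait for the effect
        # Step 4: the loop condition re-investigates; repeat if still short
```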

Assume that we had a sudden spike that, at the start, requires us to scale up to 4 instances. If our auto-scaler is configured to grow the deployment by 1 instance at a time, we would need in excess of a full hour to catch up! That time may be even longer if we use a monitoring framework that does not give us data immediately or frequently. By default, AWS CloudWatch supplies new data upon which we can base scaling decisions only every 5 minutes (1-minute resolution is available at additional cost). To make sure we do not scale up based on a fluke, we would need to wait a few such periods to be sure that we are consistently under-provisioned. That makes our reaction even slower.
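A back-of-the-envelope calculation for that spike scenario, under stated assumptions (default 5-minute metrics, two confirmation periods before acting, 15 minutes per scaling action), could look like this:

```python
# All figures below are illustrative assumptions, not measurements.
INSTANCES_NEEDED = 4      # demand during the spike
CURRENT_INSTANCES = 1     # deployment size when the spike hits
STEP_SIZE = 1             # instances added per scaling action

METRIC_PERIOD_MIN = 5     # default CloudWatch resolution
CONFIRMATION_PERIODS = 2  # assumed periods of sustained load before we trust the signal
PROVISION_MIN = 15        # allocate + boot + configure, per scaling action

iterations = (INSTANCES_NEEDED - CURRENT_INSTANCES) // STEP_SIZE
per_iteration = CONFIRMATION_PERIODS * METRIC_PERIOD_MIN + PROVISION_MIN
total_minutes = iterations * per_iteration
print(f"{iterations} scaling actions x {per_iteration} min each = ~{total_minutes} minutes to converge")
```

With these numbers, catching up takes roughly 75 minutes, which is why a 4-instance spike can easily push convergence well past the hour mark.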

…but wait, it gets even trickier

It gets even trickier, though! The application that saw such a popularity spike is not just sitting there while scaling is going on. The spike, which was caused by end-users and their usage, is dynamic, and can fluctuate further while we are scaling our deployment. Resource demand is a constantly moving target.

The solution

Essentially, the problem is two-fold: (a) scaling actions take time, and (b) we need to determine the correct amount to scale (instead of adding/removing single instances and seeing if it worked).

The solution is two-fold, too. The Elastisys Cloud Platform outperforms all its competitors by:

  • predicting future resource demand based on past usage data, and
  • pro-actively scaling by the mathematically determined correct amount, rather than through piecemeal scaling operations.

In our thought experiment above, we noted that a perfect reactive auto-scaling solution can at best achieve convergence in about 15 minutes. In reality, few do, because they do not calculate the number of instances required for convergence. Because the Elastisys Cloud Platform predicts future resource demand, what would happen if we set it to look 15 minutes ahead? It would pro-actively make sure the resources are available as they are needed. This key difference reduces the convergence time to a theoretical minimum of zero minutes under ideal conditions. In practice, results depend on organic usage patterns from end-users. But it is still a world of difference compared to the slow 15-minute-per-iteration search for the correct deployment size that competing auto-scalers use.
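To illustrate the look-ahead idea (and only the idea; the Elastisys Cloud Platform’s actual prediction algorithms are not shown here), a deliberately naive sketch could extrapolate recent usage and request the whole difference in a single step:

```python
# Naive stand-in predictor: linear extrapolation of recent usage samples.
LOOKAHEAD_MINUTES = 15      # the time a scaling action needs to take effect
USERS_PER_INSTANCE = 200    # hypothetical application limit per instance

def predict_users(history: list[tuple[int, int]], lookahead: int) -> float:
    """Extrapolate (minute, users) samples linearly `lookahead` minutes ahead."""
    (t0, u0), (t1, u1) = history[-2], history[-1]
    slope = (u1 - u0) / (t1 - t0)
    return u1 + slope * lookahead

def instances_for(users: float) -> int:
    """Round up to whole instances, never below one."""
    return max(1, -(-int(users) // USERS_PER_INSTANCE))

history = [(0, 300), (5, 450), (10, 620)]   # made-up usage samples (minute, users)
predicted = predict_users(history, LOOKAHEAD_MINUTES)
target = instances_for(predicted)
print(f"Predicted ~{predicted:.0f} users in {LOOKAHEAD_MINUTES} min "
      f"-> scale to {target} instances now, in one step")
```

Because the whole gap is requested in a single step, the provisioning delay overlaps with the demand growth instead of being paid once per added instance.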

How has poor auto-scaling affected your cloud application? Let us know in the comments below!
