Autoscaling in multi-cloud environments

Say that your online service or website suddenly gets a 10x increase in users. What is your reaction? If you are in marketing or sales, you are likely overjoyed. If you are in operations and need to keep the service responding smoothly, you might be rather worried, particularly since everybody, including your users and your CEO, will be quite angry if your service fails to respond in a timely fashion (or at all).

The problem is hardly new. The first FIFA soccer World Cup to attract serious Internet traffic was the 1998 tournament, and during important matches, the load shot through the roof.

[Figure: load curve from the 1998 FIFA World Cup website, with sharp peaks during important matches]

The solution back then was to run a large (for the time) number of servers, and just hope that none of them would break down. Today, we have the cloud and more modern solutions. Enter autoscaling.

Autoscaling

Autoscaling is about adding and removing servers from a pool based on demand. Multi-cloud autoscaling is the same, but involves several clouds (or regions/zones from a single cloud provider). The old approach of running a large number of servers and hoping everything will work out has obvious drawbacks: it cannot deal with loads higher than you imagined, and it is prohibitively expensive. So what are the main motivations for autoscaling?

The first argument that comes to most people's minds is the most obvious, but not the strongest: cost savings. Obviously, if you stop paying for servers that you don't need, you save money. We would like to argue, however, that there are much more important reasons for wanting to use autoscaling.

Using autoscaling gives you as a service operator peace of mind, as you know that your deployment is always rightsized. More users, more servers. Fewer users, fewer servers. All users are met with a responsive service. This is perhaps the biggest gain, but there are others, too.

Because you spend less time on manually managing servers, and keeping your deployment reasonably sized, you have more time for other matters. If your ability to manage servers is not what makes your service or business unique, your time is better spent elsewhere, enhancing your service.

Your service deployment is more robust, since it is better able to deal with sudden spikes in demand. And with automatic termination and replacement of misbehaving servers, which you get with our autoscaler, you know that your instances are always operational.

All these motivations lead to increased user satisfaction, which is key in offering a service — online or otherwise.

Clarobet case study

In our case study with Clarobet, we see many of these motivations for using autoscaling in real life. A typical month looks like this, in terms of load:

[Figure: Clarobet's running server instances over a typical month]

Using autoscaling to closely follow these variations, Clarobet saves 55% compared to running 200 server instances around the clock, which is what they would have to do if we were back in the 1998 mindset, with its technology limitations.

Because the Elastisys autoscaler always ensures that capacity is available to Clarobet as it is needed, their peace of mind has increased to the point where they almost forget that they are running in the cloud. Capacity is just there when they need it, as evidenced by this quote from their CFO and Lead Developer, Lars Cardon:

[…] We feel very confident in the Elastisys solution, at times even forgetting that we have a lot of virtual machines running in the cloud — everything just works.

Because their server administration is so simplified, they no longer have to spend hours every week on manual administration tasks. Instead, they spend mere minutes per week, just to make sure everything is as it should be. For a small, agile DevOps team like Clarobet's, with only 4 full-time employees, that kind of reduction is huge. It frees up much more time for improving their service, which furthers their core business. Server administration skills were never their unique selling point, so any time spent on them was wasted compared to more important tasks. With Elastisys autoscaling, they can focus on what their customers want, rather than on what is required to deliver the service to them.

Configuring multi-cloud autoscaling

Does it have to be cumbersome to configure autoscaling? No. Even if it uses multiple clouds? No! To show this, we have created a simple, responsive web interface that makes the task a breeze for a typical load-balanced web application. The configuration steps are simply to:

  • select which cloud zones to use,
  • configure the server template for each zone (set which size, image, and initialization data it needs),
  • select a load balancer to register with,
  • set the capacity of each server in terms of requests per second, and that’s it!
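The four steps above might translate into a configuration along these lines. This is a hypothetical sketch, not the actual Elastisys configuration format: the zone names, image name, and field names are all illustrative.

```python
# Hypothetical autoscaler configuration covering the four setup steps.
# Field names, zone names, and image are illustrative, not the real format.
config = {
    # step 1: which cloud zones to use
    "cloud_zones": ["sto2", "lon1"],
    # step 2: a server template per zone (size, image, initialization data)
    "server_template": {
        "sto2": {"size": "2C-4GB", "image": "ubuntu-20.04",
                 "user_data": "#!/bin/sh\n./bootstrap.sh"},
        "lon1": {"size": "2C-4GB", "image": "ubuntu-20.04",
                 "user_data": "#!/bin/sh\n./bootstrap.sh"},
    },
    # step 3: the load balancer new instances register with
    "load_balancer": "lb.example.com",
    # step 4: per-server capacity, in requests per second
    "requests_per_second_per_server": 100,
}

# Sanity check: every selected zone should have a server template.
assert all(zone in config["server_template"] for zone in config["cloud_zones"])
```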

You might also want to configure notification recipients based on severity. Notification recipients can be either email addresses or web hook addresses, to which the autoscaler will post alerts of the chosen severity level. So you could hook in to PagerDuty for hairy error notifications that immediately notify your operations team, and send emails for more routine information about scaling actions. Your choice.
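Severity-based notification routing could look roughly like this. Again a hypothetical sketch: the recipient addresses, severity names, and rule format are made up for illustration.

```python
# Hypothetical notification rules: each rule routes alerts of one severity
# level to either a webhook URL or an email address.
notifications = [
    {"recipient": "https://events.example.com/hooks/ops-team",
     "type": "webhook", "severity": "ERROR"},   # e.g. wired to PagerDuty
    {"recipient": "ops@example.com",
     "type": "email", "severity": "INFO"},      # routine scaling actions
]

def recipients_for(severity, rules=notifications):
    """Return every recipient configured for the given severity level."""
    return [rule["recipient"] for rule in rules if rule["severity"] == severity]
```

With these rules, an ERROR alert goes to the webhook, while routine INFO events about scaling actions only produce an email.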

The components we configure with this system are shown here in this figure:

[Figure: components of the multi-cloud autoscaling demo setup]

Requests from users come in to the load balancer, shown at the top. An agent asks the load balancer how many requests it has seen, and reports the requests-per-second rate to a monitoring database. The autoscaler reads from the monitoring database and determines a suitable number of web servers. It sends this value to the splitter, which splits the deployment across multiple cloud zones in City Cloud according to a user-specified ratio. An Elastisys cloud pool for each cloud zone then keeps the deployment at the size specified by the splitter.
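The sizing-and-splitting step can be sketched as follows. This is a simplified illustration, not the actual autoscaler algorithm: the largest-remainder rounding used to distribute instances across zones is an assumption on our part.

```python
import math

def desired_total(requests_per_second, capacity_per_server):
    """Total servers needed so each stays within its rated requests/second."""
    return max(1, math.ceil(requests_per_second / capacity_per_server))

def split_across_zones(total, ratios):
    """Split a desired pool size across zones according to a ratio,
    using largest-remainder rounding so the zone sizes sum to the total."""
    weight = sum(ratios.values())
    exact = {zone: total * r / weight for zone, r in ratios.items()}
    sizes = {zone: int(share) for zone, share in exact.items()}
    leftover = total - sum(sizes.values())
    # hand the rounding remainder to the zones with the largest fractional parts
    for zone in sorted(exact, key=lambda z: exact[z] - sizes[z], reverse=True)[:leftover]:
        sizes[zone] += 1
    return sizes

# 450 req/s at 100 req/s per server, split 60/40 across two zones:
plan = split_across_zones(desired_total(450, 100), {"sto2": 0.6, "lon1": 0.4})
# plan == {"sto2": 3, "lon1": 2}
```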

Failure handling

Failure is inevitable. What matters is how we deal with it. If a software error is found, what do we do? Well, we fix it, and replace the faulty version with a new, improved version where the error is absent. In case of hardware failure, we replace the hardware with new hardware where the error is absent. What do we do when a cloud outage happens? Can we replace those? Yes, just like in the other cases, we replace what is broken with something that works: another cloud zone, in this case!

To see how this works, assume that a cloud zone stops working: perhaps its API endpoint returns errors, or the network is down. For whatever reason, we cannot deploy our application there, even though we want to. We still have the same number of users, irrespective of the cloud outage, so the required capacity is still the same. The splitter comes to the rescue: it automatically starts using a different cloud zone (one that has been configured as a fail-over zone, of course) and calls in replacement instances from there. When the original cloud zone becomes operational again, the splitter notices this and terminates the replacement instances in the fail-over zone. Simple, and effective.
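The fail-over behavior described above can be sketched as a simple renormalization: drop the unreachable zones and spread their share of the deployment over the zones that still respond. This is our illustration of the idea, not the splitter's actual implementation.

```python
def effective_ratios(ratios, healthy_zones):
    """Drop unreachable zones and renormalize what remains, so the same
    total capacity is requested from the zones that still respond."""
    live = {zone: r for zone, r in ratios.items() if zone in healthy_zones}
    if not live:
        raise RuntimeError("no healthy cloud zones left to fail over to")
    weight = sum(live.values())
    return {zone: r / weight for zone, r in live.items()}

# Normally 60/40 across two zones; if "sto2" goes down, "lon1" takes it all.
# When "sto2" recovers, the same call yields the original 60/40 split again.
failover_plan = effective_ratios({"sto2": 0.6, "lon1": 0.4}, {"lon1"})
# failover_plan == {"lon1": 1.0}
```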

Designing for autoscaling

Can we apply autoscaling to any server that we have right now? No, sadly not. Autoscaling only works for service components that are ready to be replicated across multiple servers. Luckily, this is an area that has seen huge improvement in the last decade or so, as scaling out (adding servers) rather than up (increasing the capacity of a single server) has become more common.

Applications that autoscale well have been designed for automation. Since autoscaling will add and remove server instances all day and night, we want to avoid human interaction completely. This means, among other things, that we must have a good solution for service discovery — letting new server instances make their presence known to already deployed server instances. At the very least, this means registering in some shared registry such as a load balancer. It could mean transferring a lot of data, if we add a database shard server to a clustered database. Regardless, automation is key.
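The simplest form of service discovery, registering with a shared registry, can be sketched like this. The in-memory registry below is a stand-in for a real load balancer's member list; an actual deployment would call the load balancer's API from a boot script instead.

```python
class LoadBalancerRegistry:
    """In-memory stand-in for a shared registry (e.g. a load balancer's
    backend list). Real deployments would call the load balancer's API."""
    def __init__(self):
        self.backends = set()

    def register(self, address):
        self.backends.add(address)

    def deregister(self, address):
        self.backends.discard(address)

registry = LoadBalancerRegistry()

def on_boot(my_address):
    """Run by every new server instance as it starts: announce yourself,
    so traffic starts flowing without any human interaction."""
    registry.register(my_address)

def on_shutdown(my_address):
    """Run before an instance is terminated: leave the pool cleanly."""
    registry.deregister(my_address)
```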

The online service should follow a service-oriented architecture (SOA), which is a fancy way of saying it should be split into components that each do just one task (or at least a few highly related tasks). By adhering to that principle, you can scale each component individually. Do we have a component that transcodes videos uploaded by users? Good, perhaps that needs considerably more capacity than the web server components, or the "send a monthly newsletter" component. Obviously, it makes good sense to have this type of separation, since our deployment becomes that much more manageable, both by humans and by an automated system.
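Per-component scaling then falls out naturally: each component has its own load metric and its own per-server capacity, so each pool is sized independently. The component names, metrics, and numbers below are made up for illustration.

```python
import math

# Hypothetical per-component scaling inputs: each component is measured
# and sized on its own terms, independently of the others.
components = {
    "web":        {"load": 900, "capacity_per_server": 150},  # requests/s
    "transcoder": {"load": 9,   "capacity_per_server": 2},    # concurrent jobs
    "newsletter": {"load": 1,   "capacity_per_server": 1},    # batch jobs
}

def pool_sizes(components):
    """Size each component's pool independently, keeping at least one server."""
    return {name: max(1, math.ceil(c["load"] / c["capacity_per_server"]))
            for name, c in components.items()}
```

Here the transcoder pool ends up larger relative to its load than the web pool, exactly because its per-server capacity is so much lower.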

We should strive for statelessness, since stateful information by definition makes server instances unique. We want to avoid that as much as possible, because uniqueness means we cannot easily replace them. Say that we have important sessions (for an online shopping cart, maybe) in our web servers. We cannot terminate such web server instances until all sessions have timed out or been closed by our users. This means that we could have a web server instance sitting around for a long time, just because a single user might come back at some point in the next hour or so. We also don’t want to initiate new sessions there, because then we can never get rid of it. It’s clumsy, and problematic.

State information does have its place, of course! A truly stateless service is likely of limited practical value (although there are examples). It's just that its place is not within the components we are trying to autoscale aggressively. Web session information can be placed in a memcached server cluster, for instance.
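The pattern can be sketched as follows. The dict-backed store below is a stand-in for a shared cache such as memcached; a real deployment would use a memcached or Redis client instead, and the shopping-cart handler is a made-up example.

```python
class SessionStore:
    """Stand-in for a shared cache such as memcached: session state lives
    here, outside the web server instances, so any instance can serve
    any session and every instance stays safe to terminate."""
    def __init__(self):
        self._sessions = {}

    def get(self, session_id):
        return self._sessions.get(session_id, {})

    def put(self, session_id, data):
        self._sessions[session_id] = data

store = SessionStore()  # shared by all web server instances

def add_to_cart(session_id, item):
    """Any web server instance can run this: it reads the session from the
    shared store, updates it, and writes it back, holding no local state."""
    cart = store.get(session_id)
    cart.setdefault("items", []).append(item)
    store.put(session_id, cart)
```

Because no web server holds the cart itself, the autoscaler can terminate any instance at any time without waiting for sessions to drain.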

If we always make sure to have two of everything during development, that automatically puts us in the right type of mindset. Because if you can handle having two of something, the step to handling three or four of the same thing is much, much smaller than going from one to two. Gone are assumptions about a single master instance that does all the work. Gone, too, is the dreaded single point of failure that a single instance of something is guilty of being. Our applications are in much better shape if we always assume that there are multiple copies of everything, as this is the norm for distributed systems anyway.

If all of this sounds complicated, we can help you design your cloud- and autoscaling-ready application. We have been doing cloud since the very start, and have a lot of software architecture experience. Contact us if you would like to discuss your cloud application.

Takeaway message

With Elastisys autoscaling, you get increased peace of mind and more time to focus on what makes your business unique. You spend much less time on managing servers, and more time developing your service. Your application deployment is more resilient and your performance is safe, even in the face of sudden usage spikes. You even gain fault tolerance against cloud outages! And, as icing on the cake, you will reduce your cloud costs. Are you ready to automate your cloud deployment, and start gaining all these benefits? We are ready when you are.

This blog entry is a textual adaptation of the presentation our Software Architect Lars Larsson gave at #CloudBeerStockholm on November 5, 2015.
