We can do better with capacity planning and alerting for microservice-based, container-managed clusters of compute resources.
Take your time and walk through the presentation deck for yourself.
Slideshow PDF file with speaker notes

We are entering the age of the third platform in IT. The way we run applications is changing rapidly and significantly.
"Apps" were run on mainframes, but they were too big and too expensive.
Apps then moved to commodity hardware such as the PC, but this came with a high degree of waste: CPUs, memory, and disk remained underutilized as single apps sat relatively idle to provide excess capacity for workloads that never came.
Computers were then virtualized so that several could run at once on a single physical machine. This made better use of the CPU, memory, and disk of the interconnected machines, creating a cloud of infrastructure and producing much less waste.
The third platform shifts the focus to driving business value and improving applications, rather than to ever-growing infrastructure and its cost. We now virtualize applications into slices of a virtual machine's CPU, memory, and disk, called containers. This produces far less waste because application containers can be spun up or down in reaction to demand nearly instantly. We can scale applications to multiple nodes wherever capacity is available right now, whether in Boston or Hong Kong. To scale well, applications are being redesigned as microservices: small units of work providing well-defined services to the whole of the modern distributed application.
The key element enabling this is capacity planning and load alerting for container-managed clusters. Do we need to scale now? If so, do we scale the app out because we need more resources to serve an avalanche of demand? Or do we scale in, because Black Friday has passed and our retail app no longer needs so many resources?
Our project is motivated by the goal of reducing waste: "right-sizing" energy consumption by linking it more directly to demand. Not many people have seen the inside of a large data center, but these are colossal, energy-hungry ventures[1] on a scale that is difficult to appreciate without seeing one in person.
Everything in a data center is planned around wattage. Over half of the power goes to server load, and most of the rest to cooling equipment. If we can better predict load and scale compute resources more efficiently, we have the opportunity to effect change in the United States alone at around the 9.1 TWh (terawatt-hour) level and, perhaps, slow the rise in consumption projected to reach 13.9 TWh by 2020 [2].
Looking for solutions
Our inferential goals are to understand how better to do capacity planning and alerting for microservice-based, container-managed clusters of compute resources. We tried to learn what makes a good alert threshold, such that we can predictably recommend actions that will keep an application highly available under various demand scenarios ranging from low to high. Effectively, we want to investigate the elasticity of supply and demand on compute resources so that we can make ongoing dynamic recommendations about the proper scale for a given set of inputs. Benefits include:
Learning what a low-usage state is for applications and providing information on the scale (how many compute nodes) that should be set on a dynamic basis
Reduction of energy usage by fitting the supply and demand more appropriately over time
The ability to prescriptively maintain application availability with the minimum amount of compute resources
Datacenters around the world
Datacenters by country
We started by analyzing a small subset of data provided to us by Pivotal from their Pivotal Web Services platform. This is the largest public Cloud Foundry platform currently in operation. Essentially, Cloud Foundry is an open-source, well-orchestrated container platform that allows many applications to scale in and out easily.
This data provided us insight into the nature of a completely container-based approach.
We began with a general container metric, which showed around 3,000 instances (containers) running on this particular Cloud Foundry foundation. We found that the mean instance CPU utilization was very low, at around 2%. Each instance consumed about 500 MB of memory.
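As a rough back-of-the-envelope check of what those numbers imply (a sketch only, using the approximate figures above rather than exact measurements):

```python
# Rough footprint of the sampled Cloud Foundry foundation, using the
# approximate figures reported above (assumptions, not exact measurements).
instances = 3000        # containers observed in the sample
mean_cpu_util = 0.02    # ~2% mean CPU utilization per instance
mean_memory_mb = 500    # ~500 MB of memory per instance

total_memory_gb = instances * mean_memory_mb / 1024

# If mean utilization is ~2%, the same CPU work could in principle be served
# by far fewer fully busy instances; this ignores headroom, anti-affinity,
# and burst capacity, so treat it as an upper bound on consolidation.
equivalent_busy_instances = instances * mean_cpu_util

print(f"Total memory allocated: ~{total_memory_gb:,.0f} GB")
print(f"CPU-equivalent busy instances: ~{equivalent_busy_instances:,.0f}")
```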
The next level of consolidation in the third platform will occur through application virtualization.
Our goal was to identify how we could concentrate workloads to reduce energy consumption, so we
began looking for application data we could use to identify patterns.
We were able to get a number
of applications running on Pivotal Web Services. We added application performance monitoring by
binding all our microservices to a free trial of New Relic. The
New Relic suite is superb at
tracking web and service transaction metrics such as response time and throughput. However, after
initial exploration, we found it very difficult to export the data in a way that would let us identify individual application utilization.
Next, we tried a free subscription to DataDog, but we found the metrics were too high-level for what we needed; they tended to be at the host/server level.
Since each application runs in its own container, we thought we could get equivalent data from a container-level solution. Coincidentally, we were given a demo of an internal Pivotal APM tool.
Although the product was not yet available, we found we were able to obtain container-level metrics from a command-line interface via the same application “nozzle” that this beautiful dashboard was using.
We installed a plugin that would export the metrics, following the instructions at the firehose-plugin site. Then we ran the following commands, which piped the output to files:

cf nozzle -f ContainerMetric > ContainerMetricPaaS4.txt
cf nozzle -f ValueMetric > valueMetricSaaS2.txt
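To turn that raw dump into something tabular for analysis, a minimal parsing sketch along these lines can be used. The single-line key:value layout and field names below reflect what the nozzle printed for us; the exact format can vary by plugin version, so treat this as an assumption to adapt rather than a fixed specification.

```python
import re
import pandas as pd

# Assumed layout: each ContainerMetric event on one line of key:value pairs,
# e.g. applicationId:"..." instanceIndex:0 cpuPercentage:0.07
# memoryBytes:269213696 diskBytes:179429376 (adapt if your output differs).
FIELDS = {
    "applicationId": r'applicationId:"([^"]+)"',
    "instanceIndex": r"instanceIndex:(\d+)",
    "cpuPercentage": r"cpuPercentage:([\d.eE+-]+)",
    "memoryBytes":   r"memoryBytes:(\d+)",
    "diskBytes":     r"diskBytes:(\d+)",
}

def parse_container_metrics(path):
    """Parse a `cf nozzle -f ContainerMetric` dump into a DataFrame."""
    rows = []
    with open(path) as fh:
        for line in fh:
            if "ContainerMetric" not in line:
                continue  # skip blank or unrelated lines
            row = {}
            for name, pattern in FIELDS.items():
                match = re.search(pattern, line)
                if match:
                    row[name] = match.group(1)
            if row:
                rows.append(row)
    df = pd.DataFrame(rows)
    for col in ("instanceIndex", "cpuPercentage", "memoryBytes", "diskBytes"):
        if col in df:
            df[col] = pd.to_numeric(df[col])
    if "memoryBytes" in df:
        df["memoryMB"] = df["memoryBytes"] / 2**20  # friendlier unit
    return df

# Example: df = parse_container_metrics("ContainerMetricPaaS4.txt")
```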
Just to get a sense of general performance across the different services, we plotted the three different readings: memory usage, disk usage, and CPU utilization. For easy readability, we have removed the exact timestamps from our x-axis. Below is an example plot of disk usage for PaaS:
We took a look at our container data again, this time grouped by application, IP address, and Instance Index. Cloud Foundry assigns each application a UUID (Universally Unique Identifier) that we can use to identify a single application. Below is a sample plot of memory consumption, grouped by application.
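A minimal sketch of how such a grouped plot can be produced, assuming a DataFrame like the one returned by the parser above (the applicationId and memoryMB columns are those assumed names, and readings are taken to be in arrival order since timestamps were dropped):

```python
import matplotlib.pyplot as plt

def plot_memory_by_app(df, top_n=5):
    """Plot memory use over time for the top-N applications by mean memory.

    Assumes `df` came from parse_container_metrics() above and that rows are
    in arrival order (timestamps removed for readability, as in our plots).
    """
    top_apps = (df.groupby("applicationId")["memoryMB"]
                  .mean()
                  .nlargest(top_n)
                  .index)
    fig, ax = plt.subplots(figsize=(10, 4))
    for app_id, group in df[df["applicationId"].isin(top_apps)].groupby("applicationId"):
        ax.plot(range(len(group)), group["memoryMB"].to_numpy(), label=app_id[:8])
    ax.set_xlabel("reading (timestamps removed)")
    ax.set_ylabel("memory (MB)")
    ax.legend(title="app UUID prefix")
    plt.show()
```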
We then analyzed some of the relationships between memory/disk usage and CPU utilization in more detail on an app-by-app basis. For further analysis, we should be concerned with instances and applications that appear in the top-right or bottom-left corners of these plots, the areas where CPU utilization and memory/disk usage are both high or both low. When a process sits in the high-usage corner, we can begin to consider scaling out compute resources. Likewise, if it resides in the low-usage corner, we can start to ratchet compute resources down and scale them in.
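To make that corner rule concrete, here is a hedged sketch of the kind of per-application classification we have in mind. The quantile cut-offs and the column names (from the parsing sketch above) are illustrative assumptions, not tuned thresholds:

```python
def scaling_recommendation(df, high_q=0.75, low_q=0.25):
    """Label each application as scale out, scale in, or hold.

    Uses joint CPU and memory usage; the quantile cut-offs are illustrative
    placeholders, not tuned thresholds.
    """
    per_app = df.groupby("applicationId")[["cpuPercentage", "memoryMB"]].mean()
    cpu_hi, cpu_lo = per_app["cpuPercentage"].quantile([high_q, low_q])
    mem_hi, mem_lo = per_app["memoryMB"].quantile([high_q, low_q])

    def label(row):
        if row["cpuPercentage"] >= cpu_hi and row["memoryMB"] >= mem_hi:
            return "scale out"  # top-right corner: both dimensions hot
        if row["cpuPercentage"] <= cpu_lo and row["memoryMB"] <= mem_lo:
            return "scale in"   # bottom-left corner: both dimensions idle
        return "hold"

    per_app["recommendation"] = per_app.apply(label, axis=1)
    return per_app
```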
Our limited model illustrates the possibility of reducing energy usage on the order of a terawatt-hour in the near future.
Findings

Our data allowed us to group high- and low-utilization applications. We will be able to segregate the high-usage services with common patterns through anti-affinity rules, while letting the others fill in the gaps. Moreover, these pattern-identification methods allow us to predict scaling needs before thresholds are met, letting us scale out ahead of demand and scale in as it fades. The capability to reduce capacity will cut down on future electricity consumption.