We are living in a containerized world. Increasingly, when you deploy an application in the cloud, what you're deploying is an ever-changing swarm of container instances. You need to know how well that swarm is functioning, and how well individual containers are working. You need to know what they are doing, and whether they are operating as expected. You need to detect and understand anomalous behavior and potential sources of stress on system resources. In short, you need container analytics.
In this post, we'll talk about the types and sources of data that are available from a containerized environment, and how you can use container analytics to gain valuable insight into the performance (and potential problems) of your containerized application.
What is a Container?
First, consider what a container is. It's basically a process running within a protected environment. It exists as a container image, and in operation, as multiple instances of that image. These instances need to be managed and orchestrated in order to function together as an application. Container images, container instances, and the container management infrastructure are all sources of data for container analytics.
The Code in the Shell
Let's begin by taking one step down from the level of the container itself. A container is just that: a container for code, and it's the code that actually runs processes and provides services. How do you capture performance at the process level?
One way is to include monitoring features in the code itself: push performance metrics out to a service, or simply to a log. Your containerized system can include dedicated logging containers which receive log data from other containers, writing the logs to persistent storage so that they can be picked up by analytics applications. Needless to say, this involves a certain amount of coding overhead, but there are times when it may be worth the effort.
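As a minimal sketch of the in-code approach, the snippet below times a unit of work and writes one JSON line per measurement to standard output, where a logging container or log collector could pick it up. The helper names (`format_metric`, `timed`), the field names in the log record, and the `handle_request` function are all hypothetical and chosen only for illustration.

```python
import json
import sys
import time
from contextlib import contextmanager


def format_metric(name, value, **labels):
    """Build one JSON log line for a single measurement (field names are illustrative)."""
    record = {"ts": time.time(), "metric": name, "value": value}
    record.update(labels)
    return json.dumps(record, sort_keys=True)


@contextmanager
def timed(name, stream=sys.stdout, **labels):
    """Time the enclosed block and write the elapsed seconds as one JSON line."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        stream.write(format_metric(name, elapsed, **labels) + "\n")


def handle_request():
    """Hypothetical unit of application work."""
    return sum(range(1000))


if __name__ == "__main__":
    with timed("request_seconds", handler="handle_request"):
        handle_request()
```

Writing to standard output keeps the application unaware of where logs end up, which suits the dedicated-logging-container arrangement described above.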
Another option is to include a monitoring agent in the container. This saves you the trouble of writing process-specific monitoring code, and the agent will typically be configured to send data to a monitoring and analytics application.
Much of the really valuable performance data, however, is available at the level of containers, container clusters, and container infrastructure. Even if your application code performs flawlessly at the process level, you still need insight into how those processes work together, and how the application performs as a whole.
Container platforms such as Docker generally provide basic performance metrics for individual containers. These metrics can include:
CPU usage can be broken down into system CPU time (time spent in the kernel on behalf of the application process) and user CPU time (time spent running application code). Individual containers may operate under enforced CPU throttling. When this is the case, metrics can include the throttling time and count.
Memory use data returned for an individual container should include total memory used. Docker also reports such things as swap and cache memory, and memory that does not correspond to anything on disk, such as stacks and heaps (Resident Set Size). Memory monitoring can include more detailed statistics, such as page faults (minor and major) and active/inactive memory.
Container network metrics may include volume of traffic and packet count, as well as transmit and receive errors and dropped packets.
Input and Output
Container I/O activity can be reported in volume and in number of operations, broken down by reads and writes, and by whether operations are synchronous or asynchronous.
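To make these four categories concrete, here is a rough sketch that reduces one metrics sample to a few headline numbers. It assumes a dictionary shaped like the JSON returned by the Docker Engine API stats endpoint on a cgroup v1 host. Field names differ between Docker and cgroup versions, so treat every key used here as an assumption to check against your own output. The `summarize` function and the `SAMPLE` values are invented for illustration.

```python
def summarize(stats):
    """Reduce one Docker-stats-shaped sample to a few headline metrics."""
    cpu, pre = stats["cpu_stats"], stats["precpu_stats"]
    # CPU percent: container CPU time consumed between two samples,
    # relative to total system CPU time over the same interval.
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    sys_delta = cpu["system_cpu_usage"] - pre["system_cpu_usage"]
    cpu_pct = 0.0
    if cpu_delta > 0 and sys_delta > 0:
        cpu_pct = cpu_delta / sys_delta * cpu.get("online_cpus", 1) * 100.0

    mem = stats["memory_stats"]
    # Subtract page cache so the figure reflects memory the processes hold.
    mem_used = mem["usage"] - mem.get("stats", {}).get("cache", 0)
    limit = mem.get("limit") or 0
    mem_pct = mem_used / limit * 100.0 if limit else 0.0

    # Sum traffic counters across all network interfaces.
    net = {"rx_bytes": 0, "tx_bytes": 0, "rx_errors": 0,
           "tx_errors": 0, "rx_dropped": 0, "tx_dropped": 0}
    for iface in stats.get("networks", {}).values():
        for key in net:
            net[key] += iface.get(key, 0)

    # Block I/O entries are listed per device and per operation type.
    io_bytes = {}
    entries = stats.get("blkio_stats", {}).get("io_service_bytes_recursive") or []
    for entry in entries:
        op = entry["op"].lower()
        io_bytes[op] = io_bytes.get(op, 0) + entry["value"]

    throttling = cpu.get("throttling_data", {})
    return {
        "cpu_percent": cpu_pct,
        "throttled_periods": throttling.get("throttled_periods", 0),
        "memory_used_bytes": mem_used,
        "memory_percent": mem_pct,
        "network": net,
        "io_bytes": io_bytes,
    }


# Made-up sample values, not real measurements.
SAMPLE = {
    "precpu_stats": {"cpu_usage": {"total_usage": 100_000_000},
                     "system_cpu_usage": 1_000_000_000},
    "cpu_stats": {"cpu_usage": {"total_usage": 200_000_000},
                  "system_cpu_usage": 2_000_000_000,
                  "online_cpus": 2,
                  "throttling_data": {"periods": 10, "throttled_periods": 3,
                                      "throttled_time": 5_000_000}},
    "memory_stats": {"usage": 300, "limit": 1000, "stats": {"cache": 100}},
    "networks": {"eth0": {"rx_bytes": 10, "tx_bytes": 20, "rx_errors": 0,
                          "tx_errors": 0, "rx_dropped": 1, "tx_dropped": 0}},
    "blkio_stats": {"io_service_bytes_recursive": [
        {"major": 8, "minor": 0, "op": "Read", "value": 4096},
        {"major": 8, "minor": 0, "op": "Write", "value": 8192},
    ]},
}

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

The CPU figure needs two samples because the underlying counters are cumulative. A single reading says how much CPU time has been used in total, not how busy the container is now.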
Note that all of these metrics are valuable both in terms of overall application performance and analyzing individual container behavior. They can be equally valuable in providing insight into performance at the process level. If a container's CPU or memory metrics are significantly out of the expected bounds, for example, it is reasonable to suspect code/design-level problems with either a process within the container, or a process making requests to a container.
Anomalous behavior at the container level can also be an indication of intrusion or malicious attack, particularly if the container has not previously shown signs of such behavior. In particular, this can show up as unusual or unexpected patterns of network or I/O traffic, although these may also be the result of unanticipated patterns of user activity (which may have analytic value in their own right).
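One simple way to flag values that fall "out of the expected bounds" is a z-score check against recent history. This is only one of many possible techniques and is not prescribed by any particular tool. The function name `is_anomalous`, the default threshold of 3 standard deviations, and the sample numbers are assumptions made for this sketch.

```python
from statistics import mean, stdev


def is_anomalous(baseline, value, threshold=3.0):
    """Return True if value is more than `threshold` standard deviations from the baseline mean."""
    if len(baseline) < 2:
        return False  # not enough history to judge
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


if __name__ == "__main__":
    # Hypothetical recent network throughput samples for one container (KB/s).
    history = [100, 102, 98, 101, 99]
    print(is_anomalous(history, 101))
    print(is_anomalous(history, 500))
```

A check like this only says that a value is unusual. Deciding whether the cause is a bug, an attack, or a shift in user activity still takes the kind of investigation described above.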
Open-source tools such as Prometheus and Telegraf can collect such container-level metrics; Prometheus, for example, works with Grafana to provide a variety of data visualizations.
System and Orchestration-Level Metrics
For a full, wide-screen view of application performance, you need to monitor containers at the aggregate level as well as individually. Orchestration tools are likely to include their own metrics collectors. Kubernetes, for example, long relied on Heapster to provide monitoring data to analytics and dashboard services; Heapster has since been retired, and Metrics Server now supplies basic resource metrics.
In the case of Kubernetes, the most important metrics are typically at the pod level. These can include the number of pods that are currently running, the number that are available, the number that are unavailable, and the total number specified at deployment.
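The pod counts described above can be read from a Deployment object. The sketch below assumes a dictionary shaped like the output of `kubectl get deployment <name> -o json`, using the `spec.replicas` field and the `replicas`, `availableReplicas`, and `unavailableReplicas` status fields. The `deployment_health` function and the sample object are hypothetical.

```python
def deployment_health(deployment):
    """Summarize pod counts from a Deployment-shaped dictionary."""
    spec = deployment.get("spec", {})
    status = deployment.get("status", {})
    desired = spec.get("replicas", 1)  # Kubernetes defaults to 1 when unset
    # Status fields may be absent when their value is zero.
    current = status.get("replicas", 0)
    available = status.get("availableReplicas", 0)
    unavailable = status.get("unavailableReplicas", 0)
    return {
        "desired": desired,
        "current": current,
        "available": available,
        "unavailable": unavailable,
        "fully_available": available >= desired,
    }


if __name__ == "__main__":
    # Made-up example: three pods requested, one not yet available.
    sample = {
        "spec": {"replicas": 3},
        "status": {"replicas": 3, "availableReplicas": 2, "unavailableReplicas": 1},
    }
    print(deployment_health(sample))
```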
Application and Node Metrics
Orchestration-level metrics also include the overall application equivalent of the basic single-container metrics: CPU, memory, I/O, disk storage and network use. In Kubernetes, much of this data is also reported on a per-node basis. Orchestration tools may additionally report per-container metrics. In Kubernetes' case, this includes the minimum CPU that a container needs (its CPU request), as well as the maximum that it is allowed (its CPU limit). Kubernetes can also report on individual container health.
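As an illustration of the per-container CPU minimum and maximum, the following sketch totals CPU requests and limits across the containers in a pod. It assumes a dictionary shaped like a Kubernetes Pod manifest, with values under `resources.requests.cpu` and `resources.limits.cpu`. The parser handles only plain core counts (such as "2" or "0.5") and millicore values (such as "500m"). The function names and the sample pod are hypothetical.

```python
def parse_cpu(quantity):
    """Convert a CPU quantity such as "500m" or "2" to a number of cores."""
    text = str(quantity).strip()
    if text.endswith("m"):
        return int(text[:-1]) / 1000.0  # millicores to cores
    return float(text)


def pod_cpu_budget(pod):
    """Total the CPU requests and limits declared by a pod's containers."""
    total_request = 0.0
    total_limit = 0.0
    unlimited = []  # containers that declare no CPU limit
    for container in pod["spec"]["containers"]:
        resources = container.get("resources", {})
        total_request += parse_cpu(resources.get("requests", {}).get("cpu", "0"))
        limit = resources.get("limits", {}).get("cpu")
        if limit is None:
            unlimited.append(container["name"])
        else:
            total_limit += parse_cpu(limit)
    return {
        "request_cores": total_request,
        "limit_cores": total_limit,
        "containers_without_limit": unlimited,
    }


if __name__ == "__main__":
    # Made-up pod with one bounded container and one without a CPU limit.
    sample_pod = {"spec": {"containers": [
        {"name": "web", "resources": {"requests": {"cpu": "250m"},
                                      "limits": {"cpu": "500m"}}},
        {"name": "sidecar", "resources": {"requests": {"cpu": "100m"}}},
    ]}}
    print(pod_cpu_budget(sample_pod))
```

Comparing these declared figures with measured CPU use is one way to spot containers that are over-provisioned or that routinely run into their limits.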
Open-source tools such as Prometheus, or commercial tools like CA APM, can aggregate and analyze container data at the orchestration level. In combination with APIs and visualization tools such as Grafana, such aggregated, multi-level analytics can provide you with rich insights into the performance and behavior of your container-based application.
Can you really get good analytic data out of a container system? You'll find that it's a gold mine, once you know where to look.