The Many Data Sources of Containers
April 03, 2018

Nicolas Bohorquez
Software Developer

One of the great things about containers is that the use cases and workflows associated with them are so broad. There is no one single toolset or group of processes that you need to use when working with containers.

Nor is there a specific way to connect containerized applications to data storage. The types of data sources that exist in a container environment can vary widely, as can the ways in which containerized applications connect to those data sources.

This post discusses the varied data sources used by containerized applications, starting with the obvious (SQL-style databases) and moving on to data sources you may not expect to see in containerized applications.

SQL Databases

At first sight, a stateful application like a relational database might not seem like the kind of workload a container platform can manage well. But relational databases have evolved, offering more resilient approaches to data partitioning, clustering and replication, and container deployment, scaling and management tools have also added features that better support this kind of workload.

In the Kubernetes world, pods (groups of containers that share specifications about how they run, persistent storage, and network identity) can be used to run a highly available, resilient database cluster. This can be an alternative to the managed solutions for scaling relational databases offered by providers like AWS or Alibaba Cloud.
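As a rough illustration of what running a database in pods can look like, the sketch below builds a Kubernetes StatefulSet manifest as a Python dict and prints it as JSON. It is a minimal sketch under several assumptions that are not part of this post: PostgreSQL as the database, a hypothetical cluster named "orders-db", a headless Service with the same name, and a hypothetical Secret called "orders-db-secret" holding the password.

```python
import json


def database_statefulset(name: str, replicas: int, storage: str) -> dict:
    """Return a Kubernetes StatefulSet manifest (as a dict) for a database."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Headless Service (assumed to exist) giving each pod a stable DNS name
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "db",
                        "image": "postgres:16",
                        "ports": [{"containerPort": 5432}],
                        "env": [
                            # Password read from a hypothetical Secret named "<name>-secret"
                            {"name": "POSTGRES_PASSWORD",
                             "valueFrom": {"secretKeyRef": {"name": f"{name}-secret",
                                                            "key": "password"}}},
                            # Keep data in a subdirectory of the mounted volume
                            {"name": "PGDATA", "value": "/var/lib/postgresql/data/pgdata"},
                        ],
                        "volumeMounts": [{"name": "data",
                                          "mountPath": "/var/lib/postgresql/data"}],
                    }],
                },
            },
            # One PersistentVolumeClaim per pod, reattached if the pod is rescheduled
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, e.g. `python sketch.py | kubectl apply -f -`
    print(json.dumps(database_statefulset("orders-db", replicas=3, storage="10Gi"), indent=2))
```

Note that three replicas here means three PostgreSQL pods, each with its own persistent volume, not three copies of the same data. Replication and failover still have to be configured at the database level, which is often delegated to a database-specific Kubernetes operator.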

ARM and Beyond

But not all container environments are equal. Containers are being deployed on different hardware platforms, including alternative architectures such as the ARM chips found on Raspberry Pi boards. This flexibility makes it possible to process data from sensors and actuators, and to connect small hardware devices to the data processing pipelines that run in container environments.
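To make the idea of feeding small devices into a pipeline more concrete, here is a minimal sketch of the edge side: a container on a board like a Raspberry Pi polls a sensor, timestamps each value, and groups readings into batches of JSON lines for a downstream pipeline. The sensor driver is replaced by a hypothetical `read` callable, and the names `Reading`, `poll` and `batches` are illustrative assumptions, not part of any particular library.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Reading:
    sensor_id: str
    metric: str
    value: float
    ts: float  # Unix timestamp (seconds) taken on the edge device


def poll(sensor_id: str, metric: str, read: Callable[[], float],
         samples: int, interval_s: float = 0.0) -> Iterator[Reading]:
    """Call a sensor-reading function `samples` times and wrap each value."""
    for _ in range(samples):
        yield Reading(sensor_id, metric, float(read()), time.time())
        if interval_s > 0:
            time.sleep(interval_s)  # pace reads on a real device


def batches(readings: Iterable[Reading], size: int) -> Iterator[List[str]]:
    """Group readings into fixed-size batches of JSON lines."""
    batch: List[str] = []
    for reading in readings:
        batch.append(json.dumps(asdict(reading)))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the last, partially filled batch


if __name__ == "__main__":
    # Hypothetical stand-in for a real sensor driver (e.g. an I2C temperature sensor)
    fake_values = iter([21.5, 21.7, 22.0, 22.1, 21.9])
    for batch in batches(poll("pi-01", "temp_c", lambda: next(fake_values), samples=5), size=2):
        print("\n".join(batch))  # a real pipeline would publish or POST this batch
```

Batching on the device is one possible design choice: it reduces the number of network round trips from a constrained board, at the cost of a small delay before readings reach the pipeline.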

The Land Information System, developed by the Hydrological Sciences Laboratory at NASA's Goddard Space Flight Center, is an example of containerized software that leverages interesting data sources. The platform integrates data from satellites and ground-based observations to run analyses and models for climate change research. Other sensor-based data sources include air quality and sound or noise levels.

Big Data Transformations

Transforming data from an Online Transaction Processing (OLTP) environment to an Online Analytical Processing (OLAP) environment is another good fit for a containerized solution. An OLTP system processes a high number of small, concurrent actions, each involving small pieces of data. A ticketing system is a typical example: you register yourself (providing small pieces of data such as your name and card number) for an event (location, time, description, etc.), and the OLTP system creates a small new record in real time that links your profile to the event. These transactions should be small and atomic because the level of concurrency can be high.

The most common applications on the web are those that capture information and return real-time results from atomic transactions. These systems usually produce huge amounts of data, which are stored in data lakes or similar architectures, often in unstructured form.

An OLAP system produces information through non-trivial analysis, combining data from several sources to build a model across several dimensions. Those dimensions could be user profiles, social network behavior, historical records, data derived from other models, etc. That data usually must be transformed, standardized and cleaned, and then pushed to new structures or pipelines, to clusters that train or test machine learning models, or to analytical tools. All of these operations benefit from a general, tool-agnostic container approach.
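The core OLAP idea of summarizing a measure across dimensions can be shown with a small roll-up in plain Python. This is a simplified sketch rather than how an OLAP engine is implemented: the fact rows, the dimension names (`region`, `channel`, `year`) and the `amount` measure are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def rollup(facts: Iterable[Mapping[str, object]], dims: Sequence[str],
           measure: str) -> Dict[Tuple[object, ...], float]:
    """Sum `measure` over every combination of the chosen dimension values."""
    totals: Dict[Tuple[object, ...], float] = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dims)  # one cell of the cube per key
        totals[key] += float(row[measure])
    return dict(totals)


# Hypothetical fact rows, e.g. extracted from an OLTP system
FACTS = [
    {"region": "north", "channel": "web", "year": 2017, "amount": 120.0},
    {"region": "north", "channel": "mobile", "year": 2018, "amount": 80.0},
    {"region": "south", "channel": "web", "year": 2018, "amount": 50.0},
    {"region": "south", "channel": "web", "year": 2017, "amount": 30.0},
]

if __name__ == "__main__":
    print(rollup(FACTS, ["region"], "amount"))          # totals per region
    print(rollup(FACTS, ["region", "year"], "amount"))  # drill down by year
    print(rollup(FACTS, [], "amount"))                  # grand total
```

Passing more dimensions drills down into finer cells, and passing none collapses everything into a single grand total.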

Containers are now being used to help with the ETL (Extract, Transform, Load) process. CEINPA's software architecture is another example of how containers can support Big Data transformations.
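An ETL step is a natural unit to package as a container. The minimal sketch below shows the three stages as separate functions: it extracts rows from CSV text, transforms them by standardizing strings and dropping rows without a value, and loads the result into a table. The CSV layout, the `sales` table and an in-memory SQLite database as the destination are assumptions made for illustration; in a containerized setup the input might come from a mounted volume and the output would go to a warehouse.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, float]


def extract(csv_text: str) -> Iterator[Dict[str, str]]:
    """Read rows from CSV text (a stand-in for a file or object store)."""
    return csv.DictReader(io.StringIO(csv_text))


def transform(rows: Iterable[Dict[str, str]]) -> List[Record]:
    """Standardize names and drop rows that are missing the amount."""
    records: List[Record] = []
    for row in rows:
        amount = (row.get("amount") or "").strip()
        if not amount:
            continue  # cleaning step: skip incomplete rows
        records.append((row["user"].strip().lower(),
                        row["event"].strip().title(),
                        float(amount)))
    return records


def load(conn: sqlite3.Connection, records: List[Record]) -> int:
    """Write records to the destination table in one transaction."""
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS sales "
                     "(user_name TEXT, event_name TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    return len(records)


SAMPLE = "user,event,amount\n Ana ,concert,25.5\nluis,Concert,\nMARIA,theater play,40\n"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load(conn, transform(extract(SAMPLE))), "rows loaded")
```

Keeping each stage a separate function makes it easy to run the stages as separate containers later, with the intermediate data passed through files or a queue.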

Conclusion

Given the flexibility of containers, the sky is the limit when it comes to the ways in which data sources can be connected to containerized applications.

Of course, with this flexibility come new challenges. Developers and admins must ensure that the data sources they use with containerized applications are effectively monitored, no matter where the data is stored or how it is connected to the container environment. For this reason, your monitoring tools need to be just as flexible as your container software stack.
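At its most basic, monitoring a data source starts with knowing whether it is reachable and how quickly it responds. The sketch below is a simple TCP reachability probe using only the Python standard library; it is not a substitute for a real monitoring tool, and the target hostnames and ports in the example are hypothetical.

```python
import socket
import time
from dataclasses import dataclass


@dataclass
class ProbeResult:
    target: str
    ok: bool
    latency_ms: float
    error: str = ""


def probe(host: str, port: int, timeout_s: float = 2.0) -> ProbeResult:
    """Try to open a TCP connection and report success and elapsed time."""
    target = f"{host}:{port}"
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            pass  # connection accepted; close it immediately
    except OSError as exc:  # refused, timed out, or DNS failure
        return ProbeResult(target, False, (time.monotonic() - start) * 1000, str(exc))
    return ProbeResult(target, True, (time.monotonic() - start) * 1000)


if __name__ == "__main__":
    # Hypothetical data sources reachable from inside the container network
    for host, port in [("orders-db", 5432), ("events-cache", 6379)]:
        print(probe(host, port))
```

A reachable port only shows that something is listening; query latency, replication lag and storage usage need checks specific to each data source.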

Nicolas Bohorquez is a Software Developer from Colombia, currently earning a Master's in Data Science for Complex Economic Systems at the Collegio Carlo Alberto in Turin, Italy.