A multi-cloud application is simply one that uses two or more different cloud services, regardless of provider. For example, this could include a cloud-based storage service such as Amazon S3 used by application code deployed on AWS Lambda. However, the more common use of the term describes applications that use cloud services from two or more cloud providers (e.g., Oracle Database Cloud Service and AWS EC2 IaaS) and the complexities that may come along with that use case.
In either case, multi-cloud performance and overall management need to be considered carefully. Performance concerns and considerations tend to apply more to the multi-provider cloud application scenario, but they apply in hybrid scenarios as well. Let’s take a look at how to maximize performance for an application that is hosted in a multi-cloud environment.
The first concern most organizations have when moving to the cloud is security. Once those concerns are allayed, the next is usually fear of cloud vendor lock-in; entire cloud enterprises (often called cloud brokerages or cloud brokers) have been created to solve this problem. After that, the remaining large concern is typically performance.
The top cloud providers (e.g. Amazon Web Services, Microsoft Azure, IBM Cloud, and so on) do a very good job ensuring their own infrastructure and platform cloud services interoperate seamlessly with predictable performance. However, once you mix cloud services from different vendors, you risk introducing issues around response latency, overall scalability, and end-to-end performance due to round-trip data processing. Let’s examine these one at a time.
Performance Consideration: Latency
Cloud vendors address latency by closely controlling their network backbones, even across geographically distributed data centers. Addressing multi-cloud performance begins with measuring latency: both the average over time and the worst case. The worst-case outliers are the most important because they represent the percentage of users who experience the longest response and wait times, and therefore the worst perceived cloud application performance. Latency testing and measurement involve careful planning, with real and simulated remote user logins across the regions where your actual users reside.
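As an illustration, tail latency can be summarized from raw timing samples. The numbers below are hypothetical; in practice, the samples would come from your load-testing or monitoring tool:

```python
import statistics

# Hypothetical latency samples in milliseconds, as might be collected
# from real and simulated user logins across several regions.
samples = [112, 98, 105, 340, 101, 97, 120, 510, 99, 108, 115, 102]

def percentile(data, pct):
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(data)
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[rank]

avg = statistics.mean(samples)    # average over time
p95 = percentile(samples, 95)     # tail latency: the outliers
worst = max(samples)              # absolute worst case
print(f"avg={avg:.1f} ms  p95={p95} ms  worst={worst} ms")
```

The average alone (about 159 ms here) hides the fact that some users waited half a second, which is why the percentile and worst-case figures matter most.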
Performance Consideration: Connectivity and Reliability
In general, Internet connectivity and routing can be unpredictable by nature. Often, network performance varies from region to region, across states, and even within a single city, from hour to hour. In the worst case, overall connection reliability can be a concern, as outright outages represent rock-bottom in terms of cloud application performance.
Dedicated Internet connectivity across cloud vendors requires telecom support, such as leased lines and bandwidth guarantees from a commercial Internet provider agreement. Working closely with Internet service providers can help you control the routing between the geographic regions your application supports, eliminate unnecessary network hops, and secure guaranteed service-level agreements (SLAs) for both reliability and bandwidth.
Performance Consideration: Scalability
Once latency, routing, and bandwidth concerns between providers have been mitigated, the next concern is scalability. Stress testing combined with application monitoring is needed to ensure that an individual cloud provider can spin up enough instances quickly enough to meet demand coming from services outside its domain. With the top-tier cloud providers, this concern is addressed by choosing the right level of service from a vendor whose SLA matches your requirements.
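A minimal sketch of that kind of stress test ramps concurrency up in steps and watches whether latency stays flat. Here the service call is a stand-in (a simple sleep); in a real test you would replace it with requests to the provider under load:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_service(_):
    """Stand-in for a request to a cloud service; replace with a real call."""
    start = time.perf_counter()
    time.sleep(0.01)  # simulated service latency
    return time.perf_counter() - start

# Ramp concurrency up in steps; a sharp rise in latency at the higher
# steps suggests the provider is not spinning up instances fast enough.
for workers in (5, 20, 50):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(call_service, range(workers * 4)))
    print(f"{workers:>3} concurrent: max latency {max(latencies) * 1000:.1f} ms")
```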
Performance Consideration: Data Localization
Moving beyond pure processing scalability, there are data considerations to be made for performance reasons, but also for regulatory and legal reasons. For example, depending on the application type and local government rules, data may need to be kept in the country in which it originates. To meet both performance and regulatory challenges, you can keep data local to the organization (sometimes in a data lake), within a single cloud provider’s database offering, or with different cloud vendors, each chosen to be in the same country as a segment of your users. Routing requests to cloud providers with data centers geographically closest to your customers addresses both concerns.
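One way to implement that routing is a simple lookup from the user’s country to an in-country region. The provider and region names below are purely hypothetical:

```python
# Hypothetical mapping from user country to a cloud region, chosen so
# that each user's data stays in-country and geographically close.
REGION_BY_COUNTRY = {
    "DE": "provider-a:eu-central",
    "FR": "provider-b:eu-west",
    "US": "provider-a:us-east",
}

DEFAULT_REGION = "provider-a:eu-central"  # fallback for unmapped countries

def route_request(user_country):
    """Return the region that should serve (and store) this user's data."""
    return REGION_BY_COUNTRY.get(user_country, DEFAULT_REGION)

print(route_request("FR"))  # provider-b:eu-west
print(route_request("BR"))  # unmapped country falls back to the default
```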
Other Cloud Provider Performance Considerations
For any cloud provider with multi-tenant services, a basic performance concern is characterized as “noisy neighbors”: in any virtualized environment, the performance of a single virtual machine (VM) or container can be affected by other VMs running on the same hardware. One solution is to request dedicated or bare-metal infrastructure cloud instances, so whether a vendor offers this service should factor into your decision-making.
It’s also important to choose cloud providers on more than price and point-by-point feature comparisons; consider overall reliability, expertise, and domain familiarity as well. For instance, many organizations have achieved top database performance with Oracle Database, so a move to the Oracle Database Cloud Service might be wise. But to host your C# code, Microsoft Azure is likely a better choice for performance than Oracle’s IaaS offerings, even though that results in a multi-cloud provider solution.
Hybrid Cloud Performance Considerations
Ground-to-cloud communication, where on-premises processing is offloaded to public cloud instances, is a good way to handle spikes in demand and to support a controlled migration to the cloud. However, it can introduce an entirely new and varying performance profile to your application processing. Some cloud providers offer dedicated connectivity from private data center to cloud (ground to cloud), tailored more closely to their services than generic leased telecommunication lines. Oracle FastConnect, AWS Direct Connect, and Microsoft Azure ExpressRoute are examples of offerings that help solve hybrid performance issues.
For Performance, Knowledge is Power
In some ways, a multi-cloud model of cloud deployment can actually help performance. For instance, this approach guards against total outages due to denial-of-service (DoS) attacks against a single cloud vendor. It can also help you avoid inheriting all of the internal performance issues of a single cloud vendor. Netflix famously avoided performance issues and outages by taking a multi-cloud approach to its service offering. In fact, some consider multi-cloud solutions as the future of cloud overall.
The best strategy you can apply to handle all of these performance considerations is one centered on a robust cloud-specific monitoring solution. A tool that offers root-cause analysis will help isolate performance bottlenecks to individual cloud instances and will even drill down to the root causes within them (see Figure 1).
Figure 1 - CA Unified Infrastructure Management offers deep cloud service monitoring insight.
This includes infrastructure monitoring that goes beyond basic capacity issues (for example, storage, CPU, and bandwidth), and includes measuring memory capacity, max container deployments and virtual server instances per physical server, and the reaction and spin-up rates of instances to meet spikes in user demand (see Figure 2).
When public cloud services are involved, be sure to measure performance at the transaction level, looking at what affects your users most directly. It doesn’t matter if your cloud-based database has 100% uptime and low latency if bandwidth or noisy cloud neighbors often affect the experience end-to-end. Instead, measure, report, and visualize how many users are dissatisfied with your quality of service, and why.
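One common way to quantify user dissatisfaction is an Apdex-style score (a general industry convention, not tied to any particular tool): responses under a target threshold count as satisfied, responses up to four times the threshold as tolerating, and anything slower as frustrated. The transaction times below are hypothetical:

```python
def apdex(response_times, threshold):
    """Apdex score: (satisfied + tolerating / 2) / total, in [0, 1]."""
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)

# Hypothetical end-to-end transaction times in seconds
times = [0.2, 0.4, 0.3, 1.1, 0.5, 2.9, 0.25, 0.35]
score = apdex(times, threshold=0.5)
print(f"Apdex score: {score:.2f}")  # 0.81: one tolerating, one frustrated user
```

Tracking this score per region and per transaction type shows not only how many users are dissatisfied, but where.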
Figure 2 - Cloud performance can vary by region, so be sure to measure that way as well.
Finally, look to break monolithic applications down further, beyond cloud infrastructure and platform services, with a Function-as-a-Service (FaaS) deployment, and scale microservices to an even higher degree. Monitoring tools let you see, in real time, which functions are scaling better than others, so you can dynamically re-route requests and events to deployments on other, faster, and more scalable cloud providers. This gets you out of the CPU, memory, and server game altogether.
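A sketch of that re-routing decision, assuming your monitoring pipeline supplies recent latency samples per deployment (all names and numbers below are hypothetical):

```python
# Hypothetical recent latency samples (ms) per FaaS deployment,
# as fed by a real-time monitoring pipeline.
observed_latency = {
    "provider-a/checkout-fn": [120, 135, 128],
    "provider-b/checkout-fn": [95, 88, 102],
    "provider-c/checkout-fn": [210, 190, 205],
}

def pick_deployment(latency_by_target):
    """Route new events to the deployment with the lowest mean latency."""
    return min(
        latency_by_target,
        key=lambda t: sum(latency_by_target[t]) / len(latency_by_target[t]),
    )

target = pick_deployment(observed_latency)
print(f"routing new events to {target}")
```

In production this decision would also weigh cost, data locality, and error rates, but the principle is the same: let live measurements, not static configuration, drive where work runs.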