Preventing slow applications caused by resource contention requires a rigorous, methodical approach and an appropriate toolset. IT managers, working with business owners, should prioritize critical apps for multi-tenancy and maximum performance.
Resource contention is what happens when demand exceeds supply for a shared resource, such as memory, CPU, network or storage. In modern IT, where cost cuts are the norm, addressing resource contention is a top priority. The main concern with resource contention is the performance degradation that occurs as a result.
When two or more transactions are racing for the same resource, one of them will get it and the others will have to wait in line until the resource is available, frustrating users in the meantime. This problem is not new: a common scenario is two processes on the same machine competing for the same physical CPU or memory. Another typical scenario involves two database transactions fighting for I/O on the same physical disk.
Resource contention problems have always been challenging to identify and to fix. Contention issues may come and go, only to return again when performance is most critical.
Here are the three basic steps for IT managers when it comes to resolving resource contention:
- First, IT needs to determine that the performance problems are indeed resource-related.
- Next, IT needs to identify which transactions are competing for resources.
- Finally, IT needs to resolve the problem, which typically involves prioritizing one transaction above the others.
But which should you prioritize? This is a zero-sum game and one party will have to “lose,” so ideally the decision is linked back to business priorities, which helps IT make informed choices in the resolution process.
The Role of Virtualization in Resource Contention
With the advent of virtualization technology and cloud computing, however, resource contention is becoming harder to resolve.
First, there are new places where resource contention may occur. For example, CPU contention now comes in two forms: two processes racing for the same virtual machine CPU, and that virtual machine racing for physical CPU with other virtual machines. Another example is in storage pools, when data is competing for the fast but expensive Flash storage.
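The two forms of CPU contention can be told apart from inside a guest, at least roughly. The sketch below is one possible approach, not something the article prescribes: it assumes a Linux guest, where the aggregate `cpu` line of `/proc/stat` includes a `steal` counter (time the hypervisor gave to other guests while this VM was ready to run). The function names and the sample counter values are illustrative.

```python
from typing import Dict

# Field order of the aggregate "cpu" line in Linux /proc/stat.
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait",
              "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Parse the aggregate 'cpu' line of /proc/stat into named counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    counters = dict.fromkeys(CPU_FIELDS, 0)  # older kernels report fewer fields
    counters.update(zip(CPU_FIELDS, (int(v) for v in parts[1:])))
    return counters


def contention_shares(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Split elapsed CPU time into 'busy inside the VM' and 'stolen by the host'."""
    delta = {name: after[name] - before[name] for name in CPU_FIELDS}
    # guest and guest_nice are already counted in user and nice, so skip them.
    total = sum(v for name, v in delta.items() if name not in ("guest", "guest_nice"))
    if total <= 0:
        raise ValueError("samples must be taken at two different times")
    busy = total - delta["idle"] - delta["iowait"] - delta["steal"]
    return {"busy_in_vm": busy / total, "stolen_by_host": delta["steal"] / total}


# Two made-up samples taken some seconds apart.
before = parse_cpu_line("cpu 1000 0 500 8000 100 0 0 400 0 0")
after = parse_cpu_line("cpu 1300 0 600 8300 100 0 0 700 0 0")
shares = contention_shares(before, after)
```

A high `busy_in_vm` share suggests processes competing for the virtual CPU, while a high `stolen_by_host` share suggests the VM itself is competing with other VMs for physical CPU. In practice the two samples would be read from `/proc/stat` rather than hard-coded.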
Second, environments are becoming more dynamic with virtualization and cloud technologies. As IT transforms into IT-as-a-Service, new resources are constantly being provisioned and consumed. It is not uncommon to provision new VMs for a few hours of high workload and then decommission them when the load subsides. Mobile access and BYOD are other factors affecting the dynamic environment, since access patterns are changing and load is becoming less predictable.
Third, automation is a mixed blessing. The vendors of virtualization hardware and software are aware of the resource contention challenge and have introduced automatic algorithms to address it. These algorithms move workloads around to distribute the load more evenly and prioritize workloads according to the load they are generating. This approach works well only if the busiest workloads are the most important ones. That is not always the case, and when it is not, the system prioritizes less-important transactions at the expense of more critical ones. Another implication of automation is that IT now has less visibility into, and less control of, the environment.
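To make the mismatch concrete, here is a minimal sketch contrasting a load-based ordering with a business-based ordering. It does not reproduce any vendor's algorithm; the `Workload` type, the workload names, and the `business_priority` field (assumed to be assigned by business owners, 1 being most critical) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Workload:
    name: str
    load: float             # measured demand on the shared resource (0.0 to 1.0)
    business_priority: int  # hypothetical rank from business owners, 1 = most critical


def rank_by_load(workloads: List[Workload]) -> List[Workload]:
    """Order workloads the way a purely load-driven balancer would favor them."""
    return sorted(workloads, key=lambda w: w.load, reverse=True)


def rank_by_business_priority(workloads: List[Workload]) -> List[Workload]:
    """Order workloads by business criticality, using load only to break ties."""
    return sorted(workloads, key=lambda w: (w.business_priority, -w.load))


workloads = [
    Workload("nightly-report", load=0.9, business_priority=3),
    Workload("checkout", load=0.4, business_priority=1),
    Workload("search", load=0.6, business_priority=2),
]

by_load = [w.name for w in rank_by_load(workloads)]
by_business = [w.name for w in rank_by_business_priority(workloads)]
```

With this sample data the load-based ordering puts the batch report first and the checkout transaction last, which is exactly the situation where automation works against the business.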
Let’s revisit the steps for resolving resource contention, and factor in the impact of virtualization and cloud technologies:
1. Identify that the problem is related to resource contention
2. Identify the competitors
3. Prioritize the workloads according to business considerations
The first step is already problematic, since resource contention issues can manifest in any number of ways: what seems to be a large chunk of time spent in the Java tier may actually be a result of the Java VM not getting enough CPU.
The second step is even harder. Analysis of resource contention issues is after-the-fact. By then, the culprits may have already stopped competing, started using other resources or have been decommissioned altogether.
The third step is the hardest, since IT is hard-pressed to prioritize applications if it is unsure which processes, transactions or applications are competing.
Best Practices to Resolve Resource Contention in Virtual Environments
The number of possibilities for resource contention problems and ways to overcome them is substantial. Every IT organization has its own particular landscape and idiosyncrasies. Below, however, are some general guidelines which can be tailored to an organization’s unique needs.
The main considerations are the dynamic and multi-tier characteristics of resource contention. An efficient approach must include cross-tier views, the ability to baseline and compare historical data, and a way to tie resources to their business users:
Side-by-Side View of Performance Across Multiple Tiers: There are plenty of APM products and services that provide dashboards, but few of these solutions will perform complete end-to-end monitoring from the user’s end device to the storage disk, across physical and virtual infrastructure. To solve resource contention, you need to create a dashboard that collects and displays performance data curated from the various monitoring tools. This gives an indication of which resources are over-utilized and whether their over-utilization trend matches the workload trend of the tiers which access said resources. While not perfect, in a typical setting these matching trends would give you a big clue as to who’s using the resources and the resulting impact on performance.
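The trend matching described above can be approximated numerically. The sketch below assumes that the dashboard already holds one utilization series for a resource and one workload series per tier, sampled at the same times; it then ranks tiers by Pearson correlation with the resource. The tier names and sample values are made up, and correlation is only a hint about who is using the resource, not proof.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equally long series (0.0 if either is flat)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_tiers_by_match(resource_util: Sequence[float],
                        tier_workloads: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank tiers by how closely their workload trend follows the resource trend."""
    scores = [(tier, pearson(resource_util, series))
              for tier, series in tier_workloads.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up samples: disk utilization (%) and requests per minute for each tier.
disk_util = [20, 35, 80, 90, 40, 25]
tiers = {
    "java": [100, 110, 105, 95, 100, 108],
    "database": [50, 90, 200, 230, 100, 60],
}
ranking = rank_tiers_by_match(disk_util, tiers)
```

Here the database tier's workload rises and falls with disk utilization while the Java tier's does not, so the database tier comes out on top of the ranking.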
Baselines and Reference Timeframes: When a performance problem occurs, IT should be able to compare the behavior of all components across the IT stack to their behavior in a previous reference timeframe or baseline. This will help you nail down what’s changed and, as a result, understand why a new performance problem has occurred.
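A baseline comparison can be as simple as flagging metrics that have moved well outside their reference range. The following is a minimal sketch under stated assumptions: each metric has a list of samples from a reference timeframe, "changed" means more than a chosen number of standard deviations from the baseline mean, and the metric names and the default threshold of 3.0 are illustrative rather than recommended values.

```python
import statistics
from typing import Dict, List, Sequence


def deviation_from_baseline(baseline: Sequence[float], current: float) -> float:
    """Distance of the current value from the baseline mean, in standard deviations."""
    mean = statistics.mean(baseline)
    spread = statistics.pstdev(baseline)
    if spread == 0:
        # A perfectly flat baseline: any difference at all counts as a change.
        return 0.0 if current == mean else float("inf")
    return abs(current - mean) / spread


def changed_metrics(baselines: Dict[str, Sequence[float]],
                    current: Dict[str, float],
                    threshold: float = 3.0) -> List[str]:
    """Return the metrics whose current value deviates beyond the threshold."""
    return [name for name, value in current.items()
            if name in baselines
            and deviation_from_baseline(baselines[name], value) > threshold]


# Made-up reference timeframe and current readings across two tiers.
baselines = {
    "db_io_wait_ms": [5, 6, 5, 7, 6],
    "jvm_cpu_pct": [40, 42, 41, 39, 43],
}
current = {"db_io_wait_ms": 25, "jvm_cpu_pct": 42}
changed = changed_metrics(baselines, current)
```

In this example only the database I/O wait stands out against its baseline, which narrows down where to look first.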
Business Context of Performance: Integrating business context into performance metrics requires knowing, for each resource, which transactions are accessing that resource and when. Having the business context in each tier means that you can segregate performance according to the originating user calls and understand the business implications of each tier. Unfortunately, most APM tools today have a technical focus and do not connect the performance of individual tiers to the business transactions and their implications. Hence you may need to pass some context or token between tiers yourself, for example by extending the HTTP calls between two JVMs to carry the original referring business transaction.
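One way to pass such a token is an extra HTTP header that each tier copies onto its outgoing calls. The sketch below illustrates the idea in Python for brevity, though the same pattern applies between JVMs. The header name `X-Business-Transaction` and all function names are hypothetical, not part of any standard or APM product.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical header name used to carry the originating business transaction.
CONTEXT_HEADER = "X-Business-Transaction"


def inject_context(headers: Dict[str, str], transaction: str) -> Dict[str, str]:
    """Return a copy of the outgoing headers tagged with the business transaction."""
    tagged = dict(headers)
    tagged[CONTEXT_HEADER] = transaction
    return tagged


def extract_context(headers: Dict[str, str]) -> str:
    """Read the business transaction from incoming headers (names are case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == CONTEXT_HEADER.lower():
            return value
    return "unknown"


def time_by_transaction(samples: List[Tuple[Dict[str, str], float]]) -> Dict[str, float]:
    """Sum the time a tier spent on each business transaction, given (headers, ms) samples."""
    totals: Dict[str, float] = defaultdict(float)
    for headers, elapsed_ms in samples:
        totals[extract_context(headers)] += elapsed_ms
    return dict(totals)


# Made-up requests as seen by a downstream tier.
samples = [
    (inject_context({"Accept": "application/json"}, "checkout"), 120.0),
    (inject_context({}, "monthly-report"), 400.0),
    ({"x-business-transaction": "checkout"}, 30.0),
    ({"Accept": "text/html"}, 10.0),  # untagged call from a tier not yet instrumented
]
totals = time_by_transaction(samples)
```

With the token in place, each tier can report its time per business transaction instead of per technical call, and untagged traffic shows up explicitly as `unknown`.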
Beyond tools, IT culture and organization also need to change to ensure reliability and quality of service in cloud computing. The Cloud was supposed to break up the silos within IT, yet clearly those silos are still alive. It may take many years before the full transition to cloud and services-based IT forces down those walls.
What helps measurably for now is when people from those different areas (the Java, network, database and storage tiers) are able to view the same data on infrastructure performance. Easily accessible and comprehensive data helps teams work together better because it eliminates finger-pointing over who should take the blame when users start to complain.
As with most problems in IT, teamwork among highly skilled problem-solvers is still the best way to solve complex issues. Instead of shooting in the dark, it is time for IT departments to think proactively and strategically about how to resolve and manage resource contention, so that their companies can realize all the flexibility and productivity benefits of virtualization and cloud computing.
ABOUT Assaf Sagi
Assaf Sagi is Director of Product Management at Precise Software Solutions. He has more than 16 years of experience in enterprise software development and management. Prior to Precise, Assaf worked for IBM Research and for an advanced ComSec unit in the Israeli Defense Force.