Best Practices to Resolve Resource Contention in the Cloud

Preventing application slowdowns caused by resource contention requires a rigorous, methodical approach and an appropriate toolset. IT managers, working with business owners, should decide which critical applications get priority for performance in a multi-tenant environment.

Resource contention is what happens when demand exceeds supply for a shared resource, such as memory, CPU, network or storage. In modern IT, where cost cuts are the norm, addressing resource contention is a top priority. The main concern with resource contention is the performance degradation that occurs as a result.

When two or more transactions are racing for the same resource, one of them will get it and the others will have to wait in line until the resource is available, meanwhile causing user frustration. This problem is not new, considering the common scenario of two processes on the same machine competing for the same physical CPU or memory. Another typical scenario involves two database transactions fighting for I/O on the same physical disk.

Resource contention problems have always been challenging to identify and to fix. Contention issues may come and go, only to return again when performance is most critical.

Here are the three basic steps for IT managers when it comes to resolving resource contention:

- First, IT needs to determine that the performance problems are indeed resource-related.

- Next, IT needs to identify which transactions are competing for resources.

- Finally, resolving the problem typically involves prioritizing one transaction over the other.

But which should you prioritize? This is a zero-sum game and one party will have to “lose,” so tying the decision back to business priorities helps IT make an informed choice during resolution.

The Role of Virtualization in Resource Contention

With the advent of virtualization technology and cloud computing, however, resource contention is becoming harder to resolve.

First, there are new places where resource contention may occur. For example, CPU contention now comes in two forms: two processes racing for the same virtual machine CPU, and that virtual machine racing for physical CPU with other virtual machines. Another example is in storage pools, when data is competing for the fast but expensive Flash storage.

Second, environments are becoming more dynamic with virtualization and cloud technologies. As IT makes a transformation to IT-as-a-Service, new resources are constantly being provisioned and consumed. It is not uncommon to provision new VMs for hours with high workloads and then decommission these VMs when the load subsides. Mobile access and BYOD are other factors affecting the dynamic environment, since access patterns are changing and load is becoming less predictable.

Third, automation is a mixed blessing. The vendors of virtualization hardware and software are aware of the resource contention challenge and have introduced automatic algorithms to address it. These algorithms move workloads around to distribute the load more evenly, and they prioritize workloads according to the load each one generates. This approach works well only if the busiest workloads are also the most important ones. When they are not, the system may favor less important transactions at the expense of more critical ones. Another implication of automation is that IT now has less visibility into, and less control over, the environment.
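To make the difference concrete, here is a minimal sketch in Python that splits a fixed amount of capacity between two workloads, first in proportion to the load they generate and then in proportion to a business weight. The workload names, the numbers and the allocate function are all hypothetical and do not represent any vendor's actual scheduling algorithm.

```python
def allocate(capacity, demands, weights):
    """Split capacity among workloads in proportion to their weights.

    No workload receives more than it demands; capacity left over by a
    satisfied workload is redistributed among the remaining ones.
    """
    allocation = {name: 0.0 for name in demands}
    active = {n for n in demands if demands[n] > 0 and weights[n] > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_weight = sum(weights[n] for n in active)
        satisfied = set()
        spent = 0.0
        for name in active:
            share = remaining * weights[name] / total_weight
            grant = min(share, demands[name] - allocation[name])
            allocation[name] += grant
            spent += grant
            if allocation[name] >= demands[name] - 1e-9:
                satisfied.add(name)
        remaining -= spent
        if not satisfied:
            break  # everything was handed out and nobody is fully served
        active -= satisfied
    return allocation


# Hypothetical workloads: demand in arbitrary CPU units, capacity of 100.
demands = {"checkout": 60, "batch_report": 120}

# Load-driven: the busier workload gets the larger share.
by_load = allocate(100, demands, weights=demands)

# Business-driven: checkout is weighted 9 to 1 over the batch report.
by_business = allocate(100, demands, weights={"checkout": 9, "batch_report": 1})

print(by_load)
print(by_business)
```

With these made-up numbers, the load-driven split leaves the checkout workload with roughly a third of the capacity, while the business-weighted split serves it in full and gives the batch report whatever is left.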

Let’s revisit the steps for resolving resource contention, and factor in the impact of virtualization and cloud technologies:

1. Identify that the problem is related to resource contention

2. Identify the competitors

3. Prioritize the workloads according to business considerations

The first step is already problematic, since resource contention issues can manifest in any number of ways: what seems to be a large chunk of time spent in the Java tier may actually be a result of the Java VM not getting enough CPU.
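One signal that can help with this first step on a Linux guest is CPU steal time, which counts the time the virtual CPU was ready to run but the hypervisor was serving something else. The sketch below is one possible way to compute it from two samples of the aggregate cpu line in /proc/stat. The function names are illustrative, and what counts as a worrying steal percentage is left as a judgment call for each environment.

```python
def parse_cpu_line(line):
    """Return the numeric counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def steal_percent(before, after):
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    if len(deltas) < 8:
        return 0.0  # very old kernels do not report steal time
    # Counters are: user nice system idle iowait irq softirq steal ...
    # Guest time is already included in user and nice, so only the first
    # eight counters are summed to avoid counting it twice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate cpu line (Linux only)."""
    with open(path) as stat_file:
        return stat_file.readline()


# Two hypothetical samples taken a few seconds apart.
sample_before = "cpu 100 0 50 800 10 0 0 40 0 0"
sample_after = "cpu 200 0 100 1500 20 0 0 180 0 0"
print(steal_percent(sample_before, sample_after))
```

On a real guest, the two samples would come from calling read_cpu_line twice with a short pause in between. A consistently high value suggests the Java VM's host is short of physical CPU rather than the Java code itself being slow.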

The second step is even harder. Analysis of resource contention issues usually happens after the fact. By then, the culprits may have already stopped competing, started using other resources or been decommissioned altogether.

The third step is the hardest, since IT is hard-pressed to prioritize applications if it is unsure which processes, transactions or applications are competing.

Best Practices to Resolve Resource Contention in Virtual Environments

The number of possibilities for resource contention problems and ways to overcome them is substantial. Every IT organization has its own particular landscape and idiosyncrasies. Below, however, are some general guidelines which can be tailored to an organization’s unique needs.

The main considerations are the dynamic and multi-tier characteristics of resource contention. An efficient approach must include cross-tier views, the ability to baseline and compare historical data, and a way to tie resources to their business users:

Side-by-Side View of Performance Across Multiple Tiers: There are plenty of APM products and services that provide dashboards, but few of these solutions will perform complete end-to-end monitoring from the user’s end device to the storage disk, across physical and virtual infrastructure. To solve resource contention, you need to create a dashboard that collects and displays performance data curated from the various monitoring tools. This gives an indication of which resources are over-utilized and whether their over-utilization trend matches the workload trend of the tiers which access said resources. While not perfect, in a typical setting these matching trends would give you a big clue as to who’s using the resources and the resulting impact on performance.
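The trend matching described above can be approximated with a simple correlation between a resource's utilization series and the workload series of each tier that uses it. The following Python sketch uses made-up sample data and hypothetical tier names. It assumes the series have already been collected from the monitoring tools and aligned on the same time intervals.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series cannot explain a trend
    return cov / sqrt(var_x * var_y)


def rank_suspects(resource_series, tier_series):
    """Order tiers by how closely their workload tracks the resource."""
    scores = {
        tier: pearson(resource_series, series)
        for tier, series in tier_series.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: disk utilization (%) and requests per second per tier,
# sampled over the same six intervals.
disk_utilization = [20, 35, 50, 80, 95, 60]
tier_workloads = {
    "reporting_db": [10, 18, 25, 40, 48, 30],
    "web": [50, 48, 52, 49, 51, 50],
}

ranked = rank_suspects(disk_utilization, tier_workloads)
for tier, score in ranked:
    print(tier, round(score, 2))
```

A high score is only a clue, not proof: two tiers can rise together for unrelated reasons, which is why the business context discussed below still matters.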

Baselines and Reference Timeframes: When a performance problem occurs, IT should be able to compare the behavior of all components across the IT stack to their behavior in a previous reference timeframe or baseline. This will help you nail down what’s changed and, as a result, understand why a new performance problem has occurred.
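As a rough illustration of such a comparison, the sketch below flags any metric whose current value sits far from its mean in a reference timeframe. The metric names, the sample values and the three-standard-deviation threshold are assumptions chosen for the example, not recommendations.

```python
from statistics import mean, stdev


def deviations_from_baseline(baseline, current, threshold=3.0):
    """Return metrics whose current value deviates strongly from baseline.

    baseline maps a metric name to samples from the reference timeframe;
    current maps a metric name to its latest value. The result maps each
    flagged metric to its distance from the baseline mean, measured in
    baseline standard deviations.
    """
    flagged = {}
    for metric, samples in baseline.items():
        if metric not in current or len(samples) < 2:
            continue  # nothing to compare against
        mu = mean(samples)
        sigma = stdev(samples)
        value = current[metric]
        if sigma == 0:
            # A perfectly flat baseline: any change at all is notable.
            if value != mu:
                flagged[metric] = float("inf") if value > mu else float("-inf")
            continue
        z_score = (value - mu) / sigma
        if abs(z_score) >= threshold:
            flagged[metric] = z_score
    return flagged


# Hypothetical reference timeframe (say, the same hour last week).
baseline = {
    "db_io_wait_ms": [4, 5, 6, 5, 5],
    "jvm_cpu_pct": [40, 42, 38, 41, 39],
}
current = {"db_io_wait_ms": 21, "jvm_cpu_pct": 41}

flagged = deviations_from_baseline(baseline, current)
print(flagged)
```

Here only the database I/O wait stands out, which points the investigation at the storage tier rather than at the JVM.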

Business Context of Performance: Integrating business context into performance metrics requires knowing, for each resource, which transactions are accessing that resource and when. Having the business context in each tier means that you can break down performance by the originating user calls and understand the business implications of each tier. Unfortunately, most APM tools today have a technical focus and do not connect the performance of individual tiers to business transactions and their implications. Hence you may need to pass some context or token between tiers yourself, for example by extending the HTTP calls between two JVMs to carry the originating business transaction.
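One common way to pass such a token is a custom HTTP header that each tier copies from the incoming request onto its outgoing calls. The sketch below shows the idea in Python rather than Java, purely for illustration. The header name X-Business-Transaction, the URL and the function names are hypothetical, and the request is only built, not sent.

```python
import urllib.request

# Hypothetical header name; any name agreed on by all tiers would do.
CONTEXT_HEADER = "X-Business-Transaction"


def extract_context(incoming_headers):
    """Find the business transaction token, ignoring header-name case."""
    for name, value in incoming_headers.items():
        if name.lower() == CONTEXT_HEADER.lower():
            return value
    return None


def build_downstream_request(url, incoming_headers):
    """Build a request to the next tier that carries the same token."""
    request = urllib.request.Request(url)
    context = extract_context(incoming_headers)
    if context is not None:
        request.add_header(CONTEXT_HEADER, context)
    return request


# Headers as received by the middle tier from the front end (made up).
incoming = {"x-business-transaction": "checkout", "Accept": "application/json"}
outgoing = build_downstream_request("http://inventory.example/stock", incoming)
print(outgoing.header_items())
```

With the token present on every hop, each tier can tag its own timing and resource metrics with the business transaction that triggered the work.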

Beyond tools, IT culture and organization also need to change to ensure reliability and quality of service in cloud computing. The cloud was supposed to break up the silos within IT, yet clearly those silos are still alive. It may take many years before the full transition to cloud and services-based IT forces down those walls.

What helps measurably for now is giving people from those different areas (the Java, network, database and storage tiers) a view of the same infrastructure performance data. Easily accessible and comprehensive data helps teams work together better because it reduces finger-pointing over who should take the blame when users start to complain.

As with most problems in IT, teamwork with highly-skilled problem-solvers is still the best way to solve complex issues. Instead of shooting in the dark, it is time for IT departments to think proactively and strategically about how to resolve and manage resource contention, so that their companies can realize all the flexibility and productivity benefits of virtualization and cloud computing.

ABOUT Assaf Sagi

Assaf Sagi is Director of Product Management at Precise Software Solutions. He has more than 16 years of experience in enterprise software development and management. Prior to Precise, Assaf worked for IBM Research and for an advanced ComSec unit in the Israeli Defense Force.

Related Links:

www.precise.com

