
3 Steps to Avoid Service Level Disagreements

John Lucania

You ask a friend to "check" on your dog while you're away. Obligingly, your friend goes to your house, rings the doorbell to listen for a bark and then returns to their car. However, what you really wanted was for your friend to go inside for a while, make sure there were no issues and notify you immediately if something was wrong. A perfect case of a poorly negotiated SLA!

What Are SLAs and Why Do We Have Them?

A Service Level Agreement (SLA) is a contractual agreement between a service provider and a customer regarding the level of service that will be provided. SLAs benefit both parties: they define what is being purchased as well as the roles and responsibilities for remediating any issues. A well-constructed SLA strengthens the customer relationship by bridging the gap between the vendor's services and the customer's expectations. With software services, websites and applications becoming increasingly complex, negotiating and adhering to SLAs is more important than ever.

What Do SLAs Typically Cover?

It is very important to keep the SLA simple, measurable and realistic. SLAs typically cover:

■ Description of overall services

■ Service performance metrics

■ Financial aspects of service delivery

■ Responsibilities of service provider and customer

■ Disaster recovery process

■ Review process and frequency of review

■ Termination of agreement process

The specific performance metrics used to measure compliance of service delivery are called Service Level Objectives (SLOs). In the context of web services, SLOs typically cover the availability (uptime) and response time of the service, and often accessibility by geography and problem-resolution metrics such as mean time to answer and/or mean time to repair.
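Mean time to answer and mean time to repair are both simple averages over incidents. The sketch below is a minimal illustration, assuming a hypothetical `Incident` record with opened, acknowledged and resolved timestamps; real SLAs may measure repair time from detection rather than from ticket creation, so treat the start point as an assumption to pin down in the agreement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Incident:
    # Hypothetical incident record; the field names are illustrative only.
    opened: datetime
    acknowledged: datetime
    resolved: datetime


def _mean(deltas: List[timedelta]) -> timedelta:
    """Average a list of durations (refuses an empty list)."""
    if not deltas:
        raise ValueError("need at least one incident")
    return sum(deltas, timedelta()) / len(deltas)


def mean_time_to_answer(incidents: List[Incident]) -> timedelta:
    """Average time from an incident being opened to being acknowledged."""
    return _mean([i.acknowledged - i.opened for i in incidents])


def mean_time_to_repair(incidents: List[Incident]) -> timedelta:
    """Average time from an incident being opened to being resolved."""
    return _mean([i.resolved - i.opened for i in incidents])


t0 = datetime(2024, 1, 1, 9, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=45)),
    Incident(t0, t0 + timedelta(minutes=15), t0 + timedelta(minutes=75)),
]
print(mean_time_to_answer(incidents))  # 0:10:00
print(mean_time_to_repair(incidents))  # 1:00:00
```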

Is a service really available if the customer cannot use it? A well-constructed SLA should define availability in terms of the customer's critical business process, not just whether the server's URL/URI responds or the login process works.

To apply the doorbell analogy to web services: a poorly negotiated SLA rings the doorbell by checking only for a 200 OK response from the server. The 200 status code, like the dog's bark, tells you only that someone is home, not the actual condition (i.e. the health) of the service. Checking that a website loads or that you can authenticate, without validating the business process you rely on, exposes you to downtime without financial leverage.
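The difference between "ringing the doorbell" and actually checking on the service can be sketched as two checks. This is a minimal illustration, not any monitoring product's API: `fetch` is a hypothetical stand-in for an HTTP client, injected so the logic can run without a network, and the paths and expected page text are made up for the example.

```python
from typing import Callable, Iterable, Tuple

# A fetch returns (HTTP status code, response body). In a real monitor this
# would perform an HTTP request; here it is injected so the logic is testable.
Fetch = Callable[[str], Tuple[int, str]]


def doorbell_check(fetch: Fetch, url: str) -> bool:
    """Shallow check: only confirms the server answers with 200 OK."""
    status, _ = fetch(url)
    return status == 200


def business_check(fetch: Fetch, steps: Iterable[Tuple[str, str]]) -> bool:
    """Deep check: every step of the business transaction must return 200 OK
    AND contain the content that proves the step actually worked."""
    for url, expected_text in steps:
        status, body = fetch(url)
        if status != 200 or expected_text not in body:
            return False
    return True


# Hypothetical responses: the server is "home" (200 OK everywhere), but the
# reports page renders an error message instead of the report.
responses = {
    "/login": (200, "<form id='login'>"),
    "/dashboard": (200, "<h1>Campaigns</h1>"),
    "/reports": (200, "<p>Internal error: report service unavailable</p>"),
}


def fake_fetch(url: str) -> Tuple[int, str]:
    return responses[url]


steps = [("/login", "id='login'"), ("/dashboard", "Campaigns"), ("/reports", "Report total")]
print(doorbell_check(fake_fetch, "/login"))  # True
print(business_check(fake_fetch, steps))     # False
```

The shallow check passes while the business process is broken, which is exactly the gap an availability definition tied to the customer's workflow is meant to close.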

Step One: Measure What You Have

What can you, the service provider, do to get the most out of SLAs? Suppose you provide a marketing automation system to an enterprise that will run its global web activities on your system. You have promised them 95% availability and suitable performance from the US east and west coasts, the UK, Germany and India.

Before you commit to an exact performance target, you should have measured what you have now. You need to baseline the performance of your service to understand what you can offer. There is no sense promising 95% availability in India if your system is typically available only 80% of the time there. On the other hand, under-committing can lead to lost business opportunities and lost revenue. You can use your SLA as a competitive advantage, but only if you know what you can and cannot deliver. Baselining performance helps you commit neither too much nor too little, but just the right amount!
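Baselining can be as simple as computing historical success rates per region and flagging the regions that fall short of a proposed target. The sketch below assumes check history is available as lists of pass/fail results per region; the region names and numbers are hypothetical and mirror the India example above.

```python
from typing import Dict, List


def baseline_availability(results: Dict[str, List[bool]]) -> Dict[str, float]:
    """Percentage of successful checks per region."""
    return {region: 100.0 * sum(runs) / len(runs) for region, runs in results.items()}


def regions_below_target(baseline: Dict[str, float], target_pct: float) -> List[str]:
    """Regions whose measured availability is below the proposed SLA target."""
    return sorted(region for region, pct in baseline.items() if pct < target_pct)


# Hypothetical check history: True = check passed, False = check failed.
history = {
    "us-east": [True] * 99 + [False],
    "uk": [True] * 97 + [False] * 3,
    "india": [True] * 80 + [False] * 20,
}
baseline = baseline_availability(history)
print(baseline)                              # {'us-east': 99.0, 'uk': 97.0, 'india': 80.0}
print(regions_below_target(baseline, 95.0))  # ['india']
```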

A synthetic performance monitoring tool lets you baseline your services. For example, say you want to measure the performance of a user login from the UK during business hours. You can record this multi-step user transaction and use that script to create a monitor. Next, you can create an SLA for that monitor by setting the desired response-time and availability objectives. A quality synthetic tool not only checks whether the service is up and running but also measures response times and functional correctness from its global monitoring nodes, verifying SLA compliance by comparing actual performance with the SLA objectives.
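Comparing a monitor's runs against its SLA objectives boils down to two percentages. This is a minimal sketch under stated assumptions: the `Run` and `SlaObjective` types are hypothetical, not any tool's data model, and response time is judged as "the share of successful runs at or under a threshold", which is only one of several ways an SLA might phrase a performance objective.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    ok: bool            # did every step of the recorded transaction pass?
    response_ms: float  # end-to-end time for the transaction


@dataclass
class SlaObjective:
    availability_pct: float  # e.g. 95.0
    max_response_ms: float   # e.g. 3000.0
    within_time_pct: float   # share of successful runs that must meet max_response_ms


def evaluate(runs: List[Run], sla: SlaObjective) -> dict:
    """Compare actual monitor results with the SLA objectives."""
    if not runs:
        raise ValueError("no monitor runs to evaluate")
    successes = [r for r in runs if r.ok]
    availability = 100.0 * len(successes) / len(runs)
    fast = sum(1 for r in successes if r.response_ms <= sla.max_response_ms)
    within_time = 100.0 * fast / len(successes) if successes else 0.0
    return {
        "availability_pct": availability,
        "within_time_pct": within_time,
        "compliant": availability >= sla.availability_pct
        and within_time >= sla.within_time_pct,
    }


# Hypothetical UK login monitor: 20 runs, one failure, one slow success.
runs = [Run(True, 1200.0)] * 18 + [Run(True, 4200.0), Run(False, 0.0)]
sla = SlaObjective(availability_pct=95.0, max_response_ms=3000.0, within_time_pct=90.0)
result = evaluate(runs, sla)
print(result["availability_pct"])            # 95.0
print(round(result["within_time_pct"], 1))   # 94.7
print(result["compliant"])                   # True
```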

By observing your monitors in real time, as well as through the SLA summary, you get a realistic and complete picture of your performance.

Step Two: Include What Applies to Your Customer, Exclude the Rest

If your agreement states that you will provide a certain level of service for the US east coast, west coast, UK, Germany and India, don't provide data for the Netherlands or Africa. You also need to account for your own operational time: clearly describe your maintenance windows and upgrades. When building the service-level agreement, take into account operating periods as well as both ongoing and one-time events.
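Scoping the numbers to the agreement means filtering out regions that are not covered and samples taken during scheduled maintenance. The sketch below assumes each check result carries a region and a timestamp, and that maintenance windows are agreed as half-open [start, end) intervals; the `Sample` type, region names and window times are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Set, Tuple


class Sample(NamedTuple):
    region: str
    at: datetime
    ok: bool


Window = Tuple[datetime, datetime]  # [start, end) of a scheduled maintenance window


def in_window(t: datetime, windows: List[Window]) -> bool:
    """True if the timestamp falls inside any maintenance window."""
    return any(start <= t < end for start, end in windows)


def contractual_availability(samples: Iterable[Sample], contracted: Set[str],
                             windows: List[Window]) -> float:
    """Availability counting only contracted regions, outside maintenance."""
    counted = [s for s in samples
               if s.region in contracted and not in_window(s.at, windows)]
    if not counted:
        raise ValueError("no samples fall inside the contracted scope")
    return 100.0 * sum(s.ok for s in counted) / len(counted)


samples = [
    Sample("us-east", datetime(2024, 1, 1, 1, 0), True),
    Sample("us-east", datetime(2024, 1, 1, 2, 30), False),   # during maintenance
    Sample("us-east", datetime(2024, 1, 1, 3, 0), True),
    Sample("netherlands", datetime(2024, 1, 1, 3, 0), False),  # not contracted
    Sample("india", datetime(2024, 1, 1, 4, 0), True),
]
contracted = {"us-east", "us-west", "uk", "germany", "india"}
windows = [(datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 1, 3, 0))]

print(contractual_availability(samples, contracted, windows))                # 100.0
print(contractual_availability(samples, {s.region for s in samples}, []))    # 60.0
```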

Customers are getting used to the multi-tenant nature of service providers, so be open to SLA negotiations; however, calculate the cost of any customization and make sure it aligns with your overall business interest in that customer. Customers, too, are often over- or under-demanding. Baselining the customer's performance requirements leads to more realistic SLAs and a win-win situation for both parties.

Step Three: Monitor Aggressively

To set realistic availability and performance goals and keep them, you have to take enough measurements that a single failure doesn't skew the overall results.

I want to talk briefly about the law of large numbers, a principle of probability and statistics. It states that as a sample size grows, the sample mean gets closer and closer to the average of the whole population.
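A quick simulation shows the law of large numbers at work on availability checks. This sketch assumes each check independently succeeds with a fixed probability (here a hypothetical true availability of 99%); real outages are usually correlated across consecutive checks, so it illustrates the statistical principle rather than modelling a real service. Small samples scatter widely around 99%, while large samples settle close to it.

```python
import random


def observed_availability(true_availability: float, n_checks: int,
                          rng: random.Random) -> float:
    """Simulate n independent checks that each succeed with probability
    true_availability, and return the observed success rate."""
    successes = sum(rng.random() < true_availability for _ in range(n_checks))
    return successes / n_checks


rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (5, 50, 500, 50_000):
    print(n, observed_availability(0.99, n, rng))
```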

This is important context for monitoring and setting SLAs. If you run an availability test once an hour from each of 5 locations and one of those tests fails, your availability drops to 80 percent. If you instead run tests from 10 locations every 5 minutes, that is 120 tests in the hour, and if 1 fails your availability is still about 99.2%! Less aggressive monitoring leaves you vulnerable to an SLA violation from a brief outage.
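The arithmetic behind the two scenarios is easy to check. The helper names below are hypothetical; the sketch assumes the check interval divides evenly into an hour. Ten locations checking every 5 minutes gives 10 × 12 = 120 checks per hour, so a single failure costs well under one percentage point.

```python
def checks_per_hour(locations: int, interval_minutes: int) -> int:
    """Number of checks per hour across all monitoring locations."""
    if interval_minutes <= 0 or 60 % interval_minutes:
        raise ValueError("interval must divide evenly into 60 minutes")
    return locations * (60 // interval_minutes)


def availability_pct(total_checks: int, failed_checks: int) -> float:
    """Share of checks that passed, as a percentage."""
    return 100.0 * (total_checks - failed_checks) / total_checks


sparse = checks_per_hour(5, 60)  # 5 checks
dense = checks_per_hour(10, 5)   # 120 checks
print(availability_pct(sparse, 1))           # 80.0
print(round(availability_pct(dense, 1), 2))  # 99.17
```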

In conclusion, service level agreements are valuable for both you and your customers. These three steps will help you see SLAs as an opportunity rather than a restriction:

■ Make the right agreement based on baseline performance

■ Measure the correct things with the correct frequency

■ Take enough measurements to smooth out variability

John Lucania is Senior Sales Engineer at SmartBear Software.
