What is an SLA?
A service-level agreement (SLA) is a legally enforceable contract between a service provider and its customers. It documents the level of service expected from the service provider and outlines the expectations and requirements the customer has. An SLA defines the metrics by which the quality of services is measured, the penalties that may be incurred should the agreed-on service levels not be achieved, and resolutions to any SLA failures.
The penalties for a business resulting from SLA failures can be high. Downtime has a significant impact on a company’s bottom line, including higher labor costs, lost customers, lost revenue, and the possibility of paying SLA penalties, which makes it an important metric to monitor. An ITIC survey found that the hourly cost of downtime, a commonly measured SLA metric, exceeds $300,000 for 91 percent of SMEs and large enterprises.
SLAs are a crucial component for any business, but especially so for customer-service provider partnerships in the technology sphere. They can be designed to support both parties in these partnerships. However, standard SLAs provided by service providers are often generic, applied broadly to all their customers, and may sometimes be one-sided. Therefore, businesses need to employ legal advisors to ensure the inclusion of customer-specific interests in an SLA, for instance, escape clauses.
Types of SLAs
There are three main types of SLAs: corporate, customer, and service. These can be combined to create multi-level agreements.
A corporate-based SLA defines the level of service for specific business requirements, like security, across an organization. A customer-based SLA describes the services a specific customer, for instance, the finance department in an organization, may use. A service-based contract covers a single service, for instance, the use of an application, provided by a service provider to a customer or corporation. Most organizations use multiple types of SLAs to create a multi-level or hierarchical SLA structure.
Services may be provided by external providers or internally between departments in an organization. For instance, an organization’s IT department may draw up an SLA between itself and another department to provide hardware and support services.
What is an SLO?
A service-level objective (SLO) is an agreement in an SLA about a specific, quantifiable metric, like system availability or response time. An SLO sets the target for an SLA metric, for instance, that uptime shall be 99.99%.
SLAs are used externally to define an agreement between the two parties. SLOs are objectives that are measured internally to evaluate whether an SLA benchmark is being met. SLOs are tracked by measuring specific service-level indicators (SLIs).
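To make an uptime SLO concrete, it helps to convert the percentage into the amount of downtime it permits. The following is a minimal Python sketch of that arithmetic. The function name allowed_downtime and the 30-day reporting period are assumptions chosen for illustration, not terms taken from any particular SLA.

```python
from datetime import timedelta


def allowed_downtime(slo_percent: float, period: timedelta) -> timedelta:
    """Return the maximum downtime a period may contain while still meeting the SLO."""
    if not 0 <= slo_percent <= 100:
        raise ValueError("slo_percent must be between 0 and 100")
    # The share of the period that may be lost is whatever the SLO leaves over.
    return period * (1 - slo_percent / 100)


# Hypothetical example: a 99.99% uptime SLO over an assumed 30-day period.
budget = allowed_downtime(99.99, timedelta(days=30))
print(f"{budget.total_seconds() / 60:.2f} minutes")  # 4.32 minutes
```

Under these assumptions, a 99.99% target leaves only a few minutes of downtime per month, which is why the choice of reporting period matters when an SLO is negotiated.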
What is an SLI?
An SLI measures compliance with an SLO. An SLI is an internal measurement that provides actual metrics to compare with the SLA. For instance, if the level of uptime is specified as 99.99% and the actual uptime, the SLI, is 99.5%, it means the agreed level of service has not been met.
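The comparison between an SLI and its SLO can be sketched in a few lines of Python. The function names and the downtime figure below are hypothetical. They are chosen so that the measured availability works out to the 99.5% used in the example above.

```python
def availability_sli(total_seconds: float, downtime_seconds: float) -> float:
    """Measured availability as a percentage of the reporting period."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return 100 * (total_seconds - downtime_seconds) / total_seconds


def meets_slo(sli_percent: float, slo_percent: float) -> bool:
    """The SLO is met when the measured value is at least the target."""
    return sli_percent >= slo_percent


# Assumed reporting period: 30 days. Assumed downtime: 3.6 hours.
PERIOD_SECONDS = 30 * 24 * 3600
sli = availability_sli(PERIOD_SECONDS, downtime_seconds=3.6 * 3600)

print(sli)                    # 99.5
print(meets_slo(sli, 99.99))  # False
```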
SLOs on their own do not carry contractual consequences. The penalties that apply when benchmarks are not met are specified in the SLA, which is legally enforceable.
What is included in an SLA?
An SLA may include a service description, service objectives, contract terms and termination clauses, and performance targets and service-level metrics. To ensure a means of accountability, an SLA defines responsibilities and roles for both the customer and provider. Crucial to creating an enforceable SLA are procedures for issue management, dispute resolution, and security breaches. An SLA is only effective if processes for tracking, monitoring, measuring, and reporting on metrics are in place.
The three main areas of SLA metric measurement are system performance, response performance, and customer satisfaction. The first two measurements impact end-user satisfaction.
Successful SLAs are ones that achieve defined performance and response metrics and result in meeting end-users’ expectations.
The priority given to each SLA metric differs by type of service. For instance, uptime and availability would be most important for a web hosting service, while call response times and resolution rates would be most relevant for a call center.
What performance metrics are measured?
Common performance metrics include latency, packet loss, throughput, bandwidth, network error rate, network response rate, uptime, reliability, issue resolution rate, cost metrics (for instance, measured by resource utilization and operational expenses), backup and recovery success rates, incident alerts, number of compliance breaches, resource usage trends, software bugs, security breaches, and the performance of infrastructure components.
How are customer satisfaction levels measured?
An important metric for businesses to consider is customer satisfaction feedback, as this falls in line with the high-level goal of an SLA: to retain existing customers and win new ones. Aligned with this is measuring the number of “self-serve resolutions”, or situations where customers resolve their support tickets using tools and information provided on a company’s website.
Penalties and service-level credits
Penalties are specified in an SLA so that if the service provider does not meet its contractual obligations, the customer may be compensated for business disruption and loss of productivity. Compensation may take the form of financial penalties or service credits, like extended license terms or additional customer support from a provider. However, customers also need to weigh the cost of legal remediation against their financial losses; for this reason, an SLA should include processes for managing disputed issues.
To resolve disputed issues, tracked and measured metrics using an SLA reporting tool can be used as evidence of an SLA failure.
The cost of SLA penalties varies from case to case, but it can be substantial.
In 2011, Virgin Blue (now Virgin Australia) made reservations systems vendor Navitaire pay almost 15 million dollars in damages when their SLA was breached. The Navitaire outages had lasted 11 days, affecting thousands of Virgin Blue passengers.
Why are SLAs used?
Originally, the reason for creating SLAs in the IT industry was to establish expectations and accountability for service providers, particularly when outsourcing services. Today, most businesses have SLAs in place to ensure customer expectations are met, to indemnify themselves from the consequences of system failures, and to enforce accountability for poor service levels. SLAs reduce misunderstandings between service providers and customers and serve as a yardstick for measuring and improving service performance.
People have high expectations from digital businesses and demand high standards, like uptime, speed, availability, and ease of use. Businesses include SLAs in their contracts to lay out the expectations and responsibilities for themselves and their service providers so that they can provide high levels of service to their customers. These standardized agreements are used in many industries, including marketing, e-commerce, and SEO.
The goal when establishing a metric is not to wave a stick, but to motivate positive behavior on the part of the service provider. Therefore, when creating an SLA, a baseline is established with measurements defined for reasonable and attainable performance levels.
SLAs are not the same as terms and conditions (T&Cs). SLAs focus on service delivery and standards, while T&Cs focus on the terms of sale, licensing, and use.
SLAs answer questions like: How quickly will an issue be resolved? How is customers’ data protected? When will a system be available?
SLAs in network monitoring
SLAs were originally used mainly by network service providers to improve system performance and meet customer expectations in the digital space.
Metrics that need to be measured and analyzed in network monitoring include network response times, system availability, issue resolution times, customer satisfaction, and performance-related metrics. Performance- and response-related metrics include resource utilization, code error rates, throughput, application response times, data security, and mean time to respond (MTTR).
One of the main foci of network monitoring is system availability. It includes diverse factors, like how long it takes to recover a system, whether data has been compromised in a cyber attack, and whether the business has suffered reputational damage.
Measuring, tracking, and reporting on SLA metrics
SLA metrics can be difficult to measure, track, and report on. For instance, if a business promises to resolve an issue within an hour, the measurement may not take into account that the help desk lacks the information it needs to diagnose the problem, as happens when the customer who logged the problem has gone on vacation.
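One common way to handle this measurement problem is to pause the SLA clock while a ticket is waiting on the customer. The Python sketch below illustrates that idea. It is an assumption about how such a rule could be written, not a description of any specific help desk tool, and the function name sla_elapsed, the timestamps, and the one-hour target are hypothetical. It also assumes the pause intervals do not overlap each other.

```python
from datetime import datetime, timedelta


def sla_elapsed(opened: datetime, resolved: datetime,
                pauses: list[tuple[datetime, datetime]]) -> timedelta:
    """Time counted against the SLA: ticket lifetime minus paused intervals."""
    paused = timedelta()
    for start, end in pauses:
        # Clip each pause to the ticket's lifetime before counting it.
        start = max(start, opened)
        end = min(end, resolved)
        if end > start:
            paused += end - start
    return (resolved - opened) - paused


TARGET = timedelta(hours=1)  # assumed resolution target

opened = datetime(2024, 3, 4, 9, 0)
resolved = datetime(2024, 3, 4, 15, 0)
# The help desk waited on the customer from 09:20 to 14:30.
waiting = [(datetime(2024, 3, 4, 9, 20), datetime(2024, 3, 4, 14, 30))]

elapsed = sla_elapsed(opened, resolved, waiting)
print(elapsed)            # 0:50:00
print(elapsed <= TARGET)  # True
```

Without the pause, the same ticket would show six hours against a one-hour target, so whether and when the clock stops is worth stating explicitly in the SLA.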
Automated network monitoring can help, allowing service providers and IT teams to resolve issues before they become a problem for customers.
Service providers
SLA reporting ensures service providers are accountable for delivering an agreed level of service. SLA reporting also protects service providers from potential scope creep in their customers’ demands and helps resolve contractual disputes.
Developers
SLA reports help developers identify compliance breaches and system errors. They help developers pinpoint system bottlenecks and areas for improvement.
Management
SLA reports help finance departments, DevOps, and management teams to strategize, plan, and make decisions with useful information about resource allocation, capacity, and potential for scaling the business.
SLA metrics and key performance indicators (KPIs) are not the same. SLAs define contractual obligations, whereas KPIs track performance metrics for continuous improvement. SLAs can be used to identify areas where improvements in system performance may be made, but that is not their main focus.
Who uses SLAs?
Most organizations that have or use services have SLAs. While they typically exist with external vendors, it is also possible that an organization uses SLAs between departments. SLAs are implemented wherever an issue could potentially arise, however small.
They are used by numerous types of service providers, including network service providers, cloud service providers, managed service providers (MSPs), and data centers. Digital businesses, e-commerce services, and businesses offering security monitoring applications all implement SLAs.
SLAs are not one-size-fits-all agreements and must be negotiated to fit the needs and requirements of an organization.
Where are SLAs used?
Contracts between businesses and service providers usually have an associated SLA to avoid deliberate or unintentional misinterpretation.
SLAs are used where there is a need to formalize the relationship between service providers and their customers. Initially used in IT environments, they are also relevant in industries like marketing, sales, and SEO, where businesses often rely on external agencies. For example, online marketplaces employ SEO consultants to improve their search engine rankings.
Key features
SLAs cover two main areas: service and management. Service elements may include acceptable standards of services, excluded services, and cost/service tradeoffs.
Management elements may include a dispute resolution process, an indemnification clause protecting the customer from third-party litigation in the event of service-level breaches, and a method for updating the agreement. SLAs are a work in progress because provider offerings and capabilities, as well as business requirements, often change. For instance, more reliable equipment or more reliable SLA monitoring tools could make better service guarantees possible. To take strategic business and technical changes into account, SLAs should be revised regularly.
SLA benefits
Reference point for performance
SLAs enable businesses to maintain service standards, identify service gaps, and add new business requirements as a company scales up or adds new features to their systems.
SLAs ensure compliance with relevant industry regulations, set penalties for poor performance, and may offer incentives for excellent performance.
A performance matrix can help to foster customer loyalty by meeting customer expectations.
SLAs ensure business continuity by laying out processes to follow in the event of a disaster, and describe best practices.
SLAs hold service providers accountable for their commitments. For instance, an SLA may specify that a customer may reduce the provider's fee by a given percentage if an uptime of 99.99% is not achieved as evidenced by the company’s SLI.
An SLA can be used as a reference point for resolving disputed issues between businesses and their service providers, and between businesses and their customers.
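The fee-reduction clause mentioned above, where the customer may reduce the provider's fee if a 99.99% uptime is not achieved, comes down to simple arithmetic. The Python sketch below shows one possible form. The function name, the 10% credit, and the monthly fee are illustrative assumptions, since the actual percentage is whatever the parties agree on in the SLA.

```python
def service_credit(monthly_fee: float, measured_uptime: float,
                   target_uptime: float = 99.99,
                   credit_percent: float = 10.0) -> float:
    """Return the fee reduction owed for the period, or 0.0 if the target was met."""
    if measured_uptime >= target_uptime:
        return 0.0
    # Flat percentage of the fee; real SLAs often use tiers instead (not shown here).
    return monthly_fee * credit_percent / 100


# Hypothetical figures: a 5,000 monthly fee and a measured uptime (the SLI) of 99.5%.
print(service_credit(5000, measured_uptime=99.5))    # 500.0
print(service_credit(5000, measured_uptime=99.995))  # 0.0
```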
Monitoring SLAs
Monitoring SLAs helps service providers to detect potential problems early, allowing them to take proactive corrective action.
The best way to keep tabs on performance is to deploy an end-to-end network monitoring tool like Paessler PRTG. This tool allows organizations to set up and monitor business services, covering user, infrastructure, and application services. The services are built on sensors that monitor other sensors associated with devices or components. SLOs are defined for these sensors, for instance, the acceptable failure rate for a device and the priority given to its reliability. If an SLO threshold is reached, the associated SLA may be considered breached and remedial action taken.
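The idea of checking monitored components against per-sensor SLOs can be illustrated with a short Python sketch. It is a generic illustration only and does not use or represent the PRTG API. The SensorReading class, its fields, the sensor names, and the thresholds are all hypothetical, and the sketch assumes a breach means the failure rate exceeds the acceptable rate.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    """Hypothetical summary of one sensor over a reporting period."""
    name: str
    checks: int              # total checks performed in the period
    failed_checks: int       # checks that reported a failure
    max_failure_rate: float  # SLO: acceptable fraction of failed checks
    priority: int            # 1 = most important

    @property
    def failure_rate(self) -> float:
        return self.failed_checks / self.checks if self.checks else 0.0


def breached_sensors(readings: list[SensorReading]) -> list[SensorReading]:
    """Return sensors whose failure rate exceeds their SLO, most important first."""
    breached = [r for r in readings if r.failure_rate > r.max_failure_rate]
    return sorted(breached, key=lambda r: r.priority)


readings = [
    SensorReading("backup-nas", checks=1000, failed_checks=40, max_failure_rate=0.01, priority=3),
    SensorReading("core-switch", checks=10000, failed_checks=5, max_failure_rate=0.001, priority=2),
    SensorReading("web-frontend", checks=10000, failed_checks=30, max_failure_rate=0.001, priority=1),
]

for reading in breached_sensors(readings):
    print(reading.name, f"{reading.failure_rate:.2%}")
```

Sorting by priority reflects the point that not every breach is equally urgent: under these assumed values, the customer-facing service is listed before the backup device.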
Monitoring tools give employees in different roles visibility into system performance through user-friendly GUIs.
SLA limitations
External factors may disrupt a service and influence SLA performance and should be covered in an SLA. These factors include cyber attacks, weather conditions (e.g. damage to equipment from flooding or fire), unusual user behavior (e.g. high volumes during competitions or sales on an e-commerce platform), and changes in compliance regulations.
Measuring too many metrics can be counterproductive and may lead to too many SLA breaches. SLAs are meant to cover core processes, and having too many metrics may dilute the impact of each one, confusing the provider about what to prioritize. Too many metrics may also divert attention from mission-critical services to those of a lower priority. Whether a service provider can actually control the outcome of a metric must also be taken into consideration.
Where there are separate SLAs for cross-vendor processes, it may be difficult to hold vendors accountable for parts that are out of their scope.
Changes to an SLA may negatively impact support teams, making it impossible to fulfill their obligations unless they have been notified about the changes in time and been properly trained.
Standard SLAs are good at measuring IT processes and metrics, but they don’t measure end-user satisfaction.
The answer may be a combination of SLAs and experience-level agreements (XLAs) to factor in the quality of the end-user experience.
SLA reporting
Paessler PRTG SLA Reporter is a powerful tool that allows you to keep track of your SLAs. It allows organizations to select elements from the PRTG Device Tree and define what monitoring objects are used to monitor an SLA. Monitoring objects include items like sensors, probes, and devices.
Once it is set up, which takes five easy steps, PRTG SLA Reporter generates SLA reports automatically. One of the top features of PRTG SLA Reporter is that it allows IT teams to retroactively re-classify outages as planned or unplanned, and refine their SLAs as requirements change.
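Why re-classifying an outage matters can be shown with a small calculation. The Python sketch below is not how PRTG SLA Reporter works internally. It is a simplified illustration that assumes only unplanned outages count against availability, and the Outage class, the durations, and the 30-day period are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float
    planned: bool = False


def availability(period_minutes: float, outages: list[Outage]) -> float:
    """Availability in percent, counting only unplanned outages as downtime."""
    unplanned = sum(o.minutes for o in outages if not o.planned)
    return 100 * (period_minutes - unplanned) / period_minutes


PERIOD_MINUTES = 30 * 24 * 60  # assumed 30-day reporting period
outages = [Outage(minutes=216), Outage(minutes=4)]

before = availability(PERIOD_MINUTES, outages)

# The 216-minute outage turns out to have been an agreed maintenance window.
outages[0].planned = True
after = availability(PERIOD_MINUTES, outages)

print(before >= 99.99)  # False
print(after >= 99.99)   # True
```

Some agreements also subtract planned maintenance from the reporting period itself, which this sketch does not do. The SLA should state which definition applies.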
References
- https://itic-corp.com/tag/hourly-cost-of-downtime/#:~:text=ITIC's%202021%20Hourly%20Cost%20of,(SMEs)%20and%20large%20enterprises.
- https://obkio.com/blog/sla-monitoring-and-reporting/#:~:text=SLA%20Monitoring%20(or%20Service%2DLevel,the%20agreed%20upon%20SLA%20requirements.
- https://seo.ai/faq/service-level-agreement-sla
- https://www.atlassian.com/incident-management/kpis/sla-vs-slo-vs-sli
- https://www.cio.com/article/274740/outsourcing-sla-definitions-and-solutions.html
- https://www.gartner.com/smarterwithgartner/how-to-measure-customer-experience
- https://www.givainc.com/blog/index.cfm/2023/2/22/what-are-key-sla-metrics-help-measure-performance
- https://www.itweb.co.za/article/the-sneaky-sla/YKzQenqjLrZ7Zd2r
- https://www.nobl9.com/resources/an-easy-way-to-explain-slos-slas-to-biz-execs
- https://www.pagerduty.com/resources/learn/what-is-slo-sla-sli/#:~:text=SLAs%20vs.-,SLOs%3A%20What's%20the%20Difference%3F,prevent%20from%20breaking%20the%20SLA.
- https://www.smh.com.au/business/virgin-blue-navitaire-reach-agreement-20110404-1cw18.html
- https://www.solarwinds.com/service-desk/use-cases/service-level-management
- https://www.techtarget.com/searchitchannel/definition/service-level-agreement
- https://www.top.legal/en/knowledge/sla-metrics
- https://www.zenoss.com/blog/the-cost-of-downtime