
What is RTO and RPO in Disaster Recovery Planning?

Mark Lukehart

Jan. 7, 2021

What are RTO and RPO?

RTO and RPO are both critical benchmarks for a Disaster Recovery Plan. RTO (Recovery Time Objective) measures how much application downtime can be tolerated without significant harm to the business. RPO (Recovery Point Objective), by contrast, is based on how much data can be lost without causing irreparable damage to the business, which in effect dictates the acceptable time interval between backups.

Knowing your RTO and RPO indicates how prepared a business is to recover from a disaster. While it may be tempting to set low RTO and RPO targets, not all businesses, especially SMBs, can afford them. An SMB operating in the finance or banking sector cannot tolerate service unavailability or data loss, so an RTO of under an hour and an RPO of 10 minutes are reasonable targets for it. The same targets would be overkill for a small software development company. Losing a day's worth of code (a 24-hour RPO) or 6 hours of network downtime (a 6-hour RTO) will not put it out of business, but investing all its resources in an expensive disaster recovery solution could. So it is important first to understand RTO and RPO and learn how to calculate them to keep disaster recovery costs in line with actual needs.

Understanding RTO and RPO

RTO and RPO may seem similar, but they are distinct metrics of a comprehensive and effective Business Continuity Plan (BCP) or Disaster Recovery Plan (DRP). Understanding these concepts and the main difference between them matters as much to enterprises with dedicated business continuity teams as it does to SMB owners relying on cross-functional teams. Otherwise, you may end up with a strategy that fails to deliver the outcomes you need or exceeds your budget and resources.

Here's a side-by-side comparison to help you understand how RTO and RPO target different objectives within a Disaster Recovery Plan.

Recovery Time Objective (RTO)

Focuses on app/service unavailability.
Defines the acceptable downtime for apps, services, and operations.
Requires looking into the future to estimate the time it may take for the servers to be up and running after an outage.
Helps in prioritizing business-critical applications and resource allocation.

Recovery Point Objective (RPO)

Focuses on data loss.
Defines the acceptable amount of irretrievably lost data.
Requires looking back into the past to estimate how much data would be lost between the last backup and the moment of the disaster.
Helps in determining the frequency of data backups based on data priority.
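To make the forward-looking versus backward-looking distinction concrete, here is a minimal Python sketch that measures a single incident against both objectives. The `Incident` class, its field names, and the timestamps are hypothetical, and the sketch assumes the last backup is fully restorable. It reuses the example targets of the finance SMB and the small software company mentioned earlier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Timestamps for one hypothetical outage."""
    last_backup: datetime  # most recent recoverable copy of the data
    disaster: datetime     # moment the outage began
    restored: datetime     # moment service was available again


def downtime(incident: Incident) -> timedelta:
    # RTO side: look forward from the disaster to restoration.
    return incident.restored - incident.disaster


def data_loss_window(incident: Incident) -> timedelta:
    # RPO side: look back from the disaster to the last backup.
    return incident.disaster - incident.last_backup


def meets_objectives(incident: Incident, rto: timedelta, rpo: timedelta) -> dict:
    # The two checks are evaluated separately; neither depends on the other.
    return {
        "rto_met": downtime(incident) <= rto,
        "rpo_met": data_loss_window(incident) <= rpo,
    }


incident = Incident(
    last_backup=datetime(2021, 1, 7, 2, 0),
    disaster=datetime(2021, 1, 7, 5, 30),
    restored=datetime(2021, 1, 7, 7, 0),
)
print(downtime(incident))          # 1:30:00
print(data_loss_window(incident))  # 3:30:00
# Software company targets (6-hour RTO, 24-hour RPO): both met.
print(meets_objectives(incident, rto=timedelta(hours=6), rpo=timedelta(hours=24)))
# Finance SMB targets (1-hour RTO, 10-minute RPO): both missed.
print(meets_objectives(incident, rto=timedelta(hours=1), rpo=timedelta(minutes=10)))
```

The same outage can be acceptable for one business and unacceptable for another; only the targets change.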

It is important to note that RTO and RPO are independent metrics: a lower RTO does not require a lower (or higher) RPO. It is entirely possible to set a high RTO (more acceptable downtime) for less critical applications and business operations while keeping a low RPO (frequent backups), and vice versa.
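Because the two objectives are independent, it can help to record them as separate fields for each application rather than as a single "criticality" score. The sketch below uses hypothetical application names, targets, and a 1-hour cut-off purely for illustration; together they cover all four combinations of tight and relaxed targets.

```python
from datetime import timedelta
from typing import NamedTuple, Tuple


class RecoveryObjectives(NamedTuple):
    rto: timedelta  # acceptable downtime
    rpo: timedelta  # acceptable data-loss window


# Hypothetical applications; each RTO/RPO pair is chosen independently.
objectives = {
    "payments": RecoveryObjectives(rto=timedelta(minutes=30), rpo=timedelta(minutes=5)),
    "reporting": RecoveryObjectives(rto=timedelta(hours=24), rpo=timedelta(minutes=15)),
    "status_page": RecoveryObjectives(rto=timedelta(minutes=15), rpo=timedelta(hours=24)),
    "internal_wiki": RecoveryObjectives(rto=timedelta(hours=48), rpo=timedelta(hours=24)),
}


def describe(obj: RecoveryObjectives,
             threshold: timedelta = timedelta(hours=1)) -> Tuple[str, str]:
    # Label each dimension on its own against an illustrative cut-off.
    rto_label = "tight" if obj.rto <= threshold else "relaxed"
    rpo_label = "tight" if obj.rpo <= threshold else "relaxed"
    return rto_label, rpo_label


for name, obj in objectives.items():
    rto_label, rpo_label = describe(obj)
    print(f"{name}: RTO {rto_label}, RPO {rpo_label}")
```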

Is Zero RTO and RPO the Goal?

Well, you could strive to achieve zero downtime and enable continuous replication, if you lived in an ideal world with virtually endless resources at your disposal. While near-zero RTO and RPO are achievable, they should not always be the goal. The goal is to set RTO and RPO based on your business requirements, budget, and application priority.

It makes little sense to exhaust your resources on instant failover for low-priority applications when your business can easily tolerate a few hours of downtime and data loss without any significant impact. Instead, reserve those resources for the rare mission-critical applications that need high availability.

The goal is to align your recovery objectives with your business needs and available resources. A successful disaster recovery strategy strikes a balance between viable RTO and RPO targets and the resources needed to meet them.

RTO (Recovery Time Objective)

What is RTO?

Techopedia defines RTO as the maximum desired length of time allowed between an unexpected failure or disaster and the resumption of normal operations and service levels.
In simple terms, recovery time objective, or RTO, is the maximum amount of time that a particular system, network, or application can remain unavailable in the event of a disaster without causing an unbearable or significant loss to a company’s reputation or bottom line.
Measured in units of time (seconds, minutes, hours, days, or even weeks), RTO is more than just an indication of the time between service loss and restoration. It also indicates the effort and resources IT must allocate to recover each app or service after a potential outage.

Your goal should be to set RTO based on potential revenue loss and app priority. Remember, shorter RTOs require more resources. For less critical apps, RTO can be several hours, whereas mission-critical applications may require an RTO measured in seconds. For these rare applications, you can consider investing in failover services from a third-party provider. With failover in place, critical operations can shift to a standby server or network within seconds while the IT team restores the primary system.
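The relationship between a shorter RTO and more expensive recovery can be sketched as picking the cheapest approach whose recovery time still fits within the RTO. In the minimal Python sketch below, the approach descriptions and their assumed worst-case recovery times are hypothetical illustrations, not industry standards or vendor figures.

```python
from datetime import timedelta

# Hypothetical approaches, cheapest first, each with an ASSUMED
# worst-case recovery time. Real figures depend on your systems.
APPROACHES = [
    (timedelta(hours=24), "restore from backup onto replacement infrastructure"),
    (timedelta(hours=1), "standby server brought online by the IT team"),
    (timedelta(seconds=30), "automatic failover to a standby server"),
]


def recovery_approach(rto: timedelta) -> str:
    # Walk from cheapest to most expensive and return the first approach
    # whose assumed recovery time is within the RTO.
    for recovery_time, approach in APPROACHES:
        if recovery_time <= rto:
            return approach
    raise ValueError(f"no listed approach can recover within {rto}")


print(recovery_approach(timedelta(hours=48)))    # less critical app
print(recovery_approach(timedelta(hours=6)))
print(recovery_approach(timedelta(seconds=30)))  # mission-critical app
```

Tightening the RTO can only move an application toward the more expensive end of the list, which is the trade-off described above.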

Consider the example of an e-commerce store handling sales and transactions worth millions of dollars each hour during the peak holiday season. Even a few seconds of downtime can result in lost sales and a bad customer experience. For an e-commerce website like this, keeping web servers up and running at all times should be a business priority. As such, it should plan for a near-zero RTO and invest in clustering and standby servers.

If you’re leveraging the services of a cloud vendor or a managed service provider, you must include your RTO and RPO in the service level agreements (SLAs).

Calculating RTO

RTO must be calculated as part of your Business Impact Analysis (BIA). But before you calculate RTO, consider the following:

Your IT team’s capability to restore a system.
The minimum possible time required for the recovery process.
Your Service Level Agreements (SLAs) with your customers.

These will help you realistically determine an RTO that your IT team or service providers can meet. Next, go through all of your databases, applications, computer systems, and networks, and determine the potential business loss associated with each becoming inaccessible.

Based on the above factors, you can determine RTO for different applications or groups of applications.
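One simplified way to combine these factors is sketched below in Python. It assumes, purely for illustration, that losses grow linearly with downtime; real BIAs rarely behave this neatly. The function name, parameters, and dollar figures are hypothetical.

```python
from datetime import timedelta


def calculate_rto(loss_per_hour: float, tolerable_loss: float,
                  sla_max_downtime: timedelta,
                  min_recovery_time: timedelta) -> timedelta:
    """Simplified RTO estimate for one application or group of applications.

    Hypothetical model: assumes losses grow linearly with downtime.
    """
    # Downtime the business can absorb before losses become unacceptable.
    affordable = timedelta(hours=tolerable_loss / loss_per_hour)
    # Never plan for more downtime than the customer SLA allows.
    rto = min(affordable, sla_max_downtime)
    # The target must be achievable by the IT team or service provider.
    if rto < min_recovery_time:
        raise ValueError(
            f"target of {rto} is shorter than the fastest possible recovery "
            f"({min_recovery_time})"
        )
    return rto


# Hypothetical inventory app: $5,000 lost per hour, $20,000 tolerable,
# 8-hour SLA, and the IT team needs at least 2 hours to restore it.
print(calculate_rto(5_000, 20_000, timedelta(hours=8), timedelta(hours=2)))  # 4:00:00
```

When the function raises, the business has two options: invest in faster recovery to lower the minimum recovery time, or accept higher losses and revisit the target.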

RPO (Recovery Point Objective)

What is RPO?

We've already discussed RTO in detail. Business functions and applications that deal with data require an RPO in addition to an RTO. The Recovery Point Objective, or RPO, is based on how much data a business can afford to lose in the event of a significant disruption or outage. RPO is also measured in units of time and dictates the frequency of data backups. For example, an RPO of 4 hours means that all disk volumes must be backed up at least every 4 hours.
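The link between backup interval and worst-case data loss can be checked with a short Python sketch. It assumes backups run on a fixed schedule, finish instantly, and never fail, which real backups do not; the function name and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def last_backup_before(failure: datetime, first_backup: datetime,
                       interval: timedelta) -> datetime:
    """Most recent scheduled backup at or before the failure.

    Assumes backups run on a fixed interval, finish instantly, and never fail.
    """
    if failure < first_backup:
        raise ValueError("failure occurred before the first backup")
    completed = (failure - first_backup) // interval  # whole intervals elapsed
    return first_backup + completed * interval


rpo = timedelta(hours=4)
start = datetime(2021, 1, 7, 0, 0)
# Worst case: the failure strikes just before the next backup is due.
failure = datetime(2021, 1, 7, 11, 59)
backup = last_backup_before(failure, start, interval=rpo)
print(backup)                   # 2021-01-07 08:00:00
print(failure - backup)         # 3:59:00
print(failure - backup <= rpo)  # True
```

Because backups take time and occasionally fail, a schedule somewhat tighter than the RPO leaves a safety margin.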

Like RTO, RPO must be determined based on loss tolerance and data priority, and IT can then devise a data recovery strategy accordingly. For critical application data, consider a zero or near-zero RPO. To achieve this, you may have to implement a solution such as continuous data replication across geographically distributed data centers. Not only does this copy data in real time, but it also protects data in the event of a complete hardware failure at a particular site.

For example, any loss of data in a busy healthcare facility can have drastic consequences. Patient data and medical records are critical and irreplaceable. Therefore, healthcare IT infrastructure requires highly resilient, available, and secure data. A zero or near-zero RPO appears to be the only viable option for healthcare disaster recovery solutions.

Calculating RPO

Like RTO, RPO is determined as part of your Business Impact Analysis (BIA). To calculate RPO, you must consider:

The amount of data you can afford to lose
The cost of data recovery solutions and available resources
Your SLAs with your customers

It is tempting to aim for the minimum possible RPO for every business process and application. However, doing so is resource-intensive and costly.

Consider all of your business functions and applications that deal with data, and determine an acceptable threshold for data loss for each. Your IT personnel can then choose the backup frequency and put a data recovery plan in place to ensure that any loss event stays within that threshold.
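Choosing a backup frequency per application can be framed as picking the cheapest option whose worst-case data-loss window fits within the RPO. The option names, intervals, and relative costs in this Python sketch are hypothetical placeholders; substitute the schedules and prices your own backup tooling offers.

```python
from datetime import timedelta
from typing import NamedTuple, Optional


class BackupOption(NamedTuple):
    name: str
    interval: timedelta   # worst-case data-loss window for this option
    monthly_cost: float   # hypothetical relative cost


# Hypothetical catalogue; intervals and costs are illustrative assumptions.
OPTIONS = [
    BackupOption("nightly", timedelta(hours=24), 1.0),
    BackupOption("every 4 hours", timedelta(hours=4), 3.0),
    BackupOption("hourly", timedelta(hours=1), 8.0),
    BackupOption("every 10 minutes", timedelta(minutes=10), 20.0),
    BackupOption("continuous replication", timedelta(0), 100.0),
]


def cheapest_option(rpo: timedelta) -> Optional[BackupOption]:
    # Keep only options that cannot lose more data than the RPO allows,
    # then pick the least expensive of them (None if nothing qualifies).
    viable = [o for o in OPTIONS if o.interval <= rpo]
    return min(viable, key=lambda o: o.monthly_cost, default=None)


# The software company (24-hour RPO) and the finance SMB (10-minute RPO).
print(cheapest_option(timedelta(hours=24)).name)    # nightly
print(cheapest_option(timedelta(minutes=10)).name)  # every 10 minutes
```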

FAQ

What are RTO and RPO in AWS?

RTO and RPO in AWS are key metrics in the availability and disaster recovery planning of workloads and applications running in the AWS cloud. Organizations can set these parameters based on their business needs, and the metrics help determine backup and redundancy requirements for their AWS workloads.

What are RTO and RPO in SQL Server?

RTO for SQL Server is the length of time an SQL Server database can remain inaccessible without causing considerable harm to the business. RPO in SQL Server refers to the point in time to which the data can be recovered should a database disruption occur.

How do you get zero RTO and RPO?

Achieving zero RTO and RPO requires synchronous mirroring, which involves writing data to secondary storage at the same time as the primary one. In the event of an outage, business processes can shift to the secondary setup. However, attaining zero RTO and RPO is not viable in most scenarios.
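The idea behind synchronous mirroring can be illustrated with a small Python sketch in which two in-memory stores stand in for storage at two sites. `Store`, `SyncMirror`, and their methods are hypothetical names, not a real product's API, and failover is triggered manually here; approaching zero RTO would require that switch to happen automatically.

```python
from typing import Dict, Optional


class Store:
    """In-memory stand-in for a storage system at one site (hypothetical)."""

    def __init__(self, name: str):
        self.name = name
        self.data: Dict[str, str] = {}
        self.online = True

    def write(self, key: str, value: str) -> None:
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        self.data[key] = value


class SyncMirror:
    """Synchronous mirroring between a primary and a secondary store."""

    def __init__(self, primary: Store, secondary: Store):
        self.primary = primary
        self.secondary: Optional[Store] = secondary

    def write(self, key: str, value: str) -> None:
        # The write is acknowledged (this method returns) only after both
        # copies hold it, so the secondary never lags behind the primary.
        self.primary.write(key, value)
        if self.secondary is not None:
            self.secondary.write(key, value)

    def fail_over(self) -> None:
        # Promote the secondary; it already holds every acknowledged write.
        # Until a replacement mirror is attached, writes are not mirrored.
        if self.secondary is None:
            raise RuntimeError("no secondary available to promote")
        self.primary, self.secondary = self.secondary, None

    def read(self, key: str) -> str:
        return self.primary.data[key]


site_a, site_b = Store("site-a"), Store("site-b")
mirror = SyncMirror(site_a, site_b)
mirror.write("order-1001", "paid")
site_a.online = False  # simulate losing the primary site
mirror.fail_over()
print(mirror.read("order-1001"))  # paid
mirror.write("order-1002", "shipped")  # now lands only on site-b
```

Every acknowledged write survives the loss of the primary, which is what a zero RPO means. One simplification: if the secondary write fails after the primary write succeeds, the two copies diverge, so production systems need additional coordination to handle that case. Waiting for both sites on every write also adds latency, which is part of why zero RTO and RPO are costly.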