AWS Disaster Recovery and Backup

Disaster recovery (DR) covers how a workload responds when a disaster disrupts the business. The response aims to minimize data loss, bounded by the Recovery Point Objective (RPO), and downtime, bounded by the Recovery Time Objective (RTO). DR planning should account for natural disasters, technical failures, and human actions. AWS Elastic Disaster Recovery (AWS DRS) is a service that helps recover on-premises and cloud-based applications. It uses affordable storage and minimal compute, and it offers point-in-time recovery.
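
To make the two objectives concrete, here is a minimal sketch that checks a periodic backup schedule against an RPO and an RTO. The function names and the example figures are hypothetical, and the model assumes that a failure just before the next backup loses up to one full backup interval of changes.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Assumption: with periodic backups, a failure just before the next
    backup loses up to one full interval of changes."""
    return backup_interval


def meets_objectives(
    backup_interval: timedelta,
    estimated_restore_time: timedelta,
    rpo: timedelta,
    rto: timedelta,
) -> bool:
    """Return True when both the data-loss and downtime targets are met."""
    data_loss_ok = worst_case_data_loss(backup_interval) <= rpo
    downtime_ok = estimated_restore_time <= rto
    return data_loss_ok and downtime_ok


# Hypothetical targets: lose at most 1 hour of data, be back within 4 hours.
rpo = timedelta(hours=1)
rto = timedelta(hours=4)

daily = meets_objectives(timedelta(hours=24), timedelta(hours=3), rpo, rto)
hourly = meets_objectives(timedelta(hours=1), timedelta(hours=3), rpo, rto)
print(daily, hourly)
```

In this example the daily schedule fails the RPO even though the restore time fits the RTO, which shows that the two objectives have to be checked independently.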

Disaster Recovery Options in the Cloud

  • Backup and Restore: This approach replicates data to another AWS Region. In a disaster, the infrastructure, configuration, and application code must be redeployed in the recovery Region.
  • Pilot Light: This strategy keeps core infrastructure components (like databases) running in the recovery Region, while application servers are “switched off” but ready for quick deployment. This keeps costs low while providing a foundation for rapid recovery.
  • Warm Standby: This strategy runs a partial, scaled-down copy of your application in the recovery Region. This reduces recovery time but comes at a higher cost.
  • Multi-Site Active/Active: In this approach, your application runs actively in multiple Regions, providing the highest level of availability and resilience.
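
The four options above trade cost against recovery time. The sketch below picks the least expensive option that still meets an RTO target. The `typical_rto` values are illustrative assumptions chosen for this example, not published AWS figures, and `cheapest_strategy` is a hypothetical helper.

```python
from datetime import timedelta
from typing import NamedTuple


class Strategy(NamedTuple):
    name: str
    typical_rto: timedelta  # illustrative assumption, not an AWS figure


# Ordered from least to most expensive to operate.
STRATEGIES = [
    Strategy("Backup and Restore", timedelta(hours=24)),
    Strategy("Pilot Light", timedelta(hours=1)),
    Strategy("Warm Standby", timedelta(minutes=10)),
    Strategy("Multi-Site Active/Active", timedelta(0)),
]


def cheapest_strategy(rto_target: timedelta) -> Strategy:
    """Return the first (cheapest) strategy whose assumed RTO fits the target."""
    for strategy in STRATEGIES:
        if strategy.typical_rto <= rto_target:
            return strategy
    # Nothing fits: fall back to the most resilient option.
    return STRATEGIES[-1]


print(cheapest_strategy(timedelta(hours=2)).name)
```

A real decision would also weigh the RPO, budget, and operational complexity; the single-threshold lookup only illustrates the ordering of the options.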

Shared Responsibility Model

AWS is responsible for the resiliency of the underlying infrastructure (the “Resiliency of the Cloud”). Customers are responsible for configuring and managing the resiliency of their workloads on AWS (the “Resiliency in the Cloud”). The level of customer responsibility varies depending on the AWS services used. For example, with Amazon EC2 the customer manages most aspects of workload resiliency, such as deploying across Availability Zones and backing up data, while for managed services like S3 and DynamoDB, AWS handles more of the underlying infrastructure.

Key Considerations

  • Recovery Objectives (RTO and RPO): Define acceptable downtime and data loss limits based on business needs.
  • Data Plane vs. Control Plane: Rely on data plane operations during failover, because data planes generally have higher availability design goals than control planes.

Remember: Building a robust disaster recovery strategy involves a combination of these approaches and careful consideration of your specific application requirements.
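
The data plane guidance in the Key Considerations list can be sketched as a failover decision that depends only on health status. The `Endpoint` type and `pick_endpoint` function are hypothetical stand-ins, not an AWS API. The point of the sketch is that the standby already exists and traffic shifts without creating or reconfiguring any resource, which would be a control plane operation.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    healthy: bool  # stand-in for the result of a health check


def pick_endpoint(primary: Endpoint, standby: Endpoint) -> Endpoint:
    """Choose where to send traffic using health status alone.

    Nothing is provisioned or modified here: both endpoints are assumed
    to be set up in advance, so failover needs no control plane call.
    """
    if primary.healthy:
        return primary
    if standby.healthy:
        return standby
    raise RuntimeError("no healthy endpoint available")


primary = Endpoint("primary-region", healthy=False)
standby = Endpoint("recovery-region", healthy=True)
print(pick_endpoint(primary, standby).name)
```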