A recent power outage at an Amazon AWS data facility, and the resulting data loss for some customers, shows that storing data in the cloud does not remove the need for a backup.
This came to light after a tweet from author/programmer Andy Hunt went viral, reminding people that hardware failure can happen anywhere and that hosting data in the cloud does not automatically make it safe.
On August 31st, 2019, an Amazon AWS datacenter in the US-EAST-1 Region in Northern Virginia experienced a utility power failure at 4:33 AM PDT, which caused the datacenter’s backup generators to kick in. Unfortunately, these generators started failing at approximately 6:00 AM PDT, which led to 7.5% of the EC2 instances and EBS volumes in the affected Availability Zone becoming unavailable. Amazon summarized the incident in a status update:
“1:30 PM PDT At 4:33 AM PDT one of ten data centers in one of the six Availability Zones in the US-EAST-1 Region saw a failure of utility power. Our backup generators came online immediately but began failing at around 6:00 AM PDT. This impacted 7.5% of EC2 instances and EBS volumes in the Availability Zone. Power was fully restored to the impacted data center at 7:45 AM PDT. By 10:45 AM PDT, all but 1% of instances had been recovered, and by 12:30 PM PDT only 0.5% of instances remained impaired. Since the beginning of the impact, we have been working to recover the remaining instances and volumes. A small number of remaining instances and volumes are hosted on hardware which was adversely affected by the loss of power. We continue to work to recover all affected instances and volumes and will be communicating to the remaining impacted customers via the Personal Health Dashboard. For immediate recovery, we recommend replacing any remaining affected instances or volumes if possible.”
After the power was restored, Amazon determined that some EC2 instances and EBS volumes had incurred hardware damage and that the data stored on them was no longer recoverable.
Amazon Elastic Block Store is an Amazon service that allows you to create block-level storage volumes that can then be attached to Amazon EC2 virtual machine instances as storage.
After being affected by this outage, Hunt told BleepingComputer that he found the whole experience frustrating as he “kept getting nonsense from Amazon” for days as he tried to get status updates.
“Our engineers are currently investigating the affected instances such as yours, so this will take some on their end to investigate the ongoing issues will all instances affected by this incident. Feel free to message us for an update. However, since there is no ETA at the moment, please keep in mind that we won’t have any information until the engineers have done their investigation on their end, which can take awhile. Let us know if you have any further questions or concerns.”
Finally, on September 3rd, Hunt was told that his data could not be recovered.
“due to the damage from the power event, the EBS servers underlying these volumes have not recovered. After further attempts to recover these volumes, they were determined to be unrecoverable.”
For Hunt, this loss of data was not catastrophic, as he had working backups to restore from. For others who rely on the redundancy and durability that Amazon advertises for EBS, however, the loss of data could mean big problems.
Always perform backups, regardless of where data is stored
Hunt’s experience is a good lesson for anyone who hosts their data in the cloud.
No matter what features are being advertised by a service, it is always important to incorporate a secondary backup strategy for your data.
For example, Amazon EBS advertises itself as being “designed to protect against failures by replicating within the Availability Zone (AZ), offering 99.999% availability and an annual failure rate (AFR) of between 0.1%-0.2%.”
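To put the advertised annual failure rate in perspective, the short Python sketch below works through the arithmetic for a fleet of volumes. The fleet size of 1,000 is a hypothetical figure chosen for illustration, and the calculation assumes volumes fail independently of one another, which is a simplification: an event like this power failure damages many volumes at once.

```python
def expected_failures(volumes: int, afr: float) -> float:
    """Expected number of volume failures per year at a given AFR."""
    return volumes * afr


def prob_at_least_one_failure(volumes: int, afr: float) -> float:
    """Chance that at least one volume fails in a year.

    Assumes failures are independent, which real outages often violate.
    """
    return 1 - (1 - afr) ** volumes


# Hypothetical fleet size, used only to illustrate the arithmetic.
FLEET_SIZE = 1000

# The advertised AFR range of 0.1% to 0.2%, written as fractions.
for afr in (0.001, 0.002):
    print(
        f"AFR {afr:.1%}: about {expected_failures(FLEET_SIZE, afr):.0f} "
        f"expected failures per year, "
        f"{prob_at_least_one_failure(FLEET_SIZE, afr):.0%} chance of at least one"
    )
```

Under these assumptions, even a low failure rate makes it more likely than not that a large fleet loses at least one volume in a given year.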
Even with these advertised features, Amazon protects itself by specifically stating that it will only issue credits for loss of service availability and that it is not responsible for data loss.
“As part of using Amazon EC2, you agree that your Amazon EC2 resources may be terminated or replaced due to failure, retirement or other AWS requirement(s). We have no liability whatsoever for any damages, liabilities, losses (including any corruption, deletion, or destruction or loss of data, applications or profits), or any other consequences resulting from the foregoing.”
Amazon is not alone. For example, Dropbox states that it offers “120 days of file recovery” for all its plans, including the free one. To most users, this would suggest that they do not need to worry about accidental deletions or hardware damage, as the data is being backed up.
Even with this feature in place, Dropbox states that it too is not responsible for loss of data.
At most, users who experience data loss will receive a couple of months of service credit, which may be worth far less than the data they lost.
The reality is that hardware failure happens no matter how well designed a service or facility is, and it is important to be prepared for any eventuality.
Even after the experience he went through, Hunt admits: “Now, in Amazon’s defense, we’ve hosted this app and data here for many years without incident.”
So be smart and invest in a secondary backup provider for any mission-critical data. To add true redundancy, this backup should be hosted at a completely different provider that does not share any facilities with your primary data hosting provider.
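As a rough illustration of the advice above, the following Python sketch copies a file to a second location and confirms by checksum that the copy matches the original. It is a minimal sketch, not a complete backup tool: the function names are hypothetical, and a local directory stands in for the secondary provider, whose own upload tool or API would replace the copy step in practice.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and verify the copy by checksum.

    backup_dir is a stand-in for storage at a second provider
    (an assumption made for this example).
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copies contents and file metadata
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"Backup of {source} failed checksum verification")
    return target
```

Calling `backup_and_verify(Path("data.db"), Path("/mnt/offsite"))`, with both paths being placeholders, would return the path of the verified copy or raise an error if the copy does not match. Verifying the copy matters because a backup that cannot be restored offers no protection.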