Incident Report: 25/01/2024 RDS instance became inaccessible due to "Storage-Full" status


The following is the incident report for the RDS instance that became inaccessible due to "Storage-Full" status on 25 Jan 2024. We understand this issue has impacted our valued customers and their employees, and we apologise to everyone who was affected.

 

We understand this has been a frustrating and tiring ordeal for you. The issue has now been rectified and everything is working normally.

 

Incident Details
Parent Incident: #INC_DB_00002
# of Child Incidents: 0
Date(s): 25/01/2024
Start Time: 11:30 AM IST
End Time: 5:00 PM IST

Description of Incident:

One of our newly set up RDS instances became inaccessible on 25th Jan 2024 at 11:30 AM IST. We noticed this immediately because we were migrating some accounts from MySQL 5.7 to this newly set up MySQL 8 RDS instance at the time.

Root Cause:

We were migrating some accounts from the MySQL 5.7 RDS instance to the MySQL 8 RDS instance. During this activity the newly set up MySQL 8 RDS instance ran out of storage space. Because storage autoscaling was enabled on this instance, it rebooted and tried to increase the storage space, and its status changed to "Storage Full". After that we were unable to modify or reboot the instance, and it went into an inaccessible state.

Resolution & Recovery:

We tried to increase the storage space manually, but we were unable to reboot or modify the instance. It was in a cooling-off state and showed a message saying that the instance could be modified only after 6 hours.

AWS has a policy that allocated storage for a DB instance cannot be modified if it has been modified in the last six hours (ref. https://repost.aws/knowledge-center/rds-autoscaling-low-free-storage). So we had to wait until that period ended. After 6 hours, the instance rebooted automatically and its status changed to "storage-optimization".

Within a further half an hour, the RDS instance was up and running successfully and its status changed to "Available" again.
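
The status changes described above can be observed programmatically. The following is a minimal sketch, not the tooling used during this incident: it assumes the boto3 library and valid AWS credentials, and the function names and the region shown in the usage comment are illustrative assumptions. It lists every DB instance whose status is anything other than "available".

```python
from typing import Dict, Iterable, List

# "available" is the only status treated as healthy in this sketch.
HEALTHY_STATUS = "available"


def unhealthy_instances(instances: Iterable[Dict]) -> List[str]:
    """Return 'identifier: status' for every instance that is not available."""
    return [
        f"{db['DBInstanceIdentifier']}: {db['DBInstanceStatus']}"
        for db in instances
        if db["DBInstanceStatus"] != HEALTHY_STATUS
    ]


def fetch_instances(region_name: str) -> List[Dict]:
    """List all RDS instances in one region (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the pure helper above works without it

    rds = boto3.client("rds", region_name=region_name)
    instances: List[Dict] = []
    # describe_db_instances is paginated, so walk every page.
    for page in rds.get_paginator("describe_db_instances").paginate():
        instances.extend(page["DBInstances"])
    return instances


# Example usage (the region is an assumption, not taken from this report):
#   for line in unhealthy_instances(fetch_instances("ap-south-1")):
#       print(line)
```

An instance in the state described in this report would be listed with a status such as "storage-full" or "storage-optimization" until it returns to "available".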

Data Loss:

There was no data loss on RDS. After the 6-hour cooling-off period ended, the RDS instance came back up and ran successfully.

Actions and Recommendations:

Though storage autoscaling was already enabled on the RDS instance, we have now taken the following preventative measures to avoid this in future.

1. We shall manually check the status of all DB instances daily.

2. We have additionally enabled an Amazon CloudWatch alarm that monitors free storage space and notifies us when it reaches a threshold value, so that our team can proactively increase the space manually if needed.
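
For readers who want to set up a similar alarm, the following is a minimal sketch of how a free-storage alarm could be defined with boto3. It is not our exact configuration: the alarm name, the 20% threshold, the five-minute period, the instance identifier and the SNS topic ARN are all illustrative assumptions. The `FreeStorageSpace` metric in the `AWS/RDS` namespace is reported in bytes, so the threshold is converted from a percentage of allocated storage.

```python
from typing import Dict

GIB = 1024 ** 3  # FreeStorageSpace is reported in bytes


def free_storage_alarm_params(
    db_instance_id: str,
    allocated_gib: int,
    threshold_pct: int,
    sns_topic_arn: str,
) -> Dict:
    """Build put_metric_alarm arguments for a low-free-storage alarm."""
    threshold_bytes = float(allocated_gib * GIB * threshold_pct // 100)
    return {
        "AlarmName": f"{db_instance_id}-low-free-storage",  # hypothetical name
        "Namespace": "AWS/RDS",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": 300,  # seconds; an assumed evaluation window
        "EvaluationPeriods": 1,
        "Threshold": threshold_bytes,
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that notifies the team
    }


def create_alarm(params: Dict) -> None:
    """Create or update the alarm (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the helper above works without it

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Example usage (all values are placeholders):
#   create_alarm(free_storage_alarm_params(
#       "example-db", 100, 20,
#       "arn:aws:sns:ap-south-1:123456789012:example-topic"))
```

Choosing a threshold well above zero gives the team time to increase storage manually before autoscaling or a storage-full condition is triggered.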

uKnowva HRMS is committed to continually and quickly improving our technology and operational processes to prevent such incidents. We appreciate your patience and again apologise for the impact to you, your users, and your organization. We thank you for your business and continued support.

In case you face any problems, please write to our awesome support team, who will surely help you!