Tonight our database suffered data corruption and could not be repaired.
We attempted several different methods to salvage the data, so that nobody would lose the hard work they put into their posts. Unfortunately, none of these methods brought the database back to a stable state.
We then reviewed our options, which included two database backups: one taken ~4 hours before the corruption, and one taken ~10 hours before. We chose the more recent one, a text-based SQL backup. Besides being newer, it would restore the database "from the ground up", with no corrupted vestiges lurking in the background to cause another issue down the line, so we (slowly) restored everything from this SQL backup.
The good news is that the backup appears to have worked perfectly; the bad news is that we have lost ~4 hours of posts. We have since upgraded our database server software to the latest patch level, which will hopefully eliminate whatever caused the occasional database hangs we've experienced over the past few months, as well as tonight's corruption. It's also possible that the binary database files themselves contained issues created long ago that came to a head tonight, in which case restoring from SQL lets us start with a "clean slate". Hopefully this approach will lead to improved stability in the future.
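The idea behind restoring from a text SQL dump, rather than copying binary database files, can be sketched with Python's built-in sqlite3 module. This is only a stand-in for illustration; the forum's actual database engine and backup tooling are not named in this thread:

```python
import sqlite3

# Build a small "live" database standing in for the forum DB.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
live.executemany("INSERT INTO posts (body) VALUES (?)",
                 [("first post",), ("second post",)])
live.commit()

# A logical (text SQL) backup is a series of SQL statements,
# not a copy of binary pages on disk.
dump = "\n".join(live.iterdump())

# Restoring replays those statements into a brand-new, empty
# database, so no corrupted binary pages can carry over.
restored = sqlite3.connect(":memory:")
restored.executescript(dump)

rows = restored.execute("SELECT body FROM posts ORDER BY id").fetchall()
print(rows)  # the restored copy holds the same rows as the original
```

The trade-off matches the one described above: replaying SQL statements is slower than copying binary files, but the rebuilt database cannot inherit low-level corruption from the old one.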
I will use this opportunity to perform a review and catch up on a few other upgrades that have been lagging for various reasons.
This post will be updated as more news arrives and as I watch to make sure things "settle in" ok.
It was great to see the first of our several layers of backups work successfully in this emergency. However, I extend my apologies to those affected who lost their posts, and I will do everything I can to ensure this doesn't happen again.
Thanks Jason, four hours is not too bad really (24 hours is not uncommon in situations like this). I suspect it was not a lot of posts. I've sent you a couple of screenshots showing the front page from just after you got it back online the first time. There were only 32 posts in the three hours before that.
Tony.
Except that I lost my best post ever! Really! 😡
Just pulling your leg. Well done Jason, you must have shed some sweat.
Jan
Below is an almost complete list of what was lost, minus a few posts made between Jason posting the original outage message and the site going down again (which I didn't screenshot). I've checked: the Led drivers post in the second attachment is still there, but the one above it is not.
Tony.
Attachments
Thanks for the kind words. I'm glad that, as Tony pointed out, it happened in the wee hours of Friday night / Saturday morning for the US/EU, so posting was at a relative lull.
- Forum Problems & Feedback
- 2017-07-01: Server Downtime / Database Restore / Lost Posts