Tonight our database suffered data corruption and could not be repaired.
We attempted several different methods to salvage the data, so that nobody would lose the hard work they put into their posts. Unfortunately, none of these methods brought the database back to a stable state.
We then reviewed our options, which included two database backups: one taken ~4 hours before the corruption, and one taken ~10 hours before. We chose the more recent one, a text-based SQL backup. Besides being newer, it would restore the database "from the ground up", with no corrupted vestiges lurking in the background to cause another issue down the line, so we (slowly) restored everything from this SQL backup.
The good news is that the backup appears to have worked perfectly; the bad news is that we have lost ~4 hours of posts. We have since upgraded our database server software to the latest patch level, which will hopefully eliminate whatever caused the occasional database hangs we've experienced over the past few months, as well as tonight's corruption. It's also possible that the binary database files themselves contained issues created long ago that came to a head tonight, in which case restoring from SQL lets us start with a "clean slate". Hopefully this approach will lead to improved stability in the future.
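The idea behind restoring from a text SQL dump, rather than copying binary database files, can be sketched with Python's built-in sqlite3 module. This is only a stand-in for illustration; the forum's actual database engine and backup tooling are not named in this thread:

```python
import sqlite3

# Build a small "live" database standing in for the forum DB.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
live.executemany("INSERT INTO posts (body) VALUES (?)",
                 [("first post",), ("second post",)])
live.commit()

# A logical (text SQL) backup is a series of SQL statements,
# not a copy of binary pages on disk.
dump = "\n".join(live.iterdump())

# Restoring replays those statements into a brand-new, empty
# database, so no corrupted binary pages can carry over.
restored = sqlite3.connect(":memory:")
restored.executescript(dump)

rows = restored.execute("SELECT body FROM posts ORDER BY id").fetchall()
print(rows)  # the restored copy holds the same rows as the original
```

The trade-off matches the one described above: replaying SQL statements is slower than copying binary files, but the rebuilt database cannot inherit low-level corruption from the old one.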
I will use this opportunity to perform a review and catch up on a few other upgrades that have been lagging for various reasons.
This post will be updated as more news arrives and as I watch to make sure things "settle in" ok.
It was great to see the first of our several layers of backups work successfully in this emergency. However, I extend my apologies to those affected who lost their posts, and I will do everything I can to ensure this doesn't happen again.
Thanks Jason, four hours is not too bad really (24 hours is not uncommon in situations like this). I suspect it was not a lot of posts. I've sent you a couple of screenshots showing the front page from just after you got it back online the first time. There were only 32 posts in the three hours before that.
Tony.
Except that I lost my best post ever! Really! 😡
Just pulling your leg. Well done Jason, you must have shed some sweat.
Jan
Below is an almost complete list of what was lost, minus a few posts made between Jason posting the original outage message and the site going down again (which I didn't screenshot). I've checked: the Led drivers post in the second attachment is still there, but the one above it is not.
Tony.
Attachments
Thanks for the kind words. I'm glad that, as Tony pointed out, it happened in the wee hours of Friday night / Saturday morning for the US/EU, so posting was at a relative lull.
- Forum Problems & Feedback
- 2017-07-01: Server Downtime / Database Restore / Lost Posts