Inventive Dingo forums
Author Topic: Downtime log  (Read 24831 times)
Chris
Administrator
*****
Posts: 410


Developer


View Profile WWW
« on: February 21, 2009, 05:10:47 am »

In the spirit of being as open as possible, here is a log of all the server downtime we've had that I'm aware of. Most people don't publish this stuff (in the interests of not making themselves look bad), so you're often left wondering why you couldn't access the server at X time. I prefer not to be left in the dark, myself, and perhaps some of you feel the same way. Hence, this official downtime thread.

Also, hopefully it's entertaining for you guys to read about my battles with tech gremlins. Embrace the schadenfreude! ;D

-------
Original post:

If anyone was getting a 500 Internal Server Error when trying to access the website or the Internet games list during the past hour or so, it's fixed now. Dynamic (run-time) linking spontaneously decided it didn't want to play ball, causing complete failure of most of the programs on the server. A server reboot magically cured everything.

We (ato, who is a genius at Linux stuff, and I) have no idea why this happened. :-[ Our best theory is some kind of memory corruption bug in Xen, which is normally really stable. Let's hope there's no repeat performance.

That's Xen as in the server virtualisation software, not Xen as in the alien planet from Half-Life. The downtime was not caused by a resonance cascade. As far as we know.



* our_server_room_does_not_look_like_this.jpg (37.9 KB, 620x465 - viewed 2644 times.)
« Last Edit: October 04, 2009, 11:27:09 am by Chris »
Chris
« Reply #1 on: March 06, 2009, 12:22:11 am »

Server was partially down for the past 15 minutes or so, after the server spontaneously decided to remount its file system read-only. We're looking into it. At first glance it appears to be a Xen bug with a known workaround, so this should be preventable in future.
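For anyone who wants to spot a surprise read-only remount on their own Linux box, here is a minimal Python sketch. It assumes the mount table is in the usual /proc/mounts layout (device, mount point, filesystem type, options, ...). The function name and the sample data are made up for illustration; this is not what we actually ran on the server.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    Expects text in the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


# Hypothetical sample data, not taken from the real server.
SAMPLE = """\
/dev/xvda / ext3 ro,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec 0 0
/dev/xvdb /home ext3 rw,relatime 0 0
"""

if __name__ == "__main__":
    print(read_only_mounts(SAMPLE))
```

On a real machine you would pass in the contents of /proc/mounts instead of SAMPLE and raise an alert if the root file system shows up in the result.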
Chris
« Reply #2 on: March 07, 2009, 12:26:10 am »

And again. This is getting tedious! I've put in a support ticket and expect to have the issue resolved soon.

Update: Resolved, for good. Linode customer support rocks.
« Last Edit: July 01, 2009, 02:56:12 am by Chris »
Chris
« Reply #3 on: March 07, 2009, 04:59:28 am »

Server was down for maintenance for 47 minutes, about an hour ago, to fix the read-only-mount problem. Hopefully the above issue is fixed now.
« Last Edit: July 01, 2009, 02:56:34 am by Chris »
Chris
« Reply #4 on: July 01, 2009, 02:53:53 am »

The forums were broken recently thanks to an SMF bug. Fixed now, obviously.

The bug is that the forum settings file (which contains important information like database passwords) can sometimes be spontaneously wiped blank if a database error occurs. This has been a known bug in SMF since at least 2007. The SMF team calls it a PHP bug and claims that there's no foolproof way to work around it. While this is broadly true, in my opinion they really shouldn't be writing database error codes to Settings.php, which is the usual cause of this problem. (The other possible cause is two people changing admin settings simultaneously.)

I've restored the settings file from a backup, and set it read-only to prevent this from happening again.
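If you want to do the same on your own forum, here is a minimal Python sketch of the "set it read-only" step. It simply clears the write permission bits on a file. The function names are hypothetical, and the demo uses a throwaway temp file rather than a real Settings.php.

```python
import os
import stat
import tempfile

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(path):
    """Clear every write bit on path, leaving the other mode bits alone."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)


def is_writable_by_mode(path):
    """True if any write bit is still set in the file's mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & WRITE_BITS)


if __name__ == "__main__":
    # Demonstrate on a temporary file, not on a live settings file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"<?php /* placeholder settings */ ?>")
    make_read_only(tmp.name)
    print(is_writable_by_mode(tmp.name))
    os.chmod(tmp.name, stat.S_IRUSR | stat.S_IWUSR)  # restore so we can clean up
    os.remove(tmp.name)
```

Two caveats: a process running as root can still write to the file regardless of its mode bits, and anything that legitimately needs to write to the file will fail until write permission is put back.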
Chris
« Reply #5 on: July 02, 2009, 02:33:09 am »

OK, so I guess I kind of asked for that - about 12 hours after my last post, the server spontaneously went totally haywire, for reasons I have yet to discover. :-\ As a result, it wasn't accepting connections for some time. Weird.
Chris
« Reply #6 on: July 14, 2009, 07:34:02 am »

Forums and news and a few other things were broken as of a few hours ago, thanks to the server running out of disk space. Resolved for now.
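Running out of disk space is easy to catch early with a periodic check. Here is a minimal Python sketch using the standard library's shutil.disk_usage. The 90% threshold and the function names are assumptions for illustration, not something we actually have set up.

```python
import shutil


def percent_used(total, used):
    """Percentage of the disk in use, given byte counts."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def disk_is_nearly_full(path="/", threshold=90.0):
    """True if the filesystem holding path is at or above threshold % used."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return percent_used(usage.total, usage.used) >= threshold


if __name__ == "__main__":
    print(disk_is_nearly_full("/"))
```

Run from a scheduled job, a check like this could send a warning well before the forums start falling over.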
Chris
« Reply #7 on: September 01, 2009, 04:54:47 am »

Site was down for about 60 hours just now. Apparently some random internet loser decided to prove their supreme lack of machismo, skill, ethics, brains, etc. by launching an unprovoked DDoS (Distributed Denial of Service) attack against my server. Hope it was good for you too, man.

If it happens again, I'm going to track him down to his (parents') house and then let the dingo loose. ;D

Sorry for any inconvenience, all!
Kumlekar
Eats planets for breakfast
****
Posts: 140



View Profile
« Reply #8 on: September 01, 2009, 08:23:02 pm »

I hope you don't have to get inventive with him...

What is Six Times Nine
Forty-Two!

Jp may have played mayhem before it was cool, but I play while it's cool! *

* "Cool" is defined as the period of time in which Kumlekar plays a game.
Chris
« Reply #9 on: September 02, 2009, 03:57:41 am »

:D

Mine is an inventive laugh.
« Last Edit: September 02, 2009, 04:00:57 am by Chris »
Chris
« Reply #10 on: October 04, 2009, 11:18:42 am »

In case anyone noticed the series of short outages we've been having - we had a few more incidents of the server getting hacked and hijacked by nefarious malware, as the server was behind on its security updates (mea culpa!). Probably unrelated to the earlier DDoS, just random botnet activity. The first outbreak actually managed to suck over 200GB in bandwidth before I noticed. Yikes. Not quite sure what it was doing, but my main theories are participating in an outbound DDoS (retrospective karma, anyone?) or trying to break into more boxes.

ato has kindly rebuilt the server from scratch, copied over all the data from the old one, and instituted some new security measures to prevent a recurrence. Everything seems to be working and we'll be keeping our eyes peeled... actually, eww. Let me rephrase that: We'll be keeping our eyes open for breakages and more infestations, and do let me know if you spot anything broken and I'll jump on it ASAP.

Some of the backups we used to rebuild the server were slightly out of date, but I don't think we lost much. Except that eztrezet will have to re-register on the forums. Sorry eztrezet! Nothing personal! :-[
Chris
« Reply #11 on: November 29, 2009, 05:06:39 am »

Found our first broken something: Somewhere in the ongoing process of hardening the server against attack, the program that sends email was being blocked from running. Oops! This affected both the forums and the update downloading functionality. I've adjusted some security policies and it appears to be working now.
Chris
« Reply #12 on: January 07, 2010, 09:58:51 am »

Just upgraded the database server, causing a few minutes of downtime. Seems like it all went smoothly apart from that.

Oh, and happy new year everyone. ;D