Black Wednesday


05 Mar 2009 09:21

There are weeks when nothing exciting (or fatal) happens. But there are days when so much happens that you start to wonder whether it is really a coincidence. Yesterday was one of those days, and it was not even Friday the 13th. So here is what happened:

  • I stayed at home; Lukasz was at the office. All of a sudden the network went down. Later we learned that half of our city (the part that gets its internet from TP S.A.) had problems.
  • Later Piotr called to say that our office server was down and could not boot up. One of the disks in the RAID 10 array had failed, and for some reason GRUB could not boot. It booted later after Piotr did some magic; now we just need to replace one drive ASAP.
  • At 15:30 local time I got an alert email that Wikidot.com was down. I immediately tried to log in to the server: nothing. Ping: yes, alive. But all services were down.
  • After a few minutes we knew we had to act. Piotr started reassigning the web server's IP addresses to a backup server. Failed. It looks like the router could not handle this in real time this time.
  • Main server restart: nothing helped. We had a similar issue some time ago, so we started the rescue mode (the server boots from a rescue Linux image; this is largely automated by SoftLayer). The server came up. A year ago, what prevented the system from booting was a forced fsck on one of the drives, which required a key press or so (as told by the SoftLayer support team). So we started disk checks. And this took almost an hour! S#*t!
  • Meanwhile a friend called me because his car had broken down just 20 meters from our parking lot and he could not move it, so I went to help him.
  • The server came back up and everything was back to normal. Situation under control.

I am not afraid of fatal Fridays any more. I fear Wednesdays.

