2012-05-24 UPS cascade

From Hypertwins Community

Cross-posted from Google+

BizA called this morning to say that there was a beeping noise coming from the computer closet.

It was easy enough to guess that this was probably an expiring UPS battery, which in fact turned out to be the case. Unfortunately there was no way to turn off the alarm without turning off the UPS, so I had to power down both servers in order to get it out of the circuit and plug everything directly into AC power while I got new batteries.

I swear I checked to make sure everything was okay before I left. I remember pulling up a web page from the server, with no problem... but apparently about 5 minutes after I left, Something happened. The first I knew of this was about an hour later, when the office manager called to say that she couldn't get email via Outlook. (Not "nobody can get email", although apparently nobody else could either.)

Since it was Outlook, the problem could have been either with the email server (Linux) or the domain server (Windows) where all the .pst files are kept (Outlook has to have .pst files even when you're using IMAP and not keeping any of the data locally), so I told her to reboot and try again. She did that and called back to say there was no change, so I headed over there.

Unfortunately, this place is about a 15-25 minute drive from where I live, and I had to go back out in about an hour... so I had maybe 10 minutes to diagnose the problem and fix it before I'd have to be gone for another 2 hours.

Somewhere near the 10 minute mark, after I'd been fiddling around with Outlook trying to confirm whether or not it was connecting to the mail server (it's rarely very clear about exactly what it's doing or what happened when it tried), it was casually mentioned that someone else in the office (who was using Thunderbird) wasn't able to get email either...

...which led me immediately back to the server, where I confirmed that it wasn't seeing the network at all.
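The "is it Outlook or is it the server?" question can be answered without involving Outlook at all, by checking from any workstation whether the mail server accepts TCP connections. What follows is only a minimal sketch, not what was actually run that day. It assumes the server answers on the standard IMAP and SMTP ports (143 and 25), and the function names `port_open` and `check_mail_server` are made up for illustration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks,
        # and hostnames that fail to resolve.
        return False


def check_mail_server(host, ports=(143, 25)):
    """Map each port to whether it accepted a TCP connection.

    The default ports are an assumption (plain IMAP and SMTP); a server
    using IMAPS or submission would need 993 or 587 instead.
    """
    return {port: port_open(host, port) for port in ports}
```

Calling something like `check_mail_server("mail.example.lan")` (a placeholder hostname) from two different desks would show fairly quickly whether the trouble is one mail client or the whole server: if every port comes back `False` from everywhere, the client is probably not the culprit.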

At that point I had about minus 3 minutes left, so after rebooting the server and trying a couple of other obvious things like switching the cable to a different port, I pulled a network card out of a disused computer and put it in the server... which didn't work.

By that time, apparently I had also done something which made the internet stop working too.

And then I had to go, leaving them with no email or internet for at least an hour.

When I got back, liveCD in hand, I was able to confirm that it wasn't an OS issue (I had allowed a minor update to go through while I was checking things earlier), so then I spent about an hour trying various settings (including deactivating the onboard LAN card via the BIOS, in case it was conflicting somehow) and moving cables around and just trying all kinds of things to figure out why neither network card (the onboard or the borrowed one) could connect.

(Early on during that hour, I got the internet to work again... I don't quite remember how, but it had something to do with power-cycling the cable router. And then it went out again, though fortunately nobody was in the office at that point. As far as I can tell, the power plug came out of the back of the network switch -- it's in a precarious stack of small network devices which sit on top of the Linux server because there isn't anywhere else to put them. [Cue side-rant about small electronic devices with no mounting holes...])
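When neither network card will connect, one thing worth separating out is "the kernel doesn't see the card" from "the card is there but has no link". The following is a rough sketch of one way to look at that from a Linux liveCD, assuming it exposes the usual `/sys/class/net` tree. The helper name `link_states` is hypothetical, and this is not a record of the commands actually used.

```python
from pathlib import Path


def link_states(sys_net="/sys/class/net"):
    """Return {interface_name: operstate} for every entry under sys_net.

    Entries whose operstate cannot be read are reported as "unknown".
    """
    states = {}
    for iface in sorted(Path(sys_net).iterdir()):
        try:
            states[iface.name] = (iface / "operstate").read_text().strip()
        except OSError:
            states[iface.name] = "unknown"
    return states
```

A card that never shows up in the result at all points toward detection (slot, driver, BIOS setting), while one that shows up as `down` points more toward the cable, the switch port, or the interface simply not having been brought up. Neither reading is conclusive on its own, but it narrows down which things are worth swapping.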

In desperation, I went out to Intrex to get a new card, in case there was something about the borrowed card that wasn't compatible with the (not new, but still much newer than the desktop I had pulled it from) server.

On a hunch, I got the more expensive PCIe card ($18) instead of the low-end PCI ($8) card. I wasn't even sure the server had a PCIe slot, but I knew it had a slot that was about the right size, so it was probably PCIe... and I wanted something that was as different as possible from all the other things that were failing to work properly.

I honestly didn't expect that to work, but it did. I was all prepared to go home and steal our Minecraft server to use temporarily for their email server until I could replace their system board. (This is, of course, the real reason we have a Minecraft server.) But no, that fixed it. Just like that.