The Server North


You are currently viewing archive for September 2009
That was quick. Looks like the 'blog is up and running already. Yay.

Posted by: Myke |

Webmail is now properly encrypted. Again, with no good excuse for taking so long, our webmail access is now protected by a legitimate SSL certificate.

We're working on getting a proper cert on the IMAP/POP/SMTP side too, but that's a lot of work, because the server uses Cryptlib instead of OpenSSL, and pemtrans isn't working. Also, the stuck-mail-sending problem has yet to be properly resolved, though it's been worked around to some degree: it's happening much less frequently now.

Posted by: Myke |

E-Mail glitches
For the past few days, our mail server has been acting up. Mail isn't being lost, but it's getting stuck.
We've contacted the vendor of the software, and they've been logged in trying to catch the problem. (Benefits of paying for support on an open-source product.)

We're aware, we're sorry, and we're hoping it'll be better very soon.

Posted by: Myke |

2 Hour Outage this evening
We were partially offline for about 2 hours because a power cord in one of our two core switches (the old one at that!) became dislodged this evening.
Because we have two of all our critical gear (2 core routers, 2 ADSL routers, 2 switches, 2.5 power-rails, 2+ Uplink/Transits), we weren't completely dead in the water. Unfortunately, all of our ADSL customers were offline, as well as many of our hosted and some VPS clients. This is because all those machines/services connect solely to the Gigabit switch.

How do we fix or prevent this problem? Well, in some ways we can't, and in others there's no point. The cause of tonight's failure has already been rectified (power cable rerouted/untangled), but the other factor to consider is that in almost 5 years of operation, we've never experienced a catastrophic switch failure before, so you have to weigh the likelihood of a comparable failure against the risk of complications while trying to mitigate it.
We're far more likely to suffer a port or port-group failure than another complete switch failure. If we go and double up things like our ADSL L2TP connection, we'll drive up costs dramatically: new cross-connects and their monthly costs, port fees, more switch capacity, and finally some sort of system to juggle the redundancy... which is remarkably not simple, especially when you consider most failures aren't catastrophic. More often 'failure' takes the form of a cable going bad, errors on a port, auto-negotiation doing the wrong thing, or worse yet: human error.

Now, a number of clients & services did survive because we were able to connect them to both switches. Some of that was out of necessity (e.g. a server needs lots of bandwidth, participates in many network segments, or doesn't support VLANs), and some of it is for redundancy. In the case of the VPSes, many of the servers are maxed out for network ports and they have to be on Gig... as for the ADSL L2TP, we've only got one way to bring it in...

But you never know... we're resourceful... we might hack something stable together anyway :)

Sorry for the disruption this evening. We've got a good list of things that broke and things we can fix, and we'll be getting through it in the next few days.

Myke Geiger

Posted by: Myke |

Upgrades & Maintenance This Weekend
Starting Friday evening, we'll be doing a series of upgrades and infrastructure changes... namely replacing some network and power equipment. Unfortunately, this does mean there are going to be some outages, restarts & disruption. We'll do our best to keep them as short as possible, of course. Hopefully everything will be completed between 7PM Friday and 3AM Saturday; if not, the work will continue, with notice posted here.

Posted by: Myke |
