Tuesday, March 31, 2009

Update from GoGrid Founders

It has been a long couple of days here at GoGrid. We are hearing from many of you that you want more information, more often. Engineers are often working so hard to fix an issue that they don't give you enough visibility into what is going on.

In that spirit of improved communication and transparency, I want to continue sending updates via email.

While some of you have been unscathed by the network attacks and portal issues over the past few days, we know that many of you are frustrated at the downtime and impacts to your Internet infrastructure. We are frustrated as well. We've been in the hosting business for over 8 years now, and have generally been able to prevent most incidents from impacting customers as heavily as this attack did.

ONGOING DDoS ATTACK
Our network is currently the target of a large DDoS attack that began on Monday afternoon. We took action all day yesterday to mitigate the impact of the attack on its targets so that we could restore service to GoGrid customers. Things were stabilized by Monday, March 30, 2009 16:00 PDT and most customer servers were back online, although some of you continued to experience intermittent loss of network connectivity.

We had a maintenance window scheduled for Monday, March 30, 2009 21:00 PDT to do a major expansion of GoGrid's capacity and roll out some minor feature improvements and bug fixes. Because this maintenance required taking the portal down, which meant support cases would have to be opened by phone, we considered postponing it until things were calmer.

In the end, we decided to proceed with the maintenance because this capacity expansion had been planned for several months and would give us more flexibility in ensuring low utilization across our infrastructure. In hindsight this may have been a poor decision, because the maintenance took longer to complete than planned and the window had to be extended by several hours.

ROUTING ISSUES THIS MORNING
We spent the night bringing back servers that were still down, fixing reboots that had not completed properly, and resolving other issues, while continuing to develop plans for a long-term solution to this ongoing issue.

Beginning early Tuesday, March 31, 2009 PDT, our support team received a growing number of reports of servers that were unreachable from certain parts of the Internet. All of these servers were pingable and accessible from our test connections outside the GoGrid network, but they could not be reached from all locations worldwide. There appeared to be a routing issue, with some networks not properly announcing GoGrid routes. Some of your web sites appeared offline to most or all of your own customers, while many others were unaffected.

CURRENT STATUS
The routing issue was resolved around Tuesday, March 31, 2009 11:00 PDT. Our network engineers traced the problem to our border routers improperly announcing some routes, and resolved it by clearing the BGP cache on those routers. We are not yet certain of the root cause. We are continuing to investigate and will soon provide an RFO (Reason for Outage) to customers who opened Cases. We suspect the issue is related to the changes we implemented in an emergency maintenance window yesterday as part of our efforts to mitigate the DDoS attack.

If you are continuing to see any connectivity issues with your GoGrid servers, we ask that you run a traceroute to each affected server's IP address and include the output when logging an issue with our support staff at http://my.GoGrid.com.
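
For those of you who need to collect traces for several servers, the following is a minimal Python sketch of one way to capture the traceroute output as text. It is only an illustration, not an official GoGrid tool: the function names are hypothetical, it assumes the standard `traceroute` (or `tracert` on Windows) command is installed on the machine you run it from, and the address in the usage note is a documentation placeholder rather than a real server.

```python
import ipaddress
import platform
import subprocess


def build_traceroute_command(ip, system=None):
    """Return the traceroute command for this OS as an argument list."""
    # Validate the address first so a typo fails before anything is run.
    ipaddress.ip_address(ip)
    system = system or platform.system()
    if system == "Windows":
        # tracert is the Windows equivalent; -d skips reverse DNS lookups.
        return ["tracert", "-d", ip]
    # -n prints numeric addresses instead of resolving hostnames.
    return ["traceroute", "-n", ip]


def run_traceroute(ip, timeout=120):
    """Run the trace and return its text output for a support case."""
    result = subprocess.run(
        build_traceroute_command(ip),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Include stderr so error messages are not lost.
    return result.stdout + result.stderr
```

Calling `run_traceroute("203.0.113.10")` with your own server's address in place of the placeholder returns the trace as a string that can be pasted into the support case. Running it from the location where the server appears unreachable is likely to be the most useful, since the routing problem may only be visible from some networks.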

We appreciate your patience during this difficult time, and thank you for being a GoGrid customer.

Regards,
David Hecht
Co-Founder & Chief Marketing Officer
