Tuesday, March 31, 2009

Update from GoGrid Founders

It has been a long couple of days here at GoGrid. We are hearing from many of you that you want more information, more often. Engineers are often working so hard to fix an issue that they don't give you enough visibility into what is going on.

In that spirit of improved communication and transparency, I want to continue sending updates via email.

While some of you have been unscathed by the network attacks and portal issues over the past few days, we know that many of you are frustrated at the downtime and impacts to your Internet infrastructure. We are frustrated as well. We've been in the hosting business for over 8 years now, and have generally been able to prevent most incidents from impacting customers as heavily as this attack did.

ONGOING DDoS ATTACK
Our network is currently the target of a large DDoS attack that began on Monday afternoon. We took action all day yesterday to mitigate the impact of the attack, and its targets, so that we could restore service to GoGrid customers. Things were stabilized by Monday, March 30, 2009 16:00 PDT and most customer servers were back online, although some of you continued to experience intermittent loss of network connectivity.

We had a maintenance window scheduled for Monday, March 30, 2009 21:00 PDT to do a major expansion of GoGrid's capacity and roll out some minor feature improvements and bug fixes. Because this maintenance required the portal to be down, which meant support cases would have to be opened by phone, we considered postponing it to a time when things were calmer.

In the end, the decision was made to proceed with the maintenance because this capacity expansion had been planned for several months and would give us more flexibility in ensuring low utilization across our infrastructure. In hindsight this may have been a poor decision, because the maintenance took longer than planned and the maintenance window had to be extended by several hours.

ROUTING ISSUES THIS MORNING
We spent the night cleaning up servers that were still down, reboots that did not happen properly, and other issues, and continued to develop plans for a long-term solution to this ongoing issue.

Early on Tuesday, March 31, 2009 PDT, our support team began to get more and more reports of servers that were unreachable from certain parts of the Internet. All of these servers were pingable and accessible from our testing connections outside the GoGrid network, but not from all locations worldwide. There appeared to be a routing issue with some networks not properly announcing GoGrid routes. Some of your web sites appeared offline to most or all of your own customers, while many were unaffected.

CURRENT STATUS
The routing issue was resolved around Tuesday, March 31, 2009 11:00 PDT. Our network engineers localized the problem to an issue with our border routers improperly announcing some routes. The issue was resolved by clearing our BGP cache on our border routers. We are not yet certain of the root cause and are continuing to investigate; we will provide an RFO soon to customers who opened Cases. We suspect the issue is related to the changes we implemented in an emergency maintenance window yesterday as part of our efforts to mitigate the DDoS attack.

If you are continuing to see any connectivity issues with your GoGrid servers, we ask that you run a traceroute to your servers' IP address so you can provide it to our support staff when logging an issue at http://my.GoGrid.com.
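For customers who would rather script the traceroute than run it by hand, the following is a minimal sketch only, not an official GoGrid tool. It assumes Python 3.7 or later and that the standard `traceroute` utility (or `tracert` on Windows) is installed. The function names and the address 192.0.2.10 are illustrative placeholders; substitute your own server's IP address.

```python
import platform
import subprocess


def traceroute_command(target_ip, system=None):
    """Build the traceroute command line for the given OS (hypothetical helper)."""
    system = system or platform.system()
    if system == "Windows":
        # tracert -d skips reverse DNS lookups so the trace finishes faster.
        return ["tracert", "-d", target_ip]
    # traceroute -n prints numeric hop addresses instead of hostnames.
    return ["traceroute", "-n", target_ip]


def capture_traceroute(target_ip, timeout_seconds=120):
    """Run the trace and return its text output for pasting into a support case."""
    result = subprocess.run(
        traceroute_command(target_ip),
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )
    return result.stdout + result.stderr


# Example (placeholder address, replace with your server's IP):
# print(capture_traceroute("192.0.2.10"))
```

Running the trace from the location where the server appears unreachable is likely to be the most useful, since the problem described above depended on the source network.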

We appreciate your patience during this difficult time, and thank you for being a GoGrid customer.

Regards,
David Hecht
Co-Founder & Chief Marketing Officer

Resolved -- GoGrid Network Connectivity Issues

** UPDATE @ 13:16 PDT **
At this time we are no longer experiencing network connectivity issues with the public GoGrid Network Infrastructure. While the problem does seem to be resolved, we will not label it as such until a full Root Cause Analysis is complete. Please stand by for further updates.

** UPDATE @ 11:01 PDT **
We are making progress on the network connectivity issue, which is related to specific ISPs and routes coming into the GoGrid Network. You may notice that your systems are becoming accessible, though we are not noting this as "fixed" yet. If you are still experiencing a failure, please take the time to run a traceroute to your system's IP and send that to the GoGrid Support Staff.

Customer Impact
We are currently experiencing an issue where our network is inaccessible from certain parts of the world and from some other networks. We are treating this as a top priority and expect resolution shortly. If you are opening a Case for this issue, please try to include a traceroute so that we can use that information to help us isolate the issue.

Details
On Friday, March 27, 2009 11:10 PDT, and again yesterday, Monday, March 30, 2009 12:25 PDT, GoGrid suffered a series of large-scale distributed denial of service (DDoS) attacks that affected the network connectivity of many GoGrid servers.

These network attacks were of a type that we had not seen before, and which our automated network attack prevention hardware was unfortunately unable to prevent.

We are still experiencing network connectivity issues resulting from the attacks, and a number of GoGrid servers remain unreachable from certain source networks.  Our network engineers are working continuously to resolve the problem.

Monday, March 30, 2009

Completed -- GoGrid Maintenance

** Update **
Maintenance was extended and completed at 06:45 PDT.  API and GoGrid Portal are now available.

Maintenance Window:
Monday, March 30, 2009 21:00 - 04:45 PDT
(Tuesday, March 31, 2009 04:00 - 11:45 GMT)

Customer Impact:
The GoGrid customer portal and API access will be unavailable for up to 4 hours while we upgrade the infrastructure. However, there will be no impact to existing customer servers and load balancers during this time.

Maintenance Details:
GoGrid is making software improvements to https://my.gogrid.com to support increased demand.

If there are any additional questions before, during, or following the maintenance please direct them to GoGrid Support at 1-877-9Go-Grid (946-4743) or 415-869-7444 option 2.

Resolved -- Add/Delete/Restart/Start Slowness

** Update @ 18:00 PDT **
Queue cleared and service is back to normal

Window
Monday, March 30, 2009 16:48 - current PDT
(Monday, March 30, 2009 23:48 - current GMT)

Customer Impact
Many customers are affected

Details
Due to the recent DDoS attack we have noticed a lengthy queue buildup of start, restart, and delete jobs. This is an unexpected side effect of mitigating the DDoS attack. We are working to resolve this issue and will update this post again.

Resolved -- GoGrid Network Connectivity

** Update @ 17:35 PDT **
Thank you for your patience while GoGrid network engineers troubleshoot a DDoS attack on the GoGrid network. A recent QoS policy put in place is affecting legitimate traffic at this time and we're working to resolve the problem. An RFO is still pending and will follow when available.

Window
Monday, March 30, 2009 12:22 - current PDT
(Monday, March 30, 2009 19:22 - current GMT)

Customer Impact
This event is affecting some customers

Details
We are currently experiencing a network issue that is affecting part of our GoGrid infrastructure. Our network team is working hard to get your servers back online and more information will be forthcoming as soon as it is available.  GoGrid virtual machines are still on and still running.

If there are any additional questions before, during, or following this event please direct them to GoGrid Support by opening a support case at https://my.gogrid.com.

Friday, March 27, 2009

RFO - Network Service Disruption 3/27 @ 19:05 PDT (GMT -7)

Window
Friday, March 27, 2009 19:05 - 19:13 PDT
(Saturday, March 28, 2009 02:05 - 02:13 GMT)

Customer Impact
This event affected many customers

Details
While performing proactive maintenance on the infrastructure to increase security and improve resilience to malicious attacks against our network, an engineer entered an incorrect configuration command on the border routers, which gradually began to block legitimate traffic. As soon as the maintenance engineer was notified by our monitoring tool, the erroneous configuration was reverted, and connectivity was restored.

RESOLVED - Temporary Network Access Issue

***UPDATE: The issue listed below has been resolved. A detailed Reason for Outage (RFO) will be sent to customers affected.***

Some of our clients are experiencing a lack of network connectivity to their Virtual Machines. This is due to network difficulties only and does not impact the integrity or uptime status of the Virtual Machine itself.

We apologize for this inconvenience and have made this issue the highest priority.

If there are any additional questions before, during, or following this event please direct them to GoGrid Support at 1-877-9Go-Grid (946-4743) or 415-869-7444 option 2.

Wednesday, March 25, 2009

Resolved -- Customer Portal (my.gogrid.com)

Some customers are experiencing intermittent behavior on the my.gogrid.com portal, where the website is inaccessible for brief periods of time. Our engineers are actively looking at this issue and expect to resolve problems shortly. Please note this does not impact your virtual machines or load balancers in any way. Thank you.

Wednesday, March 18, 2009

Completed -- System Maintenance

Maintenance Window (Re-scheduled):

Friday, March 20th, 01:00 - 06:00 PDT
(Friday, March 20th, 08:00 - 13:00 GMT)

Customer Impact:
The customer service portal my.gogrid.com and order/billing systems will be unavailable.

Maintenance Details:
We will be performing internal system upgrades to improve the customer experience.

If there are any additional questions before, during, or following the maintenance please direct them to GoGrid Support at 1-877-9Go-Grid (946-4743) or 415-869-7444 option 2.

Friday, March 6, 2009

Completed -- Cloud Storage Maintenance

Maintenance Window:
Tuesday, March 10, 2009 21:00 - 22:00 PDT
(Wednesday, March 11, 2009 04:00 - 05:00 GMT)

Customer Impact:
GoGrid Cloud Storage will be unavailable for up to 20 minutes.

Maintenance Details:
GoGrid is doing some maintenance work on the Cloud Storage network to improve performance and scalability.

If there are any additional questions before, during, or following the maintenance please direct them to GoGrid Support at 1-877-9Go-Grid (946-4743) or 415-869-7444 option 2. You may also open a support case at https://my.gogrid.com.

Tuesday, March 3, 2009

Completed -- Generator Run

Maintenance Window:
Thursday, March 5, 2009 20:00 - 00:00 PST
(Friday, March 6, 2009 04:00 - 08:00 UTC)

Customer Impact:
There will be no customer impact.

Maintenance Details:
GoGrid will be performing a generator run and all power will be swapped over from city utility to our generators for up to four hours.

If there are any additional questions before, during, or following the maintenance please direct them to GoGrid Support at 1-877-9Go-Grid (946-4743) or 415-869-7444 option 2.  You may also open a support case at http://my.gogrid.com.