Resource file service is down, Monday, Jul 06, 2015, 12:12pm EST
The resource file service is currently having some problems. This will affect the images and files on client websites (displaying, downloading, uploading, accessing files in File management).
We're currently working to restore it. I will post more updates as they come in.
[2:10pm] Our System Operations team are still working to resolve the issue.
[4:35pm] We are unable to provide an ETA on this issue, but it is our highest priority at the moment. We have also had reports that some users are unable to start a 30-day trial.
[6:14pm] We're currently running a backup server while we fix the primary server, so our clients will have their files restored and functioning. However, we recommend NOT making any changes to your site until tomorrow morning (approximately 8-9AM EST). Any file changes made before then may cause conflicts, be unavailable, or be lost.
I will continue to post updates as they come. We apologize for the inconvenience this incident may have caused.
(on behalf of Kevin Jervis, Support Manager)
Incident Report: Partial Service Disruption
Incident Date: July 6, 2015
Incident Time: 12:10 PM EST
Incident Number: WA20150706
Description:
On Monday, July 6th, 2015, our resource file server experienced a disruption in service.
Background:
At 12:10 PM EST our network monitoring system detected a malfunction with the file resource server. The file resource server is a device that controls access to separately stored files as part of a multi-user system. The effect was that client websites were impacted as noted below.
Summary of Incident Impact:
- Images not displaying
- Links not working
- File management (uploading and downloading) not working
- Custom Email templates not available
- Theme customization (CSS, Theme overrides were not applied)
- New trial signups not working
Membership information was not impacted!
Our team of engineers were unsuccessful in repairing the main file resource server and made the decision to execute our recovery procedure and deploy our backup server. Client websites were restored at 4:35 PM EST.
Important note:
Any changes that were made to websites during the outage were not lost. The recovery process to restore all the data took longer than expected, and at 02:00 AM EST, July 7th, 2015, all client data was restored to reflect changes made during the outage.
Root Cause:
The root cause is still under investigation, and we will update our Service notices when the investigation is complete.
Remediation Effort to Avoid Future Similar Incidents:
Wild Apricot engineers are in the process of developing a revised design plan to improve our file resource performance, maximize availability, and shorten disaster recovery time. This includes a redesign of our backup schema to minimize restore time for all user data.
Client Support Centre:
A recorded message describing the outage was placed on the help desk line, and our “Service notices” in the Wild Apricot forums were updated with a description of the incident. As a result of this outage, the support centre received a significant increase in email requests, which totaled 199 new tickets, and a moderate increase in phone calls, which totaled 92, during the course of this incident.
We sincerely apologize for this disruption in service and any inconvenience it may have caused.
-
michael commented
Our System Operations engineers have synchronized all the data with the backup server.
That means all files should be in place. It also means there should be no issues with uploading new files (CSS, images, etc.).
The main server has not been fixed yet, so we are still using our backup one. An additional maintenance procedure will be needed.
This will cause another short period of User_Data unavailability, but it will be scheduled as regular maintenance. We’ll provide additional details later and post a notice in the forum.