Websolr Status

Wed Mar 15 2017 23:26:00 GMT+0000 (Coordinated Universal Time)

Latency reports in US East

Mar 15, 23:26 UTC
Resolved - We have identified the source of the network bottleneck and addressed it. We will be conducting a root cause analysis (RCA) to better understand the failure case.

Mar 15, 23:16 UTC
Investigating - Looking into reports of latency for some indices in US-East.


Wed Mar 29 2017 14:20:00 GMT+0000 (Coordinated Universal Time)

Elevated 503s for some users in US-East

Mar 29, 14:20 UTC
Resolved - The node has been repaired and is confirmed to be serving traffic normally.

Mar 29, 14:10 UTC
Identified - We have identified an unhealthy node in the region and are working to resolve the issue.

Mar 29, 14:00 UTC
Investigating - We are investigating reports of persistent HTTP 503 errors in the US-East region.


Mon Apr 24 2017 18:54:23 GMT+0000 (Coordinated Universal Time)

Elevated 503s for some users in US-East

Apr 24, 18:54 UTC
Resolved - Service has been restored to the impacted server.

Apr 24, 18:45 UTC
Identified - We have been automatically paged to respond to a server issue in the US-East region. We expect to resolve it within the next several minutes.


Thu Apr 27 2017 07:02:00 GMT+0000 (Coordinated Universal Time)

Elevated 503s for some Cobalt/Staging indices in US East

Apr 27, 07:02 UTC
Resolved - The incident has been resolved.

Apr 27, 06:55 UTC
Identified - We have identified the issue and are working on a fix.

Apr 27, 06:30 UTC
Investigating - Our systems have detected an issue impacting about a dozen Cobalt and Staging indices in US East and we are investigating.


Fri May 05 2017 09:40:03 GMT+0000 (Coordinated Universal Time)

Increased rate of 503 errors for some indices in the US East region

May 5, 09:40 UTC
Resolved - This incident has been resolved.

May 5, 08:36 UTC
Update - Starting at 07:29 UTC, a server failure caused increased 503 errors for roughly 3% of indices in the US East region. The issue was detected and a fix was implemented at 08:04 UTC. Index traffic is now stable, and follow-up maintenance is being performed on the affected indices.
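
For readers who want the intervals rather than the raw timestamps, the timeline in this update and the surrounding entries can be worked out with a short sketch like the one below. The names (`timeline`, `minutes_between`, and the dictionary keys) are illustrative only and not part of any Websolr tooling. The "first public post" time is simply the timestamp of the Investigating entry, which may be later than the moment the failure was actually detected.

```python
from datetime import datetime

# Times (UTC) quoted in the May 5 entries of this log.
# Key names are hypothetical labels chosen for this example.
DAY = "2017-05-05"
timeline = {
    "failure_started": "07:29",    # from the 08:36 update
    "first_public_post": "07:52",  # Investigating entry
    "fix_implemented": "08:04",    # Monitoring entry / 08:36 update
    "resolved": "09:40",           # Resolved entry
}

def minutes_between(start: str, end: str) -> int:
    """Return whole minutes between two HH:MM times on the same day."""
    fmt = "%Y-%m-%d %H:%M"
    t0 = datetime.strptime(f"{DAY} {start}", fmt)
    t1 = datetime.strptime(f"{DAY} {end}", fmt)
    return int((t1 - t0).total_seconds() // 60)

to_first_post = minutes_between(timeline["failure_started"], timeline["first_public_post"])
to_fix = minutes_between(timeline["failure_started"], timeline["fix_implemented"])
to_resolved = minutes_between(timeline["failure_started"], timeline["resolved"])

print(f"failure to first public post: {to_first_post} min")
print(f"failure to fix implemented:   {to_fix} min")
print(f"failure to resolved:          {to_resolved} min")
```

Under these timestamps the fix landed 35 minutes after the failure began, and the incident was marked resolved 131 minutes after it began.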

May 5, 08:04 UTC
Monitoring - A fix has been implemented and we are monitoring the results.

May 5, 07:52 UTC
Investigating - We are currently investigating this issue.


Tue May 23 2017 18:12:37 GMT+0000 (Coordinated Universal Time)

Elevated 503s for some users in US-East

May 23, 18:12 UTC
Resolved - This issue has been resolved.

May 23, 17:36 UTC
Investigating - We are currently investigating this issue.


Thu Aug 03 2017 00:45:54 GMT+0000 (Coordinated Universal Time)

Network connectivity and availability issues for some indices in Virginia

Aug 3, 00:45 UTC
Resolved - This incident has been resolved.

Aug 3, 00:07 UTC
Update - From AWS:

4:58 PM PDT We can confirm that some instances are unreachable and some EBS volumes are experiencing degraded performance in a single Availability Zone in the US-EAST-1 Region. Engineers are engaged and we are working to resolve the issue.

5:05 PM PDT We have identified the root cause and are beginning to see recovery for instances and EBS volumes in the affected Availability Zone in the US-EAST-1 Region. We continue to work toward full resolution.

Aug 2, 23:53 UTC
Monitoring - We're detecting a network connectivity event affecting a single Availability Zone in our AWS Virginia region. General system redundancy is operating as designed, with no impact to customer traffic at this time. However, we are standing by to intervene if necessary. https://status.aws.amazon.com/


Tue Aug 29 2017 21:54:25 GMT+0000 (Coordinated Universal Time)

Elevated error rates in US-East

Aug 29, 21:54 UTC
Resolved - This incident has been resolved.

Aug 29, 19:45 UTC
Identified - We are observing an elevated error rate in US-East. The regression is limited in scope, affecting <0.1% of all requests. We have identified a root cause and are working on a fix.


Fri Sep 01 2017 09:15:36 GMT+0000 (Coordinated Universal Time)

Elevated error rates for some users in US-East

Sep 1, 09:15 UTC
Resolved - Service has been restored.


Mon Oct 02 2017 18:33:55 GMT+0000 (Coordinated Universal Time)

Elevated error rates for some users in US-East

Oct 2, 18:33 UTC
Resolved - We have identified and fixed the problem.