Stellate services unavailable because of Cloudflare Workers KV outage
Incident Report for Stellate
Postmortem
  • Stellate currently relies on Cloudflare services for parts of our offerings.
  • Cloudflare had a global outage of their KV store for ~10 minutes on June 7th, from 6:51 pm to 7:01 pm. They provide a summary of this incident on their own status page at https://www.cloudflarestatus.com/incidents/1mj9jch1tqf9.
  • Any traffic that resulted in cache misses or cache passes triggered an HTTP/500 error page during that time frame. Traffic directly handled by the edge cache (i.e., cache hits) was not affected.

    • ~30% of traffic resulted in cache hits and was served correctly.
    • ~70% of traffic resulted in cache misses or passes; these requests returned an HTTP/500 error.
  • We are currently working on a larger infrastructure improvement that will remove the dependency on Cloudflare Workers KV.

  • Additionally, we will review all possible failure points that could make Stellate core services inaccessible (in the event of a third-party outage) and investigate options for additional redundancies for those services.
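
The failure mode described above (cache hits served normally, cache misses and passes failing while the key-value store was down) can be illustrated with a minimal sketch. It is not Stellate's implementation. It assumes that the miss path needs a per-service value from a remote key-value store, and it shows one possible redundancy: falling back to the last value read successfully. The names EdgeHandler, kv_get, and KVUnavailable are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


class KVUnavailable(Exception):
    """Raised by the hypothetical KV client when the store cannot be reached."""


@dataclass
class EdgeHandler:
    # kv_get stands in for a remote key-value lookup; it may raise KVUnavailable.
    kv_get: Callable[[str], str]
    # Responses already held in the edge cache, keyed by request.
    edge_cache: Dict[str, str] = field(default_factory=dict)
    # Last value successfully read from the KV store, per service.
    last_known_config: Dict[str, str] = field(default_factory=dict)

    def handle(self, service: str, request_key: str) -> Tuple[int, str]:
        # Cache hit: answered directly, without touching the KV store.
        if request_key in self.edge_cache:
            return 200, self.edge_cache[request_key]

        # Cache miss or pass: this path needs a value from the KV store.
        try:
            config = self.kv_get(service)
            self.last_known_config[service] = config
        except KVUnavailable:
            # Fall back to the last known good value, if there is one.
            fallback = self.last_known_config.get(service)
            if fallback is None:
                # Without a fallback the request fails, as in the incident.
                return 500, "configuration unavailable"
            config = fallback

        return 200, f"forwarded to origin using {config}"
```

In this sketch a KV outage only produces an error for services whose value has never been read before. Serving a stale value is a trade-off, and whether it is acceptable depends on what the stored value controls.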

Posted Jun 08, 2023 - 11:34 UTC

Resolved
Cloudflare posted an update on their status page and marked the underlying incident as resolved. See https://www.cloudflarestatus.com/incidents/1mj9jch1tqf9 for their update.
Posted Jun 07, 2023 - 19:46 UTC
Update
All services are back up and running again. We are monitoring the status of our services as well as the Cloudflare Workers KV store.
Posted Jun 07, 2023 - 19:18 UTC
Monitoring
As far as we can tell, the Cloudflare Workers KV service, which we depend on, had an outage of about 5 to 10 minutes. It seems to be back up and running again. We are monitoring the situation and will update our status page as needed.
Posted Jun 07, 2023 - 19:07 UTC
Investigating
We are looking into an issue with Stellate right now. We will update this incident as more data becomes available.
Posted Jun 07, 2023 - 19:01 UTC
This incident affected: GraphQL Edge Caching, GraphQL Metrics, GraphQL Rate Limiting, GraphQL Developer Portals, User API, and Admin API.