Stellate services unavailable because of Cloudflare Workers KV outage
- Stellate currently relies on Cloudflare services for parts of our offerings.
- Cloudflare had a global outage of its Workers KV store for ~10 minutes on June 7th, from 6:51 pm to 7:01 pm. Cloudflare provides a summary of this incident on its own status page at https://www.cloudflarestatus.com/incidents/1mj9jch1tqf9.
Any traffic that resulted in cache misses or cache passes triggered an HTTP 500 error page during that time frame. Traffic directly handled by the edge cache (i.e., cache hits) was not affected.
- ~30% of traffic resulted in cache hits and was served correctly.
- ~70% of traffic resulted in cache misses or passes; these requests returned an HTTP 500 error.
We are currently working on a larger infrastructure improvement that will remove the dependency on Cloudflare Workers KV.
Additionally, we will review all possible failure points that could make Stellate core services inaccessible in the event of a third-party outage, and we will investigate options for additional redundancy for those services.
All services are back up and running again. We are monitoring the status of our services as well as that of the Cloudflare Workers KV store.
As far as we can tell, the Cloudflare Workers KV service, which we depend on, had an outage of about 5 to 10 minutes. It seems to be back up and running again. We are monitoring the situation and will update our status page as needed.
We are looking into an issue with Stellate right now. We will update this incident as we have more data available.
This incident affected: GraphQL Edge Caching, GraphQL Metrics, GraphQL Rate Limiting, GraphQL Developer Portals, User API, and Admin API.