A bug was released on Jan 8th at 1:43 pm UTC while we were improving Persisted Operation support. The code for Persisted Operations overlaps with the code for Automatic Persisted Queries (APQs), and unfortunately, the change broke support for APQs.
Our E2E test suite should have caught this bug.
Unfortunately, we recently made many improvements to our E2E test suite, and one of them silently invalidated the APQ E2E tests. These tests were still running and reporting successes, but under the hood, they were erroneously being run against a server that does not support APQ, so they could not detect an APQ regression.
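To illustrate the kind of precondition check that guards against this failure mode, here is a minimal sketch in Python. It is not our actual test code. It assumes the widely used APQ convention in which a client sends only a SHA-256 hash of the query, and the server answers with a `PersistedQueryNotFound` error if it supports APQ but has not seen the hash, or `PersistedQueryNotSupported` if APQ is disabled. The function names `build_apq_probe` and `server_supports_apq` are hypothetical.

```python
import hashlib


def build_apq_probe(query: str) -> dict:
    """Build a hash-only request body: the APQ extension without the query text."""
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    return {"extensions": {"persistedQuery": {"version": 1, "sha256Hash": digest}}}


def server_supports_apq(response_body: dict) -> bool:
    """Interpret the parsed JSON response to a hash-only probe.

    Assumption: the server reports APQ status through the error message or
    the extensions.code field, as in the common APQ convention.
    """
    errors = response_body.get("errors") or []
    codes = set()
    for error in errors:
        codes.add(error.get("message"))
        codes.add((error.get("extensions") or {}).get("code"))

    if codes & {"PersistedQueryNotSupported", "PERSISTED_QUERY_NOT_SUPPORTED"}:
        return False
    if codes & {"PersistedQueryNotFound", "PERSISTED_QUERY_NOT_FOUND"}:
        return True
    # A hash-only request that returns data without errors means the server
    # already had the query registered, which also implies APQ support.
    return "data" in response_body and not errors
```

A test suite could send the probe to its target server during setup and fail fast when `server_supports_apq` returns `False`, so that APQ tests cannot report success against a server that has APQ disabled.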
The impact of this bug was not widespread enough to trigger alarms after release.
At 11:45 pm UTC on Jan 8th, a customer raised an issue with APQs, and our engineering team started investigating.
On Jan 9th at 2:05 am UTC, a fix was deployed, and the issue was resolved.
Improvements
We’ve fixed the bug in our E2E test suite for APQ.
We’ve agreed on a path forward to start monitoring GraphQL errors. The work has begun and is being tracked but has yet to be completed.
We’ve scheduled a rollback dry run as part of our next incident dry run to improve our institutional knowledge of rollback procedures and find potential improvements.
Posted Jan 12, 2024 - 13:39 UTC
Resolved