Summary
On November 25th at 9:50 am GMT, a deployment was executed to enhance performance by reducing task duplication. However, this caused issues in certain edge cases, leading to a subset of profiles getting stuck in automation. We reverted the deployment on November 26th at 10:50am GMT and remediated the stuck profiles throughout the day.
Root Cause
The core of the issue was a newly implemented deduplication feature designed to reduce task redundancy and improve performance. However, this inadvertently affected a select group of tasks, particularly those related to Worldcheck and custom automation system, that requeue themselves under specific conditions, causing them to halt rather than retry.
Actions
To avert similar incidents in the future, we plan to: