Delay in Passfort Automation

Incident Report for Moodys

Postmortem

Summary

On November 25th at 9:50 am GMT, we deployed a change intended to improve performance by reducing task duplication. In certain edge cases, the change caused a subset of profiles to get stuck in automation. We reverted the deployment on November 26th at 10:50 am GMT and remediated the stuck profiles throughout the day.

 

Root Cause

The root cause was a newly implemented deduplication feature designed to reduce task redundancy and improve performance. The feature inadvertently affected tasks that requeue themselves under specific conditions, particularly those related to Worldcheck and the custom automation system, causing them to halt rather than retry.
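
The failure mode described above can be illustrated with a minimal sketch. It is not the Passfort implementation: the `DedupQueue` class, the `allow_self_requeue` flag, and the task key are hypothetical names chosen for illustration. The sketch assumes deduplication works by dropping any task whose key is already pending or running, which means a running task that tries to requeue itself is treated as its own duplicate.

```python
from collections import deque


class DedupQueue:
    """Toy task queue that skips a task whose key is already pending or running."""

    def __init__(self, allow_self_requeue: bool):
        self.allow_self_requeue = allow_self_requeue
        self.pending = deque()
        self.active_keys = set()  # keys that are pending or currently running
        self.running_key = None

    def enqueue(self, key, task) -> bool:
        is_self_requeue = key == self.running_key
        if key in self.active_keys and not (self.allow_self_requeue and is_self_requeue):
            return False  # treated as a duplicate and dropped
        self.active_keys.add(key)
        self.pending.append((key, task))
        return True

    def run_all(self):
        while self.pending:
            key, task = self.pending.popleft()
            self.running_key = key
            task(self, key)
            self.running_key = None
            # Release the key only if the task did not requeue itself.
            if all(k != key for k, _ in self.pending):
                self.active_keys.discard(key)


def make_polling_task(attempts_needed, log):
    """Hypothetical task that requeues itself until an external result is ready."""
    state = {"attempts": 0}

    def task(queue, key):
        state["attempts"] += 1
        if state["attempts"] < attempts_needed:
            log.append("retry")
            queue.enqueue(key, task)  # self-requeue
        else:
            log.append("done")

    return task


def simulate(allow_self_requeue):
    log = []
    queue = DedupQueue(allow_self_requeue)
    queue.enqueue("profile-1:screening", make_polling_task(3, log))
    queue.run_all()
    return log
```

With `simulate(False)` the task logs a single retry and never reaches "done", because its requeue is dropped as a duplicate. With `simulate(True)` the running task is allowed to requeue itself and completes on the third attempt.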

 

Actions

To prevent similar incidents, we plan to:

  • Extend our automated testing to cover tasks that requeue themselves.
  • Set up alerts that notify us promptly if automation stalls for an extended period.
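
As a rough sketch of the stall alert, the check below flags profiles whose automation has made no progress for longer than a threshold. The `find_stalled` function, the `last_progress` mapping, the profile IDs, and the one-hour threshold are all assumptions made for illustration and do not reflect the actual monitoring setup.

```python
from datetime import datetime, timedelta, timezone


def find_stalled(last_progress, now, threshold=timedelta(hours=1)):
    """Return profile IDs with no automation progress for longer than threshold."""
    return sorted(pid for pid, ts in last_progress.items() if now - ts > threshold)


# Hypothetical data: the last time each profile's automation advanced.
now = datetime(2024, 11, 26, 10, 0, tzinfo=timezone.utc)
last_progress = {
    "profile-a": now - timedelta(minutes=5),
    "profile-b": now - timedelta(hours=3),
}
stalled = find_stalled(last_progress, now)  # candidates for an alert
```

A check like this could run on a schedule and raise an alert whenever the returned list is non-empty.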
Posted Dec 02, 2024 - 13:30 GMT

Resolved

This incident has been resolved.
Posted Nov 26, 2024 - 17:59 GMT

Monitoring

A fix has been implemented to begin remediating the impacted profiles, and we are currently monitoring the results.
Posted Nov 26, 2024 - 15:11 GMT

Update

Our engineers have reverted the change that caused the automation delays, so no further profiles will be impacted. We are currently working to resolve the profiles that have already been affected.
Posted Nov 26, 2024 - 10:52 GMT

Identified

We have identified an issue causing delays in automation.

Our engineers are in the process of applying a fix to resolve this.
Posted Nov 26, 2024 - 10:21 GMT
This incident affected: Maxsight Environments (🇪🇺 EU - eu.maxsight.com, 🇺🇸 USA - us.maxsight.com).