Error Processing Tasks: Resolved

We had a outage today (February 3rd) which cause the following behavior: tasks were queued/hanging and then forcibly expired after 2 hours (this is the default timeout).

Our Systems Admins were notified immediately, and, after restarting CH worker, things went back to normal. 

Please find all the details below:

Date & Time

Started: Feb, 3rd, 12:15 PM PST

Detected: Feb, 3rd, 12:25 PM PST

Fixed: Feb, 3rd, 12:55 PM PST

Executive Summary

We have confirmed errors processing tasks. This produces a partial outage on the task execution engine, preventing all tasks from being executed. The issue was resolved after restarting the CH worker for

Customers Impacted

All users

Root Cause

Due to the version of SQS connector used in (v2.2.4 - 02/14), it gets to a point where it can’t find free threads to put messages on the SQS queue for the execution engine.

Corrective Actions (immediate fix):

Restarted CH worker.

Improvements Needed

Upgrade the SQS connector in to the v4.1.0 (10/16), work with DevOps to get a working environment for testing and push to production. We are going to release a new version after that and create a Release Note KB Article to notify all our clients.

Have more questions? Submit a request


Please sign in to leave a comment.