We had a dataloader.io outage today (February 3rd) which cause the following behavior: tasks were queued/hanging and then forcibly expired after 2 hours (this is the default dataloader.io timeout).
Our Systems Admins were notified immediately, and, after restarting dataloader.io CH worker, things went back to normal.
Please find all the details below:
Date & Time
Started: Feb, 3rd, 12:15 PM PST
Detected: Feb, 3rd, 12:25 PM PST
Fixed: Feb, 3rd, 12:55 PM PST
We have confirmed errors processing dataloader.io tasks. This produces a partial outage on the task execution engine, preventing all tasks from being executed. The issue was resolved after restarting the CH worker for dataloader.io.
All dataloader.io users
Due to the version of SQS connector used in dataloader.io (v2.2.4 - 02/14), it gets to a point where it can’t find free threads to put messages on the SQS queue for the execution engine.
Corrective Actions (immediate fix):
Restarted dataloader.io CH worker.
Upgrade the SQS connector in Dataloader.io to the v4.1.0 (10/16), work with DevOps to get a working environment for testing and push to production. We are going to release a new dataloader.io version after that and create a Release Note KB Article to notify all our clients.