Error Processing dataloader.io Tasks: Resolved

We had a dataloader.io outage yesterday (January 26th) which cause the following behavior: tasks were queued/hanging and then forcibly expired after 2 hours (this is the default dataloader.io timeout).

Our Systems Admins were notified immediately, and, after restarting dataloader.io CH worker, things went back to normal. 

Please find all the details below:

Date & Time

Started: Jan, 26th, 11:30 AM PST

Detected: Jan, 26th, 11:40 AM PST

Fixed: Jan, 26th 11:43 AM PST

Executive Summary

We have confirmed errors processing dataloader.io tasks. This produces a partial outage on the task execution engine, preventing all tasks from being executed. The issue was resolved after restarting the CH worker for dataloader.io.

Customers Impacted

All dataloader.io users

Root Cause

Due to the version of SQS connector used in dataloader.io (v2.2.4 - 02/14), it gets to a point where it can’t find free threads to put messages on the SQS queue for the execution engine.

Corrective Actions (immediate fix):

Restarted dataloader.io CH worker.

Improvements Needed

Upgrade the SQS connector in Dataloader.io to the v4.1.0 (10/16), work with DevOps to get a working environment for testing and push to production. We are going to release a new dataloader.io version after that and create a Release Note KB Article to notify all our clients.

Have more questions? Submit a request

Comments

Please sign in to leave a comment.