We had a dataloader.io outage yesterday (January 26th) which cause the following behavior: tasks were queued/hanging and then forcibly expired after 2 hours (this is the default dataloader.io timeout).
Our Systems Admins were notified immediately, and, after restarting dataloader.io CH worker, things went back to normal.
Please find all the details below:
Date & Time
Started: Jan, 26th, 11:30 AM PST
Detected: Jan, 26th, 11:40 AM PST
Fixed: Jan, 26th 11:43 AM PST
Executive Summary
We have confirmed errors processing dataloader.io tasks. This produces a partial outage on the task execution engine, preventing all tasks from being executed. The issue was resolved after restarting the CH worker for dataloader.io.
Customers Impacted
All dataloader.io users
Root Cause
Due to the version of SQS connector used in dataloader.io (v2.2.4 - 02/14), it gets to a point where it can’t find free threads to put messages on the SQS queue for the execution engine.
Corrective Actions (immediate fix):
Restarted dataloader.io CH worker.
Improvements Needed
Upgrade the SQS connector in Dataloader.io to the v4.1.0 (10/16), work with DevOps to get a working environment for testing and push to production. We are going to release a new dataloader.io version after that and create a Release Note KB Article to notify all our clients.
Comments
Please sign in to leave a comment.