On Thursday, October 3rd, at approximately 10:50am EST, our solution stopped processing the following within our Blend Integration.
We identified the issue around 11:24am EST. However, we also identified another issue with a single client, and spent the next few hours determining whether the issue was impacting all Blend Integration customers or was isolated. Once we were able to determine the scope of the impact, we reported the outage via our status center at 1:51pm EST.
While troubleshooting, it became clear that a code change had been made within the service provider's platform that impacted all aspects of the Blend Integration, specifically the updating of tasks within the queue.
A ticket was issued to the service provider, escalated through our account manager, phone calls, and email, and upgraded to S1 Production Down severity. Incident response from the service provider:
During the incident, we tested various application code changes in an isolated instance to try to provide a temporary workaround and restore service. None of those attempts were successful, and we had to wait until the service provider rolled back the code change.
Around 7:00pm EST, the code was successfully reverted and we started to see events being processed successfully. Approximately 13,000 events captured during the outage needed to be processed. Around 7:30pm EST, all of those events had been processed and we verified we were caught up. After confirming that the service was running and caught up, we took some time to verify our findings and ensure we were processing all incoming events in real time. We communicated restoration of the Blend Integration at 7:52pm EST.
A few areas need immediate attention, and in some cases improvements are already implemented.
Partnership with service provider
Faster escalation channel(s)
We can now fast-track any issue we may be having to the VP of Product Development for the application platform they provide.
Code changes and reverting
We now have access to their development code branch / testing environment.
We appreciate your business and are committed to improvements that will ensure an outage of this duration doesn’t happen again. If you have any questions or would like further details, please email [email protected].