Hi everyone. We experienced a catastrophic failure in our Sidekiq setup, which handles background job processing. We had to clear its queue because it had become backed up with millions of duplicate jobs. The cause is still unknown.
Feeds for local users are rebuilding now. We probably lost some incoming federated posts, and some outgoing posts from the past hour won’t be delivered.
So far the new Sidekiq instance is stable. The ETA for rebuilding user timelines is about an hour.
Thanks for your patience – I know it sucks to lose any data at all. @ashfurrow will check in on this in the morning but currently needs sleep.
In case anyone is curious why the m.t instance went down for an hour last night, I've opened an issue on the Mastodon project with details of our unexpected downtime. I suspect it was a fluke and not a problem with the software, but I've included remediation steps in case they are helpful to another admin someday.
This Mastodon instance is for people interested in technology. Discussions aren't limited to technology, because tech folks shouldn't be limited to technology either!