Here are the m.t server resource graphs for the past two weeks. You can see halfway through when we got the initial extra load, stumbled a bit to reconfigure, and then levelled out.

We've had a few sudden spikes in load averages that corresponded to a sudden high number of Sidekiq jobs getting enqueued. I'm not sure where these jobs are coming from but the new configuration is handling them super-smoothly 😋

Here's the graph of Sidekiq jobs processed/failed over the same two-week time period. The peak last Monday is 1.7M jobs processed, 280k jobs failed.
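
As a rough sanity check on those peak numbers, here is a small Python sketch of the failure ratio. It assumes the 280k failed jobs are counted within the 1.7M processed (an assumption about how the counters were read, not something stated above); the variable names are only for illustration.

```python
# Peak-day figures quoted in the post above.
processed = 1_700_000  # jobs processed at the Monday peak
failed = 280_000       # jobs failed at the Monday peak

# Assumption: "processed" includes the failed jobs, so this is the
# share of all executed jobs that ended in failure.
failure_ratio = failed / processed

print(f"{failure_ratio:.1%} of processed jobs failed at the peak")
```

That works out to roughly one failure for every six jobs run at the peak.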

During that first spike, before we reconfigured Sidekiq, failures increased proportionally more than successes did. I'm inferring that a slower rate of processing led to a higher number of failures (and subsequently, of retries, which fed back into the queue backlog problem).

A lot of the retries in that time were due to Mastodon::RaceConditionError exceptions, and other admins have reported the same problem here: github.com/mastodon/mastodon/i

One other interesting thing to note is that, even though we've increased the Sidekiq throughput, we've actually *decreased* the total number of connections open to Postgres (~300 before and 160 now). I think this is due to Sidekiq connection pooling.
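
To illustrate why pooling can lower the connection count even as throughput goes up, here is a minimal sketch. It assumes each Sidekiq process keeps its own database connection pool, so the upper bound on open connections is roughly processes times pool size. The process counts and pool sizes below are hypothetical, picked only to land near the ~300 and 160 figures; they are not the actual m.t configuration.

```python
def max_db_connections(processes: int, pool_size: int) -> int:
    """Upper bound on open connections when every process has its own pool."""
    return processes * pool_size

# Hypothetical "before": many small processes, each with its own pool.
before = max_db_connections(processes=12, pool_size=25)

# Hypothetical "after": fewer processes, with more threads sharing each pool.
after = max_db_connections(processes=4, pool_size=40)

print(f"before: up to {before} connections, after: up to {after}")
```

The point of the sketch is only that consolidating work into fewer processes, with threads sharing a pool, caps the total connections at a lower number.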

This goes to show that making better use of existing resources is often better than simply throwing more resources at a problem.

@ashfurrow Like in journalism, where it's the decision to "get it first" or to "get it right", management in IT far too often decides to "hit the problem with more metal", yeah. More often unnecessarily than not, and I can tell some stories about that…

You did the right thing, and you also had the best support. I'm proud to be in "your house" here. Thanks for running the show, and thanks to the crew & patrons for helping you keep it running!
