a cool thing about Mastodon is that it's literally open source: you can just go poking around and see how it works
a user recently asked me why they couldn't add another reaction to an instance announcement. turns out there's a maximum of eight distinct reactions: https://github.com/mastodon/mastodon/blob/991353682d96cecd4695e150cb6030613d447844/app/javascript/mastodon/features/getting_started/components/announcements.js#L293
from an educational perspective, it's really useful to have a large, in-production Rails+JS app where anyone can pull down the code
it's great when the Mastodon upgrade notes are like "this update includes long-running migrations"
I was curious about this, since 3.5.2 is just a patch release. what could take so long? in this case, it's adding a single index to the statuses table (the largest table by a large margin): https://github.com/mastodon/mastodon/blob/f17e73da09e6c63665aee4e9731df7808094960e/db/migrate/20220428112511_add_index_statuses_on_account_id.rb#L5
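for the curious, a concurrent index migration looks roughly like this — a sketch of the general shape, not the verbatim file from the repo:

```ruby
# Rough sketch of a concurrent index migration (illustrative, not the exact file).
class AddIndexStatusesOnAccountId < ActiveRecord::Migration[6.1]
  # Concurrent index builds can't run inside a transaction.
  disable_ddl_transaction!

  def change
    # algorithm: :concurrently avoids locking writes to the statuses table,
    # but Postgres still has to scan every row to build the index,
    # which is what makes this migration run so long on a huge table.
    add_index :statuses, :account_id, algorithm: :concurrently
  end
end
```

the upside of `algorithm: :concurrently` is that the table keeps accepting writes while the index builds; the trade-off is that you just have to wait it out.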
One other interesting thing to note is that, even though we've increased the Sidekiq throughput, we've actually *decreased* the total number of connections open to Postgres (~300 before and 160 now). I think this is due to Sidekiq connection pooling.
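The rough math on connection counts, with hypothetical numbers (not our actual config): each Sidekiq process holds an ActiveRecord connection pool (DB_POOL) that its worker threads share, so the ceiling on open Postgres connections is roughly processes × pool size, plus whatever the web processes hold.

```ruby
# Illustrative arithmetic only — these numbers are hypothetical, not our actual config.
# Each Sidekiq process holds an ActiveRecord connection pool (DB_POOL) shared by
# its worker threads, so the Postgres connection ceiling is roughly:
#   sidekiq_processes * DB_POOL  +  puma_workers * puma_threads
sidekiq_processes = 4
db_pool           = 25  # usually set to match Sidekiq concurrency
puma_workers      = 2
puma_threads      = 5

ceiling = sidekiq_processes * db_pool + puma_workers * puma_threads
puts "at most ~#{ceiling} Postgres connections"  # => ~110 with these hypothetical numbers
```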
This goes to show that making better use of existing resources often beats simply throwing more resources at a problem.
A lot of the retries in that period were due to Mastodon::RaceConditionError exceptions; other admins have reported the same problem here: https://github.com/mastodon/mastodon/issues/15525
Here's the graph of Sidekiq jobs processed/failed over the same two-week period. The peak last Monday was 1.7M jobs processed and 280k jobs failed.
During that first spike, before we reconfigured Sidekiq, failures grew proportionally faster than successes. My inference is that slower processing led to more failures, and therefore more retries, which fed back into the queue backlog problem.
Here are the m.t server resource graphs for the past two weeks. You can see the point halfway through where the initial extra load hit; we stumbled a bit while reconfiguring and then levelled out.
We've had a few sudden spikes in load average that corresponded to bursts of Sidekiq jobs being enqueued. I'm not sure where these jobs are coming from, but the new configuration is handling them super smoothly 😋
has anyone made this joke yet
I'd just like to interject for a moment. What you're referring to as Fediverse/Mastodon, is in fact, ActivityPub/Fediverse/Mastodon, or as I've recently taken to calling it, ActivityPub plus Fediverse plus Mastodon.