Roblox was down for 73 hours in October, leaving its 50M daily users unable to access the service.

The summary is up, and WOW, it reads like a mystery novel you can't put down.

Concurrency, long polling, cluster leaders, and more in this excellent postmortem:

