Realized my AV1 encodings were at a higher quality & bitrate than necessary, which is also probably slowing down decoding. ;)

Twist knobs, re-encode, wait a day, repeat.

EC2's "vCPUs" are individual CPU _threads_. A "16-core" compute instance is 8 cores with 2 threads per core. Hope you didn't need to compute with those cores.

This paper on a malloc() replacement that DOES COMPACTION even on C/C++ is making the rounds:

Scarily beautiful.

If anybody else runs into custom WebKit builds not quite working right when run in Safari (broken URL bar, missing Developer menu), try 'run-minibrowser' instead of 'run-safari'.

Ok, I can enable SharedArrayBuffer in a custom WebKit build but there's still no WebAssembly threading support so the decoder can't run. ;_;

abort("Assertion failed: requested a shared WebAssembly.Memory but the returned buffer is not a SharedArrayBuffer, indicating that while the browser has SharedArrayBuffer it does not have WebAssembly threads support - you may need to set a flag"). Build with -s ASSERTIONS=1 for more info.

Also, WebKit still uses SVN? They do have a git mirror at least. :D

I'm manually building WebKit to test something with SharedArrayBuffer enabled (it's a compile time flag in source that you have to flip!) and something in the xcodebuild process keeps trying to connect to my iOS development devices over the network. O_O

Fixed another old weird bug in ogv.js. Sometimes after playing a file to the end, it would get stuck when you tried to re-play it.

Turned out that when flushing the demuxer's buffers, it was left thinking it was already at the exact byte position it had to seek to in order to read the keypoint cues at the end of the file... but since the buffers had been flushed it had no data to read, and got confused.

Tweaking it to discard the position on flush means the demuxer issues an i/o seek which refills the buffer. \o/
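
A rough sketch of that flush bug and the fix. Everything here (the SketchDemuxer class, its fields, the discardPositionOnFlush switch) is made up for illustration and is not ogv.js's actual code; it only models "position is remembered across a flush, so no seek is issued".

```typescript
// Hypothetical, simplified demuxer state. Not the real ogv.js demuxer.
class SketchDemuxer {
  private position: number | null = null; // file offset the demuxer believes it is at
  private buffered: Uint8Array = new Uint8Array(0);
  private discardPositionOnFlush: boolean;
  readonly ioSeeks: number[] = []; // offsets we asked the I/O layer to seek to

  constructor(discardPositionOnFlush: boolean) {
    this.discardPositionOnFlush = discardPositionOnFlush;
  }

  // I/O layer delivers data starting at `offset`.
  receive(offset: number, data: Uint8Array): void {
    this.position = offset;
    this.buffered = data;
  }

  flush(): void {
    this.buffered = new Uint8Array(0);
    if (this.discardPositionOnFlush) {
      this.position = null; // the fix: forget where we were
    }
  }

  seekTo(target: number): void {
    if (this.position === target) {
      return; // "already there", so no I/O seek is issued
    }
    this.ioSeeks.push(target);
    this.position = target;
  }

  canRead(): boolean {
    return this.buffered.length > 0;
  }
}

// Replay scenario: sitting at the cues offset, flush, then seek to the cues again.
function replay(demuxer: SketchDemuxer, cuesOffset: number): void {
  demuxer.receive(cuesOffset, new Uint8Array([1, 2, 3]));
  demuxer.flush();
  demuxer.seekTo(cuesOffset);
}

const buggy = new SketchDemuxer(false);
replay(buggy, 1000);
console.log(buggy.ioSeeks.length, buggy.canRead()); // no seek issued, nothing to read: stuck

const fixed = new SketchDemuxer(true);
replay(fixed, 1000);
console.log(fixed.ioSeeks); // a seek to 1000 was issued, so the buffer gets refilled
```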

Fixed a bug where a function was returning a 0 where it needed a 1. Single-bit bug? :)

Accidentally committed some time travel during this encoding... O_O

Performance of the dav1d AV1 decoder in WebAssembly is not great, but good enough to work if Apple doesn't end up shipping AV1 support in Safari.

iPad Air (first-gen 64-bit iOS device) tops out around 240p at 24fps; newer iPad Pro is comfy with 360p but drops some frames at 480p.

On macOS, Safari also does significantly better than Chrome or Firefox, by like 25% or so! But Safari doesn't yet have threading, which can double throughput in Chrome/Firefox with the necessary flags enabled.

That feeling when frame-parallel threading is working in the AV1 decoder...

Aha, I have to set n_frame_threads as well as n_tile_threads to get dav1d to use 2 or 4 cores more effectively. Of course this breaks my packet-in-picture-out assumption, so I have to retool a little bit. :D
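
Roughly what the retooling looks like, as a toy model. ToyFrameThreadedDecoder is a made-up stand-in, not dav1d's API, and the "delay = frame threads - 1" rule is an assumption for illustration. The point is only that the caller can no longer expect one picture back per packet sent, and has to drain at the end.

```typescript
interface Picture {
  index: number;
}

// Toy stand-in for a frame-threaded decoder: it keeps up to `delay`
// packets in flight before the oldest one comes out as a picture.
class ToyFrameThreadedDecoder {
  private inFlight: number[] = [];
  private delay: number;

  constructor(frameThreads: number) {
    // Assumption for this sketch: N frame threads means N - 1 packets of latency.
    this.delay = Math.max(0, frameThreads - 1);
  }

  sendPacket(index: number): void {
    this.inFlight.push(index);
  }

  // Returns null while the decoder is still filling its pipeline.
  getPicture(): Picture | null {
    if (this.inFlight.length <= this.delay) {
      return null;
    }
    return { index: this.inFlight.shift()! };
  }

  // End of stream: hand back everything still in flight, in order.
  drain(): Picture[] {
    const rest = this.inFlight.map((index) => ({ index }));
    this.inFlight = [];
    return rest;
  }
}

// Feed packets, collect whatever pictures are ready, then drain.
function decodeAll(decoder: ToyFrameThreadedDecoder, packetCount: number): number[] {
  const shown: number[] = [];
  for (let i = 0; i < packetCount; i++) {
    decoder.sendPacket(i);
    let pic: Picture | null;
    while ((pic = decoder.getPicture()) !== null) {
      shown.push(pic.index);
    }
  }
  for (const pic of decoder.drain()) {
    shown.push(pic.index);
  }
  return shown;
}

console.log(decodeAll(new ToyFrameThreadedDecoder(4), 6)); // [0, 1, 2, 3, 4, 5]
```

With a single frame thread the toy decoder degenerates to packet-in-picture-out again, which is the old assumption.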

I think I found my packet corruption bug. Currently ogv.js's file fetching uses the old XHR "binary string" hack in order to do progressive downloads (since you can slice out parts of the string during progress events).

At a particular byte position we do an XHR for a chunk that starts with... 0xFE 0xFF, which is interpreted as a UTF-16 BOM, overriding the x-user-defined charset.

The chunk gets turned from 1 megabyte into 0.5 megabyte with every other byte copied.
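
A small simulation of why the chunk halves. These functions are illustrative only, not ogv.js code and not a real browser decoder: they model x-user-defined as one character per byte, and a sniffed 0xFE 0xFF BOM as switching to big-endian UTF-16 (two bytes per character, BOM dropped). Surrogate handling and a trailing odd byte are ignored here.

```typescript
// Model of how the response body becomes a "binary string".
function decodeAsBinaryString(bytes: Uint8Array): string {
  let out = "";
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    // BOM wins over the declared charset: two bytes per code unit, BOM removed.
    for (let i = 2; i + 1 < bytes.length; i += 2) {
      out += String.fromCharCode((bytes[i] << 8) | bytes[i + 1]);
    }
    return out;
  }
  // x-user-defined: ASCII maps to itself, 0x80..0xFF map to U+F780..U+F7FF.
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    out += String.fromCharCode(b < 0x80 ? b : 0xf700 + b);
  }
  return out;
}

// The usual other half of the hack: keep the low byte of each character.
function binaryStringToBytes(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    out[i] = s.charCodeAt(i) & 0xff;
  }
  return out;
}

const ok = new Uint8Array([0x11, 0x22, 0x80, 0xff]);
console.log(binaryStringToBytes(decodeAsBinaryString(ok))); // round-trips: 0x11 0x22 0x80 0xff

const unlucky = new Uint8Array([0xfe, 0xff, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66]);
console.log(binaryStringToBytes(decodeAsBinaryString(unlucky))); // 0x22 0x44 0x66: every other byte
```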


Now that that bug is out of the way I can go back to the bug where a particular packet is corrupted and kills the decoder. Yayyyyyy

Had an out-of-bounds memory access bug in the emscripten-compiled AV1 decoder that I worried was an optimizer bug.

Turned out to be an unsafe optimizer option that I had explicitly enabled to shave a few bytes off the generated WebAssembly... it had just never broken anything on my other decoders.


Looks like AWS charges about $16/day for 16-core "c5.4xlarge" instances in US-Oregon region, or $8/day for the 8-core version. This is pretty good for rare usage spikes, but would add up if used frequently.
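
Back-of-envelope version of "fine for spikes, adds up if frequent", using only the approximate daily rates quoted above. The function name and the 3-day and 30-day durations are just picked for illustration.

```typescript
// Approximate daily rates from the post, in USD. Not an official price list.
const DAILY_RATE_USD = {
  "16-core": 16,
  "8-core": 8,
} as const;

function estimateCostUsd(dailyRateUsd: number, days: number): number {
  if (days < 0) {
    throw new RangeError("days must be non-negative");
  }
  return dailyRateUsd * days;
}

// A one-off 3-day batch versus leaving the big instance on for a 30-day month.
console.log(estimateCostUsd(DAILY_RATE_USD["16-core"], 3)); // 48
console.log(estimateCostUsd(DAILY_RATE_USD["16-core"], 30)); // 480
```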

That's with a reasonably current Xeon with AVX512, which should help a lot with video encoding.

I also noticed AWS has ARM servers available now! This would not be wise for video encoding. ;)

(mostly looking for servers in US west coast region to minimize overhead copying files in and out)

Any recommendations for services providing elastic-style cloud servers (billed hourly or daily) that can have many fast cores for CPU-intensive work (video encoding, large compiles)?

I've used AWS before and found it worked but the management interface is confusing if you're not steeped in it. Curious if folks have a favored alternative!

I estimate my video encodings would run at least 5x faster on a more modern workstation (8 -> 16 cores, 2.26 -> 4.3 GHz clock, and SSE -> AVX)...
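
Where the 5x comes from, roughly. This assumes throughput scales linearly with core count and clock speed, which is optimistic, and treats the SSE -> AVX gain as a free parameter since I haven't measured it.

```typescript
// Naive speedup model: cores ratio * clock ratio * SIMD factor.
// Linear scaling is an assumption; real encoders rarely scale this cleanly.
function speedupEstimate(
  oldCores: number,
  newCores: number,
  oldGhz: number,
  newGhz: number,
  simdFactor: number = 1
): number {
  return (newCores / oldCores) * (newGhz / oldGhz) * simdFactor;
}

// Cores and clock alone: 2 * (4.3 / 2.26)
const withoutSimd = speedupEstimate(8, 16, 2.26, 4.3);
console.log(withoutSimd.toFixed(2)); // "3.81"

// SIMD gain needed on top of that to reach 5x overall.
const simdNeeded = 5 / withoutSimd;
console.log(simdNeeded.toFixed(2)); // "1.31"
```

So "at least 5x" only needs the wider vector units to buy about 30% on top of cores and clock.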

I'm not in a hurry though, I can wait 3 or 4 days for this batch to finish. ;)

And I probably shouldn't spend Valentine's Day trying to convince my wife I need a $3500 workstation upgrade when I can spend like $50 for a one-off cloud server for a few days to batch-run more files. ;)
