If I pre-chunk input into groups of N chunks, where N >= the number of logical CPU cores, this seems to make sense, and it simplifies the data deps a lot to run filtering for all N chunks before deflating any of them, rather than trying to confirm that another thread has finished its part of the work...

Still trying to grok how to work this into something like Rayon for Rust; running over lists of chunks sounds like it would work.
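
Roughly the shape I have in mind, as a sketch only: filter_chunk and deflate_chunk are made-up placeholders for the real per-chunk steps, and I haven't checked that this is the best way to drive Rayon.

```rust
use rayon::prelude::*;

// Placeholder per-chunk steps; names and bodies are stand-ins, not a real encoder.
fn filter_chunk(raw: &[u8]) -> Vec<u8> {
    raw.to_vec() // pretend we chose and applied a filter here
}

fn deflate_chunk(filtered: &[u8]) -> Vec<u8> {
    filtered.to_vec() // pretend we compressed the filtered bytes here
}

/// Compress one pre-chunked group: filter all N chunks in parallel,
/// then deflate all N in parallel. collect() keeps chunk order.
fn compress_group(chunks: &[Vec<u8>]) -> Vec<Vec<u8>> {
    let filtered: Vec<Vec<u8>> = chunks
        .par_iter()
        .map(|raw| filter_chunk(raw))
        .collect();

    filtered
        .par_iter()
        .map(|f| deflate_chunk(f))
        .collect()
}
```

The streaming side would then just hand each incoming batch of N chunks to compress_group and write the results out in order.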

I worry this will become unbalanced if system load isn't even, though. Maybe it makes more sense to model it as work-stealing from two queues (one of chunks to filter, one of chunks to deflate), but that probably all gets more complicated if I want to keep it balanced while streaming input and output. Have to read up more on the available interfaces in Rayon and see whether I need to implement something myself. Anyway, it's late and it's Friday so -- more later! Blog post sometime tomorrow with updates.
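
One thing I want to check once I'm in the Rayon docs: its thread pool is already work-stealing, so maybe I don't need two explicit queues at all, and each filter task can just spawn its deflate task into the same scope so idle threads pick it up. Very rough sketch, with placeholder filter/deflate functions and a Mutex to collect output; doesn't address streaming yet:

```rust
use std::sync::Mutex;

// Placeholder per-chunk steps; stand-ins for the real filter and deflate work.
fn filter_chunk(raw: &[u8]) -> Vec<u8> {
    raw.to_vec()
}

fn deflate_chunk(filtered: &[u8]) -> Vec<u8> {
    filtered.to_vec()
}

fn compress_all(chunks: Vec<Vec<u8>>) -> Vec<(usize, Vec<u8>)> {
    let out = Mutex::new(Vec::new());
    rayon::scope(|s| {
        for (i, raw) in chunks.into_iter().enumerate() {
            let out = &out;
            s.spawn(move |s| {
                let filtered = filter_chunk(&raw);
                // The deflate step becomes its own stealable task, so an idle
                // thread can grab it even when system load is uneven.
                s.spawn(move |_| {
                    out.lock().unwrap().push((i, deflate_chunk(&filtered)));
                });
            });
        }
    });
    // Tasks finish in whatever order the pool ran them; restore chunk order.
    let mut results = out.into_inner().unwrap();
    results.sort_by_key(|&(i, _)| i);
    results
}
```

No idea yet whether that beats explicit queues for the streaming case; that's part of the reading-up.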

OK, just one more. ;) I think this works more cleanly for streaming and uneven load, and still feels conceptually clean: enqueue jobs onto a single work queue, and as long as they get grabbed in order, the data dependencies stay correct. But then there's more state to manage, and I still have to grok the Rust-y way to do that.
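
A toy version of that single-queue model, just to make it concrete for myself; crossbeam-channel is standing in for whatever Rayon or a hand-rolled pool would actually provide, and the Job enum, placeholder filter/deflate steps, and Quit-based shutdown are all invented for the sketch:

```rust
use crossbeam_channel::unbounded;
use std::thread;

enum Job {
    Filter(usize, Vec<u8>),  // (chunk index, raw data)
    Deflate(usize, Vec<u8>), // (chunk index, filtered data)
    Quit,
}

fn main() {
    let (job_tx, job_rx) = unbounded::<Job>();
    let (out_tx, out_rx) = unbounded::<(usize, Vec<u8>)>();
    let n_workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(4);

    let workers: Vec<_> = (0..n_workers)
        .map(|_| {
            let jobs = job_rx.clone();
            let requeue = job_tx.clone();
            let out = out_tx.clone();
            thread::spawn(move || {
                // Workers grab jobs in FIFO order; finishing a Filter enqueues
                // the matching Deflate, so the dependency lives in the queue.
                for job in jobs.iter() {
                    match job {
                        Job::Filter(i, raw) => {
                            let filtered = raw; // placeholder filter step
                            let _ = requeue.send(Job::Deflate(i, filtered));
                        }
                        Job::Deflate(i, filtered) => {
                            let compressed = filtered; // placeholder deflate step
                            let _ = out.send((i, compressed));
                        }
                        Job::Quit => break,
                    }
                }
            })
        })
        .collect();

    // As input streams in, enqueue Filter jobs in chunk order.
    let n_chunks = 8;
    for i in 0..n_chunks {
        job_tx.send(Job::Filter(i, vec![0u8; 1024])).unwrap();
    }

    // Collect results; a real encoder would reorder by index while streaming out.
    let mut done = Vec::new();
    for _ in 0..n_chunks {
        done.push(out_rx.recv().unwrap());
    }
    done.sort_by_key(|&(i, _)| i);

    // Everything is through both stages; tell each worker to stop and wind down.
    for _ in 0..n_workers {
        job_tx.send(Job::Quit).unwrap();
    }
    for w in workers {
        let _ = w.join();
    }
    println!("deflated {} chunks", done.len());
}
```

The ordering and the filter-then-deflate dependency both fall out of FIFO enqueue order here; the "more state stuff" is exactly what this sketch dodges by copying buffers around and reordering everything at the end.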
