Pinned post

Software "engineer" by trade and study. enthusiast, loves . Also looking to pivot away from pure software, and into , or . Or in any other way contribute something meaningful back to this world. is my OS. I may have made a for it once.

Yay, so with shutting down, I wonder where - and if - to migrate to. Or whether to bother self-hosting.

Meanwhile I decided to end my self-imposed break from the Orange Site, and I feel so happy. I missed that place.

Oh my, the truly won.

"I wonder if this also means future astronauts will have wifi when launching to moon/mars. Imagine watching a movie while taking off to somewhere distant"

Yes. Imagine a world in which people no longer realize that local storage is a thing, and that streaming is not a requirement for playing media on a computer.

I suddenly feel very, very old.

Oh look! Found one place where has a button!

Perfect to combat anti-pizza propaganda spread by "healthy" food crazies.

But I suppose the biggest realization I had is that... isn't fully satisfying. Not anymore.

I mean this in the sense that it's like eating wholesome dinners optimized for minimum prep / eating on the go. It's nourishing, energizing, and infinitely better than what little we had before...

but then those bits of reminded me what it is to have a wholesome meal in a good restaurant. Equivalent nutrition, takes much more time... but you're also more present. Deeper experience.

BTW, a major plot being split into subplots is something I think only did well enough.

As mentioned, I just watched Damar's rebellion story via a sequence of auto-playing YouTube clips (kudos to whoever made it so they get recommended in the right order), and... it made me realize for the first time that isn't rewatchable - it's piece-wise rewatchable! To a degree meaningfully greater than the other shows I mentioned. This is nice.

By "such shows" I mean... and / (and perhaps late and ).

Ensemble cast, semi-serialized, big on world-building. A show that grows on you over time, makes you feel "at home". Capable of handling several overarching plots in parallel, including major plots splitting into subplots. Pacing them, not rushing (like the new shows do).

is the bee's knees; Thursdays have been the highlight of my week for the past 2 months. It's by far the best on air since ended.

That said, I watched a clip on YouTube, and then let it auto-play another, then another, and two hours later I'd pretty much rewatched the whole of Damar's rebellion subplot from the Dominion War, and I must say...

By the Prophets, I so miss Tacky Cardassian Fascist Eyesore Nine. Why aren't such shows made anymore?

Is there a name for the situation where some hypothesis or phenomenon gets named after a person who believed/advocated for the exact opposite?


Ortega hypothesis, i.e. that science is mostly an accumulation of small contributions by mediocre individuals, from which "geniuses" draw everything -- named after a guy who claimed it's the geniuses who move science and create a framework for the mediocre masses to make secondary contributions.


What's going on is, in the absence of a conflicting signal from the other eye (which you kept closed for this), the brain comes to the obvious conclusion: since it didn't tell the other eyeball to rotate, and yet everything in the visual field is shifting, the only logical explanation must be that the universe is rotating around you!

(If you try this again but with the other eye open, you can almost feel your brain fighting its inner Sherlock for a moment on this.)

So now I know how to make you experience that difference - feel the universe moving.

The trick is simple. Step one, you're . Look around with your eyes, but without moving your head.

Ok, step two, you're now . Close one eye, and keep it closed. Push at the other eye with your finger, at an angle, to force the eyeball to rotate slightly. E.g. press below the lower lid to make the eyeball rotate up or down, and...

Experience the whole universe rotating around.

I finally found an easy way of explaining the difference between and APIs for drawing and transforms - as of 15 years ago.

The joke back then was, when you arrange geometry and your viewing vantage, in Direct3D you do it directly - "this goes to (10,0,-20) relative to me, and then 10 degrees to the left".

In OpenGL, instead, you sit still and move the entire universe, so you always drop stuff at origin.

The result is the same, but it feels like it *should feel different*...
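Since the math underneath is identical either way, the equivalence is easy to sketch with plain matrices. This is an illustrative numpy sketch, not actual OpenGL or Direct3D calls: rotating your vantage point by some transform is the same as rotating the entire universe by its inverse.

```python
import numpy as np

def rot_y(deg):
    """4x4 homogeneous rotation about the Y (up) axis."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    m = np.eye(4)
    m[0, 0], m[0, 2] = c, s
    m[2, 0], m[2, 2] = -s, c
    return m

# An object sitting somewhere in the world.
p_world = np.array([10.0, 0.0, -20.0, 1.0])

# "Move yourself" mindset: rotate your vantage point 10 degrees, then ask
# where the object lands in your view (apply the inverse camera matrix).
camera = rot_y(10)
in_view_a = np.linalg.inv(camera) @ p_world

# "Move the universe" mindset: you sit still at the origin and rotate the
# whole world 10 degrees the other way.
in_view_b = rot_y(-10) @ p_world

# Same result either way -- only the mental model differs.
assert np.allclose(in_view_a, in_view_b)
```

For a pure rotation the inverse is just the rotation by the negative angle, which is why the two styles collapse into the same arithmetic.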

From the above it follows that you might not have any redundancy in your weights at all! When you casually trim them down to a few most significant bits, you're not removing noise - you're throwing away the high-frequency components of everything your network learned.

As a result, the generated image has its details all wrong in a way that seems random, but is actually conceptually similar to what happens to an image after being run through a low-pass filter in the frequency domain.

The point of the long-winded explanation is: deep learning models are fundamentally compressors, and they're unsophisticated ones.

This means that getting a usable image generation model down to 4GB of weights, while impressive, is quite far from what could be achieved if the training generated programs instead of weights.

Conversely, being far from the theoretical limit doesn't mean you have tons of redundancy in your weights - rather, the simplicity of the model makes your encoding suboptimal.

So e.g. if your decompressor just implements RLE, there's only so much you can achieve on the input side. If you allow for more sophisticated algorithms, you can squeeze its input some more. At the limit, if your decompressor is just a simple interpreter for a Turing-complete language, then you could achieve much better compression ratios - you're free to exploit any pattern you're able to identify in the thing being compressed.

NN models all seem to have rather trivial decompressors.
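The RLE case above fits in a few lines. A minimal illustrative sketch (the helper names are made up):

```python
from itertools import groupby

def rle_encode(data: str) -> list[tuple[str, int]]:
    """Collapse runs of repeated characters -- the only pattern RLE knows."""
    return [(ch, len(list(run))) for ch, run in groupby(data)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """The fixed, trivial 'decompressor' side of RLE."""
    return "".join(ch * n for ch, n in pairs)

msg = "aaaabbbcccd"
packed = rle_encode(msg)
assert packed == [("a", 4), ("b", 3), ("c", 3), ("d", 1)]
assert rle_decode(packed) == msg

# A repeating *sequence* is a pattern RLE can't see, so it gains nothing:
assert rle_encode("abcabcabc") == [(c, 1) for c in "abcabcabc"]
```

The last line is the point: the decompressor's simplicity puts a hard ceiling on what the input side can exploit.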

The art of compression is to find as short a program as possible.

Now, the current deep learning models seem to me to be similar to traditional file compression, in the sense that the compressed program consists of a *fixed* decompressor and some input data for it. It's the latter that we optimize, and it's what goes into "archive files" (.zip, et al.).

Obviously (for people), code = data, and so it's the fixed decompressor that determines how much compression you can do.
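The fixed-decompressor split is visible in everyday tools. For instance, Python's zlib module exposes DEFLATE, where the inflate side is shared by every reader and only its input gets optimized:

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog. " * 200

# The decompressor (DEFLATE's inflate) is fixed and universal; all the
# cleverness goes into producing as small an input *for it* as possible.
packed = zlib.compress(data, level=9)

assert zlib.decompress(packed) == data
assert len(packed) < len(data) // 10  # highly repetitive input squeezes well
```

Note that the decompressor's size never shows up in the archive - it's amortized across everything ever compressed with it, which is exactly why it can stay fixed.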

What I'm getting at is that there's no free lunch. Information must be encoded somewhere.

Understanding data is fundamentally the same thing as compression (a trivial, non-obvious, and very important insight).

In a general sense, compressing some data means finding *a program* that, when executed, will output the original data. Importantly, that program and the program that executes it must together be smaller than the original data - otherwise you didn't compress anything.
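A toy illustration of the program view, with Python's own interpreter playing the role of the decompressor (the generator string is a made-up example):

```python
import io
from contextlib import redirect_stdout

# "Compression as a program": a short generator whose output is the
# original data, run by a shared interpreter (the decompressor).
original = "0123456789" * 1000
program = "print('0123456789' * 1000, end='')"

buf = io.StringIO()
with redirect_stdout(buf):
    exec(program)

assert buf.getvalue() == original
assert len(program) < len(original)  # ~35 characters standing in for 10,000
```

Because the interpreter is Turing-complete, *any* pattern you can describe in code is exploitable - which is the limit case the RLE comparison above is pointing at.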

What's more insidious, IIRC some of that weight trimming is done during training. What if by doing that they're actually preventing the models from learning those tiny details in the first place?

I suspect this is what's happening because of the absurd "compression ratios" of current models. Like, we're down to 4GB of weights for useful image generation model.

Theoretically, we could do even better. But AFAIK those models are not structurally Turing-complete, so I don't think *they* can.

Again, I know shit all about how these models are structured, but I do often hear that a common practice in deploying those models involves taking their weights and... cutting out high frequency bits.

Like, I see people saying you should take the 32- or 64-bit floating point weights and downsample them into 8 bits or less. And that it's fine, because the model still seems to work after you trim 50-90% of its data.

But maybe it's not fine. Maybe that's why the model gets tiny details wrong.
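As a rough sketch of what such trimming does, here's naive linear 8-bit quantization in numpy - an assumption standing in for whatever scheme a given deployment actually uses:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(1000).astype(np.float32)

# Naive linear quantization: map the weight range onto 256 levels (8 bits).
lo, hi = float(weights.min()), float(weights.max())
step = (hi - lo) / 255
q = np.round((weights - lo) / step).astype(np.uint8)

# Dequantize and look at what got thrown away.
restored = q.astype(np.float32) * step + lo
err = np.abs(weights - restored)

# Every weight is now off by up to half a quantization step -- those are
# exactly the low-order bits being discarded.
assert err.max() <= step / 2 + 1e-6
assert err.max() > 0
```

Whether that discarded residue is noise or signal is exactly the question the post is raising.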

I feel something similar is going on in the current text2img generation models. Internally, they work on visual data using some idiosyncratic representation - one that's conceptually similar to a Fourier transform - and then transform the result to the spatial domain.

I believe this is the case because the errors in generated images have this characteristic look of missing high-frequency data. Here, in the sense of high-frequency knowledge, i.e. knowledge of how small details go together.

In particular, the image you low-pass-filtered in the frequency domain will have details missing (or smoothed out) and softened edges across the entire image.

That is, you've deleted a small piece of information in a specific corner of the image's frequency representation, and the result is nonobvious changes across the entire image.

That's because in frequency form, any single bucket/pixel actually maps to *every single* pixel in the original image. It's pretty mind blowing to see in action.
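You can watch this happen with numpy's FFT: zero out one frequency bucket of a random image, and nearly every pixel shifts. A toy sketch - real codecs and models are more involved:

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((32, 32))

# Go to the frequency domain, delete a single bucket, and come back.
F = np.fft.fft2(img)
F[5, 7] = 0  # one "corner" of the frequency representation
modified = np.real(np.fft.ifft2(F))

# That one deleted bucket perturbs essentially every pixel, because each
# frequency bucket is a wave spread across the whole image.
changed = np.abs(img - modified) > 1e-12
print(f"{changed.mean():.0%} of pixels changed")
```

The deleted bucket corresponds to one complex exponential spanning the full 32x32 grid, so its removal is smeared over everything rather than localized.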

Mastodon for Tech Folks is shutting down by the end of 2022. Please migrate your data immediately. This Mastodon instance is for people interested in technology. Discussions aren't limited to technology, because tech folks shouldn't be limited to technology either!