
I might stop doing my semi-regular rants along the lines of "tooling used by the software industry is fundamentally broken on a philosophical level" / "organizing code in plaintext files is incredibly, ridiculously wasteful".

By accident, I found this:

feifan.blog/posts/the-database

...which covers 90% of the things I thought and ranted about over the last ~5 years, but better.

Seriously, go and read it.

(Also: news.ycombinator.com/item?id=2)


And this seems to be a proper attempt to make a programming environment that doesn't suck: gtoolkit.com/

It's Smalltalk, because of course it is.

Gonna push it to the front of my "to play with" list. I can live with learning me some Smalltalk, or any other language for that matter, if it lets me work in an environment that doesn't make me want to stab myself in the eyes with a dull spoon, every single day.

I'm gonna say something blasphemous here: in the context of these fundamental issues, Emacs also sucks hard. So does […] in general. Yes, they're immensely more ergonomic, malleable and powerful than their more mainstream competition, but they're still hindered by the same fundamental issue: they have the nature of writing code in plaintext files deeply embedded in their DNA.

(And unfortunately, I don't see a way for Emacs to improve here, as long as text buffers are its fundamental concept.)

This is turning into an unexpected thread 🧵, sorry. But there's one idea I couldn't put properly into words until now:

The problem with our tooling isn't plaintext representation per se. The problem is that it's simultaneously:

1) the ultimate, canonical representation of a program - the "single source of truth",

2) the representation we work on directly when creating that program, and

3) usually the *only* representation we work on.

The result is not powerful enough to manage complexity efficiently.

Here's why this is a problem: it makes us commit up front to a single view of a program, emphasizing some concepts, while making different - and often equally important - concepts implicit.

Because we have only one canonical representation of a program, it can support only a single way of understanding it.

The art of writing readable and maintainable code is necessary because of this: we can't express every concept properly at the same time, so we have to pick the ones we do, and let the rest be smeared.

The term "cross-cutting concerns" is used in our industry as an admission of defeat. Data transformation, execution order, security, logging, "happy path" vs. "failure path" - they're all equally valid concerns to focus on, but the "single plaintext representation" problem makes us commit to only *one* of those concerns, up front, and route the rest around it.

This is why I have to keep writing bullshit like:

fn foo(Data&, Logger&) -> Either<Result, Error>

because I have to commit to a single definition.

I have to spend time making bullshit decisions like:

- Exceptions or expected type?
- Pass logger as argument or use a global singleton?
- What to log here and how to get all the data I need for it?
- How many fake-monads I can stack in the return value before the C++ compiler tells me to maybe use Haskell instead?

And I have to deal with decisions made by others when reading or modifying their code - *always* deal, with *all of them*.

Even if the only thing I care about ATM is adding a log statement.

The way to look at it is: every one of those "cross-cutting concerns" is a dimension. Like in geometry.

Error handling. Parallelism. Traceability. Transformation pipelines. Being a part of an architectural concept A. And of a concept B. That's 6 dimensions already.

And the single plaintext codebase - that's just a one-dimensional medium. You can map 6D points to 1D - hell, that's effectively what modern software development is - but you do that by focusing on one arbitrary dimension, and mixing the rest.

The solution would be to allow the programmer to view and *edit* the code in multiple different representations - textual, graphical, tabular, whatever fits best. All those representations are just different ways of viewing the same underlying artifact - the program source code.

Of course, there must ultimately be a single, complete definition of the program stored on the computer. It may or may not be plaintext. But as programmers, we shouldn't care about it or look at it for 99% of our time.

The first step to a better environment is thus dropping the requirement for programmers to work with the underlying "single source of truth" representation.

It's not unprecedented - we've already done this for assembly/bytecode. We can do it again, at a higher level.

The second step is probably a new programming "language" - one that isn't fundamentally a linear, human-readable plaintext, but something multi-dimensional. Yes, […]. So a database.

Third step - taking responsibility for representations.

That last step is a combination of the DIY / Lisp / craftsman philosophy of making your own tools (and the tools to make tools), and a reification of […].

When I define a domain concept, my tools must make it easy for me to express it in code, but also give me the ability to look at the code through the lens of that concept, without dragging in irrelevant details, and that's *especially* when a single piece of code factors into multiple different concepts simultaneously.

This means, the tooling must let me not just encode the concept, but also to define tools/representations for efficiently working with that concept. At every abstraction level - whether it's a domain concept or implementation detail.

Like, imagine looking at "WidgetController", and your tooling telling you it's simultaneously:
- a Widget (domain type)
- a bridge (design pattern)
- a piece of a state machine
- a queue

And you get dedicated tooling/visualisations for each, some of which you added yourself.
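A minimal sketch of what "one code object, several lenses" could look like. Everything here is hypothetical: the `CodeObject` record, the concept names, the second `OrderPipeline` entry and the `lens` helper are made up for illustration and do not come from any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class CodeObject:
    name: str
    # concept name -> the details that matter when looking through that concept
    concepts: dict = field(default_factory=dict)


# Hypothetical registry; a real environment would derive this from the program model.
REGISTRY = [
    CodeObject("WidgetController", {
        "domain type": {"is_a": "Widget"},
        "design pattern": {"role": "bridge"},
        "state machine": {"state": "Editing", "transitions": ["Saved", "Cancelled"]},
        "queue": {"element": "WidgetEvent"},
    }),
    CodeObject("OrderPipeline", {
        "queue": {"element": "Order"},
    }),
]


def lens(concept):
    """Return only the objects taking part in `concept`, with only that concept's details."""
    return {obj.name: obj.concepts[concept] for obj in REGISTRY if concept in obj.concepts}


queues = lens("queue")
machines = lens("state machine")
print(queues)
print(machines)
```

Each lens hides everything unrelated to the concept being inspected, which is the property the plaintext view can't offer.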

Ok folks, you know what? I'm rambling. So enough for tonight.

But I feel like turning this into a proper (and less ranty) article. If anyone would be interested in reading (or reviewing a draft of it), please let me know.

Addendum: some other bullshit I waste time on pretty much every day when coding professionally:

- Lots of small functions vs. fewer fat ones. The great readability drama that exists only because we have to commit, up front, to one or the other, even though different tasks call for viewing the exact same code at different levels of granularity.

- Where do I put that constant/function/class? How to organize code into files? This is pure, unadulterated incidental complexity. We should *not* care about this.

🆕 A little addendum for the whole subtree under the parent I'm replying to.

I just got reminded about aspect-oriented programming (AoP) and did some refresher reading. Turns out the underlying idea behind AoP is to address precisely the problems I've been ranting about here.

AFAIK AoP wasn't well received by the community at large. I'm going to study in more detail the arguments brought up against it, but so far my vague impression is that they're both correct and missing the point.

They're correct in that AoP is non-local, "spooky action at a distance", making codebases harder to comprehend and debug without special tooling assistance.

They're missing the point in that the real problem, IMO, is the plaintext, file-oriented, tree-structured form in which we write code - and for which we design our tools.

"Cross-cutting concerns" are, by definition, cross-cutting. Non-locality is an artifact of code format & tooling. When you turn a graph problem into a tree problem, you lose some edges.
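To make the "you lose some edges" point concrete, here is a small sketch. The dependency graph is invented for illustration; the breadth-first walk keeps one parent per node, the way a file or class hierarchy does, and whatever doesn't fit is reported as lost.

```python
from collections import deque

# Hypothetical dependency graph: each key "touches" the listed nodes.
graph = {
    "app": ["orders", "billing"],
    "orders": ["logging", "db"],
    "billing": ["logging", "db"],
    "logging": [],
    "db": [],
}


def spanning_tree(graph, root):
    """Breadth-first walk that keeps a single parent per node."""
    tree_edges, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        for child in graph[node]:
            if child not in seen:
                seen.add(child)
                tree_edges.append((node, child))
                queue.append(child)
    return tree_edges


all_edges = [(src, dst) for src, dsts in graph.items() for dst in dsts]
tree_edges = spanning_tree(graph, "app")
# Relationships that exist in the graph but have no place in the tree.
lost_edges = [edge for edge in all_edges if edge not in tree_edges]
print(lost_edges)
```

In this toy case, `logging` and `db` end up filed under `orders`, and the fact that `billing` uses them too is exactly the kind of relationship that becomes "cross-cutting".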

Another thing. The other day I stumbled on this:

hyperfiddle.notion.site/Reacti

This is from the Clojure ecosystem - the Hyperfiddle people announcing Photon - a "Reactive Clojure/Script" lib/macro that promises to erase the client/server split from your code.

It's worth reading the article to see what they're talking about.

In short, the Photon macro lets you write functions that arbitrarily mix code for the server and code for the client. "Arbitrarily" here includes control structures.

I'm attaching a screenshot of the code example from that article.

Notice the coloring. Purple parts run in the browser; red parts run on the server. Photon "magically" handles ensuring the two runtimes stay in sync and execute in lockstep, with as little overhead as possible. Hell, it's quite likely that their overhead is *lower* than typical server/client communication people roll by hand.

Photon is relevant to this thread in two ways.

One, their trick is to compile your code to an explicit DAG.

This is a good example of the insight about code. Photon compiles your function to a DAG, and its runtime ensures the DAG stays in sync between the server and the browser - both in terms of how it's executing, and its very shape (this is both […] and […]-y code; the DAG will change dynamically).

Two, notice what this abstraction does: it eliminates the cross-cutting concern of dealing with server/client bookkeeping - the kind of bullshit code that's the majority of any codebase.
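A toy illustration of the "function as an explicit DAG" idea, in Python for brevity. This is not how Photon is implemented; the node names, the "client"/"server" tags and the `evaluate` helper are assumptions made up for this sketch. It only shows that once a function is a DAG with placement tags, the points where data must cross the network fall out mechanically.

```python
from graphlib import TopologicalSorter

# Each node: (where it runs, names of the nodes it depends on, function of those inputs).
dag = {
    "query": ("client", [], lambda: "wid"),
    "rows": ("server", ["query"], lambda q: [w for w in ["widget", "gadget"] if q in w]),
    "count": ("server", ["rows"], lambda rows: len(rows)),
    "label": ("client", ["rows", "count"], lambda rows, n: f"{n} match: {', '.join(rows)}"),
}


def evaluate(dag):
    """Run nodes in dependency order and record every edge that crosses client/server."""
    deps_only = {name: deps for name, (_, deps, _) in dag.items()}
    values, crossings = {}, []
    for name in TopologicalSorter(deps_only).static_order():
        site, deps, fn = dag[name]
        for dep in deps:
            if dag[dep][0] != site:
                crossings.append((dep, name))
        values[name] = fn(*(values[dep] for dep in deps))
    return values, crossings


values, crossings = evaluate(dag)
print(values["label"])
print(crossings)
```

The `crossings` list is the bookkeeping a programmer would otherwise write by hand as requests, responses and serialization code.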

@temporal you can send it my way, I'll try to take a moment to read it.

I feel the same way as you on many points. I think about people who work with data and how they invented the relational database to be able to view their data arranged in different logical ways because of course you need to have that. We programmers don't for some reason.

@temporal yes! let me know if you have managed to get this written down in the meantime.

@woozong So far I didn't manage to turn it into a proper article. I have too many things going on in my life right now, so I don't expect to be able to write about this properly earlier than ~3-4 months from now.

@temporal
no worries, it was a very interesting thread that clarified a nagging feeling I've been having for quite a while.
\m/

@woozong I'm happy my little rant was helpful :). It's definitely not over yet - I'm continuously thinking about this topic, and looking for more interesting references.

In fact, I'm about to add another one to this thread.

@temporal For over a decade, I've claimed that #information representation should only depend on the specific #retrieval situation and not on its storage situation. I usually think of #files and bits of my #KnowledgeManagement.
You have provided an interesting thread on the same point of view but for #sourcecode and #programming.
Thanks!

@publicvoit Your perspective is insightful and I didn't realize how it connects - I feel now it's a different perspective on the same thing.

Representation and storage are two orthogonal (modulo efficiency) concerns, and the former should be driven by what you want to actually do with the data. That includes not just retrieval, but also updating. I want to work with and on high-level concepts, not just refer to them as "reports" on the underlying data.

@publicvoit In the context of your writing, in particular on tags, making notes/todos anywhere in the system vs. my assertion that when coding, I shouldn't waste time deciding where in a file/filesystem a given function/class is supposed to be stored...

I find myself to be surprisingly resistant to placing tasks in random places. I feel more comfortable with a well-defined hierarchy. But then, I notice I waste a lot of time deciding "where should I put this item?". I'm inconsistent in this.

@publicvoit My current hypothesis is this:

I don't trust search. When searching, I keep having this feeling that results are not complete. That there may be something important the query is excluding.

Conversely, I find a canonical hierarchy reassuring - because I know I can just manually walk over it (or a relevant subtree of it), and either I find what I'm looking for, or I know for sure it doesn't exist in the system.

Searches are open-world, canonical hierarchy is closed-world.

@publicvoit Now the silly bit here is: half of my yesternight's rant was arguing in favor of interactions that are effectively open-world - querying and filtering and pivoting.

But then, thinking about applying the same approach to my todo list, I start to feel claustrophobic - having a thousand local views and no global view makes me feel I might be missing important information that just happens to not be covered by any of the queries.

Not sure how to reconcile it.

@temporal @publicvoit problem is categories aren't mutually exclusive, nor do they have canonical definitions or depth order. Information is a graph. Knowledge objects are multidimensional vectors. Even the animal kingdom taxonomy has loops/overlap and phenotype/genotype discrepancy.

@hobson @temporal Any strict #hierarchy is flawed. If you order things in one way, you're not ordering them in infinite/many minus one possible ways.

@hobson

I agree, and this is what my rants over the years have been fundamentally about.

The problem I described in this subthread is not about graphs and categories. It's about querying. And it's subjective. I find myself defaulting to working with the fundamental "storage-level" representation, because it's the only one I fully trust. Queries miss results - whether because of a bad search engine, or a wrong query. Storage representation is, by definition, complete.

@publicvoit

@hobson

Things that killed my trust in search are shitty search engines. Like search in Windows Explorer, which misses files I *know* I have and can manually find. Search in Slack, or Google Docs - they also have sometimes missed things for me in the past.

Or social media - Facebook, Twitter, etc. They're all eventually consistent, best-effort searches, making them pretty much useless: if you don't see something in results, it may be just because the search job gave up early.

@publicvoit

@hobson It's a similar story with programming tools, too. IDEs and language servers.

Like, I have clangd working over my work codebase, but I still frequently use (rip)grep for searching. That's because with clangd (and most IDEs I worked with), when I search for a code symbol / callers / callees and don't find what I expect, the most probable reason is... that the underlying engine failed to parse some code or is otherwise confused.

Grep, I trust. Because it walks the filesystem.

@publicvoit

@hobson

The irony here is that the main rant in this toot tree was that I'd like to use a million different representations of the same underlying data set (code base), without ever dealing with the canonical storage-level representation explicitly.

Meanwhile, in practice, I don't trust many tools that offer some kind of higher level of querying / classifying of data.

This is inconsistent, so I'm trying to figure it out.

@publicvoit

@temporal @publicvoit yea I was just telling a coworker recently that it's a shame they are too young to remember a world where you could trust search results. We had desktop search, better than grep/rip/find, or any file system tree I could organize. They got us addicted to search, and lazy about organizing information, then sneakily polluted search with misinformation, boiled the frog. Sublime Text ctrl-F & ctrl-P restored my hope for humanity.

@temporal I once worked on an important commercial project written in Smalltalk where very few of the developers knew how to rebuild the Smalltalk image and this needed doing regularly, from time to time. Just saying...

@underlap That's a problem with long-living images, and the reason why I habitually rebuild Lisp images I work on after any significant change.

But it's an unrelated issue - one of ephemeral modifications to the application not being recoverable from the source code.

What I'm after isn't (primarily, at this point) sculpting a running program - it's working on a program model, aka. "source code", just through better means than the code itself.

@temporal If you find that Holy Grail, you'll be rich. Many have tried and failed over decades.

@temporal don't even have to go ask why we don't do things differently? Even under the assumption of doing things the same way as now, i kinda feel i should have more ways to visualize the code..

I have some scripts that use graphviz to show class derivation and connections between modules but it's limited. (also class derivation doesn't say anything about what instances-of-classes are passed where)

tbh i used it mostly to absent-mindedly look at it in a dead moment...

@jasper I'm interested in exploring the topic of "why things are the way they are".

My current belief is that it's path dependence / short-term optimization: as an industry, we're dealing with increasing complexity by throwing more bodies at a problem. It works well enough to make everyone ridiculous amounts of money, so there's no pressure to make fundamental improvements. It's an inflation phase.

@jasper RE visualizations, I have some scripts of my own - but they're mostly limited by how much structured data I can get out of a codebase in the limited time I can afford to spend writing those scripts. Which isn't much, unfortunately - even the tools that build internal models of codebases are reluctant to expose them.

I mean, I could probably speed up my work 1.5x if clangd would expose its project cache for SQL querying (or equivalent).

I do make diagrams like these manually, too, when I'm figuring things out.

@jasper There's also a UX problem in here. It's visible even in your graphviz images: there's only so much data you can put in there before it gets incomprehensible. And I haven't found a good interactive graph exploration tool yet (those probably exist, but I don't know their names).

I also sometimes play with dumping "reports" into SQL(ite) databases, but again, tools I know for exploring a DB are lacking. Table view + query box isn't enough when the data represents a graph.
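A rough sketch of the "dump a report into SQLite and query it" workflow. It is an assumption-laden toy: it indexes a small Python snippet with the standard `ast` module rather than a C++ codebase via clangd (which, as noted above, doesn't expose its cache this way), and the `calls` table layout is invented for this example.

```python
import ast
import sqlite3

# Toy "codebase"; the function names echo the examples used later in the thread.
SOURCE = '''
def frobnify(item):
    return item * 2

def quuxify(datum):
    return datum + 1

def make_widget(stuff):
    return quuxify(frobnify(stuff))
'''

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE calls (caller TEXT, callee TEXT, line INTEGER)")

# Record one row per direct call to a plainly named function.
for func in ast.walk(ast.parse(SOURCE)):
    if isinstance(func, ast.FunctionDef):
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                db.execute(
                    "INSERT INTO calls VALUES (?, ?, ?)",
                    (func.name, node.func.id, node.lineno),
                )

callers_of_frobnify = [row[0] for row in db.execute(
    "SELECT caller FROM calls WHERE callee = ? ORDER BY caller", ("frobnify",))]
callees_of_make_widget = [row[0] for row in db.execute(
    "SELECT callee FROM calls WHERE caller = ? ORDER BY callee", ("make_widget",))]
print(callers_of_frobnify, callees_of_make_widget)
```

The table-plus-query-box limitation shows up immediately: following a call chain more than one hop deep already needs a recursive query or an external graph tool.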

@jasper There's *loads* of low-hanging fruit there for visualizing, exploring, querying and editing codebases. I have some ideas of my own. This guy has plenty of great ideas: emilprogviz.com/.

I'd code some of that up, but I lack both the time and the focus necessary to do it :/. So the best I have is a random amalgamation of elisp snippets grepping things (because I can't for the life of me figure out how to query through lsp-mode) and occasionally dumping some PlantUML.

@jasper Of course visualizing is one thing, editing is another. There's loads of possible improvement at every level.

Even working with plaintext code, structural editing (like provided by Paredit, Parinfer, Lispy, etc.) would be helpful for any programming language.

But then I wish I could e.g. view my class definitions, or function definitions, as tables. Imagine how much better it would be to, on the fly, generate a table of functions you care about - one that's editable and bidirectional to the code.

@jasper

Now if I had that table, and it was editable in Emacs, I could quickly apply my edits across the code objects, using bulk regex replaces, macros or whatnot - and have them applied to the codebase. I'm thinking here of a structural equivalent of wdired / woccur / wgrep.

Note that this is still technically the realm of plaintext, with a sufficiently powerful editor. But it's a matter of filtering the code to only the things you care about, and providing an efficient interface for working with it.
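A minimal sketch of the editable function table, in Python over Python source for brevity (not Emacs, and not C++). The helper names `function_table` and `apply_renames` are hypothetical. The table is derived from the code, a bulk regex edit is applied to one column, and the result is pushed back into definitions and call sites.

```python
import ast
import re

SOURCE = '''
def get_widget(id):
    return id

def get_gadget(id):
    return id

def main():
    return get_widget(1), get_gadget(2)
'''


def function_table(source):
    """One row per top-level function: (name, parameter names)."""
    return [(node.name, [arg.arg for arg in node.args.args])
            for node in ast.parse(source).body
            if isinstance(node, ast.FunctionDef)]


def apply_renames(source, renames):
    """Push edited names back into the code. Naive: renames every matching identifier."""
    class Rename(ast.NodeTransformer):
        def visit_FunctionDef(self, node):
            node.name = renames.get(node.name, node.name)
            self.generic_visit(node)
            return node

        def visit_Name(self, node):
            node.id = renames.get(node.id, node.id)
            return node

    return ast.unparse(Rename().visit(ast.parse(source)))


table = function_table(SOURCE)
# Bulk regex edit over the "name" column only, as one would do in an editable buffer.
renames = {name: re.sub(r"^get_", "fetch_", name) for name, _ in table}
new_source = apply_renames(SOURCE, renames)
new_table = function_table(new_source)
print(new_table)
```

Note that `ast.unparse` discards comments and original formatting, so a real tool would need a lossless syntax tree; the sketch only demonstrates the round trip from code to table and back.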

@jasper They say that typing efficiency is not the limiting factor in programming, but I disagree.

Maybe it's me not being neurotypical here, but at least for me, the efficiency & ergonomics of editing code limits the kind of changes I'm willing to make, and to an extent drives design/architecture choices.

More on this here: news.ycombinator.com/item?id=2

I feel my cognitive performance is limited by my editing speed, because it's the limit on the feedback loop between what's in my head and what's in the code.

@temporal like with a function, the concept of programming is that you have a specification and function name and use it.

If you change the specification/name you're gonna have to change every usage. Suppose an editor can help you go through the usages but that's it on that front, unless you change the concept?

Trying to keep LOC down is pretty important..

@jasper Right, but the thing is, changes to code are usually not arbitrary. There's usually a well-defined higher-level operation that I'd like to perform, but there's no easy way for current tooling to know it, or for me to teach it. So I have to implement it manually.

Some of that is covered in certain IDEs/languages under "automated refactoring", but it's not available everywhere, and I think the true utility of these tools is misunderstood. It's not about refactoring. It's about higher-level editing.

@jasper To use an example: imagine you have a module with a few functions exposed publicly, and many more as implementation details. Among those, you have:

MakeWidget(Stuff) -> Widget

that does lots of complex work, at some point calling:

Frobnify(Item) -> Datum

And then you realize Frobnify's implementation can fail, and that error needs to be propagated all the way to and through MakeWidget. So you want it to look like:

Frobnify(Item) -> Either<Datum, Error>.

Now you face lots of editing work.

@jasper This kind of stuff shows frequently in my daily work, and what kills me is, changes like these are trivial to explain. You want every function between point A and B to use Either<T, E> instead of T; there's a small set of modifications you'll need to do in each place. E.g.:

return Quuxify(Frobnify(sth));

turns into:

return Frobnify(sth).and_then(Quuxify);

etc.

What I wish is to be able to explain this to my editor, and then have "switch this function tree to/from Either<>" as a single operation.
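For readers who haven't used this style, here is a small runnable sketch of the "after" state, written in Python for brevity. The `Ok`/`Err` classes are a hand-rolled stand-in for an Either/expected type, not any particular library's API, and the bodies of `frobnify` and `quuxify` are invented.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Ok:
    value: Any

    def and_then(self, fn):
        # Success: feed the value to the next step, which returns Ok or Err itself.
        return fn(self.value)


@dataclass(frozen=True)
class Err:
    error: Any

    def and_then(self, fn):
        # Failure: skip the remaining steps and carry the error through unchanged.
        return self


def frobnify(item):
    if item < 0:
        return Err(f"cannot frobnify {item}")
    return Ok(item * 2)


def quuxify(datum):
    return Ok(datum + 1)


def make_widget(stuff):
    # Was: return quuxify(frobnify(stuff))
    return frobnify(stuff).and_then(quuxify)


print(make_widget(3))
print(make_widget(-1))
```

Every function on the path between `frobnify` and the outermost caller has to change its return type and its call style in the same mechanical way, which is what makes the edit tedious by hand and attractive as a single editor operation.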

@jasper Note that in this example, Either<> is not a language feature, it's a library. In some cases third-party, in some cases first-party.

And in my work, I do many more such operations that are easy to specify, but require lots of editing of actual code.

I'd like to have an easy way to teach my environment about those, on a case-by-case basis. Teach it how to recognize code that matches some custom concept, teach it a vocabulary for working with that concept directly, and have it do the editing for me.

@jasper This is just another way of reiterating the thrust of my original rant in this thread.

You can't express every concept in code, because the same code may simultaneously be a part of multiple concepts at different abstraction levels (e.g. this class is a queue, a state in a state machine, a part of the "shopping cart to purchase order" pipeline, etc.).

So most of (my) programming work is translating simple edits to those concepts into extensive edits to code, while trying to not break anything.

@temporal really concretely here, maybe try-catch is a solution?

Though i have read some bad things about it, don't see/remember them right now i guess..

@jasper I tried that too. Working through the expected/exceptions business was an important stepping stone in me developing these ideas.

Speaking concretely, some parts of our codebase use exceptions, other parts use an Either type (tl::expected). Different people on the team had different preferences.

I tried to do both; I even purposefully split a nontrivial component into two equivalent-ish parts, and developed one with exceptions, one with expected. Conclusion: they both suck about equally, in C++.

@jasper Either<>/Expected is the one that requires such silly large-scale code changes, because it takes over the return type. It's infectious in the same way async functions are in most languages that support them - see the famous article: journal.stuffwithstuff.com/201

... but also see:
patrickthebold.github.io/posts

which points out that Either<>, async and exceptions all have the same red/blue infectious qualities, for the same fundamental reason.

As for exceptions, they have bad ergonomics for different reasons.

@jasper The major problems with exceptions (in C++) are:

- They're not part of the function interface in any way;

- It cannot be statically determined whether you're handling all the exceptions you should be;

- They can't easily be batched and captured for later processing when you're iterating over a container with standard algorithms.

C++ as the language can express neither exceptions nor expected ergonomically, and their respective problems roughly cancel each other out.
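The batching point from the list above can be illustrated with a Python stand-in for the C++ situation (the same shape applies to mapping a throwing function over a container). `parse_port` and `capture` are hypothetical names invented for this sketch.

```python
def parse_port(text):
    """Exception-style API: raises ValueError on bad input."""
    port = int(text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def capture(fn):
    """Wrap `fn` so failures come back as ("err", exception) values instead of propagating."""
    def wrapped(arg):
        try:
            return ("ok", fn(arg))
        except ValueError as exc:
            return ("err", exc)
    return wrapped


inputs = ["80", "http", "443", "70000"]

# Plain map: the first bad element aborts the whole pass and earlier results are lost.
try:
    list(map(parse_port, inputs))
    aborted = False
except ValueError:
    aborted = True

# Captured as values: every element is processed, and errors can be handled later.
results = list(map(capture(parse_port), inputs))
ports = [value for tag, value in results if tag == "ok"]
errors = [str(value) for tag, value in results if tag == "err"]
print(aborted, ports, errors)
```

The wrapper is effectively a conversion from the exception style to the Either style at the call site, which is the kind of glue both camps end up writing.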
