We talk so much about Abstract Syntax Trees, but maybe we should talk about "abstract structure graphs"?
Say you have code like:
foo = 1;
foo += bar(foo);
In AST form, using Lisp notation:
'((:= "foo" 1)
(+= "foo" ("bar" "foo")))
In "ASG" form:
'((:= #1="foo" 1)
(+= #1# ("bar" #1#))
Where #1= and #1# mean all instances of #1# contain (via pointers) *the same* object as marked with #1=, not just an equal one.
Some pictures to clarify what I mean are attached.
1/?
Now I'm not a compiler guy, but I'm pretty sure the very next step after parsing some code into AST is turning it into "ASG" - whether represented as a graph structure directly, or by a combination of helper structures (e.g. symbol table) and dynamic state (e.g. "environments" carried by recursing functions). You need to do that in some way, to be able to correctly associate a thing (like variable, function) with all references to it in the AST.
I wonder, why not expose this abstraction to the user?
2/?
In a way, we already implicitly code in graphs, not in trees. Rules of most programming languages make it that when you write:
foo = 1; foo += bar(foo);
all three instances of "foo" are *meant to* refer to the same thing.
But then, for code transformation, we parse it into AST, and that "sameness" becomes implicit. The graph becomes unrolled into a tree, and it's hard to walk it. We need supplemental information to work with it.
But what if we let the transformer writers walk graphs instead?
3/?
The previous toot had a notation change on the diagram - function application is made explicit, and so is ordering of the edges (to compensate for PlantUML/Graphviz "optimizing" it).
But operators are functions too, so how about the same, but with all function applications explicit?
Attached is a picture of a proper graph form of the code.
I wonder, is that a more useful form to work with than AST?
I'm likely reinventing a branch of #compsci here; pointers to existing body of work appreciated.
5/?
@temporal I spent some time looking at the graph and I think can't help but think it doesn't contain enough information to recreate the original structure. Either that or I don't understand the notation.
The Lisp form is very clear though.
@loke It should contain all information of the original Lisp form. If you start at the top and descend recursively (against the direction of arrows; I drew them in the "data flow" order), depth-first, but ensuring that at each level you go through alpha-arrow before you go through beta-arrow - you should be able to reconstruct my final Lisp form exactly.
The "edge ordering" thing is probably not typical for graphs, I don't know the name of this.
@ColinTheMathmo, since I have you in this thread - do you know if there exists, and/or could you help me find, a branch of mathematics that provides a language for higher-level transformations of graphs?
E.g. I have two directed graphs (or multigraphs) and want to join them together, based on some rules of how edges are formed.
Example use case here: https://mastodon.technology/@temporal/107323429225039113.
There must be a relevant body of vocabulary for this. I took a look at "graph algebra" on Wiki, but only got confused.
@temporal My knowledge is a little (well, a lot) out-of-date, but to my knowledge there is no such general language. AFAIK, when mathematicians want to do something like "Prepend digraph G to digraph H" they simply write the operation. So P(G,H) = (V,E) where V = V(G)uV(H), and E = E(G) u E(H) u D where D={(u,v) for u in V(G) and v in V(H)}
They might then invent a notation like D = V(G)->V(H), note somewhere what that means, then use it freely elsewhere.
This probably doesn't help.
@ColinTheMathmo Thanks, this gives me some perspective.
I just stumbled upon relevant material, that just happened to be on HN's frontpage today:
https://valand.dev/blog/post/dont-bring-a-tree-to-a-mesh-fight
Can't decide if the article is wise or full of confusion, but it touches on some of the same things this toot thread did.
https://news.ycombinator.com/item?id=29301957
Associated HN thread, which is full of the kind of information I am interested in. Struck gold here.
@temporal There's a lot going on in both the article and the thread ... I hope they are helping you, even if only giving you more to think about.
I'm not sure I can help further, but feel free to ask. Don't forget, if it's helpful, you can ask @Chartodon to chart this conversation.
Quick addendum: on the previous toot's diagram, the "function call" node is the only node repeating, but it's repeating nonetheless.
It feels like this one will be impossible to get rid of - it feels like a "structural operator", a subgraph demarker. But maybe it could be represented in some other way? A different way to draw this graph?
It's an open question for me now.
6/end.