We talk so much about Abstract Syntax Trees, but maybe we should talk about "abstract structure graphs"?
Say you have code like:
foo = 1;
foo += bar(foo);
In AST form, using Lisp notation:
'((:= "foo" 1)
(+= "foo" ("bar" "foo")))
In "ASG" form:
'((:= #1="foo" 1)
(+= #1# ("bar" #1#))
Where #1= and #1# mean all instances of #1# contain (via pointers) *the same* object as marked with #1=, not just an equal one.
Some pictures to clarify what I mean are attached.
Now I'm not a compiler guy, but I'm pretty sure the very next step after parsing some code into AST is turning it into "ASG" - whether represented as a graph structure directly, or by a combination of helper structures (e.g. symbol table) and dynamic state (e.g. "environments" carried by recursing functions). You need to do that in some way, to be able to correctly associate a thing (like variable, function) with all references to it in the AST.
I wonder, why not expose this abstraction to the user?
In a way, we already implicitly code in graphs, not in trees. Rules of most programming languages make it that when you write:
foo = 1; foo += bar(foo);
all three instances of "foo" are *meant to* refer to the same thing.
But then, for code transformation, we parse it into AST, and that "sameness" becomes implicit. The graph becomes unrolled into a tree, and it's hard to walk it. We need supplemental information to work with it.
But what if we let the transformer writers walk graphs instead?
Imagine a more complex case:
a = bar(foo);
b = bar(foo) + bar(baz);
((:= a (bar foo))
(:= b (+ (bar foo) (bar baz))))
What if that parsed, in memory, into a graph structure like:
((:= a #1=(#2=bar foo))
(:= b (+ #1# (#2# baz))))
Would it be easier or harder to work with it?
I honestly have no answer; it's just some pondering that I had on the back of my mind for the last year or so.
The previous toot had a notation change on the diagram - function application is made explicit, and so is ordering of the edges (to compensate for PlantUML/Graphviz "optimizing" it).
But operators are functions too, so how about the same, but with all function applications explicit?
Attached is a picture of a proper graph form of the code.
I wonder, is that a more useful form to work with than AST?
I'm likely reinventing a branch of #compsci here; pointers to existing body of work appreciated.
@loke It should contain all information of the original Lisp form. If you start at the top and descend recursively (against the direction of arrows; I drew them in the "data flow" order), depth-first, but ensuring that at each level you go through alpha-arrow before you go through beta-arrow - you should be able to reconstruct my final Lisp form exactly.
The "edge ordering" thing is probably not typical for graphs, I don't know the name of this.
This Mastodon instance is for people interested in technology. Discussions aren't limited to technology, because tech folks shouldn't be limited to technology either!