We talk so much about Abstract Syntax Trees, but maybe we should talk about "abstract structure graphs"?
Say you have code like:
foo = 1;
foo += bar(foo);
In AST form, using Lisp notation:
'((:= "foo" 1)
(+= "foo" ("bar" "foo")))
In "ASG" form:
'((:= #1="foo" 1)
(+= #1# ("bar" #1#))
Where #1= and #1# mean all instances of #1# contain (via pointers) *the same* object as marked with #1=, not just an equal one.
Some pictures to clarify what I mean are attached.
1/?
Now I'm not a compiler guy, but I'm pretty sure the very next step after parsing some code into AST is turning it into "ASG" - whether represented as a graph structure directly, or by a combination of helper structures (e.g. symbol table) and dynamic state (e.g. "environments" carried by recursing functions). You need to do that in some way, to be able to correctly associate a thing (like variable, function) with all references to it in the AST.
I wonder, why not expose this abstraction to the user?
2/?
In a way, we already implicitly code in graphs, not in trees. Rules of most programming languages make it that when you write:
foo = 1; foo += bar(foo);
all three instances of "foo" are *meant to* refer to the same thing.
But then, for code transformation, we parse it into AST, and that "sameness" becomes implicit. The graph becomes unrolled into a tree, and it's hard to walk it. We need supplemental information to work with it.
But what if we let the transformer writers walk graphs instead?
3/?
Imagine a more complex case:
a = bar(foo);
b = bar(foo) + bar(baz);
In pseudo-Lisp,
((:= a (bar foo))
(:= b (+ (bar foo) (bar baz))))
What if that parsed, in memory, into a graph structure like:
((:= a #1=(#2=bar foo))
(:= b (+ #1# (#2# baz))))
(picture attached)
Would it be easier or harder to work with it?
I honestly have no answer; it's just some pondering that I had on the back of my mind for the last year or so.
4/?
The previous toot had a notation change on the diagram - function application is made explicit, and so is ordering of the edges (to compensate for PlantUML/Graphviz "optimizing" it).
But operators are functions too, so how about the same, but with all function applications explicit?
Attached is a picture of a proper graph form of the code.
I wonder, is that a more useful form to work with than AST?
I'm likely reinventing a branch of #compsci here; pointers to existing body of work appreciated.
5/?
@temporal I spent some time looking at the graph and I think can't help but think it doesn't contain enough information to recreate the original structure. Either that or I don't understand the notation.
The Lisp form is very clear though.
@ColinTheMathmo, since I have you in this thread - do you know if there exists, and/or could you help me find, a branch of mathematics that provides a language for higher-level transformations of graphs?
E.g. I have two directed graphs (or multigraphs) and want to join them together, based on some rules of how edges are formed.
Example use case here: https://mastodon.technology/@temporal/107323429225039113.
There must be a relevant body of vocabulary for this. I took a look at "graph algebra" on Wiki, but only got confused.
@ColinTheMathmo Thanks, this gives me some perspective.
I just stumbled upon relevant material, that just happened to be on HN's frontpage today:
https://valand.dev/blog/post/dont-bring-a-tree-to-a-mesh-fight
Can't decide if the article is wise or full of confusion, but it touches on some of the same things this toot thread did.
https://news.ycombinator.com/item?id=29301957
Associated HN thread, which is full of the kind of information I am interested in. Struck gold here.
@temporal There's a lot going on in both the article and the thread ... I hope they are helping you, even if only giving you more to think about.
I'm not sure I can help further, but feel free to ask. Don't forget, if it's helpful, you can ask @Chartodon to chart this conversation.
@loke