We talk so much about Abstract Syntax Trees, but maybe we should talk about "abstract structure graphs"?

Say you have code like:

foo = 1;
foo += bar(foo);

In AST form, using Lisp notation:

'((:= "foo" 1)
(+= "foo" ("bar" "foo")))

In "ASG" form:

'((:= #1="foo" 1)
(+= #1# ("bar" #1#))

Where #1= and #1# mean all instances of #1# contain (via pointers) *the same* object as marked with #1=, not just an equal one.

Some pictures to clarify what I mean are attached.

1/?

Now I'm not a compiler guy, but I'm pretty sure the very next step after parsing some code into AST is turning it into "ASG" - whether represented as a graph structure directly, or by a combination of helper structures (e.g. symbol table) and dynamic state (e.g. "environments" carried by recursing functions). You need to do that in some way, to be able to correctly associate a thing (like variable, function) with all references to it in the AST.

I wonder, why not expose this abstraction to the user?

2/?

In a way, we already implicitly code in graphs, not in trees. Rules of most programming languages make it that when you write:

foo = 1; foo += bar(foo);

all three instances of "foo" are *meant to* refer to the same thing.

But then, for code transformation, we parse it into AST, and that "sameness" becomes implicit. The graph becomes unrolled into a tree, and it's hard to walk it. We need supplemental information to work with it.

But what if we let the transformer writers walk graphs instead?

3/?

Imagine a more complex case:

a = bar(foo);
b = bar(foo) + bar(baz);

In pseudo-Lisp,

((:= a (bar foo))
(:= b (+ (bar foo) (bar baz))))

What if that parsed, in memory, into a graph structure like:

((:= a #1=(#2=bar foo))
(:= b (+ #1# (#2# baz))))

(picture attached)

Would it be easier or harder to work with it?

I honestly have no answer; it's just some pondering that I had on the back of my mind for the last year or so.

4/?

The previous toot had a notation change on the diagram - function application is made explicit, and so is ordering of the edges (to compensate for PlantUML/Graphviz "optimizing" it).

But operators are functions too, so how about the same, but with all function applications explicit?

Attached is a picture of a proper graph form of the code.

I wonder, is that a more useful form to work with than AST?

I'm likely reinventing a branch of here; pointers to existing body of work appreciated.

5/?

@temporal I spent some time looking at the graph and I think can't help but think it doesn't contain enough information to recreate the original structure. Either that or I don't understand the notation.

The Lisp form is very clear though.

Follow

@loke It should contain all information of the original Lisp form. If you start at the top and descend recursively (against the direction of arrows; I drew them in the "data flow" order), depth-first, but ensuring that at each level you go through alpha-arrow before you go through beta-arrow - you should be able to reconstruct my final Lisp form exactly.

The "edge ordering" thing is probably not typical for graphs, I don't know the name of this.

· · Web · 0 · 0 · 0
Sign in to participate in the conversation
Mastodon for Tech Folks

This Mastodon instance is for people interested in technology. Discussions aren't limited to technology, because tech folks shouldn't be limited to technology either!