We talk so much about Abstract Syntax Trees, but maybe we should talk about "abstract structure graphs"?

Say you have code like:

foo = 1;
foo += bar(foo);

In AST form, using Lisp notation:

'((:= "foo" 1)
(+= "foo" ("bar" "foo")))

In "ASG" form:

'((:= #1="foo" 1)
(+= #1# ("bar" #1#))

Where #1= and #1# mean all instances of #1# contain (via pointers) *the same* object as marked with #1=, not just an equal one.

Some pictures to clarify what I mean are attached.

1/?

Now I'm not a compiler guy, but I'm pretty sure the very next step after parsing some code into AST is turning it into "ASG" - whether represented as a graph structure directly, or by a combination of helper structures (e.g. symbol table) and dynamic state (e.g. "environments" carried by recursing functions). You need to do that in some way, to be able to correctly associate a thing (like variable, function) with all references to it in the AST.

I wonder, why not expose this abstraction to the user?

2/?

In a way, we already implicitly code in graphs, not in trees. Rules of most programming languages make it that when you write:

foo = 1; foo += bar(foo);

all three instances of "foo" are *meant to* refer to the same thing.

But then, for code transformation, we parse it into AST, and that "sameness" becomes implicit. The graph becomes unrolled into a tree, and it's hard to walk it. We need supplemental information to work with it.

But what if we let the transformer writers walk graphs instead?

3/?

Imagine a more complex case:

a = bar(foo);
b = bar(foo) + bar(baz);

In pseudo-Lisp,

((:= a (bar foo))
(:= b (+ (bar foo) (bar baz))))

What if that parsed, in memory, into a graph structure like:

((:= a #1=(#2=bar foo))
(:= b (+ #1# (#2# baz))))

(picture attached)

Would it be easier or harder to work with it?

I honestly have no answer; it's just some pondering that I had on the back of my mind for the last year or so.

4/?

The previous toot had a notation change on the diagram - function application is made explicit, and so is ordering of the edges (to compensate for PlantUML/Graphviz "optimizing" it).

But operators are functions too, so how about the same, but with all function applications explicit?

Attached is a picture of a proper graph form of the code.

I wonder, is that a more useful form to work with than AST?

I'm likely reinventing a branch of here; pointers to existing body of work appreciated.

5/?

Quick addendum: on the previous toot's diagram, the "function call" node is the only node repeating, but it's repeating nonetheless.

It feels like this one will be impossible to get rid of - it feels like a "structural operator", a subgraph demarker. But maybe it could be represented in some other way? A different way to draw this graph?

It's an open question for me now.

6/end.

· · Web · · ·

This Mastodon instance is for people interested in technology. Discussions aren't limited to technology, because tech folks shouldn't be limited to technology either!