We talk so much about Abstract Syntax Trees, but maybe we should talk about "abstract structure graphs"?

Say you have code like:

foo = 1;
foo += bar(foo);

In AST form, using Lisp notation:

'((:= "foo" 1)
(+= "foo" ("bar" "foo")))

In "ASG" form:

'((:= #1="foo" 1)
(+= #1# ("bar" #1#))

Where #1= and #1# mean all instances of #1# contain (via pointers) *the same* object as marked with #1=, not just an equal one.

Some pictures to clarify what I mean are attached.

1/?

Now I'm not a compiler guy, but I'm pretty sure the very next step after parsing some code into AST is turning it into "ASG" - whether represented as a graph structure directly, or by a combination of helper structures (e.g. symbol table) and dynamic state (e.g. "environments" carried by recursing functions). You need to do that in some way, to be able to correctly associate a thing (like variable, function) with all references to it in the AST.

I wonder, why not expose this abstraction to the user?

2/?

In a way, we already implicitly code in graphs, not in trees. Rules of most programming languages make it that when you write:

foo = 1; foo += bar(foo);

all three instances of "foo" are *meant to* refer to the same thing.

But then, for code transformation, we parse it into AST, and that "sameness" becomes implicit. The graph becomes unrolled into a tree, and it's hard to walk it. We need supplemental information to work with it.

But what if we let the transformer writers walk graphs instead?

3/?

Imagine a more complex case:

a = bar(foo);
b = bar(foo) + bar(baz);

In pseudo-Lisp,

((:= a (bar foo))
(:= b (+ (bar foo) (bar baz))))

What if that parsed, in memory, into a graph structure like:

((:= a #1=(#2=bar foo))
(:= b (+ #1# (#2# baz))))

(picture attached)

Would it be easier or harder to work with it?

I honestly have no answer; it's just some pondering that I had on the back of my mind for the last year or so.

4/?

The previous toot had a notation change on the diagram - function application is made explicit, and so is ordering of the edges (to compensate for PlantUML/Graphviz "optimizing" it).

But operators are functions too, so how about the same, but with all function applications explicit?

Attached is a picture of a proper graph form of the code.

I wonder, is that a more useful form to work with than AST?

I'm likely reinventing a branch of here; pointers to existing body of work appreciated.

5/?

Follow

Quick addendum: on the previous toot's diagram, the "function call" node is the only node repeating, but it's repeating nonetheless.

It feels like this one will be impossible to get rid of - it feels like a "structural operator", a subgraph demarker. But maybe it could be represented in some other way? A different way to draw this graph?

It's an open question for me now.

6/end.

· · Web · 0 · 0 · 0
Sign in to participate in the conversation
Mastodon for Tech Folks

This Mastodon instance is for people interested in technology. Discussions aren't limited to technology, because tech folks shouldn't be limited to technology either!