Trees

Trees is an abstract class (com.sun.source.util.Trees). It has an abstract subclass, DocTrees, which lives in the same package (com.sun.source.util), and one concrete implementation, JavacTrees, which extends DocTrees and lives in com.sun.tools.javac.api. Trees is a helper class that, according to the short javadoc comment on top, bridges JSR 199, JSR 269, and the Tree API.

JSR 199 describes javax.tools
JSR 269 describes javax.lang.model
The Tree API describes com.sun.source.tree

The javax.tools package contains interfaces like JavaCompiler, JavaFileManager and Tool, plus classes like SimpleJavaFileObject and ToolProvider. The javax.lang.model package contains the subpackages element (interfaces for the Symbol (sub)classes), type (interfaces for the (sub)types of Type), and util (abstract Visitor classes for different Java versions, plus the Elements interface that is implemented by com.sun.tools.javac.model.JavacElements, the latter being a utility class to operate on ‘program elements’). The com.sun.source.tree package contains all Tree interfaces.
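To see how the three APIs meet in practice, here is a minimal sketch of obtaining a Trees instance from a JSR 199 compilation task and checking which class you actually get at runtime. The class name TreesInstanceSketch and its helper methods are my own, made up for illustration; only the javax.tools and com.sun.source calls belong to the JDK.

```java
import com.sun.source.util.DocTrees;
import com.sun.source.util.JavacTask;
import com.sun.source.util.Trees;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.List;

class TreesInstanceSketch {

    /** Wraps a string as an in-memory source file, so nothing is read from disk. */
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    /** Obtains the Trees helper that belongs to a freshly created compilation task. */
    static Trees treesFor(String className, String code) {
        // JSR 199: the compiler and its task come from javax.tools.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(source(className, code)));
        // The bridge itself: Trees is looked up per task.
        return Trees.instance(task);
    }

    /** Reports the runtime class of a Trees instance and whether it is also a DocTrees. */
    static String describe(Trees trees) {
        return trees.getClass().getName()
                + (trees instanceof DocTrees ? " (a DocTrees)" : " (not a DocTrees)");
    }
}
```

Calling describe(treesFor("Demo", "class Demo {}")) on a regular JDK is a quick way to confirm which implementation sits behind the abstract class.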

What can be bridged

In this section I sum up all the ‘bridges’ that Trees can make, i.e. the conversions from one sort of object to another. These are not casts, it is more like

From TreePath to Symbol

If we look at the first code block in the previous blog post, where we try to pick the ClassSymbol from a file, we use this line:

currentClass = (ClassSymbol) trees.getElement(getCurrentPath());

It is the getElement(TreePath treePath) method that makes the bridge. Its declared return type is the interface type (Element), but with a cast you can get the underlying runtime type.

If the element is not available (it is a field in JCTree but it is only set somewhere during the analyze phase), the method will return null.

If the TreePath does not identify a Tree that can have a Symbol in it, the method throws an IllegalArgumentException.
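A minimal, self-contained sketch of this bridge follows. It compiles a source string in memory, runs the analyze phase so that the symbols are filled in, and asks Trees for the element of every class declaration. I cast to the public TypeElement interface here instead of ClassSymbol, on the assumption that you do not want to open up the internal com.sun.tools.javac.code package. The class ElementLookupSketch and its method names are hypothetical.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePathScanner;
import com.sun.source.util.Trees;

import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class ElementLookupSketch {

    /** Wraps a string as an in-memory source file. */
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    /** Returns the qualified names of all classes declared in the given source. */
    static List<String> classNames(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(source(className, code)));
        List<String> names = new ArrayList<>();
        try {
            Iterable<? extends CompilationUnitTree> units = task.parse();
            task.analyze(); // without this phase getElement() may return null
            Trees trees = Trees.instance(task);
            for (CompilationUnitTree unit : units) {
                new TreePathScanner<Void, Void>() {
                    @Override
                    public Void visitClass(ClassTree node, Void p) {
                        // TreePath -> Element: the bridge described above.
                        TypeElement type = (TypeElement) trees.getElement(getCurrentPath());
                        if (type != null) {
                            names.add(type.getQualifiedName().toString());
                        }
                        return super.visitClass(node, p); // descend into nested classes
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

For a source like "class Demo { class Inner {} }" the scanner visits the outer class first and the nested class second, so the names come back in declaration order.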

From Symbol to TreePath

This method returns a TreePath if you provide a Symbol:

public abstract TreePath getPath(Element e);

If the node cannot be found, it will return null.

From Symbol to Tree

This method is less powerful than the previous one, because a Tree on its own does not know its parents. It returns null if the node cannot be found:

public abstract Tree getTree(Element element);
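The two Symbol-based lookups can be compared side by side. The sketch below is an illustration under my own assumptions: it fetches a TypeElement by name through JavacTask.getElements() after the analyze phase, then asks Trees for both the Tree and the TreePath of that element. The class SymbolToTreeSketch and the method locate are hypothetical names.

```java
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePath;
import com.sun.source.util.Trees;

import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;

class SymbolToTreeSketch {

    /** Wraps a string as an in-memory source file. */
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    /** Looks up a class by name and describes where its declaration sits in the AST. */
    static String locate(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(source(className, code)));
        try {
            task.parse();
            task.analyze();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        TypeElement type = task.getElements().getTypeElement(className);
        if (type == null) {
            return "not found";
        }
        Trees trees = Trees.instance(task);
        Tree tree = trees.getTree(type);     // Symbol -> Tree: the declaration node only
        TreePath path = trees.getPath(type); // Symbol -> TreePath: the node plus its ancestors
        if (tree == null || path == null) {
            return "no source tree available";
        }
        return tree.getKind() + " under " + path.getParentPath().getLeaf().getKind()
                + ", same node: " + (path.getLeaf() == tree);
    }
}
```

The TreePath carries the same leaf node that getTree() returns, and on top of that it remembers the parent, which the bare Tree cannot tell you.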

From Tree to TreePath

Often you need the TreePath, because via the TreePath you can find the Symbol:

public abstract TreePath getPath(CompilationUnitTree unit, Tree node);

Note that when you work in an extension of TreePathScanner, you can use this method:

    public TreePath getCurrentPath() {
        return path;
    }

TreePathScanner always knows its TreePath. Apart from an extra scan(TreePath, P) overload that starts scanning from a given path, getCurrentPath() is the only new method that TreePathScanner adds on top of everything it inherits from its parent TreeScanner.
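When you only have a bare Tree, for instance because you used a plain TreeScanner, getPath(CompilationUnitTree, Tree) is the way back to a TreePath. The sketch below finds a method declaration by name without any path bookkeeping, rebuilds the path afterwards, and uses it to reach the Symbol. PathFromTreeSketch and ownerOf are names I made up for the example.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePath;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;

import javax.lang.model.element.Element;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class PathFromTreeSketch {

    /** Wraps a string as an in-memory source file. */
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    /** Finds a method by name and returns "Owner.method" based on its symbol. */
    static String ownerOf(String className, String code, String methodName) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(source(className, code)));
        CompilationUnitTree unit;
        try {
            unit = task.parse().iterator().next();
            task.analyze();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // A plain TreeScanner hands out Tree nodes without any path information.
        List<MethodTree> matches = new ArrayList<>();
        new TreeScanner<Void, Void>() {
            @Override
            public Void visitMethod(MethodTree node, Void p) {
                if (node.getName().contentEquals(methodName)) {
                    matches.add(node);
                }
                return super.visitMethod(node, p);
            }
        }.scan(unit, null);
        if (matches.isEmpty()) {
            return "not found";
        }
        Trees trees = Trees.instance(task);
        TreePath path = trees.getPath(unit, matches.get(0)); // Tree -> TreePath
        Element method = trees.getElement(path);             // TreePath -> Symbol
        return method.getEnclosingElement().getSimpleName() + "." + method.getSimpleName();
    }
}
```

Note that getPath has to search the compilation unit for the node, so inside a TreePathScanner the cheaper getCurrentPath() is the better choice.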

From TreePath to Scope

While I don’t know where I could put this method to use, I know there is a relationship between TreePath and Scope and that both are required to understand semantics, and semantics requires context. The implementation of this method in JavacTrees is not an easy one; it involves the classes JavacScope, AttrContext and Env.

The Scope interface contains methods to retrieve the symbols of the enclosing class and the enclosing method (if applicable). You can also use the getLocalElements() method to get an iterable of all elements/symbols enclosed by this scope. Furthermore there is the getEnclosingScope() method, which allows you to get upwards in the nested-scopes structure.

public abstract Scope getScope(TreePath path);
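To get a feeling for what a Scope contains, the following sketch asks for the scope at the first return statement of a source string. It reports the enclosing class and method and then walks outwards through getEnclosingScope(), collecting the simple names of everything the scopes expose. This is an experiment rather than a reference: the exact set of names you get back is up to javac, and I stop walking as soon as a scope has no enclosing class, on the assumption that this marks the outermost level. ScopeSketch and scopeAtReturn are hypothetical names.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ReturnTree;
import com.sun.source.tree.Scope;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePath;
import com.sun.source.util.TreePathScanner;
import com.sun.source.util.Trees;

import javax.lang.model.element.Element;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ScopeSketch {

    /** Wraps a string as an in-memory source file. */
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    /** First entry: "Class.method" of the first return statement; rest: visible names, sorted. */
    static List<String> scopeAtReturn(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(source(className, code)));
        CompilationUnitTree unit;
        try {
            unit = task.parse().iterator().next();
            task.analyze();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<TreePath> returns = new ArrayList<>();
        new TreePathScanner<Void, Void>() {
            @Override
            public Void visitReturn(ReturnTree node, Void p) {
                returns.add(getCurrentPath());
                return super.visitReturn(node, p);
            }
        }.scan(unit, null);
        List<String> result = new ArrayList<>();
        if (returns.isEmpty()) {
            return result;
        }
        Scope scope = Trees.instance(task).getScope(returns.get(0)); // TreePath -> Scope
        result.add(scope.getEnclosingClass().getSimpleName()
                + "." + scope.getEnclosingMethod().getSimpleName());
        Set<String> names = new TreeSet<>();
        // Walk upwards through the nested scopes until we leave the class.
        for (Scope s = scope; s != null && s.getEnclosingClass() != null; s = s.getEnclosingScope()) {
            for (Element e : s.getLocalElements()) {
                names.add(e.getSimpleName().toString());
            }
        }
        result.addAll(names);
        return result;
    }
}
```

Running this on a small method with one parameter should at least show that parameter among the collected names, next to whatever else javac considers visible at that point.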

From TreePath to TypeMirror

TypeMirror is the public interface for Type and you can retrieve it with the following method:

public abstract TypeMirror getTypeMirror(TreePath path);

It returns null if the TypeMirror is not resolved/available, or throws an IllegalArgumentException if the provided TreePath has no associated TypeMirror.
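As an illustration, the sketch below asks for the TypeMirror of every binary expression in a source string and records its TypeKind. It assumes the analyze phase has run, since that is when types get attributed; before that I would expect null. TypeMirrorSketch and binaryTypes are names of my own.

```java
import com.sun.source.tree.BinaryTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePathScanner;
import com.sun.source.util.Trees;

import javax.lang.model.type.TypeMirror;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class TypeMirrorSketch {

    /** Wraps a string as an in-memory source file. */
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    /** Returns the TypeKind of every binary expression, in source order. */
    static List<String> binaryTypes(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(source(className, code)));
        CompilationUnitTree unit;
        try {
            unit = task.parse().iterator().next();
            task.analyze(); // types are attributed during this phase
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Trees trees = Trees.instance(task);
        List<String> kinds = new ArrayList<>();
        new TreePathScanner<Void, Void>() {
            @Override
            public Void visitBinary(BinaryTree node, Void p) {
                // TreePath -> TypeMirror: null means the type was not resolved.
                TypeMirror type = trees.getTypeMirror(getCurrentPath());
                kinds.add(type == null ? "unresolved" : type.getKind().toString());
                return super.visitBinary(node, p);
            }
        }.scan(unit, null);
        return kinds;
    }
}
```

With an int parameter a, an expression like a + 1L is typed as long and a > 10 as boolean, which is the kind of information the syntax tree alone cannot give you.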

Semantics and context

I had trouble fully understanding the difference between syntax and semantics. Yes, syntax is about the correct application of grammar rules, and it only requires that the code can be parsed into an ordered tree that follows those rules. Semantics is about meaning, about knowing what the named things in the code actually represent.

What I understand better now is that to figure out semantics, you need to figure out the specific context in which a named thing appears. If we apply this to natural language, we might have the following text:

He left his keys on the table. She put them in his pocket.

Both sentences are grammatically correct, there is no problem with syntax. If the two sentences have nothing to do with each other, things would still be grammatically correct. That would even be true for this variant:

He left his ideas in the elbow. She pushed it within butter.

Returning to the first text, semantics comes up with the words ‘them’ and ‘his’ in the second sentence. ‘Them’ can only be understood by reading both sentences, as it refers to ‘his keys’. There is context required to determine the meaning of ‘them’. The word ‘his’ in the second sentence is even more problematic, as it could refer to ‘He’ in the first sentence but also to some male person that is mentioned elsewhere.

Sometimes you need a lot of context to understand the meaning of a pronoun. Have a look at this text:

Mary had lived all of her life in Philadelphia. Being the daughter of a renowned lawyer, she had always suffered from high expectations and had never learned to be happy with life’s simple things. Even late in life, after having achieved more than almost anyone, it struck people in her hometown how oblivious she was to friendly gestures.

You can group together the following words or word groups with the same meaning:

Mary, the daughter of a renowned lawyer, she
life’s simple things, friendly gestures
Philadelphia, her hometown

If you semantically analyze text, or code, you need to read back in the text to find out what the pronouns refer to. There is always some declaration, i.e. the first time a named thing appears, and after that it can be referenced. Sometimes the reference is hard to resolve. In natural language we rarely have problems with it, but in code the compiler has to work out precisely which declaration each pronoun (identifier) refers to.

For example, you can have overloaded methods and have to decide which one to apply. You can shadow variables or override methods, and then the compiler must understand which variable or which method declaration a name refers to.

To do so, the compiler must be able to look back in the code and that is why TreePath, Scope and Env exist. They allow the compiler, on demand, to search the context of an invocation of a method or a variable and determine which declaration is the right one.

The Java compiler, as ChatGPT taught me, makes a clear distinction between syntax and semantics. It uses the AST for syntax and other tools for semantic resolution. The AST works downwards until the leaves are reached, and a Tree has no way of knowing its parent Tree. This makes the AST on its own unfit for semantic resolution, as semantic resolution requires traversal upwards instead of downwards. To make upwards traversal possible, there is TreePath, and Scope and Env are possible because of it.
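The upward traversal is easy to try out. The sketch below only parses (no analyze phase is needed for this), stops at the first return statement and then climbs from that node to the root via getParentPath(), recording the kind of every node on the way. ParentWalkSketch and ancestorsOfReturn are names I invented for the example.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ReturnTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePath;
import com.sun.source.util.TreePathScanner;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class ParentWalkSketch {

    /** Wraps a string as an in-memory source file. */
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    /** Returns the node kinds from the first return statement up to the compilation unit. */
    static List<String> ancestorsOfReturn(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(source(className, code)));
        CompilationUnitTree unit;
        try {
            unit = task.parse().iterator().next(); // syntax only
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> kinds = new ArrayList<>();
        new TreePathScanner<Void, Void>() {
            @Override
            public Void visitReturn(ReturnTree node, Void p) {
                if (kinds.isEmpty()) { // only record the first return statement
                    // The Tree cannot name its parent, but the TreePath can.
                    for (TreePath path = getCurrentPath(); path != null; path = path.getParentPath()) {
                        kinds.add(path.getLeaf().getKind().toString());
                    }
                }
                return super.visitReturn(node, p);
            }
        }.scan(unit, null);
        return kinds;
    }
}
```

For "class Demo { int one() { return 1; } }" the walk goes from the return statement through the block, the method and the class up to the compilation unit.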

Analyzing the content of a BlockTree

Semantic resolution