Symbol
In previous blog posts I have talked about the abstract syntax tree and the implementation of Java’s TreeScanner. The abstract syntax tree is the result of the parse phase of compilation, and the call that produces it is task.parse(). The resulting tree of Tree objects can be traversed and visited with class TreeScanner. Tree objects have implementations in class JCTree, which is in a package that is not exported by default.
In this blog post I’m going to go a bit deeper into Java’s compiler. The abstract syntax tree is only the first result in the compilation process. It is followed by the task.analyze() phase in which Symbol and Type objects are created. These two have their own logic which is not the logic of the AST, and in later phases of the compilation process the AST isn’t even used anymore.
What Symbol is
Java has the Symbol class, residing in com.sun.tools.javac.code.Symbol. The package is part of the jdk.compiler module. This module contains code that is not meant for regular use; Symbol carries the same disclaimer as JCTree:
This is NOT part of any supported API. If you write code that depends on this, you do so at your own risk. This code and its internal interfaces are subject to change or deletion without notice.
The definition of Symbol is ‘everything in the code with a name’. In the Symbol class you find a ton of methods and fields that provide all sorts of information about the specific symbol. The class contains static inner subclasses for specific types of symbols, like class, package, variable and method, each having additional methods and fields. Symbol is really a big thing.
Symbol’s interface: Element
Just as JCTree has a public API in the form of Tree (including many children of Tree), Symbol has a public API in the form of Element. Element is more limited in what it provides as methods and, being an interface, it has no fields that can be accessed. This is what Element looks like:
public interface Element extends javax.lang.model.AnnotatedConstruct {
// Methods typically related to underlying Symbol class
TypeMirror asType();
ElementKind getKind();
Set<Modifier> getModifiers();
Name getSimpleName();
Element getEnclosingElement();
List<? extends Element> getEnclosedElements();
// Methods from Object
@Override
boolean equals(Object obj);
@Override
int hashCode();
// Methods inherited from AnnotatedConstruct
@Override
List<? extends AnnotationMirror> getAnnotationMirrors();
@Override
<A extends Annotation> A getAnnotation(Class<A> annotationType);
@Override
<A extends Annotation> A[] getAnnotationsByType(Class<A> annotationType);
// Elements can be visited with an ElementVisitor; Symbol has its own inner interface named Visitor (Symbol.Visitor)
<R, P> R accept(ElementVisitor<R, P> v, P p);
}
The relevant methods here are the first six. They tell you what sort of Symbol/Element it is, which Elements it encloses and which Element encloses it. Furthermore you get the Type (that’s for a future blog post) and the modifiers.
This information is the most basic you can get, given what is inside the underlying Symbol object. The most telling methods are probably getEnclosingElement() and getEnclosedElements(). These provide information about where in the structure the Element resides. The structure in this case is not the Abstract Syntax Tree but a tree of nested scopes.
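To make the enclosing/enclosed relationship concrete, here is a minimal sketch that uses only the exported API (JavacTask and Element). It compiles a small in-memory source up to the analyze phase and prints each Element together with the Element that encloses it. The class name ElementWalkDemo, the helper names and the sample source are my own assumptions for illustration, and it needs to run on a JDK (not a bare JRE) so that the system compiler is available.

```java
import com.sun.source.util.JavacTask;

import javax.lang.model.element.Element;
import javax.lang.model.element.ElementKind;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

final class ElementWalkDemo {

    /** Analyzes one in-memory source file and lists every Element as "KIND name <- enclosing". */
    static List<String> describe(String className, String source) {
        // Returns null when no compiler is available (for example on a plain JRE).
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, List.of("-proc:none"), null, List.of(file));
        List<String> lines = new ArrayList<>();
        try {
            // analyze() parses, enters and attributes the source, then hands back Elements.
            for (Element root : task.analyze()) {
                // Start only from top-level classes so nested ones are not listed twice.
                if (root.getEnclosingElement().getKind() == ElementKind.PACKAGE) {
                    walk(root, lines);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Records this element, then recurses into everything it encloses. */
    private static void walk(Element e, List<String> lines) {
        lines.add(e.getKind() + " " + e.getSimpleName()
                + " <- " + e.getEnclosingElement().getSimpleName());
        for (Element child : e.getEnclosedElements()) {
            walk(child, lines);
        }
    }

    public static void main(String[] args) {
        String source = "class Outer { int field; void method() {} class Inner {} }";
        describe("Outer", source).forEach(System.out::println);
    }
}
```

The listing follows the nesting of declarations, not the statements of the AST. Besides the declared members, the output may also contain the constructors that the compiler adds to each class.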
Symbol
To create some structure, I divide this chapter into multiple parts, namely the following:
- fields
- enums
- methods
- static inner classes
- the Kinds class
Fields
These are the top-level instance fields of Symbol. There are not that many (but wait for the methods):
public Kind kind;
public long flags_field;
public Name name;
public Type type;
public Symbol owner;
public Completer completer;
public Type erasure_field;
As you see they are all publicly accessible, which means you do not need getters. The flags_field variable, a 64-bit long representing the modifiers of a Symbol (more on this later), has a Javadoc comment advising you to use the method public long flags() { return flags_field; } to get its value. This is to make sure that the class symbol is loaded.
The ‘completer’ variable has to do, I suppose, with the fact that, to be efficient, the compiler does not load all symbols at the start of compilation; a symbol is completed only when its details are first needed. The owner variable holds the enclosing Symbol; this is the value returned by the getEnclosingElement() method that we know from the Element interface.
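The idea of completing a symbol only when it is first needed can be sketched in isolation. This is not javac's code: LazySymbol, its fields and the factory method are hypothetical stand-ins, and the sketch only assumes that an accessor such as flags() triggers a one-time completion before returning the field.

```java
import java.util.function.Consumer;

final class LazySymbolDemo {

    /** Hypothetical stand-in for a symbol whose details are filled in on first use. */
    static final class LazySymbol {
        final String name;
        long flagsField;                   // only meaningful after completion
        Consumer<LazySymbol> completer;    // null means "already completed"
        int completions = 0;               // counts how often the completer ran

        LazySymbol(String name, Consumer<LazySymbol> completer) {
            this.name = name;
            this.completer = completer;
        }

        /** Runs the completer at most once. */
        void complete() {
            if (completer != null) {
                Consumer<LazySymbol> c = completer;
                completer = null;          // clear first so a re-entrant call is a no-op
                c.accept(this);
            }
        }

        /** Accessor that guarantees the symbol is completed before the field is read. */
        long flags() {
            complete();
            return flagsField;
        }
    }

    /** Builds a symbol whose flags are "loaded" lazily; 0x11 stands for PUBLIC | FINAL. */
    static LazySymbol lazyPublicFinal(String name) {
        return new LazySymbol(name, sym -> {
            sym.flagsField = 0x11;
            sym.completions++;
        });
    }

    public static void main(String[] args) {
        LazySymbol sym = lazyPublicFinal("Example");
        System.out.println("raw field before use: " + sym.flagsField);
        System.out.println("via flags():          " + sym.flags());
        System.out.println("completer runs:       " + sym.completions);
    }
}
```

Reading the raw field before anything has triggered completion gives a stale value, which would explain why the Javadoc recommends the accessor over the field.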
Enums
Symbol has two inner enums, named ModuleFlags and ModuleResolutionFlags. This is the code for them:
public enum ModuleFlags {
OPEN(0x0020),
SYNTHETIC(0x1000),
MANDATED(0x8000);
public static int value(Set<ModuleFlags> s) {
int v = 0;
for (ModuleFlags f: s)
v |= f.value;
return v;
}
private ModuleFlags(int value) {
this.value = value;
}
public final int value;
}
public enum ModuleResolutionFlags {
DO_NOT_RESOLVE_BY_DEFAULT(0x0001),
WARN_DEPRECATED(0x0002),
WARN_DEPRECATED_REMOVAL(0x0004),
WARN_INCUBATING(0x0008);
public static int value(Set<ModuleResolutionFlags> s) {
int v = 0;
for (ModuleResolutionFlags f: s)
v |= f.value;
return v;
}
private ModuleResolutionFlags(int value) {
this.value = value;
}
public final int value;
}
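I am not sure how they are used inside the compiler, but they are related to the JPMS: the ModuleFlags values are the same bits as the module flags ACC_OPEN (0x0020), ACC_SYNTHETIC (0x1000) and ACC_MANDATED (0x8000) that the JVM specification defines for the Module attribute of a module-info class file.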
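Both enums share the same pattern: each constant carries one bit, and the static value() method ORs a set of constants into a single int. The sketch below uses a local copy of ModuleFlags so that it runs without access to javac internals; the decode() helper is my own addition for illustration and does not exist in Symbol.

```java
import java.util.EnumSet;
import java.util.Set;

final class ModuleFlagsDemo {

    /** Local copy of the enum shown above, plus a hypothetical inverse helper. */
    enum ModuleFlags {
        OPEN(0x0020),
        SYNTHETIC(0x1000),
        MANDATED(0x8000);

        final int value;

        ModuleFlags(int value) {
            this.value = value;
        }

        /** Combines a set of flags into one bit mask. */
        static int value(Set<ModuleFlags> s) {
            int v = 0;
            for (ModuleFlags f : s) {
                v |= f.value;
            }
            return v;
        }

        /** Hypothetical helper: recovers the set of flags from a bit mask. */
        static Set<ModuleFlags> decode(int mask) {
            Set<ModuleFlags> result = EnumSet.noneOf(ModuleFlags.class);
            for (ModuleFlags f : values()) {
                if ((mask & f.value) != 0) {
                    result.add(f);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        int mask = ModuleFlags.value(EnumSet.of(ModuleFlags.OPEN, ModuleFlags.MANDATED));
        System.out.println(Integer.toHexString(mask));  // prints 8020
        System.out.println(ModuleFlags.decode(mask));   // prints [OPEN, MANDATED]
    }
}
```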
Methods
There are 57 top-level methods in Symbol, which is too many to discuss one by one. Fortunately it is possible to do some categorization.
Flag requests
The flags_field instance variable holds a 64-bit value (type long). Its use is to hold bits not only for every possible modifier but also for properties of Symbols that the Java compiler discovers during the compilation process. Typical modifier flags, living in the first 12 bits, are keywords like FINAL, TRANSIENT and ABSTRACT. In subsequent flags you find terms like UNATTRIBUTED, ACYCLIC, EFFECTIVELY_FINAL and SYSTEM_MODULE.
These definitions, i.e. which bit sets which property, are found in the Flags class that lives in the same package as Symbol. The flag values are declared and initialized as static final constants (int for the lower bits, long for the bits that do not fit in an int) like this:
public static final int TRANSIENT = 1<<7;
In Symbol these values are imported like this:
import static com.sun.tools.javac.code.Flags.*;
This means that the flag variables can be directly used in the Symbol code which results in a typical method like this:
public boolean isAbstract() {
return (flags_field & ABSTRACT) != 0;
}
I am no expert on bit operators, but the idea is simple: the bitwise AND (&) keeps only the bits that are set in both flags_field and ABSTRACT, so the result is non-zero exactly when the ABSTRACT flag (1<<10, the 11th bit when counting from one) is set. I counted 14 methods that rely (partially) on this flag principle.
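The flag principle is easy to try out on its own. The sketch below defines a few constants with the bit positions discussed here and shows how to set, test and clear them in a long. FlagBitsDemo, its helper methods and EXAMPLE_HIGH_FLAG are hypothetical names of mine; the comparison with java.lang.reflect.Modifier only shows that the reflection API uses the same bits for these three modifiers.

```java
import java.lang.reflect.Modifier;

final class FlagBitsDemo {

    // Modifier bits at the positions discussed in the text.
    static final int FINAL = 1 << 4;
    static final int TRANSIENT = 1 << 7;
    static final int ABSTRACT = 1 << 10;

    // Hypothetical high flag: it must be a long, because the int expression
    // 1 << 41 only uses the low five bits of the shift distance and yields 1 << 9.
    static final long EXAMPLE_HIGH_FLAG = 1L << 41;

    /** True if every bit of 'flag' is also set in 'flags'. */
    static boolean isSet(long flags, long flag) {
        return (flags & flag) != 0;
    }

    /** Returns 'flags' with the bits of 'flag' switched on. */
    static long set(long flags, long flag) {
        return flags | flag;
    }

    /** Returns 'flags' with the bits of 'flag' switched off. */
    static long clear(long flags, long flag) {
        return flags & ~flag;
    }

    public static void main(String[] args) {
        long flags = 0;
        flags = set(flags, ABSTRACT);
        flags = set(flags, EXAMPLE_HIGH_FLAG);
        System.out.println(isSet(flags, ABSTRACT));                  // true
        System.out.println(isSet(flags, FINAL));                     // false
        System.out.println(isSet(clear(flags, ABSTRACT), ABSTRACT)); // false
        System.out.println(ABSTRACT == Modifier.ABSTRACT);           // true
    }
}
```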
Scope-related methods
Symbol is related to Scope, and Scope is a class from the same package. This is what its Javadoc says:
A scope represents an area of visibility in a Java program. The Scope class is a container for symbols which provides efficient access to symbols given their names. Scopes are implemented as hash tables with “open addressing” and “double hashing”. Scopes can be nested. Nested scopes can share their hash tables.
There are methods that either tell you what scope the Symbol lives in, or what the class or package is in which your Symbol resides. The latter requires a loop that walks up the chain of owners until it meets the first class or package. Here is an example of a method that returns the closest enclosing class of a Symbol, which might be an inner class:
public ClassSymbol enclClass() {
Symbol c = this;
while (c != null &&
(!c.kind.matches(KindSelector.TYP) || !c.type.hasTag(CLASS))) {
c = c.owner;
}
return (ClassSymbol)c;
}
Here is a method that provides the outermost class that encloses the symbol. It does so by walking up the owners until it meets the package, remembering the last symbol it saw before that:
public ClassSymbol outermostClass() {
Symbol sym = this;
Symbol prev = null;
while (sym.kind != PCK) {
prev = sym;
sym = sym.owner;
}
return (ClassSymbol) prev;
}
This one gives the package:
public PackageSymbol packge() {
Symbol sym = this;
while (sym.kind != PCK) {
sym = sym.owner;
}
return (PackageSymbol) sym;
}
Other methods, ones that we already encountered in the discussion of Element, are getEnclosingElement() and getEnclosedElements(). The latter has implementations for just a few Symbol types (ClassSymbol is the most obvious).
Furthermore there are three methods starting with ‘is’ and returning a boolean, namely isSubClass(Symbol base, Types types), isMemberOf(TypeSymbol clazz, Types types) and isEnclosedBy(ClassSymbol clazz). The first of these methods is only implemented for ClassSymbol.
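The three owner-walking methods enclClass(), outermostClass() and packge() can be tried out on a toy model. The sketch below is not javac code: Sym and its Kind enum are hypothetical miniatures with just a name, a kind and an owner, and the loops follow the same shape as the methods quoted above.

```java
final class OwnerChainDemo {

    enum Kind { PACKAGE, CLASS, METHOD, VARIABLE }

    /** Hypothetical miniature symbol: a name, a kind and a link to its owner. */
    static final class Sym {
        final String name;
        final Kind kind;
        final Sym owner;

        Sym(String name, Kind kind, Sym owner) {
            this.name = name;
            this.kind = kind;
            this.owner = owner;
        }

        /** Closest class at or above this symbol; a class returns itself. */
        Sym enclClass() {
            Sym c = this;
            while (c != null && c.kind != Kind.CLASS) {
                c = c.owner;
            }
            return c;
        }

        /** Last symbol seen before the package is reached. */
        Sym outermostClass() {
            Sym sym = this;
            Sym prev = null;
            while (sym.kind != Kind.PACKAGE) {
                prev = sym;
                sym = sym.owner;
            }
            return prev;
        }

        /** The package at the top of the owner chain. */
        Sym packge() {
            Sym sym = this;
            while (sym.kind != Kind.PACKAGE) {
                sym = sym.owner;
            }
            return sym;
        }
    }

    /** Builds demo > Outer > Inner > run() > x and returns the variable x. */
    static Sym sampleVariable() {
        Sym pkg = new Sym("demo", Kind.PACKAGE, null);
        Sym outer = new Sym("Outer", Kind.CLASS, pkg);
        Sym inner = new Sym("Inner", Kind.CLASS, outer);
        Sym method = new Sym("run", Kind.METHOD, inner);
        return new Sym("x", Kind.VARIABLE, method);
    }

    public static void main(String[] args) {
        Sym x = sampleVariable();
        System.out.println(x.enclClass().name);      // Inner
        System.out.println(x.outermostClass().name); // Outer
        System.out.println(x.packge().name);         // demo
    }
}
```

Note that the loop in enclClass() starts at the symbol itself, so calling it on a class returns that same class.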
ElementKind-related methods
There is an enum ElementKind in javax.lang.model.element and there is a class named Kinds in com.sun.tools.javac.code. As the package names suggest, the latter is more obscure and less friendly to use, being part of a non-exported package. The former is part of the public API, and it is the one I’ll discuss first.
Enum ElementKind contains the following values:
PACKAGE,
ENUM,
CLASS,
ANNOTATION_TYPE,
INTERFACE,
ENUM_CONSTANT,
FIELD,
PARAMETER,
LOCAL_VARIABLE,
EXCEPTION_PARAMETER,
METHOD,
CONSTRUCTOR,
STATIC_INIT,
INSTANCE_INIT,
TYPE_PARAMETER,
OTHER,
RESOURCE_VARIABLE,
MODULE,
RECORD,
RECORD_COMPONENT,
BINDING_VARIABLE;
As you see these ElementKind values are quite straightforward. The enum contains a set of methods like isClass() which return true if the ElementKind value is one of a group of values. isClass(), for example, returns true if the value is CLASS, ENUM or RECORD.
The important method returning the ElementKind of a Symbol is getKind(). This method has overrides in the static inner classes ClassSymbol, VarSymbol and MethodSymbol. Note that each of them can cover several ElementKind values. A MethodSymbol, for example, can be of ElementKind CONSTRUCTOR, STATIC_INIT, INSTANCE_INIT or METHOD. A ClassSymbol can be of ElementKind ANNOTATION_TYPE, INTERFACE, ENUM, RECORD or CLASS.
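Because ElementKind is part of the public API, its grouping methods can be explored without a compiler run. This small sketch lists which values each predicate accepts; ElementKindDemo and its method names are my own, and the exact output depends on the JDK version, since values such as RECORD were added later.

```java
import javax.lang.model.element.ElementKind;
import java.util.EnumSet;
import java.util.Set;

final class ElementKindDemo {

    /** All ElementKind values for which isClass() returns true. */
    static Set<ElementKind> classKinds() {
        Set<ElementKind> result = EnumSet.noneOf(ElementKind.class);
        for (ElementKind kind : ElementKind.values()) {
            if (kind.isClass()) {
                result.add(kind);
            }
        }
        return result;
    }

    /** All ElementKind values for which isInterface() returns true. */
    static Set<ElementKind> interfaceKinds() {
        Set<ElementKind> result = EnumSet.noneOf(ElementKind.class);
        for (ElementKind kind : ElementKind.values()) {
            if (kind.isInterface()) {
                result.add(kind);
            }
        }
        return result;
    }

    /** All ElementKind values for which isField() returns true. */
    static Set<ElementKind> fieldKinds() {
        Set<ElementKind> result = EnumSet.noneOf(ElementKind.class);
        for (ElementKind kind : ElementKind.values()) {
            if (kind.isField()) {
                result.add(kind);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("isClass():     " + classKinds());
        System.out.println("isInterface(): " + interfaceKinds());
        System.out.println("isField():     " + fieldKinds());
    }
}
```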
Kinds- and Kind-related methods
Kinds is a class in com.sun.tools.javac.code and carries the same disclaimer: the code can change without notice. It contains a more precise categorization of all the different available kinds than the more general ElementKind.
The Kinds class is hard to understand; the complex structure suggests that the makers wanted to be able to create very specific labels for different types of Symbols. The class is structured as follows:
public class Kinds {
private Kinds() {} // private constructor, uninstantiable
public enum Kind {}
public static class KindSelector {}
public enum KindName implements Formattable {}
// these methods translate ElementKind to KindName
public static KindName kindName(MemberReferenceTree.ReferenceMode mode) {}
public static KindName kindName(Symbol sym) {}
public static KindName typeKindName(Type t) {}
}
This Kinds class with its inner enums and inner static class is imported in Symbol in such a way that any value defined is available without prefix. These are the import statements:
import com.sun.tools.javac.code.Kinds.Kind;
import static com.sun.tools.javac.code.Kinds.*;
import static com.sun.tools.javac.code.Kinds.Kind.*;
Every Symbol has a field public Kind kind; that stores its kind. And actually there is no getter for it, nor is there any method that returns a Kind or KindName object. So this little section on ‘Kinds- and Kind-related methods’ has nothing to show.
On the other hand, in the method bodies of Symbol the <Symbol>.kind field is ubiquitous. The Kinds class plays a big role, and the precise definition of which kinds of symbols exist seems to have been extremely useful to the makers. Below are two examples of methods using ‘kind’:
public Symbol location() {
if (owner.name == null || (owner.name.isEmpty() &&
(owner.flags() & BLOCK) == 0 &&
owner.kind != PCK &&
owner.kind != TYP)) {
return null;
}
return owner;
}
public boolean isStatic() {
return
(flags() & STATIC) != 0 ||
(owner.flags() & INTERFACE) != 0 && kind != MTH &&
name != name.table.names._this;
}
Name-related methods
Several general methods return the name of the Symbol or are otherwise related to the name field. These methods do not return a String but a Name object. Name lives in package com.sun.tools.javac.util, which is not exported. The general methods in Symbol related to Name are:
public Name getSimpleName() {
return name;
}
public Name flatName() {
return getQualifiedName();
}
public Name getQualifiedName() {
return name;
}
public boolean isAnonymous() {
return name.isEmpty();
}
Static inner class ClassSymbol has its own versions of the methods. Naming of classes is a more delicate topic, which is illustrated by the fact that ClassSymbol has not one but two fields for its name:
public Name fullname;
public Name flatname;
Its methods:
public String className() {
if (name.isEmpty())
return
Log.getLocalizedString("anonymous.class", flatname);
else
return fullname.toString();
}
@Override @DefinedBy(Api.LANGUAGE_MODEL)
public Name getQualifiedName() {
return isUnnamed() ? fullname.subName(0, 0) /* empty name */ : fullname;
}
@Override @DefinedBy(Api.LANGUAGE_MODEL)
public Name getSimpleName() {
return name;
}
public Name flatName() {
return flatname;
}
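The difference between a qualified name and a flat name is easier to see with a nested class. ClassSymbol itself is not reachable without extra flags, so the sketch below uses plain reflection as an analogy: as far as I can tell, Class.getCanonicalName() gives the dotted form that corresponds to fullname, and Class.getName() gives the binary form with a $ that corresponds to flatname. NameFormsDemo and its method names are my own.

```java
import java.util.Map;

final class NameFormsDemo {

    /** Binary name, with '$' between outer and nested class (analogous to flatname). */
    static String flatStyleName(Class<?> c) {
        return c.getName();
    }

    /** Dotted name as written in source (analogous to fullname); null for anonymous classes. */
    static String qualifiedStyleName(Class<?> c) {
        return c.getCanonicalName();
    }

    /** Simple name; empty for anonymous classes. */
    static String simpleName(Class<?> c) {
        return c.getSimpleName();
    }

    public static void main(String[] args) {
        Class<?> nested = Map.Entry.class;
        System.out.println(flatStyleName(nested));      // java.util.Map$Entry
        System.out.println(qualifiedStyleName(nested)); // java.util.Map.Entry
        System.out.println(simpleName(nested));         // Entry

        // An anonymous class has an empty simple name, much like the
        // name.isEmpty() check behind isAnonymous().
        Class<?> anonymous = new Object() { }.getClass();
        System.out.println(simpleName(anonymous).isEmpty());      // true
        System.out.println(qualifiedStyleName(anonymous) == null); // true
    }
}
```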