Symbol
In previous blog posts I have talked about the abstract syntax tree and the implementation of Java’s TreeScanner. The abstract syntax tree is the result of the parse phase of compilation, which you trigger with task.parse() (task being a JavacTask). The resulting tree of Tree objects can be traversed and visited with class TreeScanner. The Tree interfaces are implemented by the subclasses of JCTree, which lives in a package that is not exported by default.
In this blog post I’m going to go a bit deeper into Java’s compiler. The abstract syntax tree is only the first result in the compilation process. It is followed by the task.analyze() phase, in which Symbol and Type objects are created and attached to the tree nodes. These two have their own structure, separate from that of the AST, and the later phases of the compilation process rely heavily on them.
What Symbol is
Java has the Symbol class, residing in com.sun.tools.javac.code.Symbol. The package belongs to the jdk.compiler module but is not exported by it. It contains code that is not meant for regular use, and it carries the same disclaimer as JCTree:
This is NOT part of any supported API. If you write code that depends on this, you do so at your own risk. This code and its internal interfaces are subject to change or deletion without notice.
A Symbol is, roughly, everything in the code that has a name. In the Symbol class you find a ton of methods and fields that provide all sorts of information about the specific symbol. The class contains static inner subclasses for specific kinds of symbols, like class, package, variable and method, each having additional methods and fields. Symbol is really a big thing.
Symbol’s interface: Element
Just as JCTree has a public API in the form of Tree (and its many subinterfaces), Symbol has a public API in the form of Element. Element provides fewer methods and, being an interface, has no fields that can be accessed. This is what Element looks like:
public interface Element extends javax.lang.model.AnnotatedConstruct {
    // Methods typically related to the underlying Symbol class
    TypeMirror asType();
    ElementKind getKind();
    Set<Modifier> getModifiers();
    Name getSimpleName();
    Element getEnclosingElement();
    List<? extends Element> getEnclosedElements();

    // Methods from Object
    @Override
    boolean equals(Object obj);
    @Override
    int hashCode();

    // Methods inherited from AnnotatedConstruct
    @Override
    List<? extends AnnotationMirror> getAnnotationMirrors();
    @Override
    <A extends Annotation> A getAnnotation(Class<A> annotationType);
    @Override
    <A extends Annotation> A[] getAnnotationsByType(Class<A> annotationType);

    // Elements can be visited; Symbol contains an inner interface named Visitor (Symbol.Visitor)
    <R, P> R accept(ElementVisitor<R, P> v, P p);
}
The relevant methods here are the first six. They tell you what kind of Symbol/Element it is, which Elements it encloses and by which Element it is enclosed. Furthermore you get the Type (that’s for a future blog post) and the modifiers.
This information is the most basic you can get, given what is inside the underlying Symbol object. The most telling methods are probably getEnclosingElement() and getEnclosedElements(). These provide information about where in the structure the Element resides. The structure in this case is not the Abstract Syntax Tree but a tree of nested scopes.
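To see the Element API in action without touching the internal packages, here is a minimal sketch that compiles a small in-memory source with the standard javax.tools API, runs analyze() and prints each element together with everything it encloses. It assumes a JDK (not a JRE), since ToolProvider.getSystemJavaCompiler() returns null when javac is absent; the names ElementWalk, StringSource and Demo are made up for the example. Because it only uses public API (Element, JavacTask), no --add-exports is needed.

```java
import com.sun.source.util.JavacTask;
import java.net.URI;
import java.util.List;
import javax.lang.model.element.Element;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class ElementWalk {
    // In-memory source file; the URI path must end in "<ClassName>.java".
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Appends an element and, recursively, everything it encloses.
    static void print(Element e, String indent, StringBuilder out) {
        out.append(indent).append(e.getKind()).append(' ')
           .append(e.getSimpleName()).append('\n');
        for (Element child : e.getEnclosedElements()) {
            print(child, indent + "  ", out);
        }
    }

    public static String describe(String className, String code) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system Java compiler; run this on a JDK");
        }
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, List.of("-proc:none"), null,
                List.of(new StringSource(className, code)));
        StringBuilder out = new StringBuilder();
        for (Element e : task.analyze()) {          // parse + enter + attribute
            out.append("enclosed by: ").append(e.getEnclosingElement()).append('\n');
            print(e, "", out);
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(describe("Demo",
                "package demo; public class Demo { int count; void run() {} }"));
    }
}
```

On a typical JDK this lists class Demo, enclosed by package demo, with its field, its method and the implicit default constructor (simple name <init>). I would not rely on the order of the enclosed elements.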
Symbol
To create some structure, I divide this chapter into the following parts:
- fields
- enums
- methods (including the Kinds class)
- static inner classes
Fields
These are the top-level instance fields of Symbol. There are not that many (but wait for the methods):
public Kind kind;
public long flags_field;
public Name name;
public Type type;
public Symbol owner;
public Completer completer;
public Type erasure_field;
As you see they are all publicly accessible, which means you do not need getters. The flags_field variable is a 64-bit value (a long) representing the modifiers of a Symbol (more on this later). Its Javadoc advises you to read it through the accessor method public long flags() { return flags_field; } instead. The reason lies with class symbols: ClassSymbol overrides flags() so that the class symbol is completed (loaded) before its flags are returned.
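The completer field has to do with the fact that, to be efficient, javac does not load all symbols at the start of compilation. A symbol can start out incomplete, for example a class that is only known by its name, and its completer fills in the details (such as members and flags read from a class file) the first time they are needed. The owner field refers to the enclosing Symbol; this owner value is returned by the getEnclosingElement() method that we know from the Element interface.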
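How a completer and the flags() accessor work together can be illustrated with a toy model: the symbol keeps its completer until someone needs its details, and flags() triggers the completion first. This is a minimal sketch, loosely modeled on javac's lazy completion; Sym, Completer and complete() here are hypothetical stand-ins, not the real Symbol API.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyCompletion {
    // Hypothetical stand-in for a completer: fills in a symbol on demand.
    interface Completer {
        void complete(Sym sym);
    }

    static class Sym {
        final String name;
        long flags_field;      // not trustworthy until the symbol is completed
        Completer completer;   // null once completed

        Sym(String name, Completer completer) {
            this.name = name;
            this.completer = completer;
        }

        // Runs the completer at most once, then drops it.
        void complete() {
            if (completer != null) {
                Completer c = completer;
                completer = null;   // clear first, so re-entrant calls do nothing
                c.complete(this);
            }
        }

        // Accessor that guarantees the flags are loaded before returning them.
        long flags() {
            complete();
            return flags_field;
        }
    }

    public static void main(String[] args) {
        List<String> loaded = new ArrayList<>();
        Sym s = new Sym("Foo", sym -> {
            loaded.add(sym.name);      // pretend we read Foo.class here
            sym.flags_field = 1 << 0;  // PUBLIC
        });
        System.out.println(s.flags_field); // 0: raw field, not yet completed
        System.out.println(s.flags());     // 1: the accessor triggered completion
        System.out.println(loaded);        // [Foo]: the completer ran once
    }
}
```

Reading flags_field directly on an incomplete symbol silently gives a wrong answer, which is exactly what the Javadoc advice about flags() guards against.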
Enums
Symbol has two inner enums, named ModuleFlags and ModuleResolutionFlags. This is the code for them:
public enum ModuleFlags {
    OPEN(0x0020),
    SYNTHETIC(0x1000),
    MANDATED(0x8000);

    public static int value(Set<ModuleFlags> s) {
        int v = 0;
        for (ModuleFlags f: s)
            v |= f.value;
        return v;
    }

    private ModuleFlags(int value) {
        this.value = value;
    }

    public final int value;
}

public enum ModuleResolutionFlags {
    DO_NOT_RESOLVE_BY_DEFAULT(0x0001),
    WARN_DEPRECATED(0x0002),
    WARN_DEPRECATED_REMOVAL(0x0004),
    WARN_INCUBATING(0x0008);

    public static int value(Set<ModuleResolutionFlags> s) {
        int v = 0;
        for (ModuleResolutionFlags f: s)
            v |= f.value;
        return v;
    }

    private ModuleResolutionFlags(int value) {
        this.value = value;
    }

    public final int value;
}
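Both enums have to do with the JPMS. The ModuleFlags values are the access flags of a module declaration as stored in the Module attribute of a module-info.class file (ACC_OPEN, ACC_SYNTHETIC and ACC_MANDATED have exactly these values). ModuleResolutionFlags mirrors the JDK-specific ModuleResolution class file attribute, which is used, for example, to mark incubator modules.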
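The static value(Set) methods pack a set of enum constants into a single int by OR-ing their bit values together. A minimal, self-contained sketch: the enum is copied from the ModuleFlags listing above so that it compiles without javac internals, and decode() is a hypothetical addition (not in javac) that shows the reverse direction.

```java
import java.util.EnumSet;
import java.util.Set;

public class ModuleFlagsDemo {
    // Copy of Symbol.ModuleFlags from the listing above.
    enum ModuleFlags {
        OPEN(0x0020),
        SYNTHETIC(0x1000),
        MANDATED(0x8000);

        final int value;

        ModuleFlags(int value) {
            this.value = value;
        }

        // Same logic as in javac: OR all bit values into one int.
        static int value(Set<ModuleFlags> s) {
            int v = 0;
            for (ModuleFlags f : s)
                v |= f.value;
            return v;
        }

        // Hypothetical inverse: collect every constant whose bit is set.
        static Set<ModuleFlags> decode(int v) {
            Set<ModuleFlags> result = EnumSet.noneOf(ModuleFlags.class);
            for (ModuleFlags f : values())
                if ((v & f.value) != 0)
                    result.add(f);
            return result;
        }
    }

    public static void main(String[] args) {
        int packed = ModuleFlags.value(EnumSet.of(ModuleFlags.OPEN, ModuleFlags.MANDATED));
        System.out.println(Integer.toHexString(packed)); // 8020
        System.out.println(ModuleFlags.decode(packed));  // [OPEN, MANDATED]
    }
}
```

Because every constant owns a distinct bit, the OR never loses information, so a single int is enough to write the whole set into a class file.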
Methods
There are 57 top-level methods in Symbol, which is too many to discuss one by one. Fortunately they can be grouped into categories.
Flag requests
The flags_field instance variable holds a 64-bit value (type long). Its use is to hold bits not only for every possible modifier but also for properties of Symbols that the Java compiler discovers during the compilation process. The typical modifier flags, living in the lowest 12 bits, correspond to keywords like FINAL, TRANSIENT and ABSTRACT. In the higher bits you find flags like UNATTRIBUTED, ACYCLIC, EFFECTIVELY_FINAL and SYSTEM_MODULE.
These definitions, i.e. which bit stands for which property, are found in the Flags class, which lives in the same package as Symbol. The flag values are declared as static final int constants (or long constants for the higher bits), like this:
public static final int TRANSIENT = 1<<7;
In Symbol these values are imported like this:
import static com.sun.tools.javac.code.Flags.*;
This means that the flag constants can be used directly in the Symbol code, which results in typical methods like this:
public boolean isAbstract() {
    return (flags_field & ABSTRACT) != 0;
}
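I do not know exactly how to work with bit operators, but ChatGPT explained that this method checks whether the ABSTRACT flag (1<<10, the 11th bit) is set. The bitwise AND (&) keeps only the bits that are set in both operands, and ABSTRACT has exactly one bit set, so the result is non-zero exactly when that bit is set in flags_field. I counted 14 methods that rely (at least partially) on this flag principle.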
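The flag principle boils down to three bit operations: OR to set a flag, AND to test it, and AND with the complement to clear it. A small self-contained sketch; the constants use the same bit positions as javac's Flags class, but the class FlagBits and its set/clear helpers are made up for illustration.

```java
public class FlagBits {
    // Bit positions as in javac's Flags class (the lowest bits are the modifiers).
    static final int PUBLIC   = 1 << 0;
    static final int STATIC   = 1 << 3;
    static final int FINAL    = 1 << 4;
    static final int ABSTRACT = 1 << 10;

    // Same test as Symbol.isAbstract(): is the ABSTRACT bit set?
    static boolean isAbstract(long flags) {
        return (flags & ABSTRACT) != 0;
    }

    // Set a flag with OR, clear it with AND-NOT.
    static long set(long flags, long flag)   { return flags | flag; }
    static long clear(long flags, long flag) { return flags & ~flag; }

    public static void main(String[] args) {
        long flags = PUBLIC | ABSTRACT;                    // 1 + 1024 = 1025
        System.out.println(Long.toBinaryString(flags));    // 10000000001
        System.out.println(isAbstract(flags));             // true
        System.out.println((flags & FINAL) != 0);          // false
        flags = set(flags, STATIC);                        // 1033
        System.out.println((flags & STATIC) != 0);         // true
        flags = clear(flags, ABSTRACT);                    // 9
        System.out.println(isAbstract(flags));             // false
        System.out.println(Integer.numberOfTrailingZeros(ABSTRACT)); // 10
    }
}
```

The binary string makes the "11th bit" visible: the leftmost 1 is ABSTRACT, the rightmost 1 is PUBLIC.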
Scope-related methods
Symbol is related to Scope, and Scope is a class from the same package. This is what its Javadoc says:
A scope represents an area of visibility in a Java program. The Scope class is a container for symbols which provides efficient access to symbols given their names. Scopes are implemented as hash tables with “open addressing” and “double hashing”. Scopes can be nested. Nested scopes can share their hash tables.
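To make "open addressing" and "double hashing" a bit more concrete, here is a toy table in that spirit. It is a simplified sketch with hypothetical names, not javac's actual Scope implementation (which also handles shadowing and nested scopes): on a collision the lookup probes other slots of the same array instead of chaining, and the probe step comes from a second hash of the key.

```java
public class ToyScope {
    private final String[] names = new String[16];   // capacity must be a power of two
    private final Object[] symbols = new Object[16];
    private int size;

    // Start slot from the hash, step size from a second hash. The step is
    // forced to be odd, so with a power-of-two table every slot gets visited.
    private int slot(String name) {
        int mask = names.length - 1;
        int h = name.hashCode();
        int i = h & mask;
        int step = ((h >>> 16) << 1 | 1) & mask;
        while (names[i] != null && !names[i].equals(name)) {
            i = (i + step) & mask;   // open addressing: try another slot in the same array
        }
        return i;
    }

    public void enter(String name, Object sym) {
        int i = slot(name);
        if (names[i] == null) {
            if (size == names.length - 1)   // keep one slot free so lookups always terminate
                throw new IllegalStateException("toy table is full");
            names[i] = name;
            size++;
        }
        symbols[i] = sym;
    }

    public Object lookup(String name) {
        int i = slot(name);
        return names[i] == null ? null : symbols[i];
    }

    public static void main(String[] args) {
        ToyScope scope = new ToyScope();
        scope.enter("count", "VarSymbol count");
        scope.enter("run", "MethodSymbol run");
        System.out.println(scope.lookup("run"));     // MethodSymbol run
        System.out.println(scope.lookup("missing")); // null
    }
}
```

The toy table never grows; a real implementation would resize once it gets too full, because long probe sequences make lookups slow.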
There are methods that tell you either which scope the Symbol lives in, or in which class or package your Symbol resides. The latter requires walking up the chain of owners until the first class or package is reached. Here is the method that returns the innermost class enclosing a Symbol (which might be an inner class, or the Symbol itself if it is a class):
public ClassSymbol enclClass() {
    Symbol c = this;
    while (c != null &&
           (!c.kind.matches(KindSelector.TYP) || !c.type.hasTag(CLASS))) {
        c = c.owner;
    }
    return (ClassSymbol)c;
}
This method returns the outermost class that encloses the symbol. It walks up the owner chain until it reaches the package, and returns the symbol it visited just before that:
public ClassSymbol outermostClass() {
    Symbol sym = this;
    Symbol prev = null;
    while (sym.kind != PCK) {
        prev = sym;
        sym = sym.owner;
    }
    return (ClassSymbol) prev;
}
This one gives the package (it is spelled packge because package is a reserved word in Java):
public PackageSymbol packge() {
    Symbol sym = this;
    while (sym.kind != PCK) {
        sym = sym.owner;
    }
    return (PackageSymbol) sym;
}
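The three methods above all walk the same owner chain and differ only in where they stop. A toy model makes that easy to try out; Sym and the Kind enum here are hypothetical stand-ins for javac's Symbol and Kinds.Kind, and enclClass() only checks the kind, whereas javac also checks that the type is a class type (to skip type variables).

```java
public class OwnerChain {
    enum Kind { PCK, TYP, MTH, VAR }   // hypothetical subset of Kinds.Kind

    // Hypothetical stand-in for Symbol: just a kind, a name and an owner.
    static final class Sym {
        final Kind kind;
        final String name;
        final Sym owner;

        Sym(Kind kind, String name, Sym owner) {
            this.kind = kind;
            this.name = name;
            this.owner = owner;
        }

        // Innermost class, possibly this symbol itself.
        Sym enclClass() {
            Sym c = this;
            while (c != null && c.kind != Kind.TYP) {
                c = c.owner;
            }
            return c;
        }

        // The last symbol visited before reaching the package.
        Sym outermostClass() {
            Sym sym = this;
            Sym prev = null;
            while (sym.kind != Kind.PCK) {
                prev = sym;
                sym = sym.owner;
            }
            return prev;
        }

        // Keep walking until the package itself.
        Sym packge() {
            Sym sym = this;
            while (sym.kind != Kind.PCK) {
                sym = sym.owner;
            }
            return sym;
        }
    }

    public static void main(String[] args) {
        // package p; class Outer { class Inner { void m() { int x; } } }
        Sym p = new Sym(Kind.PCK, "p", null);
        Sym outer = new Sym(Kind.TYP, "Outer", p);
        Sym inner = new Sym(Kind.TYP, "Inner", outer);
        Sym m = new Sym(Kind.MTH, "m", inner);
        Sym x = new Sym(Kind.VAR, "x", m);

        System.out.println(x.enclClass().name);      // Inner
        System.out.println(x.outermostClass().name); // Outer
        System.out.println(x.packge().name);         // p
    }
}
```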
Other methods, ones that we already encountered in the discussion of Element, are getEnclosingElement() and getEnclosedElements(). The latter has implementations for just a few Symbol types (ClassSymbol is the most obvious).
Furthermore there are three methods starting with ‘is’ and returning a boolean, namely isSubClass(Symbol base, Types types), isMemberOf(TypeSymbol clazz, Types types) and isEnclosedBy(ClassSymbol clazz). The first of these methods is only implemented for ClassSymbol.
ElementKind-related methods
There is an enum ElementKind in javax.lang.model.element and there is a class named Kinds in com.sun.tools.javac.code. As the package names suggest, the latter is more obscure and less friendly to use, being part of a non-exported package. The former is part of a public API, and it is the one I’ll discuss first.
Enum ElementKind contains the following values:
PACKAGE,
ENUM,
CLASS,
ANNOTATION_TYPE,
INTERFACE,
ENUM_CONSTANT,
FIELD,
PARAMETER,
LOCAL_VARIABLE,
EXCEPTION_PARAMETER,
METHOD,
CONSTRUCTOR,
STATIC_INIT,
INSTANCE_INIT,
TYPE_PARAMETER,
OTHER,
RESOURCE_VARIABLE,
MODULE,
RECORD,
RECORD_COMPONENT,
BINDING_VARIABLE;
As you see, these ElementKind values are quite straightforward. The enum also contains a few methods like isClass() that return true if the value belongs to a certain group of values. For example, isClass() returns true for CLASS, ENUM and RECORD.
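Since ElementKind is public API (module java.compiler), the grouping methods are easy to try out. A small sketch, with the made-up class name KindGroups, that sorts every kind into one of the coarse groups using isClass(), isInterface() and isField():

```java
import javax.lang.model.element.ElementKind;

public class KindGroups {
    // Puts an ElementKind into one of the coarse groups offered by the enum itself.
    static String groupOf(ElementKind k) {
        if (k.isClass()) return "class";          // CLASS, ENUM (and RECORD on JDK 16+)
        if (k.isInterface()) return "interface";  // INTERFACE, ANNOTATION_TYPE
        if (k.isField()) return "field";          // FIELD, ENUM_CONSTANT
        return "other";
    }

    public static void main(String[] args) {
        for (ElementKind k : ElementKind.values()) {
            System.out.println(k + " -> " + groupOf(k));
        }
    }
}
```

The exact list that gets printed depends on the JDK version, since newer releases added values such as RECORD and BINDING_VARIABLE.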
The important method returning the ElementKind of a Symbol is getKind(). This method is overridden in, among others, the static inner classes ClassSymbol, VarSymbol and MethodSymbol. Note that each of them can cover several ElementKind values. A MethodSymbol, for example, can be of ElementKind CONSTRUCTOR, STATIC_INIT, INSTANCE_INIT or METHOD. A ClassSymbol can be of ElementKind ANNOTATION_TYPE, INTERFACE, ENUM, RECORD or CLASS.
Kinds- and Kind-related methods
Kinds is a class in com.sun.tools.javac.code that carries the same user disclaimer: code can change without notice. It contains a more precise categorization of all the different kinds of symbols than the more general ElementKind.
The Kinds class is hard to understand; its complex structure suggests that the makers wanted to be able to create very specific labels for different kinds of Symbols. The class is structured as follows (bodies omitted):
public class Kinds {
    private Kinds() {} // private constructor, uninstantiable

    public enum Kind { /* ... */ }
    public static class KindSelector { /* ... */ }
    public enum KindName implements Formattable { /* ... */ }

    // these methods map a member reference mode, a Symbol or a Type to a KindName
    public static KindName kindName(MemberReferenceTree.ReferenceMode mode) { /* ... */ }
    public static KindName kindName(Symbol sym) { /* ... */ }
    public static KindName typeKindName(Type t) { /* ... */ }
}
This Kinds class with its inner enums and inner static class is imported in Symbol in such a way that any value defined is available without prefix. These are the import statements:
import com.sun.tools.javac.code.Kinds.Kind;
import static com.sun.tools.javac.code.Kinds.*;
import static com.sun.tools.javac.code.Kinds.Kind.*;
Every Symbol has a field public Kind kind; that stores its kind. There is no getter for it, nor is there any method that returns a Kind or KindName object. So, strictly speaking, this little section on ‘Kinds- and Kind-related methods’ has nothing to show.
On the other hand, <Symbol>.kind is ubiquitous in the method bodies of Symbol. The Kinds class plays a big role, and the precise definition of which kinds of symbols exist seems to have been extremely useful to the makers. Below are two examples of methods using kind:
public Symbol location() {
    if (owner.name == null || (owner.name.isEmpty() &&
            (owner.flags() & BLOCK) == 0 &&
            owner.kind != PCK &&
            owner.kind != TYP)) {
        return null;
    }
    return owner;
}

public boolean isStatic() {
    return
        (flags() & STATIC) != 0 ||
        (owner.flags() & INTERFACE) != 0 && kind != MTH &&
         name != name.table.names._this;
}
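The isStatic() expression relies on Java's operator precedence: && binds tighter than ||, so it reads as "explicitly static, or (owned by an interface and not a method and not named this)". The second part covers members that are implicitly static, such as fields and member types declared in an interface. A minimal sketch of the same rule, using a hypothetical Kind enum and plain parameters instead of a real Symbol; it compares the name with equals(), where javac compares interned Name objects by reference.

```java
public class IsStaticRule {
    enum Kind { VAR, MTH, TYP }            // hypothetical subset of Kinds.Kind
    static final long STATIC = 1 << 3;     // same bit positions as javac's Flags
    static final long INTERFACE = 1 << 9;

    // Same shape as Symbol.isStatic(); the extra parentheses only make the
    // default precedence visible, they do not change the meaning.
    static boolean isStatic(long flags, long ownerFlags, Kind kind, String name) {
        return (flags & STATIC) != 0 ||
               ((ownerFlags & INTERFACE) != 0 && kind != Kind.MTH && !name.equals("this"));
    }

    public static void main(String[] args) {
        System.out.println(isStatic(0, INTERFACE, Kind.VAR, "MAX"));  // true: interface field
        System.out.println(isStatic(0, INTERFACE, Kind.TYP, "Node")); // true: interface member type
        System.out.println(isStatic(0, INTERFACE, Kind.MTH, "run"));  // false: interface method
        System.out.println(isStatic(STATIC, 0, Kind.MTH, "main"));    // true: declared static
        System.out.println(isStatic(0, 0, Kind.VAR, "count"));        // false: instance field
    }
}
```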
Name-related methods
Several general methods return the name of the Symbol or are otherwise related to the name field. These methods do not return a String but a Name object. Name lives in package com.sun.tools.javac.util, which is not exported. The Java compiler works with Name instead of String for speed and memory efficiency.
The principle behind Name is that javac creates a Name.Table object (implemented by SharedNameTable) that stores a byte array: all encountered names glued together into one long array of character bytes. A Name object is a reference object that holds three values: the Table instance (every Context has one Table), a start index into the byte array, and a length. The latter two are stored as ints.
This construction speeds everything up, largely because the Table makes sure that two Name objects with the same character sequence are in fact the same object. Comparing two names is therefore a simple reference comparison (==), which is much cheaper than comparing byte arrays or Strings by content.
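A toy version of the idea, with hypothetical class names and deliberately simplified: names are appended to one shared byte array, a Name is only an (index, length) window into it, and asking for the same characters twice returns the very same Name object, so == works as an equality test. For brevity the lookup uses a HashMap keyed by String; javac's SharedNameTable does not do that but keeps its own hash table over the bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ToyNameTable {
    private byte[] bytes = new byte[64];   // all names glued together
    private int used = 0;
    private final Map<String, Name> interned = new HashMap<>();

    // A name is just a window (index, length) into the shared byte array.
    public final class Name {
        final int index;
        final int length;

        private Name(int index, int length) {
            this.index = index;
            this.length = length;
        }

        @Override
        public String toString() {
            return new String(bytes, index, length, StandardCharsets.UTF_8);
        }
    }

    public Name fromString(String s) {
        Name existing = interned.get(s);
        if (existing != null) return existing;   // same characters -> same object
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        if (used + b.length > bytes.length)
            bytes = Arrays.copyOf(bytes, Math.max(bytes.length * 2, used + b.length));
        System.arraycopy(b, 0, bytes, used, b.length);
        Name n = new Name(used, b.length);
        used += b.length;
        interned.put(s, n);
        return n;
    }

    public static void main(String[] args) {
        ToyNameTable table = new ToyNameTable();
        Name a = table.fromString("value");
        Name b = table.fromString("value");
        Name c = table.fromString("field");
        System.out.println(a == b);   // true: interned
        System.out.println(a == c);   // false
        System.out.println(c.index + " " + c.length + " " + c); // 5 5 field
    }
}
```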
Name is not a unique identifier; only Symbol is. Two Symbol objects can have the same name, and in that case Name cannot distinguish them. Nevertheless, comparing Names works as a quick first check, because two Symbols cannot be the same if their names differ.
Besides Name (implemented by SharedNameTable.NameImpl) and Name.Table (implemented by SharedNameTable; note the mirrored nesting: Table is nested in Name, while the Name implementation is nested in the Table implementation) there is the Names class. Names functions as a factory for Name objects. When javac encounters a name, it lets Names sort out whether that name already exists or whether a new one has to be created. Names furthermore stores a list of well-known names like ‘field’ and ‘value’ as fields. Having these predefined means javac can compare against them directly, as in name != name.table.names._this in isStatic() above, without looking them up first.
Anyhow, the general methods in Symbol related to Name are:
public Name getSimpleName() {
    return name;
}

public Name flatName() {
    return getQualifiedName();
}

public Name getQualifiedName() {
    return name;
}

public boolean isAnonymous() {
    return name.isEmpty();
}
Static inner class ClassSymbol overrides these methods. Naming of classes is a more delicate topic, which is illustrated by the fact that ClassSymbol has not one but two fields for its name:
public Name fullname; // FQN. Uses dots for packages and for nesting.
public Name flatname; // Uses dots for packages, $ for nested classes.
fullname is the familiar notation, with just dots. flatname exists because of nested classes: class file names have no real notion of nesting, since every class, including nested, local and anonymous ones, ends up in its own class file with a flat binary name that uses ‘$’. For example, the fullname java.util.Map.Entry has the flatname java.util.Map$Entry. Java needs the flatname notation for local and anonymous classes too, otherwise it would not be able to identify them uniquely.
This is what ChatGPT summarizes:
fullname preserves Java’s lexical nesting semantics, while flatname encodes the JVM’s flat classfile reality — and javac needs both all the time.
Its methods:
public String className() {
    if (name.isEmpty())
        return
            Log.getLocalizedString("anonymous.class", flatname);
    else
        return fullname.toString();
}

@Override @DefinedBy(Api.LANGUAGE_MODEL)
public Name getQualifiedName() {
    return isUnnamed() ? fullname.subName(0, 0) /* empty name */ : fullname;
}

@Override @DefinedBy(Api.LANGUAGE_MODEL)
public Name getSimpleName() {
    return name;
}

public Name flatName() {
    return flatname;
}
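The two notations are also visible at run time through reflection, which makes the difference easy to check without javac internals: Class.getName() returns the binary name (with ‘$’, comparable to flatname) and Class.getCanonicalName() returns the dotted name (comparable to fullname). This is only an analogy, not a claim about how javac computes its two fields.

```java
import java.util.Map;

public class FlatNames {
    public static void main(String[] args) {
        // Binary name, comparable to flatname: '$' marks the nesting.
        System.out.println(Map.Entry.class.getName());          // java.util.Map$Entry
        // Canonical name, comparable to fullname: dots only.
        System.out.println(Map.Entry.class.getCanonicalName()); // java.util.Map.Entry

        Class<?> anon = new Object() {}.getClass();
        System.out.println(anon.getSimpleName().isEmpty());     // true: no source-level name
        System.out.println(anon.getCanonicalName());            // null
        System.out.println(anon.getName().contains("$"));       // true, e.g. FlatNames$1
    }
}
```

For the anonymous class only the ‘$’ form exists, which matches className() above falling back to flatname when the name is empty.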
Static inner classes of Symbol
Symbol has static inner classes for different kinds of Symbol. There are 16 of them, some more important than others. They form an inheritance tree in which the more general classes sit above the more concrete implementations. This is that tree:
Symbol
|
|-TypeSymbol
| |-TypeVariableSymbol
| |-ModuleSymbol
| |-PackageSymbol
| | |-RootPackageSymbol
| |
| |-ClassSymbol
|
|-VarSymbol
| |-RecordComponent
| |-BindingSymbol
| |-DynamicVarSymbol
|
|-MethodSymbol
|-DynamicMethodSymbol
|-MethodHandleSymbol
|-OperatorSymbol
The symbol types used most often are PackageSymbol, ClassSymbol, VarSymbol and MethodSymbol. Each of them has a substantial number of methods, with ClassSymbol topping them all. I am not going to discuss them all here; that is something for another post.