Typing in original Clojure(JVM/CLR)
Typing in original Clojure(JVM/CLR)
Interface Expr
provides two methods of relevance:
bool HasClrType { get; }
Type ClrType { get; }
Calling ClrType
when HasClrType
is false can result in an exception being thrown.
The obvious ones
Starting with the Expr
subtypes that are pretty obvious.
First up: Subtypes of LiteralExpr
:
Type | HasClrType |
ClrType |
Comment |
---|---|---|---|
BooleanExpr |
true | bool |
|
ConstantExpr |
see below | see below | |
KeywordExpr |
true | Keyword |
|
NilExpr |
true | null |
|
NumberExpr |
true | int long double |
Depends on the value. These are the only options allowed. |
StringExpr |
true | String |
Note that NilExpr
is the only expression type that has ClrType
== null
.
ConstantExpr
is a bit more complicated. It holds a value, call it v
.
If v
is one of APersistentMap
, APersistentSet
or APersistentVector
, then HasClrType
is true and ClrType
is the corresponding one of those three types. Otherwise, we use the type of v
provided it is IsPublic
or IsNestedPublic
or is an instance of Type
. For that last condition, we have in code
typeof(Type).IsInstanceOfType(v)
This relates to the fact that, e.g., the type of System.Int64
is RuntimeType
and RuntimeType
is not public. Or so it says.
There are three types that encapsulate values of our significant collections types.
Type | HasClrType |
ClrType |
Comment |
---|---|---|---|
MapExpr |
true | IPersistentMap |
|
VectorExpr |
true | IPersistentVector |
|
SetExpr |
true | IPersistentSet |
To these we can add:
Type | HasClrType |
ClrType |
Comment |
---|---|---|---|
EmptyExpr |
true | IPersistentList IPersistentMap IPersistentSet IPersistentVector |
Depends on the value we want the ‘empty’ of |
Some random examples that are still pretty obvious:
Type | HasClrType |
ClrType |
Comment |
---|---|---|---|
DefExpr |
true | Var |
|
ImportExpr |
false | throws | Is this obvious? Why not null? |
NewExpr |
true | The type we are constructing an instance of | |
TheVarExpr |
true | Var |
|
UntypedExpr |
false | throws | Includes MonitorEnterExpr , MonitorExitExpr , ThrowExpr |
UnresolvedVarExpr |
false | throws |
Expressions that pass through the type of their subexpression
Some Expr
types merely pass along the subtypes of one of their subexpressions.
Type | HasClrType |
ClrType |
Comment |
---|---|---|---|
AssignExpr |
Val.HasClrType |
Val.ClrType |
Val is the expression being assigned to the target |
BodyExpr |
LastExpr.HasClrType |
LastExpr.ClrType |
LastExpr is the last expression in the body |
CaseExpr |
returnType is not null |
returnType |
returnType is not null if all case branches have the same return type |
LetExpr LetFnExpr |
Body.HasClrType |
Body.ClrType |
Body is the body of the let / letfn expression |
MetaExpr |
expr.HasClrType |
expr.ClrType |
expr is the expression being wrapped |
TryExpr |
_tryExpr .HasClrType |
_tryExpr .ClrType |
_tryExpr is the expression being wrapped not sure why the catch clauses are not examined |
Recur and conditionsals
One of my favorite classes in the ClojureCLR code is Recur
:
public static class Recur
{
public static readonly Type RecurType = typeof(Recur);
}
This class exists only to provide its own type. The value Recur.RecurType
is the type of a recur
expression:
Type | HasClrType |
ClrType |
Comment |
---|---|---|---|
RecurExpr |
true | Recur.RecurType |
A recur
expression does not have a value. It is looping construct, basically a go-to with local variable assignments. Nothing can follow it in the flow of control. If it is in a do
, for example, it can only occur as the last expression. This is essentially a tail call position.
The only place where Recur.RecurType
is used, other than RecurExpr
, is in IfExpr
. When does an IfExpr
have a type? Both its thenExpr
and its elseExpr
must have a type. and the types must be ‘compatible’. The types are compatible if:
- They are equal.
- One of them is
Recur.RecurType
- One of them is
null
and the other is not a value type (on the JVM, a primitive types)
Tags
For most of the remaining expression types, a type can be derived in two ways. One is from an analysis of its constituents, things like subexpressions or method information. The other is for the expression to have a tag. Tags always override internal analysis.
The tags usually are interpreted by the method HostExpr.TagToType
. There are several possibilities.
- the tag is a
Symbol
without a namespace:- We check if it is in the group of special names:
int
,long
,ints
,longs
, etc. - We check if it is mapped to a type in the current namespace.
- We check if it is in the group of special names:
- the tag is a
Symbol
with a namespace: We check if is an array type:int/5
orString/1
, for example. - the tag is a type: We use the type.
- If the tag is a
Symbol
with no namespace or a string, we try to look up the type (according to what’s appropriate for JVM vs CLR).
In the code for tagged forms, we typically see things like:
public override bool HasClrType => _tag != null || _tinfo != null;
public override Type ClrType => _tag != null ? HostExpr.TagToType(_tag) : _tinfo.FieldType;
Sometimes there is nothing besides the tag, with these forms corresponding simplified.
In the discussion below, I’ll just say hasTag
or tagType
to convey the notion.
Some tagged expressions
Let’s do some of the simpler tagged expreesions first.
Type | HasClrType |
ClrType |
Comment |
---|---|---|---|
VarExpr |
hasTag | tagType | see below for where the tag might come from |
LocalBindingExpr |
hasTag or lb.HasClrType |
tagType or lb.ClrType |
lb is the local binding |
The way one gets a VarExpr
is when syntactic analysis hits a symbol that maps to Var
in the current namespace. If the symbol itself is tagged, that is used. Else, we see if the Var itself itself is tagged – this might have come from when it def
-d to begin with, the tag coming from the symbol in the def
form.
The LocalBinding
referenced in a LocalBindingExpr
, though not an Expr
, does have methods HasClrType
and ClrType
. LocalBinding.HasClrType
is exceptionally ugly. It does some caching of values that complicates the code; removing that we are left with the following:
public bool HasClrType
{
get
{
if (Init != null
&& Init.HasClrType
&& Util.IsPrimitive(Init.ClrType)
&& !(Init is MaybePrimitiveExpr))
return false;
else
return Tag != null || (Init != null && Init.HasClrType);
}
}
public Type ClrType => Tag != null ? HostExpr.TagToType(Tag) : Init.ClrType;
This has the overall format of the code for preferring tags over inferred type.
The complicated conditional in HasClrType
can be put in words as:
If there is an initializer and it has a type and the type is primitive but the initializer expression is not capable of emitting a primitive, then this local binding does not have a type.
Note that in this case whether we have a tag or not is irrelevant. Why can a tag on the local binding symbols not override the type of the initializer? I don’t know. (In the constructor for LocalBinding
, an exception is thrown if there is tag and the intializer is a MaybePrimitiveExpr
and the return type is primitive but not void.) I suppose because you are going to get a boxed primitive and there is no point in trying to pretend otherwise with a a tag.
InvokeExpr
and friends
InvokeExpr
is more complicated. The parser for InvokeExpr
is the last resort when analyzing a form that looks like (f arg1 arg2 ... )
. We’ve tried macroexanding it, seeing if f
is a Var
with an inline definition, checking for f
representing special forms that have their own handlers, such as fn*
, let*
, if
, .
, etc. If none of these work, we call InvokeExpr
.
The parser for InvokeExpr
produces several kinds of expressions, depending on the nature of f
: InstanceOfExpr
, StaticInvokeExpr
, KeywordInvokeExpr
, StaticInvokeExpr
, various kinds of interop call expressions, and InvokeExpr
itself. The details of the parser how they parser picks each of these is not so relevant here.
The various kinds of interop calls that are generated will be covered in the next section. KeywordInvokeExpr
and InstanceOfExpr
are quite simple (see table below). InvokeExpr
and StaticInvokeExpr
have the same calculation of typing information – use the tag if it exists.
The twist here is how the tag is calculated. In order of preference:
- The tag on the form itself.
- The signature tag on the
:arglists
metadata of theVar
thatf
resolves to. - The tag on the
Var
itself.
For the second item, the :arglists
metadata should be a list of signatures, each signature being a vector such as [coll x]
or [coll x & xs]
. We search the list of signatures to find a match according to the number of arguments in the call, with appropriate handling of variadic signatures. If we find a match, we take the tag on that vector, if it exists. That give us:
Type | HasClrType |
ClrType |
Comment |
---|---|---|---|
KeywordInvokeExpr |
hasTag | tagType | |
InstanceOfExpr |
true | bool |
|
InvokeExpr StaticInvokeExpr |
hasTag | tagType | Tag computed as described above |
Interop calls
The interop calls are the subtypes of HostExpr
. For typing purposes, they are all the same. Each tries to identify a method/property/field to call. If it exists, it will have a return type. If it doesn’t exist, we will be coding reflection for the call and we have no information on the call’s return type. If a tag is provided, it is used instead. The exception to this is InstanceZeroArityCallExpr
; when it is issued, we are definitely in a reflection situation, so on the tag can be used.
There is one form of interop call that is not a subtype of HostExpr
: QualifiedMethodExpr
. This comes into play when the functional form is of for the form Type/name
. If name
starts with a .
, this is intneded to be an instance method call. If the name
is new
, then a constructor call intended. Else, it is a static method call. A QME can come either in the functional position (head of an invocation) or in a value position.
If it is in a functional position, it is converted into of the subtypes of HostExpr
. In this case, any tag will be used to indicate the CLR type, but it is not clear where this is used.
| Type | HasClrType
| ClrType
| Comment |
|:——-|:————–:|:———–:|:———|
| InstanceFieldExpr
InstancePropertyExpr
InstanceMethodExpr
StaticFieldExpr
StaticPropertyExpr
StaticMethodExpr | _hasTag_ or _hasRetType_ | _tagType_ or _retType_ | |
|
InstanceZeroArityCallExpr | _hasTag_ | _tagType_ | |
|
QualifiedMethodExpr` | hasTag | tagType | |
ObjExpr
and friends
ObjExpr
– a name I do not understand – is an abstract class with concrete implementations FnExpr
and NewInstanceExpr
, the latter a name I also don’t understand. FnExpr
is used for regular IFn
functions. NewInstanceExpr
comes from deftype
and reify
. The CLR type of an FnExpr
or NewInstanceExpr
is some kind of functional type. The defaults are AFunction
and IFn
, respectively. Mostly these don’t matter, though there are places where the fact that an FnExpr
is an AFunction
comes into play, specifically where protocol implementation is involved. If a tag is provided, it is used instead, but I don’t know why.
Even odder is NewInstanceExpr
. It inherits ClrType
from ObjExpr
, which calculates its type from
- Compiled class – the class that is generated for the
deftype
orreify
form. - Tag – if it exists.
IFn
– otherwise.
I cannot find any way that ClrType
could be called with the compiled class having already been generated.
Type | HasClrType |
ClrType |
Comment |
---|---|---|---|
FnExpr |
true | tagType or AFunction |
|
NewInstanceExpr |
true | compiled-type or tagType or IFn |
Of more interest are the method classes: FnMethod
and NewInstanceMethod
, both subclasses of ObjMethod
.
These are where the actual code for the functions are located, across various arities.
ObjMethod
defines properties
public abstract Type ReturnType { get; }
public abstract Type[] ArgTypes { get; }
In FnMethod
these are implemented as
public override Type[] ArgTypes
{
get
{
if (IsVariadic && _reqParms.count() == Compiler.MaxPositionalArity)
{
Type[] ret = new Type[Compiler.MaxPositionalArity + 1];
for (int i = 0; i < Compiler.MaxPositionalArity + 1; i++)
ret[i] = typeof(Object);
return ret;
}
return Compiler.CreateObjectTypeArray(NumParams);
}
}
public override Type ReturnType
{
get
{
if (_prim != null) // Objx.IsStatic)
return _retType;
return typeof(object);
}
}
These are used to generate the signatures for the invoke methods.
Thus we have an array whose values are all typeof(Object)
– that is the typing for invokes.
The return type is typeof(object)
unless the method is a primitive method,
in which case the return type is the primitive type.
The retType
value initially is transferred from the :tag
metadata of the defining
form on the name in the defn
form. But that is the last priority.
Higher priority are the :tag
metadata on the parameter vector and the :tag
metadata on the :arglists
entry (for the signature with matching argument count).
MaybePrimitiveExpr
Some of the node types implement the MaybePrimitiveExpr
interface, defined as follows.
public interface MaybePrimitiveExpr : Expr
{
bool CanEmitPrimitive { get; }
void EmitUnboxed(RHC rhc, ObjExpr objx, CljILGen ilg);
}
Implementing this interface indicates there is a chance the expression can emit an unboxed primitive.
In the context of classic Clojure(JVM/CLR) the only primitives allowed are long
and double
. In ClojureCLR.Next, the intentions is to have a compiler mode that extends this to all value types.
The expression types that implement MaybePrimitiveExpr
are:
LocalBindingExpr |
BodyExpr |
CaseExpr |
IfExpr |
|
InstanceOfExpr |
LetExpr |
LetFnExpr |
MethodParamExpr |
|
NumberExpr |
RecurExpr |
StaticInvokeExpr |
HostExpr +subtypes |
Let us consider one example. When can BodyExpr
emit an unboxed primitive? Its code consists of code for each of the expressions in the body, with all but the last having their values discarded.
The last expression is the one that matters. If it is a MaybePrimitiveExpr
, then BodyExpr
can emit an unboxed primitive. If it is not, then BodyExpr
cannot emit an unboxed primitive.
public bool CanEmitPrimitive
{
get { return LastExpr is MaybePrimitiveExpr expr && expr.CanEmitPrimitive; }
}
What is the difference between the regular Emit
code and the EmitUnboxed
code? The former concludes with:
LastExpr.Emit(rhc, objx, ilg);
The latter concludes with
MaybePrimitiveExpr mbe = (MaybePrimitiveExpr)LastExpr;
mbe.EmitUnboxed(rhc, objx, ilg);
What expression types are not MaybePrimitiveExpr
? And can we guess why?
Type | Reason? |
---|---|
BooleanExpr |
It has a bool value, but the only primitives that count are long and double . |
ConstantExpr |
It might hold a value which is a primitive, but double or long constants will be parsed as NumberExpr , so this can’t be primitive by our definition. |
InvokeExpr |
this will call some variant of IFn.invoke which has returns a reference. |
KeywordInvokeExpr |
Does a lookup in a map, will return a reference. |
DefExpr TheVarExpr VarExpr |
Returns a Var |
KeywordExpr |
Returns a Keyword |
MetaExpr |
The wrapped expressions must be an IObj , which is a reference type. |
NewExpr |
Returns a new instance of a class, which is a reference type. |
EmptyExpr MapExpr SetExpr VectorExpr |
These are collections, which are reference types. |
StringExpr |
Returns a String |
ObjExpr FnExpr NewInstanceExpr |
These are functional types, which are reference types. |
ImportExpr MonitorEnterExpr MonitorExitExpr ThrowExpr UnresolvedVarExpr UntypedExpr |
Void or no return at all |
QualifiedMethodExpr |
Generates an FnExpr |
There are two expression types for which I’m not sure why they couldn’t be MaybePrimitiveExpr
: AssignExpr
and TryExpr
. I may have to think more.
The real essence of MaybePrimitiveExpr
becomes apparent when you look at how code is emitted, specifically, the attempts to avoid boxing for known primitive values. That is for another time and place.