Typename syntax and resolution in ClojureCLR
Wherein we describe the significant enhancements to typename syntax and resolution in ClojureCLR effective with version 1.12.3-alpha2.
TL;DR
Several significant improvements have been made to typename syntax and resolution in ClojureCLR.
- Discover and automatically load assemblies that the application depends on and assemblies that are part of the .NET runtime, so that types in those assemblies can be found without explicit assembly loading. You will never write (assembly-load "System.Text.Json") again; if you execute (System.Text.Json.JsonSerializer/Serialize ...), the assembly will be loaded automatically if not already loaded.
- You can define type aliases for any type, including generic types.
- You can use type aliases at the top level or embedded as generic type parameters.
- You can use the built-in Clojure primitive type names such as int, long, shorts, etc. as generic type parameters.
- In many places, you no longer need to include the arity of the generic type in the name.
With these changes, you can write code like this:
(add-type-alias 'List |System.Collections.Generic.List`1|)
(add-type-alias 'IntList |List[int]|)
(def il (IntList/new))
(defn f [^IntList xs] ...)
ClojureCLR typename syntax and resolution: status quo ante
I wrote previously about typename resolution in ClojureCLR in Are you my type?. That post described the strategies used to look up types by name, and some of the tradeoffs involved.
Two prominent pain points in dealing with typenames and resolving them are:
- the need to call assembly-load or related functions to load assemblies before types in them can be resolved, when the assemblies of interest could be automatically discovered and loaded.
- the need to use fully namespace-qualified names, explicit generic parameter counts, and other syntactic burdens when referring to types.
The first is self-explanatory. For the second, a poster child of the problem is
|System.Collections.Generic.Dictionary`2[System.String,System.Collections.Generic.List`1[System.Int64]]|
- Why must we write System.Int64 instead of just long?
- Why must we write System.String instead of just String? We can just write String when it is used directly as a type hint. Why not here?
- Why must we write Dictionary`2 instead of just Dictionary? We have the context to infer the arity of the generic type definition.
One has to refer directly by name to underlying platform types in various places in Clojure (JVM or CLR) code. Type hints to avoid reflection are one example. There are some places where a string can be used instead of a symbol to refer to a type, but often a symbol must be used. That presented a problem for ClojureCLR, given the complexity of CLR type names.
For a variety of reasons, I decided to use the syntax of fully-qualified type names used by the CLR itself. This is the syntax used by methods such as Type.GetType() and Assembly.GetType().
A detailed specification for fully qualified type names can be found here: Specify fully qualified type names. You might also want to take a peek at Type.AssemblyQualifiedName Property. I’ll have occasion to look at a few of its more obscure details later.
There is a non-trivial problem with that choice: the syntax uses characters such as backquotes, commas, and square brackets that are not valid in Clojure symbols. So I had to come up with a way to write a symbol using characters that the Lisp reader would not normally accept. (Any alternative syntax likely would have had the same problem.)
Other Lisps have solutions to this problem. I decided to use a simplified version of the symbol syntax used in Common Lisp. This is the |-quoting used by the Clojure Lisp reader. Read about it in Reader extension: |-quoting.
Thus we end up with the aforementioned
|System.Collections.Generic.Dictionary`2[System.String,System.Collections.Generic.List`1[System.Int64]]|
The type resolution code in ClojureCLR passes the name of this symbol directly to methods such as Type.GetType() and Assembly.GetType().
Unfortunately, this syntax is not very pleasant to write.
Aliases in the before world
Why not just use import and ns declarations to define type aliases?
Namespaces already supply a mechanism for mapping symbols to types.
Namespaces are mappings from simple (unqualified) symbols to Vars and/or Classes. Vars can be interned in a namespace, using def or any of its variants, in which case they have a simple symbol for a name and a reference to their containing namespace, and the namespace maps that symbol to the same var. A namespace can also contain mappings from symbols to vars interned in other namespaces by using refer or use, or from symbols to Class objects by using import. [ Emphasis mine. Reference: Namespaces ]
Use of the symbol-to-type map is embedded all over the Clojure interpreter/compiler code. I’ve written a little about this:
- C4: Symbolic of what? - A little digression on what symbols represent
- Are you my type?.
Every namespace comes pre-loaded with a set of type aliases for all the public types in the System namespace in assemblies that are loaded during Clojure initialization. This is why you have been able to write
(.ToLower ^String s)
There is an entry in the namespace map associating String with the type System.String. That mapping is found when the type hint ^String is processed.
Clojure provides a mechanism for users to define type aliases: import. Though one can call import directly, it is more commonly encountered in :import clauses in ns declarations. import and (ns ... (:import ...)) can do some of what we want, but they are not tied into the underlying CLR type resolution mechanism. For example, you can write:
(ns my.stuff
(:import
(System.IO FileInfo Path)
(System.Text Encoding)))
to introduce aliases FileInfo for System.IO.FileInfo, etc.
And these will work standalone.
(defn f [file-info]
(.FullName ^FileInfo file-info))
But, before the changes described below, this would not work:
|System.Collections.Generic.List`1[FileInfo]|
The underlying CLR typename resolution algorithm is not aware of Clojure aliases; it requires fully-qualified names. Instead of FileInfo, you must write System.IO.FileInfo.
In addition, import does not do what is needed for generic types. Though you can do
(ns my.stuff
(:import (System.Collections.Generic List)))
the definition of import is that the alias is the name of the type, which is actually List`1. If you were to write
(ns my.stuff2
(:import (System.Collections.Generic |List`1[System.String]|
|List`1[System.Int64]|)))
you would get an error about defining a second alias for List`1.
Let’s turn to the improvements.
Improved assembly discovery
There are assemblies that should be checked and automatically loaded when looking up types. These include assemblies that the application itself directly depends on (they might not be loaded yet) and assemblies that are part of the .NET runtime, such as System.Collections.Concurrent.dll.
The new version of RT.classForName() (see the aforementioned Are you my type?) automatically discovers and loads these assemblies. It uses the DependencyContext class to find the entry assembly’s dependencies. It also discovers assemblies that are part of the shared .NET runtime via AppContext.GetData("TRUSTED_PLATFORM_ASSEMBLIES").
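To give a feel for the discovery step, here is a small Python model. It is not the code in RT.classForName(). It only assumes that the TRUSTED_PLATFORM_ASSEMBLIES value is a list of assembly file paths joined by the platform path separator. The function names and the namespace-prefix heuristic are hypothetical, chosen for illustration.

```python
import os

def index_platform_assemblies(tpa, sep=os.pathsep):
    """Map assembly simple names (file name minus extension) to their paths."""
    index = {}
    for path in tpa.split(sep):
        if path:
            stem = os.path.splitext(os.path.basename(path))[0]
            index.setdefault(stem, path)
    return index

def candidate_assemblies(type_name, index):
    """Yield paths whose assembly name is a namespace prefix of type_name.

    Assumed heuristic: try the longest namespace prefix first.
    """
    parts = type_name.split(".")
    for n in range(len(parts) - 1, 0, -1):
        prefix = ".".join(parts[:n])
        if prefix in index:
            yield index[prefix]

# Example with made-up paths
tpa = "/dotnet/shared/System.Text.Json.dll:/dotnet/shared/System.Collections.Concurrent.dll"
index = index_platform_assemblies(tpa, sep=":")
print(list(candidate_assemblies("System.Text.Json.JsonSerializer", index)))
```

A prefix match is only a guess at where a type lives. Assembly names and namespaces do not always line up, which is one reason a real resolver needs more than one heuristic.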
Note that the latter is not available in .NET Framework 4.6.2. I added a build of ClojureCLR for .NET Framework 4.8.1 which does have access to AppContext.GetData. If you are on Framework and want the improved typename resolution, use that build.
The algorithm uses several heuristics to identify assemblies that should be inspected. There might be some improvements in these heuristics in the future. But the current version seems to work well in practice. One user reported on the #clr channel on Clojurians Slack that they were able to remove the following assembly-load calls in one of their files.
(assembly-load "System.Net.Http")
(assembly-load "System.Web.HttpUtility")
(assembly-load "System.Text.Json")
(assembly-load "System.Collections")
(assembly-load "System.Collections.Concurrent")
(assembly-load "System.Threading.Channels")
(assembly-load "System.Threading.Tasks")
(assembly-load "System.Security.Cryptography")
Sweet.
Typename syntax and resolution – improvements
One yearns for the simplicity of just writing names along the lines of C#’s:
Dictionary<String, List<long>>
And now we have a close approximation of that in ClojureCLR. The improvements are:
- type aliases
- use of the standard built-in Clojure typenames such as int and shorts for things like generic type parameters
- automatic calculation of generic type arities (in most circumstances)
In ClojureCLR: Brave new world, we can now define aliases such as:
(alias-type Dictionary |System.Collections.Generic.Dictionary`2|)
(alias-type List |System.Collections.Generic.List`1|)
(alias-type IntList |List[int]|)
and then refer to types
|Dictionary[String, List[long]]|
|Dictionary[String, IntList]|
and create expressions such as:
(def my-list (IntList/new))
(IntList/.Add my-list 42)
I’ll describe each of the three improvements in turn.
Type aliases
There are two aspects: providing a way to define type aliases that fixes the problems with import, and integrating type aliasing into the typename resolution mechanism.
To define aliases, I have introduced a new function and a new macro. They have the same functionality, but the macro does not require quoting. The following are equivalent:
(alias-type Dictionary |System.Collections.Generic.Dictionary`2|)
(add-type-alias 'Dictionary |System.Collections.Generic.Dictionary`2|)
The second argument must evaluate to a Type object. With type aliasing fully incorporated into typename resolution, you can now write:
(alias-type List |System.Collections.Generic.List`1|) ;; List maps to the generic type definition
(alias-type IntList |List[int]|) ;; aliases List and int are both recognized
(def il (IntList/new)) ;; You can use an alias where type is expected
(defn f [^IntList xs] ...)
Note that we cannot write just
(alias-type Dictionary |System.Collections.Generic.Dictionary|)
The backquote-arity suffix is required; it is the true name of the type. In this circumstance, we do not have a context to compute the arity.
I considered ways to provide that context. C# would allow this reference
Dictionary<,>
But I’m not sure that is really better than Dictionary`2. And I very much prefer
|System.Func`6|
to
|System.Func<,,,,,>|
The need to provide backquote-arity suffixes only occurs when referring to the generic type definition itself. When you provide type arguments, the arity can be inferred, as in
|System.Collections.Generic.List[int]|
A note on the built-in special types
Clojure provides special handling for names identifying built-in primitive numeric types: int, long, shorts, etc. ClojureCLR adds a few for primitive numeric types that are unique to ClojureCLR, such as uint and sbytes.
Note that these do not use the type alias mechanism. They are special-cased to be recognized only in certain places, such as the processing of type hints. Consider:
(alias-type Long System.Int64) ;; establish an alias
Long ;; => System.Int64 -- evaluates to a type
long ;; => #object[core$long 0x28993d0 "clojure.core$long"] -- the Clojure function 'long'
(defn f [^long x] ...) ;; interprets ^long as a type hint for System.Int64
;; on the JVM, this is a type hint for the primitive numeric type 'long'
The typename resolution code will recognize int and friends when they appear in type definitions as generic type parameters only. Of course, the pre-existing usage in type hints is unaffected.
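The interplay of aliases and built-in names can be modelled in a few lines of Python. This is a string-level sketch only: in ClojureCLR an alias maps to a Type object, not to text. The function name expand, the PRIMITIVES table (deliberately partial), and the use of an alias entry for String as a stand-in for the namespace map are all assumptions made for illustration.

```python
import re

# Partial, assumed table of built-in names; the real list lives in ClojureCLR.
PRIMITIVES = {
    "int": "System.Int32",
    "long": "System.Int64",
    "double": "System.Double",
    "uint": "System.UInt32",
}

# A dotted identifier, or any other single character.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*|.", re.S)

def expand(name, aliases):
    """Expand aliases anywhere, and built-in names only inside [ ... ]."""
    out = []
    depth = 0  # current bracket nesting level
    for tok in TOKEN.findall(name):
        if tok == "[":
            depth += 1
        elif tok == "]":
            depth -= 1
        elif tok in aliases:
            # An alias body is itself a typename, so expand it recursively.
            # (No cycle detection in this sketch.)
            tok = expand(aliases[tok], aliases)
        elif depth > 0 and tok in PRIMITIVES:
            tok = PRIMITIVES[tok]
        out.append(tok)
    return "".join(out)

ALIASES = {
    "String": "System.String",  # stand-in for the namespace import map
    "Dictionary": "System.Collections.Generic.Dictionary`2",
    "List": "System.Collections.Generic.List`1",
    "IntList": "List[int]",
}

print(expand("Dictionary[String, IntList]", ALIASES))
```

Note that expand("int", ALIASES) returns "int" unchanged, mirroring the rule that built-in names are recognized only as generic type parameters.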
Note: I tried to allow expressions like |int*[]| to be used at the top level, but something way down deep in the compiler had a problem with that. I decided it wasn’t worth the effort to find a solution, at least for now. You can still use |System.Int32*[]| or define an alias.
Inferring generic type arity
When in C# you write
Dictionary<String, List<long>> mylist = [];
(assuming appropriate using directives), the compiler does a lot of work for you behind the scenes. It knows first of all that for Dictionary it is looking for a generic type with two type parameters. For that reason, it looks for a type named Dictionary`2: generic type definition names must have a suffix of a backquote and the arity of the generic.
However, to allow such overloading on generic arity at the source language level, CLS Rule 43 is defined to map generic type names to unique CIL names. That Rule states that the CLS-compliant name of a type C having one or more generic parameters, shall have a suffix of the form `n, where n is a decimal integer constant (without leading zeros) representing the number of generic parameters that C has. [Source: ECMA-335, Common Language Infrastructure (CLI), Partition II, section 9]
Looking through the namespaces mentioned in using directives, the C# compiler finds System.Collections.Generic.Dictionary`2. Similarly for List`1.
Our enhancements to typename resolution in ClojureCLR now allow you to omit the arity when generic type arguments are supplied. However, you still must supply the backquote-arity in the name when no generic type arguments are given, i.e., when you are referring to the generic type definition itself.
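The arity rule is mechanical enough to model in Python: count the top-level type arguments inside the brackets and append a backquote and that count if it is not already there. This is a sketch under simplifying assumptions, not the ClojureCLR implementation. The function names are hypothetical, and it ignores nested types and assembly-qualified type arguments.

```python
import re

def split_top_level(s):
    """Split s on commas that are not inside brackets."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts

def matching_bracket(s, i):
    """Return the index of the ']' that closes the '[' at index i."""
    depth = 0
    for j in range(i, len(s)):
        if s[j] == "[":
            depth += 1
        elif s[j] == "]":
            depth -= 1
            if depth == 0:
                return j
    raise ValueError("unbalanced brackets")

def add_arity(name):
    """Insert `N before each generic argument list that lacks it."""
    out, i = [], 0
    while i < len(name):
        if name[i] != "[":
            out.append(name[i])
            i += 1
            continue
        close = matching_bracket(name, i)
        inner = name[i + 1:close]
        if inner.strip(",* ") == "":
            # Array specifier such as [], [,] or [*]: copy unchanged.
            out.append(name[i:close + 1])
        else:
            args = [add_arity(a.strip()) for a in split_top_level(inner)]
            if not re.search(r"`\d+$", "".join(out)):
                out.append("`%d" % len(args))
            out.append("[" + ",".join(args) + "]")
        i = close + 1
    return "".join(out)

print(add_arity("System.Collections.Generic.Dictionary"
                "[System.String,System.Collections.Generic.List[System.Int64]]"))
```

With no brackets there is nothing to count, which is why a reference to the generic type definition itself still needs the explicit suffix.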
Nested types and generic type definitions – Beware!
CLR supports nested classes. Referring to these is very straightforward – unless generics are involved. For a simple case such as
namespace MyNamespace;
public class Outer
{
public class Inner { }
}
you could refer to MyNamespace.Outer and MyNamespace.Outer+Inner. You could import MyNamespace.Outer and then refer just to Outer+Inner. All fine and dandy.
Now consider:
namespace MyNamespace;
public class GenParent<T1, T2>
{
public class Child
{
public class GrandChild<T3>
{
public class GreatGrandChild<T4, T5>
{
}
}
}
}
Working in C#, you could refer to any of these (assuming using MyNamespace; is in effect):
GenParent<,> // Generic type definition
GenParent<int, string> // constructed generic type
GenParent<,>.Child // nested type -- this is also a generic type definition
GenParent<,>.Child.GrandChild<> // nested type -- also a generic type definition
GenParent<int, string>.Child.GrandChild<double> // constructed generic type
// etc.
However, if you were to print the fully-qualified name of the type
GenParent<int, string>.Child.GrandChild<double>.GreatGrandChild<int, long>
you might be surprised to find that you obtain (leaving out assembly information):
GenParent`2+Child+GrandChild`1+GreatGrandChild`2[System.Int32,System.String,System.Double,System.Int32,System.Int64]
Yup. The actual generic type definition is
GenParent`2+Child+GrandChild`1+GreatGrandChild`2
and it takes five type parameters.
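The flattening can be reproduced with a short Python sketch: each nesting level contributes its own backquote-arity to the name, while all type arguments are gathered into a single list at the end. The function name clr_name and the segment representation are made up for illustration, and the namespace is left off to match the output shown above.

```python
def clr_name(segments):
    """Build a CLR-style name from (simple_name, type_args) pairs, outermost first."""
    names, args = [], []
    for simple, type_args in segments:
        # Each level carries only the arity it introduces itself.
        names.append(f"{simple}`{len(type_args)}" if type_args else simple)
        # All type arguments end up in one flat list.
        args.extend(type_args)
    name = "+".join(names)
    return f"{name}[{','.join(args)}]" if args else name

nested = [
    ("GenParent", ["System.Int32", "System.String"]),
    ("Child", []),
    ("GrandChild", ["System.Double"]),
    ("GreatGrandChild", ["System.Int32", "System.Int64"]),
]
print(clr_name(nested))
```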
We have most of the flexibility of the C# syntax in ClojureCLR, except: if you have generic type definitions, as opposed to constructed generic types (those with type arguments supplied), you must provide the backquote-arity suffixes for all generic type definitions in the nesting hierarchy. For the C# examples given above:
GenParent<,> // Generic type definition
GenParent<int, string> // constructed generic type
GenParent<,>.Child // nested type -- this is also a generic type definition
GenParent<,>.Child.GrandChild<> // nested type -- also a generic type definition
GenParent<int, string>.Child.GrandChild<double> // constructed generic type
// etc.
we can write in ClojureCLR:
|MyNamespace.GenParent`2| ;; Generic type definition, backquote-arity required
|MyNamespace.GenParent[int, String]| ;; No need for `2 here
|MyNamespace.GenParent`2+Child| ;; Nested generic type definition; must provide `2
|MyNamespace.GenParent`2+Child+GrandChild`1| ;; Nested generic type definition; must provide `2 and `1
|MyNamespace.GenParent[int,String]+Child+GrandChild[double]| ;; No need for `2 or `1 here
If you introduce type aliases, the same rules apply.
(alias-type GP |MyNamespace.GenParent`2|)
(alias-type GPC |GP+Child|) ;; we know the arity from the alias for GP
We can then refer to
|GP[int, string]| ;; constructed generic type;
|GPC[long, double]| ;; constructed generic type;
The triple identity of square brackets
When I first worked with the CLR typename syntax, one thing I found confusing was that square brackets ([ and ]) are used in three different ways:
- To delimit the list of type arguments to a generic type definition. Example: Dictionary`2[System.String,System.Int32].
- To indicate an array type. Example: System.String[] is an array of strings.
- To delimit assembly names in assembly-qualified type names.
For the last case, you do not need brackets if the assembly name applies to the top-level type name, as in
System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
The comma after System.String indicates that an assembly name follows. However, if you are supplying an assembly name for a generic type parameter, you need the brackets. Example:
System.Collections.Generic.List`1[[System.Int64, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e
which is deconstructed as
typename[ ...type argument ... ], ...assembly-specifier
(note: no brackets around the trailing assembly specifier), and where the type argument is
[typename,assembly-specifier]
(Note the brackets.)
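The comma rule can be modelled in Python by tracking bracket depth: a comma at depth zero starts the assembly specifier, while commas inside brackets separate type arguments or array dimensions. This is a minimal sketch with hypothetical function names, not a full parser for CLR type names.

```python
def split_assembly_qualified(name):
    """Split 'typename, assembly-specifier' at the first comma outside brackets."""
    depth = 0
    for i, ch in enumerate(name):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            return name[:i].strip(), name[i + 1:].strip()
    return name.strip(), None  # no assembly specifier present

def unwrap_argument(arg):
    """Handle a type argument that may be wrapped as [typename, assembly-specifier]."""
    arg = arg.strip()
    if arg.startswith("[") and arg.endswith("]"):
        return split_assembly_qualified(arg[1:-1])
    return arg, None

full = ("System.Collections.Generic.List`1"
        "[[System.Int64, System.Private.CoreLib]], System.Private.CoreLib")
print(split_assembly_qualified(full))
print(unwrap_argument("[System.Int64, System.Private.CoreLib]"))
```

The assembly specifiers here are shortened to the simple name to keep the example readable; version, culture, and public key token would simply remain part of the second element.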
I hope you never have to deal with nested generic types and assembly-qualified names.
Conclusion
I hope this has been helpful in understanding the improvements to typename syntax and resolution in ClojureCLR. I think these changes make type referencing more pleasant to use and easier to understand.