A minimal implementation of Cons
Seeing the complexity of the Clojure interface/data-structure ecosystem as we did in the last post can be a bit daunting. But if we start gently we can tease out some of the basic interactions and techniques that underlie how the real Clojure versions of these data structures are implemented.
Despite perhaps putting the scare into you with the diagram of Cons
in the last post, a simple cons cell is the place to start. A cons cell is just a node for singly-linked list – a field containing an item and a field pointing to the rest of the list, perhaps the simplest notion of sequence we can come up with. However, as we will see, we need more than just this node type to meet the needs of the Clojure interfaces.
The interfaces
To simplify our task, we will implement only the trio of interfaces that sits at the heart of Clojure’s sequence paradigm – ISeq
, IPersistentCollection
, and Seqable
.
If you need a refresher on how Clojure works with sequences, you can take a gander at Sequences. We are about to expose what lies underneath, what a data structure must implement in order to participate in actions such as interating through a sequence.
Let’s start with Seqable
:
type [<AllowNullLiteral>] Seqable =
abstract seq : unit -> ISeq
A Seqable
is anything that can produce an ISeq
to iterate over its contents. Calling seq
on a Sequable
will hand you an ISeq
, an object that represents a sequence, suitable for iteration and other sequence-y things. For some ojbects, seq
might return the object itself: its class implements ISeq
. For other objects, calling its seq()
will produce a different object to handle the iteration. Thus, something could be Seqable
without itself being an ISeq
or an IPersistentCollection
.
Side note: If you think
Sequable.seq
sounds like theseq
function in Clojure, well, yes. But the Clojureseq
can be called on many types of objects, first and foremost those that implementSequable
. However,String
does not implementSeqable
but nevertheless evaluate(seq "abcd")
and you will get back a sequence comprising the characters in the string. Clojure’sseq
is implemented as a call to a runtime library function (clojure.lang.RT.seq
if you must know).RT.seq
first checks if its argument implementsSeqable
, in which case it calls itsSeqable.seq
method. Otherwise, it runs through a bunch of special cases – strings, arrays,IEnumerable
(or the Java equivalent) – and creates appropriateISeq
objects for them. (Real inside baseball: this is where protocols could come into play. Shhhh.)
The IPersistentCollection
interface is straightforward. An object could be an IPersistentCollection
without being an ISeq
, hence these interfaces cannot be merged. But an IPersistentCollection
must be Seqable
; its essence as a collection is that you can iterate over it. It’s just that the iterator (its seq
) may be different kind of object.
and [<AllowNullLiteral>] IPersistentCollection =
inherit Seqable
abstract count : unit -> int
abstract cons: obj -> IPersistentCollection
abstract empty: unit -> IPersistentCollection
abstract equiv: obj -> bool
A quick look at the methods in IPersistentCollection
:
count()
returns a count of items in the collection – for anISeq
, that would be the number of items in the sequence from that point on.empty()
yields an empty collection of the appropriate type. (You’d have to read comments in the Clojure code to get this.) A hash map would return an empty hash map, for example. In what follows, because aCons
cannot be empty –it has an element – some other type will have to be returned.x.cons(o)
returns a newISeq
which has the itemo
first, followed by the items inx
. It is up to each type to figure out how to implement this. In actual Clojure ,PersistentList
returns a newPersistentList
.EmptyList
does also. Other types might use aCons
.equiv(o)
is used for equality checking on collections. Each collection defines its own. (Discussingequiv
and equality in Clojure will require a separate post.)
The ISeq
interface captures the essence of iteration across a sequence.
and [<AllowNullLiteral>] ISeq =
inherit IPersistentCollection
abstract first : unit -> obj
abstract next : unit -> ISeq
abstract more : unit -> ISeq
abstract cons : obj -> ISeq
The difference between next
and more
is subtle. The best place to learn a little more (no pun) is here but you’ll have to read carefully. Originlly, Clojure had just first
and rest
. When laziness was added into the picture, a redesign was needed.
Essentially, more
gives you an object that represents the rest of sequence, without actually triggering a computation to compute the next element in the sequence. next
goes to the trouble of figuring out if there is more to the sequence or not (so to speak) – it will determine if there is a next element or if we have reached the end of the sequence. more
is provided to avoid a potentially significant computation that might not be necessary. If you are doing something like iterating through a list, for example, you would use next
– you know you are going to use the next element.
From Sequences we have:
The Seq interface
( first coll)
Returns the first item in the collection. Calls seq on its argument. If coll is nil, returns nil.
( rest coll)
Returns a sequence of the items after the first. Calls seq on its argument. If there are no more items, returns a logical sequence for which seq returns nil.
The doc string for the next
function in Clojure states:
Returns a seq of the items after the first. Calls seq on its argument. If there are no more items, returns nil.
The rest
function in Clojure is essentially ISeq.more
. Note that it does not return nil
. Thus, we need something to represent an empty list that is not nil
. A cons cell is never empty. Thus, our minimum is a cons cell and ‘a logical sequence for which seq returns nil’ – an EmptyList
.
You should note that the null value in Clojure –
nil
– is not an empty list. And the empty list –()
– is notnil
.
(nil? ()) ; -> false
(nil? nil) ; -> true
(seq? ()) ; -> true
(seq? nil) ; -> false
(list? ()) ; -> true
(list? nil) ; -> false
Getting started
Rather than just jump directly into implementing SimpleCons
and SimpleEmptyList
, I’ll complicate things a bit and introduce some helper functions. We could do without these for our simple example, but this mimics how it is actually done in the Clojure code and also will set us up to implement more complicated data structures down the line. I put these helpers into a module named, of course, Util
.
Equality first.
let checkEquals o1 o2 = obj.ReferenceEquals(o1,o2) || not (isNull o1) && o1.Equals(o2)
let rec seqEquals (s1:ISeq) (s2:ISeq) =
match s1, s2 with
| null, null -> true
| null, _ -> false
| _, null -> false
| _ -> checkEquals (s1.first()) (s2.first()) && seqEquals (s1.next()) (s2.next())
checkEquals
just checks if two objects are equal. As written, it short-circuits through Object.ReferenceEquals
to both handle null values properly and avoid unnecessary calls to Equals
.
seqEquals
is prototypical example of iteration through a sequence using first
and next
. In this case, it happens to iterate through two sequences simultaneously. The recursion stops when then end of one of the sequences is reached. (And if they are not both at the end, of course they cannot be equal.)
There is a notion of equiv
(equivalent) that is separate from Equals
, mostly in how numeric values are handled. We ignore that here, but reserve a place now for elaboration in a later post.
let seqEquiv s1 s2 = seqEquals s1 s2
While we are on basics, we need to make hash codes consistent with equality for sequences.
let getHashCode (s:ISeq) =
let combine hc x = 31*hc + if isNull x then 0 else x.GetHashCode()
let rec step (s:ISeq) hc =
if isNull s then hc
else step (s.next()) (combine hc (s.first()))
step s 1
This walks the sequence and combines hash codes for each element. Finally, we need a standard way to compute the count of a sequence. (This is a slight simplification of the code in RT
.)
let seqCount (s:ISeq) =
let rec step (s:ISeq) cnt = if isNull s then cnt else step (s.next()) (cnt+1)
step s 0
Finally, let’s provide a little help for printing sequences. (Feel free to skip this on the first pass.)
let rec seqToString (s: ISeq) =
let sb = new StringBuilder()
let rec appendItem (o: obj) =
match o with
| :? Seqable as s -> appendSeq (s.seq ())
| _ -> sb.Append(o.ToString()) |> ignore
and appendSeq (s: ISeq) =
match s with
| null -> ()
| _ ->
appendItem (s.first ())
sb.Append(" ") |> ignore
appendSeq (s.next ())
sb.Append("(") |> ignore
appendSeq s
sb.Append(")") |> ignore
sb.ToString()
Again, a sequence iteration implemented by recursion. Please note this emphatically is not how printing is handled for in the Clojure code; this will serve as a placeholder for now.
The empty sequence
And now we can write our two collection classes. These are mutually recursive and so are joined by and
in the actual code. Let’s start with the simpler ‘empty sequence’.
type SimpleEmptySeq() =
override _.GetHashCode() = 1
override _.ToString() = "()"
override _.Equals(o) =
match o with
| :? Seqable as s -> s.seq() |> isNull
| _ -> false
The basic overrides are simple: Fixed hashcode value; ToString
as required by Clojure. (That ()
should look familiar.) For equality, we can only be equal to an empty sequence. As Clojure is set up, an empty sequence is something that returns nil
(= null
in .NET) when seq
is called on it. That means we need something that seq
can be called on: a Seqable
.
Yes, in Clojure, when you call
seq
on an empty sequence, you get backnil
:
(seq ()) ; -> nil
You can read all kinds of dicussions on Clojure groups about why one tests (seq s)
for various conditions – keep in mind that nil
counts as false
in boolean conditions.
On to the sequence goodies. The defining characteristic of an empty sequence object is that calling seq
on it returns null, as we just saw.
interface Seqable with
member _.seq() = null
Next, an empty list is a very simple sort of collection.
interface IPersistentCollection with
member _.count() = 0
member this.cons(o) = upcast (this:>ISeq).cons(o)
member this.empty() = upcast this
member this.equiv(o) = this.Equals(o)
The object is its own empty
. Obviously the count
is zero. equiv
equates to Equals
here.
For cons
, the definition is a bit odd. We have two different cons
es floating around: IPersistentCollection.cons
and ISeq.cons
. They have distinct return types. Because an ISeq
is an IPersistentCollection
, you will often see what have done here: implement the IPersistentCollection
version in terms of the ISeq
version, with an upcast
.
Finally,
interface ISeq with
member _.first() = null
member _.next() = null
member this.more() = upcast this
member this.cons(o) = upcast SimpleCons(o,this)
The values for first
and next
are by definition. We need to return an emtpy sequence–not null
– for more
; this object itself is such a thing. Finally, with cons
we see our entanglement with SimpleCons
.
A simple cons cell
And so on to SimpleCons
:
and type SimpleCons(head: obj, tail: ISeq ) =
The main constructor is all we need, along with two (immutable) fields to hold the values. For the Object
overrides, we can defer mostly to our utility functions:
override this.GetHashCode() = Util.getHashCode this
override this.ToString() = Util.seqToString this
override this.Equals(o) =
match o with
| :? Seqable as s -> Util.seqEquals (this:>ISeq) (s.seq())
| _ -> false
We will compare affirmatively only to other sequences, and they must have equal elements in order.
interface Seqable with
member this.seq() = upcast this
Because SimpleCons
implements ISeq
, the object can be its own seq
.
interface IPersistentCollection with
member _.count() = 1 + Util.seqCount tail
member this.cons(o) = upcast (this:>ISeq).cons(o)
member _.empty() = upcast SimpleEmptySeq()
member this.equiv(o) =
match o with
| :? Seqable as s -> Util.seqEquiv (this:>ISeq) (s.seq())
| _ -> false
As a collection, SimpleCons
has its own element plus the elements in its tail
; thus count
.
For cons
we defer to ISeq.cons
as mentioned above.
And empty
gets us the reverse entanglement with SimpleEmptySeq
.
Finally,
interface ISeq with
member _.first() = head
member this.next() = (this:>ISeq).more().seq()
member _.more() = if isNull tail then upcast SimpleEmptySeq() else tail
member this.cons(o) = upcast SimpleCons(o,this)
I hope it is not a surprise that to cons something onto the front of ourselves, we use a SimpleCons
.
The dance here between more
and next
is common.
more
cannot return null
, so it cannot just return the tail
. If the tail
is null, we must return ‘a logical sequence for which seq returns nil’: our SimpleEmptySeq
. That test for emptiness here is very specific to SimpleCons
: a null tail
.
next
is often defined in terms of more
in the manner shown here. If more
returns an empty sequence, the seq
on it will return null
, which is what next
’s contract says. And our tail
is an ISeq
, which is an IPersistentCollection
, which is an ISeq
, so the call to seq()
on the more
will always be safe.
And we are done.
Source code for these examples are available at ClojureCLR-Next repo.