Seeing the complexity of the Clojure interface/data-structure ecosystem as we did in the last post can be a bit daunting. But if we start gently we can tease out some of the basic interactions and techniques that underlie how the real Clojure versions of these data structures are implemented.
Despite perhaps putting the scare into you with the diagram of
Cons in the last post, a simple cons cell is the place to start. A cons cell is just a node for singly-linked list – a field containing an item and a field pointing to the rest of the list, perhaps the simplest notion of sequence we can come up with. However, as we will see, we need more than just this node type to meet the needs of the Clojure interfaces.
To simplify our task, we will implement only the trio of interfaces that sits at the heart of Clojure’s sequence paradigm –
If you need a refresher on how Clojure works with sequences, you can take a gander at Sequences. We are about to expose what lies underneath, what a data structure must implement in order to participate in actions such as interating through a sequence.
Let’s start with
type [<AllowNullLiteral>] Seqable = abstract seq : unit -> ISeq
Seqable is anything that can produce an
ISeq to iterate over its contents. Calling
seq on a
Sequable will hand you an
ISeq, an object that represents a sequence, suitable for iteration and other sequence-y things. For some ojbects,
seq might return the object itself: its class implements
ISeq. For other objects, calling its
seq() will produce a different object to handle the iteration. Thus, something could be
Seqable without itself being an
ISeq or an
Side note: If you think
Sequable.seqsounds like the
seqfunction in Clojure, well, yes. But the Clojure
seqcan be called on many types of objects, first and foremost those that implement
Stringdoes not implement
Seqablebut nevertheless evaluate
(seq "abcd")and you will get back a sequence comprising the characters in the string. Clojure’s
seqis implemented as a call to a runtime library function (
clojure.lang.RT.seqif you must know).
RT.seqfirst checks if its argument implements
Seqable, in which case it calls its
Seqable.seqmethod. Otherwise, it runs through a bunch of special cases – strings, arrays,
IEnumerable(or the Java equivalent) – and creates appropriate
ISeqobjects for them. (Real inside baseball: this is where protocols could come into play. Shhhh.)
IPersistentCollection interface is straightforward. An object could be an
IPersistentCollection without being an
ISeq, hence these interfaces cannot be merged. But an
IPersistentCollection must be
Seqable; its essence as a collection is that you can iterate over it. It’s just that the iterator (its
seq) may be different kind of object.
and [<AllowNullLiteral>] IPersistentCollection = inherit Seqable abstract count : unit -> int abstract cons: obj -> IPersistentCollection abstract empty: unit -> IPersistentCollection abstract equiv: obj -> bool
A quick look at the methods in
count()returns a count of items in the collection – for an
ISeq, that would be the number of items in the sequence from that point on.
empty()yields an empty collection of the appropriate type. (You’d have to read comments in the Clojure code to get this.) A hash map would return an empty hash map, for example. In what follows, because a
Conscannot be empty –it has an element – some other type will have to be returned.
x.cons(o)returns a new
ISeqwhich has the item
ofirst, followed by the items in
x. It is up to each type to figure out how to implement this. In actual Clojure ,
PersistentListreturns a new
EmptyListdoes also. Other types might use a
equiv(o)is used for equality checking on collections. Each collection defines its own. (Discussing
equivand equality in Clojure will require a separate post.)
ISeq interface captures the essence of iteration across a sequence.
and [<AllowNullLiteral>] ISeq = inherit IPersistentCollection abstract first : unit -> obj abstract next : unit -> ISeq abstract more : unit -> ISeq abstract cons : obj -> ISeq
The difference between
more is subtle. The best place to learn a little more (no pun) is here but you’ll have to read carefully. Originlly, Clojure had just
rest. When laziness was added into the picture, a redesign was needed.
more gives you an object that represents the rest of sequence, without actually triggering a computation to compute the next element in the sequence.
next goes to the trouble of figuring out if there is more to the sequence or not (so to speak) – it will determine if there is a next element or if we have reached the end of the sequence.
more is provided to avoid a potentially significant computation that might not be necessary. If you are doing something like iterating through a list, for example, you would use
next – you know you are going to use the next element.
From Sequences we have:
The Seq interface
( first coll)
Returns the first item in the collection. Calls seq on its argument. If coll is nil, returns nil.
( rest coll)
Returns a sequence of the items after the first. Calls seq on its argument. If there are no more items, returns a logical sequence for which seq returns nil.
The doc string for the
next function in Clojure states:
Returns a seq of the items after the first. Calls seq on its argument. If there are no more items, returns nil.
rest function in Clojure is essentially
ISeq.more. Note that it does not return
nil. Thus, we need something to represent an empty list that is not
nil. A cons cell is never empty. Thus, our minimum is a cons cell and ‘a logical sequence for which seq returns nil’ – an
You should note that the null value in Clojure –
nil– is not an empty list. And the empty list –
()– is not
(nil? ()) ; -> false (nil? nil) ; -> true (seq? ()) ; -> true (seq? nil) ; -> false (list? ()) ; -> true (list? nil) ; -> false
Rather than just jump directly into implementing
SimpleEmptyList, I’ll complicate things a bit and introduce some helper functions. We could do without these for our simple example, but this mimics how it is actually done in the Clojure code and also will set us up to implement more complicated data structures down the line. I put these helpers into a module named, of course,
let checkEquals o1 o2 = obj.ReferenceEquals(o1,o2) || not (isNull o1) && o1.Equals(o2) let rec seqEquals (s1:ISeq) (s2:ISeq) = match s1, s2 with | null, null -> true | null, _ -> false | _, null -> false | _ -> checkEquals (s1.first()) (s2.first()) && seqEquals (s1.next()) (s2.next())
checkEquals just checks if two objects are equal. As written, it short-circuits through
Object.ReferenceEquals to both handle null values properly and avoid unnecessary calls to
seqEquals is prototypical example of iteration through a sequence using
next. In this case, it happens to iterate through two sequences simultaneously. The recursion stops when then end of one of the sequences is reached. (And if they are not both at the end, of course they cannot be equal.)
There is a notion of
equiv (equivalent) that is separate from
Equals, mostly in how numeric values are handled. We ignore that here, but reserve a place now for elaboration in a later post.
let seqEquiv s1 s2 = seqEquals s1 s2
While we are on basics, we need to make hash codes consistent with equality for sequences.
let getHashCode (s:ISeq) = let combine hc x = 31*hc + if isNull x then 0 else x.GetHashCode() let rec step (s:ISeq) hc = if isNull s then hc else step (s.next()) (combine hc (s.first())) step s 1
This walks the sequence and combines hash codes for each element. Finally, we need a standard way to compute the count of a sequence. (This is a slight simplification of the code in
let seqCount (s:ISeq) = let rec step (s:ISeq) cnt = if isNull s then cnt else step (s.next()) (cnt+1) step s 0
Finally, let’s provide a little help for printing sequences. (Feel free to skip this on the first pass.)
let rec seqToString (s: ISeq) = let sb = new StringBuilder() let rec appendItem (o: obj) = match o with | :? Seqable as s -> appendSeq (s.seq ()) | _ -> sb.Append(o.ToString()) |> ignore and appendSeq (s: ISeq) = match s with | null -> () | _ -> appendItem (s.first ()) sb.Append(" ") |> ignore appendSeq (s.next ()) sb.Append("(") |> ignore appendSeq s sb.Append(")") |> ignore sb.ToString()
Again, a sequence iteration implemented by recursion. Please note this emphatically is not how printing is handled for in the Clojure code; this will serve as a placeholder for now.
The empty sequence
And now we can write our two collection classes. These are mutually recursive and so are joined by
and in the actual code. Let’s start with the simpler ‘empty sequence’.
type SimpleEmptySeq() = override _.GetHashCode() = 1 override _.ToString() = "()" override _.Equals(o) = match o with | :? Seqable as s -> s.seq() |> isNull | _ -> false
The basic overrides are simple: Fixed hashcode value;
ToString as required by Clojure. (That
() should look familiar.) For equality, we can only be equal to an empty sequence. As Clojure is set up, an empty sequence is something that returns
null in .NET) when
seq is called on it. That means we need something that
seq can be called on: a
Yes, in Clojure, when you call
seqon an empty sequence, you get back
(seq ()) ; -> nil
You can read all kinds of dicussions on Clojure groups about why one tests
(seq s) for various conditions – keep in mind that
nil counts as
false in boolean conditions.
On to the sequence goodies. The defining characteristic of an empty sequence object is that calling
seq on it returns null, as we just saw.
interface Seqable with member _.seq() = null
Next, an empty list is a very simple sort of collection.
interface IPersistentCollection with member _.count() = 0 member this.cons(o) = upcast (this:>ISeq).cons(o) member this.empty() = upcast this member this.equiv(o) = this.Equals(o)
The object is its own
empty. Obviously the
count is zero.
equiv equates to
cons, the definition is a bit odd. We have two different
conses floating around:
ISeq.cons. They have distinct return types. Because an
ISeq is an
IPersistentCollection, you will often see what have done here: implement the
IPersistentCollection version in terms of the
ISeq version, with an
interface ISeq with member _.first() = null member _.next() = null member this.more() = upcast this member this.cons(o) = upcast SimpleCons(o,this)
The values for
next are by definition. We need to return an emtpy sequence–not
more; this object itself is such a thing. Finally, with
cons we see our entanglement with
A simple cons cell
And so on to
and type SimpleCons(head: obj, tail: ISeq ) =
The main constructor is all we need, along with two (immutable) fields to hold the values. For the
Object overrides, we can defer mostly to our utility functions:
override this.GetHashCode() = Util.getHashCode this override this.ToString() = Util.seqToString this override this.Equals(o) = match o with | :? Seqable as s -> Util.seqEquals (this:>ISeq) (s.seq()) | _ -> false
We will compare affirmatively only to other sequences, and they must have equal elements in order.
interface Seqable with member this.seq() = upcast this
ISeq, the object can be its own
interface IPersistentCollection with member _.count() = 1 + Util.seqCount tail member this.cons(o) = upcast (this:>ISeq).cons(o) member _.empty() = upcast SimpleEmptySeq() member this.equiv(o) = match o with | :? Seqable as s -> Util.seqEquiv (this:>ISeq) (s.seq()) | _ -> false
As a collection,
SimpleCons has its own element plus the elements in its
cons we defer to
ISeq.cons as mentioned above.
empty gets us the reverse entanglement with
interface ISeq with member _.first() = head member this.next() = (this:>ISeq).more().seq() member _.more() = if isNull tail then upcast SimpleEmptySeq() else tail member this.cons(o) = upcast SimpleCons(o,this)
I hope it is not a surprise that to cons something onto the front of ourselves, we use a
The dance here between
next is common.
more cannot return
null, so it cannot just return the
tail. If the
tail is null, we must return ‘a logical sequence for which seq returns nil’: our
SimpleEmptySeq. That test for emptiness here is very specific to
SimpleCons: a null
next is often defined in terms of
more in the manner shown here. If
more returns an empty sequence, the
seq on it will return
null, which is what
next’s contract says. And our
tail is an
ISeq, which is an
IPersistentCollection, which is an
ISeq, so the call to
seq() on the
more will always be safe.
And we are done.
Source code for these examples are available at ClojureCLR-Next repo.