PersistentHashMap, part 2 -- The root
I’ll sketch the code for the root of the PersistentHashMap structure in this post. The focus is on the data structures used and the primary operations: adding a key/value pair, find the value associated with a key, and removing a key. We’ll start with the Clojure interfaces that must be implemented.
Refresher
You can refer to the first post in this series for the background on HAMTs.
Interfaces
To fit into the Clojure eco-system, we will need to implement the following interfaces. We discussed these in detail in This map is the territory.
[<AllowNullLiteral>]
type Seqable =
abstract seq: unit -> ISeq
and [<AllowNullLiteral>] IPersistentCollection =
inherit Seqable
abstract count: unit -> int
abstract cons: obj -> IPersistentCollection
abstract empty: unit -> IPersistentCollection
abstract equiv: obj -> bool
and [<AllowNullLiteral>] ISeq =
inherit IPersistentCollection
abstract first: unit -> obj
abstract next: unit -> ISeq
abstract more: unit -> ISeq
abstract cons: obj -> ISeq
[<AllowNullLiteral>]
type ILookup =
abstract valAt: key: obj -> obj
abstract valAt: key: obj * notFound: obj -> obj
[<AllowNullLiteral>]
type IMapEntry =
abstract key: unit -> obj
abstract value: unit -> obj
[<AllowNullLiteral>]
type Associative =
inherit IPersistentCollection
inherit ILookup
abstract containsKey: key: obj -> bool
abstract entryAt: key: obj -> IMapEntry
abstract assoc: key: obj * value: obj -> Associative
[<AllowNullLiteral>]
type Sequential =
interface
end
[<AllowNullLiteral>]
type Counted =
abstract count: unit -> int
[<AllowNullLiteral>]
type IPersistentMap =
inherit Associative
inherit IEnumerable<IMapEntry> // do we really need this?
inherit Counted
abstract assoc: key: obj * value: obj -> IPersistentMap
abstract assocEx: key: obj * value: obj -> IPersistentMap
abstract without: key: obj -> IPersistentMap
abstract cons: o: obj -> IPersistentMap
At the root
The code below is fairly direct translation of the original C# code, a direct translation of the original Java code from Clojure(JVM). There are ways in which the code might not be considered idiomatic F#. For example, the code uses null values as flags instead of using option types. Deal with it.
Actually, after getting this code running, I re-implemented the code at least three times to explore other choices. It was … interesting. Mostly, small variations in efficiency and readability. I thought I’d just stick with the code that’s been heavily tested in the wild.
There is one immediate consequence of using null as a flag: the internal nodes that contain keys cannot handle null itself as a key. So the root data node holds a flag and the value for the null key, if required.
PersistentHashMap private (meta: IPersistentMap,
count: int,
root: INode,
hasNull: bool,
nullValue: obj) =
inherit APersistentMap()
The PersistentHashMap class represents the map itelf. It holds any associated metadata (needed for the IObj and IMeta interface implementations). It holds the count of entries so we can implement Counted (for an efficient count method).
The hasNull flag indicates if null is present as a key in the map; the nullValue field holds the associated value.
The root field contains a pointer to the index node at the top of the tree. There are three types of nodes in the tree: BitmapIndexedNode, ArrayNode, and HashCollisionNode; each implements the INode interface. In general, operations on the map are performed on the root and generally defer to the root node to do the heavy lifting – those are the operations defined in INode.
The base class, APersistentHashMap provides some standardized implementations for things that are common to most maps and that we will not concern ourselves with here. For example, implementations of map equality, of IDictionary, etc.
We provide some convenient accessors for these fields:
member internal _.Meta = meta
member internal _.Count = count
member internal _.Root = root
member internal _.HasNull = hasNull
member internal _.NullValue = nullValue
Note that root can be null; this will be the case if the map is empty or has only the null key as an entry.
Thus we can define our default ‘empty’ map via:
static member Empty = PersistentHashMap(null, 0, null, false, null)
At this point, many interface operations are simple. Either they can be done directly or we delegate them to the root.
interface IMeta with
override _.meta() = meta
interface IObj with
override this.withMeta(m) =
if LanguagePrimitives.PhysicalEquality m meta then
this
else
PersistentHashMap(m, count, root, hasNull, nullValue)
interface Counted with
override _.count() = count
interface IPersistentCollection with
override _.count() = count
override _.empty() =
(PersistentHashMap.Empty :> IObj).withMeta (meta) :?> IPersistentCollection
// empty passes the metadata along
We start getting a bit more in the weeds as we get into detailed operations:
interface ILookup with
override this.valAt(k) = (this :> ILookup).valAt (k, null)
override _.valAt(k, nf) =
if isNull k then
if hasNull then nullValue else nf
elif isNull root then
null
else
root.find (0, NodeOps.hash (k), k, nf)
You see in the second valAt our special handling of a possible null key and the null key value.
If the root is null, there are no non-null entries are so nothing to find. else we defer the lookup to the root node.
The find method takes the level in the tree (0 at the start), the hash value for the key, the key, and the ‘not found’ value to return if the key is not in the map.
The Associative inteface is the more map-specific version of ILookup. It deals with key presence and returning IMapEntry objects (key/value pairs) instead of just the values. But the form is similar to the what we just saw.
interface Associative with
override _.containsKey(k) =
if isNull k then
hasNull
else
(not (isNull root))
&& root.find (0, NodeOps.hash (k), k, PersistentHashMap.notFoundValue)
<> PersistentHashMap.notFoundValue
override _.entryAt(k) =
if isNull k then
if hasNull then
upcast MapEntry.create (null, nullValue)
else
null
elif isNull root then
null
else
root.find (0, NodeOps.hash (k), k)
You will note a fairly standard trick: we use a special value, notFoundValue, to indicate that a key was not found. Thisis passed down to the nodes below. If it comes back, no entry for the key was found.
Other than the lookup functionality defined above, the map operations of interest as assoc, adding a new key/value pair, and without, removing a key. There are in the IPersistentMap interface. Starting with assoc:
interface IPersistentMap with
override this.assoc(k, v) =
if isNull k then
if hasNull && LanguagePrimitives.PhysicalEquality v nullValue then
upcast this
else
upcast PersistentHashMap(meta, (if hasNull then count else count + 1), root, true, v)
We being with special handling for a null key. If present, and the value matches, there is no change: just return this. Otherwise, we create new map with the null key/value pair. The count will increase or not depending on whether the null key was already present. Continuing:
else
let addedLeaf = BoolBox()
let rootToUse: INode = if isNull root then upcast BitmapIndexedNode.Empty else root
let newRoot = rootToUse.assoc (0, NodeOps.hash (k), k, v, addedLeaf)
if LanguagePrimitives.PhysicalEquality newRoot root then
upcast this
else
upcast
PersistentHashMap(
meta,
(if addedLeaf.isSet then count + 1 else count),
newRoot,
hasNull,
nullValue
)
This is already a bit in the weeds. We have to have a root to defer the operation to. If there is a value for root, we use. Otherwise we create an empty node of the appropriate type (BitmapIndexedNode – much more on this coming below). We defer the assoc operation to the root node. If the root node comes back unchanged, then the key/value pair was already present and the assoc operation was a no-op – we can return this. If the root node is different, then we have a new root node and we create a new map with that root node. The count of entries in the map will increase if the key was not already present. The special object addedLeaf is a sentinel that will be set if a new entry is added to the map. The BoolBox type is a simple mutable boolean type.
type BoolBox(init) =
let mutable value: bool = init
new() = BoolBox(false)
member _.set() = value <- true
member _.reset() = value <- false
member _.isSet = value
member _.isNotSet = not value
(There are other ways to handle this. I just used what the C#/Java versions used.)
The assocEx operation is an assoc that throws an exception if the key is present.
override this.assocEx(k, v) =
if (this :> Associative).containsKey (k) then
raise <| InvalidOperationException("Key already present")
(this :> IPersistentMap).assoc (k, v)
Finally, the without operation:
override this.without(k) =
if isNull k then
if hasNull then
upcast PersistentHashMap(meta, count - 1, root, false, null)
else
upcast this
elif isNull root then
upcast this
else
let newRoot = root.without (0, NodeOps.hash (k), k)
if LanguagePrimitives.PhysicalEquality newRoot root then
upcast this
else
upcast PersistentHashMap(meta, count - 1, newRoot, hasNull, nullValue)
There is the usual special case handling for the null key. When we do the operation on the root, getting back the same root indicates the key was not present, so removing it was a no-op. Otherwise, we have a new root and reduced count.
In the next post, we look at the structure of the INode interface and the three node types that implement it.