Persistent vectors, Part 3 -- Transiency
Clojure’s vectors, hash-maps and hash-sets support transients. We examine the implementation of transiency for PersistentVector.
Background
You should read the reference article Transient Data Structures to get the rationale and overall idea of the role of transient data structures and interaction with them. We will examine them at the implementation level rather than the Clojure API level.
In the context of PersistentVector, the functionality that we need to support is:
- Creating a transient version of an existing vector; this operation needs to be O(1).
- No operation on the transient changes the existing object.
- The transient needs to support operations such as
assocandconjin an efficient manner. These operations on a transient return another transient. However, mutation of the transients structures is allowed. Transients are not persistent. - You can create a persistent data structure from the transient; this operation needs to be O(1) also.
The requirement for constant time operations for creating transients and converting them back to persistent objects means that there needs to be a great deal of structure sharing with the original (persistent, immutable) object. Our mutating operations will have to be careful to do mutations only on copies of pieces of the data structure that they know they own exclusively.
The interfaces
As always, we start with the interfaces that define the behavior we need to implement. We start with
[<AllowNullLiteral>]
type IEditableCollection =
abstract asTransient: unit -> ITransientCollection
PersistentVector will implement this interface to create a transient version of itself.
The remaining interfaces are implemented by the transient version.
[<AllowNullLiteral>]
type ITransientCollection =
abstract conj: o: obj -> ITransientCollection
abstract persistent: unit -> IPersistentCollection
[<AllowNullLiteral>]
type ITransientAssociative =
inherit ITransientCollection
inherit ILookup
abstract assoc: key: obj * value: obj -> ITransientAssociative
[<AllowNullLiteral>]
type ITransientAssociative2 =
inherit ITransientAssociative
abstract containsKey: key: obj -> bool
abstract entryAt: key: obj -> IMapEntry
[<AllowNullLiteral>]
type ITransientVector =
inherit ITransientAssociative
inherit Indexed
abstract assocN: idx: int * value: obj -> ITransientVector
abstract pop: unit -> ITransientVector
These are versions of IPersistentCollection and relatives that are specialized to a restricted set of operations and designed so that the operations all return transient objects. Methods such as conj and assoc have their usual meanings. Of note is persistent – this is the method that returns a persistent version of the object when we desire to transition back to immutability+persistence.
Implementation
A PersistentVector has a head node with a tail array and a pointer to the root of the index tree where the bulk of the elements in the array are stored. We have a parallel for the persistence version. Compare
type PersistentVector(meta: IPersistentMap, cnt: int, shift: int, root: PVNode, tail: obj array) =
inherit APersistentVector()
member internal _.Count = cnt
member internal _.Shift = shift
member internal _.Root = root
member internal _.Tail = tail
to
type TransientVector private (_cnt, _shift,_root,_tail) =
inherit AFn()
[<VolatileField>]
let mutable cnt : int = _cnt
[<VolatileField>]
let mutable shift : int = _shift
[<VolatileField>]
let mutable root : PVNode = _root
[<VolatileField>]
let mutable tail : obj array = _tail
Where we had immutable fields for the count, etc., we now have mutable fields, allowing the head object’s data to be updated as we perform operations.
We call the IEditableCollection.asTransient() method to create a transient copy of our PersistentVector:
interface IEditableCollection with
member this.asTransient() = TransientVector(this)
This invokes the TransientVector constructor
new(v:PersistentVector) = TransientVector(
v.Count,
v.Shift,
TransientVector.editableRoot(v.Root),
TransientVector.editableTail(v.Tail))
We need to make copies of the tail and the root node so that we may mutate them as needed without affecting our original PersistentVector. Creating the tail is just a copy of small array:
static member editableTail(tl : obj array) =
let arr : obj array = Array.zeroCreate 32
Array.Copy(tl,arr,tl.Length)
arr
We need to have the same basic structure for indexing our transient version so that we can easily switch from persistent to transient and back to persistent without copying large parts of the tree. This means using the same PVNode structure we had before.
And here is where the magic happens. Our PVNode has a field in it the purpose of which is to identify whether the node can be mutated as part of the current operation, or instead needs to be copied.
Here is PVnode in its entirety:
PVNode(edit: AtomicBoolean, array: obj array) =
member _.Edit = edit
member _.Array = array
new(edit) = PVNode(edit, (Array.zeroCreate 32))
The Edit field is used to contain a token serving two purposes: to provide a value indicating being involved with the mutable operations of a particular transient root, and to indicate that transience has ended. Though there a number of ways to accomplish this, a simple solution is an AtomicBoolean. It will compare as equal only to itself. If the Edit value of a node is equal to the Edit value of the root, then we can mutate it. If not … see below.
Thus, the code to create the root node of our new TransientVector:
static member editableRoot(node:PVNode) =
PVNode(AtomicBoolean(true),
node.Array.Clone() :?> obj array)
As we perform mutating operations, we just need to check whether the node we are about to operate on is participating in this transient transaction. If not, we work on a copy instead. This is checked by EnsureEditable:
member this.ensureEditable(node:PVNode) =
if node.Edit = root.Edit then
node
else
PVNode(root.Edit,node.Array.Clone() :?> obj array)
Before we begin using one of our mutating methods, we also need to check that the root node indicates that transience is still being done, i.e., that we haven’t called persistent yet. That is done by calls to another ensureEditable. This one checks the flag in the PersistentBoolean:
member this.ensureEditable() =
if not <| root.Edit.Get() then
raise <| InvalidOperationException("Transient used after persistent! call")
TransientVector ends up duplicating much of the code of PersistentVector for operations such as conj, the difference being that mutating operations are done where possible. Consider /cons/conj specifically. Here is the PersistentVector version:
override this.cons(o) =
if cnt - this.tailoff () < 32 then
// room in the tail
let newTail = Array.zeroCreate (tail.Length + 1)
Array.Copy(tail, newTail, tail.Length)
newTail[tail.Length] <- o
PersistentVector(meta, cnt + 1, shift, root, newTail)
else
// tail is full, push into tree
let tailNode = PVNode(root.Edit, tail)
let newroot, newshift =
// overflow root?
if (cnt >>> 5) > (1 <<< shift) then
let newroot = PVNode(root.Edit)
newroot.Array[0] <- root
newroot.Array[1] <- PersistentVector.newPath (root.Edit, shift, tailNode)
newroot, shift + 5
else
this.pushTail (shift, root, tailNode), shift
PersistentVector(meta, cnt + 1, newshift, newroot, [| o |])
And the TransientVector version:
member this.conj(v) =
this.ensureEditable()
let n = cnt
if n - this.tailoff() < 32 then
// room in tail?
tail[n &&& 0x01f] <- v
cnt <- n+1
this
else
// full tail, push into tree
let tailNode = PVNode(root.Edit,tail)
tail <- Array.zeroCreate 32
tail[0] <- v
let newRoot, newShift =
if (n >>> 5) > (1 <<< shift) then
let newRoot = PVNode(root.Edit)
newRoot.Array[0] <- root
newRoot.Array[1] <- TransientVector.newPath(root.Edit,shift,tailNode)
newRoot, shift+5
else
let newRoot = this.pushTail(shift, root, tailNode)
newRoot, shift
root <- newRoot
shift <- newShift
cnt <- n+1
this
We can see mutation on the tail array and on the TransientVector’s root, shift and count.
Finally, we can move back to a persistent structure easily. Just create a head node and set your AtomicBoolean to a false value. Then we cannot do any more mutation operations through the root node. It is safe to use the root here. (We do create a new tails that is no bigger than required.)
member this.persistent() =
this.ensureEditable()
root.Edit.Set(false)
let trimmedTail : obj array = Array.zeroCreate (cnt-this.tailoff())
Array.Copy(tail,trimmedTail,trimmedTail.Length)
PersistentVector(cnt,shift,root,trimmedTail)
Create similar modifictionsfor other operations such as pop and assoc and you are done.