Reductionism

You can be a boss at reducers if you know this one weird trick!

At least at one point I had hoped that was true. It turns out that getting reducers right requires thinking it through every single time you are confronted with new one. But I think we can come up with enough guidance so that after a few examples, we won’t really need to look at the reducers in collections to come; you’ll be able to understand them and verify them yourself.

Background

If you look for ‘clojure reduce’ in your search engine of choice, you might run across Reducers. Reducers are a very useful suite of functions that should definitely be in the arsenal of any Clojure programmer, but we have a simpler aim: clojure.core/reduce.

There are two forms of reduce, with two and three parameters. The two-parameter form takes a reducing function (of two arguments) that is applied first to the first two items of the collection, then to the result of that invocation and the third item, that result and the fourth item, etc., until the end of the collection is reached and the accumulated result is returned.

(reduce '+ [1 2 3 4 5] ; =>15

Effectively it computes (((1+2)+3)+4+5).

The three-paremeter version supplies a starting value that is supplied to the reducing function along with the first item, and so on.

(reduce '+ 10 [1 2 3 4 5] ; =>25

Effectivly it computes ((((10+1)+2)+3)+4+5).

Here is the contract for reduce:

f should be a function of 2 arguments. If val is not supplied, returns the result of applying f to the first 2 items in coll, then applying f to that result and the 3rd item, etc. If coll contains no items, f must accept no arguments as well, and reduce returns the result of calling f with no arguments. If coll has only 1 item, it is returned and f is not called. If val is supplied, returns the result of applying f to val and the first item in coll, then applying f to that result and the 2nd item, etc. If coll contains no items, returns val and f is not called.

The code for reduce in core.clj is indirect:

(defn reduce
  ([f coll]
     (if (instance? clojure.lang.IReduce coll)
       (.reduce ^clojure.lang.IReduce coll f)
       (clojure.core.protocols/coll-reduce coll f)))
  ([f val coll]
     (if (instance? clojure.lang.IReduceInit coll)
       (.reduce ^clojure.lang.IReduceInit coll f val)
       (clojure.core.protocols/coll-reduce coll f val))))

A collection implements the IReduce and/or IReduceInit interfaces to provide a specialized, presumably more efficient, reduction algorithm. Otherwise, the magic of protocols is used to extend reduction to types that do not have these interfaces defined. That code is in clojure.core.protocols. The protocols aspect is not our focus here; we are interested in how to implement the interfaces in our collections.

The interfaces

And here they are:

[<AllowNullLiteral>]
type IReduceInit =
    abstract reduce: IFn * obj -> obj

[<AllowNullLiteral>]
type IReduce =
    inherit IReduceInit
    abstract reduce: IFn -> obj

IReduceInit.reduce take a function and start argument. IReduce.reduce just takes the reducing function. The missing argument is the collection itself, which is this.

Feeling reduced, but not diminished

Before we get to code, we need to have a little chat about Reduced. It figures significantly in the code we are about to write.

It is hard to find information about Reduced. I checked five books on Clojure and found nary a mention. The most prominent mention of Reduced is in the reference on Transducers.

Reduced is used to stop reductions early. From the Transducers article:

Clojure has a mechanism for specifying early termination of a reduce:

…

A process that uses transducers must check for and stop when the step function returns a reduced value (more on that in Creating Transducible Processes). Additionally, a transducer step function that uses a nested reduce must check for and convey reduced values when they are encountered.

A reduced value is literally an object of type Reduced. It just wraps a value, making it available through the deref method of the IDeref interface:

type IDeref =
    abstract deref: unit -> obj

[<Sealed>]
type Reduced(value) =
    interface IDeref with
        member _.deref() = value

You can read the Transducers article for reasons for using this. For one thing, it is the only way to run a reduction over an infinite collection – you have to send a signal that you’ve had enough.

One essential rule when writing a reduce method (for IReduce and IReduceInit): after each invocation of the reduction function, check the result to see if it is an instance of Reduced; if so, stop immediately and return the deref value.

Note: if you are actually writing transducers, you might need to be passing back the Reduced object itself. This is not our concern. Our rule is only for IReduceInit.reduce and IReduce.reduce.

Some code

I’m going to start with some of the original C# code (essentially identical to the Java code) to see the problems we have with making a translation to F#. One of the simplest comes from PersistentList:

public object reduce(IFn f)
    {
        object ret = first();
        for (ISeq s = next(); s != null; s = s.next()) { 
            ret = f.invoke(ret, s.first());
            if (RT.isReduced(ret))
                return ((IDeref)ret).deref();
        }
        return ret;
    }

public object reduce(IFn f, object start)
{
    object ret = f.invoke(start, first());
    for (ISeq s = next(); s != null; s = s.next()) {
        if (RT.isReduced(ret))
            return ((IDeref)ret).deref(); 
        ret = f.invoke(ret, s.first());
    }
    if (RT.isReduced(ret))
        return ((IDeref)ret).deref();
    return ret;
}

The contract for reduce makes these demands:

Without a start value:

a. if the collection has no items, return the result of calling f with no arguments

b. if the collection has only one item, return that item (f is not called)

c. the first application of f should be to the first and second items in the collection

d. if a call to f results in a Reduced instance, dereference it and return that value.
With a start value:

a. if the collection has no items, return the start value (f is not called)

b. the first call to f should be on the start value and the first item

c. if a call to f results in a Reduced instance, dereference it and return that value.

It might appear that requirements (1a) and (2a) are violated in the PersistentList code. And then you remember that PersistentList always has at least one member so that no check for emptiness is required. You should verify that the other conditions are met.

You can’t take that C# code and just copy it into F#. It relies on early returns out of loops, which we don’t have in F#. And we’d probably prefer to avoid mutating bindings. The technique often used is to translate to a recurive function that does the looping, which is essentially the same in our examples as using a recur loop in Clojure.

For example, take our first reduce above:

        object ret = first();
        for (ISeq s = next(); s != null; s = s.next()) { 
            ret = f.invoke(ret, s.first());
            if (RT.isReduced(ret))
                return ((IDeref)ret).deref();
        }
        return ret;

Two things change from iteration to iteration: the values of ret and s; just look for the assignments to those variables. Those become our parameters. Regular exit is when s = null – we negate the condition of loop continuation to get the condition for method termination. Early exit is done by checking for Reduced. Thus our loop can be encoded by

let rec step (ret:obj) (s:ISeq) =
    if isNull s then
        ret
    else
        match f.invoke(ret,s.first()) with
        | :? Reduced as red -> (red:>IDeref).deref()
        | nextRet -> step nextRet (s.next())

The iteration is started by calling step on arguments that set up the correct initial values for ret and s:

step (this:>ISeq).first() ((this:>ISeq).next())

The other reduce is similar

interface ReduceInit with
    member this.reduce(f,init) =
        let rec step (ret:obj) (s:ISeq) =
            if isNull s then
                ret
            else
                match ret with
                | :? Reduced as red -> (red:>IDeref).deref()
                | _ -> step (f.invoke(ret,s.first())) (s.next())
    let ret = step (f.invoke(start,(this:>ISeq).first())) (this:>ISeq>.next())
    match ret with
    | :? Reduced as red -> (red:>IDeref).deref()
    |_ -> ret

If you look closely, there is distinct difference between the two, both in the original and in the translation. For the first one, in the loop, we call f and check its value. For the second one, we check the value from the previous iteration, then call f to generate a value to pass for the next iteration. If one writes the start-value version in C# this way:

public object reduce(IFn f, object start)
{
    object ret = f.invoke(start, first());
    if (RT.isReduced(ret)) 
        return ((IDeref)ret).deref();

    for (ISeq s = next(); s != null; s = s.next()) {
        ret = f.invoke(ret, s.first());
        if (RT.isReduced(ret))
            return ((IDeref)ret).deref(); 
    }
    return ret;
}

the loop body is now the same here as in the first one. Translating ths into F#, the two versions now have identical step functions. You can move that into a method, leading to this code:

member this.recurser(acc:obj, s:ISeq) =
    if isNull s then
        ret
    else
        match f.invoke(ret,s.first()) with
        | :? Reduced as red -> (red:>IDeref).deref()
        | nextRet -> this.recurser(nextRet, (s.next()))

interface IReduce with
    member this.reduce(f) = 
        let asSeq = this:>ISeq
        this.recurser(asSeq.first(),asSeq.next())

interface IReduceInit with
    member this.reduce(f,start) =
        let asSeq = this:> ISeq
        match f.invoke(start,asSeq.first()) with
        | :? Reduced as red -> (red:>IDeref).deref()
        | acc -> this.recurser(acc,asSeq.next())

Because the start-value version does a call of f(start,first()) before we get into the loop, we must make sure to check it for Reduced before looping.

If you check carefully against our requirements, you will find that they are all met. Do not neglect to do this exercise for every reduce you write. Trust me.

Cycling

Let’s do one more. There is a cycle function in Clojure that “[r]eturns a lazy (infinite!) sequence of repetitions of the items in coll.” It just calls a factory method on a the Create class.

(cycle [1 2 3] ) ;=> (1 2 3 1 2 3 1 2 3 ...)

A simple implementation of Cycle would hold the original sequence on the side so we could start over at the beginning if we have run through all the elements. It then just needs to know the ‘current’ sequence. Calling first() on the Cycle would just call first() on the ‘current’ sequence. Calling next() on the Cycle, we’d call next() on the underlying sequence and make that result the ‘current’ sequence in a new Cycle object.

The actual implementation of Cycle works a little harder in order to more efficient, by being lazy about calling next on the underlying sequence. One does not really need to know the next() on the underlying sequence until you either call first() or next() on the cycle object. At that point you can compute next(). We will need a mutable field in our Cycle to save the ‘current’ sequence when we finally get around to computing it. This will not be visible from the outside, so Cycle is immutable to outward appearance.

It’s probably easier just to look at the code.

type Cycle private (meta:IPersistentMap, all:ISeq, prev:ISeq, c:ISeq, n:ISeq) = 
    inherit ASeq(meta)
    
    [<VolatileField>]
    let mutable current : ISeq = c   // lazily realized

    [<VolatileField>]
    let mutable next : ISeq = n  // cached
    
    private new(all,prev,current) = Cycle(null,all,prev,current,null)

    static member create(vals:ISeq) : ISeq =
        if isNull vals then
            PersistentList.Empty
        else
            Cycle(vals,null,vals)

    member this.Current() =
        if isNull current then
            let c = prev.next()
            current <- if isNull c then all else c

        current

    interface ISeq with
        override this.first() = this.Current().first()
        override this.next() =
            if isNull next then
                next <- Cycle(all,this.Current(),null)

            next

A couple of small details. If Cycle.create(s) is called with an empty sequence, we return an empty list, not a Cycle. If we have Cycle object in our hand, we are guaranteed that its base sequence is not empty. Note that both first() and next() access the ‘current’ sequence through a call to Current; that method takes care of noticing if the underlying field current is occupied – null indicates we haven’t done the work of calling next on the underlying sequence yet. When you access Current, it will do that computation and save the result. This code also handles cycling back to the beginning if we have reached the end. It’s pretty slick. (Note: the cleverness is in the Java code. I didn’t come up with it. Authorship note in that file credits Alex Miller. Little tricks to promote laziness pop up all over the place.)

On to reduce. We will need to advance through the underlying sequence to access successive items. We do not need to use Cycle.next() to do this – that would create a bunch of unnecessary Cycle items. We just need to compute on the underlying sequence directly, performing the action that is done in Cycle.Current(). The following method does this.

member this.advance(s: ISeq) =
    match s.next () with
    | null -> all         // we've hit the end, cycle back to the beginning
    | x -> x

Consider the no-start-value version of reduce. We always have items, no need to check. The sequence is infinite, so there is no end condition from the sequence. The only way out is to get a Reduce back from f. I wrote down the sequence of steps and looked for a loop point.

    acc <- first
    s <- advance from current (because we have just eaten the first element)
    Loop:
    newAcc = f.invoke(acc, s.first())
    check newAcc for Reduced -> leave
    loop with newAcc, advance(s)

How does the start-value version compare?

    acc <- start-value
    s <- current
    Loop:
    newAcc = f.invoke(acc, s.first())
    check newAcc for Reduced -> leave
    loop with newAcc, advance(s)

The loop is the same, other than how get started. Verify that the conditions (1c), (1d), (2b), and (2c) are met. (The others don’t matter.) And this goes straight to code.

member this.reducer(f: IFn, startVal: obj, startSeq: ISeq) =
    let rec step (acc: obj) (s: ISeq) =
        match f.invoke (acc, s.first ()) with
        | :? Reduced as red -> (red :> IDeref).deref ()
        | nextAcc -> step nextAcc (this.advance s)

    step startVal startSeq

interface IReduce with
    member this.reduce(f) =
        let s = this.Current()
        this.reducer (f, s.first (), this.advance (s))

interface IReduceInit with
    member this.reduce(f, v) = this.reducer (f, v, this.Current())

Side note

The only way to test Cycle.reduce is to have an IFn that at some point returns a Reduced object. The magic of F#’s object expressions comes in handy here. We can create an object directly that implements IFn. However, don’t try to do this with an object expression based on IFn directly – you’d have to have an entry for each of the almost-20 invoke methods. Instead, you can base your object expression on AFn, an abstract class that has default implementations (raising `NotImplementedException’) for all of them. Here is an extract of my test code (using Expecto for writing the tests):

let adderStopsShort n =
        { new AFn() with
            member this.ToString() = ""
        interface IFn with
            member this.invoke(x, y) =
                if Numbers.gte(y ,n:>obj) then Reduced(x) else Numbers.add(x,y) }

let iter = Cycle.create(LongRange.create(100)) :?> IReduce
Expect.equal (iter.reduce (adderStopsShort 10)) 45L "Add them up for a while"
Expect.equal (iter.reduce ((adderStopsShort 10),100L)) 145L "Add them up for a while, with a kicker"

Our cycle is based on a 100-element LongRange. addStopsShort called on a stop value yields an IFn with this behavior: When the second argument reaches the stop value, it returns the current value of the accumulator wrapped in a Reduced; otherwise, it is just +.

(The override of ToString in the object expression is necesary. It seems you can’t just override an inteface only.)

And with that, let’s quit.

Behind the scenes

What I’ve not talked about is all the machinery behind the CollReduce protocol. That all lies out in the Clojure source code and is not our present concern. Mostly. I did have to dig into it to solve one problem. There is a reduce method in ArrayChunk. That actually is the reduce method for the IChunk interface. (See previous post Laziness and Chunking The reduce method in ArrayChunk does stop early when it gets a Reduced object back from the reducer function, but it returns the Reduced object, not the wrapped value. I struggled with this for a while until finally getting set on the correct track by Alex Miller over in the #clr-dev channel in the Clojurian slack. First is to note that this reduce is for IChunk. Then you have to figure out where it gets called from. And that’s where the protocol comes in. reduce will through the CollReduce protocol, which in this case will end up going through the InternalReduce protocol, wherein we find a handler for IChunkedSeq:

  clojure.lang.IChunkedSeq
  (internal-reduce
   [s f val]
   (if-let [s (seq s)]
    (if (chunked-seq? s)
       (let [ret (.reduce (chunk-first s) f val)]
         (if (reduced? ret)
           @ret
           (recur (chunk-next s)
                  f
                  ret)))
       (interface-or-naive-reduce s f val))
	 val))

It is this handler that calls Chunk.reduce. It notes the returned Reduced value, stops the iteration, and does the deref. If ArrayChunk did the deref, this handler woudn’t know to stop.

My head hurts.

End note

If you want to get a sense of the history of reduce, reducers, and transducers, check out the Clojure change log. These things take time to develop. Changes sometimes work through the code slowly. clojure.lang.Reduced was introduced in 2012 and incorporated into some of the reduce methods at that time. (Here is the commit.) But other edits came later. For example, it was two years later that IReduceInit was split off from IReduce (this commit) and checking for a Reduce‘d value was added to PersistentList.reduce() (this commit).

If you’ve made it this far, you’re likely someone who would check these things out.