A mega-dose of micro-benchmarks, Part 3 – Finishing touches

We apply some finishing touches to the PersistentArrayMap creation code. And then declare success.

Refer to the preceding posts

for context.

We started all this work here:

    static member createWithCheck(init: obj array) =
        for i in 0 .. 2 .. init.Length-1 do
            for j in i + 2 .. 2 .. init.Length - 1  do
                if PersistentArrayMap.equalKey (init[i], init[j]) then
                    raise <| ArgumentException($"Duplicate key: {init[i]}")
                    
        PersistentArrayMap(init)

In Part 2, we dealt with the numeric comparison code underneath the equalKey call. What’s left?

It matters how you iterate

I looked at the IL generated from the code above, and found it untidy. Worse than untidy. It was using an enumerator to iterate over a Range!

So I decided to benchmark. Simple looping.

The constructs are:

InNoStep - for iter in 0 .. this.size do ...
ForToIteration -for iter = 0 to this.size-1 do ...
ManualIterationStep1 - manual iteration with an mutable index varialb.e

let mutable iter = 0
while iter < this.size do
    ...
    iter <- iter + 1

InStep1 - for iter in 0 .. 1 .. this.size do ...
ManualIterationStep2 - manual iteration with step size 2

let mutable iter = 0
while iter < doubleSize do
    ...
    iter <- iter + 2

InStep2 - for iter in 0 .. 2 .. doubleSize do ...

The results varied from .Net 6.0 to .Net 8.0. Here are the results for .Net 6.0:

Method	size	Mean	Ratio
InNoStep	1000	254.3 ns	1.00
ForToIteration	1000	255.4 ns	1.00
ManualIterationStep1	1000	255.8 ns	1.01
InStep1	1000	253.1 ns	1.00
ManualIterationStep2	1000	251.9 ns	0.99
InStep2	1000	3,737.7 ns	14.71

InNoStep	1000000	249,367.3 ns	1.00
ForToIteration	1000000	246,770.3 ns	0.99
ManualIterationStep1	1000000	249,436.2 ns	1.00
InStep1	1000000	249,051.1 ns	1.00
ManualIterationStep2	1000000	249,295.9 ns	1.00
InStep2	1000000	3,184,726.6 ns	12.77

Under .Net 8.0:

Method	size	Mean	Ratio
InNoStep	1000	264.3 ns	1.00
ForToIteration	1000	253.6 ns	0.95
ManualIterationStep1	1000	254.5 ns	0.95
InStep1	1000	256.2 ns	0.96
ManualIterationStep2	1000	254.6 ns	0.95
InStep2	1000	1,121.8 ns	4.20

InNoStep	1000000	246,060.1 ns	1.00
ForToIteration	1000000	249,332.2 ns	1.01
ManualIterationStep1	1000000	252,724.2 ns	1.03
InStep1	1000000	245,521.5 ns	1.00
ManualIterationStep2	1000000	246,447.9 ns	1.00
InStep2	1000000	1,073,703.8 ns	4.36

Not as bad, but still bad.

The F# source is:

    member this.InStep2() = 
        let doubleSize = 2*this.size
        let mutable i : int =0
        for iter in 0 .. 2 .. doubleSize do
            i <- i + 17
        i

I looked at the debug IL using ILSpy (and decompiled back into C#):

public int InStep2()
{
	int doubleSize = 2 * size;
	int i = 0;
	IEnumerable<int> enumerable = Operators.OperatorIntrinsics.RangeInt32(0, 2, doubleSize);
	foreach (int iter in enumerable)
	{
		i += 17;
	}
	return i;
}

Pretty much the same under both, so they changed something in the enumerable, I guess.

The weird thing is, when I use sharplab.io, I get something completely different:

        public int InStep2()
        {
            int num = 2 * size@;
            int num2 = 0;
            ulong num3 = ((num >= 0) ? ((ulong)(int)((uint)(num - 0) / 2u) + 1uL) : 0);
            ulong num4 = 0uL;
            int num5 = 0;
            while (num4 < num3)
            {
                num2 += 17;
                num5 += 2;
                num4++;
            }
            return num2;
        }

This should run just fine. What is going on? I’m going to have to hit the F# forums on this one.

At any rate, I changed the iteration code to use mutable indexes and manual stepping and got a percentage point or two.

   static member createWithCheck(init: obj array) =
        let mutable i = 0;
        while i < init.Length do
            let mutable j = i + 2
            while j < init.Length do
                if PersistentArrayMap.equalKey (init[i], init[j]) then
                    raise <| ArgumentException($"Duplicate key: {init[i]}")
                j <- j + 2
            i <- i + 2

        PersistentArrayMap(init)

And that’s about it. The code went from a 20-30% slowdown (it might have been worse than that, actually) to:

FirstCreatePAMCheck - this is the call to PersistentArrayMap.createWithCheck in the C# code
NextCreatePAMCheck - this is the call to PersistentArrayMap.createWithCheck in the F# code

The size colulmn indicates the size of the input array. This carries us from an empty map up to the maximum size we will use PersistentArrayMap for.

Method	size	Mean	Ratio
FirstCreatePAMCheck	0	4.491 ns	1.00
NextCreatePAMCheck	0	4.290 ns	0.95

FirstCreatePAMCheck	1	4.380 ns	1.00
NextCreatePAMCheck	1	4.290 ns	0.98

FirstCreatePAMCheck	2	20.843 ns	1.00
NextCreatePAMCheck	2	17.334 ns	0.83

FirstCreatePAMCheck	3	48.023 ns	1.00
NextCreatePAMCheck	3	39.711 ns	0.83

FirstCreatePAMCheck	4	91.690 ns	1.00
NextCreatePAMCheck	4	68.231 ns	0.74

FirstCreatePAMCheck	6	234.983 ns	1.00
NextCreatePAMCheck	6	160.788 ns	0.68

FirstCreatePAMCheck	8	439.240 ns	1.00
NextCreatePAMCheck	8	320.191 ns	0.73

FirstCreatePAMCheck	12	1,018.231 ns	1.00
NextCreatePAMCheck	12	742.523 ns	0.73

FirstCreatePAMCheck	16	1,839.793 ns	1.00
NextCreatePAMCheck	16	1,310.127 ns	0.71

Better across the board.

One caveat. We had removed a call to check for keyword keys. I’m not quite ready to implement Keyword yet, but I can substitute a check for a different type. So I put it back, but checked for some other type than Keyword.

Method	size	Mean	Ratio
FirstCreatePAMCheck	0	4.698 ns	1.00
NextCreatePAMCheck	0	4.215 ns	0.90

FirstCreatePAMCheck	1	4.221 ns	1.00
NextCreatePAMCheck	1	4.362 ns	1.03

FirstCreatePAMCheck	2	21.157 ns	1.00
NextCreatePAMCheck	2	18.872 ns	0.89

FirstCreatePAMCheck	3	48.321 ns	1.00
NextCreatePAMCheck	3	40.721 ns	0.84

FirstCreatePAMCheck	4	89.689 ns	1.00
NextCreatePAMCheck	4	77.693 ns	0.87

FirstCreatePAMCheck	6	229.218 ns	1.00
NextCreatePAMCheck	6	187.162 ns	0.82

FirstCreatePAMCheck	8	436.050 ns	1.00
NextCreatePAMCheck	8	374.963 ns	0.86

FirstCreatePAMCheck	12	1,011.322 ns	1.00
NextCreatePAMCheck	12	872.979 ns	0.86


FirstCreatePAMCheck	16	1,822.578 ns	1.00
NextCreatePAMCheck	16	1,497.199 ns	0.82

A slight degradation in performance, as expected, but we are still winning across the board.

I’m done.

For now.