12/19/2011

Clocks with dynamic hands... and how Fortran makes sense

I have recently encountered the need to iterate through arrays with a dynamic number of dimensions. The usual way of having nested for loops does not work any more since the number of for loops would need to be adapted at run time, Generating code is cool, but to do it at runtime based on a dynamic data structure in a language which is not LISP can be infeasible.
The solution we propose comes in two parts, first the vectorisation of the matrix in order to facilitate access to the elements i.e. make it more consistent. When thinking about this, Fortran all of a sudden made sense.

The second part involves one of the possible implementations of the access to the data structure.
This new representation makes for a very simple way of iterating through arrays of dynamic dimensions.
Lets first provide a bit more context to illustrate all this.
In Octave you can create data structures with any number of dimensions.
When you want to iterate through them you either know the number of dimensions ahead of time or you can use a handy operator ":" that converts to a vector any matrix. This operation is called vectorisation and as such can be found on wikipedia.

This approach is fine but if you have to do something special per row, column or any slice of the array playing with indices can get scary. A way of getting the best of both words is to calculate the index of each array position as a function of an index of each one of the dimensions. When figuring out the mapping, which is really not difficult. It then becomes apparent why Fortran has its column first strategy.
The mapping is indeed very simple, and even simpler if we apply the Fortran convention
With the help of an array $D$ which describes the dimensions of the matrix.
$D$ can be used besides for validating if the element is inside bounds. To compute quickly what is the index of the element in the vectorized form of the matrix.
For the bound check we have that for a given index vector $V$ any of its elements $v_i$ for dimension $i$ has to be strictly bound by the value of $D$ for the same dimension $i$ a.k.a $d_i$ or in other words $0\leq v_i < d_i$.
The position of an element in the vectorialised form of the array is given by the following formula. $p = \sum_i=0^|D| v_i \prod_j=1^i d_j $ as you can see the first dimension is skipped.
The cool thing about this formula is that it still holds for the row first or Cs' way of storing arrays, all you need to do there is swap $v_0$ and $v_1$ as well as $d_0$ and $d_1$.

The other method although more involved programatically yields itself better to more complex manipulations. I like to call it the dynamic hand clock, because it functions just like a clock only the number of hands and the maximum value that they reach are dynamic.
The index of the current element is stored in an array $v$ where $v_i$ stores the current position for dimension $i$. Alongside the vector $v$ we also need a vector $m$ storing the maximum valid value for each dimension. The next index $v$ is then calculated by adding 1 modulo the maximum value for that index. If a rol-over occurs the vector is incremented at position $i+1$.

It becomes then very easy to react on such rol overs at each dimension, making this method very convenient for an iterator or a delegate pattern.