PDA

View Full Version : Numbering enabled PE's in ascending order


DaveC
04-30-2008, 05:43 PM
Here is a little function i wrote for my binning work, numbering only the enabled PEs, using swazzle, but without using divides or comparisons for speed.

Its a little complicated, but the principle is, say n have succeded, and i want to fill the next n bin spaces with them, and there is no pattern to which succeeded, which failed. So they need to be numbered 1 to n, with all the failed ones numbered 0.

In the more general case i use, i want numbers i can use as addresses, so you pass in the number of the first space in the bin (startAdd), and the number of spaces to skip with each increment (increment). peAdd is a pointer to the number to be used for addressing that is returned, peEnable is an enable flag - set to 0 if not to be binned, 1 if it is to be binned.

Also the function return value is the number of new addresses calculated (ie n).


// All PEs with peEnable 0, peAdd =0
// For each PE where peEnable = 1, peAdd = startAdd + t * increment, where t is the number of preceding enabled PEs
inline mono int generateAddressNumbers(poly int peEnable, poly int * mono peAdd, mono int startAdd, mono int increment)
{
poly int toSwazzle, buffer, buffer2, peAddress;
mono int i, numPes, numberMade=0;

numPes = get_num_pes();
toSwazzle = 0;
peAddress = 0;
buffer2 = 0;
set_swazzle_low(increment);
buffer2 = swazzle_up(buffer2);
set_swazzle_low(startAdd);
buffer = swazzle_up(toSwazzle);
set_swazzle_low(0);

for(i=0;i<numPes;i++)
{
peAddress += (peEnable*buffer);
toSwazzle = buffer+((peEnable)*buffer2);
buffer2 = swazzle_up_zero(buffer2);
buffer = swazzle_up_zero(toSwazzle);
}

numberMade = (get_swazzle_high_int() - startAdd)/increment;

*peAdd = peAddress;

return numberMade;
}


Dave

Solomon
06-02-2008, 02:45 PM
Note that this algorithm presented below is a form of the 'prefix sum' otherwise known as SCAN
where the input is one number per PE
and the output is for each PE the sum off all values to the left of itself.

If speed matters then there are algorithms that are O(log N). i.e.
for 96 PEs you can get the sum of the 95 PEs to the left in less that 95 addition steps
(in fact only 7 addition steps are needed: 2^7=128)


Also I am surprised at the integer divide step - intuition would suggest that it could be avoided ?
(i.e. convert to a mulitply by increment to give peAddress)

I will write up by algorithm in a day or two when I have more time