> This is how I understand it: up to step 5 we have determined some
> pivots, which there are P-1 of, and broadcast these out to all
> processors.

Yes.

> Now, we break up the already sorted data on each processor
> into P sections of data and.....this is where I get lost.

Call the pivots v(1), v(2), v(3), ..., v(P-1), which all of the
processors have received. Each processor then determines its local
block of elements B(i), for each i (0 <= i <= P-1), such that every
element "b" in B(i) satisfies v(i) < b <= v(i+1). (Assume that
v(0) = -infinity and v(P) = +infinity.)

> Are the pivots supposed to be the min/max values for the data sent
> to each partition? If so, this means that each section of data will
> not necessarily have the same number of entries.

Yes, that is correct. The number of elements in each block, |B(i)|
(0 <= i <= P-1), may differ from block to block and from processor
to processor.
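Because each processor's data is already sorted, the block boundaries
can be found by binary search instead of by scanning every element.
Below is a minimal Python sketch of the local splitting step, assuming
the local data is a sorted list and the P-1 pivots arrive as a sorted
list. `split_by_pivots` is a hypothetical helper name, `pivots[k]`
plays the role of v(k+1), and the data and pivot values are made up.
The step that actually sends B(i) to its destination processor is not
shown.

```python
from bisect import bisect_right


def split_by_pivots(local_sorted, pivots):
    """Split an already-sorted local list into len(pivots) + 1 blocks.

    Block i holds every element b with pivots[i-1] < b <= pivots[i],
    i.e. v(i) < b <= v(i+1), where the bound before the first pivot is
    treated as -infinity and the bound after the last as +infinity.
    """
    # cuts[i + 1] = number of local elements <= pivots[i]; since the data
    # is sorted, each boundary is found by binary search in O(log n).
    cuts = [0] + [bisect_right(local_sorted, v) for v in pivots] + [len(local_sorted)]
    return [local_sorted[cuts[i]:cuts[i + 1]] for i in range(len(pivots) + 1)]


if __name__ == "__main__":
    # Hypothetical example: P = 4 processors, so P-1 = 3 pivots.
    pivots = [10, 20, 30]
    local = [2, 5, 11, 12, 13, 19, 20, 35]
    blocks = split_by_pivots(local, pivots)
    print(blocks)                     # [[2, 5], [11, 12, 13, 19, 20], [], [35]]
    print([len(b) for b in blocks])   # [2, 5, 0, 1]
```

Note that one block comes out empty here: nothing guarantees that a
given processor holds any elements between two particular pivots, which
is exactly why |B(i)| can vary.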