(CS838: Topics in parallel computing, CS1221, Thu, Mar 25, 1999, 8:00-9:15 a.m.)

The algorithm consists of 8 phases:

- Phase 1: Sort snakelike individual blocks, all in parallel.
- Phase 2: Permute
*n*columns of each vertical slice to distribute them^{3/4}**uniformly**among all*m*vertical slices. The*i*-th column of slice*j*,*i=1,...,n*,^{3/4}*j=1,...,m*goes into vertical slice*(i\mod m)*into position*(j-1)n*(see Figure ).^{1/2}+(i\mod m)**CAPTION: Uniform distribution of columns among vertical slices. It can also be defined as a shuffle operation if the vertices are denoted by binary labels.** - Phase 3: Repeat Phase 1, i.e., sort snakelike individual blocks, all in parallel.
- Phase 4: Sort individual columns top-down, all in parallel.
- Phase 5: Sort all odd-even vertical pairs of blocks in all vertical slices snakelike (a pair of blocks is treated as one block of double height).
- Phase 6: Do the same with even-odd pairs of blocks.
- Phase 7: Sort individual rows of the whole mesh according to a global snakelike ordering.
- Phase 8: Perform
*2n*steps of odd-even transposition on this global snake.^{3/4}

**Lemma**

The algorithm sortsN=nnumbers on mesh^{2}M(n,n)into a snake-like order in3n+o(n)parallel steps.

(by 0-1 Sorting Lemma)

- Every horizontal slice consists of
*m*blocks and every vertical slice consists of*m*blocks. - After Phase 1, each block contains at most one dirty row (see Figure ).
**CAPTION: The situation after Phase 1.** - In Phase 2, all potential
*m*dirty rows in each horizontal slice are uniformly distributed among all blocks of the slice. In the worst case, all such dirty rows may have the 1's starting at the same position modulo*m*in each block and so after the permutation one block can get by at most*m*1's more than another one. Hence, after Phase 2, any two blocks within each horizontal slice can differ by at most*m*1's. - It follows that the number of 1's between any two vertical slices can differ by at most
*n*1's.^{1/2}= m^{2}**CAPTION: The situation after (a) Phase 3 and (b) Phase 4.** - Since any two blocks within one horizontal slice can differ by at most
*m*1's and the length of one row in each block is*n*, it follows that after Phase 3, each horizontal slice can have at most 2 dirty rows (see Figure (a)).^{3/4}=m^{3} - After Phase 4, each vertical slice can have at most
*m*dirty rows, since it consists of*m*blocks, each with at most one dirty row (see Figure (b)).**CAPTION: The situation after (a) Phase 5 and (b) Phase 6.** - Those
*m*dirty rows can span the boundary between even-odd or odd-even pairs of vertically adjacent blocks (as shown for example in Figure (b)), therefore we need both Phase 5 and 6 (see Figure ). - After Phase 6, each vertical slice can have at most one dirty row (see Figure (b)).
- We know that any 2 vertical slices can differ in at most
*n*1's. Since the rows in vertical slices have length^{1/2}*n*, there can exist at most 2 global dirty rows in^{3/4}> n^{1/2}*M(n,n)*and the number of 1's in them can differ by at most*n*.^{1/2}× m=n^{3/4} - After Phase 7, if just one dirty line remains, we are done (as in our example).
- If there are still 2 dirty lines, the upper one consists of 0's except perhaps for at most
*n*1's and the lower one consists of 1's except perhaps for at most^{3/4}*n*0's.^{3/4} - Then Phase 8 must complete the sorting.

- Phase 1:
*T*_{1}(N)=n^{3/4}(2log n^{3/4}+1)=O(n^{3/4}log n) - Phase 2:
*T*_{2}(N)=n - Phase 3:
*T*_{3}(N)=T_{3}(N) - Phase 4:
*T*_{4}(N)=n - Phase 5:
*T*_{5}(N)=O(n^{3/4}log n) - Phase 6:
*T*_{6}(N)=T_{5}(N) - Phase 7:
*T*_{7}(N)=n - Phase 8:
*T*_{8}(N)=2n^{3/4}

Back to the beginning of the page | Back to the CS838 class schedule |