How to implement hardware for array sorting in ascending order (I) – With Comparator?

Implementing hardware based array sorting is frequently asked when interviewing with high frequency trading companies. There are quite a few software based sorting algorithms that hardware implementation can leverage.

If interested, you can go to this site and get a visual idea how each sorting algorithm works with random initial order.

In the following discussions, we assume the array to be sorted has 8 entries, or “array[7:0]”.

Bubble Sort

Bubble sort is the most straightforward and fundamental sorting algorithm. The pseudo code of bubble sort is shown below:

        // a[] is the array to be sorted, and
        // n is the number of elements in a[]
        void bubbleSort (int a[], int n) {
            int i, j;
            for (i=0; i < n; i++)
                for (j=1; j<n-i; j++)
                    if (a[j-1] > a[j])
                        swap(a[j-1], a[j]);
        }

To map bubble sort into hardware implementation, we need to unroll the nested for loops, and carefully pipeline the design. One possible implementation is shown below:

How to implement hardware for array sorting in ascending order from LSB to MSB – Bubble Sort

Inherently, bubble sort has a time complexity of O(X^2), assuming the array has X entries. The above hardware implementation achieves overall pipeline stages of (2X – 3). If timing allows, we can combine adjacent stages, such that the pipeline depth can be reduced. 

Selection Sort

In Iteration 0, we find the index M of the smallest entries in “array[7:0]”, and swap entry M with entry 0;

In Iteration 1, we find the index N of the smallest entries in “array[7:1]”, and swap entry N with entry 1;

We keep doing the same thing until Iteration 6, after which the entire array is sorted. The pseudo code of selection sort is shown below:

        // a[] is the array to be sorted, and
        // n is the number of elements in a[]
        void selectSort (int a[], int n) {
            int i, j, nMinIndex;
            for (i=0; i < n; i++) {
                nMinIndex = i;
                for (j=i+1; j<n; j++)
                    if (a[j] < a[nMinIndex])
                        nMinIndex = j;

                swap(a[i], a[nMinIndex]);
            }
        }

It is easy to see that, selection sort uses 7 + 6 + … + 1 = 28 comparisons, and 7 exchanges. To generalize, assuming the array has X entries, then selection sort uses (X^2 / 2) comparisons, and (X – 1) exchanges, or O(X^2) time complexity.

To map the above algorithm to hardware, we need a trade-off between hardware resource and number of cycles. For example, if we use 7 cycles to sort “array[7:0]”, we only need 1 set of “find-min” logic and perform 1 exchange per cycle; or, if we use 4 cycles to sort “array[7:0]”, then we need 2 sets of “find-min” logic and perform 2 exchanges per cycle.

Merge Sort

Merge sort is a typical application of the Divide-and-Conquer concept. The basic plan is to first divide the array into two halves, then sort each half recursively, and finally merge the two sorted halves.

The pseudo code below shows how to merge two sorted array:

        // a[] is the array be sorted, and
        // n is the number of elements in a[]
        void mergeSort(int a[], int n) {
            int *tempArray = new int[n];
            sort(a, 0, n-1, tempArray);
            delete tempArray;
        }

        void sort(int a[], int first, int last, int temp[]) {
            if (first < last) {
                int mid = (first + last) / 2;
                // recursively sort each half
                sort(a, first, mid, temp);
                sort(a, mid+1, last, temp);
                mergeArray(a, first, mid, last, temp);
            }
        }

        // merge 2 sorted array together
        void mergeArray(int a[], int first, int mid, int last, int temp[]) {
            int i = first, j = mid + 1;
            int m = mid, n = last;
            int k = 0;

            while (i <= m && j <= n) {
                if (a[i] < a[j])
                    temp[k++] = a[i++];
                else
                    temp[k++] = a[j++];
            }

            while (i <= m) {
                temp[k++] = a[i++];

            while (j <= n) {
                temp[k++] = a[j++];

            for (i=0; i<k; i++)
                a[first+i] = temp[i];
        }

Merge sort is quite efficient, and it has a time complexity of O(X * logX), assuming there are X entries in the array. It is worth noting that the above merge sort is not performed in-place, i.e., it needs to allocate extra space to sort the array.

We show one possible hardware implementation for merge sort below. Note that, unlike its software counterpart, hardware implementation can sort the array in-place, i.e., without extra storage space.

How to implement hardware for array sorting in ascending order from LSB to MSB – Merge Sort

The pipeline above has 6 stages, and 19 compare & swap blocks. Stage 0 is sorting the array in 4 sub-arrays; Stage 1 & 2 are sorting the above and lower half of the array; Stage 3, 4 & 5 are merging the sorted two halves.

Reference

  1. https://en.wikipedia.org/wiki/Sorting_network
  2. https://www.toptal.com/developers/sorting-algorithms/random-initial-order

Subscribe

Enter your email to get updates from us. You might need to check the spam folder for the confirmation email.

Leave a comment