Tuesday 6 October 2015

Sorting

7 Sorting
Sorting is one of the most important operations performed by computers. In the days of magnetic tape storage, before modern databases, it was almost certainly the most common operation performed by computers, as most "database" updating was done by sorting transactions and merging them with a master file. It's still important for the presentation of data extracted from databases: most people prefer to get reports sorted into some relevant order before wading through pages of data!

7.1 Bubble, Selection, Insertion Sorts

There are a large number of variations of one basic strategy for sorting. It's the same strategy that you use for sorting your bridge hand. You pick up a card, start at the beginning of your hand and find the place to insert the new card, insert it and move all the others up one place.
/* Insertion sort for integers */

void insertion( int a[], int n ) {
    /* Pre-condition: a contains n items to be sorted */
    int i, j, v;
    /* Initially, the first item is considered 'sorted' */
    /* i divides a into a sorted region, x < i, and an
       unsorted one, x >= i */
    for( i = 1; i < n; i++ ) {
        /* Select the item at the beginning of the
           as yet unsorted section */
        v = a[i];
        /* Work backwards through the array, finding where v
           should go: while the element to the left is greater
           than v, move it up one place */
        for( j = i; j > 0 && a[j-1] > v; j-- )
            a[j] = a[j-1];
        /* Stopped when j == 0 or a[j-1] <= v,
           so put v at position j */
        a[j] = v;
    }
}
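For example, a minimal driver (the array contents are made up for illustration) might be:

#include <stdio.h>

int main() {
    int a[] = { 5, 2, 9, 1, 7 };
    int i, n = sizeof a / sizeof a[0];
    insertion( a, n );
    for( i = 0; i < n; i++ ) printf( "%d ", a[i] );   /* prints: 1 2 5 7 9 */
    printf( "\n" );
    return 0;
}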



Bubble Sort

Another variant of this procedure, called bubble sort, is commonly taught:
/* Bubble sort for integers */
#define SWAP(a,b)   { int t; t=a; a=b; b=t; }

void bubble( int a[], int n )
/* Pre-condition: a contains n items to be sorted */
    {
    int i, j;
    /* Make n passes through the array */
    for(i=0;i<n;i++)
        {
        /* From the first element to the end
           of the unsorted section */
        for(j=1;j<(n-i);j++)
           {
           /* If adjacent items are out of order, swap them */
           if( a[j-1]>a[j] ) SWAP(a[j-1],a[j]);
           }
        }
    }    
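The section heading also mentions selection sort, which is not listed above; a minimal sketch in the same style (reusing the SWAP macro above) might look like this: each pass selects the smallest remaining item and swaps it to the front of the unsorted section.

/* Selection sort for integers */

void selection( int a[], int n ) {
    /* Pre-condition: a contains n items to be sorted */
    int i, j, min;
    for( i = 0; i < n-1; i++ ) {
        /* Find the smallest item in the unsorted section a[i..n-1] */
        min = i;
        for( j = i+1; j < n; j++ )
            if ( a[j] < a[min] ) min = j;
        /* Swap it into place, extending the sorted section */
        if ( min != i ) SWAP( a[i], a[min] );
    }
}

Like the other two, it makes O(n^2) comparisons, although it performs at most n-1 swaps.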

Analysis

Each of these algorithms requires n-1 passes: each pass places one item in its correct place. (The nth is then in the correct place also.) The ith pass makes either i or n - i comparisons and moves. So:

T(n) = c(1 + 2 + … + (n-1)) = c n(n-1)/2

or O(n^2) - but we already know we can use heaps to get an O(n logn) algorithm. Thus these algorithms are only suitable for small problems where their simple code makes them faster than the more complex code of the O(n logn) algorithm. As a rule of thumb, expect to find an O(n logn) algorithm faster for n > 10 - but the exact value depends very much on individual machines! They can be used to squeeze a little bit more performance out of fast sort algorithms - as the quicksort code later in these notes does, by switching to insertion sort for small partitions.

Key terms

Bubble, Insertion, Selection Sorts
Simple sorting algorithms with O(n^2) complexity - suitable for sorting small numbers of items only.

Bin Sort

7.4 Bin Sort
Assume that in the bin sort:
  1. the keys of the items that we wish to sort lie in a small fixed range, and
  2. there is only one item with each value of the key.
Then we can sort with the following procedure:
  1. Set up an array of "bins" - one for each value of the key - in order,
  2. Examine each item and use the value of the key to place it in the appropriate bin.
Now our collection is sorted and it only took n operations, so this is an O(n) operation. However, note that it will only work under very restricted conditions.

Constraints on bin sort

To understand these restrictions, let's be a little more precise about the specification of the problem and assume that there are m values of the key. To recover our sorted collection, we need to examine each bin. This adds a third step to the algorithm above,
  1. Examine each bin to see whether there's an item in it.
which requires m operations. So the algorithm's time becomes:
T(n) = c_1 n + c_2 m
and it is strictly O(n + m). Now if m <= n, this is clearly O(n). However if m >> n, then it is O(m). For example, if we wish to sort 10^4 32-bit integers, then m = 2^32 and we need 2^32 operations (and a rather large memory!).
For n = 10^4:

n logn ~ 10^4 x 13 ~ 2^13 x 2^4 ~ 2^17
So quicksort or heapsort would clearly be preferred. An implementation of bin sort might look like:
#define EMPTY -1 /* Some convenient flag */
/* N items, M possible key values; both #define'd elsewhere */
void bin_sort( int *a, int *bin, int n ) {
    int i;
    /* Pre-condition: for 0<=i<n : 0 <= a[i] < M */
    /* Mark all the bins empty */
    for(i=0;i<M;i++) bin[i] = EMPTY;
    /* Drop each item into the bin indexed by its key */
    for(i=0;i<n;i++)
        bin[ a[i] ] = a[i];
    }

main() {
    int a[N], bin[M];    /* for all i: 0 <= a[i] < M */
    .... /* Place data in a */
    bin_sort( a, bin, N );
    }

If there are duplicates, then each bin can be replaced by a linked list. The third step then becomes:
  1. Link all the lists into one list.
We can add an item to a linked list in O(1) time. There are n items requiring O(n) time. Linking a list to another list simply involves making the tail of one list point to the other, so it is O(1). Linking m such lists obviously takes O(m) time, so the algorithm is still O(n+m). In contrast to the other sorts, which sort in place and don't require additional memory, bin sort requires additional memory for the bins and is a good example of trading space for performance.
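A sketch of this variant (the node type and helper name are illustrative assumptions, not code from the original notes):

#include <stdlib.h>

typedef struct t_node {
    int key;
    struct t_node *next;
} node;

/* Bin sort allowing duplicate keys in 0..m-1: each bin is a list,
   items are appended at the tail, then the m lists are linked up */
node *bin_sort_dup( int *a, int n, int m ) {
    node **head = calloc( m, sizeof(node *) );
    node **tail = calloc( m, sizeof(node *) );
    node *list = NULL, *last = NULL;
    int i;
    /* Append each item to its bin: n O(1) insertions, O(n) total */
    for( i = 0; i < n; i++ ) {
        node *p = malloc( sizeof(node) );
        p->key = a[i]; p->next = NULL;
        if ( tail[a[i]] ) tail[a[i]]->next = p;
        else head[a[i]] = p;
        tail[a[i]] = p;
    }
    /* Link the m bin lists into one: m O(1) steps, O(m) total */
    for( i = 0; i < m; i++ ) {
        if ( head[i] == NULL ) continue;
        if ( last ) last->next = head[i];
        else list = head[i];
        last = tail[i];
    }
    free( head ); free( tail );
    return list;    /* head of the sorted list */
}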
Although memory tends to be cheap in modern processors - so that we would normally use memory rather profligately to obtain performance - memory consumes power, and in some circumstances, eg computers in spacecraft, power may be a more important constraint than performance. Having highlighted this constraint, there is a version of bin sort which can sort in place:
#define EMPTY -1 /* Some convenient flag */
void bin_sort( int *a, int n ) {
    int i, j;
    /* Pre-condition: for 0<=i<n : 0 <= a[i] < n, all distinct */
    for(i=0;i<n;i++)
        /* Keep swapping until the item that belongs at i arrives;
           each swap puts at least one item in its final place */
        while ( a[i] != i ) {
            j = a[i];
            SWAP( a[i], a[j] );
            }
    }
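A quick check of this loop on a made-up permutation:

int main() {
    int a[] = { 3, 1, 0, 2 };    /* a permutation of 0..3 */
    bin_sort( a, 4 );
    /* two swaps in the i == 0 step yield { 0, 1, 2, 3 } */
    return 0;
}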
However, this assumes that there are n distinct keys in the range 0 .. n-1. In addition to this restriction, the SWAP operations are relatively expensive, so this version trades time for space. The bin sorting strategy may appear rather limited, but it can be generalised into a strategy known as Radix sorting.

Radix

7.5 Radix Sorting
The bin sorting approach can be generalised in a technique that is known as radix sorting.

An example

Assume that we have n integers in the range (0, n^2) to be sorted. (For a bin sort, m = n^2, and we would have an O(n + m) = O(n^2) algorithm.) Sort them in two phases:
  1. Using n bins, place a_i into bin (a_i mod n),
  2. Repeat the process using n bins, placing a_i into bin floor(a_i/n), being careful to append to the end of each bin.
This results in a sorted list. As an example, consider the list of integers:
36 9 0 25 1 49 64 16 81 4
n is 10 and the numbers all lie in (0,99). After the first phase, we will have:
Bin      0   1   2   3   4   5   6   7   8   9
Content  0   1   -   -   64  25  36  -   -   9
             81          4       16          49
Note that in this phase, we placed each item in a bin indexed by the least significant decimal digit.
Repeating the process will produce:

Bin      0   1   2   3   4   5   6   7   8   9
Content  0   16  25  36  49  -   64  -   81  -
         1
         4
         9
In this second phase, we used the leading decimal digit to allocate items to bins, being careful to add each item to the end of the bin.
We can apply this process to numbers of any size expressed to any suitable base or radix.

7.5.1 Generalised Radix Sorting

We can further observe that it's not necessary to use the same radix in each phase. Suppose that the sorting key is a sequence of fields, each with a bounded range, eg the key is a date using the structure:
typedef struct t_date {
    int day;
    int month;
    int year; 
} date;
If the ranges for day and month are limited in the obvious way, and the range for year is suitably constrained, eg 1900 < year <= 2000, then we can apply the same procedure, except that we'll employ a different number of bins in each phase. In all cases, we'll sort first using the least significant "digit" (where "digit" here means a field with a limited range), then using the next most significant "digit", placing each item after all the items already in the bin, and so on. Assume that the key of the item to be sorted has k fields, f_i | i = 0..k-1, and that each f_i has s_i discrete values; then a generalised radix sort procedure can be written:
radixsort( A, n ) {
    for(i=0;i<k;i++) {
        for(j=0;j<s_i;j++) bin[j] = EMPTY;              /* O(s_i) */
        for(j=0;j<n;j++)
            move A_j to the end of bin[ A_j -> f_i ];   /* O(n)   */
        for(j=0;j<s_i;j++)
            concatenate bin[j] onto the end of A;       /* O(s_i) */
        }
    }
Total: T(n) = Σ (c_1 s_i + c_2 n), i = 0..k-1, i.e. O(Σ s_i + kn)
Now if, for example, the keys are integers in (0, b^k - 1), for some constant k, then the keys can be viewed as k-digit base-b integers.
Thus, s_i = b for all i and the time complexity becomes O(n + kb) or O(n). This result depends on k being constant.
If k is allowed to increase with n, then we have a different picture. For example, it takes log_2 n binary digits to represent an integer < n. If the key length were allowed to increase with n, so that k = logn, then we would have:

T(n) = O(Σ s_i + kn) = O(n logn)

Another way of looking at this is to note that if the range of the key is restricted to (0, b^k - 1), then we will be able to use the radixsort approach effectively if we allow duplicate keys when n > b^k. However, if we need to have unique keys, then k must increase to at least log_b n. Thus, as n increases, we need to have logn phases, each taking O(n) time, and the radix sort is the same as quick sort!
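As a concrete illustration of this analysis (a sketch, not code from the original notes: the name and the choice of base are assumptions), here is a radix sort on unsigned 32-bit integers using k = 4 phases with base b = 256, where a stable counting pass plays the role of moving items through the bins:

#include <stdlib.h>
#include <string.h>

void radix_sort( unsigned a[], int n ) {
    unsigned *buf = malloc( n * sizeof(unsigned) );
    int count[256];
    int shift, i, pos, c;
    /* One phase per byte, least significant "digit" first */
    for( shift = 0; shift < 32; shift += 8 ) {
        for( i = 0; i < 256; i++ ) count[i] = 0;     /* empty the bins: O(b) */
        for( i = 0; i < n; i++ )                     /* count each bin: O(n) */
            count[ (a[i] >> shift) & 0xFF ]++;
        for( pos = 0, i = 0; i < 256; i++ ) {        /* bin start offsets: O(b) */
            c = count[i]; count[i] = pos; pos += c;
        }
        for( i = 0; i < n; i++ )                     /* stable move to bins: O(n) */
            buf[ count[ (a[i] >> shift) & 0xFF ]++ ] = a[i];
        memcpy( a, buf, n * sizeof(unsigned) );      /* concatenate bins back */
    }
    free( buf );
}

Each phase is O(n + b), so the total is O(k(n + b)), which is O(n) for fixed k, matching the analysis above.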

Sample code

This sample code sorts arrays of integers on various radices: the number of bits used for each radix can be set with the call to SetRadices. The Bins class is used in each phase to collect the items as they are sorted. ConsBins is called to set up a set of bins: each bin must be large enough to accommodate the whole array, so RadixSort can be very expensive in its memory usage!

Quick Sort

Quick Sort
  1. Features
    • Similar to mergesort - divide-and-conquer recursive algorithm
    • One of the fastest sorting algorithms
    • Average running time O(NlogN)
    • Worst-case running time O(N^2)

  2. Basic idea
    1. Pick one element in the array, which will be the pivot.
    2. Make one pass through the array, called a partition step, re-arranging the entries so that:
        • the pivot is in its proper place.
        • entries smaller than the pivot are to the left of the pivot.
        • entries larger than the pivot are to its right.
    3. Recursively apply quicksort to the part of the array that is to the left of the pivot,
      and to the right part of the array.
    Here we don't have the merge step; at the end, all the elements are already in the proper order.
  3. Algorithm
  4. STEP 1. Choosing the pivot
    Choosing the pivot is an essential step.
    Depending on the pivot the algorithm may run very fast, or in quadratic time. Possible choices:
      1. Some fixed element: e.g. the first, the last, the one in the middle.
        This is a bad choice - the pivot may turn out to be the smallest or the largest element,
        and then one of the partitions will be empty.
      2. Randomly chosen (by a random generator) - still a bad choice.
      3. The median of the array (if the array has N numbers, the median is the [N/2]-th largest number). This is difficult to compute - it increases the complexity.
      4. The median-of-three choice: take the first, the last and the middle element.
        Choose the median of these three elements.
    Example:
    8, 3, 25, 6, 10, 17, 1, 2, 18, 5
    The first element is 8, the middle - 10, the last - 5.
    The three elements are sorted: [5, 8, 10] and the middle element is 8. This is the median.
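    A sketch of how the median-of-three choice might be coded (an illustrative fragment, not the course's official listing; it reuses the SWAP macro defined with the bubble sort above):

    /* Order a[left], a[center], a[right] in place and return the
       median, which ends up in the center position */
    int median3( int a[], int left, int right ) {
        int center = (left + right) / 2;
        if ( a[center] < a[left] )   SWAP( a[left],   a[center] );
        if ( a[right]  < a[left] )   SWAP( a[left],   a[right]  );
        if ( a[right]  < a[center] ) SWAP( a[center], a[right]  );
        return a[center];    /* 8 for the example above */
    }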
    STEP 2. Partitioning
    Partitioning is illustrated on the above example.
    After finding the pivot the array will look like this:
    5, 3, 25, 6, 8, 17, 1, 2, 18, 10
    1. The first action is to get the pivot out of the way - swap it with the next-to-last element:
    5, 3, 25, 6, 18, 17, 1, 2, 8, 10
    2. We want larger elements to go to the right and smaller elements to go to the left.
    Two "fingers" are used to scan the elements from left to right and from right to left:
    
    [5, 3, 25, 6, 18, 17, 1, 2, 8, 10]
       ^                     ^
       i                     j
    
    Note: we know that the first element is smaller than the pivot, so the first element to be processed is the element to the right. The last two elements are the pivot and an element greater than the pivot, so they are not processed.
    • While i is to the left of j, we move i right, skipping all the elements
      less than the pivot. If an element is found greater than the pivot, i stops.
    • While j is to the right of i, we move j left, skipping all the elements
      greater than the pivot. If an element is found less than the pivot, j stops.
    • When both i and j have stopped, the elements are swapped.
    • When i and j have crossed, no swap is performed, scanning stops,
      and the element pointed to by i is swapped with the pivot.
    In the example the first swapping will be between 25 and 2, the second between 18 and 1.
    3. Restore the pivot.
    After restoring the pivot we obtain the following partitioning into three groups:
    [5, 3, 2, 6, 1] [ 8 ] [18, 25, 17, 10]
    STEP 3. Recursively quicksort the left and the right parts
  5. Code
  6. Here is the code that implements the partitioning.
    left points to the first element in the array currently processed, right points to the last element.
    
    if ( left + 10 <= right )
        {
        int i = left, j = right - 1;
        for ( ; ; )
            {
            while ( a[++i] < pivot ) {}   // move the left finger
            while ( pivot < a[--j] ) {}   // move the right finger

            if ( i < j ) swap( a[i], a[j] );  // swap
            else break;                       // break if fingers have crossed
            }
        swap( a[i], a[right-1] );     // restore the pivot
        quicksort( a, left, i-1 );    // call quicksort for the left part
        quicksort( a, i+1, right );   // call quicksort for the right part
        }
    else insertionsort( a, left, right );
    
    
    If there are fewer than 10 elements, quicksort is not very efficient;
    instead, insertion sort is used for the final phase of sorting.
    A complete assembled routine is sketched below, and the Example
    section later in these notes works this code out in detail.
    Animation:
    http://math.hws.edu/TMCM/java/xSortLab/
    In the animation, the leftmost element is chosen as a pivot.
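    Assembling the fragment into a self-contained routine (a sketch inferred from the text, not the course's official listing: choose_pivot folds together the median-of-three choice of STEP 1 and the pivot-hiding step of STEP 2, and insertionsort handles the short ranges):

    static void swap( int *x, int *y ) { int t = *x; *x = *y; *y = t; }

    /* Straight insertion sort on a[left..right], for the short ranges */
    static void insertionsort( int a[], int left, int right ) {
        int i, j, v;
        for ( i = left + 1; i <= right; i++ ) {
            v = a[i];
            for ( j = i; j > left && a[j-1] > v; j-- ) a[j] = a[j-1];
            a[j] = v;
        }
    }

    /* Median-of-three pivot: order the three samples as in STEP 1,
       then hide the pivot next to the last element as in STEP 2 */
    static int choose_pivot( int a[], int left, int right ) {
        int center = (left + right) / 2;
        if ( a[center] < a[left] )   swap( &a[left],   &a[center] );
        if ( a[right]  < a[left] )   swap( &a[left],   &a[right]  );
        if ( a[right]  < a[center] ) swap( &a[center], &a[right]  );
        swap( &a[center], &a[right-1] );
        return a[right-1];
    }

    void quicksort( int a[], int left, int right ) {
        if ( left + 10 <= right ) {
            int pivot = choose_pivot( a, left, right );
            int i = left, j = right - 1;
            for ( ; ; ) {
                while ( a[++i] < pivot ) {}      /* move the left finger  */
                while ( pivot < a[--j] ) {}      /* move the right finger */
                if ( i < j ) swap( &a[i], &a[j] );
                else break;                      /* fingers have crossed  */
            }
            swap( &a[i], &a[right-1] );          /* restore the pivot     */
            quicksort( a, left, i - 1 );         /* sort the left part    */
            quicksort( a, i + 1, right );        /* sort the right part   */
        }
        else insertionsort( a, left, right );
    }

    Small ranges go straight to insertion sort, as discussed above.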
  7. Implementation notes
  8. Compare the two versions:
    
    A.
     while (a[++i] < pivot) {}
     while (pivot < a[--j]) {}
    
     if (i < j) swap (a[i], a[j]);
     else break;
    
    B.
    
     while (a[i] < pivot) {i++;}
     while (pivot < a[j] ) {j--;}
    
     if (i < j) swap (a[i], a[j]);
     else break;
    
    If we have an array of equal elements, version B will never increment i or decrement j,
    and will swap forever: i and j will never cross.
  9. Complexity of Quicksort
  10. Worst-case: O(N^2)

    This happens when the pivot is the smallest (or the largest) element.
    Then one of the partitions is empty, and we repeat recursively the procedure for N-1 elements.
    Best-case: O(NlogN)
    The best case is when the pivot is the median of the array;
    then the left and the right parts will have the same size.
    There are logN partitions, and to obtain each partition we do N comparisons
    (and not more than N/2 swaps). Hence the complexity is O(NlogN).
    Average-case: O(NlogN)
    Analysis
    T(N) = T(i) + T(N - i -1) + cN
    The time to sort the file is equal to
    • the time to sort the left partition with i elements, plus
    • the time to sort the right partition with N-i-1 elements, plus
    • the time to build the partitions
    6.1. Worst-case analysis
    The pivot is the smallest element:
    T(N) = T(N-1) + cN, N > 1
    Telescoping:
    T(N-1) = T(N-2) + c(N-1)
    T(N-2) = T(N-3) + c(N-2)
    T(N-3) = T(N-4) + c(N-3)
    …
    T(2) = T(1) + c.2
    Add all equations:
    T(N) + T(N-1) + T(N-2) + … + T(2) =
    = T(N-1) + T(N-2) + … + T(2) + T(1) + c(N) + c(N-1) + c(N-2) + … + c.2
    T(N) = T(1) + c(2 + 3 + … + N)
    T(N) = 1 + c(N(N+1)/2 - 1)
    Therefore T(N) = O(N^2)
    6.2. Best-case analysis:
    The pivot is in the middle
    T(N) = 2T(N/2) + cN
    Divide by N:
    T(N) / N = T(N/2) / (N/2) + c
    Telescoping:
    T(N/2) / (N/2) = T(N/4) / (N/4) + c
    T(N/4) / (N/4) = T(N/8) / (N/8) + c
    ……
    T(2) / 2 = T(1) / (1) + c
    Add all equations:
    T(N) / N + T(N/2) / (N/2) + T(N/4) / (N/4) + … + T(2) / 2 =
    = T(N/2) / (N/2) + T(N/4) / (N/4) + … + T(1) / 1 + c.logN
    After cancelling the equal terms:
    T(N)/N = T(1)/1 + c.logN = 1 + c.logN
    T(N) = N + cNlogN
    Therefore T(N) = O(NlogN)
    6.3. Average-case analysis
    Similar computations, resulting in T(N) = O(NlogN).
    The average value of T(i) is 1/N times the sum of T(0) through T(N-1):
    (1/N) Σ T(j), j = 0 thru N-1
    T(N) = (2/N)(Σ T(j)) + cN
    Multiply by N:
    NT(N) = 2(Σ T(j)) + cN^2
    To remove the summation, we rewrite the equation for N-1:
    (N-1)T(N-1) = 2(Σ T(j)) + c(N-1)^2, j = 0 thru N-2
    and subtract:
    NT(N) - (N-1)T(N-1) = 2T(N-1) + 2cN - c
    Prepare for telescoping. Rearrange terms, drop the insignificant c:
    NT(N) = (N+1)T(N-1) + 2cN
    Divide by N(N+1):
    T(N)/(N+1) = T(N-1)/N + 2c/(N+1)
    Telescope:
    T(N)/(N+1) = T(N-1)/N + 2c/(N+1)
    T(N-1)/N = T(N-2)/(N-1) + 2c/N
    T(N-2)/(N-1) = T(N-3)/(N-2) + 2c/(N-1)
    …
    T(2)/3 = T(1)/2 + 2c/3
    Add the equations and cancel the equal terms:
    T(N)/(N+1) = T(1)/2 + 2c Σ (1/j), j = 3 to N+1
    T(N) = (N+1)(1/2 + 2c Σ (1/j))
    The sum Σ (1/j), j = 3 to N+1, is about logN.
    Thus T(N) = O(NlogN)
  11. Conclusions
  12. Advantages:
    • One of the fastest algorithms on average.
    • Does not need additional memory (the sorting takes place in the array - this is called in-place processing). Compare with mergesort: mergesort needs additional memory for merging.
    Disadvantages: the worst-case complexity is O(N^2).
    Applications:
    Commercial applications use quicksort - generally it runs fast, with no additional memory,
    and this compensates for the rare occasions when it runs in O(N^2) time.
    Never use it in applications which require guaranteed response time:

    • Life-critical (medical monitoring, life support in aircraft and space craft)
    • Mission-critical (monitoring and control in industrial and research plants
      handling dangerous materials, control for aircraft, defense, etc)
    unless you assume the worst-case response time.
    Comparison with heapsort:
    • both algorithms have O(NlogN) complexity
    • quicksort runs faster (it does not have to maintain a heap tree)
    • the speed of quicksort is not guaranteed
    Comparison with mergesort:
    • mergesort guarantees O(NlogN) time, however it requires additional memory of size N
    • quicksort does not require additional memory, however its speed is not guaranteed
    • usually mergesort is not used for main memory sorting, only for external memory sorting
    So far, our best sorting algorithm has O(nlogn) performance: can we do any better?
    In general, the answer is no.

Learning Goals
  • Be able to explain how quicksort works and what its complexity is (worst-case, best-case, average case).
  • Be able to compare quicksort with heapsort and mergesort.
  • Be able to explain the advantages and disadvantages of quicksort,
    and its applications.

Exam-like questions

  1. Briefly describe the basic idea of quicksort.
  2. What is the complexity of quicksort?
  3. Analyze the worst-case complexity solving the recurrence relation.
  4. Analyze the best-case complexity solving the recurrence relation.
  5. Compare quicksort with mergesort and heapsort.
  6. What are the advantages and disadvantages of quicksort?
  7. Which applications are not suitable for quicksort and why?

Quick Sort



Sorting Algorithms - QUICKSORT

Example

if ( left + 10 <= right )
    {
    int i = left, j = right - 1;
    for ( ; ; )
        {
        while ( a[++i] < pivot ) {}   // move the left finger
        while ( pivot < a[--j] ) {}   // move the right finger

        if ( i < j ) swap( a[i], a[j] );  // swap
        else break;                       // break if fingers have crossed
        }
    swap( a[i], a[right-1] );     // restore the pivot
    quicksort( a, left, i-1 );    // call quicksort for the left part
    quicksort( a, i+1, right );   // call quicksort for the right part
    }
else insertionsort( a, left, right );
Let's see how this code will work on an example, disregarding the first if statement - the check for the number of elements.
Source array to be sorted: 10, 3, 25, 6, 8, 17, 1, 2, 18, 5
I. Compute the first partitioning:
Source array: 10, 3, 25, 6, 8, 17, 1, 2, 18, 5
  1. Choosing the pivot: the median-of-three of [10, 8, 5] is 8.
  2. Order the three elements and hide the pivot:
    5, 3, 25, 6, 8, 17, 1, 2, 18, 10
    5, 3, 25, 6, 18, 17, 1, 2, 8, 10
  3. Start rearranging: set two indices, i and j:
    i = left (i.e. i = 0)
    j = right - 1 (i.e. j = 8)
    
    
    5, 3, 25, 6, 18, 17, 1, 2, 8, 10
    ^                          ^
    i                          j
    
    At each run of the for loop we swap two elements
    3.1. First run: while (a[++i] < pivot ){}
    ++i means: first increment i, then perform the comparison.
    Move i to the right skipping elements that are less than the pivot
    We know that the very first element is less than the pivot, so we do not compare it
    
    
    5, 3, 25, 6, 18, 17, 1, 2, 8, 10
          ^                    ^
          i                    j
    
    i stops at 25, as 25 is greater than the pivot ( i = 2)
    while (pivot < a[--j] ) {}
    Initially j is 'right'-1 = 8. This is the pivot.
    We do not process the pivot, that is why we write --j - first decrement, and then compare.
    
    5, 3, 25, 6, 18, 17, 1, 2, 8, 10
          ^                 ^
          i                 j
    
    j stops at 2 , the first element from right to left that is smaller than 8. (j = 7)
    if (i < j) swap (a[i], a[j]);
    else break;
    i = 2 is less than j=7 and we swap 25 and 2:
    
    5, 3, 2, 6, 18, 17, 1, 25, 8, 10
          ^                 ^
          i                 j
    
    3.2. Second run of the for loop:
    
    5, 3, 2, 6, 18, 17, 1, 25, 8, 10
          ^                 ^
          i                 j
    
    We start with i = 2 , pointing to 2, and j = 7, pointing to 25
    while (a[++i] < pivot) {}
    6 is less than the pivot (8), we skip it, and i stops at 18 (i = 4)
    
    5, 3, 2, 6, 18, 17, 1, 25, 8, 10
                 ^         ^
                 i         j
    
    while (pivot < a[--j]) {}
    1 is less than the pivot, j stops there. (j = 6)
    
    5, 3, 2, 6, 18, 17, 1, 25, 8, 10
                 ^      ^
                 i      j
    
    if (i < j) swap (a[i], a[j]);
    else break;
    i = 4 is less than j = 6 and we swap 18 and 1:
    
    5, 3, 2, 6, 1, 17, 18, 25, 8, 10
                ^       ^
                i       j
    
    3.3. Third run of the for loop:
    We start with i = 4 , pointing to 1, and j = 6, pointing to 18.
    
    5, 3, 2, 6, 1, 17, 18, 25, 8, 10
                ^       ^
                i       j
    
    while (a[++i] < pivot) {}
    17 is greater than the pivot and i stops there (i = 5)
    while (pivot < a[--j]) {}
    1 is less than the pivot and j stops there (j = 4)
    
    5, 3, 2, 6, 1, 17, 18, 25, 8, 10
                ^   ^
                j   i
    
    if (i < j) swap (a[i], a[j]);
    else break;
    Now we break, because i and j have crossed
  4. Bring back the pivot:
    The pivot is swapped with a[i]:
    5, 3, 2, 6, 1, 8, 18, 25, 17, 10
    This is the first partitioning:
    [ 5, 3, 2, 6, 1 ] [8] [18, 25, 17, 10]
II. Recursively we process the left part of the array: 5, 3, 2, 6, 1
  1. Choosing the pivot: the median of three of [5, 2, 1] is 2.
  2. Order the three elements and hide the pivot:
    1, 3, 6, 2, 5
  3. Start the for loop.
    1. i will stop at 3 ( i = 1) - the first element greater than the pivot:
    2. j will stop at 1 (j = 0)- the first element from right to left smaller than the pivot.
    3. No swap is done, as i and j have crossed:
  4. Bring back the pivot: swap the pivot with a[1]:
    1, 2, 6, 3, 5
    The partitioning now is:
    [ 1 ] [ 2 ] [ 6, 3, 5 ]
III. Do the same with the left array and the right array.
The left array consists only of one element, so we stop.
In practice, we stop partitioning when the elements are small in number, e.g. n < 10,
and we sort with insertion sort. Here we stop at 1, for the sake of the example.
Next comes the right array: [ 6, 3, 5]
It has only three elements, and the median-of-three choice will put them in order:
3, 5, 6
Now we have a sorted part: 1, 2, 3, 5, 6
IV. In a similar way we process the right part in the first partitioning: 18, 25, 17, 10.
After it has been sorted, we will obtain:
[1, 2, 3, 5, 6] [8] [10, 17, 18, 25] and this is the whole array sorted.

Quick Sort

Quick Sort: Partition in place
Most implementations of quick sort make use of the fact that you can partition in place by keeping two pointers: one moving in from the left and a second moving in from the right. They are moved towards the centre until the left pointer finds an element greater than the pivot and the right one finds an element less than the pivot. These two elements are then swapped. The pointers are then moved inward again until they "cross over". The pivot is then swapped into the slot to which the right pointer points and the partition is complete.

int partition( int a[], int low, int high )
  {
  int left, right;
  int pivot_item;
  pivot_item = a[low];    /* the first item serves as the pivot */
  left = low;
  right = high;
  while ( left < right ) {
    /* Move left while item <= pivot */
    while( a[left] <= pivot_item ) left++;
    /* Move right while item > pivot */
    while( a[right] > pivot_item ) right--;
    if ( left < right ) SWAP( a[left], a[right] );
    }
  /* right is final position for the pivot */
  a[low] = a[right];
  a[right] = pivot_item;
  return right;
  }
Note that the above code does not check that left does not exceed the array bound; you need to add this check before performing the swaps - both the one in the loop and the final one outside the loop. partition ensures that all items less than the pivot precede it and returns the position of the pivot. This meets our condition for dividing the problem: all the items in the lower half are known to be less than the pivot, and all items in the upper half are known to be greater than it.
Note that a generic version of partition would compare items with an ItemCmp function rather than with the built-in comparison operators used here. That assumes that there is an external declaration for ItemCmp and that, in any one program, we only want to sort one type of object. Generally this will not be acceptable, so the formal specification for quicksort in the Unix and ANSI C libraries includes a function, compar, which is supplied to qsort when it is called. Passing the function compar, which defines the ordering of objects, when qsort is called avoids this problem in the same way that we passed an ItemCmp function to ConsCollection.
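For comparison, here is a minimal sketch of that library interface; the qsort signature and calling convention are the standard ANSI C ones, while the array contents are made up:

#include <stdio.h>
#include <stdlib.h>

/* compar: defines the ordering of objects, as ItemCmp did */
static int int_cmp( const void *x, const void *y ) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);    /* <0, 0 or >0 */
}

int main() {
    int a[] = { 36, 9, 0, 25, 1, 49 };
    int i, n = sizeof a / sizeof a[0];
    qsort( a, n, sizeof a[0], int_cmp );
    for( i = 0; i < n; i++ ) printf( "%d ", a[i] );  /* 0 1 9 25 36 49 */
    printf( "\n" );
    return 0;
}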

Analysis

The partition routine examines every item in the array at most once, so it is clearly O(n). Usually, the partition routine will divide the problem into two roughly equal sized partitions. We know that we can divide n items in half log2n times. This makes quicksort a O(nlogn) algorithm - equivalent to heapsort.
However, we have made an unjustified assumption - see if you can identify it before you continue.

Quick

7.3 Quick Sort (cont)

Quick Sort - The Facts!

Quick Sort is generally the best known sorting algorithm, but it has a serious limitation, which must be understood before using it in certain applications. What happens if we apply the qsort routine on the previous page to an already sorted array? This is certainly a case where we'd expect the performance to be quite good!
However, the first attempt to partition the problem into two problems will return an empty lower partition - the first element is the smallest. Thus the first partition call simply chops off one element and calls qsort for a partition with n-1 items! This happens a further n-2 times! Each partition call still requires O(n) operations - and we have generated O(n) such calls.
In the worst case, quicksort is an O(n^2) algorithm!

Can we do anything about this?

A number of variations to the simple quicksort will generally produce better results: rather than choose the first item as the pivot, some other strategies work better.

Median-of-3 Pivot

For example, the median-of-3 pivot approach selects three candidate pivots and uses the median one. If the three pivots are chosen from the first, middle and last positions, then it is easy to see that for the already sorted array, this will produce an optimum result: each partition will be exactly half (±one element) of the problem and we will need exactly ceiling(logn) recursive calls.

Random pivot

Some implementations of qsort will simply use a randomly chosen pivot. This also works fine for sorted arrays - on average the pivot will produce two equal-sized partitions and there will be O(logn) of them.
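Such a choice can be as simple as the following sketch (it uses the standard rand(); the helper name is made up):

#include <stdlib.h>

/* Swap a randomly chosen element into the first position,
   which then serves as the pivot for partition */
void random_pivot( int a[], int low, int high ) {
    int p = low + rand() % (high - low + 1);
    int t = a[low]; a[low] = a[p]; a[p] = t;
}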
However, whatever strategy we use for choosing the pivot, it is possible to propose a pathological case in which the problem is not divided equally at any partition stage. Thus quicksort must always be treated as potentially O(n^2).

Why bother with quicksort then?

Heap sort is always O(nlog n): why not just use it?

Heap

Algorithms for Sorting - Heapsort


  1. Heapsort
  2. Based on Priority Queues
    Sorts in O(NlogN) time by performing N deleteMin operations.
    Why O(NlogN)?
      • Each deleteMin operation takes logN running time.
      • Performing deleteMin N times → NlogN running time
    Basic algorithm (not very efficient - extra memory required):
    1. Store N elements in a binary heap tree.
    2. Perform deleteMin operation N times, storing each element deleted from
      the heap into another array.
    3. Copy back the array.
    Improvements: Use the same array to store the deleted elements
    instead of using another array
    How can we do this?
    After each deletion we percolate down the hole that is formed after the deletion,
    and as a result we get a vacant position in the array - the last cell.
    There we store the deleted element.
    When all the elements are deleted and stored in the same array following the above method,
    the elements will be there in reversed order.
    What is the remedy for this?

    Store the elements in the binary heap tree in reverse order of priority -
    then at the end the elements in the array will be in correct order.
    Complexity of heap sort: O(NlogN) is the rough estimate.
    Here is an example of heapsort.
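    As code, the scheme just described might look like the following sketch (an assumption, not the course's listing: 0-based array indexing and a max-heap, so the array ends up in ascending order):

    /* Percolate the item at position i down through a max-heap
       occupying the first n cells of a (0-based indexing) */
    static void percolate_down( int a[], int n, int i ) {
        int v = a[i], child;
        while ( 2*i + 1 < n ) {
            child = 2*i + 1;                            /* left child */
            if ( child+1 < n && a[child+1] > a[child] ) child++;
            if ( a[child] <= v ) break;                 /* v sits here */
            a[i] = a[child];                            /* move child up */
            i = child;
        }
        a[i] = v;
    }

    void heapsort( int a[], int n ) {
        int i, t;
        /* Build the max-heap: the greatest element gets the highest
           priority, as recommended above for ascending order */
        for ( i = n/2 - 1; i >= 0; i-- )
            percolate_down( a, n, i );
        /* Repeatedly delete the max, storing it in the cell vacated
           at the end of the shrinking heap */
        for ( i = n - 1; i > 0; i-- ) {
            t = a[0]; a[0] = a[i]; a[i] = t;            /* root <-> last */
            percolate_down( a, i, 0 );
        }
    }

    Calling heapsort on the array 15, 19, 10, 7, 17, 16 follows the same steps as the worked example in the next section.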




Learning Goals
  • Be able to explain how heapsort works and its complexity.

Exam-like questions
  1. Explain the complexity of heapsort
  2. Given an array, e.g. 17, 23, 10, 1, 7, 16, 9, 20, sort it on paper using heapsort.
    Write down each step explicitly.

Heap

An Example of Heapsort:
Given an array of 6 elements: 15, 19, 10, 7, 17, 16, sort it in ascending order using heap sort.
Steps:

  1. Consider the values of the elements as priorities and build the heap tree.
  2. Start deleteMin operations, storing each deleted element at the end of the heap array.
After performing step 2, the order of the elements will be opposite to the order in the heap tree.
Hence, if we want the elements to be sorted in ascending order, we need to build the heap tree
in descending order - the greatest element will have the highest priority.
Note that we use only one array, treating its parts differently:

  1. when building the heap tree, part of the array will be considered as the heap,
    and the rest - the original array.
  2. when sorting, part of the array will be the heap, and the rest - the sorted array.
This will be indicated by colors: white for the original array, blue for the heap and red for the sorted array
Here is the array: 15, 19, 10, 7, 17, 16
  1. Building the heap tree
    The array represented as a tree, complete but not ordered:
    Start with the rightmost node at height 1 - the node at position 3 = Size/2.
    It has one greater child and has to be percolated down:
     
    After processing array[3] the situation is:
     
    Next comes array[2]. Its children are smaller, so no percolation is needed. 
    The last node to be processed is array[1]. Its left child is the greater of the children.
    The item at array[1] has to be percolated down to the left, swapped with array[2].
     
    As a result the situation is:
     
    The children of array[2] are greater, and item 15 has to be moved down further, swapped with array[5].
     
    Now the tree is ordered, and the binary heap is built.
  2. Sorting - performing deleteMax operations:
    1. Delete the top element 19.
    1.1. Store 19 in a temporary place. A hole is created at the top.
     
    1.2. Swap 19 with the last element of the heap.

    As 10 will be adjusted in the heap, its cell will no longer be a part of the heap.
    Instead it becomes a cell from the sorted array
     
    1.3. Percolate down the hole
     
    1.4. Percolate once more (10 is less than 15, so it cannot be inserted in the previous hole)
     
    Now 10 can be inserted in the hole
     
    2. DeleteMax the top element 17.
    2.1. Store 17 in a temporary place. A hole is created at the top.
    2.2. Swap 17 with the last element of the heap.
    As 10 will be adjusted in the heap, its cell will no longer be a part of the heap.
    Instead it becomes a cell from the sorted array
     
    2.3. The element 10 is less than the children of the hole, and we percolate the hole down:
     
    2.4. Insert 10 in the hole
     
    3. DeleteMax 16.
    3.1. Store 16 in a temporary place. A hole is created at the top.
     
    3.2. Swap 16 with the last element of the heap.

    As 7 will be adjusted in the heap, its cell will no longer be a part of the heap.
    Instead it becomes a cell from the sorted array
     
    3.3. Percolate the hole down (7 cannot be inserted there - it is less than the children of the hole)
     
    3.4. Insert 7 in the hole
     
    4. DeleteMax the top element 15.
    4.1. Store 15 in a temporary location. A hole is created.
     
    4.2. Swap 15 with the last element of the heap.
    As 10 will be adjusted in the heap, its cell will no longer be a part of the heap.
    Instead it becomes a position from the sorted array
     
    4.3. Store 10 in the hole (10 is greater than the children of the hole)
     
    5. DeleteMax the top element 10.
    5.1. Remove 10 from the heap and store it in a temporary location.
     
    5.2. Swap 10 with the last element of the heap.
    As 7 will be adjusted in the heap, its cell will no longer be a part of the heap. Instead it becomes a cell from the sorted array  
    5.3. Store 7 in the hole (as it is the only remaining element in the heap).
     
    7 is the last element from the heap, so now the array is sorted.