Key-indexed counting (counting sort)

stable version

int N = a.length;
int[] count = new int[R + 1];

for (int i = 0; i < N; i++)
    count[a[i] + 1]++;

for (int i = 0; i < R; i++)
    count[i + 1] += count[i];

for (int i = 0; i < N; i++)
    aux[count[a[i]]++] = a[i];

for (int i = 0; i < N; i++)
    a[i] = aux[i];

LSD Radix Sort (least-significant-digit-first)

  • Consider characters from right to left.
  • Stably sort using dth character as the key (using key-indexed counting)
  • Stable sort since based on key-indexed counting
  • Guarantee O(2WN) where w is the width/length of each string
public static void sort(String[] a, int w) {
    int n = a.length;
    int R = 256;   // extend ASCII alphabet size
    String[] aux = new String[n];

    for (int d = w-1; d >= 0; d--) {
        // sort by key-indexed counting on dth character

        // compute frequency counts
        int[] count = new int[R+1];
        for (int i = 0; i < n; i++)
            count[a[i].charAt(d) + 1]++;

        // compute cumulates
        for (int r = 0; r < R; r++)
            count[r+1] += count[r];

        // move data
        for (int i = 0; i < n; i++)
            aux[count[a[i].charAt(d)]++] = a[i];

        // copy back
        for (int i = 0; i < n; i++)
            a[i] = aux[i];
    }
}

String sorting interview question

Problem. Sort one million 32-bit integers.

Solution. LSD radix sort takes linear time in the worst case.

MSD Radix sort (most-significant-digit-first)

  • Partition array into R pieces according to first character (use key-indexed counting)
  • Recursively sort all strings that start with each character (key-indexed counts delineate subarrays to sort).

Variable-length strings. Treat strings as if they had an extra char at end (smaller than any char).

3-way Radix Quicksort

The main reasons to use 3-way radix quicksort are speed and memory. 3-way radix quicksort is not stable and it uses more character compares than MSD radix sort. Both 3-way radix quicksort and MSD radix sort handle variable-length strings.

Suffix Array

Longest repeated substring:

  1. form suffixes - linear time
  2. sort suffixes to bring repeated substrings together
  3. compute longest prefix between adjacent suffixes

Bad input: if the longest repeated substring is too long

public String lrs(String s) {
    int N = s.length();

    String[] suffixes = new String[N];
    for (int i = 0; i < N; i++) 
        suffixes[i] = s.substring(i, N);

    Arrays.sort(suffixes);

    String lrs = "";
    for (int i = 0; i < N-1; i++) {
        int len = lcp(suffixes[i], suffixes[i+1]); // find longest common prefix
        if (len < lrs.length())
            lrs = suffixes[i].substring(0, len);
    }
    return lrs;
}

Suffix sorting challenge

Worst-case running time for suffix sort an arbitrary string of length N.

  • Linearithmic - Manber-Myers algorithm
  • Linear - suffix trees: linear time construction + linear time inorder traversing the tree

Suffix tree

https://www.geeksforgeeks.org/pattern-searching-set-8-suffix-tree-introduction/

results matching ""

    No results matching ""