Key-indexed counting (counting sort)
stable version
int N = a.length;
int[] count = new int[R + 1];
for (int i = 0; i < N; i++)
count[a[i] + 1]++;
for (int i = 0; i < R; i++)
count[i + 1] += count[i];
for (int i = 0; i < N; i++)
aux[count[a[i]]++] = a[i];
for (int i = 0; i < N; i++)
a[i] = aux[i];
LSD Radix Sort (least-significant-digit-first)
- Consider characters from right to left.
- Stably sort using dth character as the key (using key-indexed counting)
- Stable sort since based on key-indexed counting
- Guarantee O(2WN) where w is the width/length of each string
public static void sort(String[] a, int w) {
int n = a.length;
int R = 256; // extend ASCII alphabet size
String[] aux = new String[n];
for (int d = w-1; d >= 0; d--) {
// sort by key-indexed counting on dth character
// compute frequency counts
int[] count = new int[R+1];
for (int i = 0; i < n; i++)
count[a[i].charAt(d) + 1]++;
// compute cumulates
for (int r = 0; r < R; r++)
count[r+1] += count[r];
// move data
for (int i = 0; i < n; i++)
aux[count[a[i].charAt(d)]++] = a[i];
// copy back
for (int i = 0; i < n; i++)
a[i] = aux[i];
}
}
String sorting interview question
Problem. Sort one million 32-bit integers.
Solution. LSD radix sort takes linear time in the worst case.
MSD Radix sort (most-significant-digit-first)
- Partition array into R pieces according to first character (use key-indexed counting)
- Recursively sort all strings that start with each character (key-indexed counts delineate subarrays to sort).
Variable-length strings. Treat strings as if they had an extra char at end (smaller than any char).
3-way Radix Quicksort
The main reasons to use 3-way radix quicksort are speed and memory. 3-way radix quicksort is not stable and it uses more character compares than MSD radix sort. Both 3-way radix quicksort and MSD radix sort handle variable-length strings.
Suffix Array
Longest repeated substring:
- form suffixes - linear time
- sort suffixes to bring repeated substrings together
- compute longest prefix between adjacent suffixes
Bad input: if the longest repeated substring is too long
public String lrs(String s) {
int N = s.length();
String[] suffixes = new String[N];
for (int i = 0; i < N; i++)
suffixes[i] = s.substring(i, N);
Arrays.sort(suffixes);
String lrs = "";
for (int i = 0; i < N-1; i++) {
int len = lcp(suffixes[i], suffixes[i+1]); // find longest common prefix
if (len < lrs.length())
lrs = suffixes[i].substring(0, len);
}
return lrs;
}
Suffix sorting challenge
Worst-case running time for suffix sort an arbitrary string of length N.
- Linearithmic - Manber-Myers algorithm
- Linear - suffix trees: linear time construction + linear time inorder traversing the tree
Suffix tree
https://www.geeksforgeeks.org/pattern-searching-set-8-suffix-tree-introduction/