R-way Trie:

Deletion:

  • Walk all the way down the trie, and untag the bottom node that is tagged "isWord".
  • If this node doesn't have any children, delete and go back up
  • Delete all the nodes on way back up, until a node has tag "isWord" is encountered.

Challenge: use less memory

Ternary Search Trie

  • Store characters and values in nodes (not keys).
  • Each node has 3 children: smaller (left), equal (middle), larger (right).

Follow links corresponding to each character in the key.

  • If less, take left link; if greater take right link.
  • If equal, take the middle link and move to the next key character.

More space efficient than R-way trie: more nodes (one more for each character), but many fewer null links.

Even better - TST with R^2 branching at root

Hybrid of R-way trie and TST

  • Do R^2-way branching at root.
  • Each of R^2 root nodes points to a TST

TST comparisons bottom line

  • Faster than hashing (especially for search misses, since hash computation is based on the whole string, plus possible collisions).
  • More flexible than red-black BSTs.

Note: when radix is small, R-way trie is better.

Application

1) Ordered iteration - to iterate through all keys in sorted order:

  • Do in order traversal of trie; add keys encountered to a queue.
  • Maintain sequence of characters on path from root to node.

2) Prefix matches in an R-way trie. Collect all keys in the subtree corresponding to a prefix. (can do ordered iteration)

3) Longest prefix. Find longest key in symbol table that is a prefix of query string. e.g. router chooses IP address in routing table that is longest prefix match.

  • search for query string.
  • Keep track of longest key encountered. (e.g. last nodes with "isWord" tag)

Patricia trie

  • Remove one-way branching.
  • Each node represents a sequence of characters. (compressed singles children)

Applications: database search, P2P network search, IP routing table (find longest prefix match), storing and querying XML documents.

Suffix tree

  • Patricia trie of suffixes of a string.
  • Linear-time construction

Applications:

  • Linear-time longest repeated substring, longset common substring, longset palindromic substring, substring search, tandem repeats...
  • Computational biology databases

For now, know how to construct it in O(n^2)

  • Traverse backwards, similar to patricia trie

TODO: Linear construction using Ukkonen's algorithm: https://youtu.be/aPRqocoBsFQ

  • based on the O(n^3) algorithm
  • online algorithm - can be updated with new characters

Exercise Questions

1) Prefix free codes. In data compression, a set of binary strings is if no string is a prefix of another. For example, {01,10,0010,1111} is prefix free, but {01,10,0010,10100} is not because 10 is a prefix of 10100. Design an efficient algorithm to determine if a set of binary strings is prefix-free. The running time of your algorithm should be proportional the number of bits in all of the binary stings.

Insert the binary strings into a 2-way trie.

_Remark: _it's also possible to solve this problem using radix sorting or a ternary search trie.

2) Boggle. Boggle is a word game played on an 4-by-4 grid of tiles, where each tile contains one letter in the alphabet. The goal is to find all words in the dictionary that can be made by following a path of adjacent tiles (with no tile repeated), where two tiles are adjacent if they are horizontal, vertical, or diagonal neighbors

R-way trie.

3) Suffix trees. Learn about and implement , the ultimate string searching data structure.

results matching ""

    No results matching ""