Tries

Structure

A Trie is a tree that is used to store strings. It does this by storing strings as paths within the tree, which can be traversed to see if a string exists or not!

To define a trie, we first need to specify an alphabet the trie uses, which is a fixed set of characters.

For our purposes, we will use ASCII, which has 128 entries. Other alphabets, like UNICODE, have far more possible characters.

Then, every node in the trie will have the following:

  • Children Buffer: A buffer of length equal to the size of the alphabet (), where every index in this buffer contains a child if a string with that character, at position equal to the node’s depth, has been stored.
  • Last Character Flag: A flag indicating if the node is the last character of some string.

Tries have a wide variety of uses. Some of them are given as follows:

  1. Key-Value Stores: We can replace node flags with values, to turn our trie into a key-value store! This is especially feasible due to the efficiency of insertion, search, and deletion.
  2. Prefix Search: As tries group strings by prefix, we can perform a wide variety of string prefix operations with them!
  3. Globbing: We can use glob patterns (ex. file*) to match a variety of strings in our trie!

Operations

Searching for Strings

Given the Trie struction, we can use the presentation of nodes to determine if strings exist! To do this, we check to see there is a path matching the string, ending in a flagged node.

The general algorithm is as follows:

  1. Start from the root node. Let index .
  2. For every node , use the string’s character to move to the child corresponding to the character value (as given by the buffer indices). Increment , repeat until no child can be found, or there exist no more characters in the string.
  3. If the final node was flagged as being the last character of a string, then the string exists!

Note from the above example how tries let us combine strings with common prefixes together, in a layout that is fairly memory efficient!

Given a search on a tree with keys, the worst-cast complexity for any string is given as:

  • For all possible strings, the worst-case is the length of the longest key stored.
  • For any single string , the worst-case is the length of .

This means that search complexity is completely independent of the trie size !

Insertion into Tries

Inserting into Tries is very similar to searching! We wish to form a path for the string, and make sure the end of the path is correctly flagged.

The general algorithm is as follows:

  1. Start at the root node. Let index .
  2. For every node , use the string’s character to find the child corresponding to the character value. If the child doesn’t exist, create one.
  3. Increment , and repeat (2) until there are no more characters left in the string.
  4. Finally, flag the node so that it terminates a string.

Note that by this algorithm, if we insert a prefix of an already existing string, we don’t use any extra memory! This can make Tries fairly memory-efficient.

Deletion from Tries

Because searching relies that the path ends with a flagged node, to delete a string from a trie, we simply need to unflag the last node of the string’s path!

The general algorithm is as follows:

  1. Start at the root node. Let index .
  2. For every node , use the string’s character to find the child corresponding to the character value. Increment , and repeat until there are no more characters left in the string, or the child doesn’t exist.
  3. If we have used all characters in the string, unflag the last node.

Note that while deleting, we don’t actually remove nodes - and in fact, it’s perfectly reasonable to never delete nodes, as we will likely use some to most of the prefixes later.

Patricia Tries

Motivation: Trie Compression

Given a trie, there can oftentimes be intermediate nodes, non-terminal nodes with only one child. These nodes waste a lot of space!

To avoid this, it may be feasible in these cases to merge these intermediate nodes to avoid this wasted space! For example, we could merge the following trie as follows:

graph TD
    subgraph Compressed
    7[root] -. ab .-> 8[0];
    7 -. bat .-> 9[0];
    end
    
    subgraph Uncompressed
    1[root] -. a .-> 2[ ];
    2 -. b .-> 3[0];
    1 -. b .-> 4[ ] -. a .-> 5[ ] -. t .-> 6[0];
    end

This is a lot more memory efficient, and is the motivation behind Patricia Tries!

Structure

Patricia Tries (Practical Algorithm to Retrieve Information Coded in Alphanumeric) implement the above compression idea. It does this by by defining different node types within the tree.

Every node in a Pactricia Tree has the following:

  • Children Buffer: A buffer containing pointers to the node’s children.
  • Key Value: The value of the key being stored within the node.
  • Terminating Flag: A flag indicating if the node ends a string.

Binary Patricia Tries

It is commonly the case that we restrict our alphabet to and in binary. In this case, we have a Binary Patricia Trie!

Operations

Patricia Trie Searching

Searching in a Patricia Trie is a bit more complex than your typical tree, as you need to account for nodes that store keys and nodes that are splitters.

The general algorithm is as follows. Suppose we have a string :

  1. Start from the root.
  2. For any given node , check the key value of the node.
    • If the key value is a prefix of the string, remove it from the beginning of .
    • Otherwise, string not found.
  3. Using the children buffer, find the node’s child at index equal to the value of the first character of the updated string . Repeat step 2.
    • If there does not exist a child, the string does not exist.
  4. If the string is empty after removing the key value, and the node is flagged, the string exists!

Patricia Trie Insertion

Inserting in a Patricia Trie is more complicated than a traditional trie, as we now need to account for merged nodes.

The general algorithm is as follows. Suppose we want to insert into our tree. Then,

  1. Search for into the tree. Continuously repeat (2) until our search succeeds.
  2. If the search fails, then resolve by doing one of the following, and move to step 3.
    • Case 1 (Null): If the next pointer is null, then allocate a new node for the remainder of . Move to step 3 on this new node.

    • Case 2 (Simple Split): If our new key is a prefix of the last node hit, split the node’s key value from this prefix, and insert the prefix as a new parent of the node. Move to step 3 on this parent.

    • Case 3 (Complex Split): If our new key shares a common prefix with the last node we hit, split both the key and the node’s key value from this common prefix.

      Then, create a new parent node with this common prefix, and add the original node and a new node with the key’s remaining value as children. Move to step 3 on this new node.

  3. When the search succeeds, then flag the node as terminating a string. Terminate.

See the below example, with a Binary Patricia Trie for simplicity.

Patricia Trie Deletion

Deletion from a Patricia Trie is more complicated than a traditional trie.

The general algorithm is as follows. Suppose we’re deleting a string from the tree.

  1. Search for into the tree. If we fail, there’s nothing to delete!
  2. If we suceed, we perform one of the 3 cases:
    • Case 1: If the node has more than one child, unflag the node.
    • Case 2: If the node only has one child, merge it with the child.
    • Case 3: If the node has no children, delete it by removing its reference in the parent.
  3. Then, check the parent. If the parent is unflagged and only has one child, then merge the parent with this child.