Class FrequencyTrie<V>

java.lang.Object
org.egothor.stemmer.FrequencyTrie<V>
Type Parameters:
V - value type

public final class FrequencyTrie<V> extends Object
Read-only trie mapping String keys to one or more values with frequency tracking.

A key may be associated with multiple values. Each value keeps the number of times it was inserted during the build phase. The method get(String) returns the locally most frequent value stored at the terminal node of the supplied key, while getAll(String) returns all locally stored values ordered by descending frequency.

If multiple values have the same local frequency, their ordering is deterministic. The preferred value is selected by the following tie-breaking rules, in order:

  1. shorter String representation wins, based on value.toString()
  2. if the lengths are equal, lexicographically lower String representation wins
  3. if the textual representations are still equal, first-seen insertion order remains stable

Values may be stored at any trie node, including internal nodes and leaf nodes. Therefore, reduction and canonicalization always operate on both the node-local terminal values and the structure of all descendant edges.

  • Method Details

    • get

      public V get(String key)
      Returns the most frequent value stored at the node addressed by the supplied key.

      If multiple values have the same local frequency, the returned value is selected deterministically by shorter toString() value first, then by lexicographically lower toString(), and finally by stable first-seen order.

      Parameters:
      key - key to resolve
      Returns:
      most frequent value, or null if the key does not exist or no value is stored at the addressed node
      Throws:
      NullPointerException - if key is null
    • getAll

      public V[] getAll(String key)
      Returns all values stored at the node addressed by the supplied key, ordered by descending frequency.

      If multiple values have the same local frequency, the ordering is deterministic by shorter toString() value first, then by lexicographically lower toString(), and finally by stable first-seen order.

      The returned array is a defensive copy.

      Parameters:
      key - key to resolve
      Returns:
      all values stored at the addressed node, ordered by descending frequency; returns an empty array if the key does not exist or no value is stored at the addressed node
      Throws:
      NullPointerException - if key is null
    • getEntries

      public List<ValueCount<V>> getEntries(String key)
      Returns all values stored at the node addressed by the supplied key together with their occurrence counts, ordered by the same rules as getAll(String).

      The returned list is aligned with the arrays returned by getAll(String) and the internal compiled count representation.

      The returned list is immutable.

      In reduction modes that merge semantically equivalent subtrees, the returned counts may be aggregated across multiple original build-time nodes that were reduced into the same canonical compiled node.

      Parameters:
      key - key to resolve
      Returns:
      immutable ordered list of value-count entries; returns an empty list if the key does not exist or no value is stored at the addressed node
      Throws:
      NullPointerException - if key is null
    • writeTo

      public void writeTo(OutputStream outputStream, FrequencyTrie.ValueStreamCodec<V> valueCodec) throws IOException
      Writes this compiled trie to the supplied output stream.

      The binary format is versioned and preserves canonical shared compiled nodes, therefore the serialized representation remains compact even for tries reduced by subtree merging.

      The supplied codec is responsible for persisting individual values of type V.

      Parameters:
      outputStream - target output stream
      valueCodec - codec used to write values
      Throws:
      NullPointerException - if any argument is null
      IOException - if writing fails
    • readFrom

      public static <V> FrequencyTrie<V> readFrom(InputStream inputStream, IntFunction<V[]> arrayFactory, FrequencyTrie.ValueStreamCodec<V> valueCodec) throws IOException
      Reads a compiled trie from the supplied input stream.

      The caller must provide the same value codec semantics that were used during persistence as well as the array factory required for typed result arrays.

      Type Parameters:
      V - value type
      Parameters:
      inputStream - source input stream
      arrayFactory - factory used to create typed arrays
      valueCodec - codec used to read values
      Returns:
      deserialized compiled trie
      Throws:
      NullPointerException - if any argument is null
      IOException - if reading fails or the binary format is invalid
    • size

      public int size()
      Returns the number of canonical compiled nodes reachable from the root.

      The returned value reflects the size of the final reduced immutable trie, not the number of mutable build-time nodes inserted before reduction. Shared canonical subtrees are counted only once.

      Returns:
      number of canonical compiled nodes in this trie