Package org.egothor.stemmer.trie


Provides internal trie infrastructure used by FrequencyTrie compilation, reduction, canonicalization, and binary reconstruction.

This subpackage contains the implementation-level data structures that support transformation of mutable build-time trie content into a compact immutable compiled representation. The types in this package are primarily intended for cooperation within the stemming implementation and are not designed as a general-purpose public extension surface.

Trie construction begins with mutable nodes represented by MutableNode, which store child transitions and local terminal value frequencies in insertion-preserving maps. Local node value distributions are analyzed through LocalValueSummary, which derives the deterministically ordered local values, aligned counts, total local frequency, and dominant-value metadata required by reduction logic. Deterministic local ordering is supported by SortableValue.
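The relationship between a mutable node and its local summary can be illustrated with a minimal sketch. The class names SketchMutableNode and SketchLocalSummary are hypothetical stand-ins, not the actual MutableNode or LocalValueSummary API. The ordering rule used here (higher count first, ties broken by the natural order of the value) is an assumption; the package description only states that the ordering is deterministic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a mutable build-time node. */
final class SketchMutableNode {
    // LinkedHashMap keeps insertion order, matching the "insertion-preserving maps" in the text.
    final Map<Character, SketchMutableNode> children = new LinkedHashMap<>();
    final Map<String, Integer> valueCounts = new LinkedHashMap<>();

    /** Walks or creates the path for key and records one occurrence of value at its end. */
    void put(String key, String value) {
        SketchMutableNode node = this;
        for (char c : key.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new SketchMutableNode());
        }
        node.valueCounts.merge(value, 1, Integer::sum);
    }
}

/** Hypothetical summary of one node's local value distribution. */
final class SketchLocalSummary {
    final String[] orderedValues; // values in deterministic order
    final int[] orderedCounts;    // orderedCounts[i] belongs to orderedValues[i]
    final int totalCount;         // sum of all local counts

    SketchLocalSummary(Map<String, Integer> valueCounts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(valueCounts.entrySet());
        // Assumed ordering: descending count, then ascending value as a tie-breaker.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        orderedValues = new String[entries.size()];
        orderedCounts = new int[entries.size()];
        int total = 0;
        for (int i = 0; i < entries.size(); i++) {
            orderedValues[i] = entries.get(i).getKey();
            orderedCounts[i] = entries.get(i).getValue();
            total += orderedCounts[i];
        }
        totalCount = total;
    }

    /** Returns the first value in the assumed ordering, or null if the node has no local values. */
    String dominantValue() {
        return orderedValues.length == 0 ? null : orderedValues[0];
    }
}
```

Because the summary is derived from the counts alone, two nodes that received the same values in a different insertion order produce identical ordered arrays in this sketch.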

Subtree reduction is driven by ReductionSignature, which captures the semantic identity of a full subtree under the active reduction strategy. Depending on the selected reduction settings, local subtree semantics are represented by ranked, unordered, or dominant-value descriptors via RankedLocalDescriptor, UnorderedLocalDescriptor, and DominantLocalDescriptor. Child structure is incorporated into the signature through ChildDescriptor, ensuring that canonical equivalence covers both local node content and all reachable descendants.
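The effect of the three descriptor kinds on subtree equality can be sketched as follows. SketchMode and SketchSignatures are hypothetical names, and modelling a signature as a plain list is an assumption made for brevity; the actual ReductionSignature, descriptor, and ChildDescriptor types may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeSet;

/** Hypothetical reduction modes mirroring ranked, unordered, and dominant-value descriptors. */
enum SketchMode { RANKED, UNORDERED, DOMINANT }

final class SketchSignatures {
    private SketchSignatures() { }

    /** Builds the part of the signature that describes a node's local values. */
    static Object localDescriptor(SketchMode mode, List<String> rankedValues) {
        switch (mode) {
            case RANKED:
                return new ArrayList<>(rankedValues);   // value order is significant
            case UNORDERED:
                return new TreeSet<>(rankedValues);     // only the set of values is significant
            default:
                return rankedValues.stream().findFirst(); // only the top-ranked value is significant
        }
    }

    /**
     * Combines the local descriptor with (edge label, child signature) pairs.
     * The sorted map makes the child order deterministic, so two subtrees are
     * equal exactly when their local descriptors and all descendants agree.
     */
    static List<Object> signature(SketchMode mode,
                                  List<String> rankedValues,
                                  SortedMap<Character, List<Object>> childSignatures) {
        List<Object> sig = new ArrayList<>();
        sig.add(localDescriptor(mode, rankedValues));
        for (Map.Entry<Character, List<Object>> child : childSignatures.entrySet()) {
            sig.add(child.getKey());
            sig.add(child.getValue());
        }
        return sig;
    }
}
```

In this sketch, nodes holding the values [x, y] and [y, x] differ under RANKED and DOMINANT but are equal under UNORDERED, and any difference in a child signature propagates into the parent signature.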

Canonicalization of semantically equivalent subtrees is coordinated by ReductionContext, which maintains the signature-to-node mapping for canonical reduced nodes. Canonical merged subtrees are represented by ReducedNode, whose aggregated local counts and canonical child references serve as the intermediate form between mutable construction and immutable freezing.
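The signature-to-node mapping described above resembles a hash-consing table. The following is a minimal sketch under that assumption; SketchReductionContext, SketchReducedNode, and the canonicalize method are hypothetical and do not reproduce the actual ReductionContext or ReducedNode API. The signature is typed as Object here only to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical canonical node holding aggregated counts and canonical children. */
final class SketchReducedNode {
    final Map<String, Integer> localCounts = new TreeMap<>();
    final Map<Character, SketchReducedNode> children = new TreeMap<>();
}

/** Hypothetical context that maps each distinct signature to one canonical node. */
final class SketchReductionContext {
    private final Map<Object, SketchReducedNode> canonical = new HashMap<>();

    /**
     * Returns the canonical node for the signature, creating it on first use.
     * Local counts of every subtree that maps to the same signature are
     * summed into the shared node.
     */
    SketchReducedNode canonicalize(Object signature,
                                   Map<String, Integer> localCounts,
                                   Map<Character, SketchReducedNode> canonicalChildren) {
        SketchReducedNode node = canonical.get(signature);
        if (node == null) {
            node = new SketchReducedNode();
            // Children are already canonical, so equal signatures imply equal child references.
            node.children.putAll(canonicalChildren);
            canonical.put(signature, node);
        }
        for (Map.Entry<String, Integer> e : localCounts.entrySet()) {
            node.localCounts.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return node;
    }

    /** Number of distinct canonical nodes created so far. */
    int canonicalNodeCount() {
        return canonical.size();
    }
}
```

Processing children before parents (a post-order walk) is assumed, so that a parent's signature can be built from child nodes that are already canonical.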

The final read-optimized structure is represented by CompiledNode. Compiled nodes expose compact aligned arrays of sorted edge labels, child references, ordered values, and ordered counts for efficient lookup and serialization. During binary deserialization, unresolved intermediate payload is carried in NodeData until canonical node references are re-linked into the final compiled form.
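Lookup over aligned arrays with sorted edge labels can be sketched as a binary search per key character. SketchCompiledNode and its members are hypothetical and only illustrate the layout described above; the field names, constructor, and lookup methods of the actual CompiledNode are not shown in this package description.

```java
import java.util.Arrays;

/** Hypothetical read-only node using aligned arrays. */
final class SketchCompiledNode {
    final char[] edgeLabels;             // sorted ascending
    final SketchCompiledNode[] children; // children[i] is reached via edgeLabels[i]
    final String[] orderedValues;        // local values in their stored order
    final int[] orderedCounts;           // orderedCounts[i] belongs to orderedValues[i]

    SketchCompiledNode(char[] edgeLabels, SketchCompiledNode[] children,
                       String[] orderedValues, int[] orderedCounts) {
        if (edgeLabels.length != children.length || orderedValues.length != orderedCounts.length) {
            throw new IllegalArgumentException("aligned arrays must have equal lengths");
        }
        this.edgeLabels = edgeLabels;
        this.children = children;
        this.orderedValues = orderedValues;
        this.orderedCounts = orderedCounts;
    }

    /** Returns the child for the label, or null if there is no such edge. */
    SketchCompiledNode child(char label) {
        int i = Arrays.binarySearch(edgeLabels, label);
        return i >= 0 ? children[i] : null;
    }

    /** Follows key character by character from root; returns null if the path does not exist. */
    static SketchCompiledNode find(SketchCompiledNode root, String key) {
        SketchCompiledNode node = root;
        for (int i = 0; i < key.length() && node != null; i++) {
            node = node.child(key.charAt(i));
        }
        return node;
    }
}
```

Keeping labels and child references in parallel arrays avoids per-edge map entries, and the sorted label array allows a logarithmic search without hashing.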

Several accessors in this subpackage intentionally expose internal mutable or array-backed state directly in order to avoid unnecessary copying on performance-sensitive internal paths. Such APIs are intended strictly for tightly related trie infrastructure within the implementation and must be treated as internal-use contracts.
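The trade-off behind such accessors can be shown with a small, hypothetical holder class. SketchCounts is not part of this package; it only contrasts a direct accessor, which returns the backing array without copying, with a defensive one, which allocates a copy on every call.

```java
/** Hypothetical holder contrasting direct and defensive array accessors. */
final class SketchCounts {
    private final int[] counts;

    SketchCounts(int[] counts) {
        this.counts = counts;
    }

    /** Internal-use accessor: returns the backing array itself; callers must not modify it. */
    int[] countsDirect() {
        return counts;
    }

    /** Defensive accessor: safe for untrusted callers, but allocates on every call. */
    int[] countsCopy() {
        return counts.clone();
    }
}
```

A write through the array returned by countsDirect() changes the holder's state, which is why accessors of this kind are only appropriate between closely cooperating internal classes.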

In summary, this subpackage contains the internal semantic model and storage forms that allow the stemming implementation to move efficiently between build-time mutation, reduction-time canonical equivalence, and runtime immutable lookup.

    Class                 Description
    CompiledNode          Immutable compiled trie node optimized for read access.
    LocalValueSummary     Local terminal value summary of a node.
    MutableNode           Mutable build-time node.
    NodeData              Intermediate node data used during deserialization before child references are resolved.
    ReducedNode           Canonical reduced node used during subtree merging.
    ReductionContext      Reduction context used while canonicalizing mutable nodes.
    ReductionSignature    Immutable reduction signature of a full subtree.