Package org.egothor.stemmer.trie
FrequencyTrie compilation, reduction,
canonicalization, and binary reconstruction.
This subpackage contains the implementation-level data structures that support transformation of mutable build-time trie content into a compact immutable compiled representation. The types in this package are primarily intended for cooperation within the stemming implementation and are not designed as a general-purpose public extension surface.
Trie construction begins with mutable nodes represented by
MutableNode, which store child transitions
and local terminal value frequencies in insertion-preserving maps. Local node
value distributions are analyzed through
LocalValueSummary, which derives the
deterministically ordered local values, aligned counts, total local
frequency, and dominant-value metadata required by reduction logic.
Deterministic local ordering is supported by
SortableValue.
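
The build-time shape described above can be illustrated with a minimal sketch. The class names SketchMutableNode and SketchLocalSummary are hypothetical stand-ins, not the package's actual types, and the ordering rule (higher count first, ties broken by natural string order) is an assumption chosen only to make the ordering deterministic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical build-time node; a stand-in, not the package's MutableNode. */
final class SketchMutableNode {
    // Insertion-preserving maps for child transitions and terminal value counts.
    final Map<Character, SketchMutableNode> children = new LinkedHashMap<>();
    final Map<String, Integer> valueCounts = new LinkedHashMap<>();

    /** Records one more occurrence of a terminal value at this node. */
    void addValue(String value) {
        valueCounts.merge(value, 1, Integer::sum);
    }
}

/** Hypothetical local summary; a stand-in, not the package's LocalValueSummary. */
final class SketchLocalSummary {
    final String[] orderedValues; // deterministically ordered values
    final int[] orderedCounts;    // counts aligned with orderedValues
    final int totalCount;         // sum of all local counts

    SketchLocalSummary(Map<String, Integer> valueCounts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(valueCounts.entrySet());
        // Assumed ordering: higher count first, ties broken by natural string order.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        orderedValues = new String[entries.size()];
        orderedCounts = new int[entries.size()];
        int total = 0;
        for (int i = 0; i < entries.size(); i++) {
            orderedValues[i] = entries.get(i).getKey();
            orderedCounts[i] = entries.get(i).getValue();
            total += orderedCounts[i];
        }
        totalCount = total;
    }

    /** Returns the most frequent value, or null when the node has no values. */
    String dominantValue() {
        return orderedValues.length == 0 ? null : orderedValues[0];
    }
}
```

Because the summary sorts its own copy of the entries, the result does not depend on the insertion order kept by the mutable maps.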
Subtree reduction is driven by
ReductionSignature, which captures the
semantic identity of a full subtree under the active reduction strategy.
Depending on the selected reduction settings, local subtree semantics are
represented by ranked, unordered, or dominant-value descriptors via
RankedLocalDescriptor,
UnorderedLocalDescriptor, and
DominantLocalDescriptor. Child structure is
incorporated into the signature through
ChildDescriptor, ensuring that canonical
equivalence covers both local node content and all reachable descendants.
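
One way to picture a subtree signature is as a structural key built from a local descriptor plus one entry per child. The sketch below is an illustration under assumptions: SketchSignature, its Mode enum, and the use of nested lists as keys are hypothetical and do not reflect how ReductionSignature or the descriptor types are actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical signature builder; not the package's ReductionSignature. */
final class SketchSignature {
    /** Assumed stand-ins for the ranked, unordered, and dominant-value strategies. */
    enum Mode { RANKED, UNORDERED, DOMINANT }

    /**
     * Builds a key whose equals/hashCode cover the local descriptor and,
     * through the nested child keys, every reachable descendant.
     */
    static List<Object> of(Mode mode, List<String> rankedValues,
                           Map<Character, List<Object>> childSignatures) {
        List<Object> key = new ArrayList<>();
        key.add(mode);
        switch (mode) {
            case RANKED:
                // Value order matters.
                key.add(new ArrayList<>(rankedValues));
                break;
            case UNORDERED:
                // Only the set of values matters.
                key.add(new TreeSet<>(rankedValues));
                break;
            default:
                // DOMINANT: only the top-ranked value (if any) matters.
                key.add(new ArrayList<>(rankedValues.subList(0, Math.min(1, rankedValues.size()))));
                break;
        }
        // Visit edges in sorted order so the key ignores child insertion order.
        for (Map.Entry<Character, List<Object>> e : new TreeMap<>(childSignatures).entrySet()) {
            key.add(e.getKey());
            key.add(e.getValue());
        }
        return key;
    }
}
```

Two subtrees then compare equal exactly when their keys are equal, so the same pair of nodes may be distinct under the ranked mode yet equivalent under the unordered or dominant mode.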
Canonicalization of semantically equivalent subtrees is coordinated by
ReductionContext, which maintains the
signature-to-node mapping for canonical reduced nodes. Canonical merged
subtrees are represented by ReducedNode,
whose aggregated local counts and canonical child references serve as the
intermediate form between mutable construction and immutable freezing.
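
The signature-to-node mapping can be sketched as a plain hash map keyed by signature. Everything here is hypothetical: SketchReductionContext and SketchReducedNode are illustrative names, the signature is simplified to a list of ordered values, and element-wise summation of counts is an assumption about what "aggregated local counts" means.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical canonical node; not the package's ReducedNode. */
final class SketchReducedNode {
    final int[] counts; // aggregated local counts

    SketchReducedNode(int[] counts) {
        this.counts = counts.clone();
    }

    /** Adds another equivalent node's counts element by element (assumed rule). */
    void absorb(int[] other) {
        if (other.length != counts.length) {
            throw new IllegalArgumentException("count arrays are not aligned");
        }
        for (int i = 0; i < counts.length; i++) {
            counts[i] += other[i];
        }
    }
}

/** Hypothetical context; not the package's ReductionContext. */
final class SketchReductionContext {
    // Simplified signature: the ordered local values of a leaf-like node.
    private final Map<List<String>, SketchReducedNode> canonical = new HashMap<>();

    /** Returns the canonical node for a signature, creating or merging as needed. */
    SketchReducedNode canonicalize(List<String> signature, int[] localCounts) {
        SketchReducedNode existing = canonical.get(signature);
        if (existing == null) {
            SketchReducedNode created = new SketchReducedNode(localCounts);
            canonical.put(List.copyOf(signature), created);
            return created;
        }
        existing.absorb(localCounts);
        return existing;
    }

    /** Number of distinct canonical nodes seen so far. */
    int size() {
        return canonical.size();
    }
}
```

Submitting two nodes with the same signature yields the same instance, which is what lets equivalent subtrees share a single representation before freezing.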
The final read-optimized structure is represented by
CompiledNode. Compiled nodes expose compact
aligned arrays of sorted edge labels, child references, ordered values, and
ordered counts for efficient lookup and serialization. During binary
deserialization, unresolved intermediate payload is carried in
NodeData until canonical node references are
re-linked into the final compiled form.
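
The aligned-array layout and the two-step re-linking can be sketched as follows. The names SketchCompiledNode and SketchNodeData, the use of integer node indices for unresolved children, and the binary-search lookup are assumptions made for illustration; the actual CompiledNode and NodeData contracts and the binary format are not shown in this summary.

```java
import java.util.Arrays;

/** Hypothetical read-optimized node; not the package's CompiledNode. */
final class SketchCompiledNode {
    final char[] edgeLabels;             // sorted ascending
    final SketchCompiledNode[] children; // aligned with edgeLabels
    final String[] orderedValues;
    final int[] orderedCounts;           // aligned with orderedValues

    SketchCompiledNode(char[] edgeLabels, SketchCompiledNode[] children,
                       String[] orderedValues, int[] orderedCounts) {
        this.edgeLabels = edgeLabels;
        this.children = children;
        this.orderedValues = orderedValues;
        this.orderedCounts = orderedCounts;
    }

    /** Finds the child for a label by binary search, or null if there is none. */
    SketchCompiledNode child(char label) {
        int i = Arrays.binarySearch(edgeLabels, label);
        return i >= 0 ? children[i] : null;
    }
}

/** Hypothetical unresolved payload; not the package's NodeData. */
final class SketchNodeData {
    final char[] edgeLabels;
    final int[] childIds; // indices into the node table, not yet references
    final String[] orderedValues;
    final int[] orderedCounts;

    SketchNodeData(char[] edgeLabels, int[] childIds,
                   String[] orderedValues, int[] orderedCounts) {
        this.edgeLabels = edgeLabels;
        this.childIds = childIds;
        this.orderedValues = orderedValues;
        this.orderedCounts = orderedCounts;
    }

    /** Pass 1 allocates every node; pass 2 replaces child indices with references. */
    static SketchCompiledNode[] link(SketchNodeData[] data) {
        SketchCompiledNode[] nodes = new SketchCompiledNode[data.length];
        for (int i = 0; i < data.length; i++) {
            SketchNodeData d = data[i];
            nodes[i] = new SketchCompiledNode(d.edgeLabels,
                    new SketchCompiledNode[d.childIds.length], d.orderedValues, d.orderedCounts);
        }
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < data[i].childIds.length; j++) {
                nodes[i].children[j] = nodes[data[i].childIds[j]];
            }
        }
        return nodes;
    }
}
```

Linking in two passes means a shared canonical child can be referenced by several parents regardless of the order in which nodes appear in the serialized table.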
Several accessors in this subpackage intentionally expose internal mutable or array-backed state directly in order to avoid unnecessary copying on performance-sensitive internal paths. Such APIs are intended strictly for tightly related trie infrastructure within the implementation and must be treated as internal-use contracts.
In summary, this subpackage contains the internal semantic model and storage forms that allow the stemming implementation to move efficiently between build-time mutation, reduction-time canonical equivalence, and runtime immutable lookup.
Class               Description
CompiledNode<V>     Immutable compiled trie node optimized for read access.
LocalValueSummary   Local terminal value summary of a node.
MutableNode<V>      Mutable build-time node.
NodeData<V>         Intermediate node data used during deserialization before child references are resolved.
ReducedNode<V>      Canonical reduced node used during subtree merging.
ReductionContext    Reduction context used while canonicalizing mutable nodes.
ReductionSignature  Immutable reduction signature of a full subtree.