Reduction Semantics

This document explains how Radixor decides that two subtrees are equivalent, how the different reduction modes work, and how those choices affect observable runtime behavior.

Why reduction exists

Without reduction, the trie would still work, but many subtrees that mean the same thing would remain duplicated. The result would be a much larger runtime artifact than necessary.

Reduction solves that by merging semantically equivalent subtrees into one canonical representative.

The key idea is simple:

if two subtrees behave the same way under the semantic contract chosen for compilation, only one physical copy is needed.

Reduction is semantic, not merely structural

Radixor does not reduce nodes merely because they look similar locally. It reduces subtrees only when their meaning matches according to the selected mode.

That is why reduction is based on a signature that captures both:

the local semantics of the current node,
the structure and semantics of all descendant edges.

Conceptually:

Signature = (LocalDescriptor, SortedChildDescriptors)

Two subtrees are merged only if their signatures are equal.

Local descriptors

The local descriptor defines what “equivalent” means for the values stored at one node.

Radixor supports three semantic views.

Ranked descriptor

The ranked descriptor preserves the full ordered result semantics of getAll().

That means:

candidate membership is preserved,
local ordering is preserved,
observable ranked multi-result behavior remains stable.

This is the most semantically faithful mode when ambiguity handling matters.

Unordered descriptor

The unordered descriptor preserves the set of reachable results, but not their local ordering.

That means:

candidate membership is preserved,
ordering differences may be ignored,
more subtrees can be merged than in ranked mode.

This mode is useful when alternative candidates matter but exact ranking does not.

Dominant descriptor

The dominant descriptor focuses on the preferred result returned by get().

This mode is used only when the dominant local candidate is strong enough according to configured thresholds:

minimum winner percentage,
winner-over-second ratio.

If that local dominance is not strong enough, Radixor does not force dominant semantics anyway. It falls back to ranked semantics for that node to avoid unsafe over-reduction.

That fallback is one of the most important safeguards in the design.

Child descriptors

A subtree is not defined only by the values stored at the current node. It is also defined by what behavior is reachable through its children.

Each child contributes:

(edge character, child signature)

Children are sorted by edge character so that signatures remain deterministic and stable.

This matters because reduction must not depend on incidental map iteration order or other non-semantic implementation details.

Canonicalization

Once a subtree signature is computed, the reduction process checks whether an equivalent canonical subtree already exists.

If yes, the existing reduced node is reused.

If no, a new canonical reduced node is created and registered.

This turns reduction into a canonicalization process:

compute semantic identity,
find canonical representative,
reuse or create,
continue bottom-up.

That is how Radixor eliminates duplicated equivalent subtrees.

Count aggregation and compiled state

When multiple original build-time subtrees collapse into one canonical reduced node, local counts may be aggregated.

This is an important point for understanding compiled artifacts.

A compiled trie is not always a verbatim replay of original insertion history. It is a canonical runtime structure that preserves the semantics guaranteed by the chosen reduction mode.

This explains two things:

why compiled artifacts can become dramatically smaller,
why reconstructing a builder from a compiled trie reflects the compiled state rather than the full original unreduced history.

Reduction modes

`MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS`

This mode merges subtrees only when their getAll() results are equivalent for every reachable key suffix and when local ordering is preserved.

Use this mode when:

ambiguity handling matters,
getAll() ordering should remain meaningful,
behavioral fidelity is more important than maximum compression.

This is the safest and most generally recommended mode.

`MERGE_SUBTREES_WITH_EQUIVALENT_UNORDERED_GET_ALL_RESULTS`

This mode also preserves getAll()-level membership equivalence for every reachable key suffix, but it ignores local ordering differences.

Use this mode when:

alternative candidates still matter,
exact ordering is less important,
stronger reduction is acceptable.

This mode is more aggressive than ranked mode, but less semantically rich.

`MERGE_SUBTREES_WITH_EQUIVALENT_DOMINANT_GET_RESULTS`

This mode focuses on preserving dominant get() semantics for every reachable key suffix, subject to dominance thresholds.

Use this mode when:

the main operational concern is the preferred result,
richer alternative-result behavior is less important,
stronger reduction is desirable.

Because non-dominant nodes fall back to ranked semantics, this mode is not simply “discard everything except the winner”. It is a controlled reduction strategy with a built-in safety condition.

Practical effect on runtime behavior

Reduction mode is not just a storage optimization setting. It affects what distinctions remain visible after compilation.

When ranked mode is used

You can rely on full ranked getAll() semantics being preserved.

When unordered mode is used

You can rely on candidate membership, but not necessarily on preserving the same local ranking distinctions.

When dominant mode is used

You optimize primarily for preferred-result semantics. Alternative-result behavior may still exist, but it is no longer the primary semantic contract of the reduction.

Choosing a mode

A practical rule of thumb is:

choose ranked if you are unsure,
choose unordered if alternative membership matters but ranking does not,
choose dominant only when your application is fundamentally driven by get() and you understand the trade-off.

Why this design works well

The reduction model succeeds because it does not confuse “smaller” with “acceptable”.

Instead, it makes the semantic contract explicit:

what exactly must be preserved,
what differences may be ignored,
when a more aggressive mode is safe,
when the system must fall back to a stricter interpretation.

That explicitness is what makes the compression trustworthy.

Mental model to keep

If you want one concise mental model for reduction, use this one:

build-time insertion collects examples,
reduction asks which subtrees mean the same thing,
the answer depends on the chosen semantic contract,
canonical representatives are shared,
the compiled trie preserves the behavior promised by that contract.

Reduction Semantics

Why reduction exists

Reduction is semantic, not merely structural

Local descriptors

Ranked descriptor

Unordered descriptor

Dominant descriptor

Child descriptors

Canonicalization

Count aggregation and compiled state

Reduction modes

MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS

MERGE_SUBTREES_WITH_EQUIVALENT_UNORDERED_GET_ALL_RESULTS

MERGE_SUBTREES_WITH_EQUIVALENT_DOMINANT_GET_RESULTS

Practical effect on runtime behavior

When ranked mode is used

When unordered mode is used

When dominant mode is used

Choosing a mode

Why this design works well

Mental model to keep

Continue with

`MERGE_SUBTREES_WITH_EQUIVALENT_RANKED_GET_ALL_RESULTS`

`MERGE_SUBTREES_WITH_EQUIVALENT_UNORDERED_GET_ALL_RESULTS`

`MERGE_SUBTREES_WITH_EQUIVALENT_DOMINANT_GET_RESULTS`