Class StemmerPatchTrieLoader
Each dictionary is line-oriented and uses a tab-separated values layout. The first column on a line is interpreted as the stem, and all following tab-separated columns are treated as known variants of that stem.
For each line, the loader inserts:
- the stem itself mapped to the canonical no-op patch command PatchCommandEncoder.NOOP_PATCH, when requested by the caller
- every distinct variant mapped to the patch command transforming that variant to the stem using the traversal direction implied by the selected language or loader overload
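To make the notion of a patch command more concrete, the following minimal sketch derives a suffix-replacement patch that turns a variant into its stem. The class SuffixPatchSketch and its "delete N trailing characters, then append S" representation are hypothetical and deliberately simplified; the encoding actually produced by PatchCommandEncoder is not described in this page and may well differ.

```java
/** Hypothetical, simplified patch: remove trailing characters, then append a suffix. */
final class SuffixPatchSketch {
    final int delete;     // number of trailing characters to remove
    final String append;  // characters to append afterwards

    SuffixPatchSketch(int delete, String append) {
        this.delete = delete;
        this.append = append;
    }

    /** Derives the patch that rewrites {@code variant} into {@code stem}. */
    static SuffixPatchSketch between(String variant, String stem) {
        int common = 0;
        int limit = Math.min(variant.length(), stem.length());
        while (common < limit && variant.charAt(common) == stem.charAt(common)) {
            common++;
        }
        return new SuffixPatchSketch(variant.length() - common, stem.substring(common));
    }

    /** Applies this patch to a word that has at least {@code delete} characters. */
    String apply(String word) {
        return word.substring(0, word.length() - delete) + append;
    }

    /** True when the patch leaves every word unchanged, analogous to a no-op patch. */
    boolean isNoop() {
        return delete == 0 && append.isEmpty();
    }
}
```

In this simplified model the patch derived from "studies" and "study" also rewrites "parties" to "party", which hints at why a trie of patches can generalize to words that are absent from the dictionary.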
Parsing is delegated to StemmerDictionaryParser, which also supports line remarks introduced by # or // and ignores dictionary items containing Unicode whitespace characters while reporting them through aggregated warning log records.
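The line format described above can be illustrated with a small stand-alone sketch. It is not the StemmerDictionaryParser implementation: the class and method names are hypothetical, warnings are omitted, and details such as trimming trailing whitespace before a remark or deduplicating a variant equal to the stem are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of parsing one tab-separated dictionary line. */
final class DictionaryLineSketch {

    /** Returns the stem followed by its distinct variants, or an empty list. */
    static List<String> parseLine(String line) {
        // Assumption: whitespace left in front of a remark is not part of the last item.
        String content = stripRemark(line).stripTrailing();
        Set<String> columns = new LinkedHashSet<>();
        if (content.isEmpty()) {
            return new ArrayList<>(columns);
        }
        String[] parts = content.split("\t");
        if (parts.length == 0 || parts[0].isEmpty() || hasWhitespace(parts[0])) {
            return new ArrayList<>(columns); // no usable stem on this line
        }
        for (String part : parts) {
            // Items containing whitespace are ignored (the real parser also logs them).
            if (!part.isEmpty() && !hasWhitespace(part)) {
                columns.add(part);
            }
        }
        return new ArrayList<>(columns);
    }

    /** Cuts the line at the first remark marker, either '#' or '//'. */
    static String stripRemark(String line) {
        int cut = line.length();
        int hash = line.indexOf('#');
        int slashes = line.indexOf("//");
        if (hash >= 0) {
            cut = Math.min(cut, hash);
        }
        if (slashes >= 0) {
            cut = Math.min(cut, slashes);
        }
        return line.substring(0, cut);
    }

    static boolean hasWhitespace(String item) {
        return item.codePoints()
                .anyMatch(cp -> Character.isWhitespace(cp) || Character.isSpaceChar(cp));
    }
}
```

In this sketch the first element of the returned list plays the role of the stem and the remaining elements are its variants.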
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
- static enum StemmerPatchTrieLoader.Language
  Supported bundled stemmer dictionaries.
-
Method Summary
Modifier and Type, Method, Description
- static FrequencyTrie<String> load(String fileName, boolean storeOriginal, ReductionMode reductionMode)
  Loads a dictionary from a filesystem path string using default settings for the supplied reduction mode.
- static FrequencyTrie<String> load(String fileName, boolean storeOriginal, ReductionSettings reductionSettings)
  Loads a dictionary from a filesystem path string using explicit reduction settings.
- static FrequencyTrie<String> load(String fileName, boolean storeOriginal, ReductionSettings reductionSettings, WordTraversalDirection traversalDirection)
  Loads a dictionary from a filesystem path string using explicit reduction settings and explicit traversal direction.
- static FrequencyTrie<String> load(String fileName, boolean storeOriginal, ReductionSettings reductionSettings, WordTraversalDirection traversalDirection, CaseProcessingMode caseProcessingMode)
  Loads a dictionary from a filesystem path string using explicit reduction settings, explicit traversal direction, and explicit case processing mode.
- static FrequencyTrie<String> load(String fileName, boolean storeOriginal, ReductionSettings reductionSettings, WordTraversalDirection traversalDirection, CaseProcessingMode caseProcessingMode, DiacriticProcessingMode diacriticProcessingMode)
  Loads a dictionary from a filesystem path string using explicit reduction settings, explicit traversal direction, explicit case processing mode, and explicit diacritic processing mode.
- static FrequencyTrie<String> load(String fileName, boolean storeOriginal, TrieMetadata metadata)
  Loads a dictionary from a filesystem path string using explicit trie compilation metadata.
- static FrequencyTrie<String> load(Path path, boolean storeOriginal, ReductionMode reductionMode)
  Loads a dictionary from a filesystem path using default settings for the supplied reduction mode.
- static FrequencyTrie<String> load(Path path, boolean storeOriginal, ReductionSettings reductionSettings)
  Loads a dictionary from a filesystem path using explicit reduction settings.
- static FrequencyTrie<String> load(Path path, boolean storeOriginal, ReductionSettings reductionSettings, WordTraversalDirection traversalDirection)
  Loads a dictionary from a filesystem path using explicit reduction settings and explicit traversal direction.
- static FrequencyTrie<String> load(Path path, boolean storeOriginal, ReductionSettings reductionSettings, WordTraversalDirection traversalDirection, CaseProcessingMode caseProcessingMode)
  Loads a dictionary from a filesystem path using explicit reduction settings, explicit traversal direction, and explicit case processing mode.
- static FrequencyTrie<String> load(Path path, boolean storeOriginal, ReductionSettings reductionSettings, WordTraversalDirection traversalDirection, CaseProcessingMode caseProcessingMode, DiacriticProcessingMode diacriticProcessingMode)
  Loads a dictionary from a filesystem path using explicit reduction settings, traversal direction, case processing mode, and diacritic processing mode.
- static FrequencyTrie<String> load(Path path, boolean storeOriginal, TrieMetadata metadata)
  Loads a dictionary from a filesystem path using explicit trie compilation metadata.
- static FrequencyTrie<String> load(StemmerPatchTrieLoader.Language language, boolean storeOriginal, ReductionMode reductionMode)
  Loads a bundled dictionary using default settings for the supplied reduction mode.
- static FrequencyTrie<String> load(StemmerPatchTrieLoader.Language language, boolean storeOriginal, ReductionSettings reductionSettings)
  Loads a bundled dictionary using explicit reduction settings.
- static FrequencyTrie<String> load(StemmerPatchTrieLoader.Language language, boolean storeOriginal, TrieMetadata metadata)
  Loads a bundled dictionary using explicit trie compilation metadata.
- static FrequencyTrie<String> loadBinary(InputStream inputStream)
  Loads a GZip-compressed binary patch-command trie from an input stream.
- static FrequencyTrie<String> loadBinary(String fileName)
  Loads a GZip-compressed binary patch-command trie from a filesystem path string.
- static FrequencyTrie<String> loadBinary(Path path)
  Loads a GZip-compressed binary patch-command trie from a filesystem path.
- static TrieMetadata loadBinaryMetadata(InputStream inputStream)
  Loads only persisted metadata from a GZip-compressed binary patch-command trie stream.
- static TrieMetadata loadBinaryMetadata(String fileName)
  Loads only persisted metadata from a GZip-compressed binary patch-command trie file.
- static TrieMetadata loadBinaryMetadata(Path path)
  Loads only persisted metadata from a GZip-compressed binary patch-command trie file.
- static void saveBinary(FrequencyTrie<String> trie, String fileName)
  Saves a compiled patch-command trie as a GZip-compressed binary file.
- static void saveBinary(FrequencyTrie<String> trie, Path path)
  Saves a compiled patch-command trie as a GZip-compressed binary file.
-
Method Details
-
load
public static FrequencyTrie<String> load(StemmerPatchTrieLoader.Language language, boolean storeOriginal, ReductionSettings reductionSettings) throws IOException
Loads a bundled dictionary using explicit reduction settings.
This overload applies the following implicit compilation defaults in addition to the supplied reductionSettings:
- traversal direction is derived from StemmerPatchTrieLoader.Language.isRightToLeft() (WordTraversalDirection.FORWARD for right-to-left languages, WordTraversalDirection.BACKWARD otherwise)
- case processing mode is CaseProcessingMode.LOWERCASE_WITH_LOCALE_ROOT
- diacritic processing mode is DiacriticProcessingMode.AS_IS
The resolved settings are persisted into TrieMetadata of the resulting trie.
- Parameters:
  language - bundled language dictionary
  storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
  reductionSettings - reduction settings
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if any argument is null
  IOException - if the dictionary cannot be found or read
-
load
public static FrequencyTrie<String> load(StemmerPatchTrieLoader.Language language, boolean storeOriginal, TrieMetadata metadata) throws IOException
Loads a bundled dictionary using explicit trie compilation metadata.
All semantic compilation settings (reduction mode and thresholds, traversal direction, case processing mode, and diacritic processing mode) are taken from the supplied metadata object and are persisted unchanged in the resulting trie.
- Parameters:
  language - bundled language dictionary
  storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
  metadata - trie metadata describing the compilation configuration
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if any argument is null
  IOException - if the dictionary cannot be found or read
-
load
public static FrequencyTrie<String> load(StemmerPatchTrieLoader.Language language, boolean storeOriginal, ReductionMode reductionMode) throws IOException
Loads a bundled dictionary using default settings for the supplied reduction mode.
This overload is equivalent to calling load(Language, boolean, ReductionSettings) with ReductionSettings.withDefaults(ReductionMode) and therefore uses the same implicit defaults for traversal direction, case processing mode, and diacritic processing mode.
- Parameters:
  language - bundled language dictionary
  storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
  reductionMode - reduction mode
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if any argument is null
  IOException - if the dictionary cannot be found or read
-
load
public static FrequencyTrie<String> load(Path path, boolean storeOriginal, ReductionSettings reductionSettings) throws IOException
Loads a dictionary from a filesystem path using explicit reduction settings.
This overload applies historical Egothor-compatible implicit defaults: WordTraversalDirection.BACKWARD, CaseProcessingMode.LOWERCASE_WITH_LOCALE_ROOT, and DiacriticProcessingMode.AS_IS. These settings are persisted in the resulting trie metadata.
- Parameters:
  path - path to the dictionary file
  storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
  reductionSettings - reduction settings
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if any argument is null
  IOException - if the file cannot be opened or read
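The name LOWERCASE_WITH_LOCALE_ROOT suggests locale-independent lowercasing. The sketch below shows why that matters, assuming the mode corresponds to String.toLowerCase(Locale.ROOT); that correspondence is inferred from the name only, and the class name is hypothetical.

```java
import java.util.Locale;

/** Hypothetical sketch of locale-independent lowercasing. */
final class LocaleRootLowercaseSketch {

    /** Lowercases a word without depending on the JVM default locale. */
    static String normalize(String word) {
        return word.toLowerCase(Locale.ROOT);
    }

    /** Lowercases a word under Turkish casing rules, for comparison. */
    static String turkishLowercase(String word) {
        return word.toLowerCase(Locale.forLanguageTag("tr"));
    }

    public static void main(String[] args) {
        // Under Turkish rules, 'I' maps to dotless 'ı' (U+0131), so the keys differ.
        System.out.println(normalize("TITLE").equals(turkishLowercase("TITLE"))); // false
    }
}
```

A trie compiled with one casing rule and queried with another could miss entries, so queries should be lowercased the same way as the dictionary.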
-
load
public static FrequencyTrie<String> load(Path path, boolean storeOriginal, ReductionSettings reductionSettings, WordTraversalDirection traversalDirection) throws IOException
Loads a dictionary from a filesystem path using explicit reduction settings and explicit traversal direction.
Implicit defaults still apply for unspecified dimensions: CaseProcessingMode.LOWERCASE_WITH_LOCALE_ROOT and DiacriticProcessingMode.AS_IS.
- Parameters:
  path - path to the dictionary file
  storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
  reductionSettings - reduction settings
  traversalDirection - traversal direction used for both trie keys and patch commands
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if any argument is null
  IOException - if the file cannot be opened or read
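As an illustration of what a traversal direction can mean for trie keys, the sketch below reads a word either from its first character or from its last. The class, the boolean flag, and the idea that backward keys are simply reversed words are assumptions made for the example; how FrequencyTrie builds its keys is not stated in this page.

```java
/** Hypothetical sketch of direction-dependent trie keys. */
final class TraversalKeySketch {

    /** Returns the character sequence in the order a trie would consume it. */
    static String keyFor(String word, boolean backward) {
        return backward ? new StringBuilder(word).reverse().toString() : word;
    }

    /** Counts how many leading key characters two words share in the chosen direction. */
    static int sharedPathLength(String first, String second, boolean backward) {
        String a = keyFor(first, backward);
        String b = keyFor(second, backward);
        int shared = 0;
        int limit = Math.min(a.length(), b.length());
        while (shared < limit && a.charAt(shared) == b.charAt(shared)) {
            shared++;
        }
        return shared;
    }
}
```

With backward keys, "walking" and "talking" share a path of six characters; with forward keys they share none. This is why suffix-driven stemming favours backward traversal in this model.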
-
load
public static FrequencyTrie<String> load(Path path, boolean storeOriginal, ReductionSettings reductionSettings, WordTraversalDirection traversalDirection, CaseProcessingMode caseProcessingMode) throws IOException
Loads a dictionary from a filesystem path using explicit reduction settings, explicit traversal direction, and explicit case processing mode.
This overload still defaults diacritic processing to DiacriticProcessingMode.AS_IS.
- Parameters:
  path - path to the dictionary file
  storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
  reductionSettings - reduction settings
  traversalDirection - traversal direction used for both trie keys and patch commands
  caseProcessingMode - case processing mode used during dictionary parsing
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if any argument is null
  IOException - if the file cannot be opened or read
-
load
public static FrequencyTrie<String> load(Path path, boolean storeOriginal, ReductionSettings reductionSettings, WordTraversalDirection traversalDirection, CaseProcessingMode caseProcessingMode, DiacriticProcessingMode diacriticProcessingMode) throws IOException
Loads a dictionary from a filesystem path using explicit reduction settings, traversal direction, case processing mode, and diacritic processing mode.
- Parameters:
  path - path to the dictionary file
  storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
  reductionSettings - reduction settings
  traversalDirection - traversal direction used for both trie keys and patch commands
  caseProcessingMode - case processing mode used during dictionary parsing
  diacriticProcessingMode - diacritic processing mode used during dictionary parsing
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if any argument is null
  IOException - if the file cannot be opened or read
-
load
public static FrequencyTrie<String> load(Path path, boolean storeOriginal, TrieMetadata metadata) throws IOException
Loads a dictionary from a filesystem path using explicit trie compilation metadata.
The supplied metadata is the authoritative source of trie compilation semantics. Callers should ensure the metadata matches how they expect to query the trie (for example, with or without lowercasing or diacritic stripping).
- Parameters:
  path - path to the dictionary file
  storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
  metadata - trie metadata describing the compilation configuration
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if any argument is null
  IOException - if the file cannot be opened or read
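The advice about matching metadata to query behaviour can be illustrated with a query-side normalizer. This is a sketch under assumptions: the class and its two boolean flags are hypothetical stand-ins for whatever the trie metadata records, and diacritic stripping is modelled with java.text.Normalizer, which may differ from what the library does.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical sketch of normalizing a query word the same way the trie was compiled. */
final class QueryNormalizationSketch {

    static String normalize(String word, boolean lowercase, boolean stripDiacritics) {
        String result = lowercase ? word.toLowerCase(Locale.ROOT) : word;
        if (stripDiacritics) {
            // Decompose accented letters, then drop the combining marks.
            result = Normalizer.normalize(result, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}+", "");
        }
        return result;
    }
}
```

If a trie were compiled with lowercasing and diacritic stripping, a query for "Café" would need to be normalized to "cafe" before lookup. A trie compiled as-is would need the word unchanged.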
-
load
public static FrequencyTrie<String> load(Path path, boolean storeOriginal, ReductionMode reductionMode) throws IOException
Loads a dictionary from a filesystem path using default settings for the supplied reduction mode.
This overload is equivalent to calling load(Path, boolean, ReductionSettings) with ReductionSettings.withDefaults(ReductionMode) and therefore uses implicit defaults (WordTraversalDirection.BACKWARD, CaseProcessingMode.LOWERCASE_WITH_LOCALE_ROOT, DiacriticProcessingMode.AS_IS).
- Parameters:
  path - path to the dictionary file
  storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
  reductionMode - reduction mode
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if any argument is null
  IOException - if the file cannot be opened or read
-
load
public static FrequencyTrie<String> load(String fileName, boolean storeOriginal, ReductionSettings reductionSettings) throws IOException
Loads a dictionary from a filesystem path string using explicit reduction settings.
Same semantics as load(Path, boolean, ReductionSettings), including implicit defaults (WordTraversalDirection.BACKWARD, CaseProcessingMode.LOWERCASE_WITH_LOCALE_ROOT, DiacriticProcessingMode.AS_IS).
- Parameters:
  fileName - file name or path string
  storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
  reductionSettings - reduction settings
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if any argument is null
  IOException - if the file cannot be opened or read
-
load
public static FrequencyTrie<String> load(String fileName, boolean storeOriginal, ReductionSettings reductionSettings, WordTraversalDirection traversalDirection) throws IOException
Loads a dictionary from a filesystem path string using explicit reduction settings and explicit traversal direction.
Same semantics as load(Path, boolean, ReductionSettings, WordTraversalDirection). Implicit defaults remain CaseProcessingMode.LOWERCASE_WITH_LOCALE_ROOT and DiacriticProcessingMode.AS_IS.
- Parameters:
  fileName - file name or path string
  storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
  reductionSettings - reduction settings
  traversalDirection - traversal direction used for both trie keys and patch commands
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if any argument is null
  IOException - if the file cannot be opened or read
-
load
public static FrequencyTrie<String> load(String fileName, boolean storeOriginal, ReductionSettings reductionSettings, WordTraversalDirection traversalDirection, CaseProcessingMode caseProcessingMode) throws IOException
Loads a dictionary from a filesystem path string using explicit reduction settings, explicit traversal direction, and explicit case processing mode.
Same semantics as load(Path, boolean, ReductionSettings, WordTraversalDirection, CaseProcessingMode). The implicit default remains DiacriticProcessingMode.AS_IS.
- Parameters:
  fileName - file name or path string
  storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
  reductionSettings - reduction settings
  traversalDirection - traversal direction used for both trie keys and patch commands
  caseProcessingMode - case processing mode used during dictionary parsing
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if any argument is null
  IOException - if the file cannot be opened or read
-
load
public static FrequencyTrie<String> load(String fileName, boolean storeOriginal, ReductionSettings reductionSettings, WordTraversalDirection traversalDirection, CaseProcessingMode caseProcessingMode, DiacriticProcessingMode diacriticProcessingMode) throws IOException
Loads a dictionary from a filesystem path string using explicit reduction settings, explicit traversal direction, explicit case processing mode, and explicit diacritic processing mode.
- Parameters:
  fileName - file name or path string
  storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
  reductionSettings - reduction settings
  traversalDirection - traversal direction used for both trie keys and patch commands
  caseProcessingMode - case processing mode used during dictionary parsing
  diacriticProcessingMode - diacritic processing mode used during dictionary parsing
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if any argument is null
  IOException - if the file cannot be opened or read
-
load
public static FrequencyTrie<String> load(String fileName, boolean storeOriginal, TrieMetadata metadata) throws IOException
Loads a dictionary from a filesystem path string using explicit trie compilation metadata.
Same semantics as load(Path, boolean, TrieMetadata).
- Parameters:
  fileName - file name or path string
  storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
  metadata - trie metadata describing the compilation configuration
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if any argument is null
  IOException - if the file cannot be opened or read
-
load
public static FrequencyTrie<String> load(String fileName, boolean storeOriginal, ReductionMode reductionMode) throws IOException
Loads a dictionary from a filesystem path string using default settings for the supplied reduction mode.
Equivalent to load(Path, boolean, ReductionMode) and therefore uses implicit defaults (WordTraversalDirection.BACKWARD, CaseProcessingMode.LOWERCASE_WITH_LOCALE_ROOT, DiacriticProcessingMode.AS_IS).
- Parameters:
  fileName - file name or path string
  storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
  reductionMode - reduction mode
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if any argument is null
  IOException - if the file cannot be opened or read
-
loadBinary
public static FrequencyTrie<String> loadBinary(Path path) throws IOException
Loads a GZip-compressed binary patch-command trie from a filesystem path.
- Parameters:
  path - path to the compressed binary trie file
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if path is null
  IOException - if the file cannot be opened, decompressed, or read
-
loadBinary
public static FrequencyTrie<String> loadBinary(String fileName) throws IOException
Loads a GZip-compressed binary patch-command trie from a filesystem path string.
- Parameters:
  fileName - file name or path string
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if fileName is null
  IOException - if the file cannot be opened, decompressed, or read
-
loadBinary
public static FrequencyTrie<String> loadBinary(InputStream inputStream) throws IOException
Loads a GZip-compressed binary patch-command trie from an input stream.
- Parameters:
  inputStream - source input stream
- Returns:
  compiled patch-command trie
- Throws:
  NullPointerException - if inputStream is null
  IOException - if the stream cannot be decompressed or read
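The GZip wrapping used by the binary methods can be illustrated with a JDK-only round trip. The sketch shows only how a binary payload is wrapped in GZIPOutputStream and GZIPInputStream; the payload (a count followed by strings) and the class name are hypothetical and unrelated to the real binary trie layout, which is not documented in this page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch of a GZip-compressed binary round trip. */
final class GzipRoundTripSketch {

    /** Serializes the entries into a GZip-compressed byte array. */
    static byte[] save(String[] entries) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(buffer))) {
            out.writeInt(entries.length);
            for (String entry : entries) {
                out.writeUTF(entry);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Decompresses and reads back what {@link #save(String[])} wrote. */
    static String[] load(byte[] compressed) {
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(compressed)))) {
            String[] entries = new String[in.readInt()];
            for (int i = 0; i < entries.length; i++) {
                entries[i] = in.readUTF();
            }
            return entries;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Closing the DataOutputStream before calling toByteArray() matters here, because the GZip trailer is only written when the compressing stream is finished.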
-
loadBinaryMetadata
public static TrieMetadata loadBinaryMetadata(Path path) throws IOException
Loads only persisted metadata from a GZip-compressed binary patch-command trie file.
- Parameters:
  path - path to the compressed binary trie file
- Returns:
  persisted trie metadata
- Throws:
  NullPointerException - if path is null
  IOException - if the file cannot be opened, decompressed, or read
-
loadBinaryMetadata
public static TrieMetadata loadBinaryMetadata(String fileName) throws IOException
Loads only persisted metadata from a GZip-compressed binary patch-command trie file.
- Parameters:
  fileName - file name or path string
- Returns:
  persisted trie metadata
- Throws:
  NullPointerException - if fileName is null
  IOException - if the file cannot be opened, decompressed, or read
-
loadBinaryMetadata
public static TrieMetadata loadBinaryMetadata(InputStream inputStream) throws IOException
Loads only persisted metadata from a GZip-compressed binary patch-command trie stream.
- Parameters:
  inputStream - source input stream
- Returns:
  persisted trie metadata
- Throws:
  NullPointerException - if inputStream is null
  IOException - if the stream cannot be decompressed or read
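One way a metadata-only read can avoid decoding the whole trie is to store the metadata ahead of the payload and stop reading after it. The sketch below demonstrates that idea with JDK streams. It assumes a header-first layout with two string fields; this page does not say that the real file format is organized this way, and the class, method, and field names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: read a leading header and leave the payload untouched. */
final class MetadataOnlyReadSketch {

    /** Writes an assumed two-field header followed by a payload of ints. */
    static byte[] write(String direction, String caseMode, int[] payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(buffer))) {
            out.writeUTF(direction);
            out.writeUTF(caseMode);
            for (int value : payload) {
                out.writeInt(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads the two header fields only; the payload is never decoded. */
    static String[] readHeader(InputStream source) {
        try (DataInputStream in = new DataInputStream(new GZIPInputStream(source))) {
            return new String[] {in.readUTF(), in.readUTF()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading the header this way lets a caller inspect compilation settings, for example to decide how to normalize queries, before paying the cost of loading the full structure.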
-
saveBinary
public static void saveBinary(FrequencyTrie<String> trie, Path path) throws IOException
Saves a compiled patch-command trie as a GZip-compressed binary file.
- Parameters:
  trie - compiled trie
  path - target file
- Throws:
  NullPointerException - if any argument is null
  IOException - if writing fails
-
saveBinary
public static void saveBinary(FrequencyTrie<String> trie, String fileName) throws IOException
Saves a compiled patch-command trie as a GZip-compressed binary file.
- Parameters:
  trie - compiled trie
  fileName - target file name or path string
- Throws:
  NullPointerException - if any argument is null
  IOException - if writing fails
-