Package org.egothor.stemmer
Class StemmerPatchTrieLoader
java.lang.Object
org.egothor.stemmer.StemmerPatchTrieLoader
Loader of patch-command tries from bundled stemmer dictionaries.
Each dictionary is line-oriented. The first token on a line is interpreted as the stem, and all following tokens are treated as known variants of that stem.
For each line, the loader inserts:
- the stem itself, mapped to the canonical no-op patch command PatchCommandEncoder.NOOP_PATCH, when requested by the caller
- every distinct variant, mapped to the patch command that transforms that variant into the stem
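The per-line insertion rule can be sketched in isolation. This is a minimal illustration, not the loader's actual implementation: the class `LineEntriesSketch`, the method `entriesFor`, the `<noop>` placeholder, and the `patch(...)` strings are hypothetical stand-ins, and splitting tokens on whitespace is an assumption. The real patch-command encoding is not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of the entries one dictionary line contributes. */
final class LineEntriesSketch {

    /** Placeholder standing in for PatchCommandEncoder.NOOP_PATCH (the real value is not shown). */
    static final String NOOP = "<noop>";

    /**
     * Maps each word on the line to a placeholder command.
     * Assumption: tokens are separated by whitespace; the first token is the stem.
     */
    static Map<String, String> entriesFor(String line, boolean storeOriginal) {
        Map<String, String> entries = new LinkedHashMap<>();
        String[] tokens = line.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            return entries; // a blank line contributes nothing
        }
        String stem = tokens[0];
        if (storeOriginal) {
            entries.put(stem, NOOP); // stem maps to the no-op command only on request
        }
        for (int i = 1; i < tokens.length; i++) {
            String variant = tokens[i];
            // Assumption: a variant identical to the stem gets no separate patch entry.
            if (!variant.equals(stem)) {
                // putIfAbsent keeps one entry per distinct variant
                entries.putIfAbsent(variant, "patch(" + variant + " -> " + stem + ")");
            }
        }
        return entries;
    }
}
```

With `storeOriginal` set to `false`, the stem is absent from the result and only the variants are mapped.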
Parsing is delegated to StemmerDictionaryParser, which also supports
line remarks introduced by # or //.
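One plausible way to handle such line remarks is to cut the line at the first `#` or `//`. The helper below is a hypothetical sketch (`RemarkStripSketch` and `stripRemark` are not part of the library), and it assumes a marker may appear anywhere on the line; StemmerDictionaryParser may apply different rules.

```java
/** Hypothetical helper showing one way to drop line remarks. */
final class RemarkStripSketch {

    /** Returns the content before the first "#" or "//" marker, trimmed. */
    static String stripRemark(String line) {
        int hash = line.indexOf('#');
        int slashes = line.indexOf("//");
        int cut = line.length();
        if (hash >= 0) {
            cut = Math.min(cut, hash);
        }
        if (slashes >= 0) {
            cut = Math.min(cut, slashes);
        }
        // Everything from the earliest marker onward is treated as a remark.
        return line.substring(0, cut).trim();
    }
}
```

A line consisting only of a remark reduces to an empty string, which a caller can then skip.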
-
Nested Class Summary
Nested Classes
- static enum StemmerPatchTrieLoader.Language
  Supported bundled stemmer dictionaries.
Method Summary
- static FrequencyTrie<String> load(String fileName, boolean storeOriginal, ReductionMode reductionMode)
  Loads a dictionary from a filesystem path string using default settings for the supplied reduction mode.
- static FrequencyTrie<String> load(String fileName, boolean storeOriginal, ReductionSettings reductionSettings)
  Loads a dictionary from a filesystem path string using explicit reduction settings.
- static FrequencyTrie<String> load(Path path, boolean storeOriginal, ReductionMode reductionMode)
  Loads a dictionary from a filesystem path using default settings for the supplied reduction mode.
- static FrequencyTrie<String> load(Path path, boolean storeOriginal, ReductionSettings reductionSettings)
  Loads a dictionary from a filesystem path using explicit reduction settings.
- static FrequencyTrie<String> load(StemmerPatchTrieLoader.Language language, boolean storeOriginal, ReductionMode reductionMode)
  Loads a bundled dictionary using default settings for the supplied reduction mode.
- static FrequencyTrie<String> load(StemmerPatchTrieLoader.Language language, boolean storeOriginal, ReductionSettings reductionSettings)
  Loads a bundled dictionary using explicit reduction settings.
- static FrequencyTrie<String> loadBinary(InputStream inputStream)
  Loads a GZip-compressed binary patch-command trie from an input stream.
- static FrequencyTrie<String> loadBinary(String fileName)
  Loads a GZip-compressed binary patch-command trie from a filesystem path string.
- static FrequencyTrie<String> loadBinary(Path path)
  Loads a GZip-compressed binary patch-command trie from a filesystem path.
- static void saveBinary(FrequencyTrie<String> trie, String fileName)
  Saves a compiled patch-command trie as a GZip-compressed binary file.
- static void saveBinary(FrequencyTrie<String> trie, Path path)
  Saves a compiled patch-command trie as a GZip-compressed binary file.
-
Method Details
-
load
public static FrequencyTrie<String> load(StemmerPatchTrieLoader.Language language, boolean storeOriginal, ReductionSettings reductionSettings) throws IOException
Loads a bundled dictionary using explicit reduction settings.
Parameters:
- language - bundled language dictionary
- storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
- reductionSettings - reduction settings
Returns:
- compiled patch-command trie
Throws:
- NullPointerException - if any argument is null
- IOException - if the dictionary cannot be found or read
-
load
public static FrequencyTrie<String> load(StemmerPatchTrieLoader.Language language, boolean storeOriginal, ReductionMode reductionMode) throws IOException
Loads a bundled dictionary using default settings for the supplied reduction mode.
Parameters:
- language - bundled language dictionary
- storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
- reductionMode - reduction mode
Returns:
- compiled patch-command trie
Throws:
- NullPointerException - if any argument is null
- IOException - if the dictionary cannot be found or read
-
load
public static FrequencyTrie<String> load(Path path, boolean storeOriginal, ReductionSettings reductionSettings) throws IOException
Loads a dictionary from a filesystem path using explicit reduction settings.
Parameters:
- path - path to the dictionary file
- storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
- reductionSettings - reduction settings
Returns:
- compiled patch-command trie
Throws:
- NullPointerException - if any argument is null
- IOException - if the file cannot be opened or read
-
load
public static FrequencyTrie<String> load(Path path, boolean storeOriginal, ReductionMode reductionMode) throws IOException
Loads a dictionary from a filesystem path using default settings for the supplied reduction mode.
Parameters:
- path - path to the dictionary file
- storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
- reductionMode - reduction mode
Returns:
- compiled patch-command trie
Throws:
- NullPointerException - if any argument is null
- IOException - if the file cannot be opened or read
-
load
public static FrequencyTrie<String> load(String fileName, boolean storeOriginal, ReductionSettings reductionSettings) throws IOException
Loads a dictionary from a filesystem path string using explicit reduction settings.
Parameters:
- fileName - file name or path string
- storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
- reductionSettings - reduction settings
Returns:
- compiled patch-command trie
Throws:
- NullPointerException - if any argument is null
- IOException - if the file cannot be opened or read
-
load
public static FrequencyTrie<String> load(String fileName, boolean storeOriginal, ReductionMode reductionMode) throws IOException
Loads a dictionary from a filesystem path string using default settings for the supplied reduction mode.
Parameters:
- fileName - file name or path string
- storeOriginal - whether the stem itself should be inserted using the canonical no-op patch command
- reductionMode - reduction mode
Returns:
- compiled patch-command trie
Throws:
- NullPointerException - if any argument is null
- IOException - if the file cannot be opened or read
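The String and Path overloads share the same exception contract, which suggests a common pattern: the String variant converts its argument and delegates. The sketch below illustrates that pattern with plain line reading. `PathOverloadSketch` and `readLines` are hypothetical names, the delegation itself and the UTF-8 charset are assumptions, and no trie is built here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch of a String overload delegating to a Path overload. */
final class PathOverloadSketch {

    /** Rejects null early, then delegates to the Path variant. */
    static List<String> readLines(String fileName) throws IOException {
        Objects.requireNonNull(fileName, "fileName");
        return readLines(Path.of(fileName));
    }

    /** Reads all lines; a missing or unreadable file surfaces as IOException. */
    static List<String> readLines(Path path) throws IOException {
        Objects.requireNonNull(path, "path");
        List<String> lines = new ArrayList<>();
        // Assumption: the dictionary text is UTF-8 encoded.
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Checking for null before touching the filesystem keeps the two failure modes distinct: a programming error raises NullPointerException, while an I/O problem raises IOException.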
-
loadBinary
public static FrequencyTrie<String> loadBinary(Path path) throws IOException
Loads a GZip-compressed binary patch-command trie from a filesystem path.
Parameters:
- path - path to the compressed binary trie file
Returns:
- compiled patch-command trie
Throws:
- NullPointerException - if path is null
- IOException - if the file cannot be opened, decompressed, or read
-
loadBinary
public static FrequencyTrie<String> loadBinary(String fileName) throws IOException
Loads a GZip-compressed binary patch-command trie from a filesystem path string.
Parameters:
- fileName - file name or path string
Returns:
- compiled patch-command trie
Throws:
- NullPointerException - if fileName is null
- IOException - if the file cannot be opened, decompressed, or read
-
loadBinary
public static FrequencyTrie<String> loadBinary(InputStream inputStream) throws IOException
Loads a GZip-compressed binary patch-command trie from an input stream.
Parameters:
- inputStream - source input stream
Returns:
- compiled patch-command trie
Throws:
- NullPointerException - if inputStream is null
- IOException - if the stream cannot be decompressed or read
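To show what "cannot be decompressed" can mean in practice, the sketch below wraps an arbitrary stream in the standard `java.util.zip.GZIPInputStream`. `GzipStreamSketch` and its methods are hypothetical; the payload is plain bytes rather than a real trie, and the binary trie layout is not modelled.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Objects;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch of reading GZip-compressed bytes from a stream. */
final class GzipStreamSketch {

    /**
     * Decompresses the whole stream. Input that is not in GZip format fails
     * with a ZipException, which is a subclass of IOException.
     * Note: this sketch closes the supplied stream; the library may behave differently.
     */
    static byte[] readAllDecompressed(InputStream inputStream) throws IOException {
        Objects.requireNonNull(inputStream, "inputStream");
        try (GZIPInputStream gzip = new GZIPInputStream(inputStream)) {
            return gzip.readAllBytes();
        }
    }

    /** Helper that produces GZip-compressed bytes for experiments. */
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        }
        return buffer.toByteArray();
    }
}
```

Because the GZip header is validated when the wrapping stream is constructed, a stream in the wrong format is rejected before any payload is read.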
-
saveBinary
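public static void saveBinary(FrequencyTrie<String> trie, Path path) throws IOException
Saves a compiled patch-command trie as a GZip-compressed binary file.
Parameters:
- trie - compiled trie
- path - target file
Throws:
- NullPointerException - if any argument is null
- IOException - if writing fails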
-
saveBinary
public static void saveBinary(FrequencyTrie<String> trie, String fileName) throws IOException
Saves a compiled patch-command trie as a GZip-compressed binary file.
Parameters:
- trie - compiled trie
- fileName - target file name or path string
Throws:
- NullPointerException - if any argument is null
- IOException - if writing fails
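The save and load pair amounts to a round trip through a GZip-compressed file. The following sketch demonstrates that round trip with a plain string payload instead of a trie. `GzipFileRoundTripSketch`, `save`, and `load` are hypothetical names, and the `writeUTF` payload format is an assumption made only for illustration; it is not the library's binary layout.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch of a GZip-compressed binary file round trip. */
final class GzipFileRoundTripSketch {

    /** Writes the payload as GZip-compressed binary data to the target file. */
    static void save(String payload, Path path) throws IOException {
        try (OutputStream file = Files.newOutputStream(path);
             DataOutputStream out = new DataOutputStream(new GZIPOutputStream(file))) {
            out.writeUTF(payload); // stand-in for the serialized trie
        }
    }

    /** Reads the payload back, decompressing on the fly. */
    static String load(Path path) throws IOException {
        try (InputStream file = Files.newInputStream(path);
             DataInputStream in = new DataInputStream(new GZIPInputStream(file))) {
            return in.readUTF();
        }
    }
}
```

Closing the outermost stream finishes the GZip trailer, so the try-with-resources block is what guarantees a complete, readable file.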
-