Class StemmerDictionaryParser

java.lang.Object
org.egothor.stemmer.StemmerDictionaryParser

public final class StemmerDictionaryParser extends Object
Parser of line-oriented stemmer dictionary files.

Each non-empty logical line consists of a stem followed by zero or more known word variants separated by whitespace. The first token is interpreted as the canonical stem, and every following token on the same line is interpreted as a variant belonging to that stem.
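The line layout described above can be illustrated with a minimal sketch. It is not the parser's actual implementation: the class name DictionaryLineSketch and the method tokenize are hypothetical, and it assumes that "whitespace" means whatever the regular expression \s+ matches.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of org.egothor.stemmer.
final class DictionaryLineSketch {

    private DictionaryLineSketch() {
    }

    /**
     * Splits one logical line into tokens. Index 0 is the canonical stem,
     * every later index is a variant of that stem. An empty or blank line
     * yields an empty list.
     */
    static List<String> tokenize(String logicalLine) {
        String trimmed = logicalLine.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        // Assumption: tokens are separated by runs of regex whitespace.
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("run running runs ran");
        String stem = tokens.get(0);
        List<String> variants = tokens.subList(1, tokens.size());
        System.out.println(stem + " -> " + variants);
    }
}
```

A line that contains only a stem is valid under this reading: the token list has one element and the variant list is empty.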

Input lines are normalized to lower case using Locale.ROOT. Leading and trailing whitespace is ignored.

The parser supports whole-line remarks and trailing remarks. Either remark marker, # or //, terminates the logical content of the line; the marker and the remainder of that line are ignored.
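The normalization and remark rules can be sketched as follows. This is an illustration under stated assumptions, not the class's real code: the names RemarkStrippingSketch and toLogicalContent are hypothetical, the earliest marker on the line is assumed to win, and no escaping of # or // is assumed because the description mentions none.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of org.egothor.stemmer.
final class RemarkStrippingSketch {

    private RemarkStrippingSketch() {
    }

    /**
     * Reduces a raw input line to its logical content: cuts the line at the
     * first remark marker (# or //), trims surrounding whitespace, and
     * lower-cases the result with Locale.ROOT.
     */
    static String toLogicalContent(String rawLine) {
        int cut = rawLine.length();
        int hash = rawLine.indexOf('#');
        if (hash >= 0) {
            cut = Math.min(cut, hash);
        }
        int slashes = rawLine.indexOf("//");
        if (slashes >= 0) {
            cut = Math.min(cut, slashes);
        }
        // Locale.ROOT keeps the result independent of the default locale.
        return rawLine.substring(0, cut).trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toLogicalContent("  Run Running RUNS  # irregular: ran"));
        System.out.println(toLogicalContent("// a whole-line remark").isEmpty());
    }
}
```

A line that consists only of a remark reduces to an empty string, so a caller can skip it the same way it skips a blank line.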

This class is intentionally stateless and allocation-light so it can be used both by runtime loading and by offline compilation tooling.