HybridLemmatizer
implements LemmatizerInterface
Hybrid lemmatizer that combines dictionary and NLP-based approaches.
Strategy:
- Try dictionary lookup first (fast, predictable)
- Fall back to the NLP service (spaCy) for words missing from the dictionary
This pairs the strengths of each approach:
- Fast results for common words via dictionary
- High accuracy for uncommon words via NLP models
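The dictionary-first strategy described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the three closures and their lookup data are hypothetical stand-ins, since the internals of DictionaryLemmatizer and NlpServiceLemmatizer are not shown in this reference.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the dictionary backend (made-up entries).
$dictionaryLookup = static function (string $wordForm, string $languageCode): ?string {
    $entries = ['en' => ['running' => 'run', 'mice' => 'mouse']];
    return $entries[$languageCode][$wordForm] ?? null;
};

// Hypothetical stand-in for the NLP service; it pretends to know one rare word.
$nlpLookup = static function (string $wordForm, string $languageCode): ?string {
    return ($wordForm === 'oxen' && $languageCode === 'en') ? 'ox' : null;
};

// Dictionary first; the slower fallback is only called on a dictionary miss.
$hybridLookup = static function (string $wordForm, string $languageCode) use ($dictionaryLookup, $nlpLookup): ?string {
    return $dictionaryLookup($wordForm, $languageCode)
        ?? $nlpLookup($wordForm, $languageCode);
};

echo $hybridLookup('mice', 'en'), "\n"; // served by the dictionary stand-in
echo $hybridLookup('oxen', 'en'), "\n"; // served by the fallback stand-in
```

Because the null coalescing operator only evaluates its right-hand side when the left-hand side is null, the fallback call is skipped entirely for words the dictionary resolves.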
Table of Contents
Interfaces
- LemmatizerInterface
- Interface for lemmatization strategies.
Properties
- $dictionaryLemmatizer : DictionaryLemmatizer
- $nlpLemmatizer : NlpServiceLemmatizer
Methods
- __construct() : mixed
- Create a hybrid lemmatizer.
- getSupportedLanguages() : array<string|int, string>
- Get the list of supported language codes.
- hasDictionarySupport() : bool
- Check if the dictionary supports a language.
- hasNlpSupport() : bool
- Check if the NLP service supports a language.
- lemmatize() : string|null
- Find the lemma (base form) of a word.
- lemmatizeBatch() : array<string, string|null>
- Lemmatize multiple words in batch.
- supportsLanguage() : bool
- Check if this lemmatizer supports a given language.
Properties
$dictionaryLemmatizer
private DictionaryLemmatizer $dictionaryLemmatizer
$nlpLemmatizer
private NlpServiceLemmatizer $nlpLemmatizer
Methods
__construct()
Create a hybrid lemmatizer.
public __construct(DictionaryLemmatizer $dictionaryLemmatizer, NlpServiceLemmatizer $nlpLemmatizer) : mixed
Parameters
- $dictionaryLemmatizer : DictionaryLemmatizer
  Primary (fast) lemmatizer
- $nlpLemmatizer : NlpServiceLemmatizer
  Fallback (accurate) lemmatizer
getSupportedLanguages()
Get the list of supported language codes.
public getSupportedLanguages() : array<string|int, string>
Return values
array<string|int, string> —Array of ISO language codes
hasDictionarySupport()
Check if the dictionary supports a language.
public hasDictionarySupport(string $languageCode) : bool
Parameters
- $languageCode : string
  Language code
Return values
bool
hasNlpSupport()
Check if the NLP service supports a language.
public hasNlpSupport(string $languageCode) : bool
Parameters
- $languageCode : string
  Language code
Return values
bool
lemmatize()
Find the lemma (base form) of a word.
public lemmatize(string $wordForm, string $languageCode) : string|null
Parameters
- $wordForm : string
- $languageCode : string
  ISO language code (e.g., 'en', 'de', 'fr')
Return values
string|null —The lemma, or null if not found
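Since lemmatize() may return null, callers need to decide what a miss means. The sketch below shows one option: keep the original word form. The anonymous class is only a stand-in that mirrors the documented signature, and lemmaOrOriginal() is a hypothetical helper, not part of this API.

```php
<?php
declare(strict_types=1);

// Stand-in with the same lemmatize() signature as documented above;
// the lookup table is made up for this example.
$lemmatizer = new class {
    public function lemmatize(string $wordForm, string $languageCode): ?string
    {
        $known = ['en' => ['geese' => 'goose']];
        return $known[$languageCode][$wordForm] ?? null;
    }
};

// Hypothetical helper: fall back to the original word form when no lemma is found.
function lemmaOrOriginal(object $lemmatizer, string $wordForm, string $languageCode): string
{
    return $lemmatizer->lemmatize($wordForm, $languageCode) ?? $wordForm;
}

echo lemmaOrOriginal($lemmatizer, 'geese', 'en'), "\n"; // goose
echo lemmaOrOriginal($lemmatizer, 'blorp', 'en'), "\n"; // blorp
```

Whether an unknown word should pass through unchanged or be dropped depends on the caller; an indexing pipeline, for instance, might prefer the pass-through shown here.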
lemmatizeBatch()
Lemmatize multiple words in batch.
public lemmatizeBatch(array<string|int, mixed> $wordForms, string $languageCode) : array<string, string|null>
Parameters
- $wordForms : array<string|int, mixed>
- $languageCode : string
  ISO language code
Return values
array<string, string|null> —Word => lemma mapping
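One plausible way to produce the word => lemma mapping in a hybrid setup is to resolve what the dictionary can, then send only the misses to the NLP service in a single call. The sketch below assumes that design; this reference does not confirm how lemmatizeBatch() is implemented, and hybridBatch() and both callables are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical batch helper: dictionary pass first, then one fallback call for the misses.
 *
 * @param list<string> $wordForms
 * @param callable(string): ?string $dictionaryLookup
 * @param callable(list<string>): array<string, ?string> $nlpBatchLookup
 * @return array<string, ?string> Word => lemma mapping, in input order
 */
function hybridBatch(array $wordForms, callable $dictionaryLookup, callable $nlpBatchLookup): array
{
    $results = [];
    $misses = [];
    foreach ($wordForms as $wordForm) {
        $lemma = $dictionaryLookup($wordForm);
        $results[$wordForm] = $lemma;
        if ($lemma === null) {
            $misses[] = $wordForm;
        }
    }
    if ($misses !== []) {
        // A single fallback call covers every dictionary miss.
        foreach ($nlpBatchLookup($misses) as $wordForm => $lemma) {
            $results[$wordForm] = $lemma;
        }
    }
    return $results;
}

// Made-up backends for the demonstration.
$dict = static fn (string $w): ?string => ['ran' => 'run'][$w] ?? null;
$nlp = static function (array $words): array {
    $model = ['oxen' => 'ox'];
    $out = [];
    foreach ($words as $w) {
        $out[$w] = $model[$w] ?? null;
    }
    return $out;
};

$result = hybridBatch(['ran', 'oxen', 'blorp'], $dict, $nlp);
var_dump($result);
```

Words that neither backend resolves stay in the mapping with a null value, which matches the documented return type. Note that PHP converts purely numeric string keys such as '123' to integers, so a mapping keyed by word form is only strictly string-keyed for non-numeric input.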
supportsLanguage()
Check if this lemmatizer supports a given language.
public supportsLanguage(string $languageCode) : bool
Parameters
- $languageCode : string
  ISO language code
Return values
bool —True if the language is supported
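The three support checks and getSupportedLanguages() could relate to each other as sketched below. This assumes a language counts as supported when either backend handles it and that the supported list is the union of both backends' lists; the reference does not state this, and the language lists are made up.

```php
<?php
declare(strict_types=1);

// Made-up language lists for the two backends.
$dictionaryLanguages = ['en', 'de'];
$nlpLanguages = ['en', 'fr'];

$hasDictionarySupport = static fn (string $languageCode): bool
    => in_array($languageCode, $dictionaryLanguages, true);

$hasNlpSupport = static fn (string $languageCode): bool
    => in_array($languageCode, $nlpLanguages, true);

// Assumption: the hybrid supports a language if either backend does.
$supportsLanguage = static fn (string $languageCode): bool
    => $hasDictionarySupport($languageCode) || $hasNlpSupport($languageCode);

// Assumption: the supported list is the de-duplicated union of both lists.
$supportedLanguages = array_values(
    array_unique(array_merge($dictionaryLanguages, $nlpLanguages))
);

var_dump($supportsLanguage('fr')); // bool(true)
var_dump($supportsLanguage('ja')); // bool(false)
```

Under these assumptions a language such as 'fr' is supported but never hits the fast path, so checking hasDictionarySupport() separately tells a caller whether lookups for that language will always go to the NLP service.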