HybridLemmatizer
implements LemmatizerInterface

Hybrid lemmatizer that combines dictionary and NLP-based approaches.

Strategy:

  1. Try dictionary lookup first (fast, predictable)
  2. Fall back to NLP service (spaCy) for unknown words

This combines the strengths of both approaches:

  • Fast results for common words via dictionary
  • High accuracy for uncommon words via NLP models
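
The two-step strategy above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: `SketchLemmatizer`, `ArrayDictionarySketch`, `CallableNlpSketch`, and `HybridSketch` are hypothetical stand-ins, and the dictionary and NLP backends are simulated with an in-memory array and a closure.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in interface; only lemmatize() is modeled. */
interface SketchLemmatizer
{
    public function lemmatize(string $wordForm, string $languageCode): ?string;
}

/** Hypothetical dictionary backend: an in-memory lookup table. */
final class ArrayDictionarySketch implements SketchLemmatizer
{
    /** @param array<string, array<string, string>> $entries language => [word form => lemma] */
    public function __construct(private array $entries)
    {
    }

    public function lemmatize(string $wordForm, string $languageCode): ?string
    {
        return $this->entries[$languageCode][strtolower($wordForm)] ?? null;
    }
}

/** Hypothetical NLP backend: wraps any callable, such as a service client call. */
final class CallableNlpSketch implements SketchLemmatizer
{
    /** @var callable(string, string): ?string */
    private $lookup;

    public function __construct(callable $lookup)
    {
        $this->lookup = $lookup;
    }

    public function lemmatize(string $wordForm, string $languageCode): ?string
    {
        return ($this->lookup)($wordForm, $languageCode);
    }
}

/** Assumed composition: dictionary first, NLP only when the dictionary misses. */
final class HybridSketch implements SketchLemmatizer
{
    public function __construct(
        private SketchLemmatizer $dictionary,
        private SketchLemmatizer $nlp
    ) {
    }

    public function lemmatize(string $wordForm, string $languageCode): ?string
    {
        return $this->dictionary->lemmatize($wordForm, $languageCode)
            ?? $this->nlp->lemmatize($wordForm, $languageCode);
    }
}

$nlpCalls = 0;
$hybrid = new HybridSketch(
    new ArrayDictionarySketch(['en' => ['mice' => 'mouse']]),
    new CallableNlpSketch(function (string $word, string $lang) use (&$nlpCalls): ?string {
        $nlpCalls++; // counts how often the slower fallback is reached
        return $word === 'geese' ? 'goose' : null;
    })
);

$fromDictionary = $hybrid->lemmatize('mice', 'en');  // answered by the dictionary
$fromNlp        = $hybrid->lemmatize('geese', 'en'); // dictionary miss, NLP fallback
```

In this sketch the fallback runs only for words the dictionary does not know, which is what keeps common words fast.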

Table of Contents

Interfaces

LemmatizerInterface
Interface for lemmatization strategies.

Properties

$dictionaryLemmatizer  : DictionaryLemmatizer
$nlpLemmatizer  : NlpServiceLemmatizer

Methods

__construct()  : mixed
Create a hybrid lemmatizer.
getSupportedLanguages()  : array<string|int, string>
Get the list of supported language codes.
hasDictionarySupport()  : bool
Check if the dictionary supports a language.
hasNlpSupport()  : bool
Check if the NLP service supports a language.
lemmatize()  : string|null
Find the lemma (base form) of a word.
lemmatizeBatch()  : array<string, string|null>
Lemmatize multiple words in batch.
supportsLanguage()  : bool
Check if this lemmatizer supports a given language.

Methods

getSupportedLanguages()

Get the list of supported language codes.

public getSupportedLanguages() : array<string|int, string>
Return values
array<string|int, string>

Array of ISO language codes

hasDictionarySupport()

Check if the dictionary supports a language.

public hasDictionarySupport(string $languageCode) : bool
Parameters
$languageCode : string

ISO language code

Return values
bool

hasNlpSupport()

Check if the NLP service supports a language.

public hasNlpSupport(string $languageCode) : bool
Parameters
$languageCode : string

ISO language code

Return values
bool

lemmatize()

Find the lemma (base form) of a word.

public lemmatize(string $wordForm, string $languageCode) : string|null
Parameters
$wordForm : string

The word form to lemmatize

$languageCode : string

ISO language code (e.g., 'en', 'de', 'fr')

Return values
string|null

The lemma, or null if not found
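
Because lemmatize() may return null, callers usually need a fallback. The sketch below shows one possible convention, keeping the original word form when no lemma is found. The helper `lemmaOrOriginal()` and the stub closure are hypothetical; with a real instance, the callable would presumably be `[$lemmatizer, 'lemmatize']`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the lemma, or the unchanged word form
 * when the lemmatizer reports null (not found).
 */
function lemmaOrOriginal(callable $lemmatize, string $wordForm, string $languageCode): string
{
    return $lemmatize($wordForm, $languageCode) ?? $wordForm;
}

// Stub with the same parameters and return type as lemmatize(); illustration only.
$stub = static function (string $wordForm, string $languageCode): ?string {
    $lemmas = ['en' => ['running' => 'run']];
    return $lemmas[$languageCode][$wordForm] ?? null;
};

$found    = lemmaOrOriginal($stub, 'running', 'en'); // lemma available
$notFound = lemmaOrOriginal($stub, 'blorf', 'en');   // null, so the input is kept
```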

lemmatizeBatch()

Lemmatize multiple words in batch.

public lemmatizeBatch(array<string|int, mixed> $wordForms, string $languageCode) : array<string, string|null>
Parameters
$wordForms : array<string|int, mixed>

The word forms to lemmatize

$languageCode : string

ISO language code

Return values
array<string, string|null>

Word => lemma mapping
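
One plausible way to batch under the hybrid strategy is to resolve what the dictionary knows first and send only the misses to the NLP service in a single call. The sketch below assumes that approach; the source does not confirm it. `lemmatizeBatchSketch()`, the `$dictionary` array, and the `$nlpBatch` callable are all hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical batch routine returning a word => lemma (or null) mapping.
 *
 * @param array<string|int, mixed> $wordForms
 * @param array<string, string>    $dictionary word form => lemma
 * @param callable                 $nlpBatch   takes a list of words, returns word => lemma|null
 * @return array<string, string|null>
 */
function lemmatizeBatchSketch(array $wordForms, array $dictionary, callable $nlpBatch): array
{
    $result = [];
    $misses = [];

    foreach ($wordForms as $word) {
        // The input is typed as mixed, so skip anything that is not a usable string.
        if (!is_string($word) || $word === '' || array_key_exists($word, $result)) {
            continue;
        }
        $result[$word] = $dictionary[$word] ?? null;
        if ($result[$word] === null) {
            $misses[] = $word;
        }
    }

    if ($misses !== []) {
        // One call for all dictionary misses instead of one call per word.
        foreach ($nlpBatch($misses) as $word => $lemma) {
            if (array_key_exists($word, $result)) {
                $result[$word] = $lemma;
            }
        }
    }

    return $result;
}

$sentToNlp = [];
$lemmas = lemmatizeBatchSketch(
    ['mice', 'geese', 'xyzzy', 'mice'],
    ['mice' => 'mouse'],
    function (array $words) use (&$sentToNlp): array {
        $sentToNlp = $words;
        return ['geese' => 'goose', 'xyzzy' => null];
    }
);
```

Words that neither backend resolves stay in the mapping with a null value, matching the documented return type.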

supportsLanguage()

Check if this lemmatizer supports a given language.

public supportsLanguage(string $languageCode) : bool
Parameters
$languageCode : string

ISO language code

Return values
bool

True if the language is supported
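
The three support checks can be combined to report which backend would serve a language. The sketch below calls only the documented methods supportsLanguage(), hasDictionarySupport(), and hasNlpSupport(); the helper `describeSupport()` and the anonymous stub class are hypothetical, and the stub's assumption that supportsLanguage() is true when either backend supports the language is not confirmed by the source.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: summarizes which backends cover a language. */
function describeSupport(object $lemmatizer, string $languageCode): string
{
    if (!$lemmatizer->supportsLanguage($languageCode)) {
        return 'unsupported';
    }
    $dictionary = $lemmatizer->hasDictionarySupport($languageCode);
    $nlp = $lemmatizer->hasNlpSupport($languageCode);

    if ($dictionary && $nlp) {
        return 'dictionary and nlp';
    }
    if ($dictionary) {
        return 'dictionary only';
    }
    return $nlp ? 'nlp only' : 'unknown backend';
}

// Stub exposing the three documented checks; the language lists are made up.
$stub = new class {
    private array $dictionaryLanguages = ['en', 'de'];
    private array $nlpLanguages = ['en', 'fr'];

    public function hasDictionarySupport(string $languageCode): bool
    {
        return in_array($languageCode, $this->dictionaryLanguages, true);
    }

    public function hasNlpSupport(string $languageCode): bool
    {
        return in_array($languageCode, $this->nlpLanguages, true);
    }

    public function supportsLanguage(string $languageCode): bool
    {
        // Assumption: either backend is enough for the hybrid to support a language.
        return $this->hasDictionarySupport($languageCode)
            || $this->hasNlpSupport($languageCode);
    }
};

$english = describeSupport($stub, 'en');
$german  = describeSupport($stub, 'de');
$french  = describeSupport($stub, 'fr');
$unknown = describeSupport($stub, 'xx');
```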


        