NlpServiceLemmatizer
in package
implements
LemmatizerInterface
Lemmatizer that uses the NLP microservice (spaCy).
This lemmatizer calls the Python NLP microservice, which performs context-aware lemmatization with spaCy models. SPACY_MODELS maps 24 language codes to models; a language is usable only when its model is installed in the service.
The NLP service must be running and accessible via NLP_SERVICE_URL.
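A minimal sketch of how a caller might check the NLP_SERVICE_URL setting before constructing the lemmatizer. It assumes the value is read from the process environment; the helper name resolveNlpServiceUrl is hypothetical and not part of this class.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return a cleaned-up service URL, or null when the
 * value is empty or not a syntactically valid http(s) URL.
 */
function resolveNlpServiceUrl(?string $raw): ?string
{
    if ($raw === null || trim($raw) === '') {
        return null;
    }
    // Drop surrounding whitespace and any trailing slash.
    $url = rtrim(trim($raw), '/');
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true) ? $url : null;
}

// getenv() returns false when the variable is not set (assumed source of the setting).
$env = getenv('NLP_SERVICE_URL');
$serviceUrl = resolveNlpServiceUrl($env === false ? null : $env);
```

A null result only says the setting is missing or malformed; whether the service actually responds is what isServiceAvailable() reports.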
Table of Contents
Interfaces
- LemmatizerInterface
- Interface for lemmatization strategies.
Constants
- SPACY_MODELS = ['en' => 'en_core_web_sm', 'de' => 'de_core_news_sm', 'fr' => 'fr_core_news_sm', 'es' => 'es_core_news_sm', 'pt' => 'pt_core_news_sm', 'it' => 'it_core_news_sm', 'nl' => 'nl_core_news_sm', 'el' => 'el_core_news_sm', 'nb' => 'nb_core_news_sm', 'lt' => 'lt_core_news_sm', 'pl' => 'pl_core_news_sm', 'ro' => 'ro_core_news_sm', 'ru' => 'ru_core_news_sm', 'ca' => 'ca_core_news_sm', 'da' => 'da_core_news_sm', 'fi' => 'fi_core_news_sm', 'hr' => 'hr_core_news_sm', 'ko' => 'ko_core_news_sm', 'mk' => 'mk_core_news_sm', 'sl' => 'sl_core_news_sm', 'sv' => 'sv_core_news_sm', 'uk' => 'uk_core_news_sm', 'zh' => 'zh_core_web_sm', 'ja' => 'ja_core_news_sm']
- Supported spaCy models by language code.
Properties
- $handler : NlpServiceHandler
- $lemmatizer : string
- $supportedLanguages : array<string, bool>|null
Methods
- __construct() : mixed
- Create a new NLP service lemmatizer.
- getAllPotentialLanguages() : array<string|int, string>
- Get all potentially supported languages (including uninstalled models).
- getLemmatizerInfo() : array<string|int, mixed>
- Get detailed info about available lemmatizers.
- getSupportedLanguages() : array<string|int, string>
- Get the list of supported language codes.
- isServiceAvailable() : bool
- Check if the NLP service is available.
- lemmatize() : string|null
- Find the lemma (base form) of a word.
- lemmatizeBatch() : array<string, string|null>
- Lemmatize multiple words in batch.
- supportsLanguage() : bool
- Check if this lemmatizer supports a given language.
- loadSupportedLanguages() : void
- Load supported languages from the NLP service.
- normalizeLanguageCode() : string
- Normalize language code to base form.
Constants
SPACY_MODELS
Supported spaCy models by language code.
private
array<string, string>
SPACY_MODELS
= ['en' => 'en_core_web_sm', 'de' => 'de_core_news_sm', 'fr' => 'fr_core_news_sm', 'es' => 'es_core_news_sm', 'pt' => 'pt_core_news_sm', 'it' => 'it_core_news_sm', 'nl' => 'nl_core_news_sm', 'el' => 'el_core_news_sm', 'nb' => 'nb_core_news_sm', 'lt' => 'lt_core_news_sm', 'pl' => 'pl_core_news_sm', 'ro' => 'ro_core_news_sm', 'ru' => 'ru_core_news_sm', 'ca' => 'ca_core_news_sm', 'da' => 'da_core_news_sm', 'fi' => 'fi_core_news_sm', 'hr' => 'hr_core_news_sm', 'ko' => 'ko_core_news_sm', 'mk' => 'mk_core_news_sm', 'sl' => 'sl_core_news_sm', 'sv' => 'sv_core_news_sm', 'uk' => 'uk_core_news_sm', 'zh' => 'zh_core_web_sm', 'ja' => 'ja_core_news_sm']
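The map can be read as a lookup from base language code to spaCy model name. The sketch below uses a four-entry excerpt of it; the function spacyModelFor and the constant name SPACY_MODELS_EXCERPT are hypothetical, since the real constant is private to the class.

```php
<?php
declare(strict_types=1);

// Excerpt of the SPACY_MODELS map above (four of its entries).
const SPACY_MODELS_EXCERPT = [
    'en' => 'en_core_web_sm',
    'de' => 'de_core_news_sm',
    'zh' => 'zh_core_web_sm',
    'ja' => 'ja_core_news_sm',
];

/** Hypothetical lookup: model name for a base language code, or null if unmapped. */
function spacyModelFor(string $languageCode): ?string
{
    return SPACY_MODELS_EXCERPT[strtolower($languageCode)] ?? null;
}

// The keys are the language codes the map knows about.
$codes = array_keys(SPACY_MODELS_EXCERPT);
```

Most entries follow the `<code>_core_news_sm` pattern; 'en' and 'zh' use `_core_web_sm` instead, so the model name cannot be derived from the code alone.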
Properties
$handler
private
NlpServiceHandler
$handler
$lemmatizer
private
string
$lemmatizer
$supportedLanguages
private
array<string, bool>|null
$supportedLanguages
= null
Cached language support info
Methods
__construct()
Create a new NLP service lemmatizer.
public
__construct([NlpServiceHandler|null $handler = null ][, string $lemmatizer = 'spacy' ]) : mixed
Parameters
- $handler : NlpServiceHandler|null = null
- NLP service handler (auto-created if null)
- $lemmatizer : string = 'spacy'
- Lemmatizer type ('spacy')
getAllPotentialLanguages()
Get all potentially supported languages (including uninstalled models).
public
getAllPotentialLanguages() : array<string|int, string>
Return values
array<string|int, string>
getLemmatizerInfo()
Get detailed info about available lemmatizers.
public
getLemmatizerInfo() : array<string|int, mixed>
Return values
array<string|int, mixed>
getSupportedLanguages()
Get the list of supported language codes.
public
getSupportedLanguages() : array<string|int, string>
Return values
array<string|int, string> —Array of ISO language codes
isServiceAvailable()
Check if the NLP service is available.
public
isServiceAvailable() : bool
Return values
bool
lemmatize()
Find the lemma (base form) of a word.
public
lemmatize(string $wordForm, string $languageCode) : string|null
Parameters
- $wordForm : string
- $languageCode : string
- ISO language code (e.g., 'en', 'de', 'fr')
Return values
string|null —The lemma, or null if not found
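Because lemmatize() can return null, callers need a fallback. The sketch below shows one possible pattern: keep the original word form when no lemma is found. LemmatizerLike is a stand-in declaring only the two methods documented on this page, not the real LemmatizerInterface, and ArrayLemmatizer is a hypothetical in-memory fake so the example runs without the NLP service.

```php
<?php
declare(strict_types=1);

/** Stand-in for the two documented methods; not the real LemmatizerInterface. */
interface LemmatizerLike
{
    public function lemmatize(string $wordForm, string $languageCode): ?string;
    public function supportsLanguage(string $languageCode): bool;
}

/** Hypothetical in-memory fake, used here instead of the service-backed class. */
final class ArrayLemmatizer implements LemmatizerLike
{
    /** @param array<string, array<string, string>> $table language => (form => lemma) */
    public function __construct(private array $table) {}

    public function lemmatize(string $wordForm, string $languageCode): ?string
    {
        return $this->table[$languageCode][$wordForm] ?? null;
    }

    public function supportsLanguage(string $languageCode): bool
    {
        return isset($this->table[$languageCode]);
    }
}

/** Return the lemma when one is found, otherwise the word form unchanged. */
function lemmaOrSelf(LemmatizerLike $lemmatizer, string $wordForm, string $languageCode): string
{
    if (!$lemmatizer->supportsLanguage($languageCode)) {
        return $wordForm;
    }
    return $lemmatizer->lemmatize($wordForm, $languageCode) ?? $wordForm;
}

$fake = new ArrayLemmatizer(['en' => ['running' => 'run']]);
$lemma = lemmaOrSelf($fake, 'running', 'en');
```

Falling back to the surface form is one choice among several; code that must distinguish "no lemma" from "lemma equals the word" should keep the null.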
lemmatizeBatch()
Lemmatize multiple words in batch.
public
lemmatizeBatch(array<string|int, mixed> $wordForms, string $languageCode) : array<string, string|null>
Parameters
- $wordForms : array<string|int, mixed>
- $languageCode : string
- ISO language code
Return values
array<string, string|null> —Word => lemma mapping
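A sketch of consuming the documented word => lemma mapping by separating resolved words from those that came back as null. The helper partitionBatch is hypothetical, and the input array is illustrative data in the documented shape, not real service output.

```php
<?php
declare(strict_types=1);

/**
 * Split a word => lemma mapping into resolved lemmas and unresolved words.
 *
 * @param array<string, string|null> $batch
 * @return array{found: array<string, string>, missing: list<string>}
 */
function partitionBatch(array $batch): array
{
    $found = [];
    $missing = [];
    foreach ($batch as $word => $lemma) {
        if ($lemma === null) {
            $missing[] = (string) $word;
        } else {
            $found[(string) $word] = $lemma;
        }
    }
    return ['found' => $found, 'missing' => $missing];
}

// Illustrative input only.
$result = partitionBatch(['mice' => 'mouse', 'went' => 'go', 'qwzx' => null]);
```

PHP converts purely numeric string keys such as '2024' to integers, which is why the sketch casts `$word` back to a string before reporting it as missing.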
supportsLanguage()
Check if this lemmatizer supports a given language.
public
supportsLanguage(string $languageCode) : bool
Parameters
- $languageCode : string
- ISO language code
Return values
bool —True if the language is supported
loadSupportedLanguages()
Load supported languages from the NLP service.
private
loadSupportedLanguages() : void
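The nullable $supportedLanguages property suggests a load-once cache: null means "not fetched yet", and the first lookup triggers the load. The class below is a sketch of that pattern under this assumption, not the actual implementation; LanguageSupportCache and the injected `$fetch` callable (standing in for the service call) are hypothetical.

```php
<?php
declare(strict_types=1);

final class LanguageSupportCache
{
    /** @var array<string, bool>|null null means "not loaded yet" */
    private ?array $supported = null;

    /** Number of fetches performed; exposed only so the sketch can show caching. */
    public int $loads = 0;

    /** @param callable(): array<int, string> $fetch hypothetical stand-in for the service call */
    public function __construct(private $fetch) {}

    public function supports(string $code): bool
    {
        if ($this->supported === null) {
            $this->load();
        }
        return $this->supported[$code] ?? false;
    }

    private function load(): void
    {
        $this->loads++;
        $this->supported = [];
        foreach (($this->fetch)() as $code) {
            $this->supported[$code] = true;
        }
    }
}

$cache = new LanguageSupportCache(fn (): array => ['en', 'de']);
$first = $cache->supports('en');
$second = $cache->supports('fr');
```

Storing codes as keys with a bool value makes each support check a single array lookup, and repeated checks do not trigger further fetches.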
normalizeLanguageCode()
Normalize language code to base form.
private
normalizeLanguageCode(string $languageCode) : string
Converts codes like 'en-US', 'en_GB', 'eng' to 'en'.
Parameters
- $languageCode : string
- Language code
Return values
string —Normalized code
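One way the documented conversions ('en-US', 'en_GB', 'eng' to 'en') could be implemented is sketched below. The function name normalizeLanguageCodeSketch is hypothetical, and the three-letter table is a small illustrative subset chosen for this example; the real method's rules may differ.

```php
<?php
declare(strict_types=1);

/** Hypothetical normalizer; not the class's private method. */
function normalizeLanguageCodeSketch(string $languageCode): string
{
    // Illustrative subset of three-letter codes mapped to two-letter codes.
    $threeLetter = [
        'eng' => 'en',
        'deu' => 'de',
        'ger' => 'de',
        'fra' => 'fr',
        'fre' => 'fr',
        'spa' => 'es',
    ];

    // Keep only the primary subtag: 'en-US' and 'en_GB' both become 'en'.
    $parts = preg_split('/[-_]/', strtolower(trim($languageCode)));
    $base = $parts[0];

    // Unknown codes pass through lower-cased.
    return $threeLetter[$base] ?? $base;
}

$normalized = array_map('normalizeLanguageCodeSketch', ['en-US', 'en_GB', 'eng']);
```

Normalizing before any lookup means region variants share the same cache entry and model.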