Documentation

NlpServiceLemmatizer
in package
implements LemmatizerInterface

Lemmatizer that uses the NLP microservice (spaCy).

This lemmatizer communicates with the Python NLP microservice for high-accuracy lemmatization using spaCy models. It supports context-aware lemmatization for the 24 languages listed in SPACY_MODELS.

The NLP service must be running and accessible via NLP_SERVICE_URL.
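
The following is a minimal sketch of how a caller might resolve the service address before constructing the lemmatizer. It assumes NLP_SERVICE_URL is an environment variable readable through getenv(); the function name resolveNlpServiceUrl() and the fallback URL are hypothetical and do not come from this class.

```php
<?php
declare(strict_types=1);

/**
 * Resolve the NLP service base URL from the environment.
 *
 * Assumption: NLP_SERVICE_URL is an environment variable. The fallback
 * value is a hypothetical default, not taken from the documented class.
 */
function resolveNlpServiceUrl(string $fallback = 'http://localhost:8000'): string
{
    $url = getenv('NLP_SERVICE_URL');

    // getenv() returns false when the variable is not set.
    if ($url === false || trim($url) === '') {
        return $fallback;
    }

    // Strip a trailing slash so request paths can be appended consistently.
    return rtrim(trim($url), '/');
}
```

Resolving the URL in one place makes a missing or empty variable easy to detect before any request is sent.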

Table of Contents

Interfaces

LemmatizerInterface
Interface for lemmatization strategies.

Constants

SPACY_MODELS  = ['en' => 'en_core_web_sm', 'de' => 'de_core_news_sm', 'fr' => 'fr_core_news_sm', 'es' => 'es_core_news_sm', 'pt' => 'pt_core_news_sm', 'it' => 'it_core_news_sm', 'nl' => 'nl_core_news_sm', 'el' => 'el_core_news_sm', 'nb' => 'nb_core_news_sm', 'lt' => 'lt_core_news_sm', 'pl' => 'pl_core_news_sm', 'ro' => 'ro_core_news_sm', 'ru' => 'ru_core_news_sm', 'ca' => 'ca_core_news_sm', 'da' => 'da_core_news_sm', 'fi' => 'fi_core_news_sm', 'hr' => 'hr_core_news_sm', 'ko' => 'ko_core_news_sm', 'mk' => 'mk_core_news_sm', 'sl' => 'sl_core_news_sm', 'sv' => 'sv_core_news_sm', 'uk' => 'uk_core_news_sm', 'zh' => 'zh_core_web_sm', 'ja' => 'ja_core_news_sm']
Supported spaCy models by language code.

Properties

$handler  : NlpServiceHandler
$lemmatizer  : string
$supportedLanguages  : array<string, bool>|null

Methods

__construct()  : mixed
Create a new NLP service lemmatizer.
getAllPotentialLanguages()  : array<string|int, string>
Get all potentially supported languages (including uninstalled models).
getLemmatizerInfo()  : array<string|int, mixed>
Get detailed info about available lemmatizers.
getSupportedLanguages()  : array<string|int, string>
Get the list of supported language codes.
isServiceAvailable()  : bool
Check if the NLP service is available.
lemmatize()  : string|null
Find the lemma (base form) of a word.
lemmatizeBatch()  : array<string, string|null>
Lemmatize multiple words in batch.
supportsLanguage()  : bool
Check if this lemmatizer supports a given language.
loadSupportedLanguages()  : void
Load supported languages from the NLP service.
normalizeLanguageCode()  : string
Normalize language code to base form.

Constants

SPACY_MODELS

Supported spaCy models by language code.

private array<string, string> SPACY_MODELS = ['en' => 'en_core_web_sm', 'de' => 'de_core_news_sm', 'fr' => 'fr_core_news_sm', 'es' => 'es_core_news_sm', 'pt' => 'pt_core_news_sm', 'it' => 'it_core_news_sm', 'nl' => 'nl_core_news_sm', 'el' => 'el_core_news_sm', 'nb' => 'nb_core_news_sm', 'lt' => 'lt_core_news_sm', 'pl' => 'pl_core_news_sm', 'ro' => 'ro_core_news_sm', 'ru' => 'ru_core_news_sm', 'ca' => 'ca_core_news_sm', 'da' => 'da_core_news_sm', 'fi' => 'fi_core_news_sm', 'hr' => 'hr_core_news_sm', 'ko' => 'ko_core_news_sm', 'mk' => 'mk_core_news_sm', 'sl' => 'sl_core_news_sm', 'sv' => 'sv_core_news_sm', 'uk' => 'uk_core_news_sm', 'zh' => 'zh_core_web_sm', 'ja' => 'ja_core_news_sm']
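
Because SPACY_MODELS is private, outside code cannot read it directly. The sketch below copies a three-entry subset of the map to illustrate a lookup from language code to model name; SPACY_MODELS_SAMPLE and spacyModelFor() are hypothetical names used only for this example.

```php
<?php
declare(strict_types=1);

// Subset of the SPACY_MODELS map, copied here for illustration only.
const SPACY_MODELS_SAMPLE = [
    'en' => 'en_core_web_sm',
    'de' => 'de_core_news_sm',
    'zh' => 'zh_core_web_sm',
];

/**
 * Return the spaCy model name for a language code, or null if unmapped.
 */
function spacyModelFor(string $languageCode): ?string
{
    return SPACY_MODELS_SAMPLE[$languageCode] ?? null;
}
```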

Properties

$supportedLanguages

private array<string, bool>|null $supportedLanguages = null

Cached language support information; null until the supported languages have been loaded.
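
A nullable array<string, bool> property like this one is commonly used as a lazily filled cache. The sketch below shows that pattern under the assumption that the language list comes from a single fetch; LanguageSupportCache and its callable argument are hypothetical and stand in for the real service call.

```php
<?php
declare(strict_types=1);

final class LanguageSupportCache
{
    /** @var array<string, bool>|null Null means "not loaded yet". */
    private ?array $supportedLanguages = null;

    /** @var callable Returns a list of language codes when invoked. */
    private $fetchLanguages;

    public function __construct(callable $fetchLanguages)
    {
        $this->fetchLanguages = $fetchLanguages;
    }

    public function supports(string $languageCode): bool
    {
        // Fetch once on first use, then answer from the cached map.
        if ($this->supportedLanguages === null) {
            $this->supportedLanguages = [];
            foreach (($this->fetchLanguages)() as $code) {
                $this->supportedLanguages[$code] = true;
            }
        }

        return isset($this->supportedLanguages[$languageCode]);
    }
}
```

Keying the map by language code makes each support check a single isset() call instead of a list search.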

Methods

__construct()

Create a new NLP service lemmatizer.

public __construct([NlpServiceHandler|null $handler = null ][, string $lemmatizer = 'spacy' ]) : mixed
Parameters
$handler : NlpServiceHandler|null = null

NLP service handler (auto-created if null)

$lemmatizer : string = 'spacy'

Lemmatizer type ('spacy')

getAllPotentialLanguages()

Get all potentially supported languages (including uninstalled models).

public getAllPotentialLanguages() : array<string|int, string>
Return values
array<string|int, string>

getLemmatizerInfo()

Get detailed info about available lemmatizers.

public getLemmatizerInfo() : array<string|int, mixed>
Return values
array<string|int, mixed>

getSupportedLanguages()

Get the list of supported language codes.

public getSupportedLanguages() : array<string|int, string>
Return values
array<string|int, string>

Array of ISO language codes

isServiceAvailable()

Check if the NLP service is available.

public isServiceAvailable() : bool
Return values
bool

lemmatize()

Find the lemma (base form) of a word.

public lemmatize(string $wordForm, string $languageCode) : string|null
Parameters
$wordForm : string

Word form to lemmatize

$languageCode : string

ISO language code (e.g., 'en', 'de', 'fr')

Return values
string|null

The lemma, or null if not found
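
Since lemmatize() may return null, callers need a fallback. The sketch below shows one option, keeping the original word form when no lemma is found. LemmatizeOnly and DictionaryLemmatizer are hypothetical stand-ins that mirror only the lemmatize() signature documented above, so the example runs without the NLP service.

```php
<?php
declare(strict_types=1);

// Hypothetical interface mirroring only the documented lemmatize() signature.
interface LemmatizeOnly
{
    public function lemmatize(string $wordForm, string $languageCode): ?string;
}

// In-memory stub so the sketch runs without the NLP service.
final class DictionaryLemmatizer implements LemmatizeOnly
{
    /** @var array<string, array<string, string>> language => [word form => lemma] */
    private array $lemmas;

    public function __construct(array $lemmas)
    {
        $this->lemmas = $lemmas;
    }

    public function lemmatize(string $wordForm, string $languageCode): ?string
    {
        return $this->lemmas[$languageCode][$wordForm] ?? null;
    }
}

/**
 * Return the lemma, or the unchanged word form when no lemma is found.
 */
function lemmaOrWordForm(LemmatizeOnly $lemmatizer, string $wordForm, string $languageCode): string
{
    return $lemmatizer->lemmatize($wordForm, $languageCode) ?? $wordForm;
}
```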

lemmatizeBatch()

Lemmatize multiple words in batch.

public lemmatizeBatch(array<string|int, mixed> $wordForms, string $languageCode) : array<string, string|null>
Parameters
$wordForms : array<string|int, mixed>

Word forms to lemmatize

$languageCode : string

ISO language code

Return values
array<string, string|null>

Word => lemma mapping
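
The sketch below illustrates only the shape of the returned mapping, not how the class talks to the service. It assumes duplicates collapse to one key and that words without a lemma map to null; lemmatizeBatchSketch() and its per-word callable are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Build a word => lemma mapping from a list of word forms.
 *
 * @param array<int|string, mixed> $wordForms
 * @param callable $lemmatizeOne Takes a word form, returns its lemma or null.
 * @return array<string, string|null>
 */
function lemmatizeBatchSketch(array $wordForms, callable $lemmatizeOne): array
{
    $result = [];

    foreach ($wordForms as $wordForm) {
        // Skip non-string and empty entries rather than failing the batch.
        if (!is_string($wordForm) || $wordForm === '') {
            continue;
        }

        // Each distinct word form is looked up once.
        if (array_key_exists($wordForm, $result)) {
            continue;
        }

        $result[$wordForm] = $lemmatizeOne($wordForm);
    }

    return $result;
}
```

Note that PHP casts numeric-looking string keys such as '123' to integers, so a word form made only of digits would not appear under a string key.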

supportsLanguage()

Check if this lemmatizer supports a given language.

public supportsLanguage(string $languageCode) : bool
Parameters
$languageCode : string

ISO language code

Return values
bool

True if the language is supported

loadSupportedLanguages()

Load supported languages from the NLP service.

private loadSupportedLanguages() : void

normalizeLanguageCode()

Normalize language code to base form.

private normalizeLanguageCode(string $languageCode) : string

Converts codes like 'en-US', 'en_GB', 'eng' to 'en'.

Parameters
$languageCode : string

Language code

Return values
string

Normalized code
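
As an illustration of the conversions described above, here is a minimal sketch of one way to normalize such codes. The method itself is private and its implementation is not shown in this page, so normalizeLanguageCodeSketch() and its partial three-letter table are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Reduce a language code such as 'en-US', 'en_GB', or 'eng' to 'en'.
 */
function normalizeLanguageCodeSketch(string $languageCode): string
{
    // Hypothetical, partial table of three-letter codes; not from the source.
    static $threeLetter = [
        'eng' => 'en',
        'deu' => 'de',
        'ger' => 'de',
        'fra' => 'fr',
        'fre' => 'fr',
        'spa' => 'es',
    ];

    // Keep only the primary subtag: 'en-US' and 'en_GB' both become 'en'.
    $base = strtolower(trim($languageCode));
    $base = preg_split('/[-_]/', $base, 2)[0];

    // Map known three-letter codes; leave everything else unchanged.
    return $threeLetter[$base] ?? $base;
}
```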


        