
DictionaryLemmatizer implements LemmatizerInterface

Lemmatizer that uses dictionary files for lookup.

Dictionary files are in TSV format with two columns, word_form and lemma. Files are loaded from data/lemma-dictionaries/{lang}_lemmas.tsv.
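A minimal sketch of what this format implies, assuming each non-empty line holds one tab-separated word_form/lemma pair and there is no header row. The helper parseLemmaTsv() is hypothetical and not part of this class.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: parse TSV text (word_form<TAB>lemma per line)
 * into a word_form => lemma map. Assumes no header row.
 *
 * @return array<string, string>
 */
function parseLemmaTsv(string $tsv): array
{
    $map = [];
    // \R matches any line ending (\n, \r\n, \r).
    foreach (preg_split('/\R/', $tsv) as $line) {
        $parts = explode("\t", trim($line), 2);
        if (count($parts) < 2 || $parts[0] === '' || $parts[1] === '') {
            continue; // skip blank or malformed lines
        }
        $map[$parts[0]] = $parts[1];
    }
    return $map;
}

// Example contents of a hypothetical en_lemmas.tsv
$tsv = "running\trun\nmice\tmouse\n\nwent\tgo\n";
print_r(parseLemmaTsv($tsv));
```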

Tags
since 3.0.0

Table of Contents

Interfaces

LemmatizerInterface
Interface for lemmatization strategies.

Properties

$availableLanguages  : array<string|int, string>|null
List of available dictionaries (language codes with dictionary files).
$dictionaries  : array<string, array<string, string>>
Loaded dictionaries keyed by language code.
$dictionaryPath  : string
Base directory for dictionary files.

Methods

__construct()  : mixed
Constructor.
clearCache()  : void
Clear all loaded dictionaries from memory.
getDictionaryPath()  : string
Get the dictionary path.
getStatistics()  : array<string, array{entries: int, file_size: int|false}>
Get statistics about loaded dictionaries.
getSupportedLanguages()  : array<string|int, string>
Get the list of supported language codes.
lemmatize()  : string|null
Find the lemma (base form) of a word.
lemmatizeBatch()  : array<string, string|null>
Lemmatize multiple words in batch.
loadDictionary()  : bool
Load a dictionary file for a language.
supportsLanguage()  : bool
Check if this lemmatizer supports a given language.
dictionaryFileExists()  : bool
Check if a dictionary file exists.
ensureDictionaryLoaded()  : void
Ensure a dictionary is loaded.
getDefaultDictionaryPath()  : string
Get the default dictionary path.
getDictionaryFilePath()  : string
Get the file path for a language dictionary.
normalizeLanguageCode()  : string
Normalize a language code to standard format.
parseDictionaryLine()  : void
Parse a single dictionary line.

Properties

$availableLanguages

List of available dictionaries (language codes with dictionary files).

private array<string|int, string>|null $availableLanguages = null

$dictionaries

Loaded dictionaries keyed by language code.

private array<string, array<string, string>> $dictionaries = []

Methods

__construct()

Constructor.

public __construct([string|null $dictionaryPath = null ]) : mixed
Parameters
$dictionaryPath : string|null = null

Base path for dictionary files; if null, the default dictionary path is used

getDictionaryPath()

Get the dictionary path.

public getDictionaryPath() : string
Return values
string

getStatistics()

Get statistics about loaded dictionaries.

public getStatistics() : array<string, array{entries: int, file_size: int|false}>
Return values
array<string, array{entries: int, file_size: int|false}>
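The return type suggests one entry per loaded language, holding an entry count and a file size that may be false. The sketch below reproduces that shape under two assumptions: file_size is the on-disk size of {lang}_lemmas.tsv, and false means the size could not be determined. The function dictionaryStatistics() is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the getStatistics() return shape.
 * Assumption: file_size is the size of {lang}_lemmas.tsv in bytes,
 * or false when the file is missing.
 *
 * @param array<string, array<string, string>> $dictionaries
 * @return array<string, array{entries: int, file_size: int|false}>
 */
function dictionaryStatistics(array $dictionaries, string $baseDir): array
{
    $stats = [];
    foreach ($dictionaries as $lang => $entries) {
        $path = $baseDir . DIRECTORY_SEPARATOR . $lang . '_lemmas.tsv';
        $stats[$lang] = [
            'entries'   => count($entries),
            'file_size' => is_file($path) ? filesize($path) : false,
        ];
    }
    return $stats;
}

// Set up a throwaway directory containing one dictionary file.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'lemma-stats-' . uniqid();
mkdir($dir);
file_put_contents($dir . DIRECTORY_SEPARATOR . 'en_lemmas.tsv', "mice\tmouse\n");

$stats = dictionaryStatistics(['en' => ['mice' => 'mouse'], 'xx' => []], $dir);
print_r($stats);
```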

getSupportedLanguages()

Get the list of supported language codes.

public getSupportedLanguages() : array<string|int, string>
Return values
array<string|int, string>

Array of ISO language codes
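Because dictionaries follow the {lang}_lemmas.tsv naming convention, the supported codes can plausibly be derived from file names, and the nullable private $availableLanguages property hints that such a list is computed once and cached. This sketch shows only the name-to-code step; languagesFromFileNames() is hypothetical, and the assumption that codes are two or three lowercase letters is ours.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: derive language codes from file names that follow
 * the {lang}_lemmas.tsv convention. Assumes codes are 2-3 lowercase letters;
 * non-matching names (README files, ".", "..") are ignored.
 *
 * @param list<string> $fileNames
 * @return list<string>
 */
function languagesFromFileNames(array $fileNames): array
{
    $codes = [];
    foreach ($fileNames as $name) {
        if (preg_match('/^([a-z]{2,3})_lemmas\.tsv$/', $name, $m) === 1) {
            $codes[] = $m[1];
        }
    }
    sort($codes); // stable, predictable order for callers
    return $codes;
}

// In practice the names could come from scandir() on the dictionary path.
print_r(languagesFromFileNames(['en_lemmas.tsv', 'de_lemmas.tsv', 'README.md', '.', '..']));
```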

lemmatize()

Find the lemma (base form) of a word.

public lemmatize(string $word, string $languageCode) : string|null
Parameters
$word : string

The word to lemmatize

$languageCode : string

ISO language code (e.g., 'en', 'de', 'fr')

Return values
string|null

The lemma, or null if not found
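The null return lets callers distinguish "no dictionary entry" from a real lemma and fall back to another strategy. A minimal sketch of the lookup, assuming an exact, case-sensitive match against already-loaded dictionaries; the real class may normalize the word or language code first. lookupLemma() is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of a dictionary lookup: return the lemma for $word,
 * or null when the word or the whole language is unknown.
 * Assumption: exact, case-sensitive match (no normalization).
 *
 * @param array<string, array<string, string>> $dictionaries
 */
function lookupLemma(array $dictionaries, string $word, string $lang): ?string
{
    // ?? also covers a missing language key without raising a warning.
    return $dictionaries[$lang][$word] ?? null;
}

$dictionaries = ['en' => ['mice' => 'mouse', 'went' => 'go']];
var_dump(lookupLemma($dictionaries, 'mice', 'en'));
var_dump(lookupLemma($dictionaries, 'mice', 'de'));
```

With the class itself, the corresponding call is $lemmatizer->lemmatize('mice', 'en').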

lemmatizeBatch()

Lemmatize multiple words in batch.

public lemmatizeBatch(array<string|int, mixed> $words, string $languageCode) : array<string, string|null>
Parameters
$words : array<string|int, mixed>

Array of words to lemmatize

$languageCode : string

ISO language code

Return values
array<string, string|null>

Word => lemma mapping
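Since the result is keyed by word, duplicate input words collapse into a single entry. The sketch below illustrates that mapping for one language. Skipping non-string entries is an assumption, prompted by the loosely typed array<string|int, mixed> parameter. lemmatizeBatchSketch() is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of batch lemmatization for one language.
 * Assumption: non-string entries are skipped; duplicates map to one key.
 *
 * @param array<string|int, mixed> $words
 * @param array<string, string> $dictionary word_form => lemma
 * @return array<string, string|null>
 */
function lemmatizeBatchSketch(array $words, array $dictionary): array
{
    $result = [];
    foreach ($words as $word) {
        if (!is_string($word)) {
            continue;
        }
        $result[$word] = $dictionary[$word] ?? null; // null = not found
    }
    return $result;
}

print_r(lemmatizeBatchSketch(['mice', 'went', 'mice', 42, 'xyz'], ['mice' => 'mouse', 'went' => 'go']));
```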

loadDictionary()

Load a dictionary file for a language.

public loadDictionary(string $languageCode) : bool
Parameters
$languageCode : string

Normalized language code

Return values
bool

True if loaded successfully
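A sketch of the load-once pattern that the $dictionaries property and the bool return suggest. It assumes that a missing or unreadable file yields false and that a second call for an already-loaded language is a cheap no-op returning true. Parsing is deliberately minimal here. The class TsvDictionaryCache and its methods are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of loadDictionary(): read {lang}_lemmas.tsv once,
 * cache the parsed map per language, and report success as a bool.
 */
final class TsvDictionaryCache
{
    /** @var array<string, array<string, string>> */
    private array $dictionaries = [];

    public function __construct(private string $basePath) {}

    public function load(string $lang): bool
    {
        if (isset($this->dictionaries[$lang])) {
            return true; // already loaded; skip disk access
        }
        $path = $this->basePath . DIRECTORY_SEPARATOR . $lang . '_lemmas.tsv';
        if (!is_readable($path)) {
            return false;
        }
        $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        if ($lines === false) {
            return false;
        }
        $map = [];
        foreach ($lines as $line) {
            $parts = explode("\t", rtrim($line, "\r"), 2);
            if (count($parts) === 2) {
                $map[$parts[0]] = $parts[1];
            }
        }
        $this->dictionaries[$lang] = $map;
        return true;
    }

    public function entryCount(string $lang): int
    {
        return count($this->dictionaries[$lang] ?? []);
    }
}

$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'lemma-load-' . uniqid();
mkdir($dir);
file_put_contents($dir . DIRECTORY_SEPARATOR . 'en_lemmas.tsv', "mice\tmouse\nwent\tgo\n");

$cache = new TsvDictionaryCache($dir);
var_dump($cache->load('en'), $cache->load('xx'));
```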

supportsLanguage()

Check if this lemmatizer supports a given language.

public supportsLanguage(string $languageCode) : bool
Parameters
$languageCode : string

ISO language code

Return values
bool

True if the language is supported

dictionaryFileExists()

Check if a dictionary file exists.

private dictionaryFileExists(string $languageCode) : bool
Parameters
$languageCode : string

The language code

Return values
bool

ensureDictionaryLoaded()

Ensure a dictionary is loaded.

private ensureDictionaryLoaded(string $languageCode) : void
Parameters
$languageCode : string

The language code

getDefaultDictionaryPath()

Get the default dictionary path.

private getDefaultDictionaryPath() : string
Return values
string

getDictionaryFilePath()

Get the file path for a language dictionary.

private getDictionaryFilePath(string $languageCode) : string
Parameters
$languageCode : string

The language code

Return values
string

The file path
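A sketch of the path construction, assuming the method simply joins the base path with the documented {lang}_lemmas.tsv file name. dictionaryFilePath() is hypothetical. Trimming trailing separators avoids doubled slashes when the base path ends with one.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: build the path of a language dictionary following
 * the documented {lang}_lemmas.tsv naming convention.
 */
function dictionaryFilePath(string $basePath, string $lang): string
{
    // Strip trailing "/" or "\" so the separator is not doubled.
    return rtrim($basePath, '/\\') . DIRECTORY_SEPARATOR . $lang . '_lemmas.tsv';
}

echo dictionaryFilePath('data/lemma-dictionaries/', 'de'), PHP_EOL;
```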

normalizeLanguageCode()

Normalize a language code to standard format.

private normalizeLanguageCode(string $code) : string

Handles variations such as "en-US" -> "en" and "eng" -> "en".

Parameters
$code : string

The language code

Return values
string

Normalized code
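A sketch of the two documented normalizations: dropping a region subtag and mapping a three-letter code to its two-letter form. Accepting "_" as a separator and the contents of the ISO 639-2 lookup table are illustrative assumptions, not the class's actual list. normalizeLanguageCodeSketch() is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of language-code normalization:
 * lowercase, drop any region subtag ("en-US", "fr_CA" -> primary code),
 * then map a few ISO 639-2 codes to ISO 639-1 ("eng" -> "en").
 * The mapping table below is an illustrative assumption.
 */
function normalizeLanguageCodeSketch(string $code): string
{
    $iso639_2 = ['eng' => 'en', 'deu' => 'de', 'ger' => 'de', 'fra' => 'fr', 'fre' => 'fr', 'spa' => 'es'];

    $code = strtolower(trim($code));
    // Treat "_" like "-" and keep only the primary language subtag.
    $primary = explode('-', str_replace('_', '-', $code))[0];
    return $iso639_2[$primary] ?? $primary;
}

foreach (['en-US', 'eng', 'DE', 'fr_CA'] as $input) {
    echo $input, ' -> ', normalizeLanguageCodeSketch($input), PHP_EOL;
}
```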

parseDictionaryLine()

Parse a single dictionary line.

private parseDictionaryLine(string $line, string $languageCode) : void
Parameters
$line : string

The line to parse

$languageCode : string

The language code
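The void return type implies the parsed pair is stored as a side effect, presumably into the per-language dictionary. The sketch models that by writing into an array passed by reference. It assumes malformed lines (empty, no tab, empty column) are silently ignored. parseDictionaryLineSketch() is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of parseDictionaryLine(): parse one
 * "word_form<TAB>lemma" line and store it under the given language.
 * Assumption: malformed lines are ignored without error.
 *
 * @param array<string, array<string, string>> $dictionaries
 */
function parseDictionaryLineSketch(array &$dictionaries, string $line, string $lang): void
{
    $line = rtrim($line, "\r\n");
    if ($line === '' || !str_contains($line, "\t")) {
        return;
    }
    [$wordForm, $lemma] = explode("\t", $line, 2);
    if ($wordForm !== '' && $lemma !== '') {
        $dictionaries[$lang][$wordForm] = $lemma;
    }
}

$dictionaries = [];
foreach (["mice\tmouse\n", "\n", "no-tab-here\n", "went\tgo\r\n"] as $line) {
    parseDictionaryLineSketch($dictionaries, $line, 'en');
}
print_r($dictionaries);
```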


        