LemmaBatchService

Service for suggesting, applying, propagating, and linking lemmas.

Tags
since : 3.0.0

Table of Contents

Properties

$lemmatizer  : LemmatizerInterface
$repository  : MySqlTermRepository

Methods

__construct()  : mixed
Constructor.
applyLemmasToVocabulary()  : array{processed: int, updated: int, skipped: int}
Apply lemmas to existing vocabulary for a language.
findWordIdByLemma()  : int|null
Find a word ID by its lemma.
linkTextItemsByLemma()  : array{linked: int, unmatched: int, errors: int}
Link unmatched text items to words by lemma.
linkTextItemsByLemmaSql()  : int
Link text items directly using SQL (efficient for large datasets).
propagateLemma()  : int
Copy lemma from one term to all related terms.
setLemma()  : bool
Set lemma for a specific term.
suggestLemma()  : string|null
Suggest a lemma for a word.
suggestLemmasBatch()  : array<string, string|null>
Suggest lemmas for multiple words.
fetchTermsWithoutLemma()  : array<int, array<string, mixed>>
Fetch terms without a lemma.
fetchUnmatchedTextItems()  : array<int, array<string, mixed>>
Fetch unmatched text items (Ti2WoID IS NULL) for a language.
linkItemsToWord()  : int
Link text items to a word.
updateTermLemma()  : void
Update the lemma for a term.

Methods

applyLemmasToVocabulary()

Apply lemmas to existing vocabulary for a language.

public applyLemmasToVocabulary(int $languageId, string $languageCode[, int $batchSize = 100 ]) : array{processed: int, updated: int, skipped: int}
Parameters
$languageId : int

Language ID

$languageCode : string

ISO language code for lemmatizer

$batchSize : int = 100

Number of words to process per batch

Return values
array{processed: int, updated: int, skipped: int}

Counts of terms processed, updated, and skipped
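
The batch loop and its three counters can be pictured with a minimal in-memory sketch. Everything named `demo...` is hypothetical: the lookup table stands in for the real lemmatizer, the `lemmas` array stands in for the database update, and treating "no suggestion" as the only reason to skip a term is an assumption, since the page does not document the skip criteria.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the lemmatizer: a fixed lookup table.
 * Returns null when no lemma is known for the word.
 */
function demoSuggestLemma(string $word): ?string
{
    $table = ['runs' => 'run', 'running' => 'run', 'mice' => 'mouse'];
    return $table[strtolower($word)] ?? null;
}

/**
 * Walk the terms in batches and count the outcomes.
 *
 * @param array<int, string> $terms     Term ID => term text (terms lacking a lemma)
 * @param int                $batchSize Number of terms per batch
 *
 * @return array{processed: int, updated: int, skipped: int, lemmas: array<int, string>}
 */
function demoApplyLemmas(array $terms, int $batchSize = 100): array
{
    $stats = ['processed' => 0, 'updated' => 0, 'skipped' => 0, 'lemmas' => []];

    // preserve_keys = true keeps the term IDs as array keys inside each batch.
    foreach (array_chunk($terms, max(1, $batchSize), true) as $batch) {
        foreach ($batch as $termId => $text) {
            $stats['processed']++;
            $lemma = demoSuggestLemma($text);
            if ($lemma === null) {
                $stats['skipped']++;            // assumption: no suggestion means skip
                continue;
            }
            $stats['lemmas'][$termId] = $lemma; // stands in for the database update
            $stats['updated']++;
        }
    }

    return $stats;
}

$stats = demoApplyLemmas([1 => 'runs', 2 => 'xyzzy', 3 => 'mice'], 2);
// processed = 3, updated = 2, skipped = 1
```

In this sketch the batch size only bounds how many terms are handled per iteration of the outer loop; it does not change the totals.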

findWordIdByLemma()

Find a word ID by its lemma.

public findWordIdByLemma(int $languageId, string $lemmaLc) : int|null

Returns the ID of the word that has this lemma, preferring the base form.

Parameters
$languageId : int

Language ID

$lemmaLc : string

Lowercase lemma to match

Return values
int|null

Word ID or null if not found
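
One way to read "preferring the base form" is sketched below. It assumes that the base form is the word whose own lowercase text equals the lemma, and that any other word with the same lemma is only a fallback; the page does not confirm either point. The function name and the array layout are hypothetical and replace the real database lookup.

```php
<?php
declare(strict_types=1);

/**
 * Pick a word ID for a lemma, preferring the word that is its own lemma.
 *
 * @param array<int, array{text: string, lemma: string}> $words Word ID => lowercase text and lemma
 */
function demoFindWordIdByLemma(array $words, string $lemmaLc): ?int
{
    $fallback = null;

    foreach ($words as $wordId => $word) {
        if ($word['lemma'] !== $lemmaLc) {
            continue;               // different lemma: not a candidate
        }
        if ($word['text'] === $lemmaLc) {
            return $wordId;         // assumed base form: text equals lemma
        }
        $fallback ??= $wordId;      // remember the first inflected match
    }

    return $fallback;               // null when no word has this lemma
}

$words = [
    3 => ['text' => 'runs', 'lemma' => 'run'],
    5 => ['text' => 'run',  'lemma' => 'run'],
    8 => ['text' => 'mice', 'lemma' => 'mouse'],
];
$wordId = demoFindWordIdByLemma($words, 'run'); // 5, not 3
```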

linkTextItemsByLemma()

Link unmatched text items to words by lemma.

public linkTextItemsByLemma(int $languageId, string $languageCode[, int|null $textId = null ]) : array{linked: int, unmatched: int, errors: int}

When a text item doesn't have an exact word match (Ti2WoID IS NULL), this method tries to find a word whose lemma matches the text item's lemmatized form.

Example: Text item "runs" with no exact match -> lemmatize to "run" -> find word with WoLemmaLC = "run" -> link text item to that word

Parameters
$languageId : int

Language ID

$languageCode : string

ISO language code for lemmatizer

$textId : int|null = null

Optional: limit to a specific text

Return values
array{linked: int, unmatched: int, errors: int}

Counts of text items linked, left unmatched, and failed with an error
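
The lemmatize-then-match flow from the example above can be sketched in memory as follows. The `demo...` functions, the lookup table, and the array shapes are hypothetical stand-ins for the lemmatizer and the database. The `errors` counter is left out because the page does not say what counts as an error.

```php
<?php
declare(strict_types=1);

/** Hypothetical lemmatizer stub: fixed table, null when the word is unknown. */
function demoLemmatize(string $text): ?string
{
    $table = ['runs' => 'run', 'ran' => 'run', 'houses' => 'house'];
    return $table[strtolower($text)] ?? null;
}

/**
 * Link each unmatched item to the word whose lowercase lemma equals the
 * item's lemmatized form.
 *
 * @param array<int, string> $unmatched  Text item ID => item text
 * @param array<string, int> $lemmaIndex Lowercase lemma => word ID
 *
 * @return array{linked: int, unmatched: int, links: array<int, int>}
 */
function demoLinkByLemma(array $unmatched, array $lemmaIndex): array
{
    $out = ['linked' => 0, 'unmatched' => 0, 'links' => []];

    foreach ($unmatched as $itemId => $text) {
        $lemma = demoLemmatize($text);
        if ($lemma !== null && isset($lemmaIndex[$lemma])) {
            $out['links'][$itemId] = $lemmaIndex[$lemma]; // item now points at the word
            $out['linked']++;
        } else {
            $out['unmatched']++;                          // no lemma, or no word with it
        }
    }

    return $out;
}

$linkResult = demoLinkByLemma(
    [10 => 'runs', 11 => 'houses', 12 => 'Ran'],
    ['run' => 7]
);
// linked = 2 (items 10 and 12), unmatched = 1 (item 11: no word has lemma "house")
```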

linkTextItemsByLemmaSql()

Link text items directly using SQL (efficient for large datasets).

public linkTextItemsByLemmaSql(int $languageId[, int|null $textId = null ]) : int

This method links text items to words where the text item's lowercase text matches a word's lemma. Because the matching runs in SQL, it is more efficient for large datasets than the PHP-based approach of linkTextItemsByLemma().

Parameters
$languageId : int

Language ID

$textId : int|null = null

Optional text ID filter

Return values
int

Number of text items linked
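
The matching rule described above compares the item's lowercase text with a word's lemma directly, and the signature takes no language code. The in-memory sketch below imitates that rule in PHP instead of SQL, because the page names no tables or queries. The function name and array shapes are hypothetical, and picking the first word seen for a lemma is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Link unmatched items whose lowercase text equals some word's lemma.
 *
 * @param array<int, array{text: string, wordId: ?int}> $items      Item ID => text and current link
 * @param array<int, string>                            $wordLemmas Word ID => lowercase lemma
 *
 * @return array{items: array<int, array{text: string, wordId: ?int}>, linked: int}
 */
function demoLinkByExactLemma(array $items, array $wordLemmas): array
{
    // Build lemma => word ID, keeping the first word seen for each lemma.
    $index = [];
    foreach ($wordLemmas as $wordId => $lemmaLc) {
        $index[$lemmaLc] ??= $wordId;
    }

    $linked = 0;
    foreach ($items as $itemId => $item) {
        if ($item['wordId'] !== null) {
            continue;                               // already matched, leave untouched
        }
        $key = strtolower($item['text']);
        if (isset($index[$key])) {
            $items[$itemId]['wordId'] = $index[$key];
            $linked++;
        }
    }

    return ['items' => $items, 'linked' => $linked];
}

$sqlStyle = demoLinkByExactLemma(
    [
        1 => ['text' => 'Run',  'wordId' => null],
        2 => ['text' => 'runs', 'wordId' => null],
        3 => ['text' => 'run',  'wordId' => 9],
    ],
    [4 => 'run'] // e.g. the word "running" stored with lemma "run"
);
// linked = 1: item 1 is linked to word 4; "runs" stays unmatched under this rule
```

Under this rule an inflected item such as "runs" is linked only if some word's stored lemma is literally "runs".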

propagateLemma()

Copy lemma from one term to all related terms.

public propagateLemma(int $termId, int $languageId, string $languageCode) : int

When a user sets the lemma "run" for "running", this method can propagate that lemma to other forms such as "runs" and "ran", provided the lemmatizer suggests the same lemma for them.

Parameters
$termId : int

Source term ID

$languageId : int

Language ID

$languageCode : string

ISO language code for lemmatizer

Return values
int

Number of terms updated
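
The propagation condition can be sketched as a filter over candidate terms: a term receives the source lemma only when the lemmatizer's suggestion for it equals that lemma. The function name, the callable stand-in for the lemmatizer, and the candidate list are hypothetical. How the real method chooses the "related" terms is not documented here.

```php
<?php
declare(strict_types=1);

/**
 * Return the terms that would receive the source lemma.
 *
 * @param string                   $sourceLemma Lemma set on the source term
 * @param array<int, string>       $candidates  Term ID => term text
 * @param callable(string):?string $suggest     Hypothetical lemmatizer stand-in
 *
 * @return array<int, string> Term ID => lemma, for updated terms only
 */
function demoPropagateLemma(string $sourceLemma, array $candidates, callable $suggest): array
{
    $updated = [];

    foreach ($candidates as $termId => $text) {
        // Propagate only when the suggestion agrees with the source lemma.
        if ($suggest($text) === $sourceLemma) {
            $updated[$termId] = $sourceLemma;
        }
    }

    return $updated;
}

$suggest = static function (string $word): ?string {
    $table = ['runs' => 'run', 'ran' => 'run', 'runner' => 'runner'];
    return $table[$word] ?? null;
};

$propagated = demoPropagateLemma('run', [2 => 'runs', 3 => 'ran', 4 => 'runner'], $suggest);
// count($propagated) is 2: terms 2 and 3 get "run", term 4 is left alone
```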

setLemma()

Set lemma for a specific term.

public setLemma(int $termId, string $lemma) : bool
Parameters
$termId : int

Term ID

$lemma : string

The lemma to set

Return values
bool

True if updated

suggestLemma()

Suggest a lemma for a word.

public suggestLemma(string $word, string $languageCode) : string|null
Parameters
$word : string

The word to lemmatize

$languageCode : string

ISO language code (e.g., 'en', 'de')

Return values
string|null

The suggested lemma, or null if not found

suggestLemmasBatch()

Suggest lemmas for multiple words.

public suggestLemmasBatch(array<string|int, string> $words, string $languageCode) : array<string, string|null>
Parameters
$words : array<string|int, string>

Array of words

$languageCode : string

ISO language code

Return values
array<string, string|null>

Word => lemma mapping
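
The shape of the returned mapping, including null entries for words without a suggestion, can be illustrated with a minimal sketch. The function name and the callable lemmatizer stand-in are hypothetical and do not reflect the real LemmatizerInterface.

```php
<?php
declare(strict_types=1);

/**
 * Build a word => lemma map; words without a suggestion map to null.
 *
 * @param array<string|int, string> $words     Words to look up
 * @param callable(string):?string  $lemmatize Hypothetical lemmatizer stand-in
 *
 * @return array<string, string|null>
 */
function demoSuggestBatch(array $words, callable $lemmatize): array
{
    $map = [];
    foreach ($words as $word) {
        $map[$word] = $lemmatize($word); // duplicates collapse onto one key
    }
    return $map;
}

$lemmatize = static function (string $word): ?string {
    $table = ['runs' => 'run', 'mice' => 'mouse'];
    return $table[$word] ?? null;
};

$map = demoSuggestBatch(['runs', 'xyzzy', 'runs'], $lemmatize);
// ['runs' => 'run', 'xyzzy' => null]
```

Because the words become array keys, PHP casts a decimal integer string such as "42" to the integer key 42, so callers that iterate over the result should not assume every key is a string.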

fetchTermsWithoutLemma()

Fetch terms without a lemma.

private fetchTermsWithoutLemma(int $languageId, int $limit, int $offset) : array<int, array<string, mixed>>
Parameters
$languageId : int

Language ID

$limit : int

Maximum number to fetch

$offset : int

Starting offset

Return values
array<int, array<string, mixed>>

fetchUnmatchedTextItems()

Fetch unmatched text items (Ti2WoID IS NULL) for a language.

private fetchUnmatchedTextItems(int $languageId[, int|null $textId = null ]) : array<int, array<string, mixed>>
Parameters
$languageId : int

Language ID

$textId : int|null = null

Optional text ID filter

Return values
array<int, array<string, mixed>>

linkItemsToWord()

Link text items to a word.

private linkItemsToWord(array<int, array<string, mixed>> $items, int $wordId) : int
Parameters
$items : array<int, array<string, mixed>>

Text items to link

$wordId : int

Word ID to link to

Return values
int

Number of items linked

updateTermLemma()

Update the lemma for a term.

private updateTermLemma(int $termId, string $lemma) : void
Parameters
$termId : int

Term ID

$lemma : string

The lemma to set


        