LemmaBatchService
Service for suggesting, applying, propagating, and linking lemmas.
Table of Contents
Properties
Methods
- __construct() : mixed
- Constructor.
- applyLemmasToVocabulary() : array{processed: int, updated: int, skipped: int}
- Apply lemmas to existing vocabulary for a language.
- findWordIdByLemma() : int|null
- Find a word ID by its lemma.
- linkTextItemsByLemma() : array{linked: int, unmatched: int, errors: int}
- Link unmatched text items to words by lemma.
- linkTextItemsByLemmaSql() : int
- Link text items directly using SQL (efficient for large datasets).
- propagateLemma() : int
- Copy lemma from one term to all related terms.
- setLemma() : bool
- Set lemma for a specific term.
- suggestLemma() : string|null
- Suggest a lemma for a word.
- suggestLemmasBatch() : array<string, string|null>
- Suggest lemmas for multiple words.
- fetchTermsWithoutLemma() : array<int, array<string, mixed>>
- Fetch terms without a lemma.
- fetchUnmatchedTextItems() : array<int, array<string, mixed>>
- Fetch unmatched text items (Ti2WoID IS NULL) for a language.
- linkItemsToWord() : int
- Link text items to a word.
- updateTermLemma() : void
- Update the lemma for a term.
Properties
$lemmatizer
private
LemmatizerInterface
$lemmatizer
$repository
private
MySqlTermRepository
$repository
Methods
__construct()
Constructor.
public
__construct(LemmatizerInterface $lemmatizer, MySqlTermRepository $repository) : mixed
Parameters
- $lemmatizer : LemmatizerInterface
  Lemmatizer implementation
- $repository : MySqlTermRepository
  Term repository
applyLemmasToVocabulary()
Apply lemmas to existing vocabulary for a language.
public
applyLemmasToVocabulary(int $languageId, string $languageCode[, int $batchSize = 100 ]) : array{processed: int, updated: int, skipped: int}
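The batching behind the processed/updated/skipped counters might look roughly like the sketch below. It is not the service's actual code: applyLemmasInBatches(), the in-memory $terms array, and the $suggest callable are hypothetical stand-ins for the repository and the lemmatizer, and counting a term with no suggestion as "skipped" is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: apply suggested lemmas to terms in fixed-size batches.
 *
 * @param array<int, array{text: string, lemma: ?string}> $terms   keyed by term ID
 * @param callable(string): ?string                       $suggest stand-in for a lemmatizer
 * @return array{processed: int, updated: int, skipped: int}
 */
function applyLemmasInBatches(array &$terms, callable $suggest, int $batchSize = 100): array
{
    $stats = ['processed' => 0, 'updated' => 0, 'skipped' => 0];

    // Only terms that have no lemma yet are candidates.
    $pending = array_filter($terms, fn (array $t): bool => $t['lemma'] === null);

    // preserve_keys = true keeps the term IDs as array keys inside each batch.
    foreach (array_chunk($pending, max(1, $batchSize), true) as $batch) {
        foreach ($batch as $id => $term) {
            $stats['processed']++;
            $lemma = $suggest($term['text']);
            if ($lemma === null) {
                $stats['skipped']++; // assumption: no suggestion counts as skipped
                continue;
            }
            $terms[$id]['lemma'] = $lemma;
            $stats['updated']++;
        }
    }

    return $stats;
}

// Demo data (made up for illustration).
$terms = [
    1 => ['text' => 'runs',  'lemma' => null],
    2 => ['text' => 'mice',  'lemma' => null],
    3 => ['text' => 'xyzzy', 'lemma' => null],
    4 => ['text' => 'went',  'lemma' => 'go'], // already has a lemma, not processed
];
$dictionary = ['runs' => 'run', 'mice' => 'mouse'];
$stats = applyLemmasInBatches($terms, fn (string $w): ?string => $dictionary[$w] ?? null, 2);
```

Taking a snapshot of the pending terms before chunking avoids a pitfall of offset-based paging: rows that receive a lemma drop out of a "terms without lemma" query, which shifts later offsets.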
Parameters
- $languageId : int
  Language ID
- $languageCode : string
  ISO language code for lemmatizer
- $batchSize : int = 100
  Number of words to process per batch
Return values
array{processed: int, updated: int, skipped: int}
findWordIdByLemma()
Find a word ID by its lemma.
public
findWordIdByLemma(int $languageId, string $lemmaLc) : int|null
Returns the ID of a word that has this lemma, preferring the base form when several words share it.
Parameters
- $languageId : int
  Language ID
- $lemmaLc : string
  Lowercase lemma to match
Return values
int|null
  Word ID, or null if not found
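One way to read "preferring the base form" is sketched below. This is an interpretation, not the documented implementation: findWordIdByLemmaSketch() and the array shape are hypothetical, and "base form" is assumed to mean a word whose own lowercase text equals the lemma.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: pick the word ID for a lemma, preferring the base form.
 *
 * @param array<int, array{id: int, languageId: int, textLc: string, lemmaLc: ?string}> $words
 */
function findWordIdByLemmaSketch(array $words, int $languageId, string $lemmaLc): ?int
{
    $fallback = null;
    foreach ($words as $word) {
        if ($word['languageId'] !== $languageId || $word['lemmaLc'] !== $lemmaLc) {
            continue;
        }
        if ($word['textLc'] === $lemmaLc) {
            return $word['id'];    // assumed base form: the word's text is the lemma itself
        }
        $fallback ??= $word['id']; // otherwise remember the first inflected match
    }

    return $fallback;
}

// Demo data (made up for illustration).
$words = [
    ['id' => 5, 'languageId' => 1, 'textLc' => 'running', 'lemmaLc' => 'run'],
    ['id' => 7, 'languageId' => 1, 'textLc' => 'run',     'lemmaLc' => 'run'],
    ['id' => 9, 'languageId' => 2, 'textLc' => 'ran',     'lemmaLc' => 'run'],
];
$base     = findWordIdByLemmaSketch($words, 1, 'run');  // base form wins over "running"
$inflected = findWordIdByLemmaSketch($words, 2, 'run'); // only an inflected form exists
$missing  = findWordIdByLemmaSketch($words, 1, 'walk'); // no word has this lemma
```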
linkTextItemsByLemma()
Link unmatched text items to words by lemma.
public
linkTextItemsByLemma(int $languageId, string $languageCode[, int|null $textId = null ]) : array{linked: int, unmatched: int, errors: int}
When a text item has no exact word match (Ti2WoID IS NULL), this method lemmatizes the item and looks for a word whose lemma matches the result.
Example: the text item "runs" has no exact match, so it is lemmatized to "run"; the word with WoLemmaLC = "run" is then found and the text item is linked to it.
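The "runs" to "run" flow can be sketched in memory as follows. Everything here is illustrative: linkByLemmaSketch(), the array shapes, and the $lemmatize callable are hypothetical, and counting a lemmatizer exception as an "error" is an assumption about what the errors counter means.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: link unmatched text items to words via their lemma.
 *
 * @param array<int, array{textLc: string, wordId: ?int}> $items
 * @param array<string, int>                              $wordIdByLemma lowercase lemma => word ID
 * @param callable(string): ?string                       $lemmatize     stand-in for a lemmatizer
 * @return array{linked: int, unmatched: int, errors: int}
 */
function linkByLemmaSketch(array &$items, array $wordIdByLemma, callable $lemmatize): array
{
    $stats = ['linked' => 0, 'unmatched' => 0, 'errors' => 0];

    foreach ($items as $key => $item) {
        if ($item['wordId'] !== null) {
            continue; // already linked by an exact match
        }
        try {
            $lemma = $lemmatize($item['textLc']);
        } catch (Throwable $e) {
            $stats['errors']++; // assumption: lemmatizer failures are counted as errors
            continue;
        }
        $wordId = $lemma !== null ? ($wordIdByLemma[$lemma] ?? null) : null;
        if ($wordId === null) {
            $stats['unmatched']++;
            continue;
        }
        $items[$key]['wordId'] = $wordId;
        $stats['linked']++;
    }

    return $stats;
}

// Demo data (made up for illustration).
$items = [
    ['textLc' => 'runs',  'wordId' => null],
    ['textLc' => 'the',   'wordId' => 3],    // exact match exists, left alone
    ['textLc' => 'blorp', 'wordId' => null], // no lemma known
];
$lemmas = ['runs' => 'run'];
$stats = linkByLemmaSketch($items, ['run' => 42], fn (string $w): ?string => $lemmas[$w] ?? null);
```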
Parameters
- $languageId : int
  Language ID
- $languageCode : string
  ISO language code for lemmatizer
- $textId : int|null = null
  Optional: limit to specific text
Return values
array{linked: int, unmatched: int, errors: int}
linkTextItemsByLemmaSql()
Link text items directly using SQL (efficient for large datasets).
public
linkTextItemsByLemmaSql(int $languageId[, int|null $textId = null ]) : int
This method links text items to words where the text item's lowercase text equals a word's lemma. It takes no language code and does the comparison in SQL, which makes it more efficient than the PHP-based linkTextItemsByLemma() for large datasets.
Parameters
- $languageId : int
  Language ID
- $textId : int|null = null
  Optional text ID filter
Return values
int
  Number of text items linked
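A set-based update of this kind could look like the sketch below. It is a guess at the shape of the query, not the repository's SQL: only Ti2WoID and WoLemmaLC appear in this documentation, so the table names (textitems2, words) and the other columns are assumptions. To stay runnable without a server, the demo uses an in-memory SQLite database (requires the pdo_sqlite extension) and a correlated subquery; a MySQL implementation would more likely use UPDATE ... JOIN. Picking the lowest word ID when several words share a lemma is also an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: link unmatched text items whose lowercase text equals a word's lemma.
 * Table and column names other than Ti2WoID and WoLemmaLC are assumed.
 */
function linkTextItemsByLemmaSqlSketch(PDO $pdo, int $languageId, ?int $textId = null): int
{
    // Correlated subquery instead of MySQL's UPDATE ... JOIN so the demo runs on SQLite.
    $match = 'SELECT MIN(w.WoID) FROM words w
              WHERE w.WoLgID = textitems2.Ti2LgID AND w.WoLemmaLC = textitems2.Ti2TextLC';
    $sql = "UPDATE textitems2 SET Ti2WoID = ($match)
            WHERE Ti2WoID IS NULL AND Ti2LgID = :lg AND ($match) IS NOT NULL";
    $params = [':lg' => $languageId];
    if ($textId !== null) {
        $sql .= ' AND Ti2TxID = :tx';
        $params[':tx'] = $textId;
    }

    $stmt = $pdo->prepare($sql);
    $stmt->execute($params);

    return $stmt->rowCount(); // number of text items linked
}

// Demo schema and data (made up for illustration).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE words (WoID INTEGER PRIMARY KEY, WoLgID INTEGER, WoLemmaLC TEXT)');
$pdo->exec('CREATE TABLE textitems2 (Ti2ID INTEGER PRIMARY KEY, Ti2LgID INTEGER,
            Ti2TxID INTEGER, Ti2TextLC TEXT, Ti2WoID INTEGER)');
$pdo->exec("INSERT INTO words VALUES (10, 1, 'run')");
$pdo->exec("INSERT INTO textitems2 VALUES
            (1, 1, 7, 'run', NULL),
            (2, 1, 7, 'runs', NULL),
            (3, 2, 8, 'run', NULL)");

$linked = linkTextItemsByLemmaSqlSketch($pdo, 1);
```

In this demo only item 1 is linked: item 2 ("runs") does not equal the lemma as text, and item 3 belongs to another language. That is the trade-off against the lemmatizer-based linkTextItemsByLemma(), which would also catch inflected forms.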
propagateLemma()
Copy lemma from one term to all related terms.
public
propagateLemma(int $termId, int $languageId, string $languageCode) : int
For example, when a user sets the lemma "run" for "running", this method can propagate "run" to other forms such as "runs" and "ran", provided the lemmatizer suggests the same lemma for them.
Parameters
- $termId : int
  Source term ID
- $languageId : int
  Language ID
- $languageCode : string
  Language code for lemmatizer
Return values
int
  Number of terms updated
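The propagation rule described above can be sketched in memory as follows. propagateLemmaSketch(), the $terms array, and the $suggest callable are hypothetical, and the sketch assumes that terms which already have a lemma are never overwritten; the real method may decide differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: copy the source term's lemma to terms for which
 * the lemmatizer suggests that same lemma.
 *
 * @param array<int, array{text: string, lemma: ?string}> $terms   keyed by term ID
 * @param callable(string): ?string                       $suggest stand-in for a lemmatizer
 */
function propagateLemmaSketch(array &$terms, int $sourceId, callable $suggest): int
{
    $lemma = $terms[$sourceId]['lemma'] ?? null;
    if ($lemma === null || $lemma === '') {
        return 0; // nothing to propagate
    }

    $updated = 0;
    foreach ($terms as $id => $term) {
        // Assumption: skip the source itself and never overwrite an existing lemma.
        if ($id === $sourceId || $term['lemma'] !== null) {
            continue;
        }
        if ($suggest($term['text']) === $lemma) {
            $terms[$id]['lemma'] = $lemma;
            $updated++;
        }
    }

    return $updated;
}

// Demo data (made up for illustration).
$terms = [
    1 => ['text' => 'running', 'lemma' => 'run'], // source term
    2 => ['text' => 'runs',    'lemma' => null],
    3 => ['text' => 'ran',     'lemma' => null],
    4 => ['text' => 'walks',   'lemma' => null],
];
$suggestions = ['runs' => 'run', 'ran' => 'run', 'walks' => 'walk'];
$updated = propagateLemmaSketch($terms, 1, fn (string $w): ?string => $suggestions[$w] ?? null);
```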
setLemma()
Set lemma for a specific term.
public
setLemma(int $termId, string $lemma) : bool
Parameters
- $termId : int
  Term ID
- $lemma : string
  The lemma to set
Return values
bool
  True if updated
suggestLemma()
Suggest a lemma for a word.
public
suggestLemma(string $word, string $languageCode) : string|null
Parameters
- $word : string
  The word to lemmatize
- $languageCode : string
  ISO language code (e.g., 'en', 'de')
Return values
string|null
  The suggested lemma, or null if not found
suggestLemmasBatch()
Suggest lemmas for multiple words.
public
suggestLemmasBatch(array<string|int, string> $words, string $languageCode) : array<string, string|null>
Parameters
- $words : array<string|int, string>
  Array of words
- $languageCode : string
  ISO language code
Return values
array<string, string|null>
  Word => lemma mapping
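The shape of the returned Word => lemma mapping is illustrated by the sketch below. suggestLemmasBatchSketch() and the dictionary-backed $suggest callable are hypothetical; whether the real method deduplicates input words before calling the lemmatizer is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: build a word => lemma map, asking for each distinct word once.
 *
 * @param array<string|int, string>  $words
 * @param callable(string): ?string  $suggest stand-in for a lemmatizer
 * @return array<string, string|null>
 */
function suggestLemmasBatchSketch(array $words, callable $suggest): array
{
    $result = [];
    foreach ($words as $word) {
        // array_key_exists (not isset) so a cached null result is not recomputed.
        if (!array_key_exists($word, $result)) {
            $result[$word] = $suggest($word);
        }
    }

    return $result;
}

// Demo data (made up for illustration).
$dictionary = ['runs' => 'run', 'mice' => 'mouse'];
$calls = 0;
$suggest = function (string $w) use ($dictionary, &$calls): ?string {
    $calls++;
    return $dictionary[$w] ?? null;
};
$lemmas = suggestLemmasBatchSketch(['runs', 'mice', 'runs', 'xyzzy'], $suggest);
```

Note that PHP converts purely numeric string keys such as "2024" to integers, so a map keyed by word is only strictly string-keyed when the input contains no such tokens.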
fetchTermsWithoutLemma()
Fetch terms without a lemma.
private
fetchTermsWithoutLemma(int $languageId, int $limit, int $offset) : array<int, array<string, mixed>>
Parameters
- $languageId : int
  Language ID
- $limit : int
  Maximum number to fetch
- $offset : int
  Starting offset
Return values
array<int, array<string, mixed>>
fetchUnmatchedTextItems()
Fetch unmatched text items (Ti2WoID IS NULL) for a language.
private
fetchUnmatchedTextItems(int $languageId[, int|null $textId = null ]) : array<int, array<string, mixed>>
Parameters
- $languageId : int
  Language ID
- $textId : int|null = null
  Optional text ID filter
Return values
array<int, array<string, mixed>>
linkItemsToWord()
Link text items to a word.
private
linkItemsToWord(array<int, array<string, mixed>> $items, int $wordId) : int
Parameters
- $items : array<int, array<string, mixed>>
  Text items to link
- $wordId : int
  Word ID to link to
Return values
int
  Number of items linked
updateTermLemma()
Update the lemma for a term.
private
updateTermLemma(int $termId, string $lemma) : void
Parameters
- $termId : int
  Term ID
- $lemma : string
  The lemma to set