WiktionaryEnrichmentService

Enriches imported vocabulary with translations from kaikki.org (Wiktextract structured data) or monolingual definitions from Wiktionary APIs.

Designed to be called in small batches via AJAX polling, so the UI can show progress without blocking.

Table of Contents

Constants

BATCH_SIZE  = 20
FETCH_TIMEOUT  = 10
KAIKKI_BASE_URL  = 'https://kaikki.org/dictionary'
MAX_CONSECUTIVE_FAILURES  = 5
WIKTIONARY_API_TEMPLATE  = 'https://%s.wiktionary.org/w/api.php'

Methods

buildKaikkiUrl()  : string
Build the kaikki.org URL for a word.
countTotal()  : int
Count total words for a language (for progress calculation).
countUnenriched()  : int
Count remaining unenriched words for progress tracking.
enrichBatchDefinition()  : array{enriched: int, failed: int, remaining: int, total: int, warning: string}
Enrich a batch of words with monolingual definitions from Wiktionary.
enrichBatchTranslation()  : array{enriched: int, failed: int, remaining: int, total: int, warning: string}
Enrich a batch of words with English translations from kaikki.org.
fetchKaikkiTranslation()  : string|null
Fetch English translation from kaikki.org for a single word.
fetchWiktionaryDefinition()  : string|null
Fetch monolingual definition from Wiktionary API.
getUnenrichedWords()  : array<int, array{WoID: int, WoText: string}>
Get the next batch of unenriched words for a language.
parseKaikkiResponse()  : string|null
Parse kaikki.org JSONL response to extract the first English gloss.
parseWikitext()  : string|null
Parse wikitext to extract the first definition line.
cleanWikitext()  : string
Clean wikitext markup to produce readable text.
fetchFromWiktionaryApi()  : string|null
Fetch a definition from the Wiktionary parse API.
httpGet()  : string|null
Perform an HTTP GET with timeout.
updateTranslation()  : void
Update a word's translation in the database.

Methods

buildKaikkiUrl()

Build the kaikki.org URL for a word.

public buildKaikkiUrl(string $word, string $kaikkiLangName) : string

Path format: /dictionary/{Language}/meaning/{w[0]}/{w[0:2]}/{word}.jsonl, where {w[0]} is the first character of the word and {w[0:2]} is its first two characters.

Parameters
$word : string
$kaikkiLangName : string
Return values
string
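
The path format above can be sketched as follows. This is a minimal illustration, not the service's actual code: the function name sketchBuildKaikkiUrl and the constant SKETCH_KAIKKI_BASE_URL are hypothetical, and percent-encoding each path segment is an assumption.

```php
<?php
// Hypothetical stand-in for KAIKKI_BASE_URL from the constants list.
const SKETCH_KAIKKI_BASE_URL = 'https://kaikki.org/dictionary';

// Hypothetical stand-in for buildKaikkiUrl(); the real method may differ.
function sketchBuildKaikkiUrl(string $word, string $kaikkiLangName): string
{
    // Capture the first character and, if present, the second one.
    // The /u flag makes "." match whole UTF-8 characters, not bytes.
    $first = '';
    $firstTwo = '';
    if (preg_match('/^(.)(.?)/us', $word, $m) === 1) {
        $first = $m[1];
        $firstTwo = $m[1] . $m[2];
    }

    // Assumption: every path segment is percent-encoded on its own.
    $segments = array_map(
        'rawurlencode',
        [$kaikkiLangName, 'meaning', $first, $firstTwo, $word]
    );

    return SKETCH_KAIKKI_BASE_URL . '/' . implode('/', $segments) . '.jsonl';
}
```

Under these assumptions, sketchBuildKaikkiUrl('casa', 'Spanish') yields https://kaikki.org/dictionary/Spanish/meaning/c/ca/casa.jsonl.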

countTotal()

Count total words for a language (for progress calculation).

public countTotal(int $langId) : int
Parameters
$langId : int
Return values
int

countUnenriched()

Count remaining unenriched words for progress tracking.

public countUnenriched(int $langId) : int
Parameters
$langId : int
Return values
int

enrichBatchDefinition()

Enrich a batch of words with monolingual definitions from Wiktionary.

public enrichBatchDefinition(int $langId, string $languageName) : array{enriched: int, failed: int, remaining: int, total: int, warning: string}
Parameters
$langId : int
$languageName : string
Return values
array{enriched: int, failed: int, remaining: int, total: int, warning: string}
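
The batch methods return counters plus a warning string, and the class defines MAX_CONSECUTIVE_FAILURES = 5. One plausible reading is that a batch stops early once that many lookups fail in a row. The sketch below illustrates that idea only; the function sketchProcessBatch, its callable parameter, and the warning text are hypothetical, and the remaining and total fields are omitted because they depend on the database.

```php
<?php
// Hypothetical stand-in for MAX_CONSECUTIVE_FAILURES.
const SKETCH_MAX_CONSECUTIVE_FAILURES = 5;

/**
 * Hypothetical batch loop; not the service's actual implementation.
 *
 * @param array<int, array{WoID: int, WoText: string}> $words
 * @param callable(string): ?string $fetch returns a definition or null
 * @return array{enriched: int, failed: int, warning: string}
 */
function sketchProcessBatch(array $words, callable $fetch): array
{
    $enriched = 0;
    $failed = 0;
    $streak = 0;   // failures in a row since the last success
    $warning = '';

    foreach ($words as $row) {
        if ($fetch($row['WoText']) === null) {
            $failed++;
            $streak++;
            if ($streak >= SKETCH_MAX_CONSECUTIVE_FAILURES) {
                // Assumed behaviour: give up on this batch and report why.
                $warning = 'Too many consecutive failures; batch stopped early.';
                break;
            }
            continue;
        }
        $enriched++;
        $streak = 0; // a success resets the streak
    }

    return ['enriched' => $enriched, 'failed' => $failed, 'warning' => $warning];
}
```

Passing the lookup as a callable keeps the loop testable without network access.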

enrichBatchTranslation()

Enrich a batch of words with English translations from kaikki.org.

public enrichBatchTranslation(int $langId, string $languageName) : array{enriched: int, failed: int, remaining: int, total: int, warning: string}
Parameters
$langId : int
$languageName : string
Return values
array{enriched: int, failed: int, remaining: int, total: int, warning: string}
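
Because each call reports remaining and total, a polling client can derive a progress figure from a single result. The helper below is a hedged sketch: sketchProgressPercent is a hypothetical name, and it assumes that total minus remaining is the number of words already enriched.

```php
<?php
/**
 * Hypothetical helper: turn a batch result into a 0-100 progress value.
 *
 * @param array{remaining: int, total: int} $result
 */
function sketchProgressPercent(array $result): int
{
    // Nothing to enrich counts as finished.
    if ($result['total'] <= 0) {
        return 100;
    }

    // Assumption: words not counted as remaining are already enriched.
    $done = $result['total'] - $result['remaining'];
    $percent = (int) floor(100 * $done / $result['total']);

    // Clamp in case the counters are momentarily inconsistent.
    return max(0, min(100, $percent));
}
```

A caller would typically keep requesting batches until remaining reaches 0 or warning is non-empty.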

fetchKaikkiTranslation()

Fetch English translation from kaikki.org for a single word.

public fetchKaikkiTranslation(string $word, string $kaikkiLangName) : string|null
Parameters
$word : string
$kaikkiLangName : string
Return values
string|null

First English gloss, or null on failure

fetchWiktionaryDefinition()

Fetch monolingual definition from Wiktionary API.

public fetchWiktionaryDefinition(string $word, string $wiktCode, string $kaikkiLangName) : string|null

Strategy: first try kaikki.org for the raw_glosses/glosses in the target language. If that fails, fall back to the Wiktionary parse API and extract the first definition line from wikitext.

Parameters
$word : string
$wiktCode : string
$kaikkiLangName : string
Return values
string|null

First definition, or null on failure
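
The two-step strategy described above amounts to trying sources in order and returning the first usable answer. A minimal sketch of that pattern follows; sketchFirstDefinition and its callable sources are hypothetical, and treating an empty string like null is an assumption.

```php
<?php
/**
 * Hypothetical fallback chain: try each source in order and return the
 * first non-empty definition, or null if every source fails.
 *
 * @param callable(string): ?string ...$sources
 */
function sketchFirstDefinition(string $word, callable ...$sources): ?string
{
    foreach ($sources as $source) {
        $definition = $source($word);
        // Assumption: an empty string counts as "not found".
        if ($definition !== null && $definition !== '') {
            return $definition;
        }
    }

    return null;
}
```

In this picture the first source would be the kaikki.org lookup and the second the Wiktionary parse API lookup.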

getUnenrichedWords()

Get the next batch of unenriched words for a language.

public getUnenrichedWords(int $langId[, int $batchSize = self::BATCH_SIZE ]) : array<int, array{WoID: int, WoText: string}>
Parameters
$langId : int
$batchSize : int = self::BATCH_SIZE
Return values
array<int, array{WoID: int, WoText: string}>

parseKaikkiResponse()

Parse kaikki.org JSONL response to extract the first English gloss.

public parseKaikkiResponse(string $jsonl) : string|null

Prefers non-form-of entries (lexical definitions over inflection forms).

Parameters
$jsonl : string
Return values
string|null

First gloss or null
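
To illustrate the preference for non-form-of entries, here is a hedged sketch of a JSONL parser. It assumes each line is a JSON object with a senses list whose items carry glosses and, for inflected forms, a form_of field, as in Wiktextract output; sketchParseKaikki is a hypothetical name and the real method may apply different rules.

```php
<?php
// Hypothetical stand-in for parseKaikkiResponse().
function sketchParseKaikki(string $jsonl): ?string
{
    $fallback = null; // first gloss seen on a form-of sense

    foreach (preg_split('/\R/', $jsonl) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }

        $entry = json_decode($line, true);
        if (!is_array($entry) || !is_array($entry['senses'] ?? null)) {
            continue; // skip malformed lines
        }

        foreach ($entry['senses'] as $sense) {
            if (!is_array($sense)) {
                continue;
            }
            $gloss = $sense['glosses'][0] ?? null;
            if (!is_string($gloss) || $gloss === '') {
                continue;
            }
            // Assumption: a non-empty form_of marks an inflection sense.
            if (empty($sense['form_of'])) {
                return $gloss; // lexical definition wins immediately
            }
            $fallback ??= $gloss;
        }
    }

    return $fallback;
}
```

With this approach a gloss such as "plural of casa" is returned only when no lexical definition appears anywhere in the response.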

parseWikitext()

Parse wikitext to extract the first definition line.

public parseWikitext(string $wikitext) : string|null

Wikitext definition lines start with "#" and look like:

# [[house]]

# {{lb|es|architecture}} [[building]]

Parameters
$wikitext : string
Return values
string|null

Cleaned definition or null
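
As an illustration of finding the first definition line, the sketch below scans for the first line that begins with a single "#", which is how Wiktionary wikitext marks numbered definitions. The function name is hypothetical, skipping "#:", "#*" and "##" lines (examples, quotations, sub-senses) is an assumption, and the markup cleaning step is left out.

```php
<?php
// Hypothetical helper: return the raw text of the first definition line.
function sketchFirstDefinitionLine(string $wikitext): ?string
{
    foreach (preg_split('/\R/', $wikitext) as $line) {
        // "#" followed by anything except "#", ":" or "*".
        if (preg_match('/^#(?![#:*])\s*(.+)$/u', $line, $m) === 1) {
            $text = trim($m[1]);
            if ($text !== '') {
                return $text;
            }
        }
    }

    return null; // no definition line found
}
```

The returned text still contains wikitext markup such as [[links]] and {{templates}}.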

cleanWikitext()

Clean wikitext markup to produce readable text.

private cleanWikitext(string $text) : string
Parameters
$text : string
Return values
string
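
A rough idea of what such cleaning can involve is sketched below. It is not the service's implementation: sketchCleanWikitext is a hypothetical name, and the rules shown (drop non-nested templates, unwrap links, strip bold and italic quote marks) are assumptions covering only the most common markup.

```php
<?php
// Hypothetical stand-in for cleanWikitext(); handles common cases only.
function sketchCleanWikitext(string $text): string
{
    // Drop non-nested templates such as {{lb|es|architecture}}.
    $text = preg_replace('/\{\{[^{}]*\}\}/', '', $text);

    // [[target|label]] becomes label.
    $text = preg_replace('/\[\[[^\[\]|]*\|([^\[\]]*)\]\]/', '$1', $text);

    // [[target]] becomes target.
    $text = preg_replace('/\[\[([^\[\]]*)\]\]/', '$1', $text);

    // Remove ''italic'' and '''bold''' quote runs.
    $text = preg_replace("/'{2,}/", '', $text);

    // Collapse whitespace left behind by removed markup.
    return trim(preg_replace('/\s+/', ' ', $text));
}
```

Nested templates and HTML tags are not handled by this sketch.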

fetchFromWiktionaryApi()

Fetch a definition from the Wiktionary parse API.

private fetchFromWiktionaryApi(string $word, string $wiktCode) : string|null

Uses {lang}.wiktionary.org to get a monolingual definition.

Parameters
$word : string
$wiktCode : string
Return values
string|null

First definition line or null
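
For orientation, a request URL for the MediaWiki parse API can be assembled from WIKTIONARY_API_TEMPLATE as sketched below. The function and constant names are hypothetical, and the exact query parameters the service sends are an assumption; action=parse with prop=wikitext and format=json is one standard way to retrieve a page's wikitext.

```php
<?php
// Hypothetical stand-in for WIKTIONARY_API_TEMPLATE.
const SKETCH_WIKTIONARY_API_TEMPLATE = 'https://%s.wiktionary.org/w/api.php';

// Hypothetical helper: build a parse API URL for one word.
function sketchBuildParseApiUrl(string $word, string $wiktCode): string
{
    // Fill the language subdomain, for example "es" for Spanish.
    $endpoint = sprintf(SKETCH_WIKTIONARY_API_TEMPLATE, $wiktCode);

    // Assumed parameters; RFC 3986 encoding turns spaces into %20.
    $query = http_build_query(
        [
            'action' => 'parse',
            'page'   => $word,
            'prop'   => 'wikitext',
            'format' => 'json',
        ],
        '',
        '&',
        PHP_QUERY_RFC3986
    );

    return $endpoint . '?' . $query;
}
```

Under these assumptions, sketchBuildParseApiUrl('casa', 'es') yields https://es.wiktionary.org/w/api.php?action=parse&page=casa&prop=wikitext&format=json.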

httpGet()

Perform an HTTP GET with timeout.

private httpGet(string $url) : string|null
Parameters
$url : string
Return values
string|null

Response body or null on failure
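
One way to implement a GET that returns the body or null is PHP's stream wrapper with a timeout, sketched below. This is an assumption about the approach, since the service may use cURL or another client; sketchHttpGet, SKETCH_FETCH_TIMEOUT, and the User-Agent value are hypothetical.

```php
<?php
// Hypothetical stand-in for FETCH_TIMEOUT (seconds).
const SKETCH_FETCH_TIMEOUT = 10;

// Hypothetical stand-in for httpGet().
function sketchHttpGet(string $url): ?string
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => SKETCH_FETCH_TIMEOUT,
            // Assumed header; many public APIs ask clients to identify themselves.
            'header'  => "User-Agent: enrichment-sketch/0.1\r\n",
        ],
    ]);

    // file_get_contents() returns false on failure, including HTTP
    // error statuses; "@" suppresses the accompanying warning.
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

Returning null for every failure lets callers treat timeouts, DNS errors, and HTTP error statuses uniformly.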

updateTranslation()

Update a word's translation in the database.

private updateTranslation(int $wordId, string $translation) : void
Parameters
$wordId : int
$translation : string
