FrequencyImportService

Fetches word frequency lists from the FrequencyWords project and bulk-imports them as starter vocabulary for a language.

Tags
see
https://github.com/hermitdave/FrequencyWords

Table of Contents

Constants

BASE_URL = 'https://raw.githubusercontent.com/hermitdave/FrequencyWords/master/content/2018'
BATCH_SIZE = 500
FETCH_TIMEOUT = 30

Methods

fetchFrequencyList() : array<int, string>
Fetch the frequency word list from GitHub.
importWords() : array{imported: int, skipped: int, total: int}
Import top-N frequency words into the words table.
isAvailableForLanguage() : bool
Check if frequency data is available for a language.
insertBatch() : int
Insert a batch of words using INSERT IGNORE with prepared statements.
parseFrequencyList() : array<int, string>
Parse the FrequencyWords text format.

Constants

BASE_URL

private mixed BASE_URL = 'https://raw.githubusercontent.com/hermitdave/FrequencyWords/master/content/2018'

Methods

fetchFrequencyList()

Fetch the frequency word list from GitHub.

public fetchFrequencyList(string $languageName) : array<int, string>
Parameters
$languageName : string
Tags
throws
RuntimeException

On network failure

Return values
array<int, string>

Words in frequency order (most common first)

importWords()

Import top-N frequency words into the words table.

public importWords(int $langId, string $languageName, int $count) : array{imported: int, skipped: int, total: int}

Words are inserted with status=1 and empty translation, ready for later enrichment.

Parameters
$langId : int
$languageName : string
$count : int
Return values
array{imported: int, skipped: int, total: int}
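
The return shape can be illustrated with a minimal sketch. It assumes that the list is truncated to the first $count words, that inserts are chunked by BATCH_SIZE, and that "skipped" counts the words the database ignored; none of this is confirmed by the source. The function name importWordsSketch, the constant SKETCH_BATCH_SIZE, and the $insertBatch callback are hypothetical stand-ins, not part of the service.

```php
<?php
declare(strict_types=1);

// Assumption: mirrors BATCH_SIZE = 500 from the constants list.
const SKETCH_BATCH_SIZE = 500;

/**
 * Hypothetical sketch of the counting logic, not the actual method body.
 *
 * @param array<int, string> $words Words in frequency order
 * @param int $count Number of top words to import
 * @param callable $insertBatch Stand-in that takes a batch of words and
 *                              returns the number of rows actually inserted
 * @return array{imported: int, skipped: int, total: int}
 */
function importWordsSketch(array $words, int $count, callable $insertBatch): array
{
    // Keep only the top-N words (assumed behaviour).
    $selected = array_slice($words, 0, max(0, $count));

    $imported = 0;
    foreach (array_chunk($selected, SKETCH_BATCH_SIZE) as $batch) {
        $imported += $insertBatch($batch);
    }

    $total = count($selected);

    // Assumption: every word that was not inserted counts as skipped.
    return [
        'imported' => $imported,
        'skipped' => $total - $imported,
        'total' => $total,
    ];
}
```

In this sketch the three counters always satisfy imported + skipped = total, which is one plausible reading of the documented array shape.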

isAvailableForLanguage()

Check if frequency data is available for a language.

public isAvailableForLanguage(string $languageName) : bool
Parameters
$languageName : string
Return values
bool

insertBatch()

Insert a batch of words using INSERT IGNORE with prepared statements.

private insertBatch(array<int, string> $words, int $langId, int|null $userId) : int
Parameters
$words : array<int, string>

Words to insert

$langId : int

Language ID

$userId : int|null

User ID for multi-user mode

Return values
int

Number of rows actually inserted
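
A multi-row INSERT IGNORE with bound parameters could be assembled roughly as follows. This is a sketch under stated assumptions: the column names (lang_id, user_id, text, status, translation) and the helper buildInsertIgnoreSketch are hypothetical, and the real method may build its statement differently. Only the table name "words", status=1, and the empty translation come from the description of importWords().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds the SQL text and the flat parameter list
 * for one batch. Column names are illustrative assumptions.
 *
 * @param array<int, string> $words Words to insert
 * @param int $langId Language ID
 * @param int|null $userId User ID for multi-user mode
 * @return array{0: string, 1: array<int, int|string|null>}
 */
function buildInsertIgnoreSketch(array $words, int $langId, ?int $userId): array
{
    if ($words === []) {
        return ['', []];
    }

    $rows = [];
    $params = [];
    foreach ($words as $word) {
        // One placeholder group per word; status 1 and an empty
        // translation are written as literals.
        $rows[] = '(?, ?, ?, 1, \'\')';
        $params[] = $langId;
        $params[] = $userId;
        $params[] = $word;
    }

    $sql = 'INSERT IGNORE INTO words (lang_id, user_id, text, status, translation) VALUES '
        . implode(', ', $rows);

    return [$sql, $params];
}
```

With PDO on MySQL, the pair could be passed to PDO::prepare() and PDOStatement::execute(). PDOStatement::rowCount() would then report only the rows that were not ignored, which would match the documented return value, assuming the service uses PDO at all.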

parseFrequencyList()

Parse the FrequencyWords text format.

private parseFrequencyList(string $content) : array<int, string>

Each line has the form "word frequency\n": the word, a space, then its frequency count.

Parameters
$content : string
Return values
array<int, string>

Words only, in frequency order
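
The format described above can be parsed with a few lines of PHP. The following is a minimal sketch, not the service's actual code: the function name parseFrequencyListSketch is hypothetical, and skipping blank lines is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the private parseFrequencyList() method.
 *
 * @param string $content Raw file content, one "word frequency" pair per line
 * @return array<int, string> Words only, in the order they appear
 */
function parseFrequencyListSketch(string $content): array
{
    $words = [];

    // \R matches any newline sequence (\n, \r\n, \r).
    foreach (preg_split('/\R/', $content) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // assumption: blank lines carry no data
        }

        // Everything before the first space is the word; the count is discarded.
        $parts = explode(' ', $line, 2);
        $words[] = $parts[0];
    }

    return $words;
}

// Example: three entries with a blank line in between.
$sample = "the 100\nof 50\n\nand 25\n";
$parsed = parseFrequencyListSketch($sample); // ['the', 'of', 'and']
```

Because the source file is already sorted by frequency, keeping the input order preserves "most common first" without any extra sorting.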


        