FrequencyImportService
Fetches word frequency lists from the FrequencyWords project and bulk-imports them as starter vocabulary for a language.
Table of Contents
Constants
- BASE_URL = 'https://raw.githubusercontent.com/hermitdave/FrequencyWords/master/content/2018'
- BATCH_SIZE = 500
- FETCH_TIMEOUT = 30
Methods
- fetchFrequencyList() : array<int, string>
- Fetch the frequency word list from GitHub.
- importWords() : array{imported: int, skipped: int, total: int}
- Import top-N frequency words into the words table.
- isAvailableForLanguage() : bool
- Check if frequency data is available for a language.
- insertBatch() : int
- Insert a batch of words using INSERT IGNORE with prepared statements.
- parseFrequencyList() : array<int, string>
- Parse the FrequencyWords text format.
Constants
BASE_URL
private
mixed
BASE_URL
= 'https://raw.githubusercontent.com/hermitdave/FrequencyWords/master/content/2018'
BATCH_SIZE
private
mixed
BATCH_SIZE
= 500
FETCH_TIMEOUT
private
mixed
FETCH_TIMEOUT
= 30
Methods
fetchFrequencyList()
Fetch the frequency word list from GitHub.
public
fetchFrequencyList(string $languageName) : array<int, string>
Parameters
- $languageName : string
Return values
array<int, string> —Words in frequency order (most common first)
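A minimal sketch of how the download URL might be built from BASE_URL and fetched with the FETCH_TIMEOUT limit. The language-name-to-code map, the `<code>/<code>_50k.txt` file layout, and the helper names frequencyListUrl() and fetchRaw() are assumptions made for illustration; the page does not say how the service resolves a language name.

```php
<?php
// Values copied from the constants documented on this page.
const BASE_URL = 'https://raw.githubusercontent.com/hermitdave/FrequencyWords/master/content/2018';
const FETCH_TIMEOUT = 30;

// Hypothetical mapping from language name to FrequencyWords directory code.
const LANGUAGE_CODES = ['English' => 'en', 'German' => 'de', 'Spanish' => 'es'];

/** Build the list URL for a language, or return null when no code is known. */
function frequencyListUrl(string $languageName): ?string
{
    $code = LANGUAGE_CODES[$languageName] ?? null;
    if ($code === null) {
        return null;
    }
    // Assumed file layout inside the repository: <code>/<code>_50k.txt
    return BASE_URL . "/{$code}/{$code}_50k.txt";
}

/** Download a URL with a timeout; returns null on any failure. */
function fetchRaw(string $url): ?string
{
    $context = stream_context_create(['http' => ['timeout' => FETCH_TIMEOUT]]);
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

In this sketch, a null result from frequencyListUrl() would also be a natural basis for an availability check such as isAvailableForLanguage().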
importWords()
Import top-N frequency words into the words table.
public
importWords(int $langId, string $languageName, int $count) : array{imported: int, skipped: int, total: int}
Words are inserted with status=1 and empty translation, ready for later enrichment.
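The import flow can be sketched as: take the top $count words, split them into chunks of BATCH_SIZE, insert each chunk, and sum the results. The function name importTopWords(), the callable parameter, and the rule that skipped equals total minus imported are assumptions for illustration, not confirmed behaviour of the service.

```php
<?php
// Value copied from the BATCH_SIZE constant documented on this page.
const BATCH_SIZE = 500;

/**
 * Import the first $count words in batches.
 *
 * $insertBatch is a hypothetical stand-in for the private insertBatch()
 * method: it receives one batch and returns the number of rows inserted.
 */
function importTopWords(array $words, int $count, callable $insertBatch): array
{
    $selected = array_slice($words, 0, max(0, $count));
    $imported = 0;
    foreach (array_chunk($selected, BATCH_SIZE) as $batch) {
        $imported += $insertBatch($batch);
    }
    $total = count($selected);
    // Assumption: every word that was not inserted counts as skipped.
    return ['imported' => $imported, 'skipped' => $total - $imported, 'total' => $total];
}
```

With 1,100 selected words this produces three batches of 500, 500, and 100 words.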
Parameters
- $langId : int
- $languageName : string
- $count : int
Return values
array{imported: int, skipped: int, total: int}
isAvailableForLanguage()
Check if frequency data is available for a language.
public
isAvailableForLanguage(string $languageName) : bool
Parameters
- $languageName : string
Return values
bool
insertBatch()
Insert a batch of words using INSERT IGNORE with prepared statements.
private
insertBatch(array<int, string> $words, int $langId, int|null $userId) : int
Parameters
- $words : array<int, string>
  Words to insert
- $langId : int
  Language ID
- $userId : int|null
  User ID for multi-user mode
Return values
int —Number of rows actually inserted
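A sketch of how a multi-row INSERT IGNORE statement with positional placeholders could be assembled for one batch. The column names (lang_id, user_id, text, status, translation) and the helper name buildInsertIgnore() are hypothetical; only the words table, status=1, and the empty translation come from this page.

```php
<?php
/**
 * Build the SQL text and the flat parameter list for one batch.
 * Returns [sql, params]; sql is an empty string for an empty batch.
 */
function buildInsertIgnore(array $words, int $langId, ?int $userId): array
{
    if ($words === []) {
        return ['', []];
    }
    $rows = [];
    $params = [];
    foreach ($words as $word) {
        // One placeholder group per word; status and translation are fixed.
        $rows[] = "(?, ?, ?, 1, '')";
        array_push($params, $langId, $userId, $word);
    }
    // Hypothetical column names; adjust to the real schema.
    $sql = 'INSERT IGNORE INTO words (lang_id, user_id, text, status, translation) VALUES '
        . implode(', ', $rows);
    return [$sql, $params];
}
```

Assuming a MySQL connection through PDO, the statement could be run with prepare() and execute($params); rowCount() would then report the rows actually inserted, since rows ignored as duplicates are not counted as affected.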
parseFrequencyList()
Parse the FrequencyWords text format.
private
parseFrequencyList(string $content) : array<int, string>
Each line is: "word frequency\n" (space-delimited).
Parameters
- $content : string
Return values
array<int, string> —Words only, in frequency order
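A minimal sketch of parsing the "word frequency" line format described above. The function name parseFrequencyLines() is hypothetical, and the handling of blank lines and Windows line endings is an assumption; the real method may also normalise or deduplicate words.

```php
<?php
/** Extract the word from each "word frequency" line, keeping file order. */
function parseFrequencyLines(string $content): array
{
    $words = [];
    // \R matches any line-break sequence (\n, \r\n, \r).
    foreach (preg_split('/\R/', $content) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines, including the one after a trailing newline
        }
        // The word is everything before the first space; the count is discarded.
        $parts = explode(' ', $line, 2);
        $words[] = $parts[0];
    }
    return $words;
}
```

Because the source file is already sorted by frequency, preserving line order is enough to return the most common words first.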