Documentation

JsonImporter implements ImporterInterface

Importer for JSON dictionary files.

Supports either an array of entries or a single object keyed by term.

Tags
since
3.0.0

Table of Contents

Interfaces

ImporterInterface
Interface for dictionary importers.

Constants

MAX_FILE_SIZE  = 100 * 1024 * 1024
Hard cap on the JSON input file size.
DEFAULT_FIELD_MAP  = ['term' => ['term', 'word', 'headword', 'entry', 'lemma'], 'definition' => ['definition', 'meaning', 'translation', 'gloss', 'def'], 'reading' => ['reading', 'pronunciation', 'phonetic', 'furigana', 'pinyin'], 'pos' => ['pos', 'partOfSpeech', 'part_of_speech', 'category']]
Default field mapping for JSON entries.

Methods

canImport()  : bool
Validate that a file can be imported.
detectStructure()  : array{type: string, fieldNames: string[]}
Detect the structure of a JSON file.
getSupportedExtensions()  : array<string|int, string>
Get the supported file extensions for this importer.
parse()  : iterable<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
Parse a dictionary file and yield entries.
preview()  : array<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
Get a preview of the first N entries.
findField()  : mixed
Find a field value using custom mapping or default patterns.
mapItemToEntry()  : array{term: string, definition: string, reading?: ?string, pos?: ?string}|null
Map a JSON item (array entry) to a dictionary entry.
mapObjectEntryToEntry()  : array{term: string, definition: string, reading?: ?string, pos?: ?string}|null
Map an object entry (term => data) to a dictionary entry.
parseSimple()  : Generator<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
Parse a JSON file by loading it entirely into memory.
parseStreaming()  : Generator<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
Parse a JSON file using streaming for large files (not yet implemented; currently the same in-memory parse as parseSimple()).
validateFile()  : void
Validate that the file exists, is readable, and within the size cap.

Constants

MAX_FILE_SIZE

Hard cap on the JSON input file size.

public mixed MAX_FILE_SIZE = 100 * 1024 * 1024

parseStreaming() currently delegates to parseSimple() because there is no real streaming parser in the tree, so the whole file is loaded into memory before json_decode() runs. A 500 MB JSON dictionary exhausts the PHP worker's memory. The 100 MB cap is still well above any plausible single-language dictionary: LWT's own JSON exports of 100k+ terms stay under 20 MB.

DEFAULT_FIELD_MAP

Default field mapping for JSON entries.

private mixed DEFAULT_FIELD_MAP = ['term' => ['term', 'word', 'headword', 'entry', 'lemma'], 'definition' => ['definition', 'meaning', 'translation', 'gloss', 'def'], 'reading' => ['reading', 'pronunciation', 'phonetic', 'furigana', 'pinyin'], 'pos' => ['pos', 'partOfSpeech', 'part_of_speech', 'category']]

Methods

canImport()

Validate that a file can be imported.

public canImport(string $filePath[, string|null $originalName = null ]) : bool
Parameters
$filePath : string

Path to the file (may be a PHP upload tmp_name without an extension)

$originalName : string|null = null

Original filename, used for extension-based detection when $filePath has none (e.g. PHP $_FILES tmp_name)

Return values
bool

True if the file can be imported
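
The fallback from $filePath to $originalName can be pictured with the sketch below. It is an illustration only: resolveExtension() is a hypothetical helper, not part of JsonImporter, and the real canImport() may apply further checks.

```php
<?php
/**
 * Hypothetical helper (not part of JsonImporter): pick the extension
 * used for format detection. Falls back to the original upload name
 * when the path itself has no extension, as with a $_FILES tmp_name.
 */
function resolveExtension(string $filePath, ?string $originalName = null): string
{
    // pathinfo() returns an empty string when there is no extension.
    $ext = pathinfo($filePath, PATHINFO_EXTENSION);
    if ($ext === '' && $originalName !== null) {
        $ext = pathinfo($originalName, PATHINFO_EXTENSION);
    }
    // Normalise case so "dict.JSON" and "dict.json" compare equal.
    return strtolower($ext);
}
```

A caller would then compare the result against the list returned by getSupportedExtensions().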

detectStructure()

Detect the structure of a JSON file.

public detectStructure(string $filePath) : array{type: string, fieldNames: string[]}
Parameters
$filePath : string

Path to the file

Return values
array{type: string, fieldNames: string[]}

Structure info: the detected structure type and the field names found in the file
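
One way such detection could work is sketched below. This is a guess at the approach, not the actual implementation: the function name, the 'array' / 'object' / 'unknown' labels, and the choice to read field names from the first entry are all assumptions. It takes a JSON string rather than a file path to stay self-contained, and array_is_list() requires PHP 8.1 or later.

```php
<?php
/**
 * Hypothetical sketch: classify decoded JSON as a list of entries or a
 * term-keyed object. The type labels are illustrative, not confirmed.
 *
 * @return array{type: string, fieldNames: string[]}
 */
function sketchDetectStructure(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || $data === []) {
        return ['type' => 'unknown', 'fieldNames' => []];
    }

    // A JSON array decodes to a list; a JSON object to a string-keyed array.
    $type = array_is_list($data) ? 'array' : 'object';

    // Use the first entry as a sample of the available field names.
    $first = reset($data);
    $fieldNames = is_array($first) ? array_map('strval', array_keys($first)) : [];

    return ['type' => $type, 'fieldNames' => $fieldNames];
}
```

A JSON object whose keys are consecutive integers starting at "0" would be misread as a list by this sketch, so a production version would need a stricter check.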

getSupportedExtensions()

Get the supported file extensions for this importer.

public getSupportedExtensions() : array<string|int, string>
Return values
array<string|int, string>

parse()

Parse a dictionary file and yield entries.

public parse(string $filePath[, array<string|int, mixed> $options = [] ]) : iterable<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
Parameters
$filePath : string

Path to the dictionary file

$options : array<string|int, mixed> = []

Import options (format-specific)

Return values
iterable<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>

preview()

Get a preview of the first N entries.

public preview(string $filePath[, int $limit = 10 ][, array<string|int, mixed> $options = [] ]) : array<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
Parameters
$filePath : string

Path to the dictionary file

$limit : int = 10

Number of entries to preview

$options : array<string|int, mixed> = []

Import options

Return values
array<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
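
A preview of this kind is typically built by consuming only the first $limit items of the iterable that parse() returns. The sketch below shows that pattern under the assumption that preview() works this way; takeFirst() is a hypothetical helper, not a method of this class.

```php
<?php
/**
 * Hypothetical helper: collect at most $limit items from any iterable,
 * stopping early so a generator is not run to completion.
 */
function takeFirst(iterable $entries, int $limit = 10): array
{
    $preview = [];
    if ($limit <= 0) {
        return $preview;
    }
    foreach ($entries as $entry) {
        $preview[] = $entry;
        // Stop as soon as enough entries have been collected.
        if (count($preview) >= $limit) {
            break;
        }
    }
    return $preview;
}
```

Because the file is currently decoded fully in memory (see MAX_FILE_SIZE), stopping early saves entry-mapping work rather than memory.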

findField()

Find a field value using custom mapping or default patterns.

private findField(array<string, mixed> $item, string $fieldType, array<string, string>|null $fieldMap) : mixed
Parameters
$item : array<string, mixed>

JSON item

$fieldType : string

Field type (term, definition, etc.)

$fieldMap : array<string, string>|null

Custom field mapping

Return values
mixed

Field value, or null if no matching field is found
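
The lookup order suggested by the summary (custom mapping first, then the default alias lists) might look like the sketch below. The precedence rules, the exact-case key matching, and the names sketchFindField() and SKETCH_FIELD_MAP are assumptions for illustration; only the alias lists themselves are copied from DEFAULT_FIELD_MAP. The mixed return type requires PHP 8.0 or later.

```php
<?php
/** Alias lists copied from DEFAULT_FIELD_MAP, trimmed to two field types. */
const SKETCH_FIELD_MAP = [
    'term' => ['term', 'word', 'headword', 'entry', 'lemma'],
    'definition' => ['definition', 'meaning', 'translation', 'gloss', 'def'],
];

/** Hypothetical sketch of the lookup, not the actual private method. */
function sketchFindField(array $item, string $fieldType, ?array $fieldMap): mixed
{
    // Assumption: a custom mapping names the exact source key to read.
    if ($fieldMap !== null && isset($fieldMap[$fieldType])) {
        return $item[$fieldMap[$fieldType]] ?? null;
    }
    // Otherwise try each default alias in declaration order.
    foreach (SKETCH_FIELD_MAP[$fieldType] ?? [] as $alias) {
        if (array_key_exists($alias, $item)) {
            return $item[$alias];
        }
    }
    return null;
}
```

With this ordering, an item such as ['headword' => 'Hund', 'gloss' => 'dog'] resolves without any custom mapping, while a file using unusual keys can be handled by passing a map like ['term' => 'w'].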

mapItemToEntry()

Map a JSON item (array entry) to a dictionary entry.

private mapItemToEntry(array<string, mixed> $item, array<string, string>|null $fieldMap) : array{term: string, definition: string, reading?: ?string, pos?: ?string}|null
Parameters
$item : array<string, mixed>

JSON item

$fieldMap : array<string, string>|null

Custom field mapping

Return values
array{term: string, definition: string, reading?: ?string, pos?: ?string}|null

mapObjectEntryToEntry()

Map an object entry (term => data) to a dictionary entry.

private mapObjectEntryToEntry(string $term, mixed $value, array<string, string>|null $fieldMap) : array{term: string, definition: string, reading?: ?string, pos?: ?string}|null
Parameters
$term : string

The term (object key)

$value : mixed

The entry data

$fieldMap : array<string, string>|null

Custom field mapping

Return values
array{term: string, definition: string, reading?: ?string, pos?: ?string}|null
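
For term-keyed JSON, the value under each key may plausibly be either a bare definition string or a nested object. The sketch below illustrates that idea only: sketchMapObjectEntry() is hypothetical, it reads just the 'definition' and 'pos' keys, and it ignores the field map that the real method accepts.

```php
<?php
/**
 * Hypothetical sketch for input such as {"cat": "a small feline"} or
 * {"cat": {"definition": "a small feline", "pos": "noun"}}.
 * Returns null when no usable term or definition is present.
 */
function sketchMapObjectEntry(string $term, mixed $value): ?array
{
    $term = trim($term);
    if ($term === '') {
        return null;
    }
    // Shape 1: the value is the definition itself.
    if (is_string($value)) {
        $definition = trim($value);
        return $definition === '' ? null : ['term' => $term, 'definition' => $definition];
    }
    // Shape 2: the value is an object carrying the entry fields.
    if (is_array($value) && is_string($value['definition'] ?? null)) {
        $entry = ['term' => $term, 'definition' => $value['definition']];
        if (is_string($value['pos'] ?? null)) {
            $entry['pos'] = $value['pos'];
        }
        return $entry;
    }
    return null;
}
```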

parseSimple()

Parse a JSON file by loading it entirely into memory.

private parseSimple(string $filePath, array<string, string>|null $fieldMap) : Generator<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
Parameters
$filePath : string

Path to the file

$fieldMap : array<string, string>|null

Custom field mapping

Return values
Generator<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>

parseStreaming()

Parse a JSON file using streaming for large files (not yet implemented; currently the same in-memory parse as parseSimple()).

private parseStreaming(string $filePath, array<string, string>|null $fieldMap) : Generator<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
Parameters
$filePath : string

Path to the file

$fieldMap : array<string, string>|null

Custom field mapping

Tags
todo

Implement real streaming via halaxa/json-machine. Today this method performs the same in-memory parse as parseSimple(), and the MAX_FILE_SIZE cap is what prevents out-of-memory failures in the meantime. If you raise the cap, replace this body first.

Return values
Generator<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>

validateFile()

Validate that the file exists, is readable, and within the size cap.

private validateFile(string $filePath) : void
Parameters
$filePath : string

Path to the file

Tags
throws
RuntimeException

If the file does not exist, is not readable, or exceeds MAX_FILE_SIZE
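
The three checks named in the summary (exists, readable, within the size cap) could be sketched as follows. The function and constant names and the exception messages are assumptions; only the 100 * 1024 * 1024 cap and the RuntimeException type come from this page.

```php
<?php
// Same value as MAX_FILE_SIZE: 100 * 1024 * 1024 = 104,857,600 bytes.
const SKETCH_MAX_FILE_SIZE = 100 * 1024 * 1024;

/** Hypothetical sketch of the validation, not the actual private method. */
function sketchValidateFile(string $filePath, int $maxBytes = SKETCH_MAX_FILE_SIZE): void
{
    if (!is_file($filePath) || !is_readable($filePath)) {
        throw new RuntimeException("File not found or not readable: {$filePath}");
    }
    $size = filesize($filePath);
    if ($size === false) {
        throw new RuntimeException("Could not determine file size: {$filePath}");
    }
    // Reject before any read so an oversized file is never loaded.
    if ($size > $maxBytes) {
        throw new RuntimeException("File exceeds the {$maxBytes}-byte cap: {$filePath}");
    }
}
```

Checking the size before reading matters here because parsing loads the whole file into memory.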


        