StarDictImporter implements ImporterInterface

Importer for StarDict dictionary files.

Parses .ifo (info), .idx (index), and .dict (data) files. Supports both compressed (.dict.dz) and uncompressed (.dict) formats.

Tags
since
3.0.0

Table of Contents

Interfaces

ImporterInterface
Interface for dictionary importers.

Constants

POS_TAGS  = ['noun', 'verb', 'adjective', 'adverb', 'pronoun', 'preposition', 'conjunction', 'interjection', 'determiner', 'particle', 'article', 'numeral', 'classifier', 'prefix', 'suffix', 'infix', 'affix', 'phrase', 'proverb', 'idiom', 'abbreviation', 'initialism', 'proper noun', 'adj', 'adv', 'prep', 'conj', 'det', 'pron']
Known part-of-speech tags (lowercase) used to detect POS in entry data.

Properties

$info  : array<string, string>
Dictionary metadata from .ifo file.

Methods

canImport()  : bool
Validate that a file can be imported.
getInfo()  : array<string, string>
Get dictionary metadata.
getSupportedExtensions()  : array<string|int, string>
Get the supported file extensions for this importer.
parse()  : iterable<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
Parse a dictionary file and yield entries.
preview()  : array<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
Get a preview of the first N entries.
cleanSegment()  : string
Clean up a single text segment.
findDictFile()  : string|null
Find the dictionary data file (.dict or .dict.dz).
findIdxFile()  : string|null
Find the index file (.idx or .idx.gz).
getBasePath()  : string
Get the base path (without extension) for a StarDict file.
openDictFile()  : resource|false
Open the dictionary data file.
parseFields()  : array{definition: string, pos: string|null}|null
Parse raw entry data into definition and optional POS.
parseIdx()  : Generator<string|int, array{term: string, offset: int, size: int}>
Parse the .idx (index) file.
parseIfo()  : void
Parse the .ifo (info) file.
readEntry()  : array{definition: string, pos: string|null}|null
Read and parse an entry from the dictionary file.

Constants

POS_TAGS

Known part-of-speech tags (lowercase) used to detect POS in entry data.

private mixed POS_TAGS = ['noun', 'verb', 'adjective', 'adverb', 'pronoun', 'preposition', 'conjunction', 'interjection', 'determiner', 'particle', 'article', 'numeral', 'classifier', 'prefix', 'suffix', 'infix', 'affix', 'phrase', 'proverb', 'idiom', 'abbreviation', 'initialism', 'proper noun', 'adj', 'adv', 'prep', 'conj', 'det', 'pron']

Properties

$info

Dictionary metadata from .ifo file.

private array<string, string> $info = []

Methods

canImport()

Validate that a file can be imported.

public canImport(string $filePath) : bool
Parameters
$filePath : string

Path to the file

Return values
bool

True if the file can be imported

getInfo()

Get dictionary metadata.

public getInfo(string $filePath) : array<string, string>
Parameters
$filePath : string

Path to any StarDict file

Return values
array<string, string>

Dictionary info

getSupportedExtensions()

Get the supported file extensions for this importer.

public getSupportedExtensions() : array<string|int, string>
Return values
array<string|int, string>

parse()

Parse a dictionary file and yield entries.

public parse(string $filePath[, array<string|int, mixed> $options = [] ]) : iterable<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
Parameters
$filePath : string

Path to the dictionary file

$options : array<string|int, mixed> = []

Import options (format-specific)

Return values
iterable<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>

preview()

Get a preview of the first N entries.

public preview(string $filePath[, int $limit = 10 ][, array<string|int, mixed> $options = [] ]) : array<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
Parameters
$filePath : string

Path to the dictionary file

$limit : int = 10

Number of entries to preview

$options : array<string|int, mixed> = []

Import options

Return values
array<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
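
A preview over a lazy parser can stop as soon as enough entries have been collected. The following is a minimal sketch of that idea, not the actual implementation: `takeFirst()` and `fakeEntries()` are hypothetical names, and the generator only stands in for `parse()`.

```php
<?php
// Hypothetical helper, not part of StarDictImporter: collect at most $limit
// items from any iterable without consuming the rest of a generator.
function takeFirst(iterable $entries, int $limit): array
{
    $out = [];
    if ($limit <= 0) {
        return $out;
    }
    foreach ($entries as $entry) {
        $out[] = $entry;
        if (count($out) >= $limit) {
            break; // stop early so a lazy parser reads no further
        }
    }
    return $out;
}

// Stand-in generator playing the role of parse().
function fakeEntries(): Generator
{
    yield ['term' => 'cat', 'definition' => 'a small feline', 'pos' => 'noun'];
    yield ['term' => 'run', 'definition' => 'to move quickly', 'pos' => 'verb'];
    yield ['term' => 'blue', 'definition' => 'a colour', 'pos' => 'adj'];
}

$preview = takeFirst(fakeEntries(), 2);
```

Breaking out of the loop leaves the third entry unread, which matters when the source is a large dictionary file.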

cleanSegment()

Clean up a single text segment.

private cleanSegment(string $segment) : string
Parameters
$segment : string

Raw segment

Return values
string

Cleaned segment
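
The exact cleanup rules are not documented here. As an illustration only, a segment cleaner might strip markup, decode entities, and normalise whitespace; `sketchCleanSegment()` and every rule in it are assumptions, and the real `cleanSegment()` may behave differently.

```php
<?php
// Hypothetical cleanup; the real cleanSegment() may apply different rules.
function sketchCleanSegment(string $segment): string
{
    // Drop HTML-style markup.
    $text = strip_tags($segment);
    // Turn entities such as &amp; back into characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace (including newlines) into one space.
    $text = preg_replace('/\s+/u', ' ', $text) ?? $text;
    return trim($text);
}

$cleaned = sketchCleanSegment("  <b>small</b>\n feline &amp; pet ");
```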

findDictFile()

Find the dictionary data file (.dict or .dict.dz).

private findDictFile(string $basePath) : string|null
Parameters
$basePath : string

Base path without extension

Return values
string|null

Path to the dict file, or null if none is found

findIdxFile()

Find the index file (.idx or .idx.gz).

private findIdxFile(string $basePath) : string|null
Parameters
$basePath : string

Base path without extension

Return values
string|null

Path to the idx file, or null if none is found
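
Both file lookups (`findDictFile()` and `findIdxFile()`) amount to probing a base path with a short list of suffixes. A minimal sketch of that pattern follows; `firstExisting()` is a hypothetical helper, and trying the uncompressed suffix first is an assumption rather than documented behaviour.

```php
<?php
// Hypothetical helper: return the first existing "$basePath$suffix", or null.
function firstExisting(string $basePath, array $suffixes): ?string
{
    foreach ($suffixes as $suffix) {
        $candidate = $basePath . $suffix;
        if (is_file($candidate)) {
            return $candidate;
        }
    }
    return null;
}

// Demo in a throwaway directory.
$dir = sys_get_temp_dir() . '/stardict_sketch_' . uniqid();
mkdir($dir);
$base = $dir . '/mydict';
touch($base . '.dict.dz');
touch($base . '.idx');

$dictPath = firstExisting($base, ['.dict', '.dict.dz']); // only the .dz exists
$idxPath  = firstExisting($base, ['.idx', '.idx.gz']);
$missing  = firstExisting($base, ['.syn']);              // no such file here

// Clean up the demo files.
unlink($base . '.dict.dz');
unlink($base . '.idx');
rmdir($dir);
```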

getBasePath()

Get the base path (without extension) for a StarDict file.

private getBasePath(string $filePath) : string
Parameters
$filePath : string

Path to any StarDict file

Return values
string

Base path
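
Because a StarDict dictionary is spread over several files that share one name, the base path can be derived by removing a known suffix. This is a sketch under assumptions: `sketchBasePath()` is a hypothetical name, and the suffix list and case-insensitive match are illustrative choices.

```php
<?php
// Hypothetical helper: strip a known StarDict suffix to get the shared base path.
function sketchBasePath(string $filePath): string
{
    // Compound suffixes are listed explicitly so they are removed whole.
    $suffixes = ['.dict.dz', '.idx.gz', '.ifo', '.idx', '.dict'];
    foreach ($suffixes as $suffix) {
        $len = strlen($suffix);
        if (strlen($filePath) > $len
            && strcasecmp(substr($filePath, -$len), $suffix) === 0) {
            return substr($filePath, 0, -$len);
        }
    }
    return $filePath; // unknown extension: leave the path unchanged
}

$fromDz  = sketchBasePath('/d/en-fr.dict.dz');
$fromIfo = sketchBasePath('/d/en-fr.ifo');
```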

openDictFile()

Open the dictionary data file.

private openDictFile(string $dictPath) : resource|false
Parameters
$dictPath : string

Path to dict file

Return values
resource|false

File handle, or false if the file cannot be opened
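
Since .dict.dz files use the gzip-compatible dictzip format, one plausible way to open either variant is to choose between `gzopen()` and `fopen()` by extension. The sketch below assumes the zlib extension is loaded; `sketchOpenDict()` is a hypothetical name and the actual method may work differently.

```php
<?php
// Hypothetical helper: open a StarDict data file for reading.
// Assumes the zlib extension is available for the compressed case.
function sketchOpenDict(string $dictPath)
{
    if (!is_readable($dictPath)) {
        return false;
    }
    if (substr($dictPath, -3) === '.dz') {
        // dictzip is gzip-compatible, so gzopen() decompresses it on read.
        return gzopen($dictPath, 'rb');
    }
    return fopen($dictPath, 'rb');
}

// Demo: write a small gzip file and read it back through the helper.
$tmp = sys_get_temp_dir() . '/sketch_' . uniqid() . '.dict.dz';
file_put_contents($tmp, gzencode('hello stardict'));
$handle = sketchOpenDict($tmp);
$firstBytes = fread($handle, 5); // returns decompressed bytes
fclose($handle);
unlink($tmp);
```

Opening a .dz file this way treats it as plain gzip and does not use dictzip's random-access chunk table, so seeking to a late offset means decompressing everything before it.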

parseFields()

Parse raw entry data into definition and optional POS.

private parseFields(string $data) : array{definition: string, pos: string|null}|null

StarDict entries may contain multiple null-byte-separated fields. The first field is often a POS tag (e.g., "noun", "verb").

Parameters
$data : string

Raw entry data

Return values
array{definition: string, pos: string|null}|null
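
The field splitting described above could look roughly like this. It is a sketch, not the class's code: `sketchParseFields()` and the shortened `SKETCH_POS_TAGS` list are hypothetical, and joining the remaining fields with "; " is an assumption.

```php
<?php
// Shortened, hypothetical tag list; the class constant POS_TAGS is longer.
const SKETCH_POS_TAGS = ['noun', 'verb', 'adjective', 'adverb', 'adj', 'adv'];

function sketchParseFields(string $data): ?array
{
    // Split on null bytes, trim each field, and drop empty ones.
    $fields = array_values(array_filter(
        array_map('trim', explode("\0", $data)),
        static fn(string $f): bool => $f !== ''
    ));
    if ($fields === []) {
        return null; // nothing usable in this entry
    }
    $pos = null;
    // Treat the first field as a POS tag only if a definition follows it.
    if (count($fields) > 1
        && in_array(strtolower($fields[0]), SKETCH_POS_TAGS, true)) {
        $pos = strtolower(array_shift($fields));
    }
    return ['definition' => implode('; ', $fields), 'pos' => $pos];
}

$withPos    = sketchParseFields("noun\0a small feline");
$withoutPos = sketchParseFields('to move quickly');
```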

parseIdx()

Parse the .idx (index) file.

private parseIdx(string $idxPath) : Generator<string|int, array{term: string, offset: int, size: int}>
Parameters
$idxPath : string

Path to .idx file

Return values
Generator<string|int, array{term: string, offset: int, size: int}>
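
In the commonly documented StarDict index layout, each record is a null-terminated UTF-8 term followed by a big-endian 32-bit offset and a big-endian 32-bit size. The sketch below assumes that layout (64-bit offsets are not handled) and works on an in-memory string; `sketchParseIdx()` is a hypothetical name.

```php
<?php
// Hypothetical parser over an in-memory .idx blob (32-bit offsets assumed).
function sketchParseIdx(string $blob): Generator
{
    $pos = 0;
    $len = strlen($blob);
    while ($pos < $len) {
        $nul = strpos($blob, "\0", $pos);
        // Need the terminator plus 8 bytes of offset/size after it.
        if ($nul === false || $nul + 9 > $len) {
            break; // truncated record
        }
        $term = substr($blob, $pos, $nul - $pos);
        // 'N' is an unsigned 32-bit big-endian integer.
        $nums = unpack('Noffset/Nsize', $blob, $nul + 1);
        yield ['term' => $term, 'offset' => $nums['offset'], 'size' => $nums['size']];
        $pos = $nul + 9;
    }
}

// Two records built by hand for the demo.
$blob = "cat\0" . pack('NN', 0, 14) . "run\0" . pack('NN', 14, 15);
$records = iterator_to_array(sketchParseIdx($blob), false);
```

Yielding records one at a time keeps memory use flat even for indexes with hundreds of thousands of terms.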

parseIfo()

Parse the .ifo (info) file.

private parseIfo(string $ifoPath) : void
Parameters
$ifoPath : string

Path to .ifo file

Tags
throws
RuntimeException

If the file is invalid
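
An .ifo file is plain text: a fixed first line, then key=value pairs such as version, bookname, and wordcount. The following sketch parses that shape from a string and throws on a bad header; `sketchParseIfo()` is a hypothetical name, and the real method takes a path and stores its result in `$info` instead of returning it.

```php
<?php
// Hypothetical parser working on the file contents rather than a path.
function sketchParseIfo(string $contents): array
{
    $lines = preg_split('/\r\n|\n|\r/', $contents);
    // The first line of an .ifo file is a fixed magic string.
    if (trim((string) array_shift($lines)) !== "StarDict's dict ifo file") {
        throw new RuntimeException('Not a StarDict .ifo file');
    }
    $info = [];
    foreach ($lines as $line) {
        $eq = strpos($line, '=');
        if ($eq === false) {
            continue; // skip blank or malformed lines
        }
        // Split on the first "=" only, so values may contain "=".
        $info[trim(substr($line, 0, $eq))] = trim(substr($line, $eq + 1));
    }
    return $info;
}

$info = sketchParseIfo(
    "StarDict's dict ifo file\nversion=2.4.2\nbookname=Demo\nwordcount=2\n"
);
```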

readEntry()

Read and parse an entry from the dictionary file.

private readEntry(resource $handle, int $offset, int $size) : array{definition: string, pos: string|null}|null
Parameters
$handle : resource

File handle

$offset : int

Byte offset

$size : int

Data size

Return values
array{definition: string, pos: string|null}|null

Parsed entry, or null on failure
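
The read step pairs an index record's offset and size with a seek and a bounded read. A minimal sketch of just that step follows, using an in-memory stream in place of a .dict file; `sketchReadRaw()` is a hypothetical name, and returning null for short reads is an assumption about how failures are reported.

```php
<?php
// Hypothetical reader: fetch $size bytes at $offset, or null on failure.
function sketchReadRaw($handle, int $offset, int $size): ?string
{
    if ($size <= 0 || fseek($handle, $offset) !== 0) {
        return null;
    }
    $data = '';
    // fread() may return fewer bytes than requested, so loop until done.
    while (strlen($data) < $size && !feof($handle)) {
        $chunk = fread($handle, $size - strlen($data));
        if ($chunk === false || $chunk === '') {
            break;
        }
        $data .= $chunk;
    }
    return strlen($data) === $size ? $data : null;
}

// In-memory stand-in for an uncompressed .dict file.
$handle = fopen('php://memory', 'w+b');
fwrite($handle, 'a small felineto move quickly');
$first  = sketchReadRaw($handle, 0, 14);
$second = sketchReadRaw($handle, 14, 15);
$past   = sketchReadRaw($handle, 100, 5); // beyond the end of the data
fclose($handle);
```

The raw bytes returned here would then go through field parsing to produce the definition and optional POS.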


        