StarDictImporter
in package
implements ImporterInterface
Importer for StarDict dictionary files.
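Parses the .ifo (info), .idx (index), and .dict (data) files that make up a StarDict dictionary. Both compressed (.dict.dz) and uncompressed (.dict) data files are supported, as are plain (.idx) and gzip-compressed (.idx.gz) index files.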
Table of Contents
Interfaces
- ImporterInterface
- Interface for dictionary importers.
Constants
- POS_TAGS = ['noun', 'verb', 'adjective', 'adverb', 'pronoun', 'preposition', 'conjunction', 'interjection', 'determiner', 'particle', 'article', 'numeral', 'classifier', 'prefix', 'suffix', 'infix', 'affix', 'phrase', 'proverb', 'idiom', 'abbreviation', 'initialism', 'proper noun', 'adj', 'adv', 'prep', 'conj', 'det', 'pron']
- Known part-of-speech tags (lowercase) used to detect POS in entry data.
Properties
- $info : array<string, string>
- Dictionary metadata from .ifo file.
Methods
- canImport() : bool
- Validate that a file can be imported.
- getInfo() : array<string, string>
- Get dictionary metadata.
- getSupportedExtensions() : array<string|int, string>
- Get the supported file extensions for this importer.
- parse() : iterable<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
- Parse a dictionary file and yield entries.
- preview() : array<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
- Get a preview of the first N entries.
- cleanSegment() : string
- Clean up a single text segment.
- findDictFile() : string|null
- Find the dictionary data file (.dict or .dict.dz).
- findIdxFile() : string|null
- Find the index file (.idx or .idx.gz).
- getBasePath() : string
- Get the base path (without extension) for a StarDict file.
- openDictFile() : resource|false
- Open the dictionary data file.
- parseFields() : array{definition: string, pos: string|null}|null
- Parse raw entry data into definition and optional POS.
- parseIdx() : Generator<string|int, array{term: string, offset: int, size: int}>
- Parse the .idx (index) file.
- parseIfo() : void
- Parse the .ifo (info) file.
- readEntry() : array{definition: string, pos: string|null}|null
- Read and parse an entry from the dictionary file.
Constants
POS_TAGS
Known part-of-speech tags (lowercase) used to detect POS in entry data.
private
const POS_TAGS = ['noun', 'verb', 'adjective', 'adverb', 'pronoun', 'preposition', 'conjunction', 'interjection', 'determiner', 'particle', 'article', 'numeral', 'classifier', 'prefix', 'suffix', 'infix', 'affix', 'phrase', 'proverb', 'idiom', 'abbreviation', 'initialism', 'proper noun', 'adj', 'adv', 'prep', 'conj', 'det', 'pron']
Properties
$info
Dictionary metadata from .ifo file.
private
array<string, string> $info = []
Methods
canImport()
Validate that a file can be imported.
public
canImport(string $filePath) : bool
Parameters
- $filePath : string
  Path to the file
Return values
bool: True if the file can be imported
getInfo()
Get dictionary metadata.
public
getInfo(string $filePath) : array<string, string>
Parameters
- $filePath : string
  Path to any StarDict file
Return values
array<string, string>: Dictionary metadata from the .ifo file
getSupportedExtensions()
Get the supported file extensions for this importer.
public
getSupportedExtensions() : array<string|int, string>
Return values
array<string|int, string>
parse()
Parse a dictionary file and yield entries.
public
parse(string $filePath[, array<string|int, mixed> $options = [] ]) : iterable<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
Parameters
- $filePath : string
  Path to the dictionary file
- $options : array<string|int, mixed> = []
  Import options (format-specific)
Return values
iterable<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
preview()
Get a preview of the first N entries.
public
preview(string $filePath[, int $limit = 10 ][, array<string|int, mixed> $options = [] ]) : array<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
Parameters
- $filePath : string
  Path to the dictionary file
- $limit : int = 10
  Number of entries to preview
- $options : array<string|int, mixed> = []
  Import options
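Because parse() returns an iterable, a preview can be taken by stopping iteration after N entries. The following is a minimal sketch of that idea, assuming preview() simply consumes the first $limit entries from parse(); takeFirst() is a hypothetical helper, not part of StarDictImporter.

```php
<?php

/**
 * Hypothetical helper (not part of StarDictImporter): collect at most
 * $limit items from any iterable, stopping iteration as soon as the
 * limit is reached.
 *
 * @param iterable<mixed> $entries
 * @return list<mixed>
 */
function takeFirst(iterable $entries, int $limit): array
{
    $out = [];
    if ($limit <= 0) {
        return $out;
    }
    foreach ($entries as $entry) {
        $out[] = $entry;
        if (count($out) >= $limit) {
            break; // a generator is not resumed past this point
        }
    }
    return $out;
}
```

When the iterable is a generator, breaking out of the loop means entries after the limit are never produced.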
Return values
array<string|int, array{term: string, definition: string, reading?: ?string, pos?: ?string}>
cleanSegment()
Clean up a single text segment.
private
cleanSegment(string $segment) : string
Parameters
- $segment : string
  Raw segment
Return values
string: Cleaned segment
findDictFile()
Find the dictionary data file (.dict or .dict.dz).
private
findDictFile(string $basePath) : string|null
Parameters
- $basePath : string
  Base path without extension
Return values
string|null: Path to the dict file, or null if neither .dict nor .dict.dz exists
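A lookup like findDictFile() can be sketched as probing candidate suffixes in order and returning the first file that exists. The helper name findStardictDataFile() and the preference for .dict.dz over .dict are assumptions for illustration; this page does not state which file wins when both are present.

```php
<?php

/**
 * Hypothetical helper: return the first existing StarDict data file for a
 * base path, or null. Assumption: .dict.dz is preferred over .dict.
 */
function findStardictDataFile(string $basePath): ?string
{
    foreach (['.dict.dz', '.dict'] as $suffix) {
        $candidate = $basePath . $suffix;
        if (is_file($candidate)) {
            return $candidate;
        }
    }
    return null; // no data file next to the base path
}
```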
findIdxFile()
Find the index file (.idx or .idx.gz).
private
findIdxFile(string $basePath) : string|null
Parameters
- $basePath : string
  Base path without extension
Return values
string|null: Path to the idx file, or null if neither .idx nor .idx.gz exists
getBasePath()
Get the base path (without extension) for a StarDict file.
private
getBasePath(string $filePath) : string
Parameters
- $filePath : string
  Path to any StarDict file
Return values
string: Base path
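One way to derive the base path is to strip a recognised StarDict suffix, checking longer suffixes first so that "words.dict.dz" becomes "words" rather than "words.dict". This is a sketch only: stardictBasePath() is hypothetical, the suffix list (.ifo, .idx, .idx.gz, .dict, .dict.dz) is taken from the file types named on this page, and str_ends_with() requires PHP 8.

```php
<?php

/**
 * Hypothetical helper: strip a known StarDict suffix from a path.
 * Longer suffixes come first so ".dict.dz" is not mistaken for ".dict".
 */
function stardictBasePath(string $filePath): string
{
    $suffixes = ['.dict.dz', '.idx.gz', '.ifo', '.idx', '.dict'];
    $lower = strtolower($filePath); // match suffixes case-insensitively
    foreach ($suffixes as $suffix) {
        if (str_ends_with($lower, $suffix)) {
            return substr($filePath, 0, -strlen($suffix));
        }
    }
    return $filePath; // no recognised suffix: treat it as a base path already
}
```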
openDictFile()
Open the dictionary data file.
private
openDictFile(string $dictPath) : resource|false
Parameters
- $dictPath : string
  Path to dict file
Return values
resource|false: File handle, or false if the file could not be opened
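A .dict.dz file uses the dictzip format, which is a valid gzip stream, so one hedged way to open either variant is PHP's gzopen(): when the input is not gzip data, zlib reads the bytes through unchanged. The sketch assumes the zlib extension is loaded; openStardictData() is a hypothetical name, and the actual openDictFile() may instead choose between fopen() and gzopen() by extension.

```php
<?php

/**
 * Hypothetical sketch: open a StarDict data file for reading.
 * Assumes ext/zlib. gzopen() decompresses gzip/dictzip (.dict.dz) data and
 * passes non-gzip (.dict) data through unchanged.
 *
 * @return resource|false
 */
function openStardictData(string $dictPath)
{
    if (!is_file($dictPath)) {
        return false; // avoid a warning from gzopen() on a missing file
    }
    return gzopen($dictPath, 'rb');
}
```

Seeking in a gzip stream with gzseek() is emulated: zlib decompresses up to the target offset (starting over for a backward seek), so random access into a large .dict.dz can be slow. Dictzip records a chunk table in the gzip header's extra field to allow faster random access; whether openDictFile() makes use of it is not documented here.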
parseFields()
Parse raw entry data into definition and optional POS.
private
parseFields(string $data) : array{definition: string, pos: string|null}|null
StarDict entries may contain multiple null-byte-separated fields. The first field is often a POS tag (e.g., "noun", "verb"); when it matches one of the POS_TAGS, it is returned separately as pos.
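The splitting described above can be sketched as follows. splitEntryFields() and DEMO_POS_TAGS are hypothetical (DEMO_POS_TAGS is a small subset of POS_TAGS), and the sketch additionally assumes that the remaining fields are joined with newlines and that a lone field is never treated as a POS tag; the real parseFields() may make different choices.

```php
<?php

// Hypothetical subset of the documented POS_TAGS list, for illustration.
const DEMO_POS_TAGS = ['noun', 'verb', 'adjective', 'adverb', 'adj', 'adv'];

/**
 * Hypothetical sketch: split raw entry data on NUL bytes and treat the first
 * field as a part of speech when it matches a known tag (case-insensitive).
 *
 * @return array{definition: string, pos: string|null}|null
 */
function splitEntryFields(string $data): ?array
{
    // Trim each field and drop empties left by leading/trailing/repeated NULs.
    $fields = array_values(array_filter(
        array_map('trim', explode("\0", $data)),
        static fn (string $f): bool => $f !== ''
    ));
    if ($fields === []) {
        return null; // nothing usable in this entry
    }
    $pos = null;
    // Assumption: only strip a POS tag when a definition field remains.
    if (count($fields) > 1 && in_array(strtolower($fields[0]), DEMO_POS_TAGS, true)) {
        $pos = strtolower(array_shift($fields));
    }
    return ['definition' => implode("\n", $fields), 'pos' => $pos];
}
```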
Parameters
- $data : string
  Raw entry data
Return values
array{definition: string, pos: string|null}|null
parseIdx()
Parse the .idx (index) file.
private
parseIdx(string $idxPath) : Generator<string|int, array{term: string, offset: int, size: int}>
Parameters
- $idxPath : string
  Path to .idx file
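In the StarDict format, each .idx record is a NUL-terminated UTF-8 term followed by the entry's offset and size in the .dict file, both big-endian unsigned 32-bit integers (the offset is 64-bit when the .ifo file sets idxoffsetbits=64). A minimal generator over an in-memory index might look like this; parseIdxBytes() is hypothetical and takes the raw bytes rather than a path, so reading (or gunzipping) the .idx file is left out.

```php
<?php

/**
 * Hypothetical sketch of .idx parsing, 32-bit offsets only.
 * Record layout: term bytes, "\0", 4-byte BE offset, 4-byte BE size.
 *
 * @return Generator<int, array{term: string, offset: int, size: int}>
 */
function parseIdxBytes(string $bytes): Generator
{
    $pos = 0;
    $len = strlen($bytes);
    while ($pos < $len) {
        $nul = strpos($bytes, "\0", $pos);
        if ($nul === false || $nul + 9 > $len) {
            break; // truncated record: stop rather than read past the end
        }
        $term = substr($bytes, $pos, $nul - $pos);
        // 'N' = unsigned 32-bit big-endian; read starting just after the NUL.
        $nums = unpack('Noffset/Nsize', $bytes, $nul + 1);
        yield ['term' => $term, 'offset' => $nums['offset'], 'size' => $nums['size']];
        $pos = $nul + 9; // skip NUL + 8 bytes of offset/size
    }
}
```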
Return values
Generator<string|int, array{term: string, offset: int, size: int}>
parseIfo()
Parse the .ifo (info) file.
private
parseIfo(string $ifoPath) : void
Parameters
- $ifoPath : string
  Path to .ifo file
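An .ifo file is plain text: a fixed first line, "StarDict's dict ifo file", followed by key=value lines such as version, bookname, wordcount and idxfilesize. A sketch of parsing it into an array<string, string> like $info, assuming blank or malformed lines are skipped; parseIfoText() is a hypothetical helper that takes the file contents, and throwing InvalidArgumentException on a bad header is an illustrative choice, not documented behaviour.

```php
<?php

/**
 * Hypothetical sketch of .ifo parsing into key => value pairs.
 *
 * @return array<string, string>
 */
function parseIfoText(string $text): array
{
    $lines = preg_split('/\r\n|\n|\r/', $text);
    // The first line must be the StarDict magic header.
    if (trim((string) array_shift($lines)) !== "StarDict's dict ifo file") {
        throw new InvalidArgumentException('Not a StarDict .ifo file');
    }
    $info = [];
    foreach ($lines as $line) {
        $eq = strpos($line, '=');
        if ($eq === false) {
            continue; // blank or malformed line
        }
        // Split on the first '=' only, so values may themselves contain '='.
        $info[trim(substr($line, 0, $eq))] = trim(substr($line, $eq + 1));
    }
    return $info;
}
```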
readEntry()
Read and parse an entry from the dictionary file.
private
readEntry(resource $handle, int $offset, int $size) : array{definition: string, pos: string|null}|null
Parameters
- $handle : resource
  File handle
- $offset : int
  Byte offset
- $size : int
  Data size
Return values
array{definition: string, pos: string|null}|null: Parsed entry
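Given an offset and size from the index, the raw entry bytes can be read with fseek() and fread(). This sketch covers only that read step; readRawEntry() is hypothetical, and the field parsing that readEntry() performs afterwards (see parseFields()) is omitted. It loops because fread() may return fewer bytes than requested on streams that are not plain files, such as decompressing ones.

```php
<?php

/**
 * Hypothetical sketch: read exactly $size raw bytes at $offset.
 * Returns null when the requested range cannot be read in full.
 *
 * @param resource $handle
 */
function readRawEntry($handle, int $offset, int $size): ?string
{
    if ($size <= 0 || fseek($handle, $offset, SEEK_SET) !== 0) {
        return null;
    }
    $data = '';
    // fread() can return short reads on non-plain streams, so keep reading.
    while (strlen($data) < $size && !feof($handle)) {
        $chunk = fread($handle, $size - strlen($data));
        if ($chunk === false || $chunk === '') {
            break;
        }
        $data .= $chunk;
    }
    return strlen($data) === $size ? $data : null;
}
```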