Documentation

ExternalParser implements ParserInterface

Generic external parser that executes command-line tokenizers.

This parser wraps external tokenization programs (such as Jieba or Sudachi) and converts their output into LWT's token format. The parser configuration is loaded from config/parsers.php so that only allowed programs can run.

Security: Binary paths come only from the server-side config file, not from user input. All command arguments are properly escaped.

Tags
since
3.0.0
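
The security note above (binaries from a server-side allowlist, escaped arguments) can be illustrated with a minimal sketch. The array shape, the `example-tokenizer` binary, its flags, and the helper name `sketchBuildCommand` are all assumptions made for illustration; they are not the actual contents of config/parsers.php or the class's `buildCommand()`.

```php
<?php
declare(strict_types=1);

// Hypothetical allowlist, standing in for whatever a server-side file such
// as config/parsers.php returns. Keys, fields, and flags are assumptions.
$allowedParsers = [
    'example' => [
        'binary' => '/usr/local/bin/example-tokenizer',
        'args'   => ['--output', 'wakati'],
    ],
];

/**
 * Build a shell command for an allowlisted parser (illustrative helper).
 *
 * @param array<string, array{binary: string, args: array<int, string>}> $allowed
 */
function sketchBuildCommand(array $allowed, string $parserId): string
{
    if (!isset($allowed[$parserId])) {
        // Unknown identifiers are rejected, so user input never becomes a path.
        throw new InvalidArgumentException("Parser not allowed: $parserId");
    }
    $entry = $allowed[$parserId];

    // Escape the binary and every argument individually.
    $parts = [escapeshellarg($entry['binary'])];
    foreach ($entry['args'] as $arg) {
        $parts[] = escapeshellarg($arg);
    }
    return implode(' ', $parts);
}

echo sketchBuildCommand($allowedParsers, 'example'), PHP_EOL;
```

The request only ever supplies the identifier (`'example'` here); the path and flags come from the server-side array.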

Table of Contents

Interfaces

ParserInterface
Interface for text parsers that tokenize text into words and sentences.

Properties

$availabilityMessage  : string
$available  : bool|null
$config  : ExternalParserConfig

Methods

__construct()  : mixed
Create a new external parser.
getAvailabilityMessage()  : string
Get a description of why this parser might not be available.
getName()  : string
Get human-readable name for UI display.
getType()  : string
Get unique identifier for this parser type.
isAvailable()  : bool
Check if this parser is available on the current system.
parse()  : ParserResult
Parse text into a structured result with sentences and tokens.
buildCommand()  : string
Build the command string.
checkAvailability()  : void
Check if the configured binary is available on the system.
checkUnixPath()  : void
Check if binary is available on Unix-like systems.
checkWindowsPath()  : void
Check if binary is available on Windows.
isAbsolutePath()  : bool
Check if a path is absolute.
isSentenceEnd()  : bool
Check if a token ends a sentence.
isWord()  : bool
Determine if a token is a word (learnable content).
parseLineOutput()  : ParserResult
Parse line-style output (one token per line).
parseOutput()  : ParserResult
Parse the external parser output into a ParserResult.
parseWakatiOutput()  : ParserResult
Parse wakati-style output (space-separated tokens).
preprocessText()  : string
Preprocess text before parsing.
runParser()  : string
Run the external parser and return its output.
runWithFile()  : string
Run command with text in a temporary file.
runWithStdin()  : string
Run command with text piped to stdin.

Methods

getAvailabilityMessage()

Get a description of why this parser might not be available.

public getAvailabilityMessage() : string
Return values
string

Description of the missing dependencies, or an empty string if the parser is available

getName()

Get human-readable name for UI display.

public getName() : string
Return values
string

Human-readable parser name

getType()

Get unique identifier for this parser type.

public getType() : string
Return values
string

Parser type identifier

isAvailable()

Check if this parser is available on the current system.

public isAvailable() : bool
Return values
bool

True if parser can be used, false otherwise
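
The `bool|null` type of `$available` suggests a lazily cached check. The following sketch shows one way `isAvailable()` and `getAvailabilityMessage()` could cooperate. `SketchParser` is a stand-in class written for this example, and the `is_executable()` test is an assumption; the real class may check availability differently.

```php
<?php
declare(strict_types=1);

// Stand-in class, not the real ExternalParser: it only mirrors the
// documented property types to show a lazily cached availability check.
final class SketchParser
{
    private ?bool $available = null;          // null means "not checked yet"
    private string $availabilityMessage = '';
    private string $binary;

    public function __construct(string $binary)
    {
        $this->binary = $binary;
    }

    public function isAvailable(): bool
    {
        if ($this->available === null) {
            // Assumption: the binary is given as a path and must be executable.
            $this->available = is_executable($this->binary);
            $this->availabilityMessage = $this->available
                ? ''
                : "Binary not found or not executable: {$this->binary}";
        }
        return $this->available;
    }

    public function getAvailabilityMessage(): string
    {
        $this->isAvailable(); // make sure the check has run at least once
        return $this->availabilityMessage;
    }
}

$parser = new SketchParser('/nonexistent/tokenizer');
if (!$parser->isAvailable()) {
    echo $parser->getAvailabilityMessage(), PHP_EOL;
}
```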

parse()

Parse text into a structured result with sentences and tokens.

public parse(string $text, ParserConfig $config) : ParserResult
Parameters
$text : string

Text to parse (already preprocessed)

$config : ParserConfig

Parser configuration from language settings

Return values
ParserResult

Parsing result containing sentences and tokens
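
The method summary distinguishes wakati-style output (space-separated tokens) from line-style output (one token per line). As a rough sketch of that difference, the two hypothetical helpers below split raw tokenizer output into plain token lists. They return arrays of strings rather than a ParserResult, whose structure is not shown on this page.

```php
<?php
declare(strict_types=1);

/**
 * Split wakati-style output: tokens separated by whitespace.
 *
 * @return array<int, string>
 */
function sketchSplitWakati(string $output): array
{
    $tokens = preg_split('/\s+/u', trim($output), -1, PREG_SPLIT_NO_EMPTY);
    return $tokens === false ? [] : $tokens;
}

/**
 * Split line-style output: one token per line, blank lines skipped.
 *
 * @return array<int, string>
 */
function sketchSplitLines(string $output): array
{
    $tokens = [];
    foreach (preg_split('/\R/u', $output) ?: [] as $line) {
        $line = trim($line);
        if ($line !== '') {
            $tokens[] = $line;
        }
    }
    return $tokens;
}

print_r(sketchSplitWakati("私 は 学生 です 。\n"));
print_r(sketchSplitLines("私\nは\n学生\nです\n。\n"));
```

Both calls yield the same five tokens; only the delimiter differs.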

buildCommand()

Build the command string.

private buildCommand() : string
Return values
string

Command to execute

checkAvailability()

Check if the configured binary is available on the system.

private checkAvailability() : void

checkUnixPath()

Check if binary is available on Unix-like systems.

private checkUnixPath(string $binary) : void
Parameters
$binary : string

Binary name to check

checkWindowsPath()

Check if binary is available on Windows.

private checkWindowsPath(string $binary) : void
Parameters
$binary : string

Binary name to check
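
How `checkUnixPath()` and `checkWindowsPath()` locate a binary is not documented here. One portable approach, shown purely as an illustration, is to scan the directories of a PATH-style string in PHP itself, trying extra suffixes on Windows. The helper name `sketchFindOnPath` and the extension list are assumptions, and the real methods may shell out to a system lookup tool instead.

```php
<?php
declare(strict_types=1);

/**
 * Look for a bare binary name in the directories of a PATH-style string.
 * Hypothetical helper: returns the first executable match, or null.
 *
 * @param array<int, string> $extensions Suffixes to try ('' on Unix-like systems)
 */
function sketchFindOnPath(string $binary, string $path, array $extensions = ['']): ?string
{
    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue; // skip empty PATH entries
        }
        foreach ($extensions as $ext) {
            $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $binary . $ext;
            if (is_file($candidate) && is_executable($candidate)) {
                return $candidate;
            }
        }
    }
    return null;
}

// Assumed suffix list for Windows; Unix-like systems use the bare name.
$isWindows  = DIRECTORY_SEPARATOR === '\\';
$extensions = $isWindows ? ['.exe', '.bat', '.cmd', ''] : [''];

$found = sketchFindOnPath('php', (string) getenv('PATH'), $extensions);
echo $found ?? 'php not found on PATH', PHP_EOL;
```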

isAbsolutePath()

Check if a path is absolute.

private isAbsolutePath(string $path) : bool
Parameters
$path : string

Path to check

Return values
bool

True if path is absolute
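
A check of this kind usually has to cover both Unix-style and Windows-style paths. The function below is a minimal sketch under that assumption; `sketchIsAbsolutePath` is a hypothetical name and the exact rules used by the class may differ.

```php
<?php
declare(strict_types=1);

/**
 * Decide whether a path is absolute on Unix-like systems or Windows.
 * Illustrative only; not the class's actual implementation.
 */
function sketchIsAbsolutePath(string $path): bool
{
    if ($path === '') {
        return false;
    }
    // Unix root ("/usr/bin/x") or a Windows UNC / root-relative path ("\\host\share").
    if ($path[0] === '/' || $path[0] === '\\') {
        return true;
    }
    // Windows drive letter followed by a separator, e.g. "C:\" or "C:/".
    return preg_match('~^[A-Za-z]:[\\\\/]~', $path) === 1;
}

var_dump(sketchIsAbsolutePath('/usr/bin/tokenizer'));   // absolute on Unix
var_dump(sketchIsAbsolutePath('C:\\Tools\\tok.exe'));   // absolute on Windows
var_dump(sketchIsAbsolutePath('tokenizer'));            // bare name, resolved via PATH
```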

isSentenceEnd()

Check if a token ends a sentence.

private isSentenceEnd(string $token, ParserConfig $config) : bool
Parameters
$token : string

Token text

$config : ParserConfig

Parser configuration

Return values
bool

True if token is sentence-ending punctuation
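
As an illustration of this check, the sketch below treats a token as sentence-ending when it consists only of characters from a configured set. Passing that set as a plain string is an assumption made to keep the example self-contained; the real method reads its settings from ParserConfig, whose fields are not listed on this page.

```php
<?php
declare(strict_types=1);

/**
 * Return true if the token is made up only of sentence-ending characters.
 * $sentenceEndChars is a hypothetical stand-in for a ParserConfig setting.
 */
function sketchIsSentenceEnd(string $token, string $sentenceEndChars): bool
{
    if ($token === '' || $sentenceEndChars === '') {
        return false;
    }
    // Quote the characters so they are safe inside a character class.
    $class = preg_quote($sentenceEndChars, '/');
    return preg_match('/^[' . $class . ']+$/u', $token) === 1;
}

$endChars = '。！？.!?';
var_dump(sketchIsSentenceEnd('。', $endChars));
var_dump(sketchIsSentenceEnd('学生', $endChars));
```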

isWord()

Determine if a token is a word (learnable content).

private isWord(string $token, ParserConfig $config) : bool
Parameters
$token : string

Token text

$config : ParserConfig

Parser configuration

Return values
bool

True if token is a word
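
One plausible way to separate learnable words from punctuation and whitespace is to test for at least one letter or digit. The sketch below does that with Unicode character properties; the default character class and the name `sketchIsWord` are assumptions, and the real method may use a language-specific pattern from ParserConfig instead.

```php
<?php
declare(strict_types=1);

/**
 * Return true if the token contains at least one word character.
 * $wordCharClass is the inside of a regex character class; the default
 * (any Unicode letter or number) is an assumption for this example.
 */
function sketchIsWord(string $token, string $wordCharClass = '\p{L}\p{N}'): bool
{
    return preg_match('/[' . $wordCharClass . ']/u', $token) === 1;
}

foreach (['学生', 'hello', '。', ' '] as $token) {
    printf("%s => %s\n", json_encode($token, JSON_UNESCAPED_UNICODE), sketchIsWord($token) ? 'word' : 'non-word');
}
```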

preprocessText()

Preprocess text before parsing.

private preprocessText(string $text) : string
Parameters
$text : string

Raw text

Return values
string

Preprocessed text

runParser()

Run the external parser and return its output.

private runParser(string $text) : string
Parameters
$text : string

Text to parse

Tags
throws
RuntimeException

If parser execution fails

Return values
string

Parser output

runWithFile()

Run command with text in a temporary file.

private runWithFile(string $text) : string
Parameters
$text : string

Text to write to file

Tags
throws
RuntimeException

If execution fails

Return values
string

Command output
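
The temporary-file approach can be sketched as follows: write the text to a temp file, append the escaped file path to the command, run it, and always clean up. Unlike the documented signature, the hypothetical `sketchRunWithFile` takes the command as a parameter so the example is self-contained; the demo uses `cat` as a stand-in tokenizer and therefore assumes a Unix-like system.

```php
<?php
declare(strict_types=1);

/**
 * Run a command with the text supplied through a temporary file.
 * Illustrative helper; $commandPrefix must already be safely escaped.
 */
function sketchRunWithFile(string $commandPrefix, string $text): string
{
    $tmp = tempnam(sys_get_temp_dir(), 'parser_');
    if ($tmp === false) {
        throw new RuntimeException('Could not create temporary file');
    }
    try {
        if (file_put_contents($tmp, $text) === false) {
            throw new RuntimeException('Could not write temporary file');
        }
        $lines = [];
        $exitCode = 0;
        // The temp file path is passed as the last, escaped argument.
        exec($commandPrefix . ' ' . escapeshellarg($tmp), $lines, $exitCode);
        if ($exitCode !== 0) {
            throw new RuntimeException("Parser exited with code $exitCode");
        }
        return implode("\n", $lines);
    } finally {
        // Remove the file whether or not the command succeeded.
        @unlink($tmp);
    }
}

// `cat` simply echoes the file back, standing in for a real tokenizer.
echo sketchRunWithFile('cat', "tok1 tok2\ntok3"), PHP_EOL;
```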

runWithStdin()

Run command with text piped to stdin.

private runWithStdin(string $command, string $text) : string
Parameters
$command : string

Command to execute

$text : string

Text to pipe

Tags
throws
RuntimeException

If execution fails

Return values
string

Command output
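
Piping text to a process's stdin is typically done in PHP with `proc_open()`. The sketch below follows the documented signature but is otherwise an assumption about how such a method could work; `sketchRunWithStdin` is a hypothetical name, and the demo again uses `cat` on a Unix-like system.

```php
<?php
declare(strict_types=1);

/**
 * Run a command, write $text to its stdin, and return its stdout.
 * Illustrative helper; suitable for modest input sizes only, since it
 * writes all input before reading any output.
 */
function sketchRunWithStdin(string $command, string $text): string
{
    $spec = [
        0 => ['pipe', 'r'], // child reads stdin from us
        1 => ['pipe', 'w'], // child writes stdout to us
        2 => ['pipe', 'w'], // child writes stderr to us
    ];
    $pipes = [];
    $process = proc_open($command, $spec, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start parser process');
    }

    fwrite($pipes[0], $text);
    fclose($pipes[0]); // signal end of input

    $stdout = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[2]);

    $exitCode = proc_close($process);
    if ($exitCode !== 0) {
        throw new RuntimeException("Parser failed ($exitCode): " . trim((string) $stderr));
    }
    return (string) $stdout;
}

echo sketchRunWithStdin('cat', "私 は 学生 です 。"), PHP_EOL;
```

Compared with a temporary file, this avoids touching the filesystem, but it only works with tokenizers that read from stdin.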


        