ExternalParser
implements ParserInterface
Generic external parser that executes command-line tokenizers.
This parser wraps external tokenization programs (such as Jieba or Sudachi) and converts their output into LWT's token format. The parser configuration is loaded from config/parsers.php, so only programs listed there can run.
Security: Binary paths come only from the server-side config file, not from user input. All command arguments are properly escaped.
Table of Contents
Interfaces
- ParserInterface
- Interface for text parsers that tokenize text into words and sentences.
Properties
- $availabilityMessage : string
- $available : bool|null
- $config : ExternalParserConfig
Methods
- __construct() : mixed
- Create a new external parser.
- getAvailabilityMessage() : string
- Get a description of why this parser might not be available.
- getName() : string
- Get human-readable name for UI display.
- getType() : string
- Get unique identifier for this parser type.
- isAvailable() : bool
- Check if this parser is available on the current system.
- parse() : ParserResult
- Parse text into a structured result with sentences and tokens.
- buildCommand() : string
- Build the command string.
- checkAvailability() : void
- Check if the configured binary is available on the system.
- checkUnixPath() : void
- Check if binary is available on Unix-like systems.
- checkWindowsPath() : void
- Check if binary is available on Windows.
- isAbsolutePath() : bool
- Check if a path is absolute.
- isSentenceEnd() : bool
- Check if a token ends a sentence.
- isWord() : bool
- Determine if a token is a word (learnable content).
- parseLineOutput() : ParserResult
- Parse line-style output (one token per line).
- parseOutput() : ParserResult
- Parse the external parser output into a ParserResult.
- parseWakatiOutput() : ParserResult
- Parse wakati-style output (space-separated tokens).
- preprocessText() : string
- Preprocess text before parsing.
- runParser() : string
- Run the external parser and return its output.
- runWithFile() : string
- Run command with text in a temporary file.
- runWithStdin() : string
- Run command with text piped to stdin.
Properties
$availabilityMessage
private string $availabilityMessage = ''
$available
private bool|null $available = null
$config
private ExternalParserConfig $config
Methods
__construct()
Create a new external parser.
public __construct(ExternalParserConfig $config) : mixed
Parameters
- $config : ExternalParserConfig
- Parser configuration from config file
getAvailabilityMessage()
Get a description of why this parser might not be available.
public getAvailabilityMessage() : string
Return values
string —Description of missing dependencies or empty if available
getName()
Get human-readable name for UI display.
public getName() : string
Return values
string —Human-readable parser name
getType()
Get unique identifier for this parser type.
public getType() : string
Return values
string —Parser type identifier
isAvailable()
Check if this parser is available on the current system.
public isAvailable() : bool
Return values
bool —True if parser can be used, false otherwise
parse()
Parse text into a structured result with sentences and tokens.
public parse(string $text, ParserConfig $config) : ParserResult
Parameters
- $text : string
- Text to parse (already preprocessed)
- $config : ParserConfig
- Parser configuration from language settings
Return values
ParserResult —Parsing result containing sentences and tokens
buildCommand()
Build the command string.
private buildCommand() : string
Return values
string —Command to execute
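The escaping mentioned in the class description can be illustrated with a minimal sketch. It assumes the command is assembled from a trusted binary path plus a list of arguments; the function name, the binary name and the flags below are hypothetical placeholders and are not taken from this class.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of ExternalParser): join a trusted binary
 * path with arguments, escaping every part for the shell.
 *
 * @param string   $binary Binary path, assumed to come from server-side config
 * @param string[] $args   Arguments to pass to the binary
 */
function buildCommandSketch(string $binary, array $args): string
{
    // escapeshellarg() quotes each part so the shell sees one literal word.
    $parts = [escapeshellarg($binary)];
    foreach ($args as $arg) {
        $parts[] = escapeshellarg((string) $arg);
    }
    return implode(' ', $parts);
}

// Placeholder binary and flags, for illustration only.
echo buildCommandSketch('example-tokenizer', ['--format', 'wakati']), PHP_EOL;
```

Quoting every part, including the binary path, keeps paths with spaces intact and stops shell metacharacters in an argument from being interpreted.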
checkAvailability()
Check if the configured binary is available on the system.
private checkAvailability() : void
checkUnixPath()
Check if binary is available on Unix-like systems.
private checkUnixPath(string $binary) : void
Parameters
- $binary : string
- Binary name to check
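One common way to perform such a check on Unix-like systems is the POSIX `command -v` lookup. The sketch below is an assumption about the general technique, not this method's actual body: the function name is hypothetical, and it returns a bool, whereas checkUnixPath() returns void and presumably records its result in the private properties.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of ExternalParser): report whether a binary
 * can be found on a Unix-like system.
 */
function isOnUnixPathSketch(string $binary): bool
{
    if ($binary === '') {
        return false;
    }
    // A name containing a slash is treated as a path and checked directly.
    if (strpos($binary, '/') !== false) {
        return is_file($binary) && is_executable($binary);
    }
    // Otherwise ask the shell to resolve the name; exit code 0 means found.
    $output = [];
    $exitCode = 1;
    exec('command -v ' . escapeshellarg($binary) . ' 2>/dev/null', $output, $exitCode);
    return $exitCode === 0;
}
```

Note that `command -v` also succeeds for shell built-ins, so a stricter check might additionally verify that the resolved path is an executable file.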
checkWindowsPath()
Check if binary is available on Windows.
private checkWindowsPath(string $binary) : void
Parameters
- $binary : string
- Binary name to check
isAbsolutePath()
Check if a path is absolute.
private isAbsolutePath(string $path) : bool
Parameters
- $path : string
- Path to check
Return values
bool —True if path is absolute
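As an illustration of what an absolute-path test typically covers, here is a minimal sketch. It assumes three accepted forms (a leading slash, a Windows drive prefix, and a UNC prefix); the real method may accept a different set, and the function name is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of ExternalParser): detect absolute paths
 * in Unix and Windows notation.
 */
function isAbsolutePathSketch(string $path): bool
{
    if ($path === '') {
        return false;
    }
    // Unix-style absolute path: /usr/bin/tool
    if ($path[0] === '/') {
        return true;
    }
    // Windows UNC path: \\server\share\tool.exe
    if (strncmp($path, '\\\\', 2) === 0) {
        return true;
    }
    // Windows drive path: C:\tools\tool.exe or C:/tools/tool.exe
    return preg_match('/^[A-Za-z]:[\\\\\/]/', $path) === 1;
}
```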
isSentenceEnd()
Check if a token ends a sentence.
private isSentenceEnd(string $token, ParserConfig $config) : bool
Parameters
- $token : string
- Token text
- $config : ParserConfig
- Parser configuration
Return values
bool —True if token is sentence-ending punctuation
isWord()
Determine if a token is a word (learnable content).
private isWord(string $token, ParserConfig $config) : bool
Parameters
- $token : string
- Token text
- $config : ParserConfig
- Parser configuration
Return values
bool —True if token is a word
parseLineOutput()
Parse line-style output (one token per line).
private parseLineOutput(string $output, ParserConfig $config) : ParserResult
Parameters
- $output : string
- Parser output
- $config : ParserConfig
- Parser configuration
Return values
ParserResult —Parsed result
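The line-style format (one token per line) can be sketched as follows. This simplified version returns a plain array of token strings rather than a ParserResult, ignores the ParserConfig argument, and assumes each non-empty line holds exactly one token; the function name is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of ExternalParser): split line-style
 * tokenizer output into a list of tokens.
 *
 * @return string[]
 */
function parseLineSketch(string $output): array
{
    // \R matches any newline sequence (\n, \r\n, \r).
    $lines = preg_split('/\R/', $output) ?: [];
    $tokens = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line !== '') {
            $tokens[] = $line; // blank lines are skipped
        }
    }
    return $tokens;
}

print_r(parseLineSketch("我\n喜欢\r\n学习\n\n"));
```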
parseOutput()
Parse the external parser output into a ParserResult.
private parseOutput(string $output, ParserConfig $config) : ParserResult
Parameters
- $output : string
- Parser output
- $config : ParserConfig
- Parser configuration
Return values
ParserResult —Parsed result
parseWakatiOutput()
Parse wakati-style output (space-separated tokens).
private parseWakatiOutput(string $output, ParserConfig $config) : ParserResult
Parameters
- $output : string
- Parser output
- $config : ParserConfig
- Parser configuration
Return values
ParserResult —Parsed result
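The wakati-style format (space-separated tokens) can be sketched in a similar way, together with simple stand-ins for the word and sentence-end checks. The punctuation set and the "contains a letter or digit" rule below are assumptions made for illustration; the real isWord() and isSentenceEnd() take a ParserConfig and may apply different rules. The function name and the array shape are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of ExternalParser): group space-separated
 * tokens into sentences.
 *
 * @return array<int, array<int, array{text: string, isWord: bool}>>
 */
function parseWakatiSketch(string $output): array
{
    $tokens = preg_split('/\s+/u', trim($output), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $sentences = [];
    $current = [];
    foreach ($tokens as $token) {
        $current[] = [
            'text' => $token,
            // Assumed rule: a word contains at least one letter or digit.
            'isWord' => preg_match('/[\p{L}\p{N}]/u', $token) === 1,
        ];
        // Assumed rule: a token made only of these marks ends a sentence.
        if (preg_match('/^[。！？.!?]+$/u', $token) === 1) {
            $sentences[] = $current;
            $current = [];
        }
    }
    if ($current !== []) {
        $sentences[] = $current; // trailing tokens without end punctuation
    }
    return $sentences;
}

$result = parseWakatiSketch('私 は 学生 です 。 猫 が 好き 。');
echo count($result), PHP_EOL;
```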
preprocessText()
Preprocess text before parsing.
private preprocessText(string $text) : string
Parameters
- $text : string
- Raw text
Return values
string —Preprocessed text
runParser()
Run the external parser and return its output.
private runParser(string $text) : string
Parameters
- $text : string
- Text to parse
Return values
string —Parser output
runWithFile()
Run command with text in a temporary file.
private runWithFile(string $text) : string
Parameters
- $text : string
- Text to write to file
Return values
string —Command output
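The temporary-file approach can be sketched roughly as below. Unlike the real method, this hypothetical helper takes the command as a parameter and assumes the tokenizer accepts the input file path as its last argument; the `lwt_` prefix and the use of RuntimeException are also assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of ExternalParser): write text to a
 * temporary file, run a command on it, and return the command's output.
 */
function runWithFileSketch(string $commandPrefix, string $text): string
{
    $path = tempnam(sys_get_temp_dir(), 'lwt_');
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary file.');
    }
    try {
        if (file_put_contents($path, $text) === false) {
            throw new RuntimeException('Could not write the temporary file.');
        }
        $lines = [];
        $exitCode = 0;
        // exec() collects stdout line by line, without trailing whitespace.
        exec($commandPrefix . ' ' . escapeshellarg($path), $lines, $exitCode);
        if ($exitCode !== 0) {
            throw new RuntimeException("Command failed with exit code $exitCode.");
        }
        return implode("\n", $lines);
    } finally {
        // Always remove the temporary file, even when an exception is thrown.
        @unlink($path);
    }
}
```

Putting the cleanup in `finally` ensures the input text does not linger on disk when the external program fails.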
runWithStdin()
Run command with text piped to stdin.
private runWithStdin(string $command, string $text) : string
Parameters
- $command : string
- Command to execute
- $text : string
- Text to pipe
Return values
string —Command output
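Piping text to a command's stdin is usually done in PHP with proc_open(). The following is a minimal sketch of that technique, not the method's actual body: the function name and the choice of RuntimeException are assumptions, and it reads the pipes sequentially, which is only safe for modest amounts of data.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of ExternalParser): run a command, feed it
 * text on stdin, and return what it writes to stdout.
 */
function runWithStdinSketch(string $command, string $text): string
{
    $descriptors = [
        0 => ['pipe', 'r'], // child's stdin: we write to it
        1 => ['pipe', 'w'], // child's stdout: we read from it
        2 => ['pipe', 'w'], // child's stderr: kept for error messages
    ];
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start the external command.');
    }

    // Send the text and close stdin so the child sees end-of-input.
    // For very large inputs, interleaved reads via stream_select() would be
    // needed to avoid a pipe deadlock; this sketch omits that.
    fwrite($pipes[0], $text);
    fclose($pipes[0]);

    $stdout = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[2]);

    $exitCode = proc_close($process);
    if ($exitCode !== 0) {
        throw new RuntimeException(
            "Command failed with exit code $exitCode: " . trim((string) $stderr)
        );
    }
    return (string) $stdout;
}
```

Compared with a temporary file, stdin avoids writing the text to disk, at the cost of having to manage the pipes carefully.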