
CharacterParser implements ParserInterface

Character-by-character parser for CJK languages.

Each character is treated as a separate word. This suits Chinese and other languages whose writing systems do not mark word boundaries with spaces.
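
As a rough illustration of what "each character is a separate word" means in practice, the sketch below splits a UTF-8 string into one token per character. The helper name splitIntoCharacters() is hypothetical and is not part of CharacterParser; the class's real implementation may differ.

```php
<?php
declare(strict_types=1);

/**
 * Split text into one token per character.
 * Hypothetical helper for illustration only, not a CharacterParser method.
 *
 * @return string[]
 */
function splitIntoCharacters(string $text): array
{
    // The "u" modifier makes the pattern UTF-8 aware, so multi-byte
    // CJK characters stay whole instead of being cut into bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure (for example invalid UTF-8).
    return $chars === false ? [] : $chars;
}

$tokens = splitIntoCharacters('我喜欢学习');
print_r($tokens);
```

Splitting on an empty pattern with the "u" modifier avoids a dependency on the mbstring extension; mb_str_split() would be an equivalent choice where that extension is installed.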

Tags
since
3.0.0

Table of Contents

Interfaces

ParserInterface
Interface for text parsers that tokenize text into words and sentences.

Properties

$parsingService  : TextParsingService

Methods

__construct()  : mixed
getAvailabilityMessage()  : string
Get a description of why this parser might not be available.
getName()  : string
Get human-readable name for UI display.
getType()  : string
Get unique identifier for this parser type.
isAvailable()  : bool
Check if this parser is available on the current system.
parse()  : ParserResult
Parse text into a structured result with sentences and tokens.
applyInitialTransformations()  : string
Apply initial text transformations with character splitting.
applyWordSplitting()  : string
Apply word-splitting transformations.
parseToResult()  : ParserResult
Parse preprocessed text into a ParserResult.

Properties

Methods

getAvailabilityMessage()

Get a description of why this parser might not be available.

public getAvailabilityMessage() : string
Return values
string

Description of the missing dependencies, or an empty string if the parser is available

getName()

Get human-readable name for UI display.

public getName() : string
Return values
string

Human-readable parser name

getType()

Get unique identifier for this parser type.

public getType() : string
Return values
string

Parser type identifier

isAvailable()

Check if this parser is available on the current system.

public isAvailable() : bool
Return values
bool

True if parser can be used, false otherwise
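
The metadata methods above (getType(), getName(), isAvailable(), getAvailabilityMessage()) are naturally used together when listing parsers in a UI. The following is a minimal sketch under stated assumptions: ParserInfo is a hypothetical stand-in that copies only the four documented signatures so the example runs on its own, describeParser() is an illustrative helper, and the stub's return values are made up rather than taken from CharacterParser.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in mirroring the four documented method signatures.
// Real code would type-hint against ParserInterface instead.
interface ParserInfo
{
    public function getType(): string;
    public function getName(): string;
    public function isAvailable(): bool;
    public function getAvailabilityMessage(): string;
}

/**
 * Build a one-line label for a parser (illustrative helper).
 */
function describeParser(ParserInfo $parser): string
{
    if ($parser->isAvailable()) {
        return sprintf('%s (%s)', $parser->getName(), $parser->getType());
    }

    // Only consult the message when the parser reports itself unavailable.
    return sprintf(
        '%s (unavailable: %s)',
        $parser->getName(),
        $parser->getAvailabilityMessage()
    );
}

// Stub with invented values, used only to exercise the helper.
$stub = new class implements ParserInfo {
    public function getType(): string { return 'example-type'; }
    public function getName(): string { return 'Example Parser'; }
    public function isAvailable(): bool { return false; }
    public function getAvailabilityMessage(): string
    {
        return 'example dependency is not installed';
    }
};

$label = describeParser($stub);
echo $label, PHP_EOL;
```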

parse()

Parse text into a structured result with sentences and tokens.

public parse(string $text, ParserConfig $config) : ParserResult
Parameters
$text : string

Text to parse (already preprocessed)

$config : ParserConfig

Parser configuration from language settings

Return values
ParserResult

Parsing result containing sentences and tokens

applyInitialTransformations()

Apply initial text transformations with character splitting.

protected applyInitialTransformations(string $text) : string
Parameters
$text : string

Raw text

Return values
string

Text after initial transformations

applyWordSplitting()

Apply word-splitting transformations.

protected applyWordSplitting(string $text, string $splitSentence, string $noSentenceEnd, string $termchar) : string
Parameters
$text : string

Text after initial transformations

$splitSentence : string

Sentence split regex

$noSentenceEnd : string

Exception patterns that should not be treated as sentence endings

$termchar : string

Word character regex

Return values
string

Preprocessed text

parseToResult()

Parse preprocessed text into a ParserResult.

protected parseToResult(string $text, bool $removeSpaces) : ParserResult
Parameters
$text : string

Preprocessed text

$removeSpaces : bool

Whether to remove spaces

Return values
ParserResult

Result with sentences and tokens
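
This page does not state how $removeSpaces is applied. One plausible reading, shown here purely as an assumption, is that whitespace-only tokens are dropped from the token list when the flag is true. The helper filterTokens() is hypothetical and works on a plain array rather than on ParserResult, whose structure is not documented here.

```php
<?php
declare(strict_types=1);

/**
 * Optionally drop whitespace-only tokens.
 * Hypothetical helper; the real parseToResult() behaviour may differ.
 *
 * @param string[] $tokens
 * @return string[]
 */
function filterTokens(array $tokens, bool $removeSpaces): array
{
    if (!$removeSpaces) {
        return $tokens;
    }

    // Keep every token that is not made up solely of whitespace;
    // array_values() reindexes the result from zero.
    return array_values(array_filter(
        $tokens,
        static fn (string $token): bool => preg_match('/^\s+$/u', $token) !== 1
    ));
}

$tokens = ['你', '好', ' ', '世', '界'];
$withoutSpaces = filterTokens($tokens, true);
$unchanged = filterTokens($tokens, false);
```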


        