Documentation

RegexParser implements ParserInterface

Standard regex-based parser for most languages.

Uses regular expressions to identify word boundaries and sentence endings. Suitable for space-separated languages such as English, French, and German.

Tags
since 3.0.0

Table of Contents

Interfaces

ParserInterface
Interface for text parsers that tokenize text into words and sentences.

Properties

$parsingService  : TextParsingService

Methods

__construct()  : mixed
getAvailabilityMessage()  : string
Get a description of why this parser might not be available.
getName()  : string
Get human-readable name for UI display.
getType()  : string
Get unique identifier for this parser type.
isAvailable()  : bool
Check if this parser is available on the current system.
parse()  : ParserResult
Parse text into a structured result with sentences and tokens.
applyInitialTransformations()  : string
Apply initial text transformations.
applyWordSplitting()  : string
Apply word-splitting transformations.
parseToResult()  : ParserResult
Parse preprocessed text into a ParserResult.

Methods

getAvailabilityMessage()

Get a description of why this parser might not be available.

public getAvailabilityMessage() : string
Return values
string

Description of the missing dependencies, or an empty string if the parser is available

getName()

Get human-readable name for UI display.

public getName() : string
Return values
string

Human-readable parser name

getType()

Get unique identifier for this parser type.

public getType() : string
Return values
string

Parser type identifier

isAvailable()

Check if this parser is available on the current system.

public isAvailable() : bool
Return values
bool

True if parser can be used, false otherwise
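
As an illustration of how isAvailable() and getAvailabilityMessage() can work together, the sketch below picks the first usable parser from a list and reports why the others were skipped. SketchParser and pickAvailableParser() are hypothetical names invented for this example. The interface only mirrors three of the method signatures documented on this page and is not the real ParserInterface.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in that mirrors three documented method signatures.
// It is NOT the library's ParserInterface.
interface SketchParser
{
    public function getName(): string;
    public function isAvailable(): bool;
    public function getAvailabilityMessage(): string;
}

/**
 * Return the first parser that reports itself as available.
 * If none is available, throw with the reasons collected along the way.
 *
 * @param SketchParser[] $parsers Candidates in order of preference
 */
function pickAvailableParser(array $parsers): SketchParser
{
    $reasons = [];
    foreach ($parsers as $parser) {
        if ($parser->isAvailable()) {
            return $parser;
        }
        // Keep the explanation so the caller can show it in the UI or a log.
        $reasons[] = $parser->getName() . ': ' . $parser->getAvailabilityMessage();
    }
    throw new RuntimeException('No parser available. ' . implode('; ', $reasons));
}
```

A regex-based parser has no external dependencies in this sketch's model, so it would normally sit last in the list as the fallback.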

parse()

Parse text into a structured result with sentences and tokens.

public parse(string $text, ParserConfig $config) : ParserResult
Parameters
$text : string

Text to parse (already preprocessed)

$config : ParserConfig

Parser configuration from language settings

Return values
ParserResult

Parsing result containing sentences and tokens

applyInitialTransformations()

Apply initial text transformations.

protected applyInitialTransformations(string $text) : string

Normalizes text by marking paragraphs and collapsing whitespace.

Parameters
$text : string

Raw text

Return values
string

Text after initial transformations
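
The normalization described above could look roughly like the following. This is a minimal sketch, not the library's implementation: the function name sketchInitialTransformations() and the pilcrow (¶) used as a paragraph marker are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative stand-in for the documented step.
 * Marks paragraph breaks and collapses runs of whitespace.
 */
function sketchInitialTransformations(string $text): string
{
    // Normalize Windows and old Mac line endings to "\n".
    $text = str_replace(["\r\n", "\r"], "\n", $text);
    // Assumed marker: every run of newlines becomes one " ¶ " paragraph marker.
    $text = preg_replace('/\n+/u', ' ¶ ', trim($text));
    // Collapse any remaining run of whitespace into a single space.
    return trim(preg_replace('/\s+/u', ' ', $text));
}

echo sketchInitialTransformations("First  line.\n\nSecond\tline."), "\n";
// prints: First line. ¶ Second line.
```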

applyWordSplitting()

Apply word-splitting transformations.

protected applyWordSplitting(string $text, string $splitSentence, string $noSentenceEnd, string $termchar) : string

Uses regex patterns to identify word and sentence boundaries.

Parameters
$text : string

Text after initial transformations

$splitSentence : string

Sentence split regex

$noSentenceEnd : string

Regex of exception patterns that should not be treated as sentence endings

$termchar : string

Word character regex

Return values
string

Preprocessed text with \r for sentence breaks and \n for token breaks
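
To make the \r and \n markers concrete, here is a simplified sketch of regex-driven splitting. It is not the actual method body. It assumes that $splitSentence and $termchar hold the contents of a regex character class (for example '.!?' and 'a-zA-Z'), that $noSentenceEnd is an alternation such as 'Mr\.|Dr\.', and that none of them contains the '/' delimiter. The function name sketchWordSplitting() is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative stand-in: emits every token followed by "\n" and adds "\r"
 * after a token that ends a sentence.
 */
function sketchWordSplitting(
    string $text,
    string $splitSentence,
    string $noSentenceEnd,
    string $termchar
): string {
    // Split into alternating word and non-word runs, keeping both kinds.
    $tokens = preg_split(
        '/([' . $termchar . ']+)/u',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $out = '';
    $previousWord = '';
    foreach ($tokens as $token) {
        $out .= $token . "\n"; // token break

        if (preg_match('/^[' . $termchar . ']+$/u', $token) === 1) {
            $previousWord = $token; // remember the word for the exception check
            continue;
        }

        $endsSentence = preg_match('/[' . $splitSentence . ']/u', $token) === 1;
        // "Dr" followed by "." forms "Dr.", which matches an exception pattern.
        $isException = preg_match(
            '/^(?:' . $noSentenceEnd . ')$/u',
            $previousWord . trim($token)
        ) === 1;

        if ($endsSentence && !$isException) {
            $out .= "\r"; // sentence break
        }
        $previousWord = '';
    }
    return $out;
}
```

With the input "Dr. Who left. He ran!" and the example arguments above, the period after "Dr" does not produce a sentence break, while the period after "left" and the final "!" do.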

parseToResult()

Parse preprocessed text into a ParserResult.

protected parseToResult(string $text, bool $removeSpaces) : ParserResult
Parameters
$text : string

Preprocessed text with \r and \n markers

$removeSpaces : bool

Whether to remove spaces

Return values
ParserResult

Result with sentences and tokens
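
The conversion from marker-delimited text to a structured result can be sketched as follows. The shape of ParserResult is not shown on this page, so this example returns a plain nested array instead. The name sketchParseToArray() is hypothetical, and treating $removeSpaces as "drop whitespace-only tokens" is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative stand-in: turns text with "\r" (sentence) and "\n" (token)
 * markers into a list of sentences, each a list of token strings.
 *
 * @return string[][]
 */
function sketchParseToArray(string $text, bool $removeSpaces): array
{
    $sentences = [];
    foreach (explode("\r", $text) as $chunk) {
        $tokens = array_filter(
            explode("\n", $chunk),
            static function (string $token) use ($removeSpaces): bool {
                if ($token === '') {
                    return false; // artifact of a trailing "\n"
                }
                // Assumed meaning of the flag: skip whitespace-only tokens.
                return !($removeSpaces && trim($token) === '');
            }
        );
        if ($tokens !== []) {
            $sentences[] = array_values($tokens); // reindex from 0
        }
    }
    return $sentences;
}

print_r(sketchParseToArray("Hi\n \nthere\n.\n\rBye\n!\n\r", true));
```

Passing false as the second argument keeps the " " token between "Hi" and "there" in the first sentence.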


        