RegexParser
in package
implements
ParserInterface
Standard regex-based parser for most languages.
Uses regular expressions to identify word boundaries and sentence endings. Suitable for languages that separate words with spaces, such as English, French, and German.
Table of Contents
Interfaces
- ParserInterface
- Interface for text parsers that tokenize text into words and sentences.
Properties
- $parsingService : TextParsingService
Methods
- __construct() : mixed
- getAvailabilityMessage() : string
- Get a description of why this parser might not be available.
- getName() : string
- Get human-readable name for UI display.
- getType() : string
- Get unique identifier for this parser type.
- isAvailable() : bool
- Check if this parser is available on the current system.
- parse() : ParserResult
- Parse text into a structured result with sentences and tokens.
- applyInitialTransformations() : string
- Apply initial text transformations.
- applyWordSplitting() : string
- Apply word-splitting transformations.
- parseToResult() : ParserResult
- Parse preprocessed text into a ParserResult.
Properties
$parsingService
private
TextParsingService $parsingService
Methods
__construct()
public
__construct([TextParsingService|null $parsingService = null ]) : mixed
Parameters
- $parsingService : TextParsingService|null = null
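The nullable, defaulted parameter suggests an optional dependency: callers may inject a service (for example a test double) or pass nothing. The sketch below shows that constructor shape only. `StubParsingService` and `SketchParser` are hypothetical stand-ins, and the fallback to a freshly built service is an assumption, since this page does not say what the real constructor does with `null`.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for TextParsingService; its real API is not documented here.
final class StubParsingService
{
    public function __construct(public string $label = 'default')
    {
    }
}

// Hypothetical class that mirrors only the documented constructor signature.
final class SketchParser
{
    private StubParsingService $parsingService;

    public function __construct(?StubParsingService $parsingService = null)
    {
        // Assumption: a null argument falls back to a freshly built service.
        $this->parsingService = $parsingService ?? new StubParsingService();
    }

    // Helper added for illustration so the chosen service can be inspected.
    public function serviceLabel(): string
    {
        return $this->parsingService->label;
    }
}

$default  = new SketchParser();                                     // no argument
$injected = new SketchParser(new StubParsingService('test double')); // explicit injection
```

With this shape, production code can call the constructor without arguments while tests pass in a controlled service.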
getAvailabilityMessage()
Get a description of why this parser might not be available.
public
getAvailabilityMessage() : string
Return values
string —Description of missing dependencies or empty if available
getName()
Get human-readable name for UI display.
public
getName() : string
Return values
string —Human-readable parser name
getType()
Get unique identifier for this parser type.
public
getType() : string
Return values
string —Parser type identifier
isAvailable()
Check if this parser is available on the current system.
public
isAvailable() : bool
Return values
bool —True if parser can be used, false otherwise
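The four metadata methods above are enough to build a parser picker for a UI. The following is a minimal sketch under stated assumptions: `ParserInfo` is a hypothetical interface that mirrors only these four methods (the real ParserInterface also declares `parse()`), and the two example classes, their type identifiers, names, and messages are invented for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical interface mirroring the four documented metadata methods.
interface ParserInfo
{
    public function getType(): string;
    public function getName(): string;
    public function isAvailable(): bool;
    public function getAvailabilityMessage(): string;
}

// Hypothetical parser with no external dependencies, so it is always usable.
final class AlwaysAvailableParser implements ParserInfo
{
    public function getType(): string { return 'regex'; }
    public function getName(): string { return 'Standard (regex)'; }
    public function isAvailable(): bool { return true; }
    public function getAvailabilityMessage(): string { return ''; }
}

// Hypothetical parser whose external tool is missing on this system.
final class MissingToolParser implements ParserInfo
{
    public function getType(): string { return 'external'; }
    public function getName(): string { return 'External tokenizer'; }
    public function isAvailable(): bool { return false; }
    public function getAvailabilityMessage(): string { return 'Tokenizer binary not found.'; }
}

/**
 * Build one menu entry per parser, keyed by its type identifier.
 *
 * @param ParserInfo[] $parsers
 * @return array<string, array{label: string, enabled: bool, hint: string}>
 */
function describeParsers(array $parsers): array
{
    $entries = [];
    foreach ($parsers as $parser) {
        $entries[$parser->getType()] = [
            'label'   => $parser->getName(),
            'enabled' => $parser->isAvailable(),
            'hint'    => $parser->getAvailabilityMessage(),
        ];
    }
    return $entries;
}

$menu = describeParsers([new AlwaysAvailableParser(), new MissingToolParser()]);
```

Unavailable parsers stay in the menu with `enabled` set to false, so the UI can show the availability message as an explanation instead of hiding the option.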
parse()
Parse text into a structured result with sentences and tokens.
public
parse(string $text, ParserConfig $config) : ParserResult
Parameters
- $text : string
  Text to parse (already preprocessed)
- $config : ParserConfig
  Parser configuration from language settings
Return values
ParserResult —Parsing result containing sentences and tokens
applyInitialTransformations()
Apply initial text transformations.
protected
applyInitialTransformations(string $text) : string
Normalizes text by marking paragraphs and collapsing whitespace.
Parameters
- $text : string
  Raw text
Return values
string —Text after initial transformations
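One way to picture "marking paragraphs and collapsing whitespace" is the sketch below. It is not the method's actual body: the function name `sketchInitialTransformations`, the paragraph marker `¶`, and the specific regexes are all assumptions chosen for illustration.

```php
<?php
declare(strict_types=1);

// Sketch only: the "¶" marker and the regexes are assumptions, not the real implementation.
function sketchInitialTransformations(string $text): string
{
    // Normalize Windows and classic Mac line endings to "\n".
    $text = str_replace(["\r\n", "\r"], "\n", $text);
    // Replace each run of line breaks with a paragraph marker.
    $text = preg_replace('/\n+/u', ' ¶ ', $text) ?? $text;
    // Collapse every run of whitespace into a single space.
    $text = preg_replace('/\s+/u', ' ', $text) ?? $text;

    return trim($text);
}

$normalized = sketchInitialTransformations("First  line.\r\n\r\nSecond\tline.");
// $normalized is "First line. ¶ Second line."
```

Marking paragraphs before collapsing whitespace matters: once line breaks are folded into spaces, the paragraph boundaries can no longer be recovered.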
applyWordSplitting()
Apply word-splitting transformations.
protected
applyWordSplitting(string $text, string $splitSentence, string $noSentenceEnd, string $termchar) : string
Uses regex patterns to identify word and sentence boundaries.
Parameters
- $text : string
  Text after initial transformations
- $splitSentence : string
  Regex used to split sentences
- $noSentenceEnd : string
  Exception patterns that should not be treated as sentence ends
- $termchar : string
  Regex matching word characters
Return values
string —Preprocessed text with \r for sentence breaks and \n for token breaks
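The marker format described above (`\r` after a sentence, `\n` around tokens) can be sketched as follows. This is an illustration, not the real method: `sketchWordSplitting` is a hypothetical name, it assumes `$splitSentence` and `$termchar` are bodies of a regex character class (such as `.!?` and `a-zA-Z`) that contain no `/` or `]`, and it leaves out the `$noSentenceEnd` exceptions entirely.

```php
<?php
declare(strict_types=1);

// Sketch only: assumes both patterns are character-class bodies and ignores exceptions.
function sketchWordSplitting(string $text, string $splitSentence, string $termchar): string
{
    // 1. Put a sentence break "\r" after each run of sentence-ending characters,
    //    swallowing any whitespace that follows it.
    $text = preg_replace('/([' . $splitSentence . ']+)\s*/u', "$1\r", $text) ?? $text;
    // 2. Surround each run of word characters with token breaks "\n".
    $text = preg_replace('/([' . $termchar . ']+)/u', "\n$1\n", $text) ?? $text;
    // 3. Collapse repeated token breaks produced by adjacent words.
    $text = preg_replace('/\n{2,}/', "\n", $text) ?? $text;

    // Drop token breaks at the very start and end; "\r" markers are kept.
    return trim($text, "\n");
}

$marked = sketchWordSplitting('Hello world. How are you?', '.!?', 'a-zA-Z');
// $marked is "Hello\n \nworld\n.\r\nHow\n \nare\n \nyou\n?\r"
```

In this sketch, spaces and punctuation survive as their own tokens between the `\n` markers, so a later step can decide whether to keep or drop them.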
parseToResult()
Parse preprocessed text into a ParserResult.
protected
parseToResult(string $text, bool $removeSpaces) : ParserResult
Parameters
- $text : string
  Preprocessed text with \r and \n markers
- $removeSpaces : bool
  Whether to remove spaces
Return values
ParserResult —Result with sentences and tokens
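Reading the markers back is a matter of splitting twice. The sketch below returns plain nested arrays because the shape of ParserResult is not documented on this page. The function name `sketchParseToArrays` is hypothetical, and it assumes that `$removeSpaces` means "drop whitespace-only tokens", which is one plausible reading of the parameter.

```php
<?php
declare(strict_types=1);

/**
 * Sketch only: returns arrays instead of a ParserResult.
 *
 * @return list<list<string>> one list of token strings per sentence
 */
function sketchParseToArrays(string $text, bool $removeSpaces): array
{
    $sentences = [];
    foreach (explode("\r", $text) as $rawSentence) {
        $tokens = [];
        foreach (explode("\n", $rawSentence) as $token) {
            if ($token === '') {
                continue; // empty piece left by adjacent markers
            }
            // Assumption: "remove spaces" drops whitespace-only tokens.
            if ($removeSpaces && trim($token) === '') {
                continue;
            }
            $tokens[] = $token;
        }
        if ($tokens !== []) {
            $sentences[] = $tokens; // skip sentences with no tokens
        }
    }

    return $sentences;
}

$input       = "Hello\n \nworld\n.\r\nHow\n \nare\n \nyou\n?\r";
$withoutGaps = sketchParseToArrays($input, true);
$withGaps    = sketchParseToArrays($input, false);
// $withoutGaps is [['Hello', 'world', '.'], ['How', 'are', 'you', '?']]
```

Keeping the space tokens (`$removeSpaces` false) lets the original text be rebuilt by joining the tokens, while dropping them gives a compact list of words and punctuation.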