Documentation

StandardTextParser

Standard text parsing with sentence splitting.

Handles language settings retrieval, text transformations, splitting, previewing, and database insertion for non-Japanese text.

Tags
since
3.0.0

Table of Contents

Methods

applyInitialTransformations()  : string
Apply initial text transformations (before display preview).
applyWordSplitting()  : string
Apply word-splitting transformations (after display preview).
displayStandardPreview()  : void
Display preview HTML for standard text.
getLanguageSettings()  : array{removeSpaces: string, splitSentence: string, noSentenceEnd: string, termchar: string, rtlScript: mixed, splitEachChar: bool}|null
Get language settings for parsing.
parseStandardToDatabase()  : void
Parse standard text and insert into temp_word_occurrences.
splitStandardSentences()  : array<string|int, string>
Split standard text into sentences (split-only mode).
quoteChars()  : string
Build the Unicode quotation-mark character class fragment used in regex patterns.

Methods

applyInitialTransformations()

Apply initial text transformations (before display preview).

public static applyInitialTransformations(string $text, bool $splitEachChar) : string
Parameters
$text : string

Raw text

$splitEachChar : bool

Whether to treat each character as a separate word

Return values
string

Text after initial transformations
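
As a rough illustration of what character-level splitting can mean, the sketch below inserts a space between adjacent non-whitespace characters so that each one can later be treated as its own token. The function name `splitEachCharacterSketch` is hypothetical and the regex is an assumption; it is not taken from the implementation of `applyInitialTransformations()`.

```php
<?php
/**
 * Hypothetical helper (not part of StandardTextParser): separates adjacent
 * non-whitespace characters with a single space.
 */
function splitEachCharacterSketch(string $text): string
{
    // The /u modifier makes the regex operate on UTF-8 code points, not bytes.
    // The lookahead only matches when another non-whitespace character follows,
    // so existing whitespace is left untouched and nothing is appended at the end.
    $result = preg_replace('/(\S)(?=\S)/u', '$1 ', $text);

    // preg_replace() returns null on error (for example, invalid UTF-8 input).
    return $result ?? $text;
}

// U+4F60 U+597D is a two-character sample string.
echo splitEachCharacterSketch("\u{4F60}\u{597D}"), PHP_EOL;
```

When `$splitEachChar` is false, a sketch like this would simply be skipped and the text passed through unchanged.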

applyWordSplitting()

Apply word-splitting transformations (after display preview).

public static applyWordSplitting(string $text, string $splitSentence, string $noSentenceEnd, string $termchar) : string
Parameters
$text : string

Text after initial transformations

$splitSentence : string

Regular expression used to split the text into sentences

$noSentenceEnd : string

Patterns that should not be treated as sentence endings

$termchar : string

Regular expression matching word characters

Tags
psalm-suppress

InvalidReturnType, InvalidReturnStatement, PossiblyNullArgument

Return values
string

Preprocessed text ready for parsing
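
To make the role of a sentence-split setting more concrete, here is a minimal sketch that splits text after sentence-ending characters. It assumes the setting holds a plain list of characters such as `.!?`, which may not match the format `$splitSentence` really uses. It also ignores the `$noSentenceEnd` exceptions entirely. The function name `splitOnSentenceEndSketch` is hypothetical.

```php
<?php
/**
 * Hypothetical helper (not part of StandardTextParser): splits text at
 * whitespace that directly follows one of the given sentence-ending characters.
 *
 * @return list<string>
 */
function splitOnSentenceEndSketch(string $text, string $sentenceEndChars): array
{
    // Escape the characters so they are safe inside a character class.
    $class = preg_quote($sentenceEndChars, '/');

    // Lookbehind keeps the punctuation attached to the preceding sentence.
    $parts = preg_split('/(?<=[' . $class . '])\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Fall back to the whole text if the regex fails or yields nothing.
    return ($parts === false || $parts === []) ? [$text] : $parts;
}

print_r(splitOnSentenceEndSketch('Hello world. How are you? Fine!', '.!?'));
```

A real implementation would also need to consult the exception patterns so that abbreviations ending in a period do not end a sentence.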

displayStandardPreview()

Display preview HTML for standard text.

public static displayStandardPreview(string $text, bool $rtlScript) : void
Parameters
$text : string

Preprocessed text (after initial transformations)

$rtlScript : bool

Whether the text is written in a right-to-left script

getLanguageSettings()

Get language settings for parsing.

public static getLanguageSettings(int $lid) : array{removeSpaces: string, splitSentence: string, noSentenceEnd: string, termchar: string, rtlScript: mixed, splitEachChar: bool}|null
Parameters
$lid : int

Language ID

Return values
array{removeSpaces: string, splitSentence: string, noSentenceEnd: string, termchar: string, rtlScript: mixed, splitEachChar: bool}|null

Language settings, or null if the language is not found
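
Because the return type is nullable and `rtlScript` is typed as `mixed`, callers need to guard against both. The sketch below shows one way to do that on an array with the documented shape. The helper name `describeLanguageSettingsSketch` and all sample values are hypothetical; no database or `StandardTextParser` call is involved.

```php
<?php
/**
 * Hypothetical helper: summarises an array shaped like the documented
 * return value of getLanguageSettings(), or reports that it is missing.
 *
 * @param array<string, mixed>|null $settings
 */
function describeLanguageSettingsSketch(?array $settings): string
{
    // null is the documented "not found" result, so handle it first.
    if ($settings === null) {
        return 'language not found';
    }

    // rtlScript is typed as mixed; a truthiness check treats 0, "0", "" and
    // false as left-to-right.
    $direction = $settings['rtlScript'] ? 'right-to-left' : 'left-to-right';
    $mode = $settings['splitEachChar'] ? 'one word per character' : 'regex-delimited words';

    return $direction . ', ' . $mode;
}

// Made-up sample values, for illustration only.
$sample = [
    'removeSpaces'  => '',
    'splitSentence' => '.!?',
    'noSentenceEnd' => 'Mr.|Dr.',
    'termchar'      => 'a-zA-Z',
    'rtlScript'     => 0,
    'splitEachChar' => false,
];

echo describeLanguageSettingsSketch($sample), PHP_EOL;
echo describeLanguageSettingsSketch(null), PHP_EOL;
```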

parseStandardToDatabase()

Parse standard text and insert into temp_word_occurrences.

public static parseStandardToDatabase(string $text, string $termchar, string $removeSpaces, bool $useMaxSeID) : void
Parameters
$text : string

Preprocessed text

$termchar : string

Regular expression matching word characters

$removeSpaces : string

Space removal setting

$useMaxSeID : bool

Whether to query for max sentence ID

Tags
psalm-suppress

MixedArgument

splitStandardSentences()

Split standard text into sentences (split-only mode).

public static splitStandardSentences(string $text, string $removeSpaces) : array<string|int, string>
Parameters
$text : string

Preprocessed text

$removeSpaces : string

Space removal setting

Tags
psalm-return

non-empty-list

Return values
array<string|int, string>

Array of sentences
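
The `non-empty-list` return annotation means callers can rely on at least one element being present. The sketch below shows how such a guarantee might be upheld. It assumes the preprocessed text carries one sentence per line, and it takes a boolean flag because the string encoding of `$removeSpaces` is not specified on this page. The function name `splitPreprocessedSentencesSketch` is hypothetical.

```php
<?php
/**
 * Hypothetical helper (not part of StandardTextParser): turns newline-separated
 * text into a list of sentences that is never empty.
 *
 * @return non-empty-list<string>
 */
function splitPreprocessedSentencesSketch(string $text, bool $stripSpaces): array
{
    $sentences = [];
    foreach (explode("\n", $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        if ($stripSpaces) {
            // Assumption: "removing spaces" means dropping ASCII spaces.
            $line = str_replace(' ', '', $line);
        }
        $sentences[] = $line;
    }

    // Guarantee a non-empty list even for blank input.
    return $sentences === [] ? [''] : $sentences;
}

print_r(splitPreprocessedSentencesSketch("First sentence.\n\nSecond sentence.", false));
```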

quoteChars()

Build the Unicode quotation-mark character class fragment used in regex patterns.

private static quoteChars() : string

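The fragment contains the right double quotation mark, the closing parenthesis, the left and right single quotation marks, the single angle quotation marks, the left double quotation mark, the double low-9 quotation mark, the guillemets, and CJK brackets.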

Return values
string

Character class content (without surrounding brackets)
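
A fragment "without surrounding brackets" is meant to be embedded inside a larger character class. The sketch below builds a comparable fragment from Unicode escapes and uses it to strip trailing closing punctuation. The function name `quoteCharsSketch` is hypothetical. The character order follows the list above, while the choice of U+300F and U+300D for the CJK brackets is an assumption, since the exact brackets are not named on this page.

```php
<?php
/**
 * Hypothetical stand-in for the private quoteChars() method: returns the
 * content of a character class, without the enclosing [ and ].
 */
function quoteCharsSketch(): string
{
    return "\u{201D}"           // right double quotation mark
        . ")"                   // closing parenthesis (literal inside a class)
        . "\u{2018}\u{2019}"    // left and right single quotation marks
        . "\u{2039}\u{203A}"    // single angle quotation marks
        . "\u{201C}"            // left double quotation mark
        . "\u{201E}"            // double low-9 quotation mark
        . "\u{00AB}\u{00BB}"    // guillemets
        . "\u{300F}\u{300D}";   // CJK corner brackets (assumed)
}

// Embed the fragment in a class; /u is required for multi-byte characters.
$pattern = '/[' . quoteCharsSketch() . ']+$/u';

// Strips the trailing right double quote and closing parenthesis only.
echo preg_replace($pattern, '', "He said \u{201C}hello.\u{201D})"), PHP_EOL;
```

Returning only the class content lets callers combine it with other characters, such as whitespace, in a single bracket expression.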


        