Documentation

StandardTextParser
in package Lwt

Standard text parsing with sentence splitting.

Handles language settings retrieval, text transformations, splitting, previewing, and database insertion for non-Japanese text.

Methods

applyInitialTransformations() : string: Apply initial text transformations (before display preview).
applyWordSplitting() : string: Apply word-splitting transformations (after display preview).
displayStandardPreview() : void: Display preview HTML for standard text.
getLanguageSettings() : array{removeSpaces: string, splitSentence: string, noSentenceEnd: string, termchar: string, rtlScript: mixed, splitEachChar: bool}|null: Get language settings for parsing.
parseStandardToDatabase() : void: Parse standard text and insert into temp_word_occurrences.
splitStandardSentences() : array<string|int, string>: Split standard text into sentences (split-only mode).
quoteChars() : string: Build the Unicode quotation-mark character class fragment used in regex patterns.

applyInitialTransformations()

Apply initial text transformations (before display preview).


    public static applyInitialTransformations(string $text, bool $splitEachChar) : string

Parameters

$text : string: Raw text
$splitEachChar : bool: Whether to split each character

Return values

string: Text after initial transformations
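
This page does not list the individual transformations. As a minimal sketch of what a `$splitEachChar` flag could do, the hypothetical helper below (`splitEachCharSketch()` is not part of the Lwt package) separates every Unicode code point with a space so that each character can later be treated as its own token. The real method may behave differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration only; not the library's implementation.
 * Inserts a space between code points when $splitEachChar is true.
 */
function splitEachCharSketch(string $text, bool $splitEachChar): string
{
    if (!$splitEachChar) {
        return $text;
    }
    // With the 'u' flag the empty pattern splits between code points, not bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on invalid UTF-8; fall back to the input.
    return $chars === false ? $text : implode(' ', $chars);
}
```

Using `preg_split()` with the `u` modifier keeps the sketch free of the mbstring extension.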

applyWordSplitting()

Apply word-splitting transformations (after display preview).


    public static applyWordSplitting(string $text, string $splitSentence, string $noSentenceEnd, string $termchar) : string

Parameters

$text : string: Text after initial transformations
$splitSentence : string: Sentence split regex
$noSentenceEnd : string: Exception patterns
$termchar : string: Word character regex

Return values

string: Preprocessed text ready for parsing

displayStandardPreview()

Display preview HTML for standard text.


    public static displayStandardPreview(string $text, bool $rtlScript) : void

Parameters

$text : string: Preprocessed text (after initial transformations)
$rtlScript : bool: Whether text is right-to-left

getLanguageSettings()

Get language settings for parsing.


    public static getLanguageSettings(int $lid) : array{removeSpaces: string, splitSentence: string, noSentenceEnd: string, termchar: string, rtlScript: mixed, splitEachChar: bool}|null

Parameters

$lid : int: Language ID

Return values

array{removeSpaces: string, splitSentence: string, noSentenceEnd: string, termchar: string, rtlScript: mixed, splitEachChar: bool}|null: Language settings or null if not found
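
Because the return type is nullable, callers need a null check before reading any key. The sketch below shows one way to handle that, using a hypothetical stand-in (`languageSettingsSketch()`) that returns an array with the documented shape. The concrete values are illustrative assumptions, not data from a real Lwt installation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for getLanguageSettings(); values are made up.
 *
 * @return array{removeSpaces: string, splitSentence: string, noSentenceEnd: string, termchar: string, rtlScript: mixed, splitEachChar: bool}|null
 */
function languageSettingsSketch(int $lid): ?array
{
    $known = [
        1 => [
            'removeSpaces'  => '0',
            'splitSentence' => '.!?',
            'noSentenceEnd' => 'Mr.|Dr.',
            'termchar'      => 'a-zA-Z',
            'rtlScript'     => 0,
            'splitEachChar' => false,
        ],
    ];

    return $known[$lid] ?? null;
}

/** Summarise a language, handling the "not found" case explicitly. */
function describeLanguage(int $lid): string
{
    $settings = languageSettingsSketch($lid);
    if ($settings === null) {
        return "language $lid not found";
    }
    // rtlScript is documented as mixed, so normalise it before use.
    $direction = (bool) $settings['rtlScript'] ? 'rtl' : 'ltr';

    return "language $lid: $direction, termchar=" . $settings['termchar'];
}
```

Casting `rtlScript` to `bool` at the call site also matches the `bool $rtlScript` parameter that `displayStandardPreview()` expects.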

parseStandardToDatabase()

Parse standard text and insert into temp_word_occurrences.


    public static parseStandardToDatabase(string $text, string $termchar, string $removeSpaces, bool $useMaxSeID) : void

Parameters

$text : string: Preprocessed text
$termchar : string: Word character regex
$removeSpaces : string: Space removal setting
$useMaxSeID : bool: Whether to query for max sentence ID

splitStandardSentences()

Split standard text into sentences (split-only mode).


    public static splitStandardSentences(string $text, string $removeSpaces) : array<string|int, string>

Parameters

$text : string: Preprocessed text
$removeSpaces : string: Space removal setting

Return values

array<string|int, string>: Array of sentences
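
To illustrate the general idea of sentence splitting, here is a minimal standalone sketch. It is not the library's algorithm: `splitSentencesSketch()` is a hypothetical helper that takes a set of sentence-ending characters (similar in spirit to the `splitSentence` setting) and breaks the text wherever one of them is followed by whitespace. It ignores exception patterns such as `noSentenceEnd` and the `$removeSpaces` setting.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration only; not the library's implementation.
 *
 * @param string $text          Text to split
 * @param string $splitSentence Characters that end a sentence, e.g. '.!?'
 * @return array<int, string>   Sentences, each keeping its end punctuation
 */
function splitSentencesSketch(string $text, string $splitSentence): array
{
    // Escape the characters so they are safe inside a character class.
    $class = preg_quote($splitSentence, '/');

    // The lookbehind keeps the punctuation attached to the preceding sentence.
    $parts = preg_split(
        '/(?<=[' . $class . '])\s+/u',
        trim($text),
        -1,
        PREG_SPLIT_NO_EMPTY
    );

    return $parts === false ? [] : $parts;
}
```

A lookbehind is used here instead of a capturing group so that the result is a flat list of sentences with no separate punctuation entries.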

quoteChars()

Build the Unicode quotation-mark character class fragment used in regex patterns.


    private static quoteChars() : string

Contains: RIGHT DOUBLE QUOTE, close-paren, LEFT/RIGHT SINGLE QUOTE, single angle quotes, LEFT DOUBLE QUOTE, DOUBLE LOW-9 QUOTE, guillemets, CJK brackets.

Return values

string: Character class content (without surrounding brackets)
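
A fragment like this could be assembled as in the following sketch. `quoteCharsSketch()` is a hypothetical stand-in for the private method, and the code points are assumptions derived from the character names above. In particular, "CJK brackets" is taken to mean the corner brackets U+300C and U+300D, which this page does not confirm.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for quoteChars(); returns character class content
 * without the surrounding brackets.
 */
function quoteCharsSketch(): string
{
    return implode('', [
        "\u{201D}", // RIGHT DOUBLE QUOTATION MARK
        ')',        // closing parenthesis (literal inside a character class)
        "\u{2018}", // LEFT SINGLE QUOTATION MARK
        "\u{2019}", // RIGHT SINGLE QUOTATION MARK
        "\u{2039}", // SINGLE LEFT-POINTING ANGLE QUOTATION MARK
        "\u{203A}", // SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
        "\u{201C}", // LEFT DOUBLE QUOTATION MARK
        "\u{201E}", // DOUBLE LOW-9 QUOTATION MARK
        "\u{00AB}", // LEFT-POINTING DOUBLE ANGLE QUOTATION MARK (guillemet)
        "\u{00BB}", // RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK (guillemet)
        "\u{300C}", // LEFT CORNER BRACKET (assumed CJK bracket)
        "\u{300D}", // RIGHT CORNER BRACKET (assumed CJK bracket)
    ]);
}

// The caller supplies the brackets and the 'u' flag for UTF-8 matching.
$pattern = '/[' . quoteCharsSketch() . ']/u';
$count = preg_match_all($pattern, "\u{201C}Hi\u{201D} (ok) \u{00AB}x\u{00BB}");
```

Returning the content without brackets lets a caller combine it with other characters in a single class, for example `'[' . quoteCharsSketch() . '\s]'`.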
