Token
in package
Represents a single token from text parsing.
A token can be either a word (learnable content) or a non-word (punctuation, whitespace, symbols). Tokens maintain their position within the text for proper reconstruction and display.
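Because each token records its sentence index and its order within that sentence, the original text can be rebuilt from an unordered set of tokens. The sketch below illustrates the idea. The `Token` class in it is a minimal stand-in that copies only the documented signatures (its bodies are assumptions), and `reconstructText()` is a hypothetical helper that is not part of the package. It assumes PHP 8.0 or later.

```php
<?php
declare(strict_types=1);

// Stand-in mirroring the documented public surface. Use the real class from
// the package instead; the method bodies here are assumptions.
final class Token
{
    public function __construct(
        private string $text,
        private int $sentenceIndex,
        private int $order,
        private bool $isWord,
        private int $wordCount = 1,
        private string $reading = ''
    ) {
    }

    public function getText(): string { return $this->text; }
    public function getSentenceIndex(): int { return $this->sentenceIndex; }
    public function getOrder(): int { return $this->order; }
    public function isWord(): bool { return $this->isWord; }
}

/**
 * Hypothetical helper: rebuild the text from tokens given in any order.
 *
 * @param Token[] $tokens
 */
function reconstructText(array $tokens): string
{
    // Sort by sentence first, then by position within the sentence.
    usort(
        $tokens,
        fn (Token $a, Token $b): int =>
            [$a->getSentenceIndex(), $a->getOrder()] <=> [$b->getSentenceIndex(), $b->getOrder()]
    );

    // Words and non-words are concatenated alike, so spacing and
    // punctuation survive the round trip.
    return implode('', array_map(fn (Token $t): string => $t->getText(), $tokens));
}

// Tokens deliberately listed out of order.
$tokens = [
    new Token('world', 0, 2, true),
    new Token('Hello', 0, 0, true),
    new Token('!', 0, 3, false),
    new Token(' ', 0, 1, false),
];

$text = reconstructText($tokens);

// Only the learnable words, e.g. for vocabulary lookups.
$words = array_filter($tokens, fn (Token $t): bool => $t->isWord());

echo $text, PHP_EOL; // Hello world!
```

Keeping whitespace and punctuation as non-word tokens is what makes this round trip lossless; filtering with `isWord()` then yields just the learnable content.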
Table of Contents
Properties
- $isWord : bool
- $order : int
- $reading : string
- $sentenceIndex : int
- $text : string
- $wordCount : int
Methods
- __construct() : mixed
- Create a new token.
- getOrder() : int
- Get the order/position within the sentence.
- getReading() : string
- Get the phonetic reading.
- getSentenceIndex() : int
- Get the sentence index this token belongs to.
- getText() : string
- Get the token text content.
- getWordCount() : int
- Get the word count (for multi-word expressions).
- isWord() : bool
- Check if this token is a learnable word.
- nonWord() : self
- Create a non-word token (punctuation, whitespace, etc.).
- word() : self
- Create a word token.
Properties
$isWord
private
$isWord : bool
$order
private
$order : int
$reading
private
$reading : string = ''
$sentenceIndex
private
$sentenceIndex : int
$text
private
$text : string
$wordCount
private
$wordCount : int = 1
Methods
__construct()
Create a new token.
public
__construct(string $text, int $sentenceIndex, int $order, bool $isWord[, int $wordCount = 1 ][, string $reading = '' ]) : mixed
Parameters
- $text : string
  The token text content
- $sentenceIndex : int
  Index of the sentence this token belongs to (0-based)
- $order : int
  Position of this token within its sentence (0-based)
- $isWord : bool
  True if this is a learnable word, false for punctuation/whitespace
- $wordCount : int = 1
  Number of words (1 for a single word, greater than 1 for multi-word expressions)
- $reading : string = ''
  Optional phonetic reading (e.g., furigana for Japanese)
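As an illustration of how the optional arguments combine, the sketch below builds a multi-word expression, a word with a reading, and a punctuation token. The `Token` class shown is a stand-in written from the documented signature only (its bodies are assumptions, and PHP 8.0 or later is assumed); the sample strings are illustrative.

```php
<?php
declare(strict_types=1);

// Stand-in with the documented constructor signature and getters. The real
// class should be used in practice; these bodies are assumptions.
final class Token
{
    public function __construct(
        private string $text,
        private int $sentenceIndex,
        private int $order,
        private bool $isWord,
        private int $wordCount = 1,
        private string $reading = ''
    ) {
    }

    public function getText(): string { return $this->text; }
    public function getWordCount(): int { return $this->wordCount; }
    public function getReading(): string { return $this->reading; }
    public function isWord(): bool { return $this->isWord; }
}

// A two-word expression treated as one learnable token: $wordCount is 2.
$expression = new Token('of course', 0, 0, true, 2);

// A single word with a phonetic reading. $wordCount must be passed
// explicitly (as 1) because $reading comes after it positionally.
$word = new Token('日本語', 1, 0, true, 1, 'にほんご');

// Punctuation: not a word, and both optional arguments keep their defaults.
$period = new Token('。', 1, 1, false);
```

With the real class on PHP 8.0 or later, named arguments such as `reading: 'にほんご'` would avoid having to spell out `$wordCount` just to reach `$reading`.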
getOrder()
Get the order/position within the sentence.
public
getOrder() : int
Return values
int —Order within sentence (0-based)
getReading()
Get the phonetic reading.
public
getReading() : string
Return values
string —Phonetic reading or empty string if not available
getSentenceIndex()
Get the sentence index this token belongs to.
public
getSentenceIndex() : int
Return values
int —Sentence index (0-based)
getText()
Get the token text content.
public
getText() : string
Return values
string —Token text
getWordCount()
Get the word count (for multi-word expressions).
public
getWordCount() : int
Return values
int —Word count (1 for single words)
isWord()
Check if this token is a learnable word.
public
isWord() : bool
Return values
bool —True for words, false for punctuation/whitespace
nonWord()
Create a non-word token (punctuation, whitespace, etc.).
public
static nonWord(string $text, int $sentenceIndex, int $order) : self
Parameters
- $text : string
  Token text
- $sentenceIndex : int
  Index of the sentence this token belongs to (0-based)
- $order : int
  Position of this token within its sentence (0-based)
Return values
self —New non-word token
word()
Create a word token.
public
static word(string $text, int $sentenceIndex, int $order[, string $reading = '' ]) : self
Parameters
- $text : string
  Word text
- $sentenceIndex : int
  Index of the sentence this token belongs to (0-based)
- $order : int
  Position of this token within its sentence (0-based)
- $reading : string = ''
  Optional phonetic reading (e.g., furigana for Japanese)
Return values
self —New word token
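The two static factories read as named constructors that fix the `$isWord` flag so callers do not pass a bare boolean. The sketch below shows one plausible way they could map onto the constructor. This mapping is an assumption inferred from the documented signatures, not the package's actual source, and the `Token` class here is a stand-in (PHP 8.0 or later assumed).

```php
<?php
declare(strict_types=1);

// Stand-in class. The factory bodies below are an assumed implementation
// derived from the documented signatures only.
final class Token
{
    public function __construct(
        private string $text,
        private int $sentenceIndex,
        private int $order,
        private bool $isWord,
        private int $wordCount = 1,
        private string $reading = ''
    ) {
    }

    // Assumed: a word token with the default word count of 1.
    public static function word(string $text, int $sentenceIndex, int $order, string $reading = ''): self
    {
        return new self($text, $sentenceIndex, $order, true, 1, $reading);
    }

    // Assumed: a non-word token with the default word count and no reading.
    public static function nonWord(string $text, int $sentenceIndex, int $order): self
    {
        return new self($text, $sentenceIndex, $order, false);
    }

    public function isWord(): bool { return $this->isWord; }
    public function getReading(): string { return $this->reading; }
    public function getWordCount(): int { return $this->wordCount; }
}

// "猫、" split into a word with its reading and a trailing punctuation mark.
$neko  = Token::word('猫', 0, 0, 'ねこ');
$comma = Token::nonWord('、', 0, 1);
```

Note that `word()` takes no `$wordCount` parameter, so a multi-word expression would presumably be created through the constructor directly.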