Documentation

PHP extends Tokenizer
in package

Table of Contents

Constants

T_STRING_CONTEXTS  = [T_OBJECT_OPERATOR => true, T_NULLSAFE_OBJECT_OPERATOR => true, T_FUNCTION => true, T_CLASS => true, T_INTERFACE => true, T_TRAIT => true, T_ENUM => true, T_ENUM_CASE => true, T_EXTENDS => true, T_IMPLEMENTS => true, T_ATTRIBUTE => true, T_NEW => true, T_CONST => true, T_NS_SEPARATOR => true, T_USE => true, T_NAMESPACE => true, T_PAAMAYIM_NEKUDOTAYIM => true, T_GOTO => true]
Contexts in which keywords should always be tokenized as T_STRING.
PHP_LABEL_REGEX  = '`^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$`'
Regular expression to check if a given identifier name is valid for use in PHP.

Properties

$endScopeTokens  : array<int|string, int|string>
A list of tokens that end the scope.
$ignoredLines  : array<string|int, mixed>
A list of lines being ignored due to error suppression comments.
$knownLengths  : array<string|int, int>
Known lengths of tokens.
$scopeOpeners  : array<string|int, mixed>
A list of tokens that are allowed to open a scope.
$config  : Config
The config data for the run.
$eolChar  : string
The EOL char used in the content.
$numTokens  : int
The number of tokens in the tokens array.
$tokens  : array<string|int, mixed>
A token-based representation of the content.
$tstringContexts  : array<string|int, mixed>
Contexts in which keywords should always be tokenized as T_STRING.
$resolveTokenCache  : array<string|int, mixed>
A cache of different token types, resolved into arrays.

Methods

__construct()  : void
Initialise and run the tokenizer.
getTokens()  : array<string|int, mixed>
Gets the array of tokens.
replaceTabsInToken()  : void
Replaces tabs in original token content with spaces.
resolveSimpleToken()  : array<string|int, mixed>
Converts simple tokens into a format that conforms to complex tokens produced by token_get_all().
standardiseToken()  : array<string|int, mixed>
Takes a token produced from token_get_all() and produces a more uniform token.
isMinifiedContent()  : bool
Checks the content to see if it looks minified.
processAdditional()  : void
Performs additional processing after main tokenizing.
tokenize()  : array<string|int, mixed>
Creates an array of tokens when given some PHP code.
createAttributesNestingMap()  : void
Creates a map for the attributes tokens that surround other tokens.
findCloser()  : int|null
Finds a "closer" token (a closing parenthesis or square bracket, for example). Handles parenthesis balancing while searching for the closing token.
parsePhpAttribute()  : array<string|int, mixed>
PHP 8 attributes parser for PHP < 8. Handles single-line and multiline attributes.

Constants

T_STRING_CONTEXTS

Contexts in which keywords should always be tokenized as T_STRING.

protected array<int|string, true> T_STRING_CONTEXTS = [T_OBJECT_OPERATOR => true, T_NULLSAFE_OBJECT_OPERATOR => true, T_FUNCTION => true, T_CLASS => true, T_INTERFACE => true, T_TRAIT => true, T_ENUM => true, T_ENUM_CASE => true, T_EXTENDS => true, T_IMPLEMENTS => true, T_ATTRIBUTE => true, T_NEW => true, T_CONST => true, T_NS_SEPARATOR => true, T_USE => true, T_NAMESPACE => true, T_PAAMAYIM_NEKUDOTAYIM => true, T_GOTO => true]
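Because the map is keyed by token code with true as every value, a context check reduces to a single isset() on the code of the preceding token. The following is a minimal sketch, assuming a reduced local copy of the map (the real constant is protected) and a hypothetical helper named isTStringContext(); it is not taken from the tokenizer itself.

```php
<?php
// Reduced, local stand-in for the protected constant. Only tokens that are
// native to PHP are used so the sketch runs without the library loaded.
$tStringContexts = [
    T_OBJECT_OPERATOR => true,
    T_FUNCTION        => true,
    T_NEW             => true,
    T_CONST           => true,
];

// Hypothetical helper: true when the preceding token code is a context in
// which a keyword should be treated as T_STRING.
function isTStringContext($previousCode, array $contexts): bool
{
    return isset($contexts[$previousCode]);
}

$afterArrow = isTStringContext(T_OBJECT_OPERATOR, $tStringContexts);
$afterEcho  = isTStringContext(T_ECHO, $tStringContexts);
```

Keying by token code rather than storing a plain list avoids a linear in_array() scan for every token examined.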

PHP_LABEL_REGEX

Regular expression to check if a given identifier name is valid for use in PHP.

private string PHP_LABEL_REGEX = '`^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$`'
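As an illustration of what the pattern accepts, the sketch below applies a local copy of it with preg_match(). The constant is private, so the copy and the helper name isValidLabel() are assumptions made for this example only.

```php
<?php
// Local copy of the pattern documented above (the class constant is private).
$labelRegex = '`^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$`';

// Hypothetical helper: true when $name matches the label pattern.
function isValidLabel(string $name, string $regex): bool
{
    return preg_match($regex, $name) === 1;
}

$checks = [
    'myVar'   => isValidLabel('myVar', $labelRegex),
    '_hidden' => isValidLabel('_hidden', $labelRegex),
    '9lives'  => isValidLabel('9lives', $labelRegex), // starts with a digit
    'a-b'     => isValidLabel('a-b', $labelRegex),    // contains a hyphen
];
```

The first character class excludes digits, while the second allows them, which is why a leading digit is rejected but a trailing one is not.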

Properties

$endScopeTokens

A list of tokens that end the scope.

public array<int|string, int|string> $endScopeTokens = [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDIF => T_ENDIF, T_ENDFOR => T_ENDFOR, T_ENDFOREACH => T_ENDFOREACH, T_ENDWHILE => T_ENDWHILE, T_ENDSWITCH => T_ENDSWITCH, T_ENDDECLARE => T_ENDDECLARE, T_BREAK => T_BREAK, T_END_HEREDOC => T_END_HEREDOC, T_END_NOWDOC => T_END_NOWDOC]

This array is just a unique collection of the end tokens from the scopeOpeners array. The data is duplicated here to save time during parsing of the file.
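The idea of collapsing end tokens into one lookup table can be sketched as follows. This is an illustration under assumptions: strings stand in for token constants, and only two openers are modelled. The default value shown above is a fixed list and is not a mechanical union of every 'end' entry (T_ELSE, for instance, does not appear in it).

```php
<?php
// Reduced stand-in for $scopeOpeners, using strings in place of token
// constants so the sketch runs without the library loaded.
$scopeOpeners = [
    'T_FOR'   => [
        'end' => [
            'T_CLOSE_CURLY_BRACKET' => 'T_CLOSE_CURLY_BRACKET',
            'T_ENDFOR'              => 'T_ENDFOR',
        ],
    ],
    'T_WHILE' => [
        'end' => [
            'T_CLOSE_CURLY_BRACKET' => 'T_CLOSE_CURLY_BRACKET',
            'T_ENDWHILE'            => 'T_ENDWHILE',
        ],
    ],
];

$endScopeTokens = [];
foreach ($scopeOpeners as $opener) {
    // Array union keeps the first entry per key, so duplicates collapse.
    $endScopeTokens += $opener['end'];
}

// A single isset() now answers "can this token end a scope?".
$isEnd = isset($endScopeTokens['T_ENDWHILE']);
```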

$ignoredLines

A list of lines being ignored due to error suppression comments.

public array<string|int, mixed> $ignoredLines = []

$knownLengths

Known lengths of tokens.

public array<string|int, int> $knownLengths = [T_ABSTRACT => 8, T_AND_EQUAL => 2, T_ARRAY => 5, T_AS => 2, T_BOOLEAN_AND => 2, T_BOOLEAN_OR => 2, T_BREAK => 5, T_CALLABLE => 8, T_CASE => 4, T_CATCH => 5, T_CLASS => 5, T_CLASS_C => 9, T_CLONE => 5, T_CONCAT_EQUAL => 2, T_CONST => 5, T_CONTINUE => 8, T_CURLY_OPEN => 2, T_DEC => 2, T_DECLARE => 7, T_DEFAULT => 7, T_DIR => 7, T_DIV_EQUAL => 2, T_DO => 2, T_DOLLAR_OPEN_CURLY_BRACES => 2, T_DOUBLE_ARROW => 2, T_DOUBLE_COLON => 2, T_ECHO => 4, T_ELLIPSIS => 3, T_ELSE => 4, T_ELSEIF => 6, T_EMPTY => 5, T_ENDDECLARE => 10, T_ENDFOR => 6, T_ENDFOREACH => 10, T_ENDIF => 5, T_ENDSWITCH => 9, T_ENDWHILE => 8, T_ENUM => 4, T_ENUM_CASE => 4, T_EVAL => 4, T_EXTENDS => 7, T_FILE => 8, T_FINAL => 5, T_FINALLY => 7, T_FN => 2, T_FOR => 3, T_FOREACH => 7, T_FUNCTION => 8, T_FUNC_C => 12, T_GLOBAL => 6, T_GOTO => 4, T_GOTO_COLON => 1, T_HALT_COMPILER => 15, T_IF => 2, T_IMPLEMENTS => 10, T_INC => 2, T_INCLUDE => 7, T_INCLUDE_ONCE => 12, T_INSTANCEOF => 10, T_INSTEADOF => 9, T_INTERFACE => 9, T_ISSET => 5, T_IS_EQUAL => 2, T_IS_GREATER_OR_EQUAL => 2, T_IS_IDENTICAL => 3, T_IS_NOT_EQUAL => 2, T_IS_NOT_IDENTICAL => 3, T_IS_SMALLER_OR_EQUAL => 2, T_LINE => 8, T_LIST => 4, T_LOGICAL_AND => 3, T_LOGICAL_OR => 2, T_LOGICAL_XOR => 3, T_MATCH => 5, T_MATCH_ARROW => 2, T_MATCH_DEFAULT => 7, T_METHOD_C => 10, T_MINUS_EQUAL => 2, T_POW_EQUAL => 3, T_MOD_EQUAL => 2, T_MUL_EQUAL => 2, T_NAMESPACE => 9, T_NS_C => 13, T_NS_SEPARATOR => 1, T_NEW => 3, T_NULLSAFE_OBJECT_OPERATOR => 3, T_OBJECT_OPERATOR => 2, T_OPEN_TAG_WITH_ECHO => 3, T_OR_EQUAL => 2, T_PLUS_EQUAL => 2, T_PRINT => 5, T_PRIVATE => 7, T_PRIVATE_SET => 12, T_PUBLIC => 6, T_PUBLIC_SET => 11, T_PROTECTED => 9, T_PROTECTED_SET => 14, T_READONLY => 8, T_REQUIRE => 7, T_REQUIRE_ONCE => 12, T_RETURN => 6, T_STATIC => 6, T_SWITCH => 6, T_THROW => 5, T_TRAIT => 5, T_TRAIT_C => 9, T_TRY => 3, T_UNSET => 5, T_USE => 3, T_VAR => 3, T_WHILE => 5, T_XOR_EQUAL => 2, T_YIELD => 5, T_OPEN_CURLY_BRACKET => 
1, T_CLOSE_CURLY_BRACKET => 1, T_OPEN_SQUARE_BRACKET => 1, T_CLOSE_SQUARE_BRACKET => 1, T_OPEN_PARENTHESIS => 1, T_CLOSE_PARENTHESIS => 1, T_COLON => 1, T_STRING_CONCAT => 1, T_INLINE_THEN => 1, T_INLINE_ELSE => 1, T_NULLABLE => 1, T_NULL => 4, T_FALSE => 5, T_TRUE => 4, T_SEMICOLON => 1, T_EQUAL => 1, T_MULTIPLY => 1, T_DIVIDE => 1, T_PLUS => 1, T_MINUS => 1, T_MODULUS => 1, T_POW => 2, T_SPACESHIP => 3, T_COALESCE => 2, T_COALESCE_EQUAL => 3, T_BITWISE_AND => 1, T_BITWISE_OR => 1, T_BITWISE_XOR => 1, T_SL => 2, T_SR => 2, T_SL_EQUAL => 3, T_SR_EQUAL => 3, T_GREATER_THAN => 1, T_LESS_THAN => 1, T_BOOLEAN_NOT => 1, T_SELF => 4, T_PARENT => 6, T_COMMA => 1, T_CLOSURE => 8, T_BACKTICK => 1, T_OPEN_SHORT_ARRAY => 1, T_CLOSE_SHORT_ARRAY => 1, T_TYPE_UNION => 1, T_TYPE_INTERSECTION => 1, T_TYPE_OPEN_PARENTHESIS => 1, T_TYPE_CLOSE_PARENTHESIS => 1]

$scopeOpeners

A list of tokens that are allowed to open a scope.

public array<string|int, mixed> $scopeOpeners = [T_IF => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET, T_COLON => T_COLON], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDIF => T_ENDIF, T_ELSE => T_ELSE, T_ELSEIF => T_ELSEIF], 'strict' => false, 'shared' => false, 'with' => [T_ELSE => T_ELSE, T_ELSEIF => T_ELSEIF]], T_TRY => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_CATCH => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_FINALLY => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_ELSE => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET, T_COLON => T_COLON], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDIF => T_ENDIF], 'strict' => false, 'shared' => false, 'with' => [T_IF => T_IF, T_ELSEIF => T_ELSEIF]], T_ELSEIF => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET, T_COLON => T_COLON], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDIF => T_ENDIF, T_ELSE => T_ELSE, T_ELSEIF => T_ELSEIF], 'strict' => false, 'shared' => false, 'with' => [T_IF => T_IF, T_ELSE => T_ELSE]], T_FOR => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET, T_COLON => T_COLON], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDFOR => T_ENDFOR], 'strict' => false, 'shared' => false, 'with' => []], T_FOREACH => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET, T_COLON => T_COLON], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDFOREACH => T_ENDFOREACH], 'strict' => false, 'shared' => false, 'with' => []], T_INTERFACE => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => 
[T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_FUNCTION => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_CLASS => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_TRAIT => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_ENUM => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_USE => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => false, 'shared' => false, 'with' => []], T_DECLARE => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET, T_COLON => T_COLON], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDDECLARE => T_ENDDECLARE], 'strict' => false, 'shared' => false, 'with' => []], T_NAMESPACE => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => false, 'shared' => false, 'with' => []], T_WHILE => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET, T_COLON => T_COLON], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDWHILE => T_ENDWHILE], 'strict' => false, 'shared' => false, 'with' => []], T_DO => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_SWITCH => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET, T_COLON => T_COLON], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, 
T_ENDSWITCH => T_ENDSWITCH], 'strict' => true, 'shared' => false, 'with' => []], T_CASE => ['start' => [T_COLON => T_COLON, T_SEMICOLON => T_SEMICOLON, T_CLOSE_TAG => T_CLOSE_TAG], 'end' => [T_BREAK => T_BREAK, T_RETURN => T_RETURN, T_CONTINUE => T_CONTINUE, T_THROW => T_THROW, T_EXIT => T_EXIT, T_GOTO => T_GOTO], 'strict' => true, 'shared' => true, 'with' => [T_DEFAULT => T_DEFAULT, T_CASE => T_CASE, T_SWITCH => T_SWITCH]], T_DEFAULT => ['start' => [T_COLON => T_COLON, T_SEMICOLON => T_SEMICOLON, T_CLOSE_TAG => T_CLOSE_TAG], 'end' => [T_BREAK => T_BREAK, T_RETURN => T_RETURN, T_CONTINUE => T_CONTINUE, T_THROW => T_THROW, T_EXIT => T_EXIT, T_GOTO => T_GOTO], 'strict' => true, 'shared' => true, 'with' => [T_CASE => T_CASE, T_SWITCH => T_SWITCH]], T_MATCH => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_START_HEREDOC => ['start' => [T_START_HEREDOC => T_START_HEREDOC], 'end' => [T_END_HEREDOC => T_END_HEREDOC], 'strict' => true, 'shared' => false, 'with' => []], T_START_NOWDOC => ['start' => [T_START_NOWDOC => T_START_NOWDOC], 'end' => [T_END_NOWDOC => T_END_NOWDOC], 'strict' => true, 'shared' => false, 'with' => []]]

Each entry also records which tokens the scope opener uses to open and close the scope ('start' and 'end'), whether the token strictly requires an opener ('strict'), whether the token can share a scope closer ('shared'), and which tokens it can share that closer with ('with'). An example of a token that shares a scope closer is a CASE scope.
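Reading one of these entries might look like the sketch below. It assumes a reduced copy of the array with strings in place of token constants, and the helper canShareCloser() is hypothetical; it only shows how the 'shared' and 'with' keys relate.

```php
<?php
// Reduced stand-in for two entries of $scopeOpeners.
$scopeOpeners = [
    'T_CASE' => [
        'start'  => ['T_COLON' => 'T_COLON', 'T_SEMICOLON' => 'T_SEMICOLON'],
        'end'    => ['T_BREAK' => 'T_BREAK', 'T_RETURN' => 'T_RETURN'],
        'strict' => true,
        'shared' => true,
        'with'   => [
            'T_DEFAULT' => 'T_DEFAULT',
            'T_CASE'    => 'T_CASE',
            'T_SWITCH'  => 'T_SWITCH',
        ],
    ],
    'T_TRY'  => [
        'start'  => ['T_OPEN_CURLY_BRACKET' => 'T_OPEN_CURLY_BRACKET'],
        'end'    => ['T_CLOSE_CURLY_BRACKET' => 'T_CLOSE_CURLY_BRACKET'],
        'strict' => true,
        'shared' => false,
        'with'   => [],
    ],
];

// Hypothetical helper: can $opener share its scope closer with $other?
function canShareCloser(array $scopeOpeners, string $opener, string $other): bool
{
    $entry = $scopeOpeners[$opener] ?? null;
    if ($entry === null || $entry['shared'] === false) {
        return false;
    }

    return isset($entry['with'][$other]);
}

$caseWithDefault = canShareCloser($scopeOpeners, 'T_CASE', 'T_DEFAULT');
$tryWithCatch    = canShareCloser($scopeOpeners, 'T_TRY', 'T_CATCH');
```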

$eolChar

The EOL char used in the content.

protected string $eolChar = ''

$numTokens

The number of tokens in the tokens array.

protected int $numTokens = 0

$tokens

A token-based representation of the content.

protected array<string|int, mixed> $tokens = []

$tstringContexts

Contexts in which keywords should always be tokenized as T_STRING.

Deprecated: use the PHP::T_STRING_CONTEXTS constant instead.

protected array<string|int, mixed> $tstringContexts = self::T_STRING_CONTEXTS

$resolveTokenCache

A cache of different token types, resolved into arrays.

private static array<string|int, mixed> $resolveTokenCache = []
Tags
see
standardiseToken()

Methods

__construct()

Initialise and run the tokenizer.

public __construct(string $content, Config|null $config[, string $eolChar = '\n' ]) : void
Parameters
$content : string

The content to tokenize.

$config : Config|null

The config data for the run.

$eolChar : string = '\n'

The EOL char used in the content.

Tags
throws
TokenizerException

If the file appears to be minified.

getTokens()

Gets the array of tokens.

public getTokens() : array<string|int, mixed>
Return values
array<string|int, mixed>

replaceTabsInToken()

Replaces tabs in original token content with spaces.

public replaceTabsInToken(array<string|int, mixed> &$token[, string $prefix = ' ' ][, string $padding = ' ' ][, int|null $tabWidth = null ]) : void

Each tab can represent between 1 and $config->tabWidth spaces, so this cannot be a straight string replace. The original content is placed into an orig_content index and the new token length is also set in the length index.

Parameters
$token : array<string|int, mixed>

The token to replace tabs inside.

$prefix : string = ' '

The character to use to represent the start of a tab.

$padding : string = ' '

The character to use to represent the end of a tab.

$tabWidth : int|null = null

The number of spaces each tab represents.
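To see why a tab cannot simply be swapped for a fixed number of spaces, consider the sketch below. It is not the library's implementation: expandTabs() is a hypothetical helper that assumes single-byte characters and 1-based columns, and it ignores the prefix and padding parameters described above.

```php
<?php
// Hypothetical helper: expands each tab to the next tab stop.
function expandTabs(string $content, int $tabWidth, int $startColumn = 1): string
{
    $result = '';
    $column = $startColumn;
    $length = strlen($content);

    for ($i = 0; $i < $length; $i++) {
        if ($content[$i] !== "\t") {
            $result .= $content[$i];
            $column++;
            continue;
        }

        // Spaces needed to reach the next tab stop: always 1..$tabWidth.
        $spaces  = $tabWidth - (($column - 1) % $tabWidth);
        $result .= str_repeat(' ', $spaces);
        $column += $spaces;
    }

    return $result;
}

$shortTab = expandTabs("abc\tx", 4); // the tab fills one column
$fullTab  = expandTabs("\tx", 4);    // the tab fills four columns
```

The width of each tab depends on the column at which it starts, so the same tab character can expand to different numbers of spaces within one line.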

resolveSimpleToken()

Converts simple tokens into a format that conforms to complex tokens produced by token_get_all().

public static resolveSimpleToken(string $token) : array<string|int, mixed>

Simple tokens are tokens that are not in array form when produced from token_get_all().

Parameters
$token : string

The simple token to convert.

Return values
array<string|int, mixed>

The new token in array format.
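The distinction between simple and complex tokens can be seen directly in the output of token_get_all(): punctuation such as ; arrives as a bare string, while everything else arrives as an array of id, text and line. The sketch below normalises both shapes into one array form. The function normaliseToken() and its 'code', 'type' and 'content' keys are assumptions for illustration, not a description of the exact array this class returns.

```php
<?php
// Hypothetical normaliser, loosely modelled on the description above.
function normaliseToken($token): array
{
    if (is_array($token) === true) {
        // Complex token: [id, text, line].
        return [
            'code'    => $token[0],
            'type'    => token_name($token[0]),
            'content' => $token[1],
        ];
    }

    // Simple token: a bare string such as ';' or '('.
    return [
        'code'    => null,
        'type'    => 'SIMPLE',
        'content' => $token,
    ];
}

$normalised = array_map('normaliseToken', token_get_all('<?php echo 1;'));
```

After normalisation, every element can be read through the same keys, which removes the need for is_array() checks in later processing.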

standardiseToken()

Takes a token produced from token_get_all() and produces a more uniform token.

public static standardiseToken(string|array<string|int, mixed> $token) : array<string|int, mixed>
Parameters
$token : string|array<string|int, mixed>

The token to convert.

Return values
array<string|int, mixed>

The new token.

isMinifiedContent()

Checks the content to see if it looks minified.

protected isMinifiedContent(string $content[, string $eolChar = '\n' ]) : bool
Parameters
$content : string

The content to tokenize.

$eolChar : string = '\n'

The EOL char used in the content.

Return values
bool

processAdditional()

Performs additional processing after main tokenizing.

protected processAdditional() : void

This additional processing checks for CASE statements that use curly braces as scope openers and closers. It also turns some T_FUNCTION tokens into T_CLOSURE when they are not standard function definitions, detects short array syntax and converts those square brackets into new tokens, corrects some usages of the static and class keywords, and assigns tokens to function return types.

tokenize()

Creates an array of tokens when given some PHP code.

protected tokenize(string $code) : array<string|int, mixed>

Starts by using token_get_all() but does a lot of extra processing to insert information about the context of the token.

Parameters
$code : string

The code to tokenize.

Return values
array<string|int, mixed>

createAttributesNestingMap()

Creates a map for the attributes tokens that surround other tokens.

private createAttributesNestingMap() : void

findCloser()

Finds a "closer" token (a closing parenthesis or square bracket, for example). Handles parenthesis balancing while searching for the closing token.

private findCloser(array<string|int, mixed> &$tokens, int $start, string|array<string|int, string> $openerTokens, string $closerChar) : int|null
Parameters
$tokens : array<string|int, mixed>

The list of tokens to search for the closing token (as returned by token_get_all()).

$start : int

The starting position.

$openerTokens : string|array<string|int, string>

The opening character, or an array of opening characters.

$closerChar : string

The closing character.

Return values
int|null

The position of the closing token, if found. NULL otherwise.
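Balancing while searching can be sketched with a depth counter over the raw token_get_all() output. The function below is a hypothetical illustration, not the private method itself; it assumes that $start is the index just after the opener whose closer is wanted, and that openers and closers are matched on their text.

```php
<?php
// Hypothetical sketch of a balanced-closer search.
function findCloserSketch(array $tokens, int $start, $openerTokens, string $closerChar): ?int
{
    $openers = (array) $openerTokens;
    $depth   = 1;
    $count   = count($tokens);

    for ($i = $start; $i < $count; $i++) {
        // Complex tokens are arrays; compare on their text.
        $text = is_array($tokens[$i]) ? $tokens[$i][1] : $tokens[$i];

        if (in_array($text, $openers, true) === true) {
            // A nested opener must be closed before the one we want.
            $depth++;
        } elseif ($text === $closerChar) {
            $depth--;
            if ($depth === 0) {
                return $i;
            }
        }
    }

    // Ran out of tokens without balancing.
    return null;
}

$tokens = token_get_all('<?php foo(a(1), b);');
$opener = array_search('(', $tokens, true);
$closer = findCloserSketch($tokens, $opener + 1, '(', ')');
```

Here the inner call a(1) raises the depth to two and lowers it again, so the search skips its closing parenthesis and returns the index of the outer one.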

parsePhpAttribute()

PHP 8 attributes parser for PHP < 8. Handles single-line and multiline attributes.

private parsePhpAttribute(array<string|int, mixed> &$tokens, int $stackPtr) : array<string|int, mixed>
Parameters
$tokens : array<string|int, mixed>

The original array of tokens (as returned by token_get_all()).

$stackPtr : int

The current position in the token array.

Return values
array<string|int, mixed>

The array of parsed attribute tokens.


        