
Font extends PDFObject in package Lwt

Class Font

Constants

COMMAND = 'c'
MISSING = '?'
OPERATOR = 'o'
TYPE = 't'

Properties

$recursionStack : array<string|int, mixed>: The recursion stack.
$addPositionWhitespace : bool
$config : Config|null
$content : string
$document : Document|null
$header : Header
$table : array<string|int, mixed>
$tableSizes : array<string|int, mixed>
$initializedEncodingByPdfObject : Encoding: In some PDF files, an encoding may be referenced by object ID even though the object itself does not contain `/Type /Encoding` in its dictionary. Such objects are not initialized as Encoding in \Smalot\PdfParser\PDFObject::factory() during file parsing (they remain plain PDFObject instances).
$uchrCache : array<string|int, mixed>: Caches results from uchr.

Methods

__construct() : mixed
calculateTextWidth() : float|null: Calculate text width using the 'Widths' entry of the font header. If the width of a character is not found, the character is added to the $missing array.
decodeContent() : string: Decode the given $text to a UTF-8 encoded string.
decodeEntities() : string: Decode a string containing HTML-entity-encoded characters.
decodeHexadecimal() : string: Decode a hexadecimal-encoded string. If $add_braces is true, the result is wrapped in parentheses.
decodeOctal() : string: Decode a string containing octal-encoded chunks.
decodeText() : string: Decode text from an array of commands.
decodeUnicode() : string: Check (by BOM) whether the given string is Unicode text; if so, decode it to a UTF-8 encoded string.
factory() : self
get() : Element|PDFObject|Header
getCommandsText() : array<string|int, mixed>: getCommandsText() expects the content of $text_part to be an already formatted, single-line command from a document stream.
getConfig() : Config|null
getContent() : string|null
getDetails() : array<string|int, mixed>
getDocument() : Document
getHeader() : Header|null
getName() : string
getText() : string: Returns the text content of a PDF as a string. Attempts to add whitespace for spacing and line-breaks where appropriate.
getTextArray() : array<string|int, mixed>: Returns the text content of a PDF as an array of strings. No extra whitespace is inserted besides what is actually encoded in the PDF text.
getType() : string
has() : bool
init() : mixed
loadTranslateTable() : array<string|int, mixed>: Initialize the internal character translation table from the ToUnicode CMap.
setTable() : void: Set a custom character translation table, where each key is an integer character code and each value is a UTF-8 encoded string.
translateChar() : string|bool
uchr() : string: Convert a Unicode character code to a UTF-8 encoded string.
getFontSpaceLimit() : int
getUniqueId() : string: Returns unique id identifying the object.
createEncodingByPdfObject() : Encoding: Create an Encoding instance from a PDFObject instance (without initializing it).
createInitializedEncodingByPdfObject() : Encoding: Create an Encoding instance from a PDFObject instance and initialize it.
decodeContentByAutodetectIfNecessary() : string|false: If the string appears to be UTF-8 encoded, return it unchanged.
decodeContentByEncoding() : string|null: Decode content using an Encoding instance of any type (the dictionary's Encoding item).
decodeContentByEncodingElement() : string|null: Decode content when $encoding (given by $this->get('Encoding')) is an instance of Element.
decodeContentByEncodingEncoding() : string: Decode content when $encoding (given by $this->get('Encoding')) is an instance of Encoding.
decodeContentByToUnicodeCMapOrDescendantFonts() : string: First, try to decode $text using the ToUnicode CMap.
getIconvEncodingNameOrNullByPdfEncodingName() : string|null: Convert a PDF encoding name to an iconv-known encoding name.
getInitializedEncodingByPdfObject() : Encoding: Return the Encoding instance already created for a PDFObject instance, or create a new one if none was created before.

COMMAND


    public mixed COMMAND = 'c'

MISSING


    public mixed MISSING = '?'

OPERATOR


    public mixed OPERATOR = 'o'

TYPE


    public mixed TYPE = 't'

$recursionStack

The recursion stack.


    public static array<string|int, mixed> $recursionStack = []

$addPositionWhitespace


    protected bool $addPositionWhitespace = false

$config


    protected Config|null $config

$content


    protected string $content

$document


    protected Document|null $document

$header


    protected Header $header

$table


    protected array<string|int, mixed> $table

$tableSizes


    protected array<string|int, mixed> $tableSizes

$initializedEncodingByPdfObject

In some PDF files, an encoding may be referenced by object ID even though the object itself does not contain `/Type /Encoding` in its dictionary. Such objects are not initialized as Encoding in \Smalot\PdfParser\PDFObject::factory() during file parsing (they remain plain PDFObject instances).


    private Encoding $initializedEncodingByPdfObject

Therefore, an Encoding instance is created from such objects during decoding and cached in this property.

$uchrCache

Caches results from uchr.


    private static array<string|int, mixed> $uchrCache = []

__construct()


    public __construct(Document $document[, Header|null $header = null ][, string|null $content = null ][, Config|null $config = null ]) : mixed

Parameters

$document : Document
$header : Header|null = null
$content : string|null = null
$config : Config|null = null

calculateTextWidth()

Calculate text width using the 'Widths' entry of the font header. If the width of a character is not found, the character is added to the $missing array.


    public calculateTextWidth(string $text[, array<string|int, mixed>|null &$missing = null ]) : float|null

Parameters

$text : string
$missing : array<string|int, mixed>|null = null

Return values

float|null
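The summing step described above can be pictured with a minimal sketch. It is not the library's code: it assumes single-byte character codes, a Widths array whose first entry belongs to code $firstChar (as with a simple font's FirstChar entry), and no translation table; `sketchTextWidth` is a hypothetical name.

```php
<?php

/**
 * Hypothetical sketch: sums per-character widths from a PDF-style Widths array.
 * Assumes single-byte codes and that $widths[0] belongs to character code $firstChar.
 * Characters without a width entry are appended to $missing.
 */
function sketchTextWidth(string $text, array $widths, int $firstChar, ?array &$missing = null): ?float
{
    $width = null; // stays null if no character has a known width
    $missing = [];

    $length = strlen($text);
    for ($i = 0; $i < $length; ++$i) {
        $char = $text[$i];
        $index = ord($char) - $firstChar; // position of this code in $widths

        if (!array_key_exists($index, $widths)) {
            $missing[] = $char;
            continue;
        }
        $width = ($width ?? 0.0) + $widths[$index];
    }

    return $width;
}

$missing = [];
echo sketchTextWidth('ABZ', [250, 500, 600], 65, $missing); // 750
echo implode(',', $missing);                               // Z
```

Returning null when no width is found, rather than 0, mirrors the float|null return type documented above.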

decodeContent()

Decode the given $text to a UTF-8 encoded string.


    public decodeContent(string $text[, bool &$unicode = null ]) : string

Parameters

$text : string
$unicode : bool = null: This parameter is deprecated and may be removed in a future release.

Return values

string

decodeEntities()

Decode a string containing HTML-entity-encoded characters.


    public static decodeEntities(string $text) : string

Parameters

$text : string

Return values

string
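A minimal sketch of this kind of decoding follows. It assumes the encoded characters take the form `#` plus two hexadecimal digits, the escape syntax PDF uses in name objects (so `#20` stands for a space); that format is an assumption, not something this page states, and `sketchDecodeEntities` is a hypothetical name.

```php
<?php

/**
 * Hypothetical sketch: replaces "#XX" escapes (two hex digits) with the byte they encode.
 * The "#XX" format is an assumption based on PDF name-object escaping.
 */
function sketchDecodeEntities(string $text): string
{
    return preg_replace_callback(
        '/#([0-9A-Fa-f]{2})/',
        static function (array $m): string {
            return chr(hexdec($m[1])); // "#20" -> chr(0x20) -> " "
        },
        $text
    );
}

echo sketchDecodeEntities('Times#20New#20Roman'); // Times New Roman
```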

decodeHexadecimal()

Decode a hexadecimal-encoded string. If $add_braces is true, the result is wrapped in parentheses.


    public static decodeHexadecimal(string $hexa[, bool $add_braces = false ]) : string

Parameters

$hexa : string
$add_braces : bool = false

Return values

string
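The hexadecimal decoding and the $add_braces behaviour can be sketched as below. It is a simplified illustration under assumptions: each `<...>` group is decoded with pack('H*', ...), whitespace inside a group is ignored, and text outside the groups is copied unchanged. `sketchDecodeHexadecimal` is hypothetical and omits any extra escaping the library may apply.

```php
<?php

/**
 * Hypothetical sketch: decodes every <...> hexadecimal group in $hexa.
 * Text outside the angle brackets is copied unchanged; whitespace inside a group is ignored.
 */
function sketchDecodeHexadecimal(string $hexa, bool $add_braces = false): string
{
    return preg_replace_callback(
        '/<([0-9A-Fa-f\s]+)>/',
        static function (array $m) use ($add_braces): string {
            // Drop whitespace, then turn each pair of hex digits into one byte.
            $bytes = pack('H*', preg_replace('/\s+/', '', $m[1]));
            return $add_braces ? '(' . $bytes . ')' : $bytes;
        },
        $hexa
    );
}

echo sketchDecodeHexadecimal('<48656C6C6F>');        // Hello
echo sketchDecodeHexadecimal('<48 69> world', true); // (Hi) world
```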

decodeOctal()

Decode a string containing octal-encoded chunks.


    public static decodeOctal(string $text) : string

Parameters

$text : string

Return values

string
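A minimal sketch of octal decoding, assuming the chunks are PDF string escapes of the form backslash plus one to three octal digits (so `\101` is `A`) and that an escaped backslash must not start an escape. `sketchDecodeOctal` is a hypothetical name, not the library's method.

```php
<?php

/**
 * Hypothetical sketch: replaces octal escapes (a backslash followed by one to three
 * octal digits) with the byte they encode. An escaped backslash ("\\") becomes a
 * single literal backslash and never starts an octal escape.
 */
function sketchDecodeOctal(string $text): string
{
    return preg_replace_callback(
        '/\\\\(\\\\|[0-7]{1,3})/',
        static function (array $m): string {
            // "\\" becomes "\"; "\101" becomes chr(0101), i.e. "A" (high-order overflow ignored).
            return $m[1] === '\\' ? '\\' : chr(octdec($m[1]) & 0xFF);
        },
        $text
    );
}

echo sketchDecodeOctal('\\101\\102C'); // ABC
```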

decodeText()

Decode text from an array of commands.


    public decodeText(array<string|int, mixed> $commands[, float $fontFactor = 4 ]) : string

Parameters

$commands : array<string|int, mixed>
$fontFactor : float = 4

Return values

string

decodeUnicode()

Check (by BOM) whether the given string is Unicode text; if so, decode it to a UTF-8 encoded string.


    public static decodeUnicode(string $text) : string

Otherwise, return the text as is.

Parameters

$text : string

Return values

string
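The BOM check can be sketched as follows, assuming the Unicode text in question is UTF-16BE with a leading FE FF byte order mark (the form PDF text strings use). The helper names are hypothetical, and combining surrogate pairs is a choice made for this sketch; the library may handle them differently.

```php
<?php

/** Hypothetical helper: encodes one Unicode code point as UTF-8. */
function sketchUtf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

/**
 * Hypothetical sketch: if $text starts with the UTF-16BE BOM (FE FF), decode the rest
 * to UTF-8; otherwise return $text unchanged. Unpaired surrogates become U+FFFD.
 */
function sketchDecodeUnicode(string $text): string
{
    if (strncmp($text, "\xFE\xFF", 2) !== 0) {
        return $text; // no BOM: return the text as is
    }

    $out = '';
    $len = strlen($text);
    for ($i = 2; $i + 1 < $len; $i += 2) {
        $unit = (ord($text[$i]) << 8) | ord($text[$i + 1]);

        // A high surrogate followed by a low surrogate encodes one supplementary code point.
        if ($unit >= 0xD800 && $unit <= 0xDBFF && $i + 3 < $len) {
            $low = (ord($text[$i + 2]) << 8) | ord($text[$i + 3]);
            if ($low >= 0xDC00 && $low <= 0xDFFF) {
                $out .= sketchUtf8(0x10000 + (($unit - 0xD800) << 10) + ($low - 0xDC00));
                $i += 2; // skip the low surrogate as well
                continue;
            }
        }

        $out .= ($unit >= 0xD800 && $unit <= 0xDFFF) ? "\u{FFFD}" : sketchUtf8($unit);
    }

    return $out;
}

echo sketchDecodeUnicode("\xFE\xFF\x00H\x00i"); // Hi
```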

factory()


    public static factory(Document $document, Header $header, string|null $content[, Config|null $config = null ]) : self

Parameters

$document : Document
$header : Header
$content : string|null
$config : Config|null = null

Return values

self

get()


    public get(string $name) : Element|PDFObject|Header

Parameters

$name : string

Return values

Element|PDFObject|Header

getCommandsText()

getCommandsText() expects the content of $text_part to be an already formatted, single-line command from a document stream.


    public getCommandsText(string $text_part[, int &$offset = 0 ]) : array<string|int, mixed>

The companion function getSectionsText() returns a document stream as an array of single commands for just this purpose. Because of this, the argument $offset is no longer used, and may be removed in a future PdfParser release.

A better name for this function would be getCommandText() since it now always works on just one command.

Parameters

$text_part : string
$offset : int = 0

Return values

array<string|int, mixed>

getConfig()


    public getConfig() : Config|null

Return values

Config|null

getContent()


    public getContent() : string|null

Return values

string|null

getDetails()


    public getDetails([bool $deep = true ]) : array<string|int, mixed>

Parameters

$deep : bool = true

Return values

array<string|int, mixed>

getDocument()


    public getDocument() : Document

Return values

Document

getHeader()


    public getHeader() : Header|null

Return values

Header|null

getName()


    public getName() : string

Return values

string

getText()

Returns the text content of a PDF as a string. Attempts to add whitespace for spacing and line-breaks where appropriate.


    public getText([Page|null $page = null ]) : string

getText() leverages getTextArray() to get the content of the document, setting the addPositionWhitespace flag to true so whitespace is inserted in a logical way for reading by humans.

Parameters

$page : Page|null = null

Return values

string

getTextArray()

Returns the text content of a PDF as an array of strings. No extra whitespace is inserted besides what is actually encoded in the PDF text.


    public getTextArray([Page|null $page = null ]) : array<string|int, mixed>

Parameters

$page : Page|null = null

Return values

array<string|int, mixed>

getType()


    public getType() : string

Return values

string

has()


    public has(string $name) : bool

Parameters

$name : string

Return values

bool

init()


    public init() : mixed

loadTranslateTable()

Initialize the internal character translation table from the ToUnicode CMap.


    public loadTranslateTable() : array<string|int, mixed>

Return values

array<string|int, mixed>
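To illustrate what building a translation table from a ToUnicode CMap involves, here is a minimal sketch that handles only `beginbfchar`/`endbfchar` sections, where each line maps a hexadecimal source code to a UTF-16BE destination value. It ignores `bfrange` sections, which a real CMap parser must also handle; `sketchLoadBfcharTable` is hypothetical and relies on the iconv extension.

```php
<?php

/**
 * Hypothetical sketch: builds a table (int character code => UTF-8 string) from the
 * bfchar sections of a ToUnicode CMap. bfrange sections are NOT handled here.
 * Requires the iconv extension to convert UTF-16BE destination values to UTF-8.
 */
function sketchLoadBfcharTable(string $cmap): array
{
    $table = [];
    if (!preg_match_all('/beginbfchar(.*?)endbfchar/s', $cmap, $sections)) {
        return $table; // no bfchar sections found
    }

    foreach ($sections[1] as $section) {
        preg_match_all('/<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>/', $section, $pairs, PREG_SET_ORDER);
        foreach ($pairs as [, $src, $dst]) {
            // The source code is a hex integer; the destination is UTF-16BE encoded text.
            $table[hexdec($src)] = iconv('UTF-16BE', 'UTF-8', pack('H*', $dst));
        }
    }

    return $table;
}

print_r(sketchLoadBfcharTable("2 beginbfchar\n<01> <0041>\n<02> <00E9>\nendbfchar"));
```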

setTable()

Set a custom character translation table, where each key is an integer character code and each value is a UTF-8 encoded string.


    public setTable(array<string|int, mixed> $table) : void

Parameters

$table : array<string|int, mixed>

translateChar()


    public translateChar(string $char[, bool $use_default = true ]) : string|bool

Parameters

$char : string
$use_default : bool = true

Return values

string|bool
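The table set by setTable() and read by translateChar() can be sketched as a plain array lookup. In this hypothetical class, the raw character bytes are read as a big-endian integer code; on a miss it returns `?` (the value of the MISSING constant) when $use_default is true and false otherwise. That miss behaviour is an assumption inferred from the string|bool return type, not documented on this page.

```php
<?php

/**
 * Hypothetical sketch of a setTable()/translateChar() style lookup.
 * Keys are integer character codes; values are UTF-8 encoded strings.
 */
final class TranslationTableSketch
{
    /** @var array<int, string> */
    private array $table = [];

    public function setTable(array $table): void
    {
        $this->table = $table;
    }

    /** @return string|bool Translated char, '?' (assumed default) or false on a miss. */
    public function translateChar(string $char, bool $use_default = true)
    {
        $code = hexdec(bin2hex($char)); // e.g. "\x00\x41" -> 0x0041 -> 65
        if (array_key_exists($code, $this->table)) {
            return $this->table[$code];
        }
        return $use_default ? '?' : false; // assumed miss behaviour
    }
}

$font = new TranslationTableSketch();
$font->setTable([65 => 'A']);
echo $font->translateChar("\x00\x41"); // A
echo $font->translateChar('Z');        // ?
```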

uchr()

Convert a Unicode character code to a UTF-8 encoded string.


    public static uchr(int|float $code) : string

Parameters

$code : int|float: Unicode character code. It is cast to int internally.

Return values

string
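A minimal sketch of such a conversion, with the int cast and a static cache playing the role of the $uchrCache property. The class name is hypothetical, and the manual UTF-8 encoding is a stand-in for however the library actually performs the conversion.

```php
<?php

/**
 * Hypothetical sketch of uchr(): converts a Unicode code point to UTF-8,
 * casting float input to int and memoizing results in a static array.
 */
final class UchrSketch
{
    /** @var array<int, string> */
    private static array $cache = [];

    /** @param int|float $code */
    public static function uchr($code): string
    {
        $code = (int) $code; // int|float accepted, cast to int
        if (!isset(self::$cache[$code])) {
            self::$cache[$code] = self::encode($code);
        }
        return self::$cache[$code];
    }

    private static function encode(int $cp): string
    {
        if ($cp < 0x80) {
            return chr($cp);
        }
        if ($cp < 0x800) {
            return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
        }
        if ($cp < 0x10000) {
            return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
        }
        return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
            . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
    }
}

echo UchrSketch::uchr(233.0); // é
```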

getFontSpaceLimit()


    protected getFontSpaceLimit() : int

Return values

int

getUniqueId()

Returns unique id identifying the object.


    protected getUniqueId() : string

Return values

string

createEncodingByPdfObject()

Create an Encoding instance from a PDFObject instance (without initializing it).


    private createEncodingByPdfObject(PDFObject $PDFObject) : Encoding

Parameters

$PDFObject : PDFObject

Return values

Encoding

createInitializedEncodingByPdfObject()

Create an Encoding instance from a PDFObject instance and initialize it.


    private createInitializedEncodingByPdfObject(PDFObject $PDFObject) : Encoding

Parameters

$PDFObject : PDFObject

Return values

Encoding

decodeContentByAutodetectIfNecessary()

If the string appears to be UTF-8 encoded, return it unchanged.


    private decodeContentByAutodetectIfNecessary(string $text) : string|false

Otherwise, interpret the string as a "Windows-1252" encoded string.

Parameters

$text : string

Return values

string|false
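The autodetect-then-fallback logic can be sketched like this, assuming the iconv extension is available. The /u-modifier check is one common way to test for valid UTF-8 and is this sketch's choice, not necessarily the library's; `sketchDecodeByAutodetect` is hypothetical. iconv() returns false when conversion fails (for example, on bytes left undefined in Windows-1252), which is one way a string|false return type can arise.

```php
<?php

/**
 * Hypothetical sketch: return $text unchanged if it is valid UTF-8,
 * otherwise convert it from Windows-1252 (CP1252) to UTF-8.
 * Requires the iconv extension; iconv() returns false if conversion fails.
 *
 * @return string|false
 */
function sketchDecodeByAutodetect(string $text)
{
    // preg_match() with the /u modifier returns false for invalid UTF-8 input.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    return iconv('CP1252', 'UTF-8', $text);
}

echo sketchDecodeByAutodetect("caf\xE9"); // café
```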

decodeContentByEncoding()

Decode content using an Encoding instance of any type (the dictionary's Encoding item).


    private decodeContentByEncoding(string $text) : string|null

Parameters

$text : string

Return values

string|null

decodeContentByEncodingElement()

Decode content when $encoding (given by $this->get('Encoding')) is an instance of Element.


    private decodeContentByEncodingElement(string $text, Element $encoding) : string|null

Parameters

$text : string
$encoding : Element

Return values

string|null

decodeContentByEncodingEncoding()

Decode content when $encoding (given by $this->get('Encoding')) is an instance of Encoding.


    private decodeContentByEncodingEncoding(string $text, Encoding $encoding) : string

Parameters

$text : string
$encoding : Encoding

Return values

string

decodeContentByToUnicodeCMapOrDescendantFonts()

First, try to decode $text using the ToUnicode CMap.


    private decodeContentByToUnicodeCMapOrDescendantFonts(string $text) : string

If a character's translation is not found in the ToUnicode CMap, try the following:

- If DescendantFonts exists, try to decode the character with one of those fonts.
  - If decoding by DescendantFonts does not succeed, interpret $text as a string with "Windows-1252" encoding.
- If DescendantFonts does not exist, return "?" as the decoded character.

Parameters

$text : string

Return values

string
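The fallback order listed above can be sketched as a per-character loop. This is a hypothetical illustration: it assumes one byte per character code, represents the CMap as an array of int code => UTF-8 string, and stands in for each descendant font with a callable that returns the decoded character or null. Applying the Windows-1252 step per character is also an assumption; it uses the iconv extension and falls back to `?` if iconv cannot convert the byte.

```php
<?php

/**
 * Hypothetical sketch of the ToUnicode CMap -> DescendantFonts -> Windows-1252 chain.
 * $cmap: int code => UTF-8 string.
 * $descendantDecoders: callables (string $char): ?string standing in for descendant fonts.
 */
function sketchDecodeWithFallbacks(string $text, array $cmap, array $descendantDecoders): string
{
    $decoded = '';
    $length = strlen($text);

    for ($i = 0; $i < $length; ++$i) {
        $char = $text[$i];
        $code = ord($char);

        if (isset($cmap[$code])) {        // 1. ToUnicode CMap hit
            $decoded .= $cmap[$code];
            continue;
        }
        if ($descendantDecoders === []) { // no DescendantFonts: "?"
            $decoded .= '?';
            continue;
        }

        $result = null;
        foreach ($descendantDecoders as $decoder) { // 2. try each descendant font
            $result = $decoder($char);
            if ($result !== null) {
                break;
            }
        }

        // 3. descendant fonts failed: read the byte as Windows-1252
        $decoded .= $result ?? (iconv('CP1252', 'UTF-8', $char) ?: '?');
    }

    return $decoded;
}

$byB = static function (string $c): ?string {
    return $c === 'B' ? 'b' : null;
};
echo sketchDecodeWithFallbacks('AB', [0x41 => 'A'], [$byB]); // Ab
```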

getIconvEncodingNameOrNullByPdfEncodingName()

Convert PDF encoding name to iconv-known encoding name.


    private getIconvEncodingNameOrNullByPdfEncodingName(string $pdfEncodingName) : string|null

Parameters

$pdfEncodingName : string

Return values

string|null
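A lookup of this kind can be sketched as a small map. The entries are illustrative assumptions (WinAnsiEncoding is close to Windows-1252, and MACINTOSH is iconv's name for Mac OS Roman); the library's actual table may contain different or additional names, and `sketchIconvNameForPdfEncoding` is hypothetical.

```php
<?php

/**
 * Hypothetical sketch: maps a PDF encoding name to an iconv encoding name,
 * or returns null when no mapping is known. Entries are illustrative only.
 */
function sketchIconvNameForPdfEncoding(string $pdfEncodingName): ?string
{
    $map = [
        'WinAnsiEncoding' => 'CP1252',
        'MacRomanEncoding' => 'MACINTOSH',
    ];

    return $map[$pdfEncodingName] ?? null;
}

$name = sketchIconvNameForPdfEncoding('WinAnsiEncoding');
if ($name !== null) {
    echo iconv($name, 'UTF-8', "caf\xE9"); // café
}
```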

getInitializedEncodingByPdfObject()

Return the Encoding instance already created for a PDFObject instance, or create a new one if none was created before.


    private getInitializedEncodingByPdfObject(PDFObject $PDFObject) : Encoding

Parameters

$PDFObject : PDFObject

Return values

Encoding
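The create-once behaviour described for this method and for the $initializedEncodingByPdfObject property can be sketched as simple memoization. `EncodingStub` and `FontEncodingCacheSketch` are hypothetical stand-ins, and the PDFObject argument is left out for brevity.

```php
<?php

/** Hypothetical stand-in for an Encoding object that needs init() before use. */
final class EncodingStub
{
    public bool $initialized = false;

    public function init(): void
    {
        $this->initialized = true;
    }
}

/** Hypothetical sketch: create and initialize once, then reuse the cached instance. */
final class FontEncodingCacheSketch
{
    private ?EncodingStub $initializedEncoding = null;

    public function getInitializedEncoding(): EncodingStub
    {
        if ($this->initializedEncoding === null) {
            $encoding = new EncodingStub(); // create...
            $encoding->init();              // ...and initialize it
            $this->initializedEncoding = $encoding;
        }
        return $this->initializedEncoding;
    }
}

$font = new FontEncodingCacheSketch();
var_dump($font->getInitializedEncoding() === $font->getInitializedEncoding()); // bool(true)
```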
