Font extends PDFObject
in package

Class Font

Table of Contents

Constants

COMMAND  = 'c'
MISSING  = '?'
OPERATOR  = 'o'
TYPE  = 't'

Properties

$recursionStack  : array<string|int, mixed>
The recursion stack.
$addPositionWhitespace  : bool
$config  : Config|null
$content  : string
$document  : Document|null
$header  : Header
$table  : array<string|int, mixed>
$tableSizes  : array<string|int, mixed>
$initializedEncodingByPdfObject  : Encoding
In some PDF files, an encoding may be referenced by object id even though the object itself does not contain `/Type /Encoding` in its dictionary. Such objects are not initialized as Encoding in \Smalot\PdfParser\PDFObject::factory() during file parsing (they remain plain PDFObject instances).
$uchrCache  : array<string|int, mixed>
Caches results from uchr.

Methods

__construct()  : mixed
calculateTextWidth()  : float|null
Calculate text width using the 'Widths' data from the header. If a character's width is not found, the character is added to the $missing array.
decodeContent()  : string
Decode the given $text to a UTF-8 encoded string.
decodeEntities()  : string
Decode a string containing HTML-entity-encoded characters.
decodeHexadecimal()  : string
Decode a hexadecimal-encoded string. If $add_braces is true, the result is wrapped in parentheses.
decodeOctal()  : string
Decode a string containing octal-encoded chunks.
decodeText()  : string
Decode text from an array of commands.
decodeUnicode()  : string
Check whether the given string is Unicode text (detected by its BOM); if so, decode it to a UTF-8 encoded string.
factory()  : self
get()  : Element|PDFObject|Header
getCommandsText()  : array<string|int, mixed>
getCommandsText() expects the content of $text_part to be an already formatted, single-line command from a document stream.
getConfig()  : Config|null
getContent()  : string|null
getDetails()  : array<string|int, mixed>
getDocument()  : Document
getHeader()  : Header|null
getName()  : string
getText()  : string
Returns the text content of a PDF as a string. Attempts to add whitespace for spacing and line-breaks where appropriate.
getTextArray()  : array<string|int, mixed>
Returns the text content of a PDF as an array of strings. No extra whitespace is inserted besides what is actually encoded in the PDF text.
getType()  : string
has()  : bool
init()  : mixed
loadTranslateTable()  : array<string|int, mixed>
Initialize the internal character translation table from the ToUnicode CMap.
setTable()  : void
Set a custom character translation table in which each key is an integer character code and each value is the corresponding UTF-8 encoded string.
translateChar()  : string|bool
uchr()  : string
Convert a Unicode character code to a UTF-8 encoded string.
getFontSpaceLimit()  : int
getUniqueId()  : string
Returns unique id identifying the object.
createEncodingByPdfObject()  : Encoding
Create an Encoding instance from a PDFObject instance (without initializing it).
createInitializedEncodingByPdfObject()  : Encoding
Create an Encoding instance from a PDFObject instance and initialize it.
decodeContentByAutodetectIfNecessary()  : string|false
If the string appears to be UTF-8 encoded, return it unchanged.
decodeContentByEncoding()  : string|null
Decode content using the font dictionary's Encoding item, whichever type of instance it is.
decodeContentByEncodingElement()  : string|null
Decode content when $encoding (given by $this->get('Encoding')) is an instance of Element.
decodeContentByEncodingEncoding()  : string
Decode content when $encoding (given by $this->get('Encoding')) is an instance of Encoding.
decodeContentByToUnicodeCMapOrDescendantFonts()  : string
First try to decode $text using the ToUnicode CMap.
getIconvEncodingNameOrNullByPdfEncodingName()  : string|null
Convert a PDF encoding name to an encoding name known to iconv.
getInitializedEncodingByPdfObject()  : Encoding
Returns the Encoding instance for the given PDFObject instance, creating it first if it does not exist yet.

Constants

MISSING

public mixed MISSING = '?'

Properties

$recursionStack

The recursion stack.

public static array<string|int, mixed> $recursionStack = []

$addPositionWhitespace

protected bool $addPositionWhitespace = false

$table

protected array<string|int, mixed> $table

$tableSizes

protected array<string|int, mixed> $tableSizes

$initializedEncodingByPdfObject

In some PDF files, an encoding may be referenced by object id even though the object itself does not contain `/Type /Encoding` in its dictionary. Such objects are not initialized as Encoding in \Smalot\PdfParser\PDFObject::factory() during file parsing (they remain plain PDFObject instances).

private Encoding $initializedEncodingByPdfObject

Therefore, we create an instance of Encoding from them during decoding and cache this value in this property.

Tags
see
https://github.com/smalot/pdfparser/pull/500

$uchrCache

Caches results from uchr.

private static array<string|int, mixed> $uchrCache = []

Methods

__construct()

public __construct(Document $document[, Header|null $header = null ][, string|null $content = null ][, Config|null $config = null ]) : mixed
Parameters
$document : Document
$header : Header|null = null
$content : string|null = null
$config : Config|null = null

calculateTextWidth()

Calculate text width using the 'Widths' data from the header. If a character's width is not found, the character is added to the $missing array.

public calculateTextWidth(string $text[, array<string|int, mixed>|null &$missing = null ]) : float|null
Parameters
$text : string
$missing : array<string|int, mixed>|null = null
Return values
float|null
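
As a rough illustration of the width lookup described above, the following sketch sums per-character widths and collects characters that have no entry. It is not the library's implementation: the function name, the plain `$widths` array, the `$firstChar` offset parameter, and the choice to return null when no widths are available are all assumptions made for this example.

```php
<?php
// Hypothetical helper: sums widths for single-byte characters.
// $widths is assumed to be indexed from 0, with index 0 belonging to $firstChar.
function sketchTextWidth(string $text, array $widths, int $firstChar, ?array &$missing = null): ?float
{
    if ($widths === []) {
        return null; // assumption: no width data means no result
    }

    $missing = [];
    $total = 0.0;

    foreach (str_split($text) as $char) {
        $index = ord($char) - $firstChar;
        if (isset($widths[$index])) {
            $total += $widths[$index];
        } else {
            $missing[] = $char; // width unknown: report the character to the caller
        }
    }

    return $total;
}
```

A caller would pass a variable for `$missing` and inspect it afterwards to see which characters could not be measured.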

decodeContent()

Decode the given $text to a UTF-8 encoded string.

public decodeContent(string $text[, bool &$unicode = null ]) : string
Parameters
$text : string
$unicode : bool = null

This parameter is deprecated and might be removed in a future release

Return values
string

decodeEntities()

Decode a string containing HTML-entity-encoded characters.

public static decodeEntities(string $text) : string
Parameters
$text : string
Return values
string
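
For orientation, entity decoding of this kind can be done with PHP's built-in `html_entity_decode()`. The wrapper below is only a sketch; the function name and the chosen flags are assumptions and may differ from what the library uses.

```php
<?php
// Hypothetical wrapper around PHP's html_entity_decode().
// The flags (ENT_QUOTES | ENT_HTML5) are an assumption for this example.
function sketchDecodeEntities(string $text): string
{
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}
```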

decodeHexadecimal()

Decode a hexadecimal-encoded string. If $add_braces is true, the result is wrapped in parentheses.

public static decodeHexadecimal(string $hexa[, bool $add_braces = false ]) : string
Parameters
$hexa : string
$add_braces : bool = false
Return values
string
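
The sketch below shows the general idea of decoding a PDF hexadecimal string such as `<48656C6C6F>` into raw bytes. It is a simplified stand-in, not the library's code: the function name is hypothetical, and the library may interpret the digits differently (for example as multi-byte character codes).

```php
<?php
// Hypothetical helper: turns a PDF hex string into its raw bytes.
function sketchDecodeHex(string $hex, bool $addBraces = false): string
{
    // Drop the surrounding angle brackets and any whitespace between digits.
    $digits = preg_replace('/\s+/', '', trim($hex, '<>')) ?? '';

    // PDF hex strings with an odd number of digits assume a trailing zero.
    if (strlen($digits) % 2 !== 0) {
        $digits .= '0';
    }

    // hex2bin() returns false on invalid input; the cast turns that into ''.
    $decoded = (string) @hex2bin($digits);

    return $addBraces ? '(' . $decoded . ')' : $decoded;
}
```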

decodeOctal()

Decode a string containing octal-encoded chunks.

public static decodeOctal(string $text) : string
Parameters
$text : string
Return values
string
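
In PDF literal strings, a backslash followed by one to three octal digits stands for a single byte. The following sketch replaces such escapes with the bytes they denote; the function name is hypothetical and the library's handling of edge cases may differ.

```php
<?php
// Hypothetical helper: replaces \ddd octal escapes with the bytes they denote.
function sketchDecodeOctal(string $text): string
{
    $result = preg_replace_callback(
        '/\\\\([0-7]{1,3})/',
        // Keep only the low 8 bits, so values above 0377 wrap to one byte.
        static fn (array $m): string => chr(octdec($m[1]) & 0xFF),
        $text
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $text;
}
```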

decodeText()

Decode text from an array of commands.

public decodeText(array<string|int, mixed> $commands[, float $fontFactor = 4 ]) : string
Parameters
$commands : array<string|int, mixed>
$fontFactor : float = 4
Return values
string

decodeUnicode()

Check whether the given string is Unicode text (detected by its BOM); if so, decode it to a UTF-8 encoded string.

public static decodeUnicode(string $text) : string

Otherwise, return the text as is.

Parameters
$text : string
Tags
todo

Rename in the next major release so that the name matches the actual behavior (e.g. decodeIfUnicode()).

Return values
string
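
A minimal sketch of BOM-based detection follows, assuming the text is UTF-16 big-endian when it starts with the bytes FE FF and that the iconv extension is available. The function name is hypothetical, and the library may recognize other cases as well.

```php
<?php
// Hypothetical helper: converts UTF-16BE text (marked by a FE FF BOM) to UTF-8.
function sketchDecodeIfUnicode(string $text): string
{
    if (strncmp($text, "\xFE\xFF", 2) !== 0) {
        return $text; // no BOM: return the input unchanged
    }

    $converted = iconv('UTF-16BE', 'UTF-8', substr($text, 2));

    // iconv() returns false on failure; keep the original text in that case.
    return $converted === false ? $text : $converted;
}
```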

factory()

public static factory(Document $document, Header $header, string|null $content[, Config|null $config = null ]) : self
Parameters
$document : Document
$header : Header
$content : string|null
$config : Config|null = null
Return values
self

getCommandsText()

getCommandsText() expects the content of $text_part to be an already formatted, single-line command from a document stream.

public getCommandsText(string $text_part[, int &$offset = 0 ]) : array<string|int, mixed>

The companion function getSectionsText() returns a document stream as an array of single commands for just this purpose. Because of this, the argument $offset is no longer used, and may be removed in a future PdfParser release.

A better name for this function would be getCommandText() since it now always works on just one command.

Parameters
$text_part : string
$offset : int = 0
Return values
array<string|int, mixed>

getContent()

public getContent() : string|null
Return values
string|null

getDetails()

public getDetails([bool $deep = true ]) : array<string|int, mixed>
Parameters
$deep : bool = true
Return values
array<string|int, mixed>

getName()

public getName() : string
Return values
string

getText()

Returns the text content of a PDF as a string. Attempts to add whitespace for spacing and line-breaks where appropriate.

public getText([Page|null $page = null ]) : string

getText() leverages getTextArray() to get the content of the document, setting the addPositionWhitespace flag to true so whitespace is inserted in a logical way for reading by humans.

Parameters
$page : Page|null = null
Return values
string

getTextArray()

Returns the text content of a PDF as an array of strings. No extra whitespace is inserted besides what is actually encoded in the PDF text.

public getTextArray([Page|null $page = null ]) : array<string|int, mixed>
Parameters
$page : Page|null = null
Tags
throws
Exception
Return values
array<string|int, mixed>

getType()

public getType() : string
Return values
string

has()

public has(string $name) : bool
Parameters
$name : string
Return values
bool

init()

public init() : mixed

loadTranslateTable()

Initialize the internal character translation table from the ToUnicode CMap.

public loadTranslateTable() : array<string|int, mixed>
Return values
array<string|int, mixed>

setTable()

Set a custom character translation table in which each key is an integer character code and each value is the corresponding UTF-8 encoded string.

public setTable(array<string|int, mixed> $table) : void
Parameters
$table : array<string|int, mixed>

translateChar()

public translateChar(string $char[, bool $use_default = true ]) : string|bool
Parameters
$char : string
$use_default : bool = true
Return values
string|bool
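
To illustrate how a table set through setTable() might be consulted, here is a small stand-alone sketch. The class name is hypothetical, and the behavior when `$use_default` is false (returning the input character unchanged) is an assumption rather than something this page states; only the '?' placeholder is taken from the MISSING constant above.

```php
<?php
// Hypothetical stand-in for the table lookup; not the library's Font class.
final class SketchTranslationTable
{
    private const MISSING = '?';

    /** @var array<int, string> character code => UTF-8 string */
    private array $table = [];

    public function setTable(array $table): void
    {
        $this->table = $table;
    }

    public function translateChar(string $char, bool $useDefault = true): string
    {
        // Interpret the raw bytes of $char as one big-endian integer code.
        $code = hexdec(bin2hex($char));

        if (isset($this->table[$code])) {
            return $this->table[$code];
        }

        // Assumption: without a default, the input is handed back untouched.
        return $useDefault ? self::MISSING : $char;
    }
}
```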

uchr()

Convert a Unicode character code to a UTF-8 encoded string.

public static uchr(int|float $code) : string
Parameters
$code : int|float

Unicode character code. It is cast to int internally.

Return values
string
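
The conversion from a code point to UTF-8 bytes, together with a result cache in the spirit of $uchrCache, can be sketched as follows. The function name and the hand-written encoding are assumptions for illustration; the library may rely on a built-in conversion instead.

```php
<?php
// Hypothetical helper: encodes a Unicode code point as UTF-8 and caches results.
function sketchUchr($code): string
{
    static $cache = [];

    $code = (int) $code; // floats are accepted and cast to int

    if (!isset($cache[$code])) {
        if ($code < 0x80) {
            $bytes = chr($code);
        } elseif ($code < 0x800) {
            $bytes = chr(0xC0 | ($code >> 6))
                . chr(0x80 | ($code & 0x3F));
        } elseif ($code < 0x10000) {
            $bytes = chr(0xE0 | ($code >> 12))
                . chr(0x80 | (($code >> 6) & 0x3F))
                . chr(0x80 | ($code & 0x3F));
        } else {
            $bytes = chr(0xF0 | ($code >> 18))
                . chr(0x80 | (($code >> 12) & 0x3F))
                . chr(0x80 | (($code >> 6) & 0x3F))
                . chr(0x80 | ($code & 0x3F));
        }
        $cache[$code] = $bytes;
    }

    return $cache[$code];
}
```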

getFontSpaceLimit()

protected getFontSpaceLimit() : int
Tags
todo

Deprecated, use $this->config->getFontSpaceLimit() instead.

Return values
int

getUniqueId()

Returns unique id identifying the object.

protected getUniqueId() : string
Return values
string

createEncodingByPdfObject()

Create an Encoding instance from a PDFObject instance (without initializing it).

private createEncodingByPdfObject(PDFObject $PDFObject) : Encoding
Parameters
$PDFObject : PDFObject
Return values
Encoding

createInitializedEncodingByPdfObject()

Create an Encoding instance from a PDFObject instance and initialize it.

private createInitializedEncodingByPdfObject(PDFObject $PDFObject) : Encoding
Parameters
$PDFObject : PDFObject
Return values
Encoding

decodeContentByAutodetectIfNecessary()

If the string appears to be UTF-8 encoded, return it unchanged.

private decodeContentByAutodetectIfNecessary(string $text) : string|false

Otherwise, interpret the string as "Windows-1252" encoded.

Parameters
$text : string
Return values
string|false
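
One common way to express this check in PHP is to test UTF-8 validity with an empty `/u` pattern and convert with iconv otherwise. The sketch below assumes that approach and that the iconv extension is available; the function name is hypothetical and the library's detection logic may differ.

```php
<?php
// Hypothetical helper: leaves valid UTF-8 alone, otherwise treats the bytes
// as Windows-1252 and converts them. Returns false if the conversion fails.
function sketchAutodetect(string $text)
{
    // With the /u modifier, preg_match() fails on input that is not valid UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }

    return iconv('Windows-1252', 'UTF-8', $text);
}
```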

decodeContentByEncoding()

Decode content using the font dictionary's Encoding item, whichever type of instance it is.

private decodeContentByEncoding(string $text) : string|null
Parameters
$text : string
Return values
string|null

decodeContentByEncodingElement()

Decode content when $encoding (given by $this->get('Encoding')) is an instance of Element.

private decodeContentByEncodingElement(string $text, Element $encoding) : string|null
Parameters
$text : string
$encoding : Element
Return values
string|null

decodeContentByEncodingEncoding()

Decode content when $encoding (given by $this->get('Encoding')) is an instance of Encoding.

private decodeContentByEncodingEncoding(string $text, Encoding $encoding) : string
Parameters
$text : string
$encoding : Encoding
Return values
string

decodeContentByToUnicodeCMapOrDescendantFonts()

First try to decode $text using the ToUnicode CMap.

private decodeContentByToUnicodeCMapOrDescendantFonts(string $text) : string

If no character translation is found in the ToUnicode CMap:

  • If DescendantFonts exists, try to decode the character with one of those fonts.
    • If decoding with DescendantFonts fails, interpret $text as a string with "Windows-1252" encoding.
  • If DescendantFonts does not exist, return "?" as the decoded character.
Parameters
$text : string
Tags
todo

This algorithm appears to be invalid because it does not follow the PDF format specification. It must be rewritten.

Return values
string
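
The fallback order described above can be sketched for a single one-byte character as follows. This is a simplification under stated assumptions: the function name is hypothetical, the translation tables are plain arrays keyed by character code, and a failed Windows-1252 conversion is mapped to "?" here, which this page does not specify.

```php
<?php
// Hypothetical helper: $toUnicode and each descendant table map codes to UTF-8.
// $descendantTables is null when the font has no DescendantFonts entry.
function sketchDecodeWithFallback(string $char, array $toUnicode, ?array $descendantTables): string
{
    $code = ord($char);

    // 1. Prefer the ToUnicode CMap.
    if (isset($toUnicode[$code])) {
        return $toUnicode[$code];
    }

    // 2. Without DescendantFonts there is nothing else to try.
    if ($descendantTables === null) {
        return '?';
    }

    // 3. Try each descendant font's table in turn.
    foreach ($descendantTables as $table) {
        if (isset($table[$code])) {
            return $table[$code];
        }
    }

    // 4. Last resort: treat the byte as Windows-1252.
    $converted = iconv('Windows-1252', 'UTF-8', $char);

    return $converted === false ? '?' : $converted; // assumption for this sketch
}
```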

getIconvEncodingNameOrNullByPdfEncodingName()

Convert a PDF encoding name to an encoding name known to iconv.

private getIconvEncodingNameOrNullByPdfEncodingName(string $pdfEncodingName) : string|null
Parameters
$pdfEncodingName : string
Return values
string|null
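
A name conversion like this is typically a small lookup table. The sketch below is illustrative only: the function name and both map entries are assumptions, not the library's actual table, and any unlisted name yields null.

```php
<?php
// Hypothetical lookup from PDF encoding names to iconv encoding names.
function sketchIconvName(string $pdfEncodingName): ?string
{
    // Assumed mapping for illustration; the library's table may differ.
    $map = [
        'WinAnsiEncoding' => 'CP1252',
        'MacRomanEncoding' => 'MACINTOSH',
    ];

    return $map[$pdfEncodingName] ?? null;
}
```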

getInitializedEncodingByPdfObject()

Returns the Encoding instance for the given PDFObject instance, creating it first if it does not exist yet.

private getInitializedEncodingByPdfObject(PDFObject $PDFObject) : Encoding
Parameters
$PDFObject : PDFObject
Return values
Encoding
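
The create-once-then-reuse behavior described here is a lazy initialization pattern. The sketch below shows it with hypothetical stand-in classes (SketchEncoding, SketchFont); it does not use the library's Encoding or PDFObject types, and the `$created` counter exists only to make the caching visible.

```php
<?php
// Hypothetical stand-in for an encoding object that needs initialization.
final class SketchEncoding
{
    public bool $initialized = false;

    public function init(): void
    {
        $this->initialized = true;
    }
}

// Hypothetical stand-in for a font that caches its initialized encoding.
final class SketchFont
{
    private ?SketchEncoding $cached = null;
    public int $created = 0;

    public function getInitializedEncoding(): SketchEncoding
    {
        if ($this->cached === null) {
            // First call: create the instance, initialize it, and remember it.
            $encoding = new SketchEncoding();
            $encoding->init();
            $this->cached = $encoding;
            ++$this->created;
        }

        return $this->cached;
    }
}
```

Repeated calls return the same object, so the cost of creating and initializing the encoding is paid once per font.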
