Documentation

Page extends PDFObject in package Lwt

Class PDFObject

Constants

COMMAND = 'c'
OPERATOR = 'o'
TYPE = 't'

Properties

$recursionStack : array<string|int, mixed>: The recursion stack.
$addPositionWhitespace : bool
$config : Config|null
$content : string
$dataTm : array<string|int, mixed>
$document : Document|null
$fonts : array<string|int, Font>
$header : Header
$xobjects : array<string|int, PDFObject>

Methods

__construct() : mixed
createPageForFpdf() : Page: Return page if document is a FPDF/FPDI document
createPDFObjectForFpdf() : PDFObject: Return a new PDFObject of the document created with FPDF/FPDI
extractDecodedRawData() : array<string|int, mixed>: Gets all the decoded text data with its internal representation from a page.
extractRawData() : array<string|int, mixed>: Gets all the text data with its internal representation of the page.
factory() : self
get() : Element|PDFObject|Header
getCommandsText() : array<string|int, mixed>: getCommandsText() expects the content of $text_part to be an already formatted, single-line command from a document stream.
getConfig() : Config|null
getContent() : string|null
getDataCommands() : array<string|int, mixed>: Gets just the Text commands that are involved in text positions and Text Matrix (Tm)
getDataTm() : array<string|int, mixed>: Gets the Text Matrix of the text in the page
getDetails() : array<string|int, mixed>
getDocument() : Document
getFont() : Font|null
getFonts() : array<string|int, Font>
getHeader() : Header|null
getPageNumber() : int: Returns the number of this page within the PDF document
getPDFObjectForFpdf() : PDFObject: Return the Object of the page if the document is a FPDF/FPDI document
getText() : string: Returns the text content of a PDF as a string. Attempts to add whitespace for spacing and line-breaks where appropriate.
getTextArray() : array<string|int, mixed>: Returns the text content of a PDF as an array of strings. No extra whitespace is inserted besides what is actually encoded in the PDF text.
getTextXY() : array<string|int, mixed>: Gets the text data located near the given coordinates (X,Y)
getXObject() : PDFObject|null
getXObjects() : array<string|int, PDFObject>: Support for XObject
has() : bool
init() : mixed
isFpdf() : bool: Returns true if the current page belongs to an FPDF/FPDI (setasign\Fpdi\Fpdi) document
getUniqueId() : string: Returns unique id identifying the object.

COMMAND


    public
        mixed
    COMMAND
    = 'c'

OPERATOR


    public
        mixed
    OPERATOR
    = 'o'

TYPE


    public
        mixed
    TYPE
    = 't'

$recursionStack

The recursion stack.


        public
        static    array<string|int, mixed>
    $recursionStack
     = []

$addPositionWhitespace


        protected
            bool
    $addPositionWhitespace
     = false

$config


        protected
            Config|null
    $config

$content


        protected
            string
    $content

$dataTm


        protected
            array<string|int, mixed>
    $dataTm

$document


        protected
            Document|null
    $document

$fonts


        protected
            array<string|int, Font>
    $fonts

$header


        protected
            Header
    $header

$xobjects


        protected
            array<string|int, PDFObject>
    $xobjects

__construct()


    public
                    __construct(Document $document[, Header|null $header = null ][, string|null $content = null ][, Config|null $config = null ]) : mixed

Parameters

$document : Document
$header : Header|null = null
$content : string|null = null
$config : Config|null = null

createPageForFpdf()

Return page if document is a FPDF/FPDI document


    public
                    createPageForFpdf() : Page

Return values

Page —

The page

createPDFObjectForFpdf()

Return a new PDFObject of the document created with FPDF/FPDI


    public
                    createPDFObjectForFpdf() : PDFObject

For a document generated by FPDF/FPDI, it generates a new PDFObject for that document

Return values

PDFObject —

The PDFObject

extractDecodedRawData()

Gets all the decoded text data with its internal representation from a page.


    public
                    extractDecodedRawData([array<string|int, mixed> $extractedRawData = null ]) : array<string|int, mixed>

Parameters

$extractedRawData : array<string|int, mixed> = null: The data returned by extractRawData(), or null if extractRawData() should be called internally

Return values

array<string|int, mixed> —

An array with the data and the internal representation

extractRawData()

Gets all the text data with its internal representation of the page.


    public
                    extractRawData() : array<string|int, mixed>

Returns an array with the data and the internal representation

Return values

array<string|int, mixed>

factory()


    public
            static        factory(Document $document, Header $header, string|null $content[, Config|null $config = null ]) : self

Parameters

$document : Document
$header : Header
$content : string|null
$config : Config|null = null

Return values

self

get()


    public
                    get(string $name) : Element|PDFObject|Header

Parameters

$name : string

Return values

Element|PDFObject|Header

getCommandsText()

getCommandsText() expects the content of $text_part to be an already formatted, single-line command from a document stream.


    public
                    getCommandsText(string $text_part[, int &$offset = 0 ]) : array<string|int, mixed>

The companion function getSectionsText() returns a document stream as an array of single commands for just this purpose. Because of this, the argument $offset is no longer used, and may be removed in a future PdfParser release.

A better name for this function would be getCommandText() since it now always works on just one command.

Parameters

$text_part : string
$offset : int = 0

Return values

array<string|int, mixed>

getConfig()


    public
                    getConfig() : Config|null

Return values

Config|null

getContent()


    public
                    getContent() : string|null

Return values

string|null

getDataCommands()

Gets just the Text commands that are involved in text positions and Text Matrix (Tm)


    public
                    getDataCommands([array<string|int, mixed> $extractedDecodedRawData = null ]) : array<string|int, mixed>

Extracts only the PDF commands that are involved in text positioning and the Text Matrix (Tm). These are: BT, ET, TL, Td, TD, Tm, T*, Tj, ', ", and TJ

Parameters

$extractedDecodedRawData : array<string|int, mixed> = null: The data extracted by extractDecodedRawData(). If it is null, extractDecodedRawData() is called.

Return values

array<string|int, mixed> —

An array with the text commands of the page
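
The operator filter described for getDataCommands() can be sketched in isolation. This is an illustration only, not the library's implementation: the function name filterTextCommands() and the sample data are hypothetical, and storing each command's operator under the key 'o' (the value of the OPERATOR constant) is an assumption inferred from the constants listed on this page.

```php
<?php

/**
 * Hypothetical helper: keep only the commands whose operator affects
 * text position or the Text Matrix.
 *
 * @param array<int, array<string, mixed>> $commands
 * @return array<int, array<string, mixed>>
 */
function filterTextCommands(array $commands): array
{
    // Operators named in the getDataCommands() description.
    $textOperators = ['BT', 'ET', 'TL', 'Td', 'TD', 'Tm', 'T*', 'Tj', "'", '"', 'TJ'];

    $kept = array_filter(
        $commands,
        // Strict comparison: no type juggling when the operator is missing.
        static fn (array $command): bool => in_array($command['o'] ?? null, $textOperators, true)
    );

    // Re-index so the result is a plain list.
    return array_values($kept);
}

// Sample commands, invented for illustration.
$commands = [
    ['o' => 'BT', 'c' => ''],
    ['o' => 'Tf', 'c' => '/F1 12'],
    ['o' => 'Tm', 'c' => '1 0 0 1 72 720'],
    ['o' => 'Tj', 'c' => 'Hello'],
    ['o' => 'rg', 'c' => '0 0 0'],
    ['o' => 'ET', 'c' => ''],
];
$textCommands = filterTextCommands($commands);
```

In this sample, the font selection (Tf) and colour (rg) commands are dropped, while the four text-positioning and text-showing commands are kept in their original order.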

getDataTm()

Gets the Text Matrix of the text in the page


    public
                    getDataTm([array<string|int, mixed> $dataCommands = null ]) : array<string|int, mixed>

Returns an array in which every item is itself an array: the first element is the Text Matrix (Tm) and the second is a string with the text data. The Text Matrix is an array of 6 numbers. The last 2 numbers are the X and Y coordinates of the text. The first 4 numbers describe the scaling, rotation and skew of the text.

Parameters

$dataCommands : array<string|int, mixed> = null: The data extracted by getDataCommands(). If it is null, getDataCommands() is called.

Return values

array<string|int, mixed> —

an array with the data of the page including the Tm information of any text in the page
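
To make the item layout concrete, the following sketch reads the coordinates out of a single item shaped as described for getDataTm(). The helper name describeTmItem() is hypothetical, and the sample item is invented. That matrix values may arrive as numeric strings is an assumption, which is why they are cast to float.

```php
<?php

/**
 * Hypothetical helper: read the X and Y coordinates from one item
 * shaped as [textMatrix, text], where textMatrix has 6 numbers.
 *
 * @param array{0: array<int, int|float|string>, 1: string} $item
 * @return array{x: float, y: float, text: string}
 */
function describeTmItem(array $item): array
{
    [$tm, $text] = $item;

    if (count($tm) !== 6) {
        throw new InvalidArgumentException('A Text Matrix must have exactly 6 numbers.');
    }

    return [
        'x' => (float) $tm[4],   // fifth number: X coordinate
        'y' => (float) $tm[5],   // sixth number: Y coordinate
        'text' => $text,
    ];
}

// Sample item, invented for illustration: identity scaling, positioned at (72.5, 720).
$item = [['1', '0', '0', '1', '72.5', '720'], 'Hello'];
$described = describeTmItem($item);
```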

getDetails()


    public
                    getDetails([bool $deep = true ]) : array<string|int, mixed>

Parameters

$deep : bool = true

Return values

array<string|int, mixed>

getDocument()


    public
                    getDocument() : Document

Return values

Document

getFont()


    public
                    getFont(string $id) : Font|null

Parameters

$id : string

Return values

Font|null

getFonts()


    public
                    getFonts() : array<string|int, Font>

Return values

array<string|int, Font>

getHeader()


    public
                    getHeader() : Header|null

Return values

Header|null

getPageNumber()

Returns the number of this page within the PDF document


    public
                    getPageNumber() : int

Return values

int —

the page number

getPDFObjectForFpdf()

Return the Object of the page if the document is a FPDF/FPDI document


    public
                    getPDFObjectForFpdf() : PDFObject

If the document was generated by FPDF/FPDI it returns the PDFObject of the given page

Return values

PDFObject —

The PDFObject for the page

getText()

Returns the text content of a PDF as a string. Attempts to add whitespace for spacing and line-breaks where appropriate.


    public
                    getText([self|null $page = null ]) : string

getText() leverages getTextArray() to get the content of the document, setting the addPositionWhitespace flag to true so whitespace is inserted in a logical way for reading by humans.

Parameters

$page : self|null = null

Return values

string

getTextArray()

Returns the text content of a PDF as an array of strings. No extra whitespace is inserted besides what is actually encoded in the PDF text.


    public
                    getTextArray([self|null $page = null ]) : array<string|int, mixed>

Parameters

$page : self|null = null

Return values

array<string|int, mixed>

getTextXY()

Gets the text data located near the given coordinates (X,Y)


    public
                    getTextXY([float $x = null ][, float $y = null ][, float $xError = 0 ][, float $yError = 0 ]) : array<string|int, mixed>

If text lies near the given coordinates (X,Y), as recorded in its Text Matrix (Tm) information, that text is returned. The data returned by getDataTm() can be used to find the coordinates of a given text from its Tm information.

Parameters

$x : float = null: The X value of the coordinate to search for. If null, just the Y value is considered (same row)
$y : float = null: The Y value of the coordinate to search for. If null, just the X value is considered (same column)
$xError : float = 0: The tolerance, above or below, within which an X value is considered "near"
$yError : float = 0: The tolerance, above or below, within which a Y value is considered "near"

Return values

array<string|int, mixed> —

An array of the text items that are near the given coordinates. If no text is "near" the x,y coordinates, an empty array is returned. If both the x and y coordinates are null, null is returned.
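
The "near" test can be sketched as a standalone function over data in the shape documented for getDataTm(). This is not the library's code: findTextNear() is a hypothetical name, the sample data is invented, and treating the tolerance bounds as inclusive is an assumption. The case where both coordinates are null is deliberately left out of the sketch, which throws instead.

```php
<?php

/**
 * Hypothetical stand-in for the "near" test described for getTextXY().
 * Each $dataTm item has the shape [[a, b, c, d, x, y], text].
 *
 * @param array<int, array{0: array<int, int|float|string>, 1: string}> $dataTm
 * @return array<int, array{0: array<int, int|float|string>, 1: string}>
 */
function findTextNear(array $dataTm, ?float $x, ?float $y, float $xError = 0.0, float $yError = 0.0): array
{
    if ($x === null && $y === null) {
        // Out of scope for this sketch.
        throw new InvalidArgumentException('At least one coordinate is required.');
    }

    $matches = [];
    foreach ($dataTm as $item) {
        $itemX = (float) $item[0][4];
        $itemY = (float) $item[0][5];

        // A null coordinate places no constraint on that axis.
        $xOk = $x === null || abs($itemX - $x) <= $xError;
        $yOk = $y === null || abs($itemY - $y) <= $yError;

        if ($xOk && $yOk) {
            $matches[] = $item;
        }
    }

    return $matches;
}

// Sample data, invented for illustration.
$dataTm = [
    [['1', '0', '0', '1', '72', '720'], 'Title'],
    [['1', '0', '0', '1', '72', '700'], 'First line'],
    [['1', '0', '0', '1', '300', '700'], 'Right column'],
];

$sameRow  = findTextNear($dataTm, null, 700.0, 0.0, 1.0); // Y only: same row
$exact    = findTextNear($dataTm, 72.0, 720.0);           // both axes, zero tolerance
$nearLeft = findTextNear($dataTm, 70.0, null, 5.0);       // X only: same column, within 5 units
```

Passing only Y selects a row, passing only X selects a column, and the two error arguments widen the match window on their respective axes.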

getXObject()


    public
                    getXObject(string $id) : PDFObject|null

Parameters

$id : string

Return values

PDFObject|null

getXObjects()

Support for XObject


    public
                    getXObjects() : array<string|int, PDFObject>

Return values

array<string|int, PDFObject>

has()


    public
                    has(string $name) : bool

Parameters

$name : string

Return values

bool

init()


    public
                    init() : mixed

isFpdf()

Returns true if the current page belongs to an FPDF/FPDI (setasign\Fpdi\Fpdi) document


    public
                    isFpdf() : bool

The 'Producer' metadata should have the value "FPDF" . FPDF_VERSION if the PDF file was generated by FPDF/FPDI.

Return values

bool —

true if the current page is a FPDI/FPDF document
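
A check of this kind could look roughly as follows. The function producerLooksLikeFpdf() is a hypothetical name, the sample producer strings are invented, and matching on the "FPDF" prefix is an assumption based on the description above, not a statement of how isFpdf() is implemented.

```php
<?php

/**
 * Hypothetical check: does the document metadata name FPDF as its producer?
 *
 * @param array<string, mixed> $details Metadata such as that returned by getDetails()
 */
function producerLooksLikeFpdf(array $details): bool
{
    $producer = $details['Producer'] ?? null;

    // The producer value starts with "FPDF" followed by the version,
    // so only the first four characters are compared.
    return is_string($producer) && strncmp($producer, 'FPDF', 4) === 0;
}

// Sample metadata, invented for illustration.
$fromFpdf  = producerLooksLikeFpdf(['Producer' => 'FPDF 1.86']);
$fromOther = producerLooksLikeFpdf(['Producer' => 'LibreOffice 7.4']);
$missing   = producerLooksLikeFpdf([]);
```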

getUniqueId()

Returns unique id identifying the object.


    protected
                    getUniqueId() : string

Return values

string
