Documentation

Page extends PDFObject

Class PDFObject

Table of Contents

Constants

COMMAND  = 'c'
OPERATOR  = 'o'
TYPE  = 't'

Properties

$recursionStack  : array<string|int, mixed>
The recursion stack.
$addPositionWhitespace  : bool
$config  : Config|null
$content  : string
$dataTm  : array<string|int, mixed>
$document  : Document|null
$fonts  : array<string|int, Font>
$header  : Header
$xobjects  : array<string|int, PDFObject>

Methods

__construct()  : mixed
createPageForFpdf()  : Page
Return page if document is a FPDF/FPDI document
createPDFObjectForFpdf()  : PDFObject
Return a new PDFObject of the document created with FPDF/FPDI
extractDecodedRawData()  : array<string|int, mixed>
Gets all the decoded text data with its internal representation from a page.
extractRawData()  : array<string|int, mixed>
Gets all the text data with its internal representation of the page.
factory()  : self
get()  : Element|PDFObject|Header
getCommandsText()  : array<string|int, mixed>
getCommandsText() expects the content of $text_part to be an already formatted, single-line command from a document stream.
getConfig()  : Config|null
getContent()  : string|null
getDataCommands()  : array<string|int, mixed>
Gets only the text commands involved in text positioning and the Text Matrix (Tm)
getDataTm()  : array<string|int, mixed>
Gets the Text Matrix of the text in the page
getDetails()  : array<string|int, mixed>
getDocument()  : Document
getFont()  : Font|null
getFonts()  : array<string|int, Font>
getHeader()  : Header|null
getPageNumber()  : int
Return the page number of the PDF document of the page object
getPDFObjectForFpdf()  : PDFObject
Return the Object of the page if the document is a FPDF/FPDI document
getText()  : string
Returns the text content of a PDF as a string. Attempts to add whitespace for spacing and line-breaks where appropriate.
getTextArray()  : array<string|int, mixed>
Returns the text content of a PDF as an array of strings. No extra whitespace is inserted besides what is actually encoded in the PDF text.
getTextXY()  : array<string|int, mixed>
Gets the text data located near the given coordinates (X,Y)
getXObject()  : PDFObject|null
getXObjects()  : array<string|int, PDFObject>
Support for XObject
has()  : bool
init()  : mixed
isFpdf()  : bool
Return true if the current page is a (setasign\Fpdi\Fpdi) FPDI/FPDF document
getUniqueId()  : string
Returns unique id identifying the object.

Properties

$recursionStack

The recursion stack.

public static array<string|int, mixed> $recursionStack = []

$addPositionWhitespace

protected bool $addPositionWhitespace = false

$dataTm

protected array<string|int, mixed> $dataTm

$fonts

protected array<string|int, Font> $fonts

Methods

__construct()

public __construct(Document $document[, Header|null $header = null ][, string|null $content = null ][, Config|null $config = null ]) : mixed
Parameters
$document : Document
$header : Header|null = null
$content : string|null = null
$config : Config|null = null

createPageForFpdf()

Return page if document is a FPDF/FPDI document

public createPageForFpdf() : Page
Return values
Page

The page

createPDFObjectForFpdf()

Return a new PDFObject of the document created with FPDF/FPDI

public createPDFObjectForFpdf() : PDFObject

For a document generated by FPDF/FPDI, it generates a new PDFObject for that document

Return values
PDFObject

The PDFObject

extractDecodedRawData()

Gets all the decoded text data with its internal representation from a page.

public extractDecodedRawData([array<string|int, mixed> $extractedRawData = null ]) : array<string|int, mixed>
Parameters
$extractedRawData : array<string|int, mixed> = null

The extracted data returned by extractRawData(), or null if extractRawData() should be called.

Return values
array<string|int, mixed>

An array with the data and the internal representation

extractRawData()

Gets all the text data with its internal representation of the page.

public extractRawData() : array<string|int, mixed>

Returns an array with the data and the internal representation

Return values
array<string|int, mixed>

factory()

public static factory(Document $document, Header $header, string|null $content[, Config|null $config = null ]) : self
Parameters
$document : Document
$header : Header
$content : string|null
$config : Config|null = null
Return values
self

getCommandsText()

getCommandsText() expects the content of $text_part to be an already formatted, single-line command from a document stream.

public getCommandsText(string $text_part[, int &$offset = 0 ]) : array<string|int, mixed>

The companion function getSectionsText() returns a document stream as an array of single commands for just this purpose. Because of this, the argument $offset is no longer used, and may be removed in a future PdfParser release.

A better name for this function would be getCommandText() since it now always works on just one command.

Parameters
$text_part : string
$offset : int = 0
Return values
array<string|int, mixed>

getContent()

public getContent() : string|null
Return values
string|null

getDataCommands()

Gets only the text commands involved in text positioning and the Text Matrix (Tm)

public getDataCommands([array<string|int, mixed> $extractedDecodedRawData = null ]) : array<string|int, mixed>

Extracts only the PDF commands involved in text positioning and the Text Matrix (Tm). These are: BT, ET, TL, Td, TD, Tm, T*, Tj, ', ", and TJ.

Parameters
$extractedDecodedRawData : array<string|int, mixed> = null

The data extracted by extractDecodedRawData(). If it is null, extractDecodedRawData() is called.

Return values
array<string|int, mixed>

An array with the text commands of the page
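
As a rough illustration of the filtering described above, the following standalone sketch keeps only the commands whose operator appears in that list. It assumes each command is an array keyed by the values of the TYPE ('t'), OPERATOR ('o') and COMMAND ('c') constants. The helper filterTextPositionCommands(), the constant TEXT_POSITION_OPERATORS and the sample data are hypothetical and not part of PdfParser.

```php
<?php

// Operators named in the getDataCommands() description.
const TEXT_POSITION_OPERATORS = ['BT', 'ET', 'TL', 'Td', 'TD', 'Tm', 'T*', 'Tj', "'", '"', 'TJ'];

/**
 * Hypothetical helper: keep only commands whose operator is text related.
 *
 * @param array<int, array<string, mixed>> $commands
 *
 * @return array<int, array<string, mixed>>
 */
function filterTextPositionCommands(array $commands): array
{
    return array_values(array_filter(
        $commands,
        static fn (array $command): bool => in_array($command['o'] ?? null, TEXT_POSITION_OPERATORS, true)
    ));
}

// Hand-written sample; the exact shape PdfParser produces is an assumption.
$commands = [
    ['t' => '', 'o' => 'BT', 'c' => ''],
    ['t' => '', 'o' => 'rg', 'c' => '0 0 0'],          // colour operator, dropped
    ['t' => '', 'o' => 'Tm', 'c' => '1 0 0 1 72 720'],
    ['t' => '(', 'o' => 'Tj', 'c' => 'Hello'],
    ['t' => '', 'o' => 'ET', 'c' => ''],
];

$textCommands = filterTextPositionCommands($commands);
```

Comparing operators strictly matters here because Td and TD, or Tj and TJ, are different PDF operators.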

getDataTm()

Gets the Text Matrix of the text in the page

public getDataTm([array<string|int, mixed> $dataCommands = null ]) : array<string|int, mixed>

Returns an array in which every item is itself an array: the first element is the Text Matrix (Tm) and the second is a string with the text data. The Text Matrix is an array of 6 numbers. The last 2 numbers are the X and Y coordinates of the text. The first 4 numbers describe the scaling, rotation and skew of the text.

Parameters
$dataCommands : array<string|int, mixed> = null

The data extracted by getDataCommands(). If it is null, getDataCommands() is called.

Return values
array<string|int, mixed>

An array with the data of the page, including the Tm information of any text on the page
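
To make the layout of a single entry concrete, here is a minimal sketch that splits one item into its position, its transform part and its text. The sample array is hand-written in the shape described above, and the helper describeTmEntry() is hypothetical. Values are cast to float because the sketch does not assume whether the library stores them as numbers or numeric strings.

```php
<?php

// Hand-written sample in the shape described above: [Text Matrix, text].
// The numeric values are illustrative, not taken from a real document.
$dataTm = [
    [[1.0, 0.0, 0.0, 1.0, 72.0, 720.0], 'Invoice'],
    [[1.0, 0.0, 0.0, 1.0, 72.0, 700.0], 'Total: 42'],
];

/**
 * Hypothetical helper: split one entry into its position and text.
 *
 * @param array{0: array<int, float|string>, 1: string} $entry
 *
 * @return array<string, mixed>
 */
function describeTmEntry(array $entry): array
{
    [$tm, $text] = $entry;

    return [
        'x' => (float) $tm[4],                 // fifth number: X coordinate
        'y' => (float) $tm[5],                 // sixth number: Y coordinate
        'transform' => array_slice($tm, 0, 4), // scaling, rotation and skew
        'text' => $text,
    ];
}

$first = describeTmEntry($dataTm[0]);
```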

getDetails()

public getDetails([bool $deep = true ]) : array<string|int, mixed>
Parameters
$deep : bool = true
Return values
array<string|int, mixed>

getFont()

public getFont(string $id) : Font|null
Parameters
$id : string
Return values
Font|null

getFonts()

public getFonts() : array<string|int, Font>
Return values
array<string|int, Font>

getPageNumber()

Return the page number of the PDF document of the page object

public getPageNumber() : int
Return values
int

the page number

getPDFObjectForFpdf()

Return the Object of the page if the document is a FPDF/FPDI document

public getPDFObjectForFpdf() : PDFObject

If the document was generated by FPDF/FPDI it returns the PDFObject of the given page

Return values
PDFObject

The PDFObject for the page

getText()

Returns the text content of a PDF as a string. Attempts to add whitespace for spacing and line-breaks where appropriate.

public getText([self|null $page = null ]) : string

getText() leverages getTextArray() to get the content of the document, setting the addPositionWhitespace flag to true so whitespace is inserted in a logical way for reading by humans.

Parameters
$page : self|null = null
Return values
string

getTextArray()

Returns the text content of a PDF as an array of strings. No extra whitespace is inserted besides what is actually encoded in the PDF text.

public getTextArray([self|null $page = null ]) : array<string|int, mixed>
Parameters
$page : self|null = null
Return values
array<string|int, mixed>

getTextXY()

Gets the text data located near the given coordinates (X,Y)

public getTextXY([float $x = null ][, float $y = null ][, float $xError = 0 ][, float $yError = 0 ]) : array<string|int, mixed>

If text is near the given coordinates (X,Y) (or the Tm info), that text is returned. The data returned by getDataTm() can be used to find the coordinates of a given text through its Tm info.

Parameters
$x : float = null

The X value of the coordinate to search for. If null, only the Y value is considered (same row).

$y : float = null

The Y value of the coordinate to search for. If null, only the X value is considered (same column).

$xError : float = 0

The tolerance, above or below, within which an X value is considered "near"

$yError : float = 0

The tolerance, above or below, within which a Y value is considered "near"

Return values
array<string|int, mixed>

An array of the text items that are near the given coordinates. If no text is "near" the x,y coordinates, an empty array is returned. If both the x and y coordinates are null, null is returned.
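
The "near" matching can be sketched as a tolerance check on the last two numbers of each Text Matrix. The function findTextNear() below is a hypothetical stand-in written for illustration, not PdfParser's implementation, and the sample data is hand-written. When both coordinates are null, the sketch returns an empty array, which is a choice made for this example only.

```php
<?php

/**
 * Hypothetical stand-in for the matching described above; not PdfParser's code.
 *
 * @param array<int, array{0: array<int, float|int|string>, 1: string}> $dataTm
 *
 * @return array<int, array{0: array<int, float|int|string>, 1: string}>
 */
function findTextNear(array $dataTm, ?float $x, ?float $y, float $xError = 0.0, float $yError = 0.0): array
{
    if (null === $x && null === $y) {
        return []; // choice made for this sketch
    }

    $matches = [];
    foreach ($dataTm as [$tm, $text]) {
        // A null coordinate is not constrained, so it always matches.
        $xOk = null === $x || abs((float) $tm[4] - $x) <= $xError;
        $yOk = null === $y || abs((float) $tm[5] - $y) <= $yError;
        if ($xOk && $yOk) {
            $matches[] = [$tm, $text];
        }
    }

    return $matches;
}

// Illustrative entries in the [Text Matrix, text] shape returned by getDataTm().
$dataTm = [
    [[1, 0, 0, 1, 72.0, 720.0], 'Invoice'],
    [[1, 0, 0, 1, 300.0, 720.0], 'No. 17'],
    [[1, 0, 0, 1, 72.0, 700.0], 'Total: 42'],
];

$sameRow = findTextNear($dataTm, null, 720.0);             // every entry with Y = 720
$nearPoint = findTextNear($dataTm, 70.0, 701.0, 5.0, 5.0); // entries within 5 units of (70, 701)
```

With a tolerance of 0, only exact coordinate matches qualify, so a small non-zero $xError and $yError is usually the practical choice for positions read from a real document.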

getXObject()

public getXObject(string $id) : PDFObject|null
Parameters
$id : string
Return values
PDFObject|null

getXObjects()

Support for XObject

public getXObjects() : array<string|int, PDFObject>
Return values
array<string|int, PDFObject>

has()

public has(string $name) : bool
Parameters
$name : string
Return values
bool

isFpdf()

Return true if the current page is a (setasign\Fpdi\Fpdi) FPDI/FPDF document

public isFpdf() : bool

The 'Producer' metadata should have the value "FPDF" . FPDF_VERSION if the PDF file was generated by FPDF/FPDI.

Return values
bool

true if the current page is a FPDI/FPDF document
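
The Producer rule can be sketched as a prefix check on the document metadata. The helper producerLooksLikeFpdf() is hypothetical, the metadata arrays are hand-written samples, and checking only the "FPDF" prefix is an assumption of this sketch rather than a statement of how isFpdf() is implemented.

```php
<?php

/**
 * Hypothetical helper mirroring the rule described above.
 *
 * @param array<string, mixed> $details Document metadata, e.g. ['Producer' => 'FPDF 1.86']
 */
function producerLooksLikeFpdf(array $details): bool
{
    $producer = $details['Producer'] ?? null;

    // Only the "FPDF" prefix is compared, since the version suffix varies.
    return is_string($producer) && 0 === strncmp($producer, 'FPDF', 4);
}

// Sample metadata values are illustrative.
$fromFpdf = producerLooksLikeFpdf(['Producer' => 'FPDF 1.86']);
$other = producerLooksLikeFpdf(['Producer' => 'LibreOffice 7.5']);
$missing = producerLooksLikeFpdf([]);
```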

getUniqueId()

Returns unique id identifying the object.

protected getUniqueId() : string
Return values
string

        