Documentation

RawDataParser

Table of Contents

Properties

$cfg  : array<string, bool>
Configuration array.
$filterHelper  : mixed
$objects  : mixed
$config  : Config

Methods

__construct()  : mixed
parseData()  : array<string|int, mixed>
Parses PDF data and returns extracted data as array.
decodeStream()  : array<string|int, mixed>
Decode the specified stream.
decodeXref()  : array<string|int, mixed>
Decode the Cross-Reference section
decodeXrefStream()  : array<string|int, mixed>
Decode the Cross-Reference Stream section
getIndirectObject()  : array<string|int, mixed>
Get content of indirect object.
getObjectHeaderLen()  : int
getObjectHeaderPattern()  : string
getObjectVal()  : array<string|int, mixed>
Get the content of object, resolving indirect object reference if necessary.
getRawObject()  : array<string|int, mixed>
Get object type, raw value and offset to next object
getXrefData()  : array<string|int, mixed>
Get Cross-Reference (xref) table and trailer data from PDF document data.
getHeaderValue()  : string|array<string|int, mixed>|null
Get the value of a section of an object header (the obj << YYY >> part).

Properties

$cfg

Configuration array.

protected array<string, bool> $cfg = [
    // if `true`, ignore filter decoding errors
    'ignore_filter_decoding_errors' => true,
    // if `true`, ignore missing filter decoders
    'ignore_missing_filter_decoders' => true,
]

Methods

__construct()

public __construct([array<string|int, mixed> $cfg = [] ][, Config|null $config = null ]) : mixed
Parameters
$cfg : array<string|int, mixed> = []

Configuration array, default is []

$config : Config|null = null

parseData()

Parses PDF data and returns extracted data as array.

public parseData(string $data) : array<string|int, mixed>
Parameters
$data : string

PDF data to parse

Tags
throws
EmptyPdfException

if empty PDF data given

throws
MissingPdfHeaderException

if PDF data missing %PDF- header

Return values
array<string|int, mixed>

array of parsed PDF document objects
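
The two documented exceptions suggest an input check that runs before any parsing. The following is a minimal sketch of such a check, not the library's implementation: the exception classes and the function name are stand-ins defined here so the snippet runs on its own, and trimming bytes that precede the %PDF- marker is an assumption of this sketch.

```php
<?php
declare(strict_types=1);

// Stand-in exception types for this sketch only; the library ships its own classes.
final class SketchEmptyPdfException extends RuntimeException {}
final class SketchMissingPdfHeaderException extends RuntimeException {}

/**
 * Validate raw input and drop any bytes that precede the "%PDF-" marker.
 * (Hypothetical helper, not part of RawDataParser.)
 */
function sketchNormalizePdfData(string $data): string
{
    if ($data === '') {
        throw new SketchEmptyPdfException('Empty PDF data given.');
    }

    $headerPos = strpos($data, '%PDF-');
    if ($headerPos === false) {
        throw new SketchMissingPdfHeaderException('Missing %PDF- header.');
    }

    // Assumption: keep everything from the header onward.
    return substr($data, $headerPos);
}
```

A caller would typically catch the two exception types separately, since an empty upload and a non-PDF upload usually call for different user-facing messages.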

decodeStream()

Decode the specified stream.

protected decodeStream(string $pdfData, array<string|int, mixed> $xref, array<string|int, mixed> $sdic, string $stream) : array<string|int, mixed>
Parameters
$pdfData : string

PDF data

$xref : array<string|int, mixed>
$sdic : array<string|int, mixed>

Stream's dictionary array

$stream : string

Stream to decode

Tags
throws
Exception
Return values
array<string|int, mixed>

containing decoded stream data and remaining filters
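
The return value pairs the decoded data with the filters that could not be applied. The sketch below illustrates that idea under stated assumptions: the function name and the flat list of filter names are hypothetical, only two filters are handled, and it stops decoding at the first filter it does not know, which may differ from what the library does.

```php
<?php
declare(strict_types=1);

/**
 * Apply known filters in order and report the ones left unapplied.
 * (Hypothetical helper, not part of RawDataParser.)
 *
 * @param string[] $filters filter names in application order (assumed input shape)
 * @return array{0: string, 1: string[]} decoded data and remaining filters
 */
function sketchDecodeFilters(string $stream, array $filters): array
{
    $remaining = [];
    foreach ($filters as $filter) {
        if ($remaining !== []) {
            // Once one filter is skipped, later ones would see undecoded input.
            $remaining[] = $filter;
            continue;
        }
        switch ($filter) {
            case 'FlateDecode':
                // FlateDecode data is a zlib stream, which gzuncompress() reads.
                $decoded = @gzuncompress($stream);
                if ($decoded === false) {
                    throw new RuntimeException('FlateDecode failed.');
                }
                $stream = $decoded;
                break;
            case 'ASCIIHexDecode':
                // Drop whitespace, stop at the ">" end marker, pad odd length with 0.
                $hex = (string) preg_replace('/\s+/', '', $stream);
                $end = strpos($hex, '>');
                if ($end !== false) {
                    $hex = substr($hex, 0, $end);
                }
                if ($hex !== '' && !ctype_xdigit($hex)) {
                    throw new RuntimeException('ASCIIHexDecode found a non-hex character.');
                }
                if (strlen($hex) % 2 === 1) {
                    $hex .= '0';
                }
                $stream = (string) hex2bin($hex);
                break;
            default:
                $remaining[] = $filter;
        }
    }

    return [$stream, $remaining];
}
```

Returning the unapplied filters lets the caller decide what to do with data it cannot decode, for example keeping an image stream as-is.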

decodeXref()

Decode the Cross-Reference section

protected decodeXref(string $pdfData, int $startxref[, array<string|int, mixed> $xref = [] ][, array<string|int, int> $visitedOffsets = [] ]) : array<string|int, mixed>
Parameters
$pdfData : string

PDF data

$startxref : int

Offset at which the xref section starts (position of the 'xref' keyword)

$xref : array<string|int, mixed> = []

Previous xref array (if any)

$visitedOffsets : array<string|int, int> = []

Array of visited offsets to prevent infinite loops

Tags
throws
Exception
Return values
array<string|int, mixed>

containing xref and trailer data
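
For orientation, a classic cross-reference section is plain text: the xref keyword, then subsections that each start with "first-object count", then one line per object with a 10-digit offset, a 5-digit generation number and an n (in use) or f (free) flag. The sketch below reads that layout into a map keyed by "object_generation". The function name and the returned shape are assumptions of this example, and it ignores the trailer dictionary and any Prev chain.

```php
<?php
declare(strict_types=1);

/**
 * Parse a classic xref section that starts at $startxref.
 * (Hypothetical helper, not part of RawDataParser.)
 *
 * @return array<string, int> map of "objNum_genNum" to byte offset, in-use entries only
 */
function sketchParseXrefTable(string $pdfData, int $startxref): array
{
    if (substr($pdfData, $startxref, 4) !== 'xref') {
        throw new RuntimeException('No xref keyword at the given offset.');
    }

    $entries = [];
    $objNum = 0;
    $lines = preg_split('/\r\n|\r|\n/', substr($pdfData, $startxref + 4));
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if (strpos($line, 'trailer') === 0) {
            break; // the trailer dictionary follows; not handled in this sketch
        }
        if (preg_match('/^(\d+) (\d+)$/', $line, $m) === 1) {
            // Subsection header: first object number and entry count.
            $objNum = (int) $m[1];
            continue;
        }
        if (preg_match('/^(\d{10}) (\d{5}) ([nf])$/', $line, $m) === 1) {
            if ($m[3] === 'n') {
                $entries[$objNum . '_' . (int) $m[2]] = (int) $m[1];
            }
            ++$objNum;
            continue;
        }
        throw new RuntimeException('Malformed xref line: ' . $line);
    }

    return $entries;
}
```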

decodeXrefStream()

Decode the Cross-Reference Stream section

protected decodeXrefStream(string $pdfData, int $startxref[, array<string|int, mixed> $xref = [] ][, array<string|int, int> $visitedOffsets = [] ]) : array<string|int, mixed>
Parameters
$pdfData : string

PDF data

$startxref : int

Offset at which the xref section starts

$xref : array<string|int, mixed> = []

Previous xref array (if any)

$visitedOffsets : array<string|int, int> = []

Array of visited offsets to prevent infinite loops

Tags
throws
Exception

if unknown PNG predictor detected

Return values
array<string|int, mixed>

containing xref and trailer data
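
The exception above refers to PNG predictors, which cross-reference streams commonly use: each row of the decompressed data starts with a one-byte filter type (0 to 4) that says how the row was encoded relative to its neighbours. The sketch below reverses those filters for the simple case of one byte per sample. The function name is hypothetical, and the one-byte-per-sample restriction is an assumption of this example, not a statement about the library.

```php
<?php
declare(strict_types=1);

/**
 * Undo PNG row filters (None, Sub, Up, Average, Paeth), one byte per sample.
 * (Hypothetical helper, not part of RawDataParser.)
 */
function sketchUndoPngPredictor(string $data, int $columns): string
{
    if ($data === '') {
        return '';
    }
    $rowLen = $columns + 1; // one filter-type byte precedes each row
    if ($columns < 1 || strlen($data) % $rowLen !== 0) {
        throw new RuntimeException('Data length does not match the row size.');
    }

    $prev = array_fill(0, $columns, 0);
    $out = '';
    foreach (str_split($data, $rowLen) as $row) {
        $type = ord($row[0]);
        $cur = [];
        for ($i = 0; $i < $columns; ++$i) {
            $left = $i > 0 ? $cur[$i - 1] : 0;
            $up = $prev[$i];
            $upLeft = $i > 0 ? $prev[$i - 1] : 0;
            switch ($type) {
                case 0: $pred = 0; break;
                case 1: $pred = $left; break;
                case 2: $pred = $up; break;
                case 3: $pred = intdiv($left + $up, 2); break;
                case 4:
                    // Paeth: pick the neighbour closest to left + up - upLeft.
                    $p = $left + $up - $upLeft;
                    $pa = abs($p - $left);
                    $pb = abs($p - $up);
                    $pc = abs($p - $upLeft);
                    $pred = ($pa <= $pb && $pa <= $pc) ? $left : ($pb <= $pc ? $up : $upLeft);
                    break;
                default:
                    throw new RuntimeException('Unknown PNG predictor: ' . $type);
            }
            $cur[$i] = (ord($row[$i + 1]) + $pred) & 0xFF;
        }
        $out .= implode('', array_map('chr', $cur));
        $prev = $cur;
    }

    return $out;
}
```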

getIndirectObject()

Get content of indirect object.

protected getIndirectObject(string $pdfData, array<string|int, mixed> $xref, string $objRef[, int $offset = 0 ][, bool $decoding = true ]) : array<string|int, mixed>
Parameters
$pdfData : string

PDF data

$xref : array<string|int, mixed>
$objRef : string

Object number and generation number separated by underscore character

$offset : int = 0

Object offset

$decoding : bool = true

If true, decode streams.

Tags
throws
Exception

if invalid object reference found

Return values
array<string|int, mixed>

containing object data
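
The $objRef format ("object number and generation number separated by underscore") can be illustrated with a small sketch that splits a reference and checks whether the matching "N G obj" header sits at a given offset. Both function names are hypothetical, and tolerating leading zeros in the header is an assumption of this example.

```php
<?php
declare(strict_types=1);

/**
 * Split an "objNum_genNum" reference into its two integers.
 * (Hypothetical helper, not part of RawDataParser.)
 *
 * @return array{0: int, 1: int}
 */
function sketchSplitObjRef(string $objRef): array
{
    if (preg_match('/^(\d+)_(\d+)$/', $objRef, $m) !== 1) {
        throw new InvalidArgumentException('Invalid object reference: ' . $objRef);
    }

    return [(int) $m[1], (int) $m[2]];
}

/**
 * Check that the "N G obj" header for $objRef starts exactly at $offset.
 */
function sketchHasObjectAt(string $pdfData, string $objRef, int $offset): bool
{
    [$num, $gen] = sketchSplitObjRef($objRef);
    // \G anchors the match at $offset; 0* tolerates zero-padded numbers.
    $pattern = '/\G0*' . $num . '\s+0*' . $gen . '\s+obj/';

    return preg_match($pattern, $pdfData, $m, 0, $offset) === 1;
}
```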

getObjectHeaderLen()

protected getObjectHeaderLen(array<string|int, mixed> $objRefs) : int
Parameters
$objRefs : array<string|int, mixed>
Return values
int

getObjectHeaderPattern()

protected getObjectHeaderPattern(array<string|int, mixed> $objRefs) : string
Parameters
$objRefs : array<string|int, mixed>
Return values
string

getObjectVal()

Get the content of object, resolving indirect object reference if necessary.

protected getObjectVal(string $pdfData, mixed $xref, array<string|int, mixed> $obj) : array<string|int, mixed>
Parameters
$pdfData : string

PDF data

$xref : mixed
$obj : array<string|int, mixed>

Object value

Tags
throws
Exception
Return values
array<string|int, mixed>

containing object data

getRawObject()

Get object type, raw value and offset to next object

protected getRawObject(string $pdfData[, int $offset = 0 ][, array<string|int, mixed>|null $headerDic = null ]) : array<string|int, mixed>
Parameters
$pdfData : string
$offset : int = 0

Object offset

$headerDic : array<string|int, mixed>|null = null

Object header dictionary, as previously parsed by getRawObject; used to optimize stream parsing.

Return values
array<string|int, mixed>

containing object type, raw value and offset to next object

getXrefData()

Get Cross-Reference (xref) table and trailer data from PDF document data.

protected getXrefData(string $pdfData[, int $offset = 0 ][, array<string|int, mixed> $xref = [] ][, array<string|int, int> $visitedOffsets = [] ]) : array<string|int, mixed>
Parameters
$pdfData : string
$offset : int = 0

xref offset (if known)

$xref : array<string|int, mixed> = []

previous xref array (if any)

$visitedOffsets : array<string|int, int> = []

array of visited offsets to prevent infinite loops

Tags
throws
Exception

if startxref cannot be found

throws
Exception

if xref cannot be found

Return values
array<string|int, mixed>

containing xref and trailer data
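
Two ideas from this method lend themselves to a short sketch: locating the startxref offset near the end of the file, and using a set of visited offsets so that a chain of older xref sections cannot loop forever. Both functions below are hypothetical. The map from an offset to its previous offset stands in for real trailer parsing, and stopping quietly on a repeated offset is a choice made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Return the offset named by the last "startxref ... %%EOF" marker.
 * (Hypothetical helper, not part of RawDataParser.)
 */
function sketchFindStartXref(string $pdfData): int
{
    if (preg_match_all('/startxref\s+(\d+)\s+%%EOF/', $pdfData, $m) < 1) {
        throw new RuntimeException('Unable to find startxref.');
    }

    // Incremental updates append new markers; the last one is the current one.
    return (int) $m[1][count($m[1]) - 1];
}

/**
 * Follow a chain of previous-xref offsets without visiting any offset twice.
 *
 * @param array<int, int|null> $prevByOffset assumed map: xref offset => previous xref offset
 * @return int[] offsets in the order they were visited
 */
function sketchWalkPrevChain(array $prevByOffset, int $start): array
{
    $visited = [];
    $order = [];
    $offset = $start;
    while ($offset !== null && !isset($visited[$offset])) {
        $visited[$offset] = true;
        $order[] = $offset;
        $offset = $prevByOffset[$offset] ?? null;
    }

    return $order;
}
```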

getHeaderValue()

Get the value of a section of an object header (the obj << YYY >> part).

private getHeaderValue(array<string|int, mixed>|null $headerDic, string $key, string $type[, string|array<string|int, mixed>|null $default = '' ]) : string|array<string|int, mixed>|null

It is similar to Header::get('...')->getContent(); the only difference is that it can be used during parsing, before any Smalot\PdfParser\Header objects have been created.

Parameters
$headerDic : array<string|int, mixed>|null
$key : string

header's section name

$type : string

type of the section (e.g. 'numeric', '/', '<<')

$default : string|array<string|int, mixed>|null = ''

default value for header's section

Return values
string|array<string|int, mixed>|null

value of the object header's section, or the default value if the section is not found or its type does not match the $type parameter
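
The lookup described above can be sketched as follows. The layout of $headerDic is not specified on this page, so the example assumes a flat list in which each element is a [type, value, offset] triple and every section name (type '/') is followed by its value. That layout and the function name are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Look up $key in a header dictionary and return its value if the type matches.
 * (Hypothetical helper; assumes a flat list of [type, value, offset] triples.)
 *
 * @param array<int, mixed>|null $headerDic
 * @param string|array<mixed>|null $default
 * @return string|array<mixed>|null
 */
function sketchHeaderValue(?array $headerDic, string $key, string $type, $default = '')
{
    if ($headerDic === null) {
        return $default;
    }

    $items = array_values($headerDic); // guarantee consecutive integer keys
    foreach ($items as $i => $element) {
        $isName = is_array($element)
            && ($element[0] ?? null) === '/'
            && ($element[1] ?? null) === $key;
        if (!$isName) {
            continue;
        }

        // The element right after a section name is taken to be its value.
        $next = $items[$i + 1] ?? null;
        if (is_array($next) && ($next[0] ?? null) === $type) {
            return $next[1] ?? $default;
        }

        return $default;
    }

    return $default;
}
```

With a dictionary holding a Length section of type 'numeric', asking for type 'numeric' returns the stored value, while asking for type '<<' falls back to the default.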


        