RawDataParser
Table of Contents
Properties
- $cfg : array<string, bool>
- Configuration array.
- $filterHelper : mixed
- $objects : mixed
- $config : Config
Methods
- __construct() : mixed
- parseData() : array<string|int, mixed>
- Parses PDF data and returns extracted data as array.
- decodeStream() : array<string|int, mixed>
- Decode the specified stream.
- decodeXref() : array<string|int, mixed>
- Decode the Cross-Reference section.
- decodeXrefStream() : array<string|int, mixed>
- Decode the Cross-Reference Stream section.
- getIndirectObject() : array<string|int, mixed>
- Get content of indirect object.
- getObjectHeaderLen() : int
- getObjectHeaderPattern() : string
- getObjectVal() : array<string|int, mixed>
- Get the content of object, resolving indirect object reference if necessary.
- getRawObject() : array<string|int, mixed>
- Get object type, raw value and offset to next object.
- getXrefData() : array<string|int, mixed>
- Get Cross-Reference (xref) table and trailer data from PDF document data.
- getHeaderValue() : string|array<string|int, mixed>|null
- Get the value of a section of an object header (the obj << YYY >> part).
Properties
$cfg
Configuration array.
protected
array<string, bool> $cfg = [
    // if `true`, ignore filter decoding errors
    'ignore_filter_decoding_errors' => true,
    // if `true`, ignore errors caused by missing filter decoders
    'ignore_missing_filter_decoders' => true,
]
$filterHelper
protected
mixed $filterHelper
$objects
protected
mixed $objects
$config
private
Config $config
Methods
__construct()
public
__construct([array<string|int, mixed> $cfg = [] ][, Config|null $config = null ]) : mixed
Parameters
- $cfg : array<string|int, mixed> = []
  Configuration array, default is []
- $config : Config|null = null
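The constructor takes a `$cfg` array whose keys mirror the documented `$cfg` property. The sketch below shows one way such overrides could be combined with the defaults. It assumes a plain `array_merge`, and `sketchMergeConfig()` is a hypothetical helper written for this example, not a method of the class.

```php
<?php
// Hypothetical helper for illustration; it is not part of RawDataParser.
function sketchMergeConfig(array $overrides = []): array
{
    // Defaults copied from the documented $cfg property.
    $defaults = [
        'ignore_filter_decoding_errors' => true,
        'ignore_missing_filter_decoders' => true,
    ];

    // For string keys, values from later arrays replace earlier ones.
    return array_merge($defaults, $overrides);
}

$cfg = sketchMergeConfig(['ignore_filter_decoding_errors' => false]);
var_dump($cfg['ignore_filter_decoding_errors']);  // bool(false)
var_dump($cfg['ignore_missing_filter_decoders']); // bool(true)
```

Keys that the caller does not pass keep their default value, so a partial array is enough to change a single flag.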
parseData()
Parses PDF data and returns extracted data as array.
public
parseData(string $data) : array<string|int, mixed>
Parameters
- $data : string
  PDF data to parse
Return values
array<string|int, mixed> —array of parsed PDF document objects
decodeStream()
Decode the specified stream.
protected
decodeStream(string $pdfData, array<string|int, mixed> $xref, array<string|int, mixed> $sdic, string $stream) : array<string|int, mixed>
Parameters
- $pdfData : string
  PDF data
- $xref : array<string|int, mixed>
- $sdic : array<string|int, mixed>
  Stream's dictionary array
- $stream : string
  Stream to decode
Return values
array<string|int, mixed> —containing decoded stream data and remaining filters
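The return value pairs the decoded data with the filters that could not be applied. The sketch below illustrates that shape under simplifying assumptions: it takes a plain list of filter names instead of a stream dictionary, and it only knows `ASCIIHexDecode`. Both function names are hypothetical and do not come from the library.

```php
<?php
// Hypothetical ASCIIHexDecode: hex digits, whitespace ignored, '>' ends the data.
function sketchAsciiHexDecode(string $data): ?string
{
    $hex = preg_replace('/\s+/', '', $data);
    $end = strpos($hex, '>');
    if (false !== $end) {
        $hex = substr($hex, 0, $end);
    }
    if (!preg_match('/^[0-9A-Fa-f]*$/', $hex)) {
        return null; // not valid hex data
    }
    if (1 === strlen($hex) % 2) {
        $hex .= '0'; // an odd final digit is padded with 0
    }

    return hex2bin($hex);
}

// Hypothetical stand-in for decodeStream(): returns [data, remaining filters].
function sketchDecodeStream(string $stream, array $filters): array
{
    $remaining = [];
    foreach ($filters as $filter) {
        // Once one filter is left undecoded, all later filters must wait too.
        $decoded = ([] === $remaining && 'ASCIIHexDecode' === $filter)
            ? sketchAsciiHexDecode($stream)
            : null;
        if (null === $decoded) {
            $remaining[] = $filter;
            continue;
        }
        $stream = $decoded;
    }

    return [$stream, $remaining];
}

[$data, $left] = sketchDecodeStream('48656C6C6F>', ['ASCIIHexDecode']);
echo $data, "\n"; // Hello
```

Filters are applied in order, so an unsupported filter early in the chain leaves every later filter in the remaining list as well.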
decodeXref()
Decode the Cross-Reference section.
protected
decodeXref(string $pdfData, int $startxref[, array<string|int, mixed> $xref = [] ][, array<string|int, int> $visitedOffsets = [] ]) : array<string|int, mixed>
Parameters
- $pdfData : string
  PDF data
- $startxref : int
  Offset at which the xref section starts (position of the 'xref' keyword)
- $xref : array<string|int, mixed> = []
  Previous xref array (if any)
- $visitedOffsets : array<string|int, int> = []
  Array of visited offsets to prevent infinite loops
Return values
array<string|int, mixed> —containing xref and trailer data
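To make the input concrete, here is a minimal sketch of reading a classic cross-reference table starting at the `xref` keyword. It is not the library's implementation: `sketchDecodeXrefTable()` is a hypothetical helper, it ignores the trailer, and it returns a flat map keyed by `object_generation`, which is an assumption borrowed from the `$objRef` format documented for `getIndirectObject()`.

```php
<?php
// Hypothetical helper for illustration; not the library's decodeXref().
function sketchDecodeXrefTable(string $pdfData, int $startxref): array
{
    $xref = [];
    // Skip the 4-byte 'xref' keyword and any whitespace after it.
    $offset = $startxref + 4;
    $offset += strspn($pdfData, " \t\r\n\f\0", $offset);
    $objNum = 0;

    // A line is either a subsection header "first count"
    // or an entry "offset generation n|f".
    $line = '/\G(\d+)[ ]+(\d+)(?:[ ]+([nf]))?[ \t]*[\r\n]+/';
    while (preg_match($line, $pdfData, $m, 0, $offset)) {
        if (isset($m[3])) {
            if ('n' === $m[3]) {
                // In-use entry: remember the byte offset of the object.
                $xref[$objNum . '_' . (int) $m[2]] = (int) $m[1];
            }
            ++$objNum;
        } else {
            // Subsection header: the first number is the starting object number.
            $objNum = (int) $m[1];
        }
        $offset += strlen($m[0]);
    }

    return $xref;
}

$pdf = "xref\n0 3\n0000000000 65535 f \n0000000017 00000 n \n0000000081 00000 n \ntrailer\n";
print_r(sketchDecodeXrefTable($pdf, 0));
```

Free entries (marked `f`) are skipped, so only objects that are actually in use end up in the map.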
decodeXrefStream()
Decode the Cross-Reference Stream section.
protected
decodeXrefStream(string $pdfData, int $startxref[, array<string|int, mixed> $xref = [] ][, array<string|int, int> $visitedOffsets = [] ]) : array<string|int, mixed>
Parameters
- $pdfData : string
  PDF data
- $startxref : int
  Offset at which the xref section starts
- $xref : array<string|int, mixed> = []
  Previous xref array (if any)
- $visitedOffsets : array<string|int, int> = []
  Array of visited offsets to prevent infinite loops
Return values
array<string|int, mixed> —containing xref and trailer data
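In a cross-reference stream, the decoded data is a sequence of fixed-width rows whose field widths come from the stream dictionary's /W array, with each field stored as a big-endian integer. The sketch below shows that row-splitting step only. `sketchXrefStreamRows()` is a hypothetical helper and assumes the stream has already been decoded.

```php
<?php
// Hypothetical helper: split decoded xref stream data into integer fields.
function sketchXrefStreamRows(string $data, array $widths): array
{
    $rowLen = array_sum($widths);
    if ($rowLen < 1) {
        return [];
    }

    $rows = [];
    foreach (str_split($data, $rowLen) as $row) {
        if (strlen($row) < $rowLen) {
            break; // ignore a truncated final row
        }
        $fields = [];
        $pos = 0;
        foreach ($widths as $width) {
            // Read $width bytes as one big-endian unsigned integer.
            $value = 0;
            for ($i = 0; $i < $width; ++$i) {
                $value = ($value << 8) | ord($row[$pos + $i]);
            }
            $fields[] = $value;
            $pos += $width;
        }
        $rows[] = $fields;
    }

    return $rows;
}

// With /W [1 2 1], each row is: type, byte offset, generation.
$rows = sketchXrefStreamRows("\x01\x00\x11\x00\x01\x00\x51\x00", [1, 2, 1]);
print_r($rows);
```

With these widths, the two rows decode to type 1 entries at byte offsets 17 and 81, both with generation 0.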
getIndirectObject()
Get content of indirect object.
protected
getIndirectObject(string $pdfData, array<string|int, mixed> $xref, string $objRef[, int $offset = 0 ][, bool $decoding = true ]) : array<string|int, mixed>
Parameters
- $pdfData : string
  PDF data
- $xref : array<string|int, mixed>
- $objRef : string
  Object number and generation number separated by underscore character
- $offset : int = 0
  Object offset
- $decoding : bool = true
  If true decode streams
Return values
array<string|int, mixed> —containing object data
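The `$objRef` format (`object_generation`) maps directly onto the `N G obj` header that opens an indirect object in the file. The sketch below shows how a reference such as `1_0` could be turned into that header and checked at a given offset. Both helpers are hypothetical and are not the library's `getObjectHeaderPattern()` or `getObjectHeaderLen()`.

```php
<?php
// Hypothetical helper: '12_0' becomes '12 0 obj'.
function sketchObjectHeader(string $objRef): string
{
    [$num, $gen] = explode('_', $objRef, 2);

    return (int) $num . ' ' . (int) $gen . ' obj';
}

// Hypothetical helper: return the offset just past the header,
// or null when the data at $offset does not start with it.
function sketchBodyOffset(string $pdfData, string $objRef, int $offset = 0): ?int
{
    $header = sketchObjectHeader($objRef);
    if (substr($pdfData, $offset, strlen($header)) !== $header) {
        return null;
    }

    return $offset + strlen($header);
}

$pdf = "%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n";
$start = strpos($pdf, '1 0 obj');
var_dump(sketchBodyOffset($pdf, '1_0', $start)); // int(16)
var_dump(sketchBodyOffset($pdf, '2_0', $start)); // NULL
```

Checking the header at the exact offset, instead of searching for it, avoids matching `1 0 obj` inside `11 0 obj`.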
getObjectHeaderLen()
protected
getObjectHeaderLen(array<string|int, mixed> $objRefs) : int
Parameters
- $objRefs : array<string|int, mixed>
Return values
int
getObjectHeaderPattern()
protected
getObjectHeaderPattern(array<string|int, mixed> $objRefs) : string
Parameters
- $objRefs : array<string|int, mixed>
Return values
string
getObjectVal()
Get the content of object, resolving indirect object reference if necessary.
protected
getObjectVal(string $pdfData, mixed $xref, array<string|int, mixed> $obj) : array<string|int, mixed>
Parameters
- $pdfData : string
  PDF data
- $xref : mixed
- $obj : array<string|int, mixed>
  Object value
Return values
array<string|int, mixed> —containing object data
getRawObject()
Get object type, raw value and offset to next object.
protected
getRawObject(string $pdfData[, int $offset = 0 ][, array<string|int, mixed>|null $headerDic = null ]) : array<string|int, mixed>
Parameters
- $pdfData : string
- $offset : int = 0
  Object offset
- $headerDic : array<string|int, mixed>|null = null
  obj header's dictionary, parsed by getRawObject. Used for stream parsing optimization
Return values
array<string|int, mixed> —containing object type, raw value and offset to next object
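The three-part return value (type, raw value, offset to the next object) can be illustrated with a very small tokenizer. The sketch below handles only names and numbers. The type labels `'/'` and `'numeric'` are taken from the `$type` examples documented for `getHeaderValue()`, while `sketchRawObject()` itself and the exact array layout are assumptions.

```php
<?php
// Hypothetical tokenizer: returns [type, raw value, offset of next object].
function sketchRawObject(string $pdfData, int $offset = 0): array
{
    // Skip PDF whitespace before the object.
    $offset += strspn($pdfData, "\0\t\n\f\r ", $offset);
    $char = $pdfData[$offset] ?? '';

    if ('/' === $char) {
        // A name runs until the next whitespace or delimiter character.
        $len = strcspn($pdfData, "\0\t\n\f\r ()<>[]{}/%", $offset + 1);

        return ['/', substr($pdfData, $offset + 1, $len), $offset + 1 + $len];
    }

    if (preg_match('/\G[+-]?(?:\d+\.?\d*|\.\d+)/', $pdfData, $m, 0, $offset)) {
        return ['numeric', $m[0], $offset + strlen($m[0])];
    }

    // Unknown object: empty type and value, offset unchanged.
    return ['', '', $offset];
}

$data = '/Length 42';
[$type, $value, $next] = sketchRawObject($data);
echo "$type $value $next\n"; // / Length 7
[$type, $value, $next] = sketchRawObject($data, $next);
echo "$type $value $next\n"; // numeric 42 10
```

Feeding the returned offset back in as the next call's `$offset` walks through the data one object at a time.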
getXrefData()
Get Cross-Reference (xref) table and trailer data from PDF document data.
protected
getXrefData(string $pdfData[, int $offset = 0 ][, array<string|int, mixed> $xref = [] ][, array<string|int, int> $visitedOffsets = [] ]) : array<string|int, mixed>
Parameters
- $pdfData : string
- $offset : int = 0
  xref offset (if known)
- $xref : array<string|int, mixed> = []
  previous xref array (if any)
- $visitedOffsets : array<string|int, int> = []
  array of visited offsets to prevent infinite loops
Return values
array<string|int, mixed> —containing xref and trailer data
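When no xref offset is known, it has to be read from the `startxref` entry near the end of the file, and the `$visitedOffsets` parameter suggests that chained sections are tracked to avoid cycles. The sketch below shows both ideas in isolation. The two helpers and the `$prevByOffset` map (offset of a section mapped to the offset of the previous one) are hypothetical simplifications, not the library's data structures.

```php
<?php
// Hypothetical helper: offset announced by the last "startxref ... %%EOF".
function sketchFindStartXref(string $pdfData): ?int
{
    $pattern = '/[\r\n]startxref\s*[\r\n]+(\d+)\s*[\r\n]+%%EOF/i';
    if (!preg_match_all($pattern, $pdfData, $matches)) {
        return null;
    }

    // An incrementally updated file has several; the last one is current.
    return (int) end($matches[1]);
}

// Hypothetical helper: follow a chain of offsets, stopping on a repeat.
function sketchFollowXrefChain(array $prevByOffset, int $startxref): array
{
    $visitedOffsets = [];
    $offset = $startxref;
    while (null !== $offset && !in_array($offset, $visitedOffsets, true)) {
        $visitedOffsets[] = $offset;
        $offset = $prevByOffset[$offset] ?? null;
    }

    return $visitedOffsets;
}

$pdf = "%PDF-1.4\n...\nstartxref\n120\n%%EOF\n...\nstartxref\n450\n%%EOF\n";
var_dump(sketchFindStartXref($pdf)); // int(450)

// A malformed file where two sections point at each other.
print_r(sketchFollowXrefChain([450 => 120, 120 => 450], 450));
```

The cyclic example terminates after visiting 450 and 120 once each, which is the behaviour a visited-offset list is meant to guarantee.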
getHeaderValue()
Get the value of a section of an object header (the obj << YYY >> part).
private
getHeaderValue(array<string|int, mixed>|null $headerDic, string $key, string $type[, string|array<string|int, mixed>|null $default = '' ]) : string|array<string|int, mixed>|null
It is similar to Header::get('...')->getContent(), but it can be used during parsing, before any Smalot\PdfParser\Header objects have been created.
Parameters
- $headerDic : array<string|int, mixed>|null
- $key : string
  header's section name
- $type : string
  type of the section (e.g. 'numeric', '/', '<<')
- $default : string|array<string|int, mixed>|null = ''
  default value for header's section
Return values
string|array<string|int, mixed>|null —value of obj header's section, or default value if none found, or its type doesn't match $type param
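The lookup rule described above (return the section's value, or the default when the key is missing or the type differs) can be sketched as follows. The layout of `$headerDic` is an assumption made for this example: a flat list of `[type, value, offset]` elements in which a name element is followed by its value element. `sketchHeaderValue()` is a hypothetical stand-in for the private method.

```php
<?php
// Hypothetical stand-in for getHeaderValue(); the $headerDic layout is assumed.
function sketchHeaderValue(?array $headerDic, string $key, string $type, $default = '')
{
    if (!is_array($headerDic)) {
        return $default;
    }

    foreach ($headerDic as $idx => $element) {
        $next = $headerDic[$idx + 1] ?? null;
        // A section is a name element followed by a value of the wanted type.
        if ('/' === $element[0] && $key === $element[1]
            && null !== $next && $type === $next[0]
        ) {
            return $next[1];
        }
    }

    return $default;
}

$headerDic = [
    ['/', 'Type', 0],
    ['/', 'XRef', 0],
    ['/', 'Length', 0],
    ['numeric', '42', 0],
];
var_dump(sketchHeaderValue($headerDic, 'Length', 'numeric')); // string(2) "42"
var_dump(sketchHeaderValue($headerDic, 'Length', '<<', null)); // NULL
```

The second call returns the default because the value stored under `Length` is numeric, not a dictionary.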