
EpubParserService

Service for parsing EPUB files and extracting content.

Uses the kiwilan/php-ebook library to read EPUB files and extract metadata and chapter content for import into LWT.

Tags
since
3.0.0

Table of Contents

Methods

cleanHtmlContent()  : string
Clean HTML content to plain text suitable for LWT.
getMetadata()  : array{title: string, author: string|null, description: string|null, language: string|null}|null
Get just the metadata without parsing chapters.
isValidEpub()  : bool
Validate that a file is an EPUB.
parse()  : array{metadata: array{title: string, author: string|null, description: string|null, language: string|null, sourceHash: string}, chapters: array{num: int, title: string, content: string}[]}
Parse an EPUB file and extract metadata and chapters.
extractAuthor()  : string|null
Extract the primary author name from an ebook.
extractChapters()  : array<string|int, array{num: int, title: string, content: string}>
Extract chapters from an ebook.
extractFromHtmlFiles()  : array<string|int, array{num: int, title: string, content: string}>
Extract content from HTML files in the EPUB as fallback.
extractTitleFromContent()  : string
Extract a title from content if possible.
getEpubModule()  : EpubModule|null
Get the EpubModule from an Ebook.

Methods

cleanHtmlContent()

Clean HTML content to plain text suitable for LWT.

public cleanHtmlContent(string $html) : string

Strips HTML tags while preserving paragraph structure with double newlines for paragraph breaks.

Parameters
$html : string

The HTML content

Return values
string

Clean plain text
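
The description above (strip tags, keep paragraph breaks as double newlines) can be illustrated with a minimal sketch. The function name `sketchCleanHtml` and the exact set of block-level tags are assumptions made for this example; the service's actual implementation is not shown on this page and may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for cleanHtmlContent(); not the service's actual code.
 */
function sketchCleanHtml(string $html): string
{
    // Drop script and style elements together with their contents.
    $text = preg_replace('~<(script|style)\b[^>]*>.*?</\1>~is', '', $html) ?? $html;
    // Mark the end of common block-level elements with a blank line (assumed tag list).
    $text = preg_replace('~</(p|div|h[1-6]|li|blockquote)\s*>~i', "\n\n", $text) ?? $text;
    // Treat <br> as a single line break.
    $text = preg_replace('~<br\s*/?>~i', "\n", $text) ?? $text;
    // Remove the remaining tags, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of spaces and tabs, trim spaces around newlines,
    // and allow at most two consecutive newlines.
    $text = preg_replace('~[ \t]+~', ' ', $text) ?? $text;
    $text = preg_replace('~ ?\n ?~', "\n", $text) ?? $text;
    $text = preg_replace('~\n{3,}~', "\n\n", $text) ?? $text;

    return trim($text);
}
```

With this sketch, two adjacent `<p>` elements come out as two plain-text paragraphs separated by one blank line.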

getMetadata()

Get just the metadata without parsing chapters.

public getMetadata(string $filePath) : array{title: string, author: string|null, description: string|null, language: string|null}|null
Parameters
$filePath : string

Path to the EPUB file

Return values
array{title: string, author: string|null, description: string|null, language: string|null}|null

Metadata or null on failure

isValidEpub()

Validate that a file is an EPUB.

public isValidEpub(string $filePath) : bool
Parameters
$filePath : string

Path to the file

Return values
bool

True if the file is a valid EPUB
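
This page does not say how validity is checked. One possible approach, shown here only as a sketch, relies on the EPUB container convention that the ZIP archive starts with an uncompressed `mimetype` entry containing `application/epub+zip`. The function name `sketchLooksLikeEpub` is hypothetical, and the check assumes the first entry has no ZIP extra field, so some real-world files may be rejected by it.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical signature check; not the service's actual isValidEpub() logic.
 */
function sketchLooksLikeEpub(string $filePath): bool
{
    if (!is_file($filePath) || !is_readable($filePath)) {
        return false;
    }

    // 30-byte ZIP local file header + "mimetype" (8 bytes) + media type (20 bytes).
    $head = file_get_contents($filePath, false, null, 0, 58);
    if ($head === false || strlen($head) < 58) {
        return false;
    }

    return substr($head, 0, 4) === "PK\x03\x04"              // ZIP local file header signature
        && substr($head, 30, 8) === 'mimetype'               // name of the first entry
        && substr($head, 38, 20) === 'application/epub+zip'; // stored, uncompressed content
}
```

A check like this is cheap because it reads only the first 58 bytes, but it says nothing about whether the rest of the archive can be parsed.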

parse()

Parse an EPUB file and extract metadata and chapters.

public parse(string $filePath) : array{metadata: array{title: string, author: string|null, description: string|null, language: string|null, sourceHash: string}, chapters: array{num: int, title: string, content: string}[]}
Parameters
$filePath : string

Absolute path to the EPUB file

Tags
throws
InvalidArgumentException

If the file does not exist

throws
RuntimeException

If the file cannot be parsed

Return values
array{metadata: array{title: string, author: string|null, description: string|null, language: string|null, sourceHash: string}, chapters: array{num: int, title: string, content: string}[]}

Parsed metadata and the list of chapters
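
To make the documented return shape concrete, the following sketch consumes an array of that shape and builds a short summary. The helper `sketchSummarize` and the `$sample` data are invented for illustration; in real use the array would come from `parse()`, which can throw the two exceptions listed above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper that summarizes a result shaped like parse()'s return value.
 *
 * @param array{metadata: array<string, string|null>, chapters: array<int, array{num: int, title: string, content: string}>} $parsed
 */
function sketchSummarize(array $parsed): string
{
    $meta = $parsed['metadata'];
    // author is documented as string|null, so provide a fallback label.
    $author = $meta['author'] ?? 'Unknown author';

    $lines = [sprintf('%s (%s), %d chapter(s)', $meta['title'], $author, count($parsed['chapters']))];
    foreach ($parsed['chapters'] as $chapter) {
        // strlen() counts bytes, not characters.
        $lines[] = sprintf('%d. %s [%d bytes]', $chapter['num'], $chapter['title'], strlen($chapter['content']));
    }

    return implode("\n", $lines);
}

// Invented sample data that follows the documented array shape.
$sample = [
    'metadata' => [
        'title' => 'Example Book',
        'author' => null,
        'description' => null,
        'language' => 'en',
        'sourceHash' => 'illustrative-hash-value',
    ],
    'chapters' => [
        ['num' => 1, 'title' => 'Chapter 1', 'content' => 'Hello'],
    ],
];
```

Note that only `title` and `sourceHash` are guaranteed to be strings in the metadata; the other fields may be null and need a fallback when displayed.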

extractAuthor()

Extract the primary author name from an ebook.

private extractAuthor(Ebook $ebook) : string|null
Parameters
$ebook : Ebook

The ebook object

Return values
string|null

Author name or null if not found

extractChapters()

Extract chapters from an ebook.

private extractChapters(Ebook $ebook) : array<string|int, array{num: int, title: string, content: string}>
Parameters
$ebook : Ebook

The ebook object

Return values
array<string|int, array{num: int, title: string, content: string}>

extractFromHtmlFiles()

Extract content from HTML files in the EPUB as fallback.

private extractFromHtmlFiles(Ebook $ebook) : array<string|int, array{num: int, title: string, content: string}>
Parameters
$ebook : Ebook

The ebook object

Return values
array<string|int, array{num: int, title: string, content: string}>

extractTitleFromContent()

Extract a title from content if possible.

private extractTitleFromContent(string $content, int $num) : string
Parameters
$content : string

The text content

$num : int

Default chapter number

Return values
string

The extracted or default title
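
The heuristic used to find a title is not described here. As one plausible sketch, the first non-empty line of the content can serve as the title when it is short, with a default built from the chapter number otherwise. The function name `sketchTitleFromContent`, the 80-byte limit, and the `Chapter N` default format are all assumptions made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical heuristic; not the service's actual extractTitleFromContent() logic.
 */
function sketchTitleFromContent(string $content, int $num): string
{
    // Split on any newline sequence and look at the first line of the trimmed content.
    $lines = preg_split('~\R~', trim($content)) ?: [];
    $firstLine = trim($lines[0] ?? '');

    // Accept the first line only if it is non-empty and short (assumed limit: 80 bytes).
    if ($firstLine !== '' && strlen($firstLine) <= 80) {
        return $firstLine;
    }

    // Otherwise fall back to a title derived from the chapter number (assumed format).
    return 'Chapter ' . $num;
}
```

Falling back to the chapter number keeps every chapter titled even when the content starts with a long paragraph or is empty.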


        