EpubParserService
Service for parsing EPUB files and extracting content.
Uses the kiwilan/php-ebook library to read EPUB files and extract metadata and chapter content for import into LWT.
Table of Contents
Methods
- cleanHtmlContent() : string
  Clean HTML content to plain text suitable for LWT.
- getMetadata() : array{title: string, author: string|null, description: string|null, language: string|null}|null
  Get just the metadata without parsing chapters.
- isValidEpub() : bool
  Validate that a file is an EPUB.
- parse() : array{metadata: array{title: string, author: string|null, description: string|null, language: string|null, sourceHash: string}, chapters: array{num: int, title: string, content: string}[]}
  Parse an EPUB file and extract metadata and chapters.
- extractAuthor() : string|null
  Extract the primary author name from an ebook.
- extractChapters() : array<string|int, array{num: int, title: string, content: string}>
  Extract chapters from an ebook.
- extractFromHtmlFiles() : array<string|int, array{num: int, title: string, content: string}>
  Extract content from HTML files in the EPUB as a fallback.
- extractTitleFromContent() : string
  Extract a title from content if possible.
- getEpubModule() : EpubModule|null
  Get the EpubModule from an Ebook.
Methods
cleanHtmlContent()
Clean HTML content to plain text suitable for LWT.
public cleanHtmlContent(string $html) : string
Strips HTML tags while preserving paragraph structure: paragraph breaks become double newlines.
Parameters
- $html : string
  The HTML content
Return values
string
  Clean plain text
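A minimal sketch of the tag-stripping approach described above, written as a standalone function. The name htmlToPlainText, the list of block-level tags treated as paragraph ends, and the whitespace rules are assumptions for illustration, not the service's actual implementation.

```php
<?php

/**
 * Hypothetical standalone helper: converts chapter HTML to plain text,
 * keeping paragraph breaks as "\n\n" (not the service's actual code).
 */
function htmlToPlainText(string $html): string
{
    // Drop <script> and <style> blocks entirely; their text is not readable content.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    // <br> becomes a single line break.
    $html = preg_replace('#<br\s*/?>#i', "\n", $html);
    // Closing a block-level element ends a paragraph (assumed tag list).
    $html = preg_replace('#</(p|div|h[1-6]|li|blockquote)\s*>#i', "\n\n", $html);

    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of spaces/tabs, then trim spaces around line breaks.
    $text = preg_replace('/[ \t]+/', ' ', $text);
    $text = preg_replace('/ *\n */', "\n", $text);
    // Keep at most one blank line between paragraphs.
    $text = preg_replace("/\n{3,}/", "\n\n", $text);

    return trim($text);
}
```

For example, `<h1>Chapter 1</h1><p>Hello &amp; welcome.</p>` becomes `"Chapter 1\n\nHello & welcome."`. Inserting the breaks before strip_tags() matters: once the tags are gone, there is no way to tell where one paragraph ended and the next began.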
getMetadata()
Get just the metadata without parsing chapters.
public getMetadata(string $filePath) : array{title: string, author: string|null, description: string|null, language: string|null}|null
Parameters
- $filePath : string
  Path to the EPUB file
Return values
array{title: string, author: string|null, description: string|null, language: string|null}|null
  Metadata, or null on failure
isValidEpub()
Validate that a file is an EPUB.
public isValidEpub(string $filePath) : bool
Parameters
- $filePath : string
  Path to the file
Return values
bool
  True if the file is a valid EPUB
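This page does not say how isValidEpub() decides. One lightweight check, sketched here as a hypothetical looksLikeEpub() helper, relies on the EPUB Open Container Format rule that the ZIP archive's first entry is an uncompressed file named `mimetype` whose content is `application/epub+zip`. It reads only the first ZIP local file header and does not validate the rest of the archive, so treat it as a quick pre-filter under that assumption.

```php
<?php

/**
 * Hypothetical quick check (not the service's isValidEpub()): does the file
 * start with the ZIP entry the EPUB container format requires, an uncompressed
 * "mimetype" file containing "application/epub+zip"?
 */
function looksLikeEpub(string $filePath): bool
{
    if (!is_file($filePath) || !is_readable($filePath)) {
        return false;
    }

    // Fixed 30-byte part of the first ZIP local file header.
    $header = file_get_contents($filePath, false, null, 0, 30);
    if ($header === false || strlen($header) < 30 || substr($header, 0, 4) !== "PK\x03\x04") {
        return false;
    }

    // Little-endian 16-bit fields: compression method, file name length, extra field length.
    $method = unpack('v', substr($header, 8, 2))[1];
    $nameLength = unpack('v', substr($header, 26, 2))[1];
    $extraLength = unpack('v', substr($header, 28, 2))[1];
    if ($method !== 0 || $nameLength !== 8) {
        return false; // mimetype must be stored (method 0) under the 8-byte name "mimetype"
    }

    // File name, optional extra field, then the 20-byte media type.
    $rest = file_get_contents($filePath, false, null, 30, $nameLength + $extraLength + 20);
    if ($rest === false || strlen($rest) < $nameLength + $extraLength + 20) {
        return false;
    }

    return substr($rest, 0, 8) === 'mimetype'
        && substr($rest, 8 + $extraLength, 20) === 'application/epub+zip';
}
```

Reading raw header bytes avoids depending on the zip extension; a stricter validator would also open the archive and confirm that META-INF/container.xml exists.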
parse()
Parse an EPUB file and extract metadata and chapters.
public parse(string $filePath) : array{metadata: array{title: string, author: string|null, description: string|null, language: string|null, sourceHash: string}, chapters: array{num: int, title: string, content: string}[]}
Parameters
- $filePath : string
  Absolute path to the EPUB file
Return values
array{metadata: array{title: string, author: string|null, description: string|null, language: string|null, sourceHash: string}, chapters: array{num: int, title: string, content: string}[]}
  The book's metadata and its chapters
extractAuthor()
Extract the primary author name from an ebook.
private extractAuthor(Ebook $ebook) : string|null
Parameters
- $ebook : Ebook
  The ebook object
Return values
string|null
  Author name, or null if not found
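How the service chooses the "primary" author from the library's data is not documented here. A sketch of one plausible rule, assuming the candidate names arrive as a list of nullable strings (the helper name primaryAuthor and its input shape are hypothetical): take the first non-blank name in the order the book lists them, otherwise return null.

```php
<?php

/**
 * Hypothetical rule for choosing a "primary" author: the first non-blank
 * name in the order the ebook lists them, trimmed; null if there is none.
 *
 * @param array<int, string|null> $authorNames
 */
function primaryAuthor(array $authorNames): ?string
{
    foreach ($authorNames as $name) {
        if ($name !== null && trim($name) !== '') {
            return trim($name);
        }
    }
    return null;
}
```

Returning null instead of an empty string matches the `string|null` return type documented above, so callers can tell "no author" apart from a real value.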
extractChapters()
Extract chapters from an ebook.
private extractChapters(Ebook $ebook) : array<string|int, array{num: int, title: string, content: string}>
Parameters
- $ebook : Ebook
  The ebook object
Return values
array<string|int, array{num: int, title: string, content: string}>
extractFromHtmlFiles()
Extract content from HTML files in the EPUB as a fallback.
private extractFromHtmlFiles(Ebook $ebook) : array<string|int, array{num: int, title: string, content: string}>
Parameters
- $ebook : Ebook
  The ebook object
Return values
array<string|int, array{num: int, title: string, content: string}>
extractTitleFromContent()
Extract a title from content if possible.
private extractTitleFromContent(string $content, int $num) : string
Parameters
- $content : string
  The text content
- $num : int
  Default chapter number
Return values
string
  The extracted or default title
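extractFromHtmlFiles() and extractTitleFromContent() are private, and their exact rules are not spelled out on this page. The sketch below is one plausible reading under stated assumptions: the HTML files arrive as an ordered map of path => markup (a hypothetical input, not kiwilan/php-ebook's API), a title is the first non-empty line if it is at most 100 bytes, and the default title format is "Chapter N". Both helper names are hypothetical.

```php
<?php

/**
 * Hypothetical helper: use the first non-empty line as the title when it is
 * short enough to look like a heading, else fall back to "Chapter {$num}".
 * The 100-byte limit and the "Chapter N" format are assumptions.
 */
function titleFromContent(string $content, int $num, int $maxBytes = 100): string
{
    foreach (preg_split('/\R/', $content) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        return strlen($line) <= $maxBytes ? $line : "Chapter {$num}";
    }
    return "Chapter {$num}";
}

/**
 * Hypothetical fallback: turn the book's HTML files (path => markup, in
 * reading order) into numbered chapters, skipping files with no text.
 *
 * @param array<string, string> $htmlFiles
 * @return list<array{num: int, title: string, content: string}>
 */
function chaptersFromHtmlFiles(array $htmlFiles): array
{
    $chapters = [];
    foreach ($htmlFiles as $html) {
        // Crude cleaning for the sketch: paragraph ends become blank lines, then tags are stripped.
        $text = preg_replace('#</p\s*>#i', "\n\n", $html);
        $text = trim(html_entity_decode(strip_tags($text), ENT_QUOTES | ENT_HTML5, 'UTF-8'));
        if ($text === '') {
            continue; // e.g. a cover page that only holds an image
        }
        // Number only the files that produced text, so chapter numbers stay contiguous.
        $num = count($chapters) + 1;
        $chapters[] = [
            'num' => $num,
            'title' => titleFromContent($text, $num),
            'content' => $text,
        ];
    }
    return $chapters;
}
```

Numbering after filtering keeps `num` gap-free even when cover pages or blank wrapper files are skipped. The byte-based length check is a simplification; mb_strlen() would count characters for non-Latin titles.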
getEpubModule()
Get the EpubModule from an Ebook.
private getEpubModule(Ebook $ebook) : EpubModule|null
Parameters
- $ebook : Ebook
  The ebook object
Return values
EpubModule|null
  The EPUB module, or null if the ebook is not an EPUB