RssParser
Service for parsing RSS and Atom feeds.
Provides pure parsing functionality without database access. Supports both RSS 2.0 and Atom feed formats.
Table of Contents
Methods
- detectAndParse() : array<int|string, array<string, string>|string>|null
- Detect and parse feed, determining best text source.
- getFeedTitle() : string|null
- Get the feed title from a feed URL.
- parse() : array<int, array{title: string, link: string, desc: string, date: string, audio: string, text: string}>|null
- Parse RSS/Atom feed and return article items with metadata.
- cleanDescription() : string
- Clean and normalize description text.
- cleanDescriptionForDetection() : string
- Clean description for detection mode.
- cleanTitle() : string
- Clean and normalize title text.
- cleanTitleForDetection() : string
- Clean title for detection mode.
- convertToHtmlEntities() : string
- Convert HTML to HTML entities.
- countTextLengths() : array{desc: array{long: int, short: int}, encoded: array{long: int, short: int}}
- Count text lengths for source detection.
- determineBestTextSource() : array<int|string, array<string, string>|string>
- Determine best text source and update items.
- extractAudioEnclosure() : string
- Extract audio enclosure URL.
- extractInlineText() : string|null
- Extract inline text from item node.
- extractLink() : string
- Extract link from node based on feed type.
- formatParsedDate() : string
- Format parsed date array to MySQL datetime.
- getFeedTagMapping() : array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string}|null
- Get tag mapping for RSS/Atom feed format.
- parseFeedDate() : string
- Parse feed date string to MySQL datetime format.
- parseItem() : array{title: string, link: string, desc: string, date: string, audio: string, text: string}|null
- Parse a single feed item.
- parseItemForDetection() : array{title: string, desc: string, link: string, encoded?: string, description?: string, content?: string}
- Parse item for detection mode (includes raw text content).
Methods
detectAndParse()
Detect and parse feed, determining best text source.
public
detectAndParse(string $sourceUri) : array<int|string, array<string, string>|string>|null
Analyzes feed to determine whether to use:
- content (Atom)
- description (RSS)
- encoded (RSS with content:encoded)
- webpage link (external fetch)
Parameters
- $sourceUri : string
  Feed URL
Return values
array<int|string, array<string, string>|string>|null —Feed data with feed_text indicator or null on error
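The return type mixes per-item arrays with plain string entries, so a caller will usually want to separate the two. The following is a minimal sketch of that step, not the library's own code. The helper `splitFeedData()` is hypothetical, and the sample assumes the indicator is stored under the key `feed_text` with a value such as `description`; the source only says the result carries a "feed_text indicator".

```php
<?php
declare(strict_types=1);

/**
 * Split a detectAndParse()-style result into items and string entries.
 * Hypothetical helper, written only against the documented return type.
 *
 * @param array<int|string, array<string, string>|string> $feedData
 * @return array{items: list<array<string, string>>, meta: array<string, string>}
 */
function splitFeedData(array $feedData): array
{
    $items = [];
    $meta = [];
    foreach ($feedData as $key => $value) {
        if (is_array($value)) {
            $items[] = $value;              // one feed item
        } else {
            $meta[(string) $key] = $value;  // string entry, e.g. the indicator
        }
    }
    return ['items' => $items, 'meta' => $meta];
}

// Hand-written sample; real data would come from detectAndParse().
$sample = [
    0 => ['title' => 'First post', 'link' => 'https://example.com/1', 'desc' => 'Intro'],
    1 => ['title' => 'Second post', 'link' => 'https://example.com/2', 'desc' => 'Follow-up'],
    'feed_text' => 'description', // assumed key name and value
];

$split = splitFeedData($sample);
echo count($split['items']), ' items, text source: ',
    $split['meta']['feed_text'] ?? 'unknown', "\n";
```

Because the method returns null on error, a real caller would check for null before passing the result to a helper like this.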
getFeedTitle()
Get the feed title from a feed URL.
public
getFeedTitle(string $sourceUri) : string|null
Parameters
- $sourceUri : string
  Feed URL
Return values
string|null —Feed title or null on error
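For illustration, a title lookup of this kind could be written with PHP's DOM extension as below. This is a sketch under assumptions, not the method's actual body: the hypothetical `sketchFeedTitle()` takes the XML as a string rather than fetching a URL, and it assumes the title sits at `channel/title` for RSS 2.0 and directly under the root `feed` element for Atom, as the two formats specify.

```php
<?php
declare(strict_types=1);

/**
 * Return the feed-level title from an RSS 2.0 or Atom document, or null.
 * Hypothetical helper; takes XML text so that no network access is needed.
 */
function sketchFeedTitle(string $xml): ?string
{
    $doc = new DOMDocument();
    // loadXML() rejects an empty string, so guard first; @ hides parse warnings.
    if ($xml === '' || !@$doc->loadXML($xml)) {
        return null;
    }
    $parent = $doc->documentElement;
    if ($parent === null) {
        return null;
    }
    if ($parent->localName === 'rss') {
        // RSS 2.0 keeps feed metadata inside <channel>.
        $parent = $parent->getElementsByTagName('channel')->item(0);
        if ($parent === null) {
            return null;
        }
    }
    // Only look at direct children so item/entry titles are not picked up.
    foreach ($parent->childNodes as $child) {
        if ($child instanceof DOMElement && $child->localName === 'title') {
            return trim($child->textContent);
        }
    }
    return null;
}

$rss = '<rss version="2.0"><channel><title> News </title>'
     . '<item><title>Item one</title></item></channel></rss>';
echo sketchFeedTitle($rss) ?? '(no title)', "\n";
```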
parse()
Parse RSS/Atom feed and return article items with metadata.
public
parse(string $sourceUri[, string $articleSection = '' ]) : array<int, array{title: string, link: string, desc: string, date: string, audio: string, text: string}>|null
Supports both RSS 2.0 and Atom feed formats. Extracts:
- Title, description, link, publication date
- Audio enclosures (podcast support)
- Inline text content (if article section specified)
Parameters
- $sourceUri : string
  Feed URL
- $articleSection : string = ''
  Tag name for inline text extraction
Return values
array<int, array{title: string, link: string, desc: string, date: string, audio: string, text: string}>|null —Array of feed items or null on error
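The sketch below shows one way a caller might consume the documented item shape, here by listing only the items that carry an audio enclosure. The helper `listPodcastEpisodes()` and the sample data are hypothetical. How an RssParser instance is constructed is not shown on this page, so the call to `parse()` appears only as a comment.

```php
<?php
declare(strict_types=1);

/**
 * Format the items that have an audio enclosure as "date | title | url".
 *
 * @param array<int, array{title: string, link: string, desc: string, date: string, audio: string, text: string}> $items
 * @return list<string>
 */
function listPodcastEpisodes(array $items): array
{
    $lines = [];
    foreach ($items as $item) {
        if ($item['audio'] === '') {
            continue; // an empty string means no audio enclosure was found
        }
        $lines[] = sprintf('%s | %s | %s', $item['date'], $item['title'], $item['audio']);
    }
    return $lines;
}

// In real use: $items = $parser->parse($feedUrl); then check for null.
// Hand-written sample in the documented shape:
$items = [
    ['title' => 'Episode 1', 'link' => 'https://example.com/ep1', 'desc' => 'Pilot',
     'date' => '2024-01-02 08:00:00', 'audio' => 'https://example.com/ep1.mp3', 'text' => ''],
    ['title' => 'Show notes', 'link' => 'https://example.com/notes', 'desc' => 'Text only',
     'date' => '2024-01-03 08:00:00', 'audio' => '', 'text' => ''],
];

foreach (listPodcastEpisodes($items) as $line) {
    echo $line, "\n";
}
```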
cleanDescription()
Clean and normalize description text.
private
cleanDescription(string $desc) : string
Parameters
- $desc : string
  Raw description
Return values
string —Cleaned description
cleanDescriptionForDetection()
Clean description for detection mode.
private
cleanDescriptionForDetection(string $desc) : string
Parameters
- $desc : string
  Raw description
Return values
string —Cleaned description
cleanTitle()
Clean and normalize title text.
private
cleanTitle(string $title) : string
Parameters
- $title : string
  Raw title
Return values
string —Cleaned title
cleanTitleForDetection()
Clean title for detection mode.
private
cleanTitleForDetection(string $title) : string
Parameters
- $title : string
  Raw title
Return values
string —Cleaned title
convertToHtmlEntities()
Convert HTML to HTML entities.
private
convertToHtmlEntities(string $html) : string
Parameters
- $html : string
  HTML content
Return values
string —Converted content
countTextLengths()
Count text lengths for source detection.
private
countTextLengths(array{title: string, desc: string, link: string, encoded?: string, description?: string, content?: string} $item, string $descKey, string $encKey) : array{desc: array{long: int, short: int}, encoded: array{long: int, short: int}}
Parameters
- $item : array{title: string, desc: string, link: string, encoded?: string, description?: string, content?: string}
  Item data
- $descKey : string
  Description key
- $encKey : string
  Encoded key
Return values
array{desc: array{long: int, short: int}, encoded: array{long: int, short: int}} —Counts array
determineBestTextSource()
Determine best text source and update items.
private
determineBestTextSource(array<int|string, array<string, string>|string> $rssData, array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string} $feedTags, int $descCount, int $descNocount, int $encCount, int $encNocount) : array<int|string, array<string, string>|string>
Parameters
- $rssData : array<int|string, array<string, string>|string>
  Feed items
- $feedTags : array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string}
  Tag mapping
- $descCount : int
  Long description count
- $descNocount : int
  Short description count
- $encCount : int
  Long encoded count
- $encNocount : int
  Short encoded count
Return values
array<int|string, array<string, string>|string> —Updated feed data
extractAudioEnclosure()
Extract audio enclosure URL.
private
extractAudioEnclosure(DOMElement $node, array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string} $feedTags) : string
Parameters
- $node : DOMElement
  Item node
- $feedTags : array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string}
  Tag mapping
Return values
string —Audio URL or empty string
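As this method is private, its body is not documented here. One plausible approach is sketched below under stated assumptions: the hypothetical `sketchAudioEnclosure()` uses only the `enclosure` and `url` entries of the tag mapping, and it treats an enclosure as audio when its `type` attribute starts with `audio/`. That MIME-type check is an assumption, not something the source confirms.

```php
<?php
declare(strict_types=1);

/**
 * Return the URL of the first audio enclosure under $item, or ''.
 *
 * @param array{enclosure: string, url: string} $feedTags element and attribute names
 */
function sketchAudioEnclosure(DOMElement $item, array $feedTags): string
{
    foreach ($item->getElementsByTagName($feedTags['enclosure']) as $enclosure) {
        $type = $enclosure->getAttribute('type');
        // Assumed rule: only MIME types beginning with "audio/" count as audio.
        if (strncmp($type, 'audio/', 6) === 0) {
            return $enclosure->getAttribute($feedTags['url']);
        }
    }
    return '';
}

$doc = new DOMDocument();
$doc->loadXML(
    '<item>'
    . '<enclosure url="https://example.com/cover.jpg" type="image/jpeg"/>'
    . '<enclosure url="https://example.com/ep1.mp3" type="audio/mpeg"/>'
    . '</item>'
);

echo sketchAudioEnclosure($doc->documentElement, ['enclosure' => 'enclosure', 'url' => 'url']), "\n";
```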
extractInlineText()
Extract inline text from item node.
private
extractInlineText(DOMElement $node, string $articleSection) : string|null
Parameters
- $node : DOMElement
  Item node
- $articleSection : string
  Tag name for text extraction
Return values
string|null —Extracted text or null
extractLink()
Extract link from node based on feed type.
private
extractLink(DOMElement|null $linkNode, array<string|int, mixed> $feedTags) : string
Parameters
- $linkNode : DOMElement|null
  Link node
- $feedTags : array<string|int, mixed>
  Tag mapping
Return values
string —Link URL
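The link has to be read differently per feed type because RSS 2.0 puts the URL in the text of `<link>`, while Atom puts it in the `href` attribute. The sketch below illustrates that difference only. The hypothetical `sketchExtractLink()` takes a boolean `$isAtom` flag instead of the tag mapping the real method receives, and returning an empty string for a missing node is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Read a link URL from an RSS <link> (text content) or an Atom <link> (href).
 * Hypothetical helper; $isAtom stands in for the real method's tag mapping.
 */
function sketchExtractLink(?DOMElement $linkNode, bool $isAtom): string
{
    if ($linkNode === null) {
        return ''; // assumed behaviour when the item has no link element
    }
    return $isAtom
        ? trim($linkNode->getAttribute('href'))
        : trim($linkNode->textContent);
}

$rssDoc = new DOMDocument();
$rssDoc->loadXML('<item><link> https://example.com/rss-post </link></item>');
$atomDoc = new DOMDocument();
$atomDoc->loadXML('<entry><link href="https://example.com/atom-post"/></entry>');

$rssLink = sketchExtractLink($rssDoc->getElementsByTagName('link')->item(0), false);
$atomLink = sketchExtractLink($atomDoc->getElementsByTagName('link')->item(0), true);
echo $rssLink, "\n", $atomLink, "\n";
```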
formatParsedDate()
Format parsed date array to MySQL datetime.
private
formatParsedDate(array<string|int, mixed> $pubDate, int $fallback) : string
Parameters
- $pubDate : array<string|int, mixed>
  Parsed date array
- $fallback : int
  Fallback offset
Return values
string —MySQL datetime string
getFeedTagMapping()
Get tag mapping for RSS/Atom feed format.
private
getFeedTagMapping(DOMDocument $rss) : array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string}|null
Parameters
- $rss : DOMDocument
  Feed document
Return values
array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string}|null —Tag mapping or null if unknown format
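To make the mapping concrete, the sketch below returns an array with the documented keys for each format. The detection rule (root element `rss` versus `feed`) and every value in the two arrays are assumptions drawn from the RSS 2.0 and Atom specifications; the real method may detect the format differently and use other values. `sketchFeedTagMapping()` is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Map logical field names to element or attribute names for a feed format.
 *
 * @return array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string}|null
 */
function sketchFeedTagMapping(DOMDocument $doc): ?array
{
    $root = $doc->documentElement;
    if ($root === null) {
        return null;
    }
    switch ($root->localName) {
        case 'rss': // RSS 2.0 (assumed values)
            return [
                'item' => 'item', 'title' => 'title', 'description' => 'description',
                'link' => 'link', 'pubDate' => 'pubDate',
                'enclosure' => 'enclosure', 'url' => 'url',
            ];
        case 'feed': // Atom (assumed values)
            return [
                'item' => 'entry', 'title' => 'title', 'description' => 'summary',
                'link' => 'link', 'pubDate' => 'published',
                'enclosure' => 'link', 'url' => 'href',
            ];
        default:
            return null; // unknown format
    }
}

$atom = new DOMDocument();
$atom->loadXML('<feed xmlns="http://www.w3.org/2005/Atom"><title>Blog</title></feed>');
$mapping = sketchFeedTagMapping($atom);
echo $mapping['item'] ?? 'unknown format', "\n";
```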
parseFeedDate()
Parse feed date string to MySQL datetime format.
private
parseFeedDate(string|null $dateStr, int $fallback) : string
Parameters
- $dateStr : string|null
  Date string from feed
- $fallback : int
  Fallback offset for ordering
Return values
string —MySQL datetime string
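The sketch below shows one way such a conversion could work. It is a guess at the behaviour, not the method's actual code. It assumes that an unparseable or missing date falls back to the current time minus `$fallback` seconds, so that undated items still sort in feed order, and it formats in UTC for reproducibility. The hypothetical `sketchParseFeedDate()` also takes an extra `$now` parameter, which the real method does not have, to keep the example deterministic.

```php
<?php
declare(strict_types=1);

/**
 * Convert a feed date string to "Y-m-d H:i:s" (MySQL DATETIME layout).
 *
 * @param int $fallback seconds subtracted from $now when the date is unusable (assumed meaning)
 * @param int $now      reference Unix timestamp, injected for testability
 */
function sketchParseFeedDate(?string $dateStr, int $fallback, int $now): string
{
    $timestamp = false;
    if ($dateStr !== null && trim($dateStr) !== '') {
        // strtotime() accepts both RFC 822 (RSS) and RFC 3339 (Atom) dates.
        $timestamp = strtotime($dateStr);
    }
    if ($timestamp === false) {
        $timestamp = $now - $fallback;
    }
    return gmdate('Y-m-d H:i:s', $timestamp);
}

echo sketchParseFeedDate('Mon, 06 Sep 2021 12:00:00 +0000', 0, time()), "\n";
echo sketchParseFeedDate('2021-09-06T12:00:00Z', 0, time()), "\n";
```

Passing the item index as `$fallback`, as the `parseItem()` parameter description suggests, would give each undated item a slightly earlier timestamp than the one before it.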
parseItem()
Parse a single feed item.
private
parseItem(DOMElement $node, array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string} $feedTags, int $index, string $articleSection) : array{title: string, link: string, desc: string, date: string, audio: string, text: string}|null
Parameters
- $node : DOMElement
  Item node
- $feedTags : array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string}
  Tag mapping
- $index : int
  Item index (for date fallback)
- $articleSection : string
  Tag for inline text extraction
Return values
array{title: string, link: string, desc: string, date: string, audio: string, text: string}|null —Parsed item or null if invalid
parseItemForDetection()
Parse item for detection mode (includes raw text content).
private
parseItemForDetection(DOMElement $node, array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string} $feedTags) : array{title: string, desc: string, link: string, encoded?: string, description?: string, content?: string}
Parameters
- $node : DOMElement
  Item node
- $feedTags : array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string}
  Tag mapping
Return values
array{title: string, desc: string, link: string, encoded?: string, description?: string, content?: string} —Parsed item