Documentation

RssParser

Service for parsing RSS and Atom feeds.

Provides pure parsing functionality without database access. Supports both RSS 2.0 and Atom feed formats.

Tags
since
3.0.0

Table of Contents

Methods

detectAndParse()  : array<int|string, array<string, string>|string>|null
Detect and parse feed, determining best text source.
getFeedTitle()  : string|null
Get the feed title from a feed URL.
parse()  : array<int, array{title: string, link: string, desc: string, date: string, audio: string, text: string}>|null
Parse RSS/Atom feed and return article items with metadata.
cleanDescription()  : string
Clean and normalize description text.
cleanDescriptionForDetection()  : string
Clean description for detection mode.
cleanTitle()  : string
Clean and normalize title text.
cleanTitleForDetection()  : string
Clean title for detection mode.
convertToHtmlEntities()  : string
Convert HTML to HTML entities.
countTextLengths()  : array{desc: array{long: int, short: int}, encoded: array{long: int, short: int}}
Count text lengths for source detection.
determineBestTextSource()  : array<int|string, array<string, string>|string>
Determine best text source and update items.
extractAudioEnclosure()  : string
Extract audio enclosure URL.
extractInlineText()  : string|null
Extract inline text from item node.
extractLink()  : string
Extract link from node based on feed type.
formatParsedDate()  : string
Format parsed date array to MySQL datetime.
getFeedTagMapping()  : array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string}|null
Get tag mapping for RSS/Atom feed format.
parseFeedDate()  : string
Parse feed date string to MySQL datetime format.
parseItem()  : array{title: string, link: string, desc: string, date: string, audio: string, text: string}|null
Parse a single feed item.
parseItemForDetection()  : array{title: string, desc: string, link: string, encoded?: string, description?: string, content?: string}
Parse item for detection mode (includes raw text content).

Methods

detectAndParse()

Detect and parse feed, determining best text source.

public detectAndParse(string $sourceUri) : array<int|string, array<string, string>|string>|null

Analyzes the feed to determine which text source to use:

  • content (Atom)
  • description (RSS)
  • encoded (RSS with content:encoded)
  • webpage link (external fetch)
Parameters
$sourceUri : string

Feed URL

Return values
array<int|string, array<string, string>|string>|null

Feed data with feed_text indicator or null on error

getFeedTitle()

Get the feed title from a feed URL.

public getFeedTitle(string $sourceUri) : string|null
Parameters
$sourceUri : string

Feed URL

Return values
string|null

Feed title or null on error
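
As an illustration only, the lookup could be sketched as follows. The function name sketchFeedTitle() is hypothetical, and it takes an XML string rather than a URL so the example stays self-contained. It assumes the title lives in channel/title for RSS and feed/title for Atom.

```php
<?php
/**
 * Hypothetical stand-in for getFeedTitle(): reads the feed-level title
 * from an XML string. Returns null when the XML cannot be parsed or no
 * feed-level title is present.
 */
function sketchFeedTitle(string $xml): ?string
{
    $doc = new DOMDocument();
    // loadXML() rejects an empty string, so guard against it first.
    if ($xml === '' || !@$doc->loadXML($xml)) {
        return null;
    }

    $root = $doc->documentElement;
    // Atom keeps <title> directly under <feed>; RSS keeps it under <channel>.
    $parent = $root->localName === 'feed'
        ? $root
        : $root->getElementsByTagName('channel')->item(0);
    if (!$parent instanceof DOMElement) {
        return null;
    }

    // Only look at direct children so item/entry titles are not picked up.
    foreach ($parent->childNodes as $child) {
        if ($child instanceof DOMElement && $child->localName === 'title') {
            return trim($child->textContent);
        }
    }
    return null;
}
```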

parse()

Parse RSS/Atom feed and return article items with metadata.

public parse(string $sourceUri[, string $articleSection = '' ]) : array<int, array{title: string, link: string, desc: string, date: string, audio: string, text: string}>|null

Supports both RSS 2.0 and Atom feed formats. Extracts:

  • Title, description, link, publication date
  • Audio enclosures (podcast support)
  • Inline text content (if article section specified)
Parameters
$sourceUri : string

Feed URL

$articleSection : string = ''

Tag name for inline text extraction

Return values
array<int, array{title: string, link: string, desc: string, date: string, audio: string, text: string}>|null

Array of feed items or null on error
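
The return shape can be illustrated with a minimal sketch. It is not the service's implementation: sketchParseRss() is a hypothetical name, it reads an XML string instead of fetching a URL, it handles RSS 2.0 only, and it leaves the date unformatted and the audio and text fields empty.

```php
<?php
/**
 * Hypothetical, simplified stand-in for parse().
 *
 * @return array<int, array{title: string, link: string, desc: string,
 *                          date: string, audio: string, text: string}>|null
 */
function sketchParseRss(string $xml): ?array
{
    $doc = new DOMDocument();
    // A malformed or empty document yields null ("null on error").
    if ($xml === '' || !@$doc->loadXML($xml)) {
        return null;
    }

    $items = [];
    foreach ($doc->getElementsByTagName('item') as $node) {
        // Text of the first descendant with the given tag, or '' if absent.
        $first = static function (string $tag) use ($node): string {
            $el = $node->getElementsByTagName($tag)->item(0);
            return $el === null ? '' : trim($el->textContent);
        };

        $items[] = [
            'title' => $first('title'),
            'link'  => $first('link'),
            'desc'  => $first('description'),
            'date'  => $first('pubDate'), // raw feed value in this sketch
            'audio' => '',                // enclosure handling omitted here
            'text'  => '',                // inline text extraction omitted here
        ];
    }
    return $items;
}
```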

cleanDescription()

Clean and normalize description text.

private cleanDescription(string $desc) : string
Parameters
$desc : string

Raw description

Return values
string

Cleaned description
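
The exact cleaning rules are not documented on this page. A plausible sketch, assuming that cleaning means decoding entities, stripping markup, and collapsing whitespace, might look like this (sketchCleanDescription() is a hypothetical name):

```php
<?php
/**
 * Hypothetical normalization: the real cleanDescription() may differ.
 */
function sketchCleanDescription(string $desc): string
{
    // Decode first, so markup escaped inside the feed (&lt;p&gt;) is stripped too.
    $text = html_entity_decode($desc, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = strip_tags($text);
    // Collapse runs of whitespace; keep the input if the regex fails
    // (for example on invalid UTF-8).
    $text = preg_replace('/\s+/u', ' ', $text) ?? $text;
    return trim($text);
}
```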

cleanDescriptionForDetection()

Clean description for detection mode.

private cleanDescriptionForDetection(string $desc) : string
Parameters
$desc : string

Raw description

Return values
string

Cleaned description

cleanTitle()

Clean and normalize title text.

private cleanTitle(string $title) : string
Parameters
$title : string

Raw title

Return values
string

Cleaned title

cleanTitleForDetection()

Clean title for detection mode.

private cleanTitleForDetection(string $title) : string
Parameters
$title : string

Raw title

Return values
string

Cleaned title

convertToHtmlEntities()

Convert HTML to HTML entities.

private convertToHtmlEntities(string $html) : string
Parameters
$html : string

HTML content

Return values
string

Converted content

countTextLengths()

Count text lengths for source detection.

private countTextLengths(array{title: string, desc: string, link: string, encoded?: string, description?: string, content?: string} $item, string $descKey, string $encKey) : array{desc: array{long: int, short: int}, encoded: array{long: int, short: int}}
Parameters
$item : array{title: string, desc: string, link: string, encoded?: string, description?: string, content?: string}

Item data

$descKey : string

Description key

$encKey : string

Encoded key

Return values
array{desc: array{long: int, short: int}, encoded: array{long: int, short: int}}

Counts array
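
To make the return shape concrete, here is a hedged sketch of a per-item count. The threshold constant and the rule that a missing field counts as neither long nor short are assumptions, as is the function name sketchCountTextLengths().

```php
<?php
// Assumed cut-off between "short" and "long" text; the real value is not
// documented on this page.
const SKETCH_LONG_TEXT_THRESHOLD = 900;

/**
 * Hypothetical stand-in for countTextLengths().
 *
 * @param array<string, string> $item
 * @return array{desc: array{long: int, short: int}, encoded: array{long: int, short: int}}
 */
function sketchCountTextLengths(array $item, string $descKey, string $encKey): array
{
    $counts = [
        'desc'    => ['long' => 0, 'short' => 0],
        'encoded' => ['long' => 0, 'short' => 0],
    ];

    foreach (['desc' => $descKey, 'encoded' => $encKey] as $bucket => $key) {
        if (!isset($item[$key])) {
            continue; // field absent: neither long nor short
        }
        // Measure the text without markup, in bytes.
        $length = strlen(strip_tags($item[$key]));
        $size = $length > SKETCH_LONG_TEXT_THRESHOLD ? 'long' : 'short';
        $counts[$bucket][$size]++;
    }
    return $counts;
}
```

A caller would presumably sum these per-item counts over the whole feed before choosing a text source.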

determineBestTextSource()

Determine best text source and update items.

private determineBestTextSource(array<int|string, array<string, string>|string> $rssData, array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string} $feedTags, int $descCount, int $descNocount, int $encCount, int $encNocount) : array<int|string, array<string, string>|string>
Parameters
$rssData : array<int|string, array<string, string>|string>

Feed items

$feedTags : array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string}

Tag mapping

$descCount : int

Long description count

$descNocount : int

Short description count

$encCount : int

Long encoded count

$encNocount : int

Short encoded count

Return values
array<int|string, array<string, string>|string>

Updated feed data
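
The selection rule itself is not described here. One possible rule, shown purely as an assumption, is to accept a source when its long texts outnumber its short ones, prefer the encoded content when both qualify, and otherwise fall back to fetching the linked webpage. The function name and the returned indicator strings are hypothetical.

```php
<?php
/**
 * Hypothetical decision rule; the real determineBestTextSource() also
 * updates the feed items, which this sketch leaves out.
 *
 * @return string 'encoded', 'description', or '' (meaning: use the webpage link)
 */
function sketchChooseTextSource(
    int $descCount,
    int $descNocount,
    int $encCount,
    int $encNocount
): string {
    // A source qualifies when most of its items carry long text.
    $encodedQualifies = $encCount > $encNocount;
    $descQualifies = $descCount > $descNocount;

    if ($encodedQualifies) {
        return 'encoded';
    }
    if ($descQualifies) {
        return 'description';
    }
    return '';
}
```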

extractAudioEnclosure()

Extract audio enclosure URL.

private extractAudioEnclosure(DOMElement $node, array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string} $feedTags) : string
Parameters
$node : DOMElement

Item node

$feedTags : array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string}

Tag mapping

Return values
string

Audio URL or empty string
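
A minimal sketch of the idea, assuming the mapping's enclosure entry names the element to scan and its url entry names the attribute holding the address. The function name and the audio/ MIME-type check are assumptions, not confirmed behavior.

```php
<?php
/**
 * Hypothetical stand-in for extractAudioEnclosure().
 *
 * @param array{enclosure: string, url: string} $feedTags
 */
function sketchAudioEnclosure(DOMElement $node, array $feedTags): string
{
    foreach ($node->getElementsByTagName($feedTags['enclosure']) as $candidate) {
        if (!$candidate instanceof DOMElement) {
            continue;
        }
        // Keep only enclosures whose MIME type marks them as audio.
        if (str_starts_with($candidate->getAttribute('type'), 'audio/')) {
            return $candidate->getAttribute($feedTags['url']);
        }
    }
    return ''; // no audio enclosure found
}
```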

extractInlineText()

Extract inline text from item node.

private extractInlineText(DOMElement $node, string $articleSection) : string|null
Parameters
$node : DOMElement

Item node

$articleSection : string

Tag name for text extraction

Return values
string|null

Extracted text or null
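
For illustration, the extraction could be as simple as reading the first descendant whose tag matches $articleSection. This sketch uses the hypothetical name sketchExtractInlineText() and assumes that an empty tag name or empty text yields null.

```php
<?php
/**
 * Hypothetical stand-in for extractInlineText().
 */
function sketchExtractInlineText(DOMElement $node, string $articleSection): ?string
{
    if ($articleSection === '') {
        return null; // no section requested
    }
    $section = $node->getElementsByTagName($articleSection)->item(0);
    if ($section === null) {
        return null; // tag not present in this item
    }
    $text = trim($section->textContent);
    return $text === '' ? null : $text;
}
```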

extractLink()

Extract link from node based on feed type.

private extractLink(DOMElement|null $linkNode, array<string|int, mixed> $feedTags) : string
Parameters
$linkNode : DOMElement|null

Link node

$feedTags : array<string|int, mixed>

Tag mapping

Return values
string

Link URL
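
The two formats store links differently: RSS 2.0 puts the URL in the element text, while Atom puts it in an href attribute. The sketch below distinguishes them by checking for the attribute rather than by consulting the tag mapping, which is a simplification. The name sketchExtractLink() is hypothetical.

```php
<?php
/**
 * Hypothetical stand-in for extractLink(); ignores $feedTags and inspects
 * the node itself.
 */
function sketchExtractLink(?DOMElement $linkNode): string
{
    if ($linkNode === null) {
        return '';
    }
    // Atom: <link href="..."/>   RSS 2.0: <link>...</link>
    return $linkNode->hasAttribute('href')
        ? $linkNode->getAttribute('href')
        : trim($linkNode->textContent);
}
```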

formatParsedDate()

Format parsed date array to MySQL datetime.

private formatParsedDate(array<string|int, mixed> $pubDate, int $fallback) : string
Parameters
$pubDate : array<string|int, mixed>

Parsed date array

$fallback : int

Fallback offset for ordering

Return values
string

MySQL datetime string

getFeedTagMapping()

Get tag mapping for RSS/Atom feed format.

private getFeedTagMapping(DOMDocument $rss) : array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string}|null
Parameters
$rss : DOMDocument

Feed document

Return values
array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string}|null

Tag mapping or null if unknown format
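
A sketch of what such a mapping might contain, keyed as in the documented return type. The Atom values (entry, summary, published, link, href) are assumptions based on the Atom format, not taken from the service, and sketchFeedTagMapping() is a hypothetical name.

```php
<?php
/**
 * Hypothetical stand-in for getFeedTagMapping(): picks a mapping from the
 * document's root element.
 *
 * @return array{item: string, title: string, description: string, link: string,
 *               pubDate: string, enclosure: string, url: string}|null
 */
function sketchFeedTagMapping(DOMDocument $doc): ?array
{
    $root = $doc->documentElement;
    if ($root === null) {
        return null; // empty document
    }

    return match ($root->localName) {
        'rss' => [
            'item' => 'item', 'title' => 'title', 'description' => 'description',
            'link' => 'link', 'pubDate' => 'pubDate',
            'enclosure' => 'enclosure', 'url' => 'url',
        ],
        'feed' => [ // Atom (assumed values)
            'item' => 'entry', 'title' => 'title', 'description' => 'summary',
            'link' => 'link', 'pubDate' => 'published',
            'enclosure' => 'link', 'url' => 'href',
        ],
        default => null, // unknown format
    };
}
```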

parseFeedDate()

Parse feed date string to MySQL datetime format.

private parseFeedDate(string|null $dateStr, int $fallback) : string
Parameters
$dateStr : string|null

Date string from feed

$fallback : int

Fallback offset for ordering

Return values
string

MySQL datetime string
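
The conversion can be sketched with strtotime(). How the fallback offset is applied is an assumption here: the sketch subtracts it from the current time so that undated items still sort in feed order. The $now parameter and the function name are hypothetical additions that keep the example testable, and the result is formatted in UTC.

```php
<?php
/**
 * Hypothetical stand-in for parseFeedDate().
 *
 * @param int      $fallback Seconds subtracted from "now" when the date is unusable
 * @param int|null $now      Reference timestamp; defaults to the current time
 */
function sketchParseFeedDate(?string $dateStr, int $fallback, ?int $now = null): string
{
    $now ??= time();

    $timestamp = ($dateStr === null || trim($dateStr) === '')
        ? false
        : strtotime($dateStr);

    if ($timestamp === false) {
        // Missing or unparseable date: derive one from the item's position.
        $timestamp = $now - $fallback;
    }

    // MySQL DATETIME layout, rendered in UTC for a deterministic result.
    return gmdate('Y-m-d H:i:s', $timestamp);
}
```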

parseItem()

Parse a single feed item.

private parseItem(DOMElement $node, array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string} $feedTags, int $index, string $articleSection) : array{title: string, link: string, desc: string, date: string, audio: string, text: string}|null
Parameters
$node : DOMElement

Item node

$feedTags : array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string}

Tag mapping

$index : int

Item index (for date fallback)

$articleSection : string

Tag for inline text extraction

Return values
array{title: string, link: string, desc: string, date: string, audio: string, text: string}|null

Parsed item or null if invalid

parseItemForDetection()

Parse item for detection mode (includes raw text content).

private parseItemForDetection(DOMElement $node, array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string} $feedTags) : array{title: string, desc: string, link: string, encoded?: string, description?: string, content?: string}
Parameters
$node : DOMElement

Item node

$feedTags : array{item: string, title: string, description: string, link: string, pubDate: string, enclosure: string, url: string}

Tag mapping

Return values
array{title: string, desc: string, link: string, encoded?: string, description?: string, content?: string}

Parsed item


        