Documentation

SimilarityCalculator

Service class for calculating term similarity.

Contains algorithms for phonetic normalization and similarity ranking using the Sørensen–Dice coefficient.

Tags
since
3.0.0

Table of Contents

Constants

STATUS_WEIGHT_IGNORED  = 0.5
Weight multiplier for ignored words (status 98).
STATUS_WEIGHT_IN_PROGRESS  = 1.15
Weight multiplier for words in progress (status 2-4).
STATUS_WEIGHT_LEARNED  = 1.3
Weight multiplier for learned words (status 5).
STATUS_WEIGHT_NEW  = 1.0
Weight multiplier for new words (status 1).
STATUS_WEIGHT_WELL_KNOWN  = 1.25
Weight multiplier for well-known words (status 99).

Properties

$phoneticMap  : array<string, string>
Phonetic character mapping for normalization.

Methods

getCombinedSimilarityRanking()  : float
Combined similarity ranking using character pairs and phonetic matching.
getSimilarityRanking()  : float
Similarity ranking of two UTF-8 strings using Sørensen–Dice coefficient.
getStatusWeight()  : float
Get weight multiplier based on word status.
letterPairs()  : array<string|int, string>
Get letter pairs from string.
phoneticNormalize()  : string
Normalize a string for phonetic comparison.
wordLetterPairs()  : array<string|int, string>
Get word letter pairs from string.

Constants

STATUS_WEIGHT_IGNORED

Weight multiplier for ignored words (status 98).

public mixed STATUS_WEIGHT_IGNORED = 0.5

STATUS_WEIGHT_IN_PROGRESS

Weight multiplier for words in progress (status 2-4).

public mixed STATUS_WEIGHT_IN_PROGRESS = 1.15

STATUS_WEIGHT_LEARNED

Weight multiplier for learned words (status 5).

public mixed STATUS_WEIGHT_LEARNED = 1.3

STATUS_WEIGHT_NEW

Weight multiplier for new words (status 1).

public mixed STATUS_WEIGHT_NEW = 1.0

STATUS_WEIGHT_WELL_KNOWN

Weight multiplier for well-known words (status 99).

public mixed STATUS_WEIGHT_WELL_KNOWN = 1.25

Properties

$phoneticMap

Phonetic character mapping for normalization.

private static array<string, string> $phoneticMap = [
    // Vowel groups
    'a' => 'a', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'å' => 'a', 'ā' => 'a', 'ă' => 'a', 'ą' => 'a', 'æ' => 'ae',
    'e' => 'e', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ē' => 'e', 'ĕ' => 'e', 'ė' => 'e', 'ę' => 'e', 'ě' => 'e',
    'i' => 'i', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ĩ' => 'i', 'ī' => 'i', 'ĭ' => 'i', 'į' => 'i', 'ı' => 'i', 'y' => 'i',
    'o' => 'o', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ō' => 'o', 'ŏ' => 'o', 'ő' => 'o', 'ø' => 'o', 'œ' => 'oe',
    'u' => 'u', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u', 'ũ' => 'u', 'ū' => 'u', 'ŭ' => 'u', 'ů' => 'u', 'ű' => 'u', 'ų' => 'u',
    // Consonant groups - similar sounds
    'b' => 'b', 'p' => 'p',
    'c' => 'k', 'k' => 'k', 'q' => 'k', 'ç' => 's', 'ć' => 'c', 'č' => 'c',
    'd' => 'd', 't' => 't', 'ð' => 'd', 'þ' => 't',
    'f' => 'f', 'v' => 'v', 'ph' => 'f',
    'g' => 'g', 'ğ' => 'g', 'ģ' => 'g', 'j' => 'j',
    'h' => 'h',
    'l' => 'l', 'ł' => 'l', 'ľ' => 'l', 'ĺ' => 'l', 'ļ' => 'l',
    'm' => 'm',
    'n' => 'n', 'ñ' => 'n', 'ń' => 'n', 'ň' => 'n', 'ņ' => 'n',
    'r' => 'r', 'ŕ' => 'r', 'ř' => 'r', 'ŗ' => 'r',
    's' => 's', 'z' => 's', 'ś' => 's', 'š' => 's', 'ş' => 's', 'ź' => 's', 'ż' => 's', 'ž' => 's', 'ß' => 'ss',
    'w' => 'w', 'x' => 'ks',
]

Maps similar-sounding characters to a common representation.

Methods

getCombinedSimilarityRanking()

Combined similarity ranking using character pairs and phonetic matching.

public getCombinedSimilarityRanking(string $str1, string $str2[, float $phoneticWeight = 0.3 ]) : float
Parameters
$str1 : string

First string (lowercase)

$str2 : string

Second string (lowercase)

$phoneticWeight : float = 0.3

Weight for phonetic similarity (0-1)

Return values
float

Combined similarity ranking (0-1)
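
This page does not state how the character-pair score and the phonetic score are combined. One plausible reading of the $phoneticWeight parameter is a linear blend, sketched below. The function name sketchBlend() and the blending formula are assumptions for illustration, not the documented implementation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: blend two similarity scores (each in 0-1).
 * Assumption: the result is a weighted average controlled by $phoneticWeight.
 */
function sketchBlend(float $charScore, float $phoneticScore, float $phoneticWeight = 0.3): float
{
    // Keep the weight inside the documented 0-1 range.
    $w = max(0.0, min(1.0, $phoneticWeight));

    return (1.0 - $w) * $charScore + $w * $phoneticScore;
}

// With the default weight: 0.7 * 0.5 + 0.3 * 1.0 = 0.65
echo sketchBlend(0.5, 1.0), PHP_EOL;
```

Under this reading, a weight of 0 ignores phonetic matching entirely and a weight of 1 relies on it alone.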

getSimilarityRanking()

Similarity ranking of two UTF-8 strings using Sørensen–Dice coefficient.

public getSimilarityRanking(string $str1, string $str2) : float

Source http://www.catalysoft.com/articles/StrikeAMatch.html
Source http://stackoverflow.com/questions/653157

Parameters
$str1 : string

First string

$str2 : string

Second string

Return values
float

Similarity ranking (0-1)
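
As an illustration of the Sørensen–Dice coefficient over letter pairs, here is a minimal standalone sketch. It is not the class's actual code: the helper names sketchPairs() and sketchDice() are hypothetical, the mbstring extension is assumed to be available, and returning 0.0 for inputs too short to form a pair is an assumption.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: adjacent two-character pairs of a UTF-8 string. */
function sketchPairs(string $str): array
{
    $pairs = [];
    $length = mb_strlen($str, 'UTF-8');
    for ($i = 0; $i < $length - 1; $i++) {
        $pairs[] = mb_substr($str, $i, 2, 'UTF-8');
    }
    return $pairs;
}

/** Hypothetical helper: 2 * |shared pairs| / (|pairs1| + |pairs2|). */
function sketchDice(string $str1, string $str2): float
{
    $pairs1 = sketchPairs($str1);
    $pairs2 = sketchPairs($str2);
    $total = count($pairs1) + count($pairs2);
    if ($total === 0) {
        return 0.0; // assumption: no pairs on either side scores 0
    }

    $shared = 0;
    foreach ($pairs1 as $pair) {
        $index = array_search($pair, $pairs2, true);
        if ($index !== false) {
            $shared++;
            unset($pairs2[$index]); // each pair may be matched only once
        }
    }
    return 2.0 * $shared / $total;
}

// "night" and "nacht" share only the pair "ht": 2 * 1 / (4 + 4) = 0.25
echo sketchDice('night', 'nacht'), PHP_EOL;
```

Removing a pair once it has been matched keeps repeated pairs from being counted more than once, so the result stays within 0-1.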

getStatusWeight()

Get weight multiplier based on word status.

public getStatusWeight(int $status) : float
Parameters
$status : int

Word status (1-5, 98=ignored, 99=well-known)

Return values
float

Weight multiplier
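
The mapping from status to weight follows from the STATUS_WEIGHT_* constants documented above. A minimal sketch, assuming PHP 8.0 or later for the match expression; the function name sketchStatusWeight() is hypothetical, and the fallback of 1.0 for unlisted statuses is an assumption.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper mirroring the documented STATUS_WEIGHT_* values. */
function sketchStatusWeight(int $status): float
{
    return match (true) {
        $status === 1                => 1.0,  // new
        $status >= 2 && $status <= 4 => 1.15, // in progress
        $status === 5                => 1.3,  // learned
        $status === 98               => 0.5,  // ignored
        $status === 99               => 1.25, // well-known
        default                      => 1.0,  // assumption: unknown statuses are neutral
    };
}

echo sketchStatusWeight(3), PHP_EOL;  // 1.15
echo sketchStatusWeight(98), PHP_EOL; // 0.5
```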

letterPairs()

Get letter pairs from string.

public letterPairs(string $str) : array<string|int, string>
Parameters
$str : string

Input string

Return values
array<string|int, string>

phoneticNormalize()

Normalize a string for phonetic comparison.

public phoneticNormalize(string $str) : string

Applies phonetic transformations to make similar-sounding words more likely to match.

Parameters
$str : string

Input string (should be lowercase)

Return values
string

Phonetically normalized string
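
To show what a map-based normalization can look like, the sketch below applies a small subset of the $phoneticMap entries with strtr(). Whether the real method uses strtr(), and whether it performs further steps beyond the character mapping, is not stated on this page; the function name sketchPhoneticNormalize() is hypothetical.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: apply a subset of the documented phonetic map. */
function sketchPhoneticNormalize(string $str): string
{
    // Entries copied from $phoneticMap; the full map is much larger.
    $map = [
        'é'  => 'e',
        'ö'  => 'o',
        'y'  => 'i',
        'c'  => 'k',
        'q'  => 'k',
        'ph' => 'f',
        'z'  => 's',
        'ß'  => 'ss',
        'x'  => 'ks',
    ];

    // With an array argument, strtr() tries the longest keys first and never
    // re-translates text it has already replaced, so 'ph' becomes 'f' in one step.
    return strtr($str, $map);
}

echo sketchPhoneticNormalize('straße'), PHP_EOL; // strasse
echo sketchPhoneticNormalize('phöniz'), PHP_EOL; // fonis
```

The input is expected to be lowercase already, as the parameter description notes, because the map contains only lowercase keys.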

wordLetterPairs()

Get word letter pairs from string.

public wordLetterPairs(string $str) : array<string|int, string>
Parameters
$str : string

Input string

Return values
array<string|int, string>
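
In the Strike a Match article cited under getSimilarityRanking(), the word-level variant splits the input on whitespace and collects the letter pairs of each word, so that no pair spans a word boundary. The sketch below follows that description. That this method behaves the same way is an assumption, and the function name sketchWordLetterPairs() is hypothetical. It needs the mbstring extension and PHP 7.4 or later for mb_str_split().

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: letter pairs per whitespace-separated word. */
function sketchWordLetterPairs(string $str): array
{
    $pairs = [];
    // Split on runs of whitespace; the /u flag treats the input as UTF-8.
    $words = preg_split('/\s+/u', $str, -1, PREG_SPLIT_NO_EMPTY);

    foreach ($words ?: [] as $word) {
        $chars = mb_str_split($word, 1, 'UTF-8');
        // A word with a single character yields no pairs.
        for ($i = 0; $i < count($chars) - 1; $i++) {
            $pairs[] = $chars[$i] . $chars[$i + 1];
        }
    }
    return $pairs;
}

print_r(sketchWordLetterPairs('red car')); // re, ed, ca, ar
```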
