
SimilarityCalculator

in package Lwt\Modules\Vocabulary\Application\Services

Service class for calculating term similarity.

Contains algorithms for phonetic normalization and similarity ranking using the Sørensen–Dice coefficient.

Constants

STATUS_WEIGHT_IGNORED = 0.5: Weight multiplier for ignored words (status 98).
STATUS_WEIGHT_IN_PROGRESS = 1.15: Weight multiplier for words in progress (status 2-4).
STATUS_WEIGHT_LEARNED = 1.3: Weight multiplier for learned words (status 5).
STATUS_WEIGHT_NEW = 1.0: Weight multiplier for new words (status 1).
STATUS_WEIGHT_WELL_KNOWN = 1.25: Weight multiplier for well-known words (status 99).

Properties

$phoneticMap : array<string, string>: Phonetic character mapping for normalization.

Methods

getCombinedSimilarityRanking() : float: Combined similarity ranking using character pairs and phonetic matching.
getSimilarityRanking() : float: Similarity ranking of two UTF-8 strings using the Sørensen–Dice coefficient.
getStatusWeight() : float: Get weight multiplier based on word status.
letterPairs() : array<string|int, string>: Get letter pairs from string.
phoneticNormalize() : string: Normalize a string for phonetic comparison.
wordLetterPairs() : array<string|int, string>: Get word letter pairs from string.

STATUS_WEIGHT_IGNORED

Weight multiplier for ignored words (status 98).

    public mixed STATUS_WEIGHT_IGNORED = 0.5

STATUS_WEIGHT_IN_PROGRESS

Weight multiplier for words in progress (status 2-4).

    public mixed STATUS_WEIGHT_IN_PROGRESS = 1.15

STATUS_WEIGHT_LEARNED

Weight multiplier for learned words (status 5).

    public mixed STATUS_WEIGHT_LEARNED = 1.3

STATUS_WEIGHT_NEW

Weight multiplier for new words (status 1).

    public mixed STATUS_WEIGHT_NEW = 1.0

STATUS_WEIGHT_WELL_KNOWN

Weight multiplier for well-known words (status 99).

    public mixed STATUS_WEIGHT_WELL_KNOWN = 1.25

$phoneticMap

Phonetic character mapping for normalization.

    private static array<string, string> $phoneticMap = [
    // Vowel groups
    'a' => 'a',
    'à' => 'a',
    'á' => 'a',
    'â' => 'a',
    'ã' => 'a',
    'ä' => 'a',
    'å' => 'a',
    'ā' => 'a',
    'ă' => 'a',
    'ą' => 'a',
    'æ' => 'ae',
    'e' => 'e',
    'è' => 'e',
    'é' => 'e',
    'ê' => 'e',
    'ë' => 'e',
    'ē' => 'e',
    'ĕ' => 'e',
    'ė' => 'e',
    'ę' => 'e',
    'ě' => 'e',
    'i' => 'i',
    'ì' => 'i',
    'í' => 'i',
    'î' => 'i',
    'ï' => 'i',
    'ĩ' => 'i',
    'ī' => 'i',
    'ĭ' => 'i',
    'į' => 'i',
    'ı' => 'i',
    'y' => 'i',
    'o' => 'o',
    'ò' => 'o',
    'ó' => 'o',
    'ô' => 'o',
    'õ' => 'o',
    'ö' => 'o',
    'ō' => 'o',
    'ŏ' => 'o',
    'ő' => 'o',
    'ø' => 'o',
    'œ' => 'oe',
    'u' => 'u',
    'ù' => 'u',
    'ú' => 'u',
    'û' => 'u',
    'ü' => 'u',
    'ũ' => 'u',
    'ū' => 'u',
    'ŭ' => 'u',
    'ů' => 'u',
    'ű' => 'u',
    'ų' => 'u',
    // Consonant groups - similar sounds
    'b' => 'b',
    'p' => 'p',
    'c' => 'k',
    'k' => 'k',
    'q' => 'k',
    'ç' => 's',
    'ć' => 'c',
    'č' => 'c',
    'd' => 'd',
    't' => 't',
    'ð' => 'd',
    'þ' => 't',
    'f' => 'f',
    'v' => 'v',
    'ph' => 'f',
    'g' => 'g',
    'ğ' => 'g',
    'ģ' => 'g',
    'j' => 'j',
    'h' => 'h',
    'l' => 'l',
    'ł' => 'l',
    'ľ' => 'l',
    'ĺ' => 'l',
    'ļ' => 'l',
    'm' => 'm',
    'n' => 'n',
    'ñ' => 'n',
    'ń' => 'n',
    'ň' => 'n',
    'ņ' => 'n',
    'r' => 'r',
    'ŕ' => 'r',
    'ř' => 'r',
    'ŗ' => 'r',
    's' => 's',
    'z' => 's',
    'ś' => 's',
    'š' => 's',
    'ş' => 's',
    'ź' => 's',
    'ż' => 's',
    'ž' => 's',
    'ß' => 'ss',
    'w' => 'w',
    'x' => 'ks',
]

Maps accented letters and similar-sounding characters or digraphs (for example "ph" to "f", "c" and "q" to "k") to a common representation.

getCombinedSimilarityRanking()

Combined similarity ranking using character pairs and phonetic matching.

    public getCombinedSimilarityRanking(string $str1, string $str2[, float $phoneticWeight = 0.3 ]) : float

Parameters

$str1 : string: First string (lowercase)
$str2 : string: Second string (lowercase)
$phoneticWeight : float = 0.3: Weight for phonetic similarity (0-1)

Return values

float: Combined similarity ranking (0-1)
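The page does not state how the plain and phonetic scores are combined. As a hedged sketch, assuming a linear blend in which $phoneticWeight is the share given to the phonetic score, the method could look like the PHP below. The helpers pairsOf(), dice() and phonetic() are hypothetical, and phonetic() uses only a few entries of $phoneticMap.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch. Assumes the combined score is the linear blend
// (1 - w) * plainDice + w * phoneticDice; pairsOf(), dice() and phonetic()
// are illustrative helper names, not the class's API.

/** Adjacent UTF-8 character pairs of every whitespace-separated word. */
function pairsOf(string $str): array
{
    $pairs = [];
    $words = preg_split('/\s+/u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    foreach ($words as $word) {
        $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY) ?: [];
        for ($i = 0; $i < count($chars) - 1; $i++) {
            $pairs[] = $chars[$i] . $chars[$i + 1];
        }
    }
    return $pairs;
}

/** Sørensen–Dice coefficient over the two letter-pair multisets. */
function dice(string $a, string $b): float
{
    $p1 = pairsOf($a);
    $p2 = pairsOf($b);
    $total = count($p1) + count($p2);
    if ($total === 0) {
        return 0.0; // assumption: no pairs means no measurable similarity
    }
    $hits = 0;
    foreach ($p1 as $pair) {
        $idx = array_search($pair, $p2, true);
        if ($idx !== false) {
            $hits++;
            unset($p2[$idx]); // each pair in $b can be matched only once
        }
    }
    return 2.0 * $hits / $total;
}

/** Abbreviated stand-in for phoneticNormalize(); the real map is larger. */
function phonetic(string $str): string
{
    return strtr($str, ['ph' => 'f', 'c' => 'k', 'q' => 'k', 'y' => 'i', 'z' => 's']);
}

function getCombinedSimilarityRanking(string $str1, string $str2, float $phoneticWeight = 0.3): float
{
    // Clamp to the documented 0-1 range (an assumption about input handling).
    $w = max(0.0, min(1.0, $phoneticWeight));
    return (1.0 - $w) * dice($str1, $str2)
        + $w * dice(phonetic($str1), phonetic($str2));
}
```

With the default weight, "philosophy" against "filosofi" scores 0.7 × 0.5 + 0.3 × 1.0 = 0.65 in this sketch: the raw pair overlap is only 0.5, but the phonetic forms match exactly, which is the case the phonetic term is meant to reward.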

getSimilarityRanking()

Similarity ranking of two UTF-8 strings using the Sørensen–Dice coefficient.

    public getSimilarityRanking(string $str1, string $str2) : float

Source: http://www.catalysoft.com/articles/StrikeAMatch.html
Source: http://stackoverflow.com/questions/653157
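Assuming the method follows the StrikeAMatch article cited here, the score is the Sørensen–Dice coefficient over multisets of adjacent letter pairs, where P(s) denotes the pairs produced by wordLetterPairs() for a string s:

```latex
\operatorname{sim}(s_1, s_2) = \frac{2\,\lvert P(s_1) \cap P(s_2) \rvert}{\lvert P(s_1) \rvert + \lvert P(s_2) \rvert}
```

For example, "france" gives {fr, ra, an, nc, ce} and "french" gives {fr, re, en, nc, ch}. They share fr and nc, so the score is 2 · 2 / (5 + 5) = 0.4.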

Parameters

$str1 : string: First string
$str2 : string: Second string

Return values

float: Similarity ranking (0-1)
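A minimal PHP sketch of the ranking, assuming it follows the cited StrikeAMatch approach: collect the letter pairs of both strings, count pairs present in both (removing each match so duplicates are not counted twice), and divide twice that count by the total number of pairs. These are free functions for illustration, not the class's methods, and returning 0.0 when neither string yields a pair is an assumption.

```php
<?php
declare(strict_types=1);

// Hypothetical StrikeAMatch-style sketch; not the class's actual code.

/** Adjacent character pairs of one word, split on UTF-8 code points. */
function letterPairs(string $str): array
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $pairs = [];
    for ($i = 0; $i < count($chars) - 1; $i++) {
        $pairs[] = $chars[$i] . $chars[$i + 1];
    }
    return $pairs;
}

/** Letter pairs of every whitespace-separated word, concatenated. */
function wordLetterPairs(string $str): array
{
    $words = preg_split('/\s+/u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $all = [];
    foreach ($words as $word) {
        array_push($all, ...letterPairs($word));
    }
    return $all;
}

function getSimilarityRanking(string $str1, string $str2): float
{
    $pairs1 = wordLetterPairs($str1);
    $pairs2 = wordLetterPairs($str2);
    $total = count($pairs1) + count($pairs2);
    if ($total === 0) {
        return 0.0; // assumption: two pair-less strings are not "similar"
    }
    $matches = 0;
    foreach ($pairs1 as $pair) {
        $idx = array_search($pair, $pairs2, true);
        if ($idx !== false) {
            $matches++;
            // Remove the matched pair so duplicates are counted only once.
            unset($pairs2[$idx]);
        }
    }
    return 2.0 * $matches / $total;
}
```

Removing matched pairs is what keeps the result inside 0-1 even when a pair such as "fi" occurs several times in one string but only once in the other.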

getStatusWeight()

Get weight multiplier based on word status.

    public getStatusWeight(int $status) : float

Parameters

$status : int: Word status (1-5, 98=ignored, 99=well-known)

Return values

float: Weight multiplier
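The documented constants suggest a straightforward lookup from status to multiplier. A sketch using a PHP 8 match expression, where the class name StatusWeightSketch and the 1.0 fallback for undocumented statuses are assumptions:

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of the status-to-weight lookup; the class name and
// the fallback for unknown statuses are assumptions.
final class StatusWeightSketch
{
    public const STATUS_WEIGHT_NEW = 1.0;          // status 1
    public const STATUS_WEIGHT_IN_PROGRESS = 1.15; // statuses 2-4
    public const STATUS_WEIGHT_LEARNED = 1.3;      // status 5
    public const STATUS_WEIGHT_IGNORED = 0.5;      // status 98
    public const STATUS_WEIGHT_WELL_KNOWN = 1.25;  // status 99

    public static function getStatusWeight(int $status): float
    {
        return match ($status) {
            1 => self::STATUS_WEIGHT_NEW,
            2, 3, 4 => self::STATUS_WEIGHT_IN_PROGRESS,
            5 => self::STATUS_WEIGHT_LEARNED,
            98 => self::STATUS_WEIGHT_IGNORED,
            99 => self::STATUS_WEIGHT_WELL_KNOWN,
            // Assumption: any other status gets the neutral weight.
            default => 1.0,
        };
    }
}
```

Unlike a switch, match compares strictly and throws UnhandledMatchError when no arm applies, so the explicit default arm is what makes unknown statuses safe here.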

letterPairs()

Get letter pairs from string.

    public letterPairs(string $str) : array<string|int, string>

Parameters

$str : string: Input string

Return values

array<string|int, string>
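Assuming letterPairs() works like the StrikeAMatch routine cited for getSimilarityRanking(), it returns every adjacent pair of characters. This hypothetical free-function sketch splits on UTF-8 code points so accented letters are not cut in half:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of adjacent letter-pair extraction.
 * letterPairs('café') returns ['ca', 'af', 'fé'].
 */
function letterPairs(string $str): array
{
    // '//u' splits between UTF-8 code points; invalid UTF-8 yields [].
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $pairs = [];
    for ($i = 0; $i < count($chars) - 1; $i++) {
        $pairs[] = $chars[$i] . $chars[$i + 1];
    }
    return $pairs;
}
```

A byte-based loop over strlen() would instead split "é" into two invalid bytes, which is why the sketch splits with the /u modifier.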

phoneticNormalize()

Normalize a string for phonetic comparison.

    public phoneticNormalize(string $str) : string

Applies phonetic transformations to make similar-sounding words more likely to match.

Parameters

$str : string: Input string (should be lowercase)

Return values

string: Phonetically normalized string
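One plausible way to apply a map like $phoneticMap is PHP's strtr() with an array argument. The sketch below assumes that approach, copies only a handful of the map's entries, and leaves unmapped characters unchanged; the real method may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch; the class name and the abbreviated map are illustrative.
final class PhoneticSketch
{
    /** @var array<string, string> Small subset of the documented map. */
    private static array $phoneticMap = [
        'é' => 'e', 'ü' => 'u', 'æ' => 'ae', 'y' => 'i',
        'c' => 'k', 'q' => 'k', 'ph' => 'f', 'z' => 's',
        'ß' => 'ss', 'x' => 'ks',
    ];

    public static function phoneticNormalize(string $str): string
    {
        // strtr() with an array tries the longest keys first, so "ph" wins
        // over a lone "p", and replaced text is never rescanned.
        return strtr($str, self::$phoneticMap);
    }
}
```

Because strtr() matches byte sequences, multibyte keys such as "ß" work without the mbstring extension, and the longest-key-first rule handles the "ph" digraph without a separate pass.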

wordLetterPairs()

Get word letter pairs from string.

    public wordLetterPairs(string $str) : array<string|int, string>

Parameters

$str : string: Input string

Return values

array<string|int, string>
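A hypothetical sketch, again assuming the StrikeAMatch approach: split the input on whitespace and concatenate the letter pairs of each word, so that no pair spans two words.

```php
<?php
declare(strict_types=1);

// Hypothetical free-function sketches; not the class's actual code.

/** Adjacent character pairs of one word, split on UTF-8 code points. */
function letterPairs(string $str): array
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $pairs = [];
    for ($i = 0; $i < count($chars) - 1; $i++) {
        $pairs[] = $chars[$i] . $chars[$i + 1];
    }
    return $pairs;
}

/** Letter pairs of every whitespace-separated word, concatenated. */
function wordLetterPairs(string $str): array
{
    $words = preg_split('/\s+/u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $all = [];
    foreach ($words as $word) {
        // Pairs never straddle a space: "new york" yields no "w " or " y".
        array_push($all, ...letterPairs($word));
    }
    return $all;
}
```

Usage: wordLetterPairs('new york') returns ['ne', 'ew', 'yo', 'or', 'rk'].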
