module Edits::JaroWinkler

Overview

Jaro-Winkler similarity measure.

When the Jaro similarity exceeds a threshold, the Winkler extension boosts the score in proportion to the length of the common prefix.

Sw = Sj + (l * p * (1 - Sj))

Where Sj is the Jaro similarity, l is the length of the common prefix, and p is the prefix weight. See https://en.wikipedia.org/wiki/Jaro-Winkler_distance
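
As a worked illustration, the similarity of "information" and "informant" can be reproduced by hand. This sketch assumes the conventional Jaro definition (m matching characters, t transpositions) and the conventional cap of 4 on the prefix length l, neither of which is stated on this page. Under those assumptions the two strings have m = 9 matches and t = 1 transposition, and the common prefix "informa" is capped to l = 4.

```latex
\begin{aligned}
S_j &= \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right)
     = \frac{1}{3}\left(\frac{9}{11} + \frac{9}{9} + \frac{8}{9}\right)
     = \frac{268}{297} \approx 0.9024 \\
S_w &= S_j + l \, p \, (1 - S_j)
     = \frac{268}{297} + 4 \times 0.1 \times \frac{29}{297}
     \approx 0.9414
\end{aligned}
```

Since 0.9024 exceeds the default threshold of 0.7, the prefix boost applies. The result agrees with the similarity of 0.9414 listed for this pair in the method details, and the corresponding distance is 1 - 0.9414 ≈ 0.0586.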

Defined in:

edits/jaro_winkler.cr

Constant Summary

WINKLER_PREFIX_WEIGHT = 0.1

Prefix scaling factor for the Jaro-Winkler metric. Default is 0.1. Should not exceed 0.25, or the metric will leave the range 0..1.
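
The 0.25 limit can be derived from the formula for Sw once a maximum prefix length is fixed. The following sketch assumes the conventional cap of l ≤ 4, which is an assumption and is not stated on this page.

```latex
\begin{aligned}
S_w \le 1
  &\iff S_j + l \, p \, (1 - S_j) \le 1 \\
  &\iff l \, p \, (1 - S_j) \le 1 - S_j \\
  &\iff l \, p \le 1 \qquad (S_j < 1) \\
l \le 4 &\implies p \le \tfrac{1}{4} = 0.25
\end{aligned}
```

For instance, with p = 0.3, l = 4, and Sj = 0.9, the formula gives Sw = 0.9 + 1.2 * 0.1 = 1.02, which falls outside 0..1.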

WINKLER_THRESHOLD = 0.7

Threshold for boosting the Jaro similarity with the Winkler prefix multiplier. Default is 0.7.

Class Method Summary

Class Method Detail

def self.distance(str1, str2, threshold = WINKLER_THRESHOLD, weight = WINKLER_PREFIX_WEIGHT)

Calculate Jaro-Winkler distance, where 0 is an exact match and 1 is no similarity.

JaroWinkler.distance("information", "informant")
# => 0.05858585858585863

def self.similarity(str1, str2, threshold = WINKLER_THRESHOLD, weight = WINKLER_PREFIX_WEIGHT)

Calculate Jaro-Winkler similarity, where 1 is an exact match and 0 is no similarity.

JaroWinkler.similarity("information", "informant")
# => 0.9414141414141414

Note: this is not a true distance metric; it fails to satisfy the triangle inequality.

