module Edits::JaroWinkler
Overview
Jaro-Winkler similarity measure.
When the Jaro similarity exceeds a threshold, the Winkler extension adds extra weight in proportion to the length of any common prefix the two strings share.
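For reference, the Winkler extension starts from the standard Jaro similarity shown below. This is the textbook definition and is assumed, not confirmed from the shard's source, to be what `Edits::JaroWinkler` computes. Here m is the number of matching characters (equal characters no more than w positions apart) and t is half the number of matched characters that appear in a different order in the two strings.

```latex
S_j =
\begin{cases}
0 & \text{if } m = 0,\\[4pt]
\dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise,}
\end{cases}
\qquad
w = \left\lfloor \dfrac{\max(|s_1|, |s_2|)}{2} \right\rfloor - 1
```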
Defined in: edits/jaro_winkler.cr
Constant Summary
- WINKLER_PREFIX_WEIGHT = 0.1
  Prefix scaling factor for the Jaro-Winkler metric. The default is 0.1; it should not exceed 0.25, or the result can leave the range 0..1.
- WINKLER_THRESHOLD = 0.7
  Threshold the Jaro similarity must exceed before the Winkler prefix boost is applied. The default is 0.7.
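The 0.25 bound follows from the similarity formula, assuming the prefix length l is capped at four characters as in Winkler's original formulation. The shard's source is not shown here, but under the standard Jaro definition the documented `information`/`informant` result is consistent only with l = 4, even though those strings share seven leading characters. Because the boost scales the remaining gap 1 - Sj:

```latex
S_w = S_j + l\,p\,(1 - S_j) \le 1
\iff l\,p\,(1 - S_j) \le 1 - S_j
\iff l\,p \le 1 \qquad (S_j < 1)
```

With l at most 4, the inequality holds for every pair of inputs exactly when p ≤ 1/4 = 0.25.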
Class Method Summary
- .distance(str1, str2, threshold = WINKLER_THRESHOLD, weight = WINKLER_PREFIX_WEIGHT) : Float
  Calculate Jaro-Winkler distance, where 0 is an exact match and 1 is no similarity.
- .similarity(str1, str2, threshold = WINKLER_THRESHOLD, weight = WINKLER_PREFIX_WEIGHT) : Float
  Calculate Jaro-Winkler similarity, where 1 is an exact match and 0 is no similarity.
Class Method Detail
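.distance(str1, str2, threshold = WINKLER_THRESHOLD, weight = WINKLER_PREFIX_WEIGHT) : Float
Calculate Jaro-Winkler distance, where 0 is an exact match and 1 is no similarity.
Dw = 1 - Sw, where Sw is the Jaro-Winkler similarity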
Edits::JaroWinkler.distance("information", "informant")
# => 0.05858585858585863
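A worked check of this example, assuming the standard Jaro definition and a four-character prefix cap (both assumptions about the shard's internals). "information" and "informant" have m = 9 matching characters and t = 1 transposition (the matched `t` and `n` appear in swapped order), giving a Jaro similarity Sj of 268/297. The prefix length is capped at l = 4 and the prefix weight is p = 0.1. Since Dw = 1 - Sw rearranges to (1 - Sj)(1 - l p):

```latex
S_j = \frac{1}{3}\left(\frac{9}{11} + \frac{9}{9} + \frac{8}{9}\right) = \frac{268}{297} \approx 0.90236
\\
D_w = (1 - S_j)(1 - l\,p) = \frac{29}{297} \times 0.6 = \frac{17.4}{297} \approx 0.0585859
```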
.similarity(str1, str2, threshold = WINKLER_THRESHOLD, weight = WINKLER_PREFIX_WEIGHT) : Float
Calculate Jaro-Winkler similarity, where 1 is an exact match and 0 is no similarity.
Sw = Sj + (l * p * (1 - Sj))
where Sj is the Jaro similarity, l is the length of the common prefix, and p is the prefix weight. The prefix term is only added when Sj exceeds the threshold.
Edits::JaroWinkler.similarity("information", "informant")
# => 0.9414141414141414
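The formula above can be sketched in plain Crystal as follows. This is not the shard's source: the module name `JaroWinklerSketch`, the `jaro` helper, the `MAX_PREFIX = 4` cap on l, and the strict `>` comparison against the threshold are assumptions made for illustration, following the standard Jaro and Winkler definitions.

```crystal
# Hypothetical, self-contained sketch; not the Edits::JaroWinkler source.
module JaroWinklerSketch
  PREFIX_WEIGHT = 0.1 # mirrors WINKLER_PREFIX_WEIGHT
  THRESHOLD     = 0.7 # mirrors WINKLER_THRESHOLD
  MAX_PREFIX    = 4   # assumed cap on the prefix length l

  # Standard Jaro similarity.
  def self.jaro(str1 : String, str2 : String) : Float64
    return 1.0 if str1 == str2
    a = str1.chars
    b = str2.chars
    return 0.0 if a.empty? || b.empty?

    # Characters match if equal and at most `window` positions apart.
    window = Math.max(Math.max(a.size, b.size) // 2 - 1, 0)
    a_hit = Array(Bool).new(a.size, false)
    b_hit = Array(Bool).new(b.size, false)
    m = 0
    a.each_with_index do |ch, i|
      lo = Math.max(i - window, 0)
      hi = Math.min(i + window, b.size - 1)
      (lo..hi).each do |j|
        next if b_hit[j] || b[j] != ch
        a_hit[i] = true
        b_hit[j] = true
        m += 1
        break
      end
    end
    return 0.0 if m == 0

    # Walk both match lists in order; each mismatch is half a transposition.
    half = 0
    k = 0
    a.each_with_index do |ch, i|
      next unless a_hit[i]
      while !b_hit[k]
        k += 1
      end
      half += 1 if ch != b[k]
      k += 1
    end
    t = half // 2

    mf = m.to_f
    (mf / a.size + mf / b.size + (mf - t) / mf) / 3.0
  end

  # Sw = Sj + (l * p * (1 - Sj)), applied only above the threshold.
  def self.similarity(str1 : String, str2 : String,
                      threshold = THRESHOLD, weight = PREFIX_WEIGHT) : Float64
    sj = jaro(str1, str2)
    return sj unless sj > threshold # assumed strict comparison

    limit = Math.min(MAX_PREFIX, Math.min(str1.size, str2.size))
    l = 0
    while l < limit && str1[l] == str2[l]
      l += 1
    end
    sj + l.to_f * weight * (1.0 - sj)
  end

  # Dw = 1 - Sw
  def self.distance(str1 : String, str2 : String,
                    threshold = THRESHOLD, weight = PREFIX_WEIGHT) : Float64
    1.0 - similarity(str1, str2, threshold, weight)
  end
end

puts JaroWinklerSketch.similarity("information", "informant") # roughly 0.9414
puts JaroWinklerSketch.distance("information", "informant")   # roughly 0.0586
```

The match scan takes the leftmost unmatched candidate in the window, which is the usual greedy choice for Jaro; other tie-breaking rules can change m and t for strings with many repeated characters.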
NOTE: Jaro-Winkler distance is not a true distance metric, because it does not satisfy the triangle inequality.
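A small counterexample illustrates the point, assuming the standard Jaro definition and the default threshold of 0.7, below which no prefix boost is applied. Take x = "aaaa", y = "aabb", z = "bbbb": x and y share two matching characters, as do y and z, while x and z share none.

```latex
S_j(x, y) = S_j(y, z) = \tfrac{1}{3}\left(\tfrac{2}{4} + \tfrac{2}{4} + \tfrac{2}{2}\right) = \tfrac{2}{3} < 0.7,
\qquad S_j(x, z) = 0
\\
D_w(x, z) = 1 > \tfrac{1}{3} + \tfrac{1}{3} = D_w(x, y) + D_w(y, z)
```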