module Edits::JaroWinkler
Overview
Jaro-Winkler similarity measure.
When the Jaro similarity exceeds a threshold, the Winkler extension boosts the score in proportion to the length of the common prefix.
Sw = Sj + (l * p * (1 - Sj))
Where Sj is the Jaro similarity, l is the common prefix length, and p is the prefix weight.
see https://en.wikipedia.org/wiki/Jaro-Winkler_distance
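As a worked instance of the formula, the sketch below applies the boost to an already-computed Jaro similarity. `winkler_boost` is a hypothetical helper written for illustration and is not part of this module. It assumes the standard Winkler convention of capping the prefix length at 4, which this page does not state but which is consistent with the example values it gives for "information" and "informant".

```crystal
# Hypothetical helper for illustration; not part of Edits::JaroWinkler.
# Computes Sw = Sj + (l * p * (1 - Sj)) from an already-known Jaro score.
def winkler_boost(sj, l, p = 0.1)
  # Assumption: the prefix length is capped at 4, as in the standard definition.
  l = 4 if l > 4
  sj + (l * p * (1 - sj))
end

# "information" vs "informant": the Jaro similarity is 0.9023569..., and the
# common prefix "informa" (7 characters) is capped to l = 4.
sw = winkler_boost(0.9023569023569024, 7)
puts sw # approximately 0.9414
```

With the cap in place, a longer shared prefix such as "informa" contributes no more than a four-character one.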
Defined in:
edits/jaro_winkler.cr
Constant Summary
- WINKLER_PREFIX_WEIGHT = 0.1
  Prefix scaling factor for the Jaro-Winkler metric. Default is 0.1. Should not exceed 0.25, or the metric will leave the range 0..1.
- WINKLER_THRESHOLD = 0.7
  Threshold for boosting Jaro with the Winkler prefix multiplier. Default is 0.7.
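The 0.25 limit on the prefix weight follows from the formula. Assuming the prefix length l is capped at 4 (the standard Winkler convention, which this page does not state), l * p reaches 1 at p = 0.25, so any larger weight can push Sw above 1. A minimal sketch, using an arbitrarily chosen Jaro score of 0.9:

```crystal
# Illustrative values only; sj and max_prefix are assumptions, not library constants.
sj = 0.9         # an arbitrary Jaro similarity
max_prefix = 4   # assumed cap on the common prefix length

# Evaluate Sw = Sj + (l * p * (1 - Sj)) at the longest prefix for several weights.
weights = [0.1, 0.25, 0.3]
results = weights.map { |weight| sj + (max_prefix * weight * (1 - sj)) }

weights.each_with_index do |weight, i|
  status = results[i] > 1.0 ? "out of range" : "within 0..1"
  puts "weight #{weight}: Sw = #{results[i]} (#{status})"
end
```

Under these assumptions, only the weight of 0.3 yields a score above 1.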
Class Method Summary
- .distance(str1, str2, threshold = WINKLER_THRESHOLD, weight = WINKLER_PREFIX_WEIGHT)
  Calculate Jaro-Winkler distance, where 0 is an exact match and 1 is no similarity.
- .similarity(str1, str2, threshold = WINKLER_THRESHOLD, weight = WINKLER_PREFIX_WEIGHT)
  Calculate Jaro-Winkler similarity, where 1 is an exact match and 0 is no similarity.
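Class Method Detail
.distance(str1, str2, threshold = WINKLER_THRESHOLD, weight = WINKLER_PREFIX_WEIGHT)
Calculate Jaro-Winkler distance, where 0 is an exact match and 1 is no similarity.
JaroWinkler.distance("information", "informant")
# => 0.05858585858585863
.similarity(str1, str2, threshold = WINKLER_THRESHOLD, weight = WINKLER_PREFIX_WEIGHT)
Calculate Jaro-Winkler similarity, where 1 is an exact match and 0 is no similarity.
JaroWinkler.similarity("information", "informant")
# => 0.9414141414141414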
Note: Jaro-Winkler distance is not a true distance metric; it fails to satisfy the triangle inequality.
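To make the computation behind these values concrete, here is a self-contained sketch of the textbook Jaro-Winkler procedure. It is not the source of this module: `jaro_sketch` and `jaro_winkler_sketch` are hypothetical names, and the prefix cap of 4, the strict "greater than" threshold test, and the handling of empty strings are assumptions that may differ from the library's behaviour.

```crystal
# Illustrative sketch only; not the implementation of Edits::JaroWinkler.
def jaro_sketch(str1, str2)
  a = str1.chars
  b = str2.chars
  return 1.0 if a.size == 0 && b.size == 0 # assumption: two empty strings match
  return 0.0 if a.size == 0 || b.size == 0

  # Characters match only if they are equal and within this many positions.
  longest = a.size > b.size ? a.size : b.size
  window = (longest / 2).to_i - 1
  window = 0 if window < 0

  a_matched = Array.new(a.size, false)
  b_matched = Array.new(b.size, false)
  matches = 0
  a.each_with_index do |ch, i|
    lo = i - window
    lo = 0 if lo < 0
    hi = i + window
    hi = b.size - 1 if hi > b.size - 1
    j = lo
    while j <= hi
      if !b_matched[j] && b[j] == ch
        a_matched[i] = true
        b_matched[j] = true
        matches += 1
        break
      end
      j += 1
    end
  end
  return 0.0 if matches == 0

  # Count matched characters that appear in a different order in each string.
  out_of_order = 0
  k = 0
  a.each_with_index do |ch, i|
    next unless a_matched[i]
    while !b_matched[k]
      k += 1
    end
    out_of_order += 1 if ch != b[k]
    k += 1
  end

  m = matches.to_f
  t = out_of_order.to_f / 2 # transpositions
  (m / a.size + m / b.size + (m - t) / m) / 3
end

def jaro_winkler_sketch(str1, str2, threshold = 0.7, weight = 0.1)
  sj = jaro_sketch(str1, str2)
  return sj if sj <= threshold # no boost at or below the threshold (assumed)

  # Length of the common prefix, capped at 4 (assumed).
  a = str1.chars
  b = str2.chars
  l = 0
  while l < 4 && l < a.size && l < b.size && a[l] == b[l]
    l += 1
  end
  sj + (l * weight * (1 - sj))
end

similarity = jaro_winkler_sketch("information", "informant")
puts similarity       # approximately 0.9414
puts 1.0 - similarity # the corresponding distance, approximately 0.0586
```

For "information" and "informant" the sketch finds 9 matching characters and 1 transposition, giving a Jaro score of about 0.9024 before the prefix boost.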