module StringMetrics
Overview
A module containing a collection of well-known string metric algorithms.
Defined in:
string-metrics.cr
string-metrics/version.cr
Constant Summary
-
VERSION = "0.1.0"
Class Method Summary
-
.damerau_levenshtein(s1 : String, s2 : String) : Int
A variation of the Levenshtein distance that counts a transposition of two adjacent characters as a single edit.
-
.hamming(s1 : String, s2 : String) : Int
Returns the number of positions at which two strings of equal length have different characters.
-
.jaro(s1 : String, s2 : String)
A measure of similarity between two strings based on matching characters.
-
.jaro_winkler(s1 : String, s2 : String, scaling_factor = 0.1)
Similar to regular Jaro, but gives a higher score to strings that match from the beginning.
-
.levenshtein(s1 : String, s2 : String) : Int
Returns the minimum edit distance between two strings.
Class Method Detail
.damerau_levenshtein(s1 : String, s2 : String) : Int
A variation of the Levenshtein distance that counts a transposition of two adjacent characters as a single edit:
StringMetrics.damerau_levenshtein("char", "hcar") == 1
as opposed to a distance of 2 from plain Levenshtein, which needs two substitutions.
Ported from here
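The following minimal Crystal sketch shows how a transposition is counted. It implements the restricted variant of this distance, often called optimal string alignment. It is not the ported implementation. `osa_distance` is a hypothetical name, and the library may implement the unrestricted Damerau-Levenshtein algorithm instead, which can return a smaller distance on some inputs (for example, when a transposed pair is edited again).

```crystal
# Sketch only: `osa_distance` is a hypothetical helper, not part of StringMetrics.
# Restricted Damerau-Levenshtein (optimal string alignment) distance over chars.
def osa_distance(s1 : String, s2 : String) : Int32
  a = s1.chars
  b = s2.chars
  # d[i][j] = distance between the first i chars of a and the first j chars of b
  d = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (0..a.size).each { |i| d[i][0] = i }
  (0..b.size).each { |j| d[0][j] = j }

  (1..a.size).each do |i|
    (1..b.size).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      deletion = d[i - 1][j] + 1
      insertion = d[i][j - 1] + 1
      substitution = d[i - 1][j - 1] + cost
      d[i][j] = {deletion, insertion, substitution}.min
      # Swapping two adjacent characters costs a single edit
      if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1)
      end
    end
  end
  d[a.size][b.size]
end

puts osa_distance("char", "hcar") # => 1
```

Both the restricted and the unrestricted variants give 1 for the `"char"`/`"hcar"` example.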
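.hamming(s1 : String, s2 : String) : Int
Returns the number of substitutions needed to turn one string into another of equal length, that is, the number of positions at which their characters differ. Raises an ArgumentError if the two strings have different lengths.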
StringMetrics.hamming("Micro", "Macro") == 1
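The following minimal sketch follows the same contract. It assumes the comparison is done character by character (Crystal `Char`s rather than bytes). `hamming_sketch` is a hypothetical name, not the library's code.

```crystal
# Sketch only: `hamming_sketch` is a hypothetical helper, not part of StringMetrics.
def hamming_sketch(s1 : String, s2 : String) : Int32
  # Mirror the documented contract: strings of different lengths are rejected
  unless s1.size == s2.size
    raise ArgumentError.new("strings must have the same length")
  end
  # Count the positions where the two strings hold different characters
  s1.chars.zip(s2.chars).count { |x, y| x != y }
end

puts hamming_sketch("Micro", "Macro") # => 1
```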
.jaro(s1 : String, s2 : String)
A measure of similarity between two strings based on matching characters. Returns a value between 0 (no similarity) and 1 (an exact match).
StringMetrics.jaro("MARTHA", "MARHTA").round(2) == 0.94
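The following Crystal sketch implements the textbook Jaro formula under the usual assumptions. Two characters match if they are equal and no more than max(|s1|, |s2|) / 2 - 1 positions apart. t is half the number of matched characters that appear in a different order. The score for m matches is (m/|s1| + m/|s2| + (m - t)/m) / 3. `jaro_sketch` is a hypothetical name, and the library's handling of edge cases such as empty strings may differ.

```crystal
# Sketch only: `jaro_sketch` is a hypothetical helper, not part of StringMetrics.
def jaro_sketch(s1 : String, s2 : String) : Float64
  a = s1.chars
  b = s2.chars
  # Treat identical strings (including two empty ones) as an exact match
  return 1.0 if a == b
  return 0.0 if a.empty? || b.empty?

  # Equal characters only count as matches within this distance of each other
  window = Math.max(Math.max(a.size, b.size) // 2 - 1, 0)
  a_matched = Array.new(a.size, false)
  b_matched = Array.new(b.size, false)
  matches = 0

  a.each_with_index do |ch, i|
    lo = Math.max(0, i - window)
    hi = Math.min(b.size - 1, i + window)
    (lo..hi).each do |j|
      next if b_matched[j] || b[j] != ch
      a_matched[i] = true
      b_matched[j] = true
      matches += 1
      break
    end
  end
  return 0.0 if matches == 0

  # Matched characters, in the order they appear in each string
  a_seq = [] of Char
  a.each_with_index { |ch, i| a_seq << ch if a_matched[i] }
  b_seq = [] of Char
  b.each_with_index { |ch, j| b_seq << ch if b_matched[j] }
  # Transpositions: half the number of matched characters that are out of order
  t = a_seq.zip(b_seq).count { |x, y| x != y } / 2.0

  m = matches.to_f
  (m / a.size + m / b.size + (m - t) / m) / 3.0
end

puts jaro_sketch("MARTHA", "MARHTA").round(2) # => 0.94
```

For `"MARTHA"`/`"MARHTA"`, all 6 characters match, and the swapped `T`/`H` pair gives t = 1. The score is therefore (1 + 1 + 5/6) / 3, or about 0.944.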
.jaro_winkler(s1 : String, s2 : String, scaling_factor = 0.1)
Similar to regular Jaro, but gives a higher score to strings that match from the beginning. Change the scaling factor only if you are familiar with the algorithm.
StringMetrics.jaro_winkler("MARTHA", "MARHTA").round(2) == 0.96
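Winkler's adjustment can be applied on top of any Jaro score as jaro + l * p * (1 - jaro), where l is the length of the common prefix and p is the scaling factor. The sketch below assumes the common convention of capping l at 4, which keeps the result at or below 1 for p up to 0.25. `winkler_adjust` is a hypothetical helper that takes the Jaro score as input instead of computing it, and the library may apply additional rules.

```crystal
# Sketch only: `winkler_adjust` is a hypothetical helper, not part of StringMetrics.
# It takes an already computed Jaro score and applies Winkler's prefix bonus.
def winkler_adjust(jaro : Float64, s1 : String, s2 : String, scaling_factor = 0.1) : Float64
  a = s1.chars
  b = s2.chars
  # Length of the common prefix, capped at 4 characters (a common convention, assumed here)
  prefix = 0
  while prefix < 4 && prefix < a.size && prefix < b.size && a[prefix] == b[prefix]
    prefix += 1
  end
  # Move the score toward 1 in proportion to the shared prefix
  jaro + prefix * scaling_factor * (1.0 - jaro)
end

# 17/18 is the Jaro score of "MARTHA" vs "MARHTA"
puts winkler_adjust(17.0 / 18, "MARTHA", "MARHTA").round(2) # => 0.96
```

Here the shared prefix `MAR` gives l = 3. The score therefore rises from about 0.944 to about 0.961.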
.levenshtein(s1 : String, s2 : String) : Int
Returns the minimum edit distance between two strings: 0 if they are identical, otherwise the minimum number of single-character insertions, deletions, or substitutions needed to turn one into the other.
StringMetrics.levenshtein("Car", "Char") == 1
More detail can be found here.
Ported from here
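The following minimal sketch implements the classic dynamic-programming algorithm and keeps only two rows of the table. `levenshtein_sketch` is a hypothetical name, and the ported implementation may be structured differently.

```crystal
# Sketch only: `levenshtein_sketch` is a hypothetical helper, not part of StringMetrics.
def levenshtein_sketch(s1 : String, s2 : String) : Int32
  a = s1.chars
  b = s2.chars
  # prev[j]: distance between the part of a processed so far and the first j chars of b
  # (before any row is processed, that is j insertions)
  prev = (0..b.size).to_a
  curr = Array.new(b.size + 1, 0)

  a.each_with_index do |ca, i|
    # Turning the first i + 1 chars of a into "" takes i + 1 deletions
    curr[0] = i + 1
    b.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      deletion = prev[j + 1] + 1
      insertion = curr[j] + 1
      substitution = prev[j] + cost # free when the characters already match
      curr[j + 1] = {deletion, insertion, substitution}.min
    end
    # The row just filled becomes the previous row for the next character of a
    prev, curr = curr, prev
  end
  prev[b.size]
end

puts levenshtein_sketch("Car", "Char") # => 1
```

Keeping two rows cuts memory from O(|s1| · |s2|) to O(|s2|) without changing the result. This sketch returns 2 for `"char"`/`"hcar"`, where Damerau-Levenshtein returns 1.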