module StringMetrics

Overview

A collection of well-known string metric algorithms.

Defined in:

string-metrics.cr
string-metrics/version.cr

Constant Summary

VERSION = "0.1.0"

Class Method Detail

def self.damerau_levenshtein(s1 : String, s2 : String) : Int

A variation of the Levenshtein distance that counts a transposition of two adjacent characters as a single edit.

StringMetrics.damerau_levenshtein("char", "hcar") == 1

as opposed to a distance of 2 from Levenshtein on its own.

Ported from here
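The transposition rule can be sketched as follows. This is a minimal illustration of the optimal string alignment variant (each substring is edited at most once), which may differ from the variant this module implements. The name `osa_distance_sketch` is hypothetical and not part of the module.

```crystal
# Hypothetical helper for illustration; not StringMetrics' own implementation.
def osa_distance_sketch(s1 : String, s2 : String) : Int32
  a = s1.chars
  b = s2.chars
  # d[i][j] is the distance between the first i chars of a and the first j chars of b.
  # Row 0 and column 0 hold the cost of building a prefix from an empty string.
  d = Array.new(a.size + 1) { |i| Array.new(b.size + 1) { |j| i == 0 ? j : (j == 0 ? i : 0) } }

  (1..a.size).each do |i|
    (1..b.size).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      # Deletion, insertion, substitution (or match).
      best = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost)
      # Adjacent transposition: the last two characters are swapped.
      if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
        best = Math.min(best, d[i - 2][j - 2] + 1)
      end
      d[i][j] = best
    end
  end

  d[a.size][b.size]
end
```

With this sketch, `osa_distance_sketch("char", "hcar")` evaluates to 1, because swapping the leading "c" and "h" is a single edit.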


def self.hamming(s1 : String, s2 : String) : Int

Returns the number of positions at which two strings of equal length differ. Raises an ArgumentError if the two strings are not the same length.

StringMetrics.hamming("Micro", "Macro") == 1
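A minimal sketch of the idea, assuming a position-by-position character comparison. The name `hamming_sketch` and the error message are hypothetical; the module's own implementation may differ.

```crystal
# Hypothetical helper for illustration; not StringMetrics' own implementation.
def hamming_sketch(s1 : String, s2 : String) : Int32
  a = s1.chars
  b = s2.chars
  raise ArgumentError.new("strings must be the same length") unless a.size == b.size

  # Count the positions where the two strings hold different characters.
  distance = 0
  a.each_with_index do |char, i|
    distance += 1 if char != b[i]
  end
  distance
end
```

Here `hamming_sketch("Micro", "Macro")` evaluates to 1, since only the second character differs.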

def self.jaro(s1 : String, s2 : String)

A measure of similarity between two strings based on matching characters and their relative order. Returns 0 if there is no similarity and 1 for an exact match.

StringMetrics.jaro("MARTHA", "MARHTA").round(2) == 0.94
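The score can be sketched from the standard Jaro definition: with m matching characters and t transpositions, the similarity is (m/|s1| + m/|s2| + (m - t)/m) / 3. The code below is an illustration under that definition, not the module's source, and `jaro_sketch` is a hypothetical name.

```crystal
# Hypothetical helper for illustration; not StringMetrics' own implementation.
def jaro_sketch(s1 : String, s2 : String) : Float64
  a = s1.chars
  b = s2.chars
  return 1.0 if a == b
  return 0.0 if a.empty? || b.empty?

  # Two equal characters match only if they are at most `window` positions apart.
  window = Math.max(Math.max(a.size, b.size) // 2 - 1, 0)
  a_matched = Array.new(a.size, false)
  b_matched = Array.new(b.size, false)
  matches = 0

  a.each_with_index do |char, i|
    low = Math.max(i - window, 0)
    high = Math.min(i + window, b.size - 1)
    (low..high).each do |j|
      next if b_matched[j] || b[j] != char
      a_matched[i] = true
      b_matched[j] = true
      matches += 1
      break
    end
  end
  return 0.0 if matches == 0

  # Walk the matched characters of both strings in order and count those that
  # disagree; every two such characters make up one transposition.
  out_of_order = 0
  k = 0
  a.each_with_index do |char, i|
    next unless a_matched[i]
    while !b_matched[k]
      k += 1
    end
    out_of_order += 1 if char != b[k]
    k += 1
  end

  m = matches.to_f
  (m / a.size + m / b.size + (m - out_of_order // 2) / m) / 3.0
end
```

For "MARTHA" and "MARHTA" all six characters match and the swapped "T" and "H" count as one transposition, giving (1 + 1 + 5/6) / 3, or about 0.944.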

def self.jaro_winkler(s1 : String, s2 : String, scaling_factor = 0.1)

Similar to regular Jaro, but gives a higher score to strings that match from the beginning. Only change the scaling factor if you are familiar with the algorithm.

StringMetrics.jaro_winkler("MARTHA", "MARHTA").round(2) == 0.96
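The prefix adjustment can be sketched on its own, taking an already computed Jaro score as input. The sketch assumes Winkler's usual formulation, in which the shared prefix is capped at 4 characters; whether this module applies the same cap is an assumption. The name `winkler_boost_sketch` and its `jaro_score` parameter are hypothetical.

```crystal
# Hypothetical helper for illustration; not StringMetrics' own implementation.
# jaro_score is the plain Jaro similarity of s1 and s2, computed elsewhere.
def winkler_boost_sketch(jaro_score : Float64, s1 : String, s2 : String, scaling_factor = 0.1) : Float64
  a = s1.chars
  b = s2.chars

  # Length of the common prefix, capped at 4 characters (assumed cap).
  limit = Math.min(Math.min(a.size, b.size), 4)
  prefix = 0
  while prefix < limit && a[prefix] == b[prefix]
    prefix += 1
  end

  # Move the score toward 1 in proportion to the shared prefix length.
  jaro_score + prefix * scaling_factor * (1.0 - jaro_score)
end
```

"MARTHA" and "MARHTA" share the prefix "MAR", so a Jaro score of about 0.944 becomes 0.944 + 3 * 0.1 * (1 - 0.944), or about 0.961. With a 4-character cap, a scaling factor above 0.25 can push the result past 1, which is one reason to leave the default alone.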

def self.levenshtein(s1 : String, s2 : String) : Int

Returns the minimum edit distance between two strings: the smallest number of single-character insertions, deletions, or substitutions needed to turn one string into the other. Identical strings have a distance of 0.

StringMetrics.levenshtein("Car", "Char") == 1

More detail can be found here.

Ported from here
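A minimal sketch of the usual dynamic-programming approach, keeping only two rows of the distance table at a time. The name `levenshtein_sketch` is hypothetical, and the module's own implementation may be organized differently.

```crystal
# Hypothetical helper for illustration; not StringMetrics' own implementation.
def levenshtein_sketch(s1 : String, s2 : String) : Int32
  a = s1.chars
  b = s2.chars

  # previous[j] is the distance between the prefix of a processed so far
  # and the first j characters of b. Initially that prefix is empty.
  previous = (0..b.size).to_a

  a.each_with_index do |char_a, i|
    # Turning the first i + 1 characters of a into an empty string takes i + 1 deletions.
    current = [i + 1]
    b.each_with_index do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      deletion = previous[j + 1] + 1
      insertion = current[j] + 1
      substitution = previous[j] + cost
      current << Math.min(Math.min(deletion, insertion), substitution)
    end
    previous = current
  end

  previous.last
end
```

Here `levenshtein_sketch("Car", "Char")` evaluates to 1, because inserting "h" is the only edit needed.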
