enum HClust::Rule

Overview

An enum that provides linkage rules that dictate how the distances should be updated upon merging two clusters.

A linkage rule is used when computing the distances of a newly formed cluster I ∪ J by the union of clusters I and J and every other existing cluster K (d(I ∪ J, K)) during the clustering procedure. Note that d(X, Y) expands to all distances between the clusters x ∈ X and y ∈ Y (d(x, y)).

The update formula for each linkage rule is provided as a class method with the same name, e.g., Rule.single for the single linkage rule. The method accepts the pre-computed distances d(I, J), d(I, K), and d(J, K), and cluster sizes (|I|, |J|, and |K|). This is an in-place method, where the distance to be updated (tipically, d(J, K) as cluster J is reused as the new cluster) is passed as a pointer to the corresponding position in the distance matrix.

Defined in:

hclust/rule.cr

Enum Members

Average = 0

Defines the distance between the cluster I ∪ J and cluster K as the arithmetic mean of all distances from every cluster i ∈ I and j ∈ J to k ∈ K:

d(IJ) = (|I| * d(I, K) + |J| * d(J, K)) / (|I| + |J|)

This is also called the UPGMA method.

Centroid = 1

Defines the distance between the cluster I ∪ J and cluster K as the distance between the cluster centroids or means:

d(IJ) = sqrt((|I| * d(I, K+ |J| * d(J, K))²) / (|I| + |J|)
                - (|I| * |J| * d(I, J)²) / (|I| + |J|)²)

This is also called the UPGMC method.

WARNING This method requires that the initial cluster distances are (proportional to) squared Euclidean distance.

Complete = 2

Defines the distance between the cluster I ∪ J and cluster K as the largest distance from every cluster i ∈ I and j ∈ J to k ∈ K:

d(IJ) = max(d(I, K), d(J, K))

This is also called the farthest neighbor method.

Median = 3

Defines the distance between the cluster I ∪ J and cluster K as the distance between the cluster centroids or means, where the centroid of I ∪ J is simply defined as the average of the centroids of I and J:

d(IJ) = sqrt((d(I, K) + d(J, K)) / 2 - d(I, J) / 4)

This is also called the WPGMC method.

WARNING This method requires that the initial cluster distances must be (proportional to) squared Euclidean distance.

Single = 4

Defines the distance between the cluster I ∪ J and cluster K as the smallest distance from every cluster i ∈ I and j ∈ J to k ∈ K:

d(IJ) = max(d(I, K), d(J, K))

This is also called the nearest neighbor method.

Ward = 5

Ward's minimum variance criterion minimizes the total intra-cluster variance. It defines the distance between the cluster I ∪ J and cluster K as the weighted squared distance between cluster centers. Using the Lance–Williams formula, the distance can be expressed as

d(IJ) = sqrt((|I| + |K|) * d(I, K+ (|J| + |K|) * d(J, K- |K| * d(I, J)²)

WARNING This method requires that the initial cluster distances are (proportional to) squared Euclidean distance.

Weighted = 6

Defines the distance between the cluster I ∪ J and cluster K as the arithmetic mean of the average distances between clusters I and K (d(I, K)) and clusters J and K (d(J, K)):

d(IJ) = (da(I, K) + da(J, K)) / 2

where da(X, Y) means the average distance between X and Y. This is also called the WPGMA method.

Class Method Summary

Instance Method Summary

Class Method Detail

def self.average(d_ij : Float64, d_ik : Float64, ptr_jk : Pointer(Float64), n_i : Int32, n_j : Int32, n_k : Int32) : Nil #

Update formula for the average linkage rule. The distance is computed between the newly formed cluster I ∪ J and K from the pre-computed distances between the clusters I, J, and K, and cluster sizes.

NOTE The value at ptr_jk will be modified according to the linkage rule.


[View source]
def self.centroid(d_ij : Float64, d_ik : Float64, ptr_jk : Pointer(Float64), n_i : Int32, n_j : Int32, n_k : Int32) : Nil #

Update formula for the centroid linkage rule. The distance is computed between the newly formed cluster I ∪ J and K from the pre-computed distances between the clusters I, J, and K, and cluster sizes.

NOTE The value at ptr_jk will be modified according to the linkage rule.


[View source]
def self.complete(d_ij : Float64, d_ik : Float64, ptr_jk : Pointer(Float64), n_i : Int32, n_j : Int32, n_k : Int32) : Nil #

Update formula for the complete linkage rule. The distance is computed between the newly formed cluster I ∪ J and K from the pre-computed distances between the clusters I, J, and K, and cluster sizes.

NOTE The value at ptr_jk will be modified according to the linkage rule.


[View source]
def self.median(d_ij : Float64, d_ik : Float64, ptr_jk : Pointer(Float64), n_i : Int32, n_j : Int32, n_k : Int32) : Nil #

Update formula for the median linkage rule. The distance is computed between the newly formed cluster I ∪ J and K from the pre-computed distances between the clusters I, J, and K, and cluster sizes.

NOTE The value at ptr_jk will be modified according to the linkage rule.


[View source]
def self.single(d_ij : Float64, d_ik : Float64, ptr_jk : Pointer(Float64), n_i : Int32, n_j : Int32, n_k : Int32) : Nil #

Update formula for the single linkage rule. The distance is computed between the newly formed cluster I ∪ J and K from the pre-computed distances between the clusters I, J, and K, and cluster sizes.

NOTE The value at ptr_jk will be modified according to the linkage rule.


[View source]
def self.ward(d_ij : Float64, d_ik : Float64, ptr_jk : Pointer(Float64), n_i : Int32, n_j : Int32, n_k : Int32) : Nil #

Update formula for the ward linkage rule. The distance is computed between the newly formed cluster I ∪ J and K from the pre-computed distances between the clusters I, J, and K, and cluster sizes.

NOTE The value at ptr_jk will be modified according to the linkage rule.


[View source]
def self.weighted(d_ij : Float64, d_ik : Float64, ptr_jk : Pointer(Float64), n_i : Int32, n_j : Int32, n_k : Int32) : Nil #

Update formula for the weighted linkage rule. The distance is computed between the newly formed cluster I ∪ J and K from the pre-computed distances between the clusters I, J, and K, and cluster sizes.

NOTE The value at ptr_jk will be modified according to the linkage rule.


[View source]

Instance Method Detail

def average? #

[View source]
def centroid? #

[View source]
def complete? #

[View source]
def median? #

[View source]
def needs_squared_euclidean? : Bool #

Returns true if the linkage rule requires that the initial cluster distances are (proportional to) squared Euclidean distance, else false.


[View source]
def order_dependent? : Bool #

Returns true if the distance formula depends on the order which the clusters were formed by merging, else false.

This is used in the cluster relabeling after linkage. See Dendrogram#relabel.


[View source]
def single? #

[View source]
def ward? #

[View source]
def weighted? #

[View source]