enum HClust::Rule
Overview
An enum that provides linkage rules that dictate how the distances should be updated upon merging two clusters.
A linkage rule is used when computing the distances of a newly formed cluster I ∪ J by the union of clusters I and J and every other existing cluster K (d(I ∪ J, K)) during the clustering procedure. Note that d(X, Y) expands to all distances between the clusters x ∈ X and y ∈ Y (d(x, y)).
The update formula for each linkage rule is provided as a class method
with the same name, e.g., Rule.single
for the single linkage rule.
The method accepts the pre-computed distances d(I, J), d(I, K),
and d(J, K), and cluster sizes (|I|, |J|, and |K|). This is an
in-place method, where the distance to be updated (tipically, d(J,
K) as cluster J is reused as the new cluster) is passed as a
pointer to the corresponding position in the distance matrix.
Defined in:
hclust/rule.crEnum Members
-
Average =
0
-
Defines the distance between the cluster I ∪ J and cluster K as the arithmetic mean of all distances from every cluster i ∈ I and j ∈ J to k ∈ K:
d(I ∪ J) = (|I| * d(I, K) + |J| * d(J, K)) / (|I| + |J|)
This is also called the UPGMA method.
-
Centroid =
1
-
Defines the distance between the cluster I ∪ J and cluster K as the distance between the cluster centroids or means:
d(I ∪ J) = sqrt((|I| * d(I, K)² + |J| * d(J, K))²) / (|I| + |J|) - (|I| * |J| * d(I, J)²) / (|I| + |J|)²)
This is also called the UPGMC method.
WARNING This method requires that the initial cluster distances are (proportional to) squared Euclidean distance.
-
Complete =
2
-
Defines the distance between the cluster I ∪ J and cluster K as the largest distance from every cluster i ∈ I and j ∈ J to k ∈ K:
d(I ∪ J) = max(d(I, K), d(J, K))
This is also called the farthest neighbor method.
-
Median =
3
-
Defines the distance between the cluster I ∪ J and cluster K as the distance between the cluster centroids or means, where the centroid of I ∪ J is simply defined as the average of the centroids of I and J:
d(I ∪ J) = sqrt((d(I, K) + d(J, K)) / 2 - d(I, J) / 4)
This is also called the WPGMC method.
WARNING This method requires that the initial cluster distances must be (proportional to) squared Euclidean distance.
-
Single =
4
-
Defines the distance between the cluster I ∪ J and cluster K as the smallest distance from every cluster i ∈ I and j ∈ J to k ∈ K:
d(I ∪ J) = max(d(I, K), d(J, K))
This is also called the nearest neighbor method.
-
Ward =
5
-
Ward's minimum variance criterion minimizes the total intra-cluster variance. It defines the distance between the cluster I ∪ J and cluster K as the weighted squared distance between cluster centers. Using the Lance–Williams formula, the distance can be expressed as
d(I ∪ J) = sqrt((|I| + |K|) * d(I, K)² + (|J| + |K|) * d(J, K)² - |K| * d(I, J)²)
WARNING This method requires that the initial cluster distances are (proportional to) squared Euclidean distance.
-
Weighted =
6
-
Defines the distance between the cluster I ∪ J and cluster K as the arithmetic mean of the average distances between clusters I and K (d(I, K)) and clusters J and K (d(J, K)):
d(I ∪ J) = (da(I, K) + da(J, K)) / 2
where da(X, Y) means the average distance between X and Y. This is also called the WPGMA method.
Class Method Summary
-
.average(d_ij : Float64, d_ik : Float64, ptr_jk : Pointer(Float64), n_i : Int32, n_j : Int32, n_k : Int32) : Nil
Update formula for the average linkage rule.
-
.centroid(d_ij : Float64, d_ik : Float64, ptr_jk : Pointer(Float64), n_i : Int32, n_j : Int32, n_k : Int32) : Nil
Update formula for the centroid linkage rule.
-
.complete(d_ij : Float64, d_ik : Float64, ptr_jk : Pointer(Float64), n_i : Int32, n_j : Int32, n_k : Int32) : Nil
Update formula for the complete linkage rule.
-
.median(d_ij : Float64, d_ik : Float64, ptr_jk : Pointer(Float64), n_i : Int32, n_j : Int32, n_k : Int32) : Nil
Update formula for the median linkage rule.
-
.single(d_ij : Float64, d_ik : Float64, ptr_jk : Pointer(Float64), n_i : Int32, n_j : Int32, n_k : Int32) : Nil
Update formula for the single linkage rule.
-
.ward(d_ij : Float64, d_ik : Float64, ptr_jk : Pointer(Float64), n_i : Int32, n_j : Int32, n_k : Int32) : Nil
Update formula for the ward linkage rule.
-
.weighted(d_ij : Float64, d_ik : Float64, ptr_jk : Pointer(Float64), n_i : Int32, n_j : Int32, n_k : Int32) : Nil
Update formula for the weighted linkage rule.
Instance Method Summary
- #average?
- #centroid?
- #complete?
- #median?
-
#needs_squared_euclidean? : Bool
Returns
true
if the linkage rule requires that the initial cluster distances are (proportional to) squared Euclidean distance, elsefalse
. -
#order_dependent? : Bool
Returns
true
if the distance formula depends on the order which the clusters were formed by merging, elsefalse
. - #single?
- #ward?
- #weighted?
Class Method Detail
Update formula for the average linkage rule. The distance is computed between the newly formed cluster I ∪ J and K from the pre-computed distances between the clusters I, J, and K, and cluster sizes.
NOTE The value at ptr_jk will be modified according to the linkage rule.
Update formula for the centroid linkage rule. The distance is computed between the newly formed cluster I ∪ J and K from the pre-computed distances between the clusters I, J, and K, and cluster sizes.
NOTE The value at ptr_jk will be modified according to the linkage rule.
Update formula for the complete linkage rule. The distance is computed between the newly formed cluster I ∪ J and K from the pre-computed distances between the clusters I, J, and K, and cluster sizes.
NOTE The value at ptr_jk will be modified according to the linkage rule.
Update formula for the median linkage rule. The distance is computed between the newly formed cluster I ∪ J and K from the pre-computed distances between the clusters I, J, and K, and cluster sizes.
NOTE The value at ptr_jk will be modified according to the linkage rule.
Update formula for the single linkage rule. The distance is computed between the newly formed cluster I ∪ J and K from the pre-computed distances between the clusters I, J, and K, and cluster sizes.
NOTE The value at ptr_jk will be modified according to the linkage rule.
Update formula for the ward linkage rule. The distance is computed between the newly formed cluster I ∪ J and K from the pre-computed distances between the clusters I, J, and K, and cluster sizes.
NOTE The value at ptr_jk will be modified according to the linkage rule.
Update formula for the weighted linkage rule. The distance is computed between the newly formed cluster I ∪ J and K from the pre-computed distances between the clusters I, J, and K, and cluster sizes.
NOTE The value at ptr_jk will be modified according to the linkage rule.
Instance Method Detail
Returns true
if the linkage rule requires that the initial
cluster distances are (proportional to) squared Euclidean
distance, else false
.
Returns true
if the distance formula depends on the order
which the clusters were formed by merging, else false
.
This is used in the cluster relabeling after linkage. See
Dendrogram#relabel
.