module SciKit
Extended Modules
Defined in:
scikit/clustering/kmeans.crscikit/matrices.cr
Instance Method Summary
-
#block_diag(*arrs : Tensor | Enumerable)
Create a block diagonal matrix from provided arrays.
-
#circulant(c : Tensor | Enumerable)
Construct a circulant matrix.
-
#hadamard(n : Int, dtype : U.class = Int32) forall U
Construct an Hadamard matrix.
-
#hankel(c : Tensor | Enumerable, r : Tensor | Enumerable)
Construct a Hankel matrix.
-
#hankel(c : Tensor | Enumerable)
Construct a Hankel matrix.
-
#kmeans(obs : Tensor(Float64), guess : Tensor(Float64), iter : Int = 20)
In the K-Means problem, a set of N points X(I) in M-dimensions is given.
-
#toeplitz(c : Tensor | Enumerable, r : Tensor | Enumerable)
The Toeplitz matrix has constant diagonals, with c as its first column and r as its first row.
-
#toeplitz(c : Tensor | Enumerable)
The Toeplitz matrix has constant diagonals, with c as its first column and r as its first row.
Instance Method Detail
Create a block diagonal matrix from provided arrays.
Given the inputs A, B and C, the output will have these arrays arranged on the diagonal
Arguments
arrs : Tensor | Enumerable Arguments to place along the diagonal, must be a common dtype
Examples
puts SciKit.block_diag([1, 2], [[5, 6], [7, 8]])
# [[1, 2, 0, 0],
# [0, 0, 5, 6],
# [0, 0, 7, 8]]
Construct a circulant matrix.
Arguments
c : Tensor | Enumerable First column of the matrix
Examples
SciKit.circulant([1, 2, 3])
# [[1, 3, 2],
# [2, 1, 3],
# [3, 2, 1]]
Construct an Hadamard matrix.
Constructs an n-by-n Hadamard matrix, using Sylvester’s construction. n must be a power of 2.
Arguments
n : Int The order of the matrix, must be a power of 2
Examples
SciKit.hadamard(2, dtype: Complex)
# [[1+0j , 1+0j ],
# [1+0j , -1+0j]]
Construct a Hankel matrix.
The Hankel matrix has constant anti-diagonals, with c as its first column and r as its last row. If r is not given, then r = zeros_like(c) is assumed.
Arguments
c : Tensor The first column of the matrix r : Tensor Last row of the matrix
Examples
puts SciKit.hankel([1, 2, 99])
# [[ 1, 2, 99],
# [ 2, 99, 0],
# [99, 0, 0]]
Construct a Hankel matrix.
The Hankel matrix has constant anti-diagonals, with c as its first column and r as its last row. If r is not given, then r = zeros_like(c) is assumed.
Arguments
c : Tensor The first column of the matrix r : Tensor Last row of the matrix
Examples
puts SciKit.hankel([1, 2, 99])
# [[ 1, 2, 99],
# [ 2, 99, 0],
# [99, 0, 0]]
In the K-Means problem, a set of N points X(I) in M-dimensions is given. The goal is to arrange these points into K clusters, with each cluster having a representative point Z(J), usually chosen as the centroid of the points in the cluster. The energy of each cluster is
E(J) = Sum ( all points X(I) in cluster J ) || X(I) - Z(J) ||^2
For a given set of clusters, the total energy is then simply the sum of the cluster energies E(J). The goal is to choose the clusters in such a way that the total energy is minimized. Usually, a point X(I) goes into the cluster with the closest representative point Z(J). So to define the clusters, it's enough simply to specify the locations of the cluster representatives.
This is actually a fairly hard problem. Most algorithms do reasonably well, but cannot guarantee that the best solution has been found. It is very common for algorithms to get stuck at a solution which is merely a "local minimum". For such a local minimum, every slight rearrangement of the solution makes the energy go up; however a major rearrangement would result in a big drop in energy.
A simple algorithm for the problem is known as "H-Means". It alternates between two procedures:
Using the given cluster centers, assign each point to the cluster with the nearest center; Using the given cluster assignments, replace each cluster center by the centroid or average of the points in the cluster. These steps are repeated until no points are moved, or some other termination criterion is reached. A more sophisticated algorithm, known as "K-Means", takes advantage of the fact that it is possible to quickly determine the decrease in energy caused by moving a point from its current cluster to another. It repeats the following procedure:
For each point, move it to another cluster if that would lower the energy. If you move a point, immediately update the cluster centers of the two affected clusters. This procedure is repeated until no points are moved, or some other termination criterion is reached.
References
John Hartigan, Manchek Wong, Algorithm AS 136: A K-Means Clustering Algorithm, Applied Statistics, Volume 28, Number 1, 1979, pages 100-108. Wendy Martinez, Angel Martinez, Computational Statistics Handbook with MATLAB, pages 373-376, Chapman and Hall / CRC, 2002. David Sparks, Algorithm AS 58: Euclidean Cluster Analysis, Applied Statistics, Volume 22, Number 1, 1973, pages 126-130.
The Toeplitz matrix has constant diagonals, with c as its first column and r as its first row. If r is not given, r == conjugate(c) is assumed.
Arguments
c : Tensor | Enumerable First column of the Matrix, assumed flat r : Tensor | Enumerable First row of the Matrix, assumed flat
Examples
SciKit.toeplitz([1, 2, 3], [1, 4, 5, 6])
# [[1, 4, 5, 6],
# [2, 1, 4, 5],
# [3, 2, 1, 4]]
The Toeplitz matrix has constant diagonals, with c as its first column and r as its first row. If r is not given, r == conjugate(c) is assumed.
Arguments
c : Tensor | Enumerable First column of the Matrix, assumed flat r : Tensor | Enumerable First row of the Matrix, assumed flat
Examples
SciKit.toeplitz([1, 2, 3], [1, 4, 5, 6])
# [[1, 4, 5, 6],
# [2, 1, 4, 5],
# [3, 2, 1, 4]]