module Memo::Clustering
Extended Modules
Defined in:
memo/clustering.cr
Instance Method Summary
- #sequential(db : DB::Database, service_id : Int64, usearch_index : USearch::Index, source_type : String, external_ids : Array(Int64), threshold : Float64 = 0.75, min_cluster_size : Int32 = 3) : Array(Cluster)
Sequential clustering for chronologically ordered sources.
Instance Method Detail
def sequential(db : DB::Database, service_id : Int64, usearch_index : USearch::Index, source_type : String, external_ids : Array(Int64), threshold : Float64 = 0.75, min_cluster_size : Int32 = 3) : Array(Cluster)
Sequential clustering for chronologically ordered sources.
Finds topic boundaries by detecting similarity drops between consecutive items. Preserves chronological order.
Algorithm:
- Load full embeddings for all external_ids (in order)
- Compute cosine similarity between each consecutive pair
- Mark indices where similarity < threshold as boundaries
- Group source_ids between boundaries into clusters
- Filter out clusters smaller than min_cluster_size
Returns an array of Cluster structs, one per topic group. Sources without embeddings are skipped.
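The algorithm steps can be sketched on in-memory data as follows. This is a minimal illustration, not the library's implementation: `cosine_similarity` and `sequential_sketch` are hypothetical names, the embeddings are assumed to be already loaded into a `Hash` keyed by external id (the real method reads them via `db` and `usearch_index`), and plain arrays of ids stand in for `Cluster` structs.

```crystal
# Hypothetical helper: cosine similarity of two equal-length vectors.
def cosine_similarity(a : Array(Float64), b : Array(Float64)) : Float64
  dot = 0.0
  norm_a = 0.0
  norm_b = 0.0
  a.each_with_index do |x, i|
    y = b[i]
    dot += x * y
    norm_a += x * x
    norm_b += y * y
  end
  denom = Math.sqrt(norm_a) * Math.sqrt(norm_b)
  if denom == 0.0
    0.0
  else
    dot / denom
  end
end

# Hypothetical stand-in for #sequential that works on a preloaded
# id => embedding hash instead of a database and a USearch index.
def sequential_sketch(ids : Array(Int64),
                      embeddings : Hash(Int64, Array(Float64)),
                      threshold : Float64 = 0.75,
                      min_cluster_size : Int32 = 3) : Array(Array(Int64))
  # Step 1: keep only ids that have an embedding, preserving input order.
  present = ids.select { |id| embeddings.has_key?(id) }
  return [] of Array(Int64) if present.empty?

  clusters = [] of Array(Int64)
  current = [present.first]

  # Steps 2-4: compare each consecutive pair; a similarity below the
  # threshold closes the current group and starts a new one.
  (1...present.size).each do |i|
    prev_id = present[i - 1]
    next_id = present[i]
    if cosine_similarity(embeddings[prev_id], embeddings[next_id]) < threshold
      clusters << current
      current = [next_id]
    else
      current << next_id
    end
  end
  clusters << current

  # Step 5: drop groups smaller than min_cluster_size.
  clusters.select { |c| c.size >= min_cluster_size }
end

# Example data (made up for illustration). Id 7 has no embedding and is skipped.
embeddings = {
  1_i64 => [1.0, 0.0],
  2_i64 => [0.9, 0.1],
  3_i64 => [1.0, 0.1],
  4_i64 => [0.0, 1.0],
  5_i64 => [0.1, 0.9],
  6_i64 => [0.0, 1.0],
  8_i64 => [1.0, 0.0],
}
ids = [1_i64, 2_i64, 3_i64, 4_i64, 5_i64, 6_i64, 7_i64, 8_i64]

p sequential_sketch(ids, embeddings) # => [[1, 2, 3], [4, 5, 6]]
```

In this example the similarity drops between ids 3 and 4 and again between 6 and 8, so three groups form. The trailing single-item group `[8]` is then discarded because it is smaller than `min_cluster_size`.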