module Memo::Clustering


Defined in:

memo/clustering.cr


Instance Method Detail

def sequential(db : DB::Database, service_id : Int64, usearch_index : USearch::Index, source_type : String, external_ids : Array(Int64), threshold : Float64 = 0.75, min_cluster_size : Int32 = 3) : Array(Cluster)

Sequential clustering for chronologically ordered sources.

Finds topic boundaries by detecting drops in similarity between consecutive items. Chronological order is preserved, both within each cluster and across the returned clusters.
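Written out as a formula, with e_1, ..., e_n denoting the embeddings in chronological order and τ the threshold parameter (these symbols are introduced here for illustration only), the boundary test is:

```latex
\operatorname{sim}_i = \frac{\mathbf{e}_{i-1} \cdot \mathbf{e}_i}{\lVert \mathbf{e}_{i-1} \rVert \, \lVert \mathbf{e}_i \rVert},
\qquad i = 2, \dots, n,
\qquad \text{boundary before item } i \iff \operatorname{sim}_i < \tau
```

A cluster is then a maximal run of consecutive items with no boundary inside it, and runs with fewer than min_cluster_size items are discarded.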

Algorithm:

  1. Load full embeddings for all external_ids (in order)
  2. Compute cosine similarity between each consecutive pair
  3. Mark indices where similarity < threshold as boundaries
  4. Group source_ids between boundaries into clusters
  5. Filter out clusters smaller than min_cluster_size
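Steps 2 to 5 can be sketched in plain Crystal as below. This is a minimal illustration, not the module's implementation: it assumes the embeddings have already been loaded (step 1 depends on the database and USearch index, which are omitted here) and are passed in chronological order as (id, embedding) tuples. SketchCluster, cosine, and sequential_sketch are hypothetical names standing in for the real Cluster struct and internal helpers, and treating a zero-norm vector as similarity 0.0 is an assumption of this sketch.

```crystal
# Hypothetical stand-in for the real Cluster struct; the field name is an assumption.
record SketchCluster, source_ids : Array(Int64)

# Cosine similarity of two equal-length vectors.
# Assumption: a zero-norm vector is treated as similarity 0.0.
def cosine(a : Array(Float64), b : Array(Float64)) : Float64
  dot = 0.0
  norm_a = 0.0
  norm_b = 0.0
  a.each_with_index do |x, i|
    y = b[i]
    dot += x * y
    norm_a += x * x
    norm_b += y * y
  end
  return 0.0 if norm_a == 0.0 || norm_b == 0.0
  dot / (Math.sqrt(norm_a) * Math.sqrt(norm_b))
end

# Steps 2-5 on embeddings already loaded in chronological order (hypothetical helper).
def sequential_sketch(items : Array(Tuple(Int64, Array(Float64))),
                      threshold : Float64 = 0.75,
                      min_cluster_size : Int32 = 3) : Array(SketchCluster)
  return [] of SketchCluster if items.empty?

  groups = [] of Array(Int64)
  current = [items[0][0]]
  (1...items.size).each do |i|
    # Step 2: similarity between item i-1 and item i.
    sim = cosine(items[i - 1][1], items[i][1])
    if sim < threshold
      # Step 3: similarity drop, so item i starts a new group (step 4).
      groups << current
      current = [] of Int64
    end
    current << items[i][0]
  end
  groups << current

  # Step 5: drop groups smaller than min_cluster_size.
  groups.select { |g| g.size >= min_cluster_size }.map { |g| SketchCluster.new(g) }
end

items = [
  {1_i64, [1.0, 0.0]},
  {2_i64, [0.9, 0.1]},
  {3_i64, [1.0, 0.05]},
  {4_i64, [0.0, 1.0]},
  {5_i64, [0.1, 1.0]},
]

p sequential_sketch(items).map(&.source_ids)                      # => [[1, 2, 3]]
p sequential_sketch(items, min_cluster_size: 2).map(&.source_ids) # => [[1, 2, 3], [4, 5]]
```

In the usage example only the pair (3, 4) falls below the default threshold of 0.75, so the run splits into [1, 2, 3] and [4, 5]; with the default min_cluster_size of 3 the two-item run is filtered out.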

Returns an array of Cluster structs, one per topic group. Sources without embeddings are skipped.
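The skipping rule can be illustrated with a small order-preserving filter. This sketch assumes that a skipped source is simply dropped before similarities are computed, so its neighbours end up compared directly; the documentation does not say this explicitly. The hash of embeddings and the helper name ordered_embeddings are hypothetical stand-ins for whatever lookup the real method performs against the database and USearch index.

```crystal
# Hypothetical helper: keep only ids that have an embedding, preserving the input order.
# Assumption: missing embeddings are dropped entirely rather than treated as boundaries.
def ordered_embeddings(external_ids : Array(Int64),
                       embeddings : Hash(Int64, Array(Float64))) : Array(Tuple(Int64, Array(Float64)))
  result = [] of Tuple(Int64, Array(Float64))
  external_ids.each do |id|
    # Hash#[]? returns nil when the id has no stored embedding.
    if emb = embeddings[id]?
      result << {id, emb}
    end
  end
  result
end

embeddings = {
  10_i64 => [1.0, 0.0],
  30_i64 => [0.0, 1.0],
}

p ordered_embeddings([10_i64, 20_i64, 30_i64], embeddings).map { |pair| pair[0] } # => [10, 30]
```

Because the input order of external_ids is kept, the output can be fed straight into a consecutive-pair comparison without re-sorting.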

