module Memo::Storage
Overview
Low-level storage operations for embeddings and chunks
Extended Modules
Defined in: memo/storage.cr
Instance Method Summary
- #compute_hash(text : String) : Bytes
  Compute the SHA-256 hash of text content.
- #create_chunk(db : DB::Database, hash : Bytes, source_type : String, source_id : Int64, offset : Int32 | Nil, size : Int32, pair_id : Int64 | Nil = nil, parent_id : Int64 | Nil = nil) : Int64
  Create a chunk reference, or ignore it if it already exists.
- #deserialize_embedding(blob : Bytes) : Array(Float64)
  Deserialize an embedding from a binary blob.
- #get_rowid(db : DB::Database, hash : Bytes, service_id : Int64) : Int64 | Nil
  Get the rowid of an embedding by hash and service_id.
- #get_service_by_format_model(db : DB::Database, format : String, model : String) : Tuple(Int64, String, String | Nil, String, Int32, Int32, Float64) | Nil
  Return the service record for a format and model, or nil if not found.
- #get_service_by_name(db : DB::Database, name : String) : Tuple(Int64, String, String | Nil, String, Int32, Int32, Float64) | Nil
  Get a service by name.
- #increment_match_count(db : DB::Database, chunk_ids : Array(Int64))
  Increment match_count for the given chunks.
- #increment_read_count(db : DB::Database, chunk_ids : Array(Int64))
  Increment read_count for the given chunks.
- #register_service(db : DB::Database, name : String | Nil, format : String, base_url : String | Nil, model : String, dimensions : Int32, max_tokens : Int32) : Int64
  Register a service, or get the existing one by name.
- #serialize_embedding(embedding : Array(Float64)) : Bytes
  Serialize an embedding to a binary blob (Int16, a 50% storage reduction versus Float32).
- #store_embedding(db : DB::Database, hash : Bytes, token_count : Int32, service_id : Int64) : Tuple(Bool, Int64)
  Register an embedding hash in the database (deduplicated by hash + service_id).
- #update_tokens_per_byte(db : DB::Database, service_id : Int64, observed_ratio : Float64)
  Update the tokens_per_byte ratio for a service using an exponential moving average.
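compute_hash has no further detail below. A minimal sketch, assuming it hashes the string's UTF-8 bytes as-is (no normalization) with Crystal's standard Digest::SHA256 and returns the raw 32-byte digest; the source does not say which digest implementation the module actually uses.

```crystal
require "digest/sha256"

# Sketch only: SHA-256 over the UTF-8 bytes of the text, returned as the
# raw 32-byte digest (suitable for a BLOB column). Assumes no normalization.
def compute_hash(text : String) : Bytes
  Digest::SHA256.digest(text)
end

digest = compute_hash("hello world")
puts digest.size      # => 32
puts digest.hexstring # lowercase hex form of the same digest
```

Because identical text always yields identical bytes, the digest can serve as the deduplication key used by create_chunk and store_embedding.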
Instance Method Detail
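#create_chunk(db : DB::Database, hash : Bytes, source_type : String, source_id : Int64, offset : Int32 | Nil, size : Int32, pair_id : Int64 | Nil = nil, parent_id : Int64 | Nil = nil) : Int64
Create a chunk reference, or ignore the insert if the chunk already exists.
Links a content hash to a source, with optional pair and parent relationships. Uses INSERT OR IGNORE so that re-indexing with a different service is handled safely.
All IDs (source_id, pair_id, parent_id) are internal IDs (foreign keys to the sources table). source_type is denormalized for fast filtering.
Returns the chunk id if a row was inserted, or 0 if the chunk already existed and the insert was ignored.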
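The return contract of create_chunk (a new id, or 0 when the insert was ignored) can be modeled without a database. ChunkTable below is a hypothetical in-memory stand-in, and its uniqueness key (hash, source_id, offset) is an assumption; the real constraint comes from the table schema, which is not documented here.

```crystal
# Hypothetical in-memory stand-in for the chunks table. The real method runs
# INSERT OR IGNORE against SQLite; the key below is an assumed constraint.
class ChunkTable
  alias Key = {String, Int64, Int32?}

  @ids = {} of Key => Int64
  @next_id = 1_i64

  # Returns the new chunk id, or 0 if a row with the same key already
  # existed (mirroring a statement that INSERT OR IGNORE skipped).
  def create_chunk(hash : Bytes, source_id : Int64, offset : Int32?) : Int64
    key = {hash.hexstring, source_id, offset}
    return 0_i64 if @ids.has_key?(key)
    id = @next_id
    @next_id += 1
    @ids[key] = id
    id
  end
end

table = ChunkTable.new
h = Bytes[1, 2, 3]
puts table.create_chunk(h, 10_i64, 0) # => 1
puts table.create_chunk(h, 10_i64, 0) # => 0 (already existed, ignored)
puts table.create_chunk(h, 11_i64, 0) # => 2 (same hash, different source)
```

Callers therefore cannot treat 0 as a valid id; under this model they would look up the existing row separately if they need it.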
#get_rowid(db : DB::Database, hash : Bytes, service_id : Int64) : Int64 | Nil
Get the rowid of an embedding by hash and service_id.
Returns nil if not found. Used to look up the USearch key when deleting an embedding.
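#get_service_by_format_model(db : DB::Database, format : String, model : String) : Tuple(Int64, String, String | Nil, String, Int32, Int32, Float64) | Nil
Returns the service record for the given format and model, or nil if not found.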
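#get_service_by_name(db : DB::Database, name : String) : Tuple(Int64, String, String | Nil, String, Int32, Int32, Float64) | Nil
Get a service by name.
Returns the service record, or nil if not found.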
#increment_match_count(db : DB::Database, chunk_ids : Array(Int64))
Increment match_count for the given chunks.
#increment_read_count(db : DB::Database, chunk_ids : Array(Int64))
Increment read_count for the given chunks.
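#register_service(db : DB::Database, name : String | Nil, format : String, base_url : String | Nil, model : String, dimensions : Int32, max_tokens : Int32) : Int64
Register a service, or get the existing one by name.
Returns the service_id for the named service configuration. If name is nil, it is auto-generated as "format/model".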
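A minimal sketch of the register-or-get behavior, assuming the service name (explicit, or derived as "format/model") is the lookup key. ServiceRegistry is a hypothetical in-memory stand-in that keeps only name-to-id mappings; base_url, dimensions and max_tokens are omitted, and the format and model strings in the usage lines are placeholders.

```crystal
# Hypothetical in-memory model of register_service's naming and get-or-create logic.
class ServiceRegistry
  @ids = {} of String => Int64
  @next_id = 1_i64

  def register_service(name : String?, format : String, model : String) : Int64
    # A nil name falls back to "format/model", as described above.
    key = name || "#{format}/#{model}"
    # Reuse the existing id for this name, or allocate a new one.
    @ids[key] ||= begin
      id = @next_id
      @next_id += 1
      id
    end
  end
end

registry = ServiceRegistry.new
puts registry.register_service(nil, "fmt-a", "model-x")       # => 1
puts registry.register_service("fmt-a/model-x", "fmt-a", "model-x") # => 1 (same derived name)
puts registry.register_service("custom", "fmt-a", "model-x")  # => 2
```

Under this model, an explicit name equal to the derived "format/model" string resolves to the same service as an unnamed registration.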
#serialize_embedding(embedding : Array(Float64)) : Bytes
Serialize an embedding to a binary blob (Int16, a 50% storage reduction versus Float32).
Maps the normalized float range [-1, 1] to the Int16 range [-32768, 32767]. Precision loss is about 0.003% for normalized vectors.
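A sketch of the Int16 round trip, assuming each component is clamped to [-1, 1], scaled by 32767, rounded, and written little-endian; the scale factor and byte order are assumptions, not taken from the source. One quantization step is 1/32767, about 3.05e-5 (the ~0.003% figure above), so rounding to the nearest step costs at most half of that per component.

```crystal
# Assumed scale factor: 1.0 maps to 32767.
SCALE = 32767.0

# Quantizes each component to Int16 after clamping to [-1, 1] and writes it
# little-endian (byte order is also an assumption).
def serialize_embedding(embedding : Array(Float64)) : Bytes
  io = IO::Memory.new(embedding.size * 2)
  embedding.each do |x|
    q = (x.clamp(-1.0, 1.0) * SCALE).round.to_i16
    io.write_bytes(q, IO::ByteFormat::LittleEndian)
  end
  io.to_slice
end

# Reverses the mapping: reads Int16 values and divides by the same scale.
def deserialize_embedding(blob : Bytes) : Array(Float64)
  io = IO::Memory.new(blob, writeable: false)
  Array(Float64).new(blob.size // 2) do
    io.read_bytes(Int16, IO::ByteFormat::LittleEndian) / SCALE
  end
end

v = [0.5, -0.25, 1.0, -1.0]
blob = serialize_embedding(v)
puts blob.size # => 8 (2 bytes per component)
decoded = deserialize_embedding(blob)
# Worst-case round-trip error is half a quantization step, 0.5 / 32767.
max_err = (0...v.size).max_of { |i| (decoded[i] - v[i]).abs }
puts max_err
```

Using 32767 rather than 32768 as the scale keeps +1.0 and -1.0 symmetric, so the value -32768 is simply never produced in this sketch.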
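#store_embedding(db : DB::Database, hash : Bytes, token_count : Int32, service_id : Int64) : Tuple(Bool, Int64)
Register an embedding hash in the database (deduplicated by hash + service_id).
The vectors themselves are stored in USearch; this table only tracks what has already been embedded, for deduplication. The SQLite rowid serves as the USearch key.
Returns {inserted, rowid}, where inserted is true if the row is new and rowid is the USearch key.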
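The deduplication contract of store_embedding and get_rowid can be sketched with a hash map keyed by (hash, service_id). EmbeddingIndex is a hypothetical in-memory stand-in: a counter plays the role of SQLite's rowid (and therefore of the USearch key), and token_count is omitted.

```crystal
# Hypothetical in-memory model of store_embedding / get_rowid.
class EmbeddingIndex
  @rowids = Hash({String, Int64}, Int64).new
  @next_rowid = 1_i64

  # Returns {inserted, rowid}; inserted is false when the
  # (hash, service_id) pair was already registered.
  def store_embedding(hash : Bytes, service_id : Int64) : {Bool, Int64}
    key = {hash.hexstring, service_id}
    if existing = @rowids[key]?
      return {false, existing}
    end
    rowid = @next_rowid
    @next_rowid += 1
    @rowids[key] = rowid
    {true, rowid}
  end

  # Mirrors get_rowid: nil when nothing is registered for the pair.
  def get_rowid(hash : Bytes, service_id : Int64) : Int64?
    @rowids[{hash.hexstring, service_id}]?
  end
end

idx = EmbeddingIndex.new
h = Bytes[0xab, 0xcd]
p idx.store_embedding(h, 1_i64) # => {true, 1}
p idx.store_embedding(h, 1_i64) # => {false, 1}
p idx.store_embedding(h, 2_i64) # => {true, 2}
p idx.get_rowid(h, 3_i64)       # => nil
```

In this model, a caller would only compute and add a vector to the USearch index when inserted is true, using rowid as its key.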
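#update_tokens_per_byte(db : DB::Database, service_id : Int64, observed_ratio : Float64)
Update the tokens_per_byte ratio for a service using an exponential moving average.
Blends the new observation with the existing ratio: new = old * 0.9 + observed * 0.1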
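Because each update keeps 90% of the previous value, a run of identical observations closes 10% of the remaining gap per step; after n steps the gap is 0.9^n of the original (0.9^22 ≈ 0.098, so about 22 updates bring it under a tenth). A minimal sketch of the documented rule; blend_ratio is a hypothetical name, and how the initial ratio is set is not stated in the source.

```crystal
# Hypothetical helper implementing the documented blend:
# new = old * 0.9 + observed * 0.1
def blend_ratio(old : Float64, observed : Float64) : Float64
  old * 0.9 + observed * 0.1
end

ratio = 0.25
# Three identical observations: the gap to 0.30 shrinks 0.05 -> 0.045 -> 0.0405 -> 0.03645.
3.times { ratio = blend_ratio(ratio, 0.30) }
puts ratio # close to 0.26355
```

The small weight on each observation damps noisy per-request ratios, at the cost of adapting slowly when a service's tokenizer behavior genuinely changes.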