class Llama::KvCache
- Llama::KvCache
- Reference
- Object
Overview
Wrapper for the llama_kv_cache structure. Provides methods for managing the KV (key-value) cache in LLaMA models.
Defined in:
- llama/kv_cache.cr
- llama/kv_cache/error.cr
Constructors
- .new(handle : Pointer(LibLlama::LlamaKvCache), ctx : Context)
  Creates a new KvCache instance from a raw pointer.
Instance Method Summary
- #can_shift? : Bool
  Checks if the context supports KV cache shifting.
- #clear
  Clears the KV cache. This removes all tokens from the cache and resets its state.
- #defrag
  Defragments the KV cache. The defragmentation is applied lazily on the next decode, or explicitly with #update.
- #finalize
  Frees the resources associated with this KV cache.
- #n_tokens : Int32
  Returns the number of tokens in the KV cache. A KV cell that has multiple sequences assigned to it is counted once per sequence.
- #seq_add(seq_id : Int32, p0 : Int32, p1 : Int32, delta : Int32)
  Adds a relative position delta to tokens in a sequence.
- #seq_cp(seq_id_src : Int32, seq_id_dst : Int32, p0 : Int32, p1 : Int32)
  Copies tokens from one sequence to another in the KV cache.
- #seq_div(seq_id : Int32, p0 : Int32, p1 : Int32, d : Int32)
  Divides the positions of tokens in a sequence by a factor.
- #seq_keep(seq_id : Int32)
  Keeps only the specified sequence in the KV cache, removing all others.
- #seq_pos_max(seq_id : Int32) : Int32
  Returns the maximum position in a sequence.
- #seq_rm(seq_id : Int32, p0 : Int32, p1 : Int32) : Bool
  Removes tokens from a sequence in the KV cache.
- #to_unsafe : Pointer(Llama::LibLlama::LlamaKvCache)
  Returns the raw pointer to the underlying llama_kv_cache structure.
- #update
  Applies pending KV cache updates. This includes K-shifts, defragmentation, etc.
- #used_cells : Int32
  Returns the number of used KV cells. A cell is considered used if it has at least one sequence assigned to it.
Constructor Detail
.new(handle : Pointer(LibLlama::LlamaKvCache), ctx : Context)
Creates a new KvCache instance from a raw pointer.
Note: This constructor is intended for internal use. Users should obtain KvCache instances through Context#kv_cache.
To avoid circular references, the context pointer is stored rather than the context object.
Raises:
- Llama::KvCache::Error if the handle is null
Instance Method Detail
#can_shift? : Bool
Checks if the context supports KV cache shifting.
Returns:
- true if the context supports KV cache shifting, false otherwise
Raises:
- Llama::KvCache::Error if the operation fails
#clear
Clears the KV cache. This removes all tokens from the cache and resets its state.
Raises:
- Llama::KvCache::Error if the operation fails
#defrag
Defragments the KV cache. The defragmentation is applied lazily on the next decode, or explicitly with #update.
Raises:
- Llama::KvCache::Error if the operation fails
#n_tokens : Int32
Returns the number of tokens in the KV cache. A KV cell that has multiple sequences assigned to it is counted once per sequence.
Returns:
- The number of tokens in the KV cache
Raises:
- Llama::KvCache::Error if the operation fails
#seq_add(seq_id : Int32, p0 : Int32, p1 : Int32, delta : Int32)
Adds a relative position delta to tokens in a sequence.
Parameters:
- seq_id: The sequence ID to modify
- p0: Start position (p0 < 0 means start from 0)
- p1: End position (p1 < 0 means end at infinity)
- delta: The position delta to add
Raises:
- Llama::KvCache::Error if the operation fails
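The range rules above can be illustrated with a toy model that does not touch the library at all. This is a minimal sketch: `Cell`, `in_range?`, `toy_seq_rm`, and `toy_seq_add` are hypothetical names, and the half-open range [p0, p1) is an assumption taken from the convention documented in llama.cpp's `llama.h`, not from this page. It shows a common pairing: drop a span of positions, then shift the tail back by the same amount.

```crystal
# Toy model only: each cached token is a {seq_id, pos} pair.
# This is NOT how Llama::KvCache stores data; it only mirrors the range rules.
record Cell, seq_id : Int32, pos : Int32

# p0 < 0 means "from 0", p1 < 0 means "to infinity".
# Assumption: the range is half-open, [p0, p1).
def in_range?(pos : Int32, p0 : Int32, p1 : Int32) : Bool
  lo = p0 < 0 ? 0 : p0
  hi = p1 < 0 ? Int32::MAX : p1
  lo <= pos && pos < hi
end

# Drops every cell of `seq_id` whose position falls in the range.
def toy_seq_rm(cells : Array(Cell), seq_id : Int32, p0 : Int32, p1 : Int32) : Array(Cell)
  cells.reject { |c| c.seq_id == seq_id && in_range?(c.pos, p0, p1) }
end

# Adds `delta` to the position of every cell of `seq_id` in the range.
def toy_seq_add(cells : Array(Cell), seq_id : Int32, p0 : Int32, p1 : Int32, delta : Int32) : Array(Cell)
  cells.map do |c|
    if c.seq_id == seq_id && in_range?(c.pos, p0, p1)
      Cell.new(c.seq_id, c.pos + delta)
    else
      c
    end
  end
end

cells = (0...6).map { |i| Cell.new(0, i) }       # positions 0..5 in sequence 0
trimmed = toy_seq_rm(cells, 0, 2, 4)             # drop positions 2 and 3
shifted = toy_seq_add(trimmed, 0, 4, -1, -2)     # move positions 4.. back by 2
puts shifted.map(&.pos)                          # [0, 1, 2, 3]
```

With the real class, the equivalent calls would be `seq_rm(0, 2, 4)` followed by `seq_add(0, 4, -1, -2)` on the object returned by Context#kv_cache; whether shifting is supported can be checked first with #can_shift?.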
#seq_cp(seq_id_src : Int32, seq_id_dst : Int32, p0 : Int32, p1 : Int32)
Copies tokens from one sequence to another in the KV cache.
Parameters:
- seq_id_src: Source sequence ID
- seq_id_dst: Destination sequence ID
- p0: Start position (p0 < 0 means start from 0)
- p1: End position (p1 < 0 means end at infinity)
Raises:
- Llama::KvCache::Error if the operation fails
#seq_div(seq_id : Int32, p0 : Int32, p1 : Int32, d : Int32)
Divides the positions of tokens in a sequence by a factor.
Parameters:
- seq_id: The sequence ID to modify
- p0: Start position (p0 < 0 means start from 0)
- p1: End position (p1 < 0 means end at infinity)
- d: The divisor (must be > 1)
Raises:
- ArgumentError if the divisor is not greater than 1
- Llama::KvCache::Error if the operation fails
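To make the effect of the divisor concrete, here is a small stand-alone sketch. `toy_seq_div` is a hypothetical helper operating on a plain array of positions; it assumes integer division and a half-open range [p0, p1), following the convention described in llama.cpp's `llama.h`, and it mirrors the `d > 1` check documented above.

```crystal
# Toy model only: divides positions in [p0, p1) by d using integer division.
# Assumption: integer division and a half-open range, as in llama.cpp's llama.h.
def toy_seq_div(positions : Array(Int32), p0 : Int32, p1 : Int32, d : Int32) : Array(Int32)
  raise ArgumentError.new("divisor must be greater than 1") unless d > 1
  lo = p0 < 0 ? 0 : p0
  hi = p1 < 0 ? Int32::MAX : p1
  positions.map do |pos|
    (lo <= pos && pos < hi) ? (pos // d) : pos
  end
end

positions = (0...8).to_a
halved = toy_seq_div(positions, -1, -1, 2)
puts halved # [0, 0, 1, 1, 2, 2, 3, 3]
```

After the division several tokens share a position, which is why this operation is typically used for position-compression schemes rather than ordinary decoding.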
#seq_keep(seq_id : Int32)
Keeps only the specified sequence in the KV cache, removing all others.
Parameters:
- seq_id: The sequence ID to keep
Raises:
- Llama::KvCache::Error if the operation fails
#seq_pos_max(seq_id : Int32) : Int32
Returns the maximum position in a sequence.
Parameters:
- seq_id: The sequence ID to query
Returns:
- The maximum position in the sequence
Raises:
- Llama::KvCache::Error if the operation fails
#seq_rm(seq_id : Int32, p0 : Int32, p1 : Int32) : Bool
Removes tokens from a sequence in the KV cache.
Parameters:
- seq_id: The sequence ID to remove tokens from (seq_id < 0 matches any sequence)
- p0: Start position (p0 < 0 means start from 0)
- p1: End position (p1 < 0 means end at infinity)
Returns:
- true if successful, false if a partial sequence cannot be removed (removing a whole sequence never fails)
Raises:
- Llama::KvCache::Error if the operation fails
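The wildcard behaviour of a negative `seq_id` is easy to miss, so the following toy sketch contrasts it with a targeted removal. `Cell` and `toy_seq_rm` are hypothetical names for illustration, the half-open range [p0, p1) is an assumption based on llama.cpp's `llama.h`, and the sketch does not model the case where a partial removal is refused.

```crystal
# Toy model only: each cached token is a {seq_id, pos} pair.
record Cell, seq_id : Int32, pos : Int32

# Removes matching cells; a negative seq_id matches every sequence.
# Assumption: the range is half-open, [p0, p1).
def toy_seq_rm(cells : Array(Cell), seq_id : Int32, p0 : Int32, p1 : Int32) : Array(Cell)
  lo = p0 < 0 ? 0 : p0
  hi = p1 < 0 ? Int32::MAX : p1
  cells.reject do |c|
    (seq_id < 0 || c.seq_id == seq_id) && lo <= c.pos && c.pos < hi
  end
end

# Two sequences (0 and 1), each holding positions 0..3.
cells = [0, 1].flat_map { |s| (0...4).map { |i| Cell.new(s, i) } }

only_seq1 = toy_seq_rm(cells, 1, 2, -1)  # trim sequence 1 from position 2 on
every_seq = toy_seq_rm(cells, -1, 2, -1) # trim all sequences from position 2 on
puts only_seq1.size # 6
puts every_seq.size # 4
```

With the real method, the returned Bool should be checked when removing part of a sequence, since the documentation above states that only whole-sequence removal is guaranteed to succeed.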
#to_unsafe : Pointer(Llama::LibLlama::LlamaKvCache)
Returns the raw pointer to the underlying llama_kv_cache structure.
#update
Applies pending KV cache updates. This includes K-shifts, defragmentation, etc.
Raises:
- Llama::KvCache::Error if the operation fails
#used_cells : Int32
Returns the number of used KV cells. A cell is considered used if it has at least one sequence assigned to it.
Returns:
- The number of used KV cells
Raises:
- Llama::KvCache::Error if the operation fails
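The difference between #n_tokens and #used_cells only shows up once a cell belongs to more than one sequence. The sketch below models that with plain sets; all `toy_*` names are hypothetical, and it assumes that copying a sequence shares the existing cells instead of allocating new ones, which is how llama.cpp's `llama.h` describes the underlying operation.

```crystal
# Toy model only: one Set of sequence IDs per cell; an empty set is an unused cell.
def toy_n_tokens(cells : Array(Set(Int32))) : Int32
  cells.sum(&.size) # a cell shared by k sequences counts k times
end

def toy_used_cells(cells : Array(Set(Int32))) : Int32
  cells.count { |ids| !ids.empty? } # a cell counts once, however many sequences share it
end

# Assumption: a copy tags the same cells with the destination ID.
def toy_seq_cp(cells : Array(Set(Int32)), src : Int32, dst : Int32) : Nil
  cells.each { |ids| ids << dst if ids.includes?(src) }
end

# Keeps only `keep` in every cell, dropping all other sequence IDs.
def toy_seq_keep(cells : Array(Set(Int32)), keep : Int32) : Nil
  cells.each do |ids|
    had = ids.includes?(keep)
    ids.clear
    ids << keep if had
  end
end

cells = Array.new(6) { Set(Int32).new }
4.times { |i| cells[i] << 0 } # sequence 0 occupies cells 0..3

toy_seq_cp(cells, 0, 1)
puts toy_n_tokens(cells)   # 8
puts toy_used_cells(cells) # 4

toy_seq_keep(cells, 1)
puts toy_n_tokens(cells)   # 4
puts toy_used_cells(cells) # 4
```

In this model, a gap between the two counters indicates shared cells, which is the situation the #n_tokens description warns about.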