class Llama::KvCache

Llama::KvCache < Reference < Object

Overview

Wrapper for the llama_kv_cache structure. Provides methods for managing the KV (key-value) cache in LLaMA models.

Defined in:

llama/kv_cache.cr
llama/kv_cache/error.cr

Constructors

.new(handle : Pointer(LibLlama::LlamaKvCache), ctx : Context)
Creates a new KvCache instance from a raw pointer

Instance Method Summary

#can_shift? : Bool
Checks if the context supports KV cache shifting
#clear : self
Clears the KV cache. This removes all tokens from the cache and resets its state.
#defrag : self
Defragments the KV cache. The defragmentation is applied lazily on the next decode, or explicitly with #update.
#finalize
Frees the resources associated with this KV cache
#n_tokens : Int32
Returns the number of tokens in the KV cache. A KV cell with multiple sequences assigned to it is counted once per sequence.
#seq_add(seq_id : Int32, p0 : Int32, p1 : Int32, delta : Int32) : self
Adds a relative position delta to tokens in a sequence
#seq_cp(seq_id_src : Int32, seq_id_dst : Int32, p0 : Int32, p1 : Int32) : self
Copies tokens from one sequence to another in the KV cache
#seq_div(seq_id : Int32, p0 : Int32, p1 : Int32, d : Int32) : self
Divides the positions of tokens in a sequence by a factor
#seq_keep(seq_id : Int32) : self
Keeps only the specified sequence in the KV cache, removing all others
#seq_pos_max(seq_id : Int32) : Int32
Returns the maximum position in a sequence
#seq_rm(seq_id : Int32, p0 : Int32, p1 : Int32) : Bool
Removes tokens from a sequence in the KV cache
#to_unsafe : Pointer(Llama::LibLlama::LlamaKvCache)
Returns the raw pointer to the underlying llama_kv_cache structure
#update : self
Applies pending KV cache updates. This includes K-shifts, defragmentation, etc.
#used_cells : Int32
Returns the number of used KV cells. A cell is considered used if it has at least one sequence assigned to it.

Constructor Detail

def self.new(handle : Pointer(LibLlama::LlamaKvCache), ctx : Context)

Creates a new KvCache instance from a raw pointer

Note: This constructor is intended for internal use. Users should obtain KvCache instances through Context#kv_cache.

To avoid circular references, the instance stores the context pointer rather than the Context object.

Raises:

Llama::KvCache::Error if the handle is null

Instance Method Detail

def can_shift? : Bool

Checks if the context supports KV cache shifting

Returns:

true if the context supports KV cache shifting, false otherwise

Raises:

Llama::KvCache::Error if the operation fails

def clear : self

Clears the KV cache. This removes all tokens from the cache and resets its state.

Returns:

self for method chaining

Raises:

Llama::KvCache::Error if the operation fails

def defrag : self

Defragments the KV cache. The defragmentation is applied lazily on the next decode, or explicitly with #update.

Returns:

self for method chaining

Raises:

Llama::KvCache::Error if the operation fails

def finalize

Frees the resources associated with this KV cache

def n_tokens : Int32

Returns the number of tokens in the KV cache. A KV cell with multiple sequences assigned to it is counted once per sequence.

Returns:

The number of tokens in the KV cache

Raises:

Llama::KvCache::Error if the operation fails
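The difference between this count and #used_cells can be illustrated with a toy model. The sketch below does not use the library: ToyCell, toy_n_tokens, and toy_used_cells are hypothetical names, and the cell layout is a simplification chosen only to show the two counting rules.

```crystal
# Toy model of KV cells, for illustration only (not the library's implementation).
# Each cell holds one position and the set of sequence IDs that share it.
record ToyCell, pos : Int32, seq_ids : Set(Int32)

# Counts each cell once per sequence assigned to it (the n_tokens rule).
def toy_n_tokens(cells : Array(ToyCell)) : Int32
  cells.sum { |cell| cell.seq_ids.size }
end

# Counts each cell that has at least one sequence exactly once (the used_cells rule).
def toy_used_cells(cells : Array(ToyCell)) : Int32
  cells.count { |cell| !cell.seq_ids.empty? }
end

cells = [
  ToyCell.new(0, Set{0, 1}),      # shared by sequences 0 and 1
  ToyCell.new(1, Set{0, 1}),      # shared by sequences 0 and 1
  ToyCell.new(2, Set{0}),         # only sequence 0
  ToyCell.new(3, Set(Int32).new), # free cell
]

puts toy_n_tokens(cells)   # 2 + 2 + 1 + 0 = 5
puts toy_used_cells(cells) # 3
```

A gap between the two numbers suggests that sequences share cells, for example after a prompt prefix has been copied with #seq_cp.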

def seq_add(seq_id : Int32, p0 : Int32, p1 : Int32, delta : Int32) : self

Adds a relative position delta to tokens in a sequence

Parameters:

seq_id: The sequence ID to modify
p0: Start position (p0 < 0 means start from 0)
p1: End position (p1 < 0 means end at infinity)
delta: The position delta to add

Returns:

self for method chaining

Raises:

Llama::KvCache::Error if the operation fails
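A common use of a position delta is context shifting: older tokens are removed and the remaining ones are moved back to fill the gap. The following is a minimal sketch of that pattern, not the library's own routine. shift_context, n_keep, and n_past are hypothetical names, and the code assumes the range is half-open, [p0, p1), as in the underlying llama.cpp API. The kv argument is left untyped here; in practice it would be the Llama::KvCache obtained through Context#kv_cache.

```crystal
# Hypothetical helper: discards the older half of the tokens that follow the
# first `n_keep` positions of sequence `seq_id`, then shifts the rest back.
# `kv` is expected to be a Llama::KvCache.
# Returns the new number of past positions (unchanged if nothing was shifted).
def shift_context(kv, seq_id : Int32, n_keep : Int32, n_past : Int32) : Int32
  raise ArgumentError.new("n_keep must be in 0...n_past") unless 0 <= n_keep < n_past
  return n_past unless kv.can_shift?

  n_discard = (n_past - n_keep) // 2
  return n_past if n_discard == 0

  # Drop positions [n_keep, n_keep + n_discard) (assumed half-open range) ...
  return n_past unless kv.seq_rm(seq_id, n_keep, n_keep + n_discard)
  # ... then move the remaining tail down by n_discard positions.
  kv.seq_add(seq_id, n_keep + n_discard, n_past, -n_discard)

  n_past - n_discard
end
```

With n_keep = 4 and n_past = 20, this removes positions 4 through 11 and shifts positions 12 through 19 down by 8, so decoding would continue at position 12.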

def seq_cp(seq_id_src : Int32, seq_id_dst : Int32, p0 : Int32, p1 : Int32) : self

Copies tokens from one sequence to another in the KV cache

Parameters:

seq_id_src: Source sequence ID
seq_id_dst: Destination sequence ID
p0: Start position (p0 < 0 means start from 0)
p1: End position (p1 < 0 means end at infinity)

Returns:

self for method chaining

Raises:

Llama::KvCache::Error if the operation fails
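One way this might be used is to share an already evaluated prompt between several sequences. The sketch below is an illustration only: fork_sequence and the numbering of the destination IDs are hypothetical, and kv is left untyped but is expected to be a Llama::KvCache.

```crystal
# Hypothetical helper: shares the cached tokens of sequence `src` with
# `n_branches` new sequences numbered src + 1, src + 2, ...
# `kv` is expected to be a Llama::KvCache.
# Returns the destination sequence IDs.
def fork_sequence(kv, src : Int32, n_branches : Int32) : Array(Int32)
  (1..n_branches).map do |i|
    dst = src + i
    # Negative bounds select the whole sequence (p0 < 0: from 0, p1 < 0: to infinity).
    kv.seq_cp(src, dst, -1, -1)
    dst
  end
end
```

Calling fork_sequence(kv, 0, 3) would issue three copies from sequence 0 and return [1, 2, 3].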

def seq_div(seq_id : Int32, p0 : Int32, p1 : Int32, d : Int32) : self

Divides the positions of tokens in a sequence by a factor

Parameters:

seq_id: The sequence ID to modify
p0: Start position (p0 < 0 means start from 0)
p1: End position (p1 < 0 means end at infinity)
d: The divisor (must be > 1)

Returns:

self for method chaining

Raises:

ArgumentError if the divisor is not greater than 1
Llama::KvCache::Error if the operation fails
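The effect on positions can be sketched with a toy model that does not touch the library. toy_seq_div is a hypothetical name; the sketch assumes integer division and a half-open range [p0, p1), as in the underlying llama.cpp API, and mirrors the documented check on the divisor.

```crystal
# Toy model for illustration only: the positions of one sequence as a plain array.
# Assumes a half-open range [p0, p1) and integer division.
def toy_seq_div(positions : Array(Int32), p0 : Int32, p1 : Int32, d : Int32) : Array(Int32)
  raise ArgumentError.new("divisor must be greater than 1") unless d > 1

  lo = p0 < 0 ? 0 : p0          # p0 < 0 means start from 0
  hi = p1 < 0 ? Int32::MAX : p1 # p1 < 0 means no upper bound
  positions.map { |pos| (lo <= pos < hi) ? pos // d : pos }
end

p toy_seq_div([0, 1, 2, 3, 4, 5, 6, 7], 4, -1, 2) # => [0, 1, 2, 3, 2, 2, 3, 3]
```

Positions below p0 are left alone, while positions 4 through 7 are compressed to 2, 2, 3, 3.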

def seq_keep(seq_id : Int32) : self

Keeps only the specified sequence in the KV cache, removing all others

Parameters:

seq_id: The sequence ID to keep

Returns:

self for method chaining

Raises:

Llama::KvCache::Error if the operation fails

def seq_pos_max(seq_id : Int32) : Int32

Returns the maximum position in a sequence

Parameters:

seq_id: The sequence ID to query

Returns:

The maximum position in the sequence

Raises:

Llama::KvCache::Error if the operation fails

def seq_rm(seq_id : Int32, p0 : Int32, p1 : Int32) : Bool

Removes tokens from a sequence in the KV cache

Parameters:

seq_id: The sequence ID to remove tokens from (seq_id < 0 matches any sequence)
p0: Start position (p0 < 0 means start from 0)
p1: End position (p1 < 0 means end at infinity)

Returns:

true if successful, false if a partial sequence cannot be removed (removing a whole sequence never fails)

Raises:

Llama::KvCache::Error if the operation fails
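Because this method returns a Bool instead of self, callers may want to check the result before relying on a partial removal. The following is a minimal sketch under that assumption; rollback and keep_until are hypothetical names, and kv is left untyped but is expected to be a Llama::KvCache.

```crystal
# Hypothetical helper: rolls sequence `seq_id` back so that only positions
# below `keep_until` remain. Falls back to dropping the whole sequence when
# the partial removal is refused. `kv` is expected to be a Llama::KvCache.
# Returns the number of positions that remain cached for the sequence.
def rollback(kv, seq_id : Int32, keep_until : Int32) : Int32
  return keep_until if kv.seq_rm(seq_id, keep_until, -1)

  # Removing a whole sequence never fails, so the result is not checked here.
  kv.seq_rm(seq_id, -1, -1)
  0
end
```

A return value of 0 tells the caller that the sequence has to be decoded again from its first token.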

def to_unsafe : Pointer(Llama::LibLlama::LlamaKvCache)

Returns the raw pointer to the underlying llama_kv_cache structure

def update : self

Applies pending KV cache updates. This includes K-shifts, defragmentation, etc.

Returns:

self for method chaining

Raises:

Llama::KvCache::Error if the operation fails

def used_cells : Int32

Returns the number of used KV cells. A cell is considered used if it has at least one sequence assigned to it.

Returns:

The number of used KV cells

Raises:

Llama::KvCache::Error if the operation fails
