class Llama::KvCache

Overview

Wrapper for the llama_kv_cache structure. Provides methods for managing the KV (key-value) cache in LLaMA models.

Defined in:

llama/kv_cache.cr
llama/kv_cache/error.cr

Constructor Detail

def self.new(handle : Pointer(LibLlama::LlamaKvCache), ctx : Context)

Creates a new KvCache instance from a raw pointer.

Note: This constructor is intended for internal use. Users should obtain KvCache instances through Context#kv_cache.

To avoid circular references, the instance stores the context pointer rather than the Context object.

Raises:

  • Llama::KvCache::Error if the handle is null


Instance Method Detail

def can_shift? : Bool

Checks whether the context supports KV cache shifting.

Returns:

  • true if the context supports KV cache shifting, false otherwise

Raises:

  • Llama::KvCache::Error if the operation fails

def clear

Clears the KV cache. This removes all tokens from the cache and resets its state.

Raises:

  • Llama::KvCache::Error if the operation fails

def defrag

Defragments the KV cache. Defragmentation is applied lazily on the next decode, or explicitly by calling update.

Raises:

  • Llama::KvCache::Error if the operation fails
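
As a rough illustration of the explicit path, the sketch below removes a whole sequence, requests defragmentation, and applies it immediately with update instead of waiting for the next decode. RecordingCache and drop_and_compact are hypothetical names introduced here; the stand-in only records calls so the snippet runs without a model, and in real code `cache` would be the KvCache obtained from Context#kv_cache.

```crystal
# Hypothetical stand-in that records calls; not part of the library.
class RecordingCache
  getter calls = [] of String

  def seq_rm(seq_id : Int32, p0 : Int32, p1 : Int32) : Bool
    @calls << "seq_rm(#{seq_id}, #{p0}, #{p1})"
    true
  end

  def defrag : Nil
    @calls << "defrag"
  end

  def update : Nil
    @calls << "update"
  end
end

# Hypothetical helper: drops one sequence, then compacts the cache right away.
def drop_and_compact(cache, seq_id : Int32) : Nil
  cache.seq_rm(seq_id, -1, -1) # p0 < 0 and p1 < 0 select the whole sequence
  cache.defrag                 # schedules defragmentation
  cache.update                 # applies it now rather than on the next decode
end

cache = RecordingCache.new
drop_and_compact(cache, 1)
cache.calls.each { |call| puts call }
```

Skipping the update call would leave the defragmentation pending until the next decode, as described above.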

def finalize

Frees the resources associated with this KV cache.

def n_tokens : Int32

Returns the number of tokens in the KV cache. If a KV cell has multiple sequences assigned to it, it is counted once per sequence.

Returns:

  • The number of tokens in the KV cache

Raises:

  • Llama::KvCache::Error if the operation fails

def seq_add(seq_id : Int32, p0 : Int32, p1 : Int32, delta : Int32)

Adds a relative position delta to the tokens of a sequence whose positions lie in the range [p0, p1).

Parameters:

  • seq_id: The sequence ID to modify
  • p0: Start position (p0 < 0 means start from 0)
  • p1: End position (p1 < 0 means end at infinity)
  • delta: The position delta to add

Raises:

  • Llama::KvCache::Error if the operation fails
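
A common use of seq_add in llama.cpp-based programs is context shifting: discard a block of old tokens with seq_rm, then move the remaining tokens back so positions stay contiguous. The sketch below shows that pattern under stated assumptions. RecordingCache and shift_context are hypothetical names, the stand-in only records calls so the snippet runs without a model, and the position ranges are assumed to be half-open.

```crystal
# Hypothetical stand-in that records calls; not part of the library.
# In real code `cache` would be the KvCache returned by Context#kv_cache.
class RecordingCache
  getter calls = [] of String

  def can_shift? : Bool
    true
  end

  def seq_rm(seq_id : Int32, p0 : Int32, p1 : Int32) : Bool
    @calls << "seq_rm(#{seq_id}, #{p0}, #{p1})"
    true
  end

  def seq_add(seq_id : Int32, p0 : Int32, p1 : Int32, delta : Int32) : Nil
    @calls << "seq_add(#{seq_id}, #{p0}, #{p1}, #{delta})"
  end
end

# Hypothetical helper: keeps the first n_keep tokens, discards the next
# n_discard tokens, and shifts the rest back. Returns the new token count.
def shift_context(cache, seq_id : Int32, n_keep : Int32, n_discard : Int32, n_past : Int32) : Int32
  return n_past unless cache.can_shift?

  # Drop positions n_keep up to (not including) n_keep + n_discard.
  return n_past unless cache.seq_rm(seq_id, n_keep, n_keep + n_discard)

  # Move the surviving tail down by n_discard positions.
  cache.seq_add(seq_id, n_keep + n_discard, n_past, -n_discard)
  n_past - n_discard
end

cache = RecordingCache.new
new_n_past = shift_context(cache, 0, 4, 8, 32)
puts new_n_past
cache.calls.each { |call| puts call }
```

The helper returns the original count unchanged when shifting is unsupported or the partial removal is refused, so the caller can fall back to clearing the cache.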

def seq_cp(seq_id_src : Int32, seq_id_dst : Int32, p0 : Int32, p1 : Int32)

Copies the tokens with positions in [p0, p1) from one sequence to another in the KV cache.

Parameters:

  • seq_id_src: Source sequence ID
  • seq_id_dst: Destination sequence ID
  • p0: Start position (p0 < 0 means start from 0)
  • p1: End position (p1 < 0 means end at infinity)

Raises:

  • Llama::KvCache::Error if the operation fails
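
One plausible use of seq_cp is to evaluate a prompt once in sequence 0 and then share it with several parallel branches, later keeping only the preferred branch with seq_keep. The sketch below illustrates the call pattern only. RecordingCache and fork_prompt are hypothetical names, and the stand-in records calls so the snippet runs without a model.

```crystal
# Hypothetical stand-in that records calls; not part of the library.
class RecordingCache
  getter calls = [] of String

  def seq_cp(seq_id_src : Int32, seq_id_dst : Int32, p0 : Int32, p1 : Int32) : Nil
    @calls << "seq_cp(#{seq_id_src}, #{seq_id_dst}, #{p0}, #{p1})"
  end

  def seq_keep(seq_id : Int32) : Nil
    @calls << "seq_keep(#{seq_id})"
  end
end

# Hypothetical helper: copies all of sequence 0 into sequences 1...n_branches.
def fork_prompt(cache, n_branches : Int32) : Nil
  (1...n_branches).each do |dst|
    cache.seq_cp(0, dst, -1, -1) # negative p0 and p1 select the full range
  end
end

cache = RecordingCache.new
fork_prompt(cache, 3)
cache.seq_keep(2) # keep branch 2, drop every other sequence
puts cache.calls.join("\n")
```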

def seq_div(seq_id : Int32, p0 : Int32, p1 : Int32, d : Int32)

Divides the positions of the tokens in [p0, p1) of a sequence by the factor d, using integer division.

Parameters:

  • seq_id: The sequence ID to modify
  • p0: Start position (p0 < 0 means start from 0)
  • p1: End position (p1 < 0 means end at infinity)
  • d: The divisor (must be > 1)

Raises:

  • ArgumentError if the divisor is not greater than 1
  • Llama::KvCache::Error if the operation fails
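
To make the position arithmetic concrete, the sketch below models it on a plain array of positions. It is an illustration, not the library's implementation: divide_positions is a hypothetical helper, and it assumes integer division and a half-open range, with negative bounds handled as described in the parameter list.

```crystal
# Hypothetical model of the position arithmetic behind a call such as
# cache.seq_div(seq_id, p0, p1, d); it does not touch a real cache.
def divide_positions(positions : Array(Int32), p0 : Int32, p1 : Int32, d : Int32) : Array(Int32)
  raise ArgumentError.new("divisor must be greater than 1") unless d > 1

  lo = p0 < 0 ? 0 : p0          # p0 < 0 means start from 0
  hi = p1 < 0 ? Int32::MAX : p1 # p1 < 0 means no upper bound

  positions.map do |pos|
    (pos >= lo && pos < hi) ? pos // d : pos
  end
end

halved = divide_positions([0, 1, 2, 3, 4, 5, 6, 7], 4, -1, 2)
puts halved # prints [0, 1, 2, 3, 2, 2, 3, 3]
```

Positions below p0 are left untouched, while positions 4 through 7 collapse to 2, 2, 3, 3.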

def seq_keep(seq_id : Int32)

Keeps only the specified sequence in the KV cache, removing all others.

Parameters:

  • seq_id: The sequence ID to keep

Raises:

  • Llama::KvCache::Error if the operation fails

def seq_pos_max(seq_id : Int32) : Int32

Returns the maximum position in a sequence.

Parameters:

  • seq_id: The sequence ID to query

Returns:

  • The maximum position in the sequence

Raises:

  • Llama::KvCache::Error if the operation fails

def seq_rm(seq_id : Int32, p0 : Int32, p1 : Int32) : Bool

Removes the tokens with positions in [p0, p1) from a sequence in the KV cache.

Parameters:

  • seq_id: The sequence ID to remove tokens from (seq_id < 0 matches any sequence)
  • p0: Start position (p0 < 0 means start from 0)
  • p1: End position (p1 < 0 means end at infinity)

Returns:

  • true if successful, false if a partial sequence cannot be removed (removing a whole sequence never fails)

Raises:

  • Llama::KvCache::Error if the operation fails
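
Because a partial removal can return false while a whole-sequence removal never fails, callers may want a fallback. The sketch below shows one possible approach: try to truncate a sequence to its first `keep` tokens and, if that is refused, drop the whole sequence. StubCache and rollback are hypothetical names, and the stub merely simulates a cache that may or may not allow partial removal.

```crystal
# Hypothetical stub; not part of the library.
class StubCache
  def initialize(@partial_ok : Bool)
  end

  def seq_rm(seq_id : Int32, p0 : Int32, p1 : Int32) : Bool
    whole_sequence = p0 < 0 && p1 < 0
    whole_sequence || @partial_ok
  end
end

# Hypothetical helper: returns how many leading tokens of the sequence
# remain cached, which is where decoding would have to resume.
def rollback(cache, seq_id : Int32, keep : Int32) : Int32
  if cache.seq_rm(seq_id, keep, -1)
    keep
  else
    cache.seq_rm(seq_id, -1, -1) # removing a whole sequence never fails
    0
  end
end

kept_partial = rollback(StubCache.new(true), 0, 10)
kept_fallback = rollback(StubCache.new(false), 0, 10)
puts kept_partial
puts kept_fallback
```

A return value of 0 tells the caller that the prompt for that sequence has to be evaluated again from the start.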

def to_unsafe : Pointer(Llama::LibLlama::LlamaKvCache)

Returns the raw pointer to the underlying llama_kv_cache structure.

def update

Applies pending KV cache updates, such as K-shifts and defragmentation.

Raises:

  • Llama::KvCache::Error if the operation fails

def used_cells : Int32

Returns the number of used KV cells. A cell is considered used if it has at least one sequence assigned to it.

Returns:

  • The number of used KV cells

Raises:

  • Llama::KvCache::Error if the operation fails
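
The difference between used_cells and n_tokens can be illustrated with a toy model in which each cell holds the set of sequence IDs assigned to it. This is only a sketch of the counting rules described in this page, not the library's data structure, and the variable names are hypothetical.

```crystal
# Toy model: four cells, one of them empty, one shared by sequences 0 and 1.
cells = [Set{0}, Set{0, 1}, Set(Int32).new, Set{1}]

# A cell is used if at least one sequence is assigned to it.
used_cells = cells.count { |seq_ids| !seq_ids.empty? }

# A cell shared by several sequences is counted once per sequence.
n_tokens = cells.sum(&.size)

puts "used_cells=#{used_cells} n_tokens=#{n_tokens}" # prints used_cells=3 n_tokens=4
```

In this model n_tokens exceeds used_cells exactly when some cell is shared by more than one sequence.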
