class Llama::Model

Overview

Wrapper for the llama_model structure from llama.cpp

Defined in:

llama/model.cr
llama/model/error.cr

Constructor Detail

def self.new(path : String, n_gpu_layers : Int32 = 0, use_mmap : Bool = true, use_mlock : Bool = false, vocab_only : Bool = false) #

Creates a new Model instance by loading a model from a file.

Parameters:

  • path: Path to the model file (.gguf format).
  • n_gpu_layers: Number of layers to store in VRAM (default: 0). If 0, all layers are loaded to the CPU.
  • use_mmap: Use mmap if possible (default: true). Reduces memory usage.
  • use_mlock: Force the system to keep the model in RAM (default: false). May improve performance but increases memory usage.
  • vocab_only: Only load the vocabulary, no weights (default: false). Useful for inspecting the vocabulary.

Raises:

  • Llama::Model::Error if the model cannot be loaded.

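A minimal loading sketch (assuming the shard is required as "llama"; the
model path below is a placeholder for a local GGUF file):

    require "llama"

    model = Llama::Model.new(
      "models/llama-2-7b.Q4_K_M.gguf", # placeholder path
      n_gpu_layers: 32                 # keep 32 layers in VRAM; 0 stays on the CPU
    )
    puts model.description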

Instance Method Detail

def chat_template(name : String | Nil = nil) : String | Nil #

Gets a chat template for this model, the default one unless a name is given

Parameters:

  • name: Optional template name (nil for default)

Returns:

  • The chat template string, or nil if not available

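For example, checking for a built-in template before relying on it:

    if template = model.chat_template
      puts template
    else
      puts "model ships no built-in chat template"
    end
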
def context(*args, **options) : Context #

Creates a new Context for this model

This method delegates to Context.new, passing self as the model parameter and forwarding all other arguments.

Returns:

  • A new Context instance

Raises:

  • Llama::Context::Error if the context cannot be created

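A typical flow (the keyword argument in the comment is an assumption about
Context.new's signature, not taken from this page):

    ctx = model.context # equivalent to Llama::Context.new(model)
    # Extra arguments are forwarded to Context.new unchanged, e.g.:
    # ctx = model.context(n_ctx: 2048)
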
def decoder_start_token : Int32 #

Returns the token that must be provided to the decoder to start generating output. For encoder-decoder models, this is the decoder start token; for other models, -1 is returned.


def description : String #

Gets a string describing the model type

Returns:

  • A description of the model

def finalize #

Frees the resources associated with this model


def has_decoder? : Bool #

Returns whether the model contains a decoder


def has_encoder? : Bool #

Returns whether the model contains an encoder


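Together with #decoder_start_token, these predicates distinguish the common
architectures, e.g.:

    if model.has_encoder? && model.has_decoder?
      puts "encoder-decoder, decoder starts with token #{model.decoder_start_token}"
    elsif model.has_decoder?
      puts "decoder-only model"
    end
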
def metadata : Hash(String, String) #

Gets all metadata as a hash

Returns:

  • A hash mapping metadata keys to values

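For instance, dumping everything the GGUF file declares about itself:

    model.metadata.each do |key, value|
      puts "#{key} = #{value}"
    end
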
def metadata_count : Int32 #

Gets the number of metadata key/value pairs

Returns:

  • The number of metadata entries

def metadata_key_at(i : Int32) : String | Nil #

Gets a metadata key name by index

Parameters:

  • i: The index of the metadata entry

Returns:

  • The key name, or nil if the index is out of bounds

def metadata_value(key : String) : String | Nil #

Gets a metadata value as a string by key name

Parameters:

  • key: The metadata key to look up

Returns:

  • The metadata value as a string, or nil if not found

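Keys follow GGUF naming conventions; "general.name" below is one common key,
used here purely as an example lookup:

    if name = model.metadata_value("general.name")
      puts "model name: #{name}"
    end
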
def metadata_value_at(i : Int32) : String | Nil #

Gets a metadata value as a string by index

Parameters:

  • i: The index of the metadata entry

Returns:

  • The value as a string, or nil if the index is out of bounds

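#metadata_count, #metadata_key_at, and #metadata_value_at support index-based
iteration without building the full hash:

    model.metadata_count.times do |i|
      key   = model.metadata_key_at(i)
      value = model.metadata_value_at(i)
      puts "#{key}: #{value}" if key && value
    end
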
def model_size : UInt64 #

Returns the total size of all the tensors in the model in bytes

Returns:

  • The total size of all tensors in the model (in bytes)

def n_embd : Int32 #

Returns the number of embedding dimensions in the model


def n_head : Int32 #

Returns the number of attention heads in the model


def n_layer : Int32 #

Returns the number of layers in the model


def n_params : UInt64 #

Returns the number of parameters in the model


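The size and shape getters combine into a quick model summary, e.g.:

    puts "layers:      #{model.n_layer}"
    puts "heads:       #{model.n_head}"
    puts "embedding:   #{model.n_embd}"
    puts "parameters:  #{model.n_params}"
    puts "size (MiB):  #{model.model_size // (1024_u64 * 1024)}"
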
def recurrent? : Bool #

Returns whether the model is recurrent (like Mamba, RWKV, etc.)


def rope_freq_scale_train : Float32 #

Returns the RoPE frequency scaling factor the model was trained with


def to_unsafe : Pointer(Llama::LibLlama::LlamaModel) #

Returns the raw pointer to the underlying llama_model structure


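The pointer is what the LibLlama bindings expect. The call in the comment
below is hypothetical; verify the bound function name in LibLlama before use:

    raw = model.to_unsafe
    # n_params = Llama::LibLlama.llama_model_n_params(raw) # assumed binding
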
def vocab : Vocab #

Returns the vocabulary associated with this model


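For example:

    vocab = model.vocab # a Llama::Vocab tied to this model's token table
    # Token-level operations are documented on Llama::Vocab.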