class Llama::Model
- Llama::Model
- Reference
- Object
Overview
Wrapper for the llama_model structure
Defined in:
llama/model.cr
llama/model/error.cr
Constructors
-
.new(path : String, n_gpu_layers : Int32 = 0, use_mmap : Bool = true, use_mlock : Bool = false, vocab_only : Bool = false)
Creates a new Model instance by loading a model from a file.
Instance Method Summary
-
#chat_template(name : String | Nil = nil) : String | Nil
Gets the default chat template for this model
-
#context(*args, **options) : Context
Creates a new Context for this model
-
#decoder_start_token : Int32
Returns the token that must be provided to the decoder to start generating output. For encoder-decoder models, returns the decoder start token; for other models, returns -1
-
#description : String
Gets a string describing the model type
-
#finalize
Frees the resources associated with this model
-
#has_decoder? : Bool
Returns whether the model contains a decoder
-
#has_encoder? : Bool
Returns whether the model contains an encoder
-
#metadata : Hash(String, String)
Gets all metadata as a hash
-
#metadata_count : Int32
Gets the number of metadata key/value pairs
-
#metadata_key_at(i : Int32) : String | Nil
Gets a metadata key name by index
-
#metadata_value(key : String) : String | Nil
Gets a metadata value as a string by key name
-
#metadata_value_at(i : Int32) : String | Nil
Gets a metadata value as a string by index
-
#model_size : UInt64
Returns the total size of all the tensors in the model in bytes
-
#n_embd : Int32
Returns the number of embedding dimensions in the model
-
#n_head : Int32
Returns the number of attention heads in the model
-
#n_layer : Int32
Returns the number of layers in the model
-
#n_params : UInt64
Returns the number of parameters in the model
-
#recurrent? : Bool
Returns whether the model is recurrent (like Mamba, RWKV, etc.)
-
#rope_freq_scale_train : Float32
Returns the model's RoPE frequency scaling factor
-
#to_unsafe : Pointer(Llama::LibLlama::LlamaModel)
Returns the raw pointer to the underlying llama_model structure
-
#vocab : Vocab
Returns the vocabulary associated with this model
Constructor Detail
.new(path : String, n_gpu_layers : Int32 = 0, use_mmap : Bool = true, use_mlock : Bool = false, vocab_only : Bool = false)
Creates a new Model instance by loading a model from a file.
Parameters:
- path: Path to the model file (.gguf format).
- n_gpu_layers: Number of layers to store in VRAM (default: 0). If 0, all layers run on the CPU.
- use_mmap: Use mmap if possible (default: true). Reduces memory usage.
- use_mlock: Force the system to keep the model in RAM (default: false). May improve performance but increases memory usage.
- vocab_only: Only load the vocabulary, no weights (default: false). Useful for inspecting the vocabulary.
Raises:
- Llama::Model::Error if the model cannot be loaded.
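A minimal usage sketch of the constructor. The `require "llama"` line and the model path are assumptions for illustration; substitute the path of an actual .gguf file:

```crystal
require "llama"

# Load a model from a GGUF file (placeholder path).
# Raises Llama::Model::Error if the file cannot be loaded.
model = Llama::Model.new("model.gguf", n_gpu_layers: 0, use_mmap: true)

puts model.description # e.g. a short string describing the model type
```

Passing `vocab_only: true` skips loading the weights, which is a cheap way to inspect a model's vocabulary or metadata.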
Instance Method Detail
#chat_template(name : String | Nil = nil) : String | Nil
Gets the default chat template for this model
Parameters:
- name: Optional template name (nil for default)
Returns:
- The chat template string, or nil if not available
#context(*args, **options) : Context
Creates a new Context for this model
This method delegates to Context.new, passing self as the model parameter and forwarding all other arguments.
Returns:
- A new Context instance
Raises:
- Llama::Context::Error if the context cannot be created
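A sketch of creating a context from a loaded model (the `model` variable is assumed to be a previously constructed `Llama::Model`; any keyword options are forwarded to `Context.new`, so consult that class's documentation for what is accepted):

```crystal
# `model` is a previously loaded Llama::Model.
# Arguments are forwarded to Llama::Context.new;
# raises Llama::Context::Error if creation fails.
context = model.context
```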
#decoder_start_token : Int32
Returns the token that must be provided to the decoder to start generating output. For encoder-decoder models, returns the decoder start token; for other models, returns -1.
#description : String
Gets a string describing the model type
Returns:
- A description of the model
#metadata : Hash(String, String)
Gets all metadata as a hash
Returns:
- A hash mapping metadata keys to values
#metadata_count : Int32
Gets the number of metadata key/value pairs
Returns:
- The number of metadata entries
#metadata_key_at(i : Int32) : String | Nil
Gets a metadata key name by index
Parameters:
- i: The index of the metadata entry
Returns:
- The key name, or nil if the index is out of bounds
#metadata_value(key : String) : String | Nil
Gets a metadata value as a string by key name
Parameters:
- key: The metadata key to look up
Returns:
- The metadata value as a string, or nil if not found
#metadata_value_at(i : Int32) : String | Nil
Gets a metadata value as a string by index
Parameters:
- i: The index of the metadata entry
Returns:
- The value as a string, or nil if the index is out of bounds
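The indexed accessors above compose into a simple enumeration loop; a sketch assuming a loaded `model`:

```crystal
# Walk metadata entries by index; both accessors return nil
# when the index is out of bounds, so guard before printing.
model.metadata_count.times do |i|
  key = model.metadata_key_at(i)
  value = model.metadata_value_at(i)
  puts "#{key} = #{value}" if key && value
end

# Or fetch everything at once as a Hash(String, String):
model.metadata.each { |k, v| puts "#{k}: #{v}" }
```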
#model_size : UInt64
Returns the total size of all the tensors in the model in bytes
Returns:
- The total size of all tensors in the model (in bytes)
#to_unsafe : Pointer(Llama::LibLlama::LlamaModel)
Returns the raw pointer to the underlying llama_model structure