BlingFire for Crystal

This is a Crystal port of blingfire-ruby. It aims to bring the power of BlingFire tokenizers to Crystalists, and lets you run GPT-2 tokenization compatible with ChatGPT.

Installation

git clone https://github.com/kojix2/blingfire-crystal
cd blingfire-crystal
crystal run downloader.cr
crystal spec

downloader.cr downloads precompiled libraries from ankane/ml-builds, along with some models from the official BlingFire repository.

Example

See gpt2.cr in the example directory.

require "../src/blingfire"

# Load the model
model = BlingFire::Model.new("gpt2.bin")

# Text to tokenize
text = "Intelligence is an accident of evolution, and not necessarily an advantage."

# Tokenize the text
tokens = model.text_to_ids(text)

# Print the token IDs
puts tokens

# Load the inverse model, which maps token IDs back to text
model = BlingFire::Model.new("gpt2.i2w")

# Convert the token IDs back to text
text = model.ids_to_text(tokens)

# Print the decoded text
puts text

Documentation

Development

This port was written quickly, based on ankane/blingfire-ruby. It passes basic tests, but undiscovered bugs may remain. Please use it with care and report any issues you find. Pull requests and forks are much appreciated.

License

This project is licensed under the MIT License. Please see the LICENSE file for more information.