Cadmium::Glove
Pure Crystal implementation of Global Vectors for Word Representations.
Note: this does not work correctly yet. Something is off with the math, and the model returns incorrect results.
Overview
GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
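To make "co-occurrence statistics" concrete, here is a minimal sketch of how such counts could be gathered and weighted. The helpers `cooccurrences` and `weighting` are hypothetical and are not part of this shard's API. The 1/distance pair weight and the weighting function follow the original GloVe paper. Mapping the model's `max_count` and `alpha` options onto that function is an assumption made for illustration.

```crystal
# Sketch only: these helpers are hypothetical and not part of cadmium_glove.

# Count how often two words appear within `window` tokens of each other.
# A pair that is d tokens apart contributes 1/d, as in the GloVe paper.
def cooccurrences(tokens : Array(String), window : Int32 = 2) : Hash({String, String}, Float64)
  counts = Hash({String, String}, Float64).new(0.0)
  tokens.each_with_index do |word, i|
    last = Math.min(i + window, tokens.size - 1)
    ((i + 1)..last).each do |j|
      contribution = 1.0 / (j - i)
      # Record the pair in both directions so the table is symmetric.
      counts[{word, tokens[j]}] += contribution
      counts[{tokens[j], word}] += contribution
    end
  end
  counts
end

# GloVe weighting function: damps rare pairs and caps frequent ones at 1.0.
# Assumption: max_count plays the role of x_max and alpha is the exponent.
def weighting(count : Float64, max_count : Float64 = 100.0, alpha : Float64 = 0.75) : Float64
  count < max_count ? (count / max_count) ** alpha : 1.0
end

tokens = "the quantum theory of the quantum field".split
counts = cooccurrences(tokens)

puts counts[{"the", "quantum"}] # total weight for the pair ("the", "quantum")
puts weighting(25.0)            # weight applied to a pair seen 25 times
puts weighting(250.0)           # counts at or above max_count are capped
```

Training then fits word vectors so that their dot products approximate the logarithm of these counts, with each pair's error scaled by the weighting function.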
Resources
Implementations in other languages
Installation
- Add the dependency to your shard.yml:

    dependencies:
      cadmium_glove:
        github: cadmiumcr/glove

- Run shards install
Usage
require "cadmium"
require "cadmium_glove"
include Cadmium
# Create a new model. Values used here are the defaults.
model = Glove::Model.new(
  max_count: 100,
  learning_rate: 0.05,
  alpha: 0.75,
  num_components: 30,
  epochs: 5
)
# Feed the model some text
text = File.read("quantum-physics.txt")
model.fit(text)
# Alternatively you can pass the model a Corpus object
corpus = Glove::Corpus.build(text)
model.fit(corpus)
# Train the model
model.train
# Save the model as JSON
model.save("./data")
To import and use a model:
# Load the previously saved model from the data directory
model = Glove::Model.load("./data")
# Get the most similar words
puts model.most_similar("quantum")
# => [["physics", 0.9974459436353388], ["mechanics", 0.9971606266531394], ["theory", 0.9965966776283189]]
# Find words that are related to "atom" in the way "quantum" is related to "physics"
puts model.analogy_words("atom", "quantum", "physics")
# => [["electron", 0.9858380292886947], ["energie", 0.9815122410243475], ["photon", 0.9665073849076669]]
Performance
TODO Benchmarks
Contributing
- Fork it (https://github.com/cadmiumcr/glove/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
Contributors
- Chris Watson - creator and maintainer