class Cadmium::BayesClassifier

Cadmium::BayesClassifier
Reference
Object

Overview

This is a naive Bayes classifier that uses Laplace smoothing. It can be trained to categorize sentences based on the words they contain.

Example:

classifier = Cadmium.bayes_classifier.new

# Train some angry examples
classifier.train("omg I can't believe you would do that to me", "angry")
classifier.train("I hate you so much!", "angry")
classifier.train("Just go. I don't need this.", "angry")
classifier.train("You're so full of shit!", "angry")

# Some happy ones
classifier.train("omg you're the best!", "happy")
classifier.train("I can't believe how happy you make me", "happy")
classifier.train("I love you so damn much!", "happy")
classifier.train("You're the best!", "happy")

# And some indifferent ones
classifier.train("Idk, what do you think?", "indifferent")
classifier.train("yeah that's ok", "indifferent")
classifier.train("cool", "indifferent")
classifier.train("I guess we could do that", "indifferent")

# Now let's test it on a sentence
classifier.categorize("You shit head!")
# => "angry"

classifier.categorize("You're the best :)")
# => "happy"

classifier.categorize("idk, my bff jill?")
# => "indifferent"

Included Modules

JSON::Serializable
YAML::Serializable
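
Because the class includes JSON::Serializable, a trained classifier should be able to round-trip through `to_json` and `from_json` like any other serializable type. The sketch below shows that pattern with a hypothetical stand-in struct (`ModelSnapshot` is not part of Cadmium) that mirrors two of the documented getters. Which fields BayesClassifier actually serializes is not stated on this page, so treat the field list as an assumption.

```crystal
require "json"

# Hypothetical stand-in, NOT Cadmium's class: it only mirrors two of the
# documented getters to show the JSON::Serializable round trip.
struct ModelSnapshot
  include JSON::Serializable

  getter doc_count : Hash(String, Int32)
  getter total_documents : Int32

  def initialize(@doc_count, @total_documents)
  end
end

snapshot = ModelSnapshot.new({"angry" => 4, "happy" => 4}, 8)

# Instance method provided by JSON::Serializable.
json = snapshot.to_json

# Class method provided by JSON::Serializable; this is the counterpart of
# the .new(pull : JSON::PullParser) constructor.
restored = ModelSnapshot.from_json(json)
```

The same shape applies to YAML::Serializable with `to_yaml` and `from_yaml` after `require "yaml"`.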

Defined in:

cadmium/classifier/bayes.cr

Constant Summary

DEFAULT_TOKENIZER = Cadmium::WordTokenizer.new

Constructors

.new(ctx : YAML::ParseContext, node : YAML::Nodes::Node)
.new(pull : JSON::PullParser)
.new(tokenizer = nil)

Instance Method Summary

#categories : Array(String)
Category names
#categorize(text)
Determines what category the text belongs to.
#doc_count : Hash(String, Int32)
Document frequency table for each of our categories.
#frequency_table(tokens)
Builds a frequency hash map whose keys are the entries in tokens and whose values are the frequency of each entry in tokens.
#initialize_category(name)
Initializes each of our data structure entities for this new category and returns self.
#token_probability(token, category)
Calculate the probability that a token belongs to a category.
#tokenizer : Cadmium::Tokenizer
#tokenizer=(tokenizer : Cadmium::Tokenizer)
#total_documents : Int32
Number of documents we have learned from.
#train(text, category)
Train our naive Bayes classifier by telling it which category the training text belongs to.
#vocabulary : Array(String)
The words to learn from.
#word_count : Hash(String, Int32)
For each category, how many total words were mapped to it.
#word_frequency_count : Hash(String, Hash(String, Int32))
Word frequency table for each category.

Constructor Detail

def self.new(ctx : YAML::ParseContext, node : YAML::Nodes::Node)

def self.new(pull : JSON::PullParser)

def self.new(tokenizer = nil)

Instance Method Detail

def categories : Array(String)

Category names

def categorize(text)

Determines what category the text belongs to.
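
As a rough illustration of how a naive Bayes classifier typically picks a category, the sketch below scores each category as the log prior plus the sum of the log of the Laplace-smoothed token probabilities, and returns the highest-scoring one. `TinyModel` and `categorize_sketch` are hypothetical names; the fields borrow the names of the documented getters, but this is not Cadmium's source and it takes pre-tokenized input.

```crystal
# Hypothetical, simplified model. Field names mirror the documented getters,
# but this is an illustrative sketch and not Cadmium's implementation.
record TinyModel,
  doc_count : Hash(String, Int32),
  word_count : Hash(String, Int32),
  word_frequency_count : Hash(String, Hash(String, Int32)),
  vocabulary_size : Int32

def categorize_sketch(model : TinyModel, tokens : Array(String)) : String?
  total_documents = model.doc_count.values.sum
  best_category = nil
  best_score = -Float64::INFINITY

  model.doc_count.each do |category, docs|
    # Log prior: share of training documents that belong to this category.
    score = Math.log(docs.to_f / total_documents)

    tokens.each do |token|
      count = model.word_frequency_count[category].fetch(token, 0)
      # Laplace smoothing keeps unseen tokens from producing log(0).
      probability = (count + 1).to_f / (model.word_count[category] + model.vocabulary_size)
      score += Math.log(probability)
    end

    if score > best_score
      best_score = score
      best_category = category
    end
  end

  best_category
end

model = TinyModel.new(
  doc_count: {"angry" => 1, "happy" => 1},
  word_count: {"angry" => 3, "happy" => 2},
  word_frequency_count: {
    "angry" => {"hate" => 2, "you" => 1},
    "happy" => {"love" => 1, "you" => 1},
  },
  vocabulary_size: 3
)

categorize_sketch(model, ["hate", "you"]) # => "angry"
categorize_sketch(model, ["love"])        # => "happy"
```

Summing logarithms instead of multiplying raw probabilities avoids floating-point underflow on longer inputs.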

def doc_count : Hash(String, Int32)

Document frequency table for each of our categories.

def frequency_table(tokens)

Build a frequency hash map where

- the keys are the entries in tokens
- the values are the frequency of each entry in tokens
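
A minimal sketch of the mapping described above, assuming the tokens arrive as an array of strings. `frequency_table_sketch` is a hypothetical name used for illustration; it is not the library's method.

```crystal
# Illustrative only: counts how often each token occurs.
def frequency_table_sketch(tokens : Array(String)) : Hash(String, Int32)
  # A default value of 0 lets us increment keys we have not seen yet.
  table = Hash(String, Int32).new(0)
  tokens.each { |token| table[token] += 1 }
  table
end

frequency_table_sketch(%w[you are the best you])
# => {"you" => 2, "are" => 1, "the" => 1, "best" => 1}
```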

def initialize_category(name)

Initializes each of our data structure entities for this new category and returns self.

def token_probability(token, category)

Calculate the probability that a token belongs to a category.
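
The overview says the classifier uses Laplace smoothing. In its usual add-one form, the smoothed probability of a token given a category is (occurrences of the token in the category + 1) / (total words in the category + vocabulary size). The sketch below is a standalone version of that textbook formula; `laplace_probability` and its parameters are hypothetical, and the exact expression Cadmium uses is not shown on this page.

```crystal
# Add-one (Laplace) smoothing as a standalone function. This is an
# illustration of the textbook formula, not Cadmium's source.
def laplace_probability(token_count : Int32, category_word_count : Int32, vocabulary_size : Int32) : Float64
  (token_count + 1).to_f / (category_word_count + vocabulary_size)
end

# A token never seen in a category still gets a small, non-zero probability.
unseen = laplace_probability(0, 10, 5)

# A token seen 4 times among the same 10 words scores higher.
seen = laplace_probability(4, 10, 5)
```

Here `unseen` is 1/15 and `seen` is 5/15, so an unseen token lowers a category's score without eliminating the category outright.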

def tokenizer : Cadmium::Tokenizer

def tokenizer=(tokenizer : Cadmium::Tokenizer)

def total_documents : Int32

Number of documents we have learned from.

def train(text, category)

Train our naive Bayes classifier by telling it which category the training text belongs to.
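
To show how the documented counters relate to one another, here is a hedged sketch of the bookkeeping a training step usually performs. `CounterSketch` is a hypothetical stand-in whose getters borrow the names listed on this page. It accepts tokens that are already split, whereas the real method takes raw text and runs it through the tokenizer, and the real update logic may differ.

```crystal
# Hypothetical stand-in for the classifier's counters. Not Cadmium's source.
class CounterSketch
  getter doc_count = Hash(String, Int32).new(0)
  getter word_count = Hash(String, Int32).new(0)
  getter word_frequency_count = Hash(String, Hash(String, Int32)).new
  getter vocabulary = [] of String
  getter total_documents = 0

  def train(tokens : Array(String), category : String) : self
    # One more document overall, and one more for this category.
    @doc_count[category] += 1
    @total_documents += 1

    # Create the per-category frequency table on first use.
    frequencies = (@word_frequency_count[category] ||= Hash(String, Int32).new(0))

    tokens.each do |token|
      @vocabulary << token unless @vocabulary.includes?(token)
      frequencies[token] += 1
      @word_count[category] += 1
    end

    self
  end
end

sketch = CounterSketch.new
sketch.train(%w[i hate you], "angry").train(%w[i love you], "happy")

sketch.total_documents # => 2
sketch.vocabulary      # => ["i", "hate", "you", "love"]
```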

def vocabulary : Array(String)

The words to learn from.

def word_count : Hash(String, Int32)

For each category, how many total words were mapped to it.

def word_frequency_count : Hash(String, Hash(String, Int32))

Word frequency table for each category.
