class Cadmium::BayesClassifier
- Cadmium::BayesClassifier
- Reference
- Object
Overview
This is a naive Bayes classifier that uses Laplace smoothing. It can be trained to categorize sentences based on the words they contain.
Example:
classifier = Cadmium.bayes_classifier.new
# Train some angry examples
classifier.train("omg I can't believe you would do that to me", "angry")
classifier.train("I hate you so much!", "angry")
classifier.train("Just go. I don't need this.", "angry")
classifier.train("You're so full of shit!", "angry")
# Some happy ones
classifier.train("omg you're the best!", "happy")
classifier.train("I can't believe how happy you make me", "happy")
classifier.train("I love you so damn much!", "happy")
classifier.train("You're the best!", "happy")
# And some indifferent ones
classifier.train("Idk, what do you think?", "indifferent")
classifier.train("yeah that's ok", "indifferent")
classifier.train("cool", "indifferent")
classifier.train("I guess we could do that", "indifferent")
# Now let's test it on a sentence
classifier.categorize("You shit head!")
# => "angry"
puts classifier.categorize("You're the best :)")
# => "happy"
classifier.categorize("idk, my bff jill?")
# => "indifferent"
Included Modules
- JSON::Serializable
- YAML::Serializable
Defined in:
cadmium/classifier/bayes.cr
Constant Summary
- DEFAULT_TOKENIZER = Cadmium::WordTokenizer.new
Constructors
- .new(ctx : YAML::ParseContext, node : YAML::Nodes::Node)
- .new(pull : JSON::PullParser)
- .new(tokenizer = nil)
Instance Method Summary
- #categories : Array(String)
  Category names.
- #categorize(text)
  Determines what category the text belongs to.
- #doc_count : Hash(String, Int32)
  Document frequency table for each of our categories.
- #frequency_table(tokens)
  Build a frequency hash map where the keys are the entries in tokens and the values are the frequency of each entry in tokens.
- #initialize_category(name)
  Initializes each of our data structure entities for this new category and returns self.
- #token_probability(token, category)
  Calculate the probability that a token belongs to a category.
- #tokenizer : Cadmium::Tokenizer
- #tokenizer=(tokenizer : Cadmium::Tokenizer)
- #total_documents : Int32
  Number of documents we have learned from.
- #train(text, category)
  Train our naive Bayes classifier by telling it what category the training text corresponds to.
- #vocabulary : Array(String)
  The words to learn from.
- #word_count : Hash(String, Int32)
  For each category, how many total words were mapped to it.
- #word_frequency_count : Hash(String, Hash(String, Int32))
  Word frequency table for each category.
Constructor Detail
Instance Method Detail
#frequency_table(tokens)
Build a frequency hash map where
- the keys are the entries in tokens
- the values are the frequency of each entry in tokens
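The frequency map described above can be sketched as follows. This is a standalone illustration using only the Crystal standard library; the free function and its typed signature are assumptions made for the example and may differ from Cadmium's actual implementation.

```crystal
# Hypothetical standalone helper, not Cadmium's source.
def frequency_table(tokens : Array(String)) : Hash(String, Int32)
  # A default value of 0 lets unseen keys be incremented directly.
  table = Hash(String, Int32).new(0)
  tokens.each { |token| table[token] += 1 }
  table
end

table = frequency_table(%w(you are the best the best))
puts table
# => {"you" => 1, "are" => 1, "the" => 2, "best" => 2}
```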
#initialize_category(name)
Initializes each of our data structure entities for this new category and returns self.
#token_probability(token, category)
Calculate the probability that a token belongs to a category.
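Since the overview states that the classifier uses Laplace smoothing, the token probability is presumably the usual add-one estimate: (count of the token in the category + 1) / (total words in the category + vocabulary size). The sketch below shows that formula as a standalone function. The explicit parameters are assumptions for illustration; the real method takes only a token and a category and reads the counts from the classifier's own tables.

```crystal
# Hypothetical standalone version of the Laplace-smoothed estimate.
# `frequencies` plays the role of word_frequency_count[category],
# `category_word_count` the role of word_count[category].
def token_probability(token : String,
                      frequencies : Hash(String, Int32),
                      category_word_count : Int32,
                      vocabulary_size : Int32) : Float64
  count = frequencies.fetch(token, 0)
  # Adding 1 keeps tokens never seen in this category from scoring zero.
  (count + 1).to_f64 / (category_word_count + vocabulary_size)
end

angry = {"hate" => 2, "you" => 3}
puts token_probability("hate", angry, 5, 6) # 3/11, roughly 0.27
puts token_probability("love", angry, 5, 6) # 1/11, roughly 0.09
```

Without the smoothing term, a single unseen token would drive the whole category score to zero.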
#train(text, category)
Train our naive Bayes classifier by telling it what category the training text corresponds to.
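To show how training feeds the tables listed in the method summary, here is a miniature classifier. It is a sketch under stated assumptions, not Cadmium's code: the class name TinyBayes, the whitespace tokenizer, and the use of log probabilities when scoring are all choices made for this example.

```crystal
# Hypothetical miniature classifier; field names mirror the summary,
# but the internals are assumptions.
class TinyBayes
  def initialize
    @doc_count = Hash(String, Int32).new(0)
    @word_count = Hash(String, Int32).new(0)
    @word_frequency_count = Hash(String, Hash(String, Int32)).new
    @vocabulary = Set(String).new
    @total_documents = 0
  end

  # Whitespace tokenizer standing in for Cadmium::WordTokenizer.
  private def tokenize(text : String) : Array(String)
    text.downcase.split
  end

  # Updates every table for one labelled document and returns self.
  def train(text : String, category : String)
    @doc_count[category] += 1
    @total_documents += 1
    frequencies = (@word_frequency_count[category] ||= Hash(String, Int32).new(0))
    tokenize(text).each do |token|
      @vocabulary << token
      frequencies[token] += 1
      @word_count[category] += 1
    end
    self
  end

  # Picks the category with the highest log prior plus log likelihoods.
  def categorize(text : String) : String?
    tokens = tokenize(text)
    @doc_count.keys.max_by? do |category|
      score = Math.log(@doc_count[category] / @total_documents)
      tokens.each do |token|
        count = @word_frequency_count[category].fetch(token, 0)
        # Laplace-smoothed token probability.
        score += Math.log((count + 1) / (@word_count[category] + @vocabulary.size))
      end
      score
    end
  end
end

tiny = TinyBayes.new
tiny.train("I hate you", "angry").train("I love you", "happy")
puts tiny.categorize("hate") # => angry
puts tiny.categorize("love") # => happy
```

Summing logarithms instead of multiplying raw probabilities avoids floating-point underflow on long inputs.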
#word_frequency_count : Hash(String, Hash(String, Int32))
Word frequency table for each category.