class Cadmium::Classifier::Bayes
- Cadmium::Classifier::Bayes
 - Reference
 - Object
 
Overview
This is a naive Bayes classifier that uses Laplace smoothing. It can be trained to categorize sentences based on the words they contain.
Example:
classifier = Cadmium::Classifier::Bayes.new
# Train some angry examples
classifier.train("omg I can't believe you would do that to me", "angry")
classifier.train("I hate you so much!", "angry")
classifier.train("Just go. I don't need this.", "angry")
classifier.train("You're so full of shit!", "angry")
# Some happy ones
classifier.train("omg you're the best!", "happy")
classifier.train("I can't believe how happy you make me", "happy")
classifier.train("I love you so damn much!", "happy")
classifier.train("You're the best!", "happy")
# And some indifferent ones
classifier.train("Idk, what do you think?", "indifferent")
classifier.train("yeah that's ok", "indifferent")
classifier.train("cool", "indifferent")
classifier.train("I guess we could do that", "indifferent")
# Now let's test it on a sentence
classifier.classify("You shit head!")
# => "angry"
puts classifier.classify("You're the best :)")
# => "happy"
classifier.classify("idk, my bff jill?")
# => "indifferent"
  Included Modules
- JSON::Serializable
 - YAML::Serializable
 
Defined in:
cadmium/classifier/bayes.cr
Constant Summary
- DEFAULT_TOKENIZER = Cadmium::Tokenizer::Word.new
Constructors
- .new(ctx : YAML::ParseContext, node : YAML::Nodes::Node)
 - .new(pull : JSON::PullParser)
 - .new(tokenizer = nil)
 
Instance Method Summary
- #categories : Array(String)
  Category names.
- #classify(text : String)
  Determines what category the text belongs to.
- #doc_count : Hash(String, Int32)
  Document frequency table for each of our categories.
- #frequency_table(tokens)
  Build a frequency hash map where the keys are the entries in tokens and the values are the frequency of each entry in tokens.
- #initialize_category(name)
  Initializes each of our data structure entities for this new category and returns self.
- #token_probability(token, category)
  Calculate the probability that a token belongs to a category.
- #tokenizer : Cadmium::Tokenizer::Base
- #tokenizer=(tokenizer : Cadmium::Tokenizer::Base)
- #total_documents : Int32
  Number of documents we have learned from.
- #train(text, category)
  Train our naive Bayes classifier by telling it what category the training text corresponds to.
- #vocabulary : Array(String)
  The words to learn from.
- #vocabulary_size : Int32
  The total number of words in the vocabulary.
- #word_count : Hash(String, Int32)
  For each category, how many total words were mapped to it.
- #word_frequency_count : Hash(String, Hash(String, Int32))
  Word frequency table for each category.

Constructor Detail
Instance Method Detail
#frequency_table(tokens)
Build a frequency hash map where
- the keys are the entries in tokens
- the values are the frequency of each entry in tokens
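The mapping described above can be sketched as a stand-alone function. This is a minimal illustration, assuming tokens arrive as an Array(String); it is a hypothetical helper, not the library's own source.

```crystal
# Hypothetical stand-alone helper; not the library's own source.
def frequency_table(tokens : Array(String)) : Hash(String, Int32)
  # A default of 0 lets us increment a key the first time we see it.
  table = Hash(String, Int32).new(0)
  tokens.each { |token| table[token] += 1 }
  table
end

table = frequency_table(["you", "are", "the", "best", "the"])
puts table["the"]  # => 2
puts table["best"] # => 1
```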
#initialize_category(name)
Initializes each of our data structure entities for this
new category and returns self.
#token_probability(token, category)
Calculate the probability that a token belongs to
a category.
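With Laplace (add-one) smoothing, this probability is commonly computed as (count of the token in the category + 1) / (total words in the category + vocabulary size). The sketch below assumes that formula and passes the tables in explicitly; the argument names mirror the accessors documented on this page, but the function itself is hypothetical and may differ from the library's implementation.

```crystal
# Hypothetical stand-alone version; not the library's own source.
def token_probability(
  token : String,
  category : String,
  word_frequency_count : Hash(String, Hash(String, Int32)),
  word_count : Hash(String, Int32),
  vocabulary_size : Int32
) : Float64
  # How often `token` was seen in `category` (0 if never seen).
  counts = word_frequency_count[category]? || Hash(String, Int32).new
  seen = counts[token]? || 0
  # Add-one smoothing keeps unseen tokens from getting probability zero.
  (seen + 1) / (word_count.fetch(category, 0) + vocabulary_size)
end

freq = {"happy" => {"best" => 2, "love" => 1}}
totals = {"happy" => 3}

puts token_probability("best", "happy", freq, totals, 5) # => 0.375
puts token_probability("hate", "happy", freq, totals, 5) # => 0.125
```

Because the numerator is never zero, a word the classifier has not seen in a category lowers that category's score without eliminating it.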
#train(text, category)
Train our naive Bayes classifier by telling it what
category the training text corresponds to.
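To make the bookkeeping concrete, here is a miniature sketch of what training plausibly updates, using the table names from the method summary. TinyBayes is a hypothetical class, and whitespace splitting stands in for the real tokenizer; neither is taken from the library's source.

```crystal
# Hypothetical miniature of the training bookkeeping; not the library's own source.
class TinyBayes
  getter doc_count = Hash(String, Int32).new(0)
  getter word_count = Hash(String, Int32).new(0)
  getter word_frequency_count = Hash(String, Hash(String, Int32)).new
  getter vocabulary = [] of String
  getter total_documents = 0

  def train(text : String, category : String) : self
    # One more document seen, overall and for this category.
    @doc_count[category] += 1
    @total_documents += 1
    counts = (@word_frequency_count[category] ||= Hash(String, Int32).new(0))
    # Whitespace splitting stands in for the real tokenizer (an assumption).
    text.downcase.split.each do |token|
      @vocabulary << token unless @vocabulary.includes?(token)
      counts[token] += 1
      @word_count[category] += 1
    end
    self
  end
end

model = TinyBayes.new
model.train("You're the best!", "happy").train("I hate you", "angry")
puts model.total_documents     # => 2
puts model.word_count["happy"] # => 3
puts model.vocabulary.size     # => 6
```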
#word_frequency_count : Hash(String, Hash(String, Int32))
Word frequency table for each category.