class Cadmium::Summarizer::Luhn

Overview

The Luhn summarizer returns the most significant sentences of a text by:

1. Calculating the frequency ratio of each significant term, discarding terms whose ratio falls outside an arbitrary range (i.e. "normalizing" the ratios).
2. Rating each sentence as (number of significant terms)² / (greatest distance between two significant terms).
3. Sorting the sentences by rating and returning the first n of them.

Reference: https://ieeexplore.ieee.org/document/5392672?arnumber=5392672
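The three steps above can be sketched as follows. This is an illustrative Python sketch, not Cadmium's Crystal implementation. The function names loosely mirror the inherited methods listed on this page, but the tokenizer, the sentence splitter, the inclusive ratio bounds, and the handling of a sentence with a single significant term are all assumptions made for the example. Stop-word removal is omitted.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; the real tokenizer may differ.
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(text):
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def terms_ratio(words):
    # Step 1a: ratio of each term = occurrences / total number of terms.
    counts = Counter(words)
    total = len(words)
    return {term: count / total for term, count in counts.items()}


def normalize_ratio(ratios, min_ratio=0.001, max_ratio=0.5):
    # Step 1b: drop terms whose ratio lies outside [min_ratio, max_ratio].
    # Assumption: both bounds are inclusive.
    return {t: r for t, r in ratios.items() if min_ratio <= r <= max_ratio}


def sentence_rating(sentence, significant):
    # Step 2: (number of significant terms)^2 / (greatest distance between two).
    words = tokenize(sentence)
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    # Assumption: a lone significant term gets distance 1 to avoid dividing by 0.
    distance = max(positions[-1] - positions[0], 1)
    return len(positions) ** 2 / distance


def luhn_summary(text, max_num_sentences=5, min_ratio=0.001, max_ratio=0.5):
    # Step 3: sort sentences by rating (highest first) and keep the first n.
    significant = normalize_ratio(terms_ratio(tokenize(text)), min_ratio, max_ratio)
    ranked = sorted(
        split_sentences(text),
        key=lambda s: sentence_rating(s, significant),
        reverse=True,
    )
    return ranked[:max_num_sentences]
```

For instance, a three-word sentence whose first and last words are significant has 2 significant terms at distance 2, so its rating is 2² / 2 = 2.0. Dense clusters of significant terms score higher than the same number of terms spread over a long sentence.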

Defined in:

cadmium/summarizer/luhn.cr

Instance methods inherited from class Cadmium::Summarizer::AbstractSummarizer

all_terms(text : String) : Array(String)
normalize_ratio(terms_ratio : Hash(String, Float64), min_ratio = 0.001, max_ratio = 0.5) : Hash(String, Float64)
normalized_terms_ratio(text : String, min_ratio = 0.001, max_ratio = 0.5) : Hash(String, Float64)
select_sentences(text : String, max_num_sentences : Int32) : Array(String)
significant_terms(text : String) : Array(String)
summarize(text : String, max_num_sentences = 5) : String
terms_frequencies(terms : Array(String)) : Hash(String, Int32)
terms_ratio(terms_frequencies : Hash(String, Int32), number_of_terms : Int32) : Hash(String, Float64)