class Cadmium::Summarizer::SumBasic

Overview

SumBasic (Nenkova and Vanderwende, 2005) is a system that produces generic multi-document summaries. Its design is motivated by the observation that terms occurring frequently in the document cluster appear in human summaries with higher probability than terms occurring less frequently.

Step 1: Calculate the probability (= normalized ratio) of each term in the document.
Step 2: For each sentence in the document, calculate a rating equal to the average probability of the terms in the sentence.
Step 3: Pick the best-scoring sentence that contains the highest-probability term.
Step 4: For each term in the sentence chosen at Step 3, update its probability by squaring it (probability²).
Step 5: If the desired summary length has not been reached, go back to Step 2.

Reference: http://www.cis.upenn.edu/~nenkova/papers/ipm.pdf
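The five steps can be sketched in a few lines. The following is an illustrative Python sketch of the procedure as described here, not Cadmium's Crystal implementation. The helper names (`split_sentences`, `tokenize`, `sum_basic`), the naive regex-based sentence splitting and tokenization, the absence of stop-word filtering and ratio normalization bounds, and the choice to return sentences in their original order are all assumptions made for the example.

```python
import re
from collections import Counter


def split_sentences(text):
    # Assumption: a naive splitter that breaks after '.', '!' or '?' plus whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Assumption: lowercase alphanumeric runs count as terms; no stop-word removal.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def sum_basic(text, max_num_sentences=5):
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    all_terms = [t for sent in tokens for t in sent]
    if not all_terms:
        return []

    # Step 1: probability of each term = its count / total number of terms.
    counts = Counter(all_terms)
    prob = {term: n / len(all_terms) for term, n in counts.items()}

    remaining = [i for i, sent in enumerate(tokens) if sent]
    chosen = []
    # Step 5: repeat until the desired number of sentences is reached.
    while remaining and len(chosen) < max_num_sentences:
        # Step 2: rate each remaining sentence by its average term probability.
        score = {
            i: sum(prob[t] for t in tokens[i]) / len(tokens[i]) for i in remaining
        }

        # Step 3: find the most probable term still present in an unselected
        # sentence, then take the best-scoring sentence that contains it.
        live_terms = {t for i in remaining for t in tokens[i]}
        top_term = max((t for t in prob if t in live_terms), key=prob.get)
        candidates = [i for i in remaining if top_term in tokens[i]]
        best = max(candidates, key=score.get)
        chosen.append(best)
        remaining.remove(best)

        # Step 4: square the probability of every term in the chosen sentence,
        # which lowers the weight of content that is already covered.
        for term in set(tokens[best]):
            prob[term] **= 2

    # Assumption: present the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `sum_basic(text, 2)` on a short text returns a list of at most two sentences. Squaring in Step 4 is what discourages redundancy: once a sentence is selected, its terms contribute far less to the ratings of the sentences that remain.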

Defined in:

cadmium/summarizer/sum_basic.cr

Instance methods inherited from class Cadmium::Summarizer::AbstractSummarizer

all_terms(text : String) : Array(String)
normalize_ratio(terms_ratio : Hash(String, Float64), min_ratio = 0.001, max_ratio = 0.5) : Hash(String, Float64)
normalized_terms_ratio(text : String, min_ratio = 0.001, max_ratio = 0.5) : Hash(String, Float64)
select_sentences(text : String, max_num_sentences : Int32) : Array(String)
significant_terms(text : String) : Array(String)
summarize(text : String, max_num_sentences = 5) : String
terms_frequencies(terms : Array(String)) : Hash(String, Int32)
terms_ratio(terms_frequencies : Hash(String, Int32), number_of_terms : Int32) : Hash(String, Float64)