class Cadmium::Summarizer::TextRank

Overview

An implementation of the TextRank algorithm for summarization.

Step 1: Create a stochastic matrix for PageRank.

From the sumy source code: the element at row i and column j of the matrix corresponds to the similarity of sentences i and j, where the similarity is computed as the number of words they have in common divided by the sum of the logarithms of their lengths. Once this matrix is created, it is turned into a stochastic matrix by normalizing over columns, i.e. making each column sum to one. TextRank uses the PageRank algorithm with damping, so a damping factor is incorporated as explained in the TextRank paper. The resulting matrix is a stochastic matrix ready for the power method.

Source: https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf
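The matrix construction described above can be illustrated with a minimal, language-neutral sketch in plain Python. It is not the Cadmium or sumy implementation. The function names (`similarity`, `stochastic_matrix`, `power_method`), the whitespace tokenizer, the damping value of 0.85, the fixed iteration count, and the handling of all-zero columns are all assumptions made for illustration.

```python
import math


def similarity(words_a, words_b):
    """Common-word count divided by the sum of the logs of the sentence lengths."""
    common = len(set(words_a) & set(words_b))
    if common == 0:
        return 0.0
    norm = math.log(len(words_a)) + math.log(len(words_b))
    # Two one-word sentences give log(1) + log(1) = 0; fall back to the raw count
    # (an assumption, chosen to avoid dividing by zero).
    return float(common) if norm == 0 else common / norm


def stochastic_matrix(sentences, damping=0.85):
    """Build a column-stochastic, damped similarity matrix (hypothetical helper)."""
    n = len(sentences)
    # Naive tokenizer: lowercase and split on whitespace (assumption).
    words = [s.lower().split() for s in sentences]
    weights = [[similarity(words[i], words[j]) for j in range(n)] for i in range(n)]
    # Normalize each column so it sums to one; an all-zero column becomes uniform.
    for j in range(n):
        total = sum(weights[i][j] for i in range(n))
        for i in range(n):
            weights[i][j] = weights[i][j] / total if total > 0 else 1.0 / n
    # Blend in the uniform "random jump" term controlled by the damping factor.
    return [
        [(1.0 - damping) / n + damping * weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def power_method(matrix, iterations=50):
    """Repeatedly multiply a uniform start vector by the matrix."""
    n = len(matrix)
    if n == 0:
        return []
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        ranks = [sum(matrix[i][j] * ranks[j] for j in range(n)) for i in range(n)]
    return ranks
```

Because every column of the damped matrix sums to one, each multiplication preserves the total of the rank vector, so the scores can be read as a probability distribution over sentences. A summarizer would then keep the highest-scoring sentences.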

Included Modules

Defined in:

cadmium/summarizer/text_rank.cr

Instance methods inherited from class Cadmium::Summarizer::AbstractSummarizer

all_terms(text : String) : Array(String)
normalize_ratio(terms_ratio : Hash(String, Float64), min_ratio = 0.001, max_ratio = 0.5) : Hash(String, Float64)
normalized_terms_ratio(text : String, min_ratio = 0.001, max_ratio = 0.5) : Hash(String, Float64)
select_sentences(text : String, max_num_sentences : Int32) : Array(String)
significant_terms(text : String) : Array(String)
summarize(text : String, max_num_sentences = 5) : String
terms_frequencies(terms : Array(String)) : Hash(String, Int32)
terms_ratio(terms_frequencies : Hash(String, Int32), number_of_terms : Int32) : Hash(String, Float64)