Distributions

freq_dist(tokens)

Count the frequency of each token
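A minimal sketch of what token frequency counting amounts to, using `collections.Counter` from the standard library. The name `count_tokens` is hypothetical, and the type actually returned by `freq_dist` is not documented here, so this is an illustration of the idea rather than its implementation.

```python
from collections import Counter

def count_tokens(tokens):
    """Hypothetical helper: map each token to the number of times it occurs."""
    return Counter(tokens)

dist = count_tokens(["the", "cat", "sat", "on", "the", "mat"])
print(dist.most_common(2))  # [('the', 2), ('cat', 1)]
```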

compute_tfidf(text_list[, doc_ids])

Compute TF-IDF scores
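To make the computation concrete, here is a sketch of one common TF-IDF variant: length-normalized term frequency multiplied by `log(N / df)`, where `N` is the number of documents and `df` the number of documents containing the term. The function name `tfidf_sketch`, the lowercase whitespace tokenization, and the absence of IDF smoothing are all assumptions; `compute_tfidf` may use a different weighting.

```python
import math
from collections import Counter

def tfidf_sketch(text_list, doc_ids=None):
    """Hypothetical helper returning {doc_id: {term: tfidf_score}}."""
    if doc_ids is None:
        doc_ids = list(range(len(text_list)))
    # Assumed tokenization: lowercase and split on whitespace.
    tokenized = [text.lower().split() for text in text_list]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = {}
    for doc_id, tokens in zip(doc_ids, tokenized):
        tf = Counter(tokens)
        total = len(tokens)
        scores[doc_id] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return scores

docs = ["the cat sat", "the dog sat", "the cat ran"]
scores = tfidf_sketch(docs)
# "the" appears in every document, so its IDF (and score) is 0.0.
print(scores[1])
```

Terms that occur in every document get a score of zero, which is why TF-IDF is useful for picking out distinctive words.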

important_words_per_corpus(scores[, n])

Based on TF-IDF scores, return the most important words across the corpus

important_words_per_doc(scores[, doc_id, n])

Based on TF-IDF scores, return the most important words per document
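The two functions above differ only in the scope over which words are ranked. The sketch below assumes scores shaped as `{doc_id: {term: score}}` and, for the corpus-level ranking, aggregates each term by its maximum score over all documents; both the data shape and the aggregation rule are assumptions, and the helper names are hypothetical.

```python
import heapq

def top_words_per_doc(scores, doc_id, n=10):
    """Return the n highest-scoring (term, score) pairs for one document."""
    return heapq.nlargest(n, scores[doc_id].items(), key=lambda kv: kv[1])

def top_words_per_corpus(scores, n=10):
    """Rank terms by their maximum score over all documents (assumed rule)."""
    best = {}
    for doc_scores in scores.values():
        for term, score in doc_scores.items():
            best[term] = max(score, best.get(term, float("-inf")))
    return heapq.nlargest(n, best.items(), key=lambda kv: kv[1])

scores = {
    "a": {"cat": 0.14, "the": 0.0, "sat": 0.05},
    "b": {"dog": 0.37, "the": 0.0, "sat": 0.05},
}
print(top_words_per_doc(scores, "a", n=2))  # [('cat', 0.14), ('sat', 0.05)]
print(top_words_per_corpus(scores, n=2))    # [('dog', 0.37), ('cat', 0.14)]
```

Summing or averaging instead of taking the maximum would favour terms that are moderately important in many documents over terms that dominate a single one.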

Linguistics

pos_tag(model, text)

Return the part of speech of each word in a piece of text.

Word Vectors

word_vectors(docs[, sg, mc, sz, wnd, epochs])

Compute word vectors from a corpus

similar_words(wvs, word[, n])

Find words similar to a given word

vector_maths(wvs[, pwords, nwords, n])

Perform word vector maths
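Both `similar_words` and `vector_maths` rest on ranking vocabulary words by cosine similarity to a target vector. The sketch below uses a plain dict of lists as a stand-in for the word vector model; the helper names, the toy 3-dimensional vectors, and the choice to sum raw (unnormalized) vectors are assumptions. Reading `pwords` and `nwords` as words added and subtracted is inferred from the parameter names only.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_by_similarity(wvs, target, exclude, n):
    """Return the n (word, similarity) pairs closest to target, skipping exclude."""
    ranked = [(w, cosine(vec, target)) for w, vec in wvs.items() if w not in exclude]
    ranked.sort(key=lambda kv: kv[1], reverse=True)
    return ranked[:n]

def similar_words_sketch(wvs, word, n=10):
    return rank_by_similarity(wvs, wvs[word], {word}, n)

def vector_maths_sketch(wvs, pwords=(), nwords=(), n=10):
    """Add the vectors of pwords, subtract those of nwords, rank the result."""
    dim = len(next(iter(wvs.values())))
    target = [0.0] * dim
    for w in pwords:
        target = [t + x for t, x in zip(target, wvs[w])]
    for w in nwords:
        target = [t - x for t, x in zip(target, wvs[w])]
    return rank_by_similarity(wvs, target, set(pwords) | set(nwords), n)

wvs = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.1, 0.8, 0.9],
    "man": [0.9, 0.1, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.0, 0.5],
}
print(similar_words_sketch(wvs, "king", n=2))
# king - man + woman: 'queen' ranks first
print(vector_maths_sketch(wvs, pwords=["king", "woman"], nwords=["man"], n=1))
```

The input words are excluded from the results because the target vector is usually closest to one of them.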

save_vectors(wvs, fname[, binary])

Save a word vector model to a file

load_vectors(fname[, binary])

Load a word vector model from a file
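The `binary` flag suggests a choice between a binary and a text serialization. As an illustration of the text case only, the sketch below writes and reads the plain-text word2vec layout: a header line with the vocabulary size and dimension, then one line per word followed by its values. Whether `save_vectors` uses exactly this layout is an assumption, the helper names are hypothetical, and words are assumed to contain no whitespace.

```python
import os
import tempfile

def save_vectors_text(wvs, fname):
    """Write {word: [floats]} as a 'vocab_size dim' header plus 'word v1 v2 ...' lines."""
    dim = len(next(iter(wvs.values())))
    with open(fname, "w", encoding="utf-8") as f:
        f.write(f"{len(wvs)} {dim}\n")
        for word, vec in wvs.items():
            # repr() of a float parses back to the identical value with float().
            f.write(word + " " + " ".join(repr(float(x)) for x in vec) + "\n")

def load_vectors_text(fname):
    """Read a file written by save_vectors_text back into a dict, checking sizes."""
    wvs = {}
    with open(fname, encoding="utf-8") as f:
        n_words, dim = map(int, f.readline().split())
        for line in f:
            word, *values = line.rstrip("\n").split(" ")
            vec = [float(x) for x in values]
            if len(vec) != dim:
                raise ValueError(f"expected {dim} values for {word!r}, got {len(vec)}")
            wvs[word] = vec
    if len(wvs) != n_words:
        raise ValueError(f"header says {n_words} words, file has {len(wvs)}")
    return wvs

# Usage: round-trip a toy model through a temporary file.
wvs = {"cat": [0.1, 0.2], "dog": [0.3, -0.4]}
path = os.path.join(tempfile.mkdtemp(), "vectors.txt")
save_vectors_text(wvs, path)
restored = load_vectors_text(path)
```

A text format is larger and slower to parse than a binary one, but it can be inspected and diffed by hand.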

Topic Modelling

bow_topic_modelling(docs[, no_topics])

LDA topic modelling with a bag-of-words (BoW) representation
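LDA itself is too involved for a short example, but its bag-of-words input is simple: each document becomes a list of `(token_id, count)` pairs over a shared token-to-id dictionary. The sketch below assumes documents arrive already tokenized; the helper names are hypothetical, and whether the `dictionary` argument of `determine_topics` is a mapping of this kind is an assumption.

```python
from collections import Counter

def build_dictionary(docs):
    """Assign each distinct token an integer id in order of first appearance."""
    token2id = {}
    for tokens in docs:
        for tok in tokens:
            token2id.setdefault(tok, len(token2id))
    return token2id

def to_bow(tokens, token2id):
    """Return sorted (token_id, count) pairs; tokens not in the dictionary are dropped."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())

docs = [["cat", "sat", "cat"], ["dog", "sat"]]
token2id = build_dictionary(docs)
print(to_bow(docs[0], token2id))  # [(0, 2), (1, 1)]
```

Word order is discarded; only counts survive, which is the assumption LDA's generative model makes about documents.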

tfidf_topic_modelling(docs[, no_topics])

LDA topic modelling with a TF-IDF representation

print_topic_words(topic_model)

Print the words associated with each topic in a topic model

determine_topics(doc, topic_model, dictionary)

Determine the topics of a document

Sentiment Analysis

vader_sentiment_strength(textlist)

Compute VADER sentiment strength of English-language texts

Keyword Extraction

keyphrase_list(text[, language, stopwords, …])

Extract keywords from a piece of text
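The `stopwords` parameter hints at a stopword-delimited method such as RAKE (Rapid Automatic Keyword Extraction), although the documentation does not say which algorithm `keyphrase_list` uses. The sketch below implements RAKE under that assumption: candidate phrases are runs of non-stopwords between punctuation and stopwords, each word scores degree divided by frequency, and a phrase scores the sum of its word scores. The helper name and the small stopword set are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical minimal stopword list; a real list would be much longer.
DEFAULT_STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "of", "on", "the", "to", "with"}

def rake_keyphrases(text, stopwords=DEFAULT_STOPWORDS, n=5):
    # 1. Split into candidate phrases at punctuation and at stopwords.
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    # 2. Word score = degree / frequency, where degree sums the lengths
    #    of the candidate phrases the word occurs in.
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # 3. Phrase score = sum of its word scores (duplicate phrases collapse).
    scored = {}
    for phrase in phrases:
        scored[" ".join(phrase)] = sum(word_score[w] for w in phrase)
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:n]

result = rake_keyphrases("Linear constraints are solved. Natural numbers and linear constraints.")
print(result)  # [('linear constraints', 4.0), ('natural numbers', 4.0), ('solved', 1.0)]
```

Degree rewards words that appear in longer phrases, so RAKE tends to favour multi-word keyphrases over isolated words.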