Standard Tools

get_stopwords()

Return a list of Norwegian and English stopwords.

rm_stopwords(model, text, stopwords)

Return the string with all stopwords removed.
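
The filtering step can be illustrated with a minimal standard-library sketch. The name remove_stopwords_sketch is hypothetical, it takes no model argument, and the whitespace splitting and case-insensitive matching are assumptions rather than the documented behaviour of rm_stopwords.

```python
def remove_stopwords_sketch(text, stopwords):
    """Drop every whitespace-separated word found in `stopwords` (case-insensitive)."""
    stopset = {word.lower() for word in stopwords}
    kept = [word for word in text.split() if word.lower() not in stopset]
    return " ".join(kept)


# Tiny illustrative stopword list mixing Norwegian and English entries (not the real list).
example_stopwords = ["og", "er", "the", "is"]
result = remove_stopwords_sketch("Dette er en test og the end", example_stopwords)
print(result)
```

A set is used for the lookup so that each membership check stays constant-time even with a long stopword list.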

rm_punctuation(model, text)

Return the string with all punctuation removed.
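
One common way to strip punctuation is sketched below with the standard library only. The name remove_punctuation_sketch is hypothetical and the real rm_punctuation may rely on the model argument instead. Note that string.punctuation covers ASCII punctuation only, so characters such as « and » would pass through unchanged.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation_sketch(text):
    """Return `text` with ASCII punctuation characters deleted."""
    return text.translate(_PUNCT_TABLE)


cleaned = remove_punctuation_sketch("Hei, verden! Hvordan går det?")
print(cleaned)
```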

spacy_lemmatize(model, text)

Return the text as a string of lemmatized tokens.

nltk_stem_no(model, text)

Return the text as a string of stemmed tokens, using NLTK’s Norwegian Snowball stemmer.
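
As a rough illustration, NLTK's Snowball stemmer can be applied word by word as below. This sketch assumes NLTK is installed; the name stem_norwegian_sketch is hypothetical, it ignores the model argument, and splitting on whitespace is an assumption about how the text is tokenised before stemming.

```python
from nltk.stem.snowball import SnowballStemmer

# The Snowball stemmer is algorithmic and needs no extra corpus download.
_stemmer = SnowballStemmer("norwegian")


def stem_norwegian_sketch(text):
    """Stem each whitespace-separated word and join the stems back into one string."""
    return " ".join(_stemmer.stem(word) for word in text.split())


stemmed = stem_norwegian_sketch("bilene kjører på veiene")
print(stemmed)
```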

tokenise(model, text)

Return list of tokens for a piece of text.
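
A simple regular-expression tokeniser gives an idea of the expected return shape. The name tokenise_sketch is hypothetical and the rule used here (runs of word characters, or single punctuation marks) is an assumption; the real tokenise presumably delegates to the model passed in.

```python
import re

# A token is either a run of word characters or one non-space, non-word character.
_TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenise_sketch(text):
    """Return a list of word and punctuation tokens found in `text`."""
    return _TOKEN_PATTERN.findall(text)


tokens = tokenise_sketch("Hei, verden!")
print(tokens)
```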

retain_spaces(processed)

Retain spaces around punctuation at the end of a sentence.

Processing Pipeline

NLPPipeline(model, text)

Pipeline class for combining functions from nlp_tools.
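
The general idea of chaining text-processing functions can be sketched as follows. PipelineSketch, its add and run methods, and strip_punctuation are all hypothetical names chosen for illustration; the actual NLPPipeline interface is not described here and may differ, including in how it uses the model argument.

```python
import string


class PipelineSketch:
    """Hold a text and an ordered list of steps, each a callable applied in turn."""

    def __init__(self, text):
        self.text = text
        self.steps = []

    def add(self, step):
        """Register a processing step and return self so calls can be chained."""
        self.steps.append(step)
        return self

    def run(self):
        """Feed the text through every step in the order the steps were added."""
        result = self.text
        for step in self.steps:
            result = step(result)
        return result


def strip_punctuation(text):
    """Delete ASCII punctuation characters from `text`."""
    return text.translate(str.maketrans("", "", string.punctuation))


pipeline = PipelineSketch("Hei, Verden!").add(strip_punctuation).add(str.lower).add(str.split)
output = pipeline.run()
print(output)
```

Keeping each step a plain callable means the order of operations is explicit and any function that accepts the previous step's output can be slotted in.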