enlp.processing.stdtools.rm_stopwords

enlp.processing.stdtools.rm_stopwords(model, text, stopwords)

Remove stopwords from string.

Parameters
model : spacy.lang

SpaCy language model

text : str

Text string from which to remove stopwords

stopwords : list

List of stopwords to remove

Returns
updated_text : str

Updated version of input string with stopwords (and possibly punctuation) removed

Notes

The output is a string so that functions can be piped together. To return the words as a list, use: tokenise(rm_stopwords(…))
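
The string-in, string-out design can be illustrated with a minimal sketch in plain Python. This is not the library's implementation: rm_stopwords_sketch and tokenise_sketch are hypothetical stand-ins that use a regular expression instead of a SpaCy model, and they assume that stopword matching is case-insensitive and that punctuation is kept.

```python
import re


def rm_stopwords_sketch(text, stopwords):
    """Hypothetical stand-in for rm_stopwords: returns a string, not a list."""
    # Assumption: matching is case-insensitive.
    stopset = {w.lower() for w in stopwords}
    # Split into word tokens and single punctuation tokens.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    kept = [t for t in tokens if t.lower() not in stopset]
    # Re-join with spaces, then re-attach punctuation to the preceding word.
    joined = " ".join(kept)
    return re.sub(r"\s+([^\w\s])", r"\1", joined)


def tokenise_sketch(text):
    """Hypothetical stand-in for tokenise: returns the words as a list."""
    return re.findall(r"\w+", text)


text = "Den raske brune reven hoppet over den late hunden."
stopwords = ["den", "over"]  # illustrative list, not the library's

cleaned = rm_stopwords_sketch(text, stopwords)
words = tokenise_sketch(cleaned)
```

Because the first function returns a string, its result can be passed directly to any other string-based processing step, and the conversion to a list happens only at the end of the chain.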

Examples

>>> import spacy
>>> lang_mod = spacy.load('nb_dep_ud_sm')
>>> text = 'Den raske brune reven hoppet over den late hunden.'
>>> stopwords_all, stopwords_norwegian, stopwords_english = get_stopwords()
>>> print(rm_stopwords(lang_mod, text, stopwords_all))
raske brune reven hoppet late hunden.