enlp.processing.stdtools.spacy_lemmatize

enlp.processing.stdtools.spacy_lemmatize(model, text)

Return a string of lemmatized text.

Lemmatization reduces the different forms of a word to a single base form, the lemma. For example, “builds”, “building” and “built” are all reduced to the lemma “build”.
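The idea can be illustrated with a minimal sketch. This is not the enlp implementation. It assumes only that calling `model` on a string yields tokens exposing `lemma_` and `whitespace_`, as a loaded spaCy pipeline does. The sketch rebuilds the text from each token's lemma and keeps the original spacing, so a trailing period stays attached (“hund.”). The name `lemmatize_sketch` and the lookup-based `fake_model` are hypothetical. They are included only so the example runs without downloading a spaCy model.

```python
from types import SimpleNamespace


def lemmatize_sketch(model, text: str) -> str:
    """Hypothetical sketch: replace every token with its lemma.

    Assumes `model(text)` yields spaCy-like tokens with `lemma_`
    (the base form) and `whitespace_` (the trailing space, or "").
    """
    doc = model(text)
    # Joining lemma + original trailing whitespace preserves spacing
    return "".join(token.lemma_ + token.whitespace_ for token in doc)


def fake_model(text: str):
    """Hypothetical stand-in for spacy.load(...): a tiny lookup table."""
    lemmas = {"builds": "build", "building": "build", "built": "build"}
    words = text.split(" ")
    return [
        SimpleNamespace(
            lemma_=lemmas.get(word, word),
            # Every token except the last is followed by one space
            whitespace_=" " if i < len(words) - 1 else "",
        )
        for i, word in enumerate(words)
    ]


print(lemmatize_sketch(fake_model, "builds building built"))  # build build build
```

With a real pipeline you would pass the object returned by `spacy.load(...)` as `model`. spaCy then supplies actual lemmas, and its tokenizer splits punctuation into separate tokens.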

Parameters
model : spacy.lang

spaCy language model

text : str

Text string to lemmatize

Returns
updated_text : str

Updated version of the input string in which words have been lemmatized

Notes

The output is a string so that functions can be piped together. To return the words as a list, use: to_list(lemmatize(…))
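Because each step takes and returns a plain string, steps compose by nesting calls. The sketch below shows the pattern using hypothetical stand-in steps, `lowercase` and `strip_punct`, which are not part of enlp. It also uses `str.split` in place of `to_list`, whose exact tokenization rules are not documented on this page.

```python
import string


def lowercase(text: str) -> str:
    # Hypothetical string -> string step
    return text.lower()


def strip_punct(text: str) -> str:
    # Hypothetical string -> string step: remove ASCII punctuation
    return text.translate(str.maketrans("", "", string.punctuation))


def to_word_list(text: str) -> list:
    # Stand-in for enlp's to_list: split on whitespace
    return text.split()


# String outputs let the steps be nested; only the last step returns a list
words = to_word_list(strip_punct(lowercase("Den raske brune reven.")))
print(words)  # ['den', 'raske', 'brune', 'reven']
```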

Examples

>>> import spacy
>>> from enlp.processing.stdtools import spacy_lemmatize
>>> lang_mod = spacy.load('nb_dep_ud_sm')
>>> text = 'Den raske brune reven hoppet over den late hunden.'
>>> spacy_lemmatize(lang_mod, text)
'den rask brun rev hoppe over den lat hund.'