eNLP

This Python library is a collection of common Natural Language Processing functions, ranging from text processing to visualisation. The purpose of the package is to collect commonly used functions in a single location and to provide a simple approach to processing and understanding textual data.

A number of example usages can be found in the eNLP gallery, whilst publications whose research used the package are detailed in the publications section.

Language Processing

The library provides functions for basic language processing. Some are homemade, for example punctuation removal, while others build on the open-source packages:

NLTK

spaCy

These functions have been written so that they can be called individually or strung together to form a processing pipeline. For example, to remove punctuation and lemmatize the remaining tokens, an NLP pipeline can be set up as follows:

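from enlp.pipeline import NLPPipeline
import spacy

# Load a spaCy language model; its language must match the language of the text
langmodel = spacy.load('en_core_web_md')
text = "Some exciting text to be processed - ensure the language matches the spacy model"

# Build the pipeline, then chain the steps: remove punctuation, then lemmatize
processed_text = NLPPipeline(langmodel, text)
processed_text.rm_punctuation().spacy_lemmatize()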

The processed text can be accessed via:

processed_text.text
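To illustrate why pipeline steps can be strung together, here is a minimal sketch of the method-chaining pattern using only the standard library. The class `MiniPipeline` and its `lowercase` step are hypothetical stand-ins invented for this example. This is not how eNLP's `NLPPipeline` is implemented; it only shows the general idea of each step updating `text` and returning the object itself.

```python
import string


class MiniPipeline:
    """Hypothetical stand-in illustrating method chaining; not eNLP's NLPPipeline."""

    def __init__(self, text):
        self.text = text

    def rm_punctuation(self):
        # Drop every ASCII punctuation character, then collapse repeated whitespace.
        stripped = self.text.translate(str.maketrans("", "", string.punctuation))
        self.text = " ".join(stripped.split())
        return self  # returning self is what allows calls to be chained

    def lowercase(self):
        # Assumed extra step, included only to show a second link in the chain.
        self.text = self.text.lower()
        return self


result = MiniPipeline("Some exciting text - to be processed!").rm_punctuation().lowercase()
print(result.text)
```

Because each step returns the same object, steps can be applied one at a time or chained in any order, and the current state is always available on the `text` attribute.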

Understanding

The library also has a number of functions for language understanding, such as word vector creation, sentiment analysis, topic modelling and keyword extraction. As well as the packages mentioned above, these functions build on the open-source packages:

gensim

scikit-learn

rake-nltk
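As a rough illustration of what keyword extraction involves, the sketch below implements a simplified RAKE-style scoring with the standard library only. It does not use the rake-nltk API and is not eNLP's implementation. The function names, the tiny stop-word list and the sample sentence are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption); real tools ship far larger ones.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "is", "are", "to", "in", "on", "with"}


def candidate_phrases(text):
    """Split text into runs of non-stop-words, breaking at punctuation."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def score_phrases(text):
    """Score each phrase as the sum of degree/frequency over its words."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself
    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = score_phrases(
    "Topic modelling and sentiment analysis are common tasks in natural language processing."
)
for phrase, score in ranked:
    print(f"{score:4.1f}  {phrase}")
```

Longer phrases made of words that rarely appear elsewhere score highest, which is the intuition behind this family of methods.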

Visualisation

Finally, functions are provided for visualisation: bar plots alongside the commonly used word clouds. As well as the packages mentioned above, these functions build on the open-source package:

wordcloud
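Both bar plots and word clouds are typically driven by word counts. The sketch below, using only the standard library, shows one way such counts might be computed and previewed as a text bar chart. The helper `top_words` and the sample sentence are hypothetical and are not part of eNLP.

```python
import re
from collections import Counter


def top_words(text, n=3):
    """Return the n most frequent lowercase word tokens with their counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(n)


counts = top_words("the cat sat on the mat and the cat slept", n=2)

# Render a rough text bar chart; a plotting library would draw real bars.
for word, count in counts:
    print(f"{word:<5} {'#' * count}")
```

A word-to-count mapping such as `dict(counts)` is the kind of input that the wordcloud package's `WordCloud.generate_from_frequencies` accepts. Whether eNLP calls it that way is not stated here.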