enlp.understanding.distributions.freq_dist

enlp.understanding.distributions.freq_dist(tokens)

Count the frequency of each token in a list.

Parameters
tokens : list

List of tokens to be analysed. Note that these may include punctuation, which is counted like any other token.

Returns
count : list

List of (word, count) tuples, sorted from most to least frequent.

Notes

If the words are originally in a single string, use stdtools.tokenise() to convert them to the input format first.
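The string-to-counts workflow described above can be sketched as follows. This is only an illustration: the signature of stdtools.tokenise() is not shown on this page, so the hypothetical helper `simple_tokenise` below stands in for it, and `collections.Counter` stands in for freq_dist. The stand-in keeps punctuation marks as separate tokens, which shows why punctuation can appear in the counts.

```python
import re
from collections import Counter


def simple_tokenise(text):
    """Hypothetical stand-in for stdtools.tokenise(); not the library's code.

    Splits text into word tokens and single punctuation tokens.
    """
    return re.findall(r"\w+|[^\w\s]", text)


text = "the cat sat, the cat ran."
tokens = simple_tokenise(text)
# tokens now includes ',' and '.' alongside the words

# Counter.most_common() returns (token, count) pairs, most frequent first
counts = Counter(tokens).most_common()
print(counts[:2])
```

If punctuation should not be counted, filter the token list before counting, for example with `[t for t in tokens if t.isalnum()]`.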

Examples

>>> words=['aa','sd','re','aa','er','hg','sd','le','ot','tr','tr']
>>> print(freq_dist(words)[:5]) # top 5 words
[('aa', 2), ('sd', 2), ('tr', 2), ('re', 1), ('er', 1)]
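For readers who want to see how such an output can arise, here is a minimal sketch that reproduces the example above with the standard library. The name `freq_dist_sketch` is hypothetical, and this is an assumption about equivalent behaviour rather than the library's actual implementation.

```python
from collections import Counter


def freq_dist_sketch(tokens):
    """Hypothetical equivalent of freq_dist, built on collections.Counter.

    Returns a list of (token, count) tuples, most frequent first.
    Tokens with equal counts keep their order of first appearance
    (Python 3.7+).
    """
    return Counter(tokens).most_common()


words = ['aa', 'sd', 're', 'aa', 'er', 'hg', 'sd', 'le', 'ot', 'tr', 'tr']
top5 = freq_dist_sketch(words)[:5]
print(top5)
# [('aa', 2), ('sd', 2), ('tr', 2), ('re', 1), ('er', 1)]
```

How the real freq_dist orders tokens with equal counts is not documented here, so code that depends on tie order should sort explicitly.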