enlp.understanding.distributions.freq_dist

enlp.understanding.distributions.freq_dist(tokens)

Count how often each token occurs.

Parameters

tokens (list): list of tokens to be analysed; note that these may include punctuation

Returns

count (list): list of (word, count) tuples, sorted from most to least frequent

Notes

If the words are originally in a single string, use stdtools.tokenise() to convert them to the input format.
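
As a rough illustration of the expected input format, the sketch below turns a string into a token list with a simple regular expression. Note that `simple_tokenise` is a hypothetical stand-in written for this example; it is not the `stdtools.tokenise()` API, whose signature and behaviour are not shown on this page.

```python
import re


def simple_tokenise(text):
    """Hypothetical stand-in for stdtools.tokenise() (assumption, not the real API).

    Splits text into word tokens and single punctuation tokens.
    """
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)


tokens = simple_tokenise("The cat sat. The cat ran!")
print(tokens)  # ['The', 'cat', 'sat', '.', 'The', 'cat', 'ran', '!']
```

The punctuation marks come through as tokens of their own, which is why they may appear in the frequency counts unless they are filtered out first.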

Examples

>>> from enlp.understanding.distributions import freq_dist
>>> words = ['aa', 'sd', 're', 'aa', 'er', 'hg', 'sd', 'le', 'ot', 'tr', 'tr']
>>> print(freq_dist(words)[:5])  # top 5 words
[('aa', 2), ('sd', 2), ('tr', 2), ('re', 1), ('er', 1)]
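
For readers who want to see how a result of this shape can be produced, here is a minimal sketch built on `collections.Counter` from the standard library. It assumes the function behaves like a thin wrapper around `Counter.most_common()`; the name `freq_dist_sketch` is hypothetical, and the actual enlp implementation may differ.

```python
from collections import Counter


def freq_dist_sketch(tokens):
    """Hypothetical re-implementation (assumption, not the enlp source).

    Returns a list of (token, count) tuples, most frequent first.
    """
    # Counter tallies each token; most_common() sorts by descending count and
    # keeps first-seen order among tokens with equal counts (Python 3.7+)
    return Counter(tokens).most_common()


words = ['aa', 'sd', 're', 'aa', 'er', 'hg', 'sd', 'le', 'ot', 'tr', 'tr']
print(freq_dist_sketch(words)[:5])
# [('aa', 2), ('sd', 2), ('tr', 2), ('re', 1), ('er', 1)]
```

Under this assumption, ties are ordered by first appearance in the input, which matches the order of 'aa', 'sd' and 'tr' in the example output.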