enlp.understanding.distributions.freq_dist
enlp.understanding.distributions.freq_dist(tokens)
Count the frequency of each token.
Parameters
    tokens : list
        List of tokens to be analysed. Note that these may include punctuation.
Returns
    count : list
        List of (word, count) tuples, one per distinct token, sorted by frequency with the most frequent first.
Notes
If the words are originally in a single string, use stdtools.tokenise() to convert them to the list-of-tokens input format first.
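The signature of stdtools.tokenise() is not shown on this page, so the following is only a sketch of the string-to-tokens step using the standard library. The name `simple_tokenise` and its regular expression are hypothetical stand-ins, not part of enlp; the real tokeniser may split text differently.

```python
import re


def simple_tokenise(text):
    """Hypothetical stand-in for stdtools.tokenise(); not the enlp code."""
    # \w+ matches a run of word characters (a word or number);
    # [^\w\s] matches one punctuation character, kept as its own token.
    return re.findall(r"\w+|[^\w\s]", text)


tokens = simple_tokenise("The cat sat. The cat slept!")
print(tokens)
```

The resulting list keeps punctuation marks as separate tokens, which is the kind of input the tokens parameter says it accepts.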
Examples
>>> words = ['aa', 'sd', 're', 'aa', 'er', 'hg', 'sd', 'le', 'ot', 'tr', 'tr']
>>> print(freq_dist(words)[:5])  # top 5 words
[('aa', 2), ('sd', 2), ('tr', 2), ('re', 1), ('er', 1)]
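For readers who want to see how a result of this shape can be produced, here is a minimal sketch built on the standard library. It is not the enlp source: `freq_dist_sketch` is a hypothetical name, and the use of `collections.Counter` is an assumption about one reasonable way to count and sort tokens.

```python
from collections import Counter


def freq_dist_sketch(tokens):
    """Hypothetical stand-in for freq_dist; not the enlp implementation."""
    # Counter tallies each token. most_common() with no argument returns
    # every (token, count) pair, sorted from most to least frequent.
    return Counter(tokens).most_common()


words = ['aa', 'sd', 're', 'aa', 'er', 'hg', 'sd', 'le', 'ot', 'tr', 'tr']
top5 = freq_dist_sketch(words)[:5]
print(top5)
```

In this sketch, tokens with equal counts keep the order in which they first appear in the input (Python 3.7+), because the underlying sort is stable. Whether freq_dist itself breaks ties the same way is not stated on this page.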