enlp.processing.stdtools.retain_spaces

enlp.processing.stdtools.retain_spaces(processed)

Retain the original spacing around punctuation at the end of a sentence.

Use this function after joining tokens with spaces, when the joined text should keep the spacing around punctuation that the original text had.

Without the function: lemma = ‘the quick brown fox jump over the lazy dog .’

With the function: lemma = ‘the quick brown fox jump over the lazy dog.’

Parameters
processed : str
    Processed text string.

Returns
updated_text : str
    Processed sentence, updated so that the spacing around symbols matches the original.

Notes

Only punctuation at the end of a sentence is handled. Other symbols, for example %, $ or #, are not accounted for.
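The behaviour described above can be approximated with a single regular expression. The following is a minimal sketch, not the library's actual implementation: the name `retain_spaces_sketch` is hypothetical, and the choice of `.`, `!` and `?` as the sentence-final punctuation marks is an assumption.

```python
import re

# Assumption: sentence-final punctuation means '.', '!' or '?'.
# The set handled by the real retain_spaces may differ.
_SPACE_BEFORE_END_PUNCT = re.compile(r"\s+([.!?])")


def retain_spaces_sketch(processed: str) -> str:
    """Hypothetical stand-in: drop whitespace that precedes '.', '!' or '?'."""
    return _SPACE_BEFORE_END_PUNCT.sub(r"\1", processed)


print(retain_spaces_sketch("the quick brown fox jump over the lazy dog ."))
# the quick brown fox jump over the lazy dog.

# Symbols such as % are left untouched, matching the limitation noted above.
print(retain_spaces_sketch("a 5 % rise ."))
# a 5 % rise.
```

Because the pattern only targets whitespace directly before one of the three marks, the space around `%` in the second call is preserved.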

Examples

>>> from enlp.processing.stdtools import retain_spaces
>>> tokens = ['Den', 'raske', 'brune', 'reven', 'hoppet', 'over', 'den', 'late', 'hunden', '.']
>>> joined_tokens = ' '.join(tokens)
>>> print('Original: ', joined_tokens)
Original:  Den raske brune reven hoppet over den late hunden .
>>> print('Fixed spaces: ', retain_spaces(joined_tokens))
Fixed spaces:  Den raske brune reven hoppet over den late hunden.