I can read the text into a variable successfully, but when I try to tokenize it, I get this strange error:
sentences = nltk.sent_tokenize(sample)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 11: ordinal not in range(128)
I know the error is caused by some special line or character that the tokenizer cannot read or decode, but how can I get around this? Thanks.
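For context, byte 0xe2 is never valid ASCII (ASCII stops at 127). It is, however, the first byte of many UTF-8 encoded punctuation marks, such as curly quotes. The sketch below uses a hypothetical sample string (not the asker's data) and assumes the real text is UTF-8. It reproduces the same kind of error by decoding UTF-8 bytes as ASCII, then shows that decoding as UTF-8 yields a proper text string:

```python
# Hypothetical sample: "It doesn’t work." with a curly apostrophe (U+2019),
# which UTF-8 stores as the three bytes 0xE2 0x80 0x99.
raw = "It doesn\u2019t work.".encode("utf-8")

# Decoding as ASCII fails on the first byte >= 128, matching the error above.
bad_pos = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    bad_pos = err.start  # offset of the byte ASCII could not handle
    print(err)  # 'ascii' codec can't decode byte 0xe2 in position 8: ordinal not in range(128)

# Decoding as UTF-8 (an assumption about the file's real encoding) succeeds;
# this decoded text string is what would be passed to the tokenizer.
text = raw.decode("utf-8")
print(text)
```

When reading from a file, assuming it really is UTF-8, opening it with `open(path, encoding="utf-8")` on Python 3, or `io.open(path, encoding="utf-8")` on Python 2, returns already decoded text.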
user4197202