Convert pandas dataframe to utf8

Question

How do I convert a pandas DataFrame to Unicode?

 messages = pandas.read_csv('data/SMSSpamCollection', sep='\t',
                            quoting=csv.QUOTE_NONE, names=["label", "message"])

 def split_into_tokens(message):
     message = unicode(message, 'utf8')  # convert bytes into proper unicode
     return TextBlob(message).words

 messages.head().apply(split_into_tokens(messages))

It gives this error:

 Traceback (most recent call last):
   File "minor.py", line 46, in <module>
     messages.head().apply(split_into_tokens(messages))
   File "minor.py", line 42, in split_into_tokens
     message = unicode(message, 'utf8') # convert bytes into proper unicode
 TypeError: coercing to Unicode: need string or buffer, DataFrame found

+5

python-3.x pandas

ADITYA KUMAR Feb 25 '17 at 13:47

2 answers

 df.x.str.encode('UTF-8')

This should fix your problem. Here `df` is your DataFrame and `x` is the column holding the text.

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.encode.html
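
For illustration, here is a minimal sketch of what `Series.str.encode` does, assuming Python 3 (where it returns `bytes` objects) and a small made-up Series. The column contents are hypothetical and not taken from the question's data. `Series.str.decode` is the inverse operation, which is what you need when a column already holds UTF-8 bytes and you want text.

```python
import pandas as pd

# Hypothetical sample data with non-ASCII characters.
text = pd.Series(["café", "naïve"])

# Encode each string to UTF-8 bytes (element-wise).
encoded = text.str.encode("utf-8")

# Decode the bytes back into text (element-wise).
decoded = encoded.str.decode("utf-8")

print(encoded.tolist())
print(decoded.tolist())
```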

+3

jason m Feb 25 '17 at 16:17

sandepp · Accepted Answer · 2017-02-25T16:04:49+0000

Change the code

 messages.head().apply(split_into_tokens(messages))

to

 messages.head().apply(split_into_tokens)

When you use `apply` with a function, as in your case, you pass the function itself and do not supply its arguments; `apply` calls it for you. Your code instead calls `split_into_tokens(messages)` with the whole DataFrame, which raises the error.
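
A rough sketch of this idea follows, under a few assumptions that are not in the original post: it targets Python 3 (where `unicode` does not exist and bytes are decoded with `.decode`), it uses a tiny made-up DataFrame instead of the SMS file, and it substitutes a plain whitespace split for `TextBlob(message).words`. It also applies the function to the `message` column rather than to the whole DataFrame, because `DataFrame.apply` hands each column to the function as a Series by default, whereas `Series.apply` hands it one value at a time.

```python
import pandas as pd

def split_into_tokens(message):
    # Decode only if the value is raw bytes; text passes through unchanged.
    if isinstance(message, bytes):
        message = message.decode("utf-8")
    # Stand-in for TextBlob(message).words (assumption: whitespace split).
    return message.split()

# Hypothetical data mimicking the "label"/"message" layout from the question.
messages = pd.DataFrame({
    "label": ["ham", "spam"],
    "message": ["Go until jurong point", b"Free entry in 2 a wkly comp"],
})

# Pass the function object itself; apply calls it once per message.
tokens = messages["message"].head().apply(split_into_tokens)
print(tokens.tolist())
```

Writing `apply(split_into_tokens(messages))` would run the function once, up front, on the entire DataFrame, which is the call that fails in the traceback.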
