Count individual words in a Pandas data frame

I am trying to count individual words in a column of my data frame. The texts are actually tweets, and the column looks like this:

text
this is some text that I want to count
That all I wan't
It is unicode text

From other Stack Overflow questions, I found that I could use the approach described in the following:

Calculate the most frequent 100 words of sentences in the Dataframe Pandas

Count the different words from Pandas Data Frame

My DataFrame is called result, and this is my code:

from collections import Counter
result2 = Counter(" ".join(result['text'].values.tolist()).split(" ")).items()
result2

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-6-2f018a9f912d> in <module>()
      1 from collections import Counter
----> 2 result2 = Counter(" ".join(result['text'].values.tolist()).split(" ")).items()
      3 result2
TypeError: sequence item 25831: expected str instance, float found

The dtype of the text column is object, which, as I understand it, is valid for Unicode text data.

2 answers

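As the error says, some of the values in your column (result['text']) are of type float. If you want them treated as strings in ' '.join(), you need to convert them to strings before passing them to str.join().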

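You can use Series.astype() to convert all the values to strings. Also, you do not need .tolist(); you can pass the Series directly to str.join(). Example: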

result2 = Counter(" ".join(result['text'].astype(str)).split(" ")).items()

Demo:

In [60]: df = pd.DataFrame([['blah'],['asd'],[10.1]],columns=['A'])

In [61]: df
Out[61]:
      A
0  blah
1   asd
2  10.1

In [62]: ' '.join(df['A'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-62-77e78c2ee142> in <module>()
----> 1 ' '.join(df['A'])

TypeError: sequence item 2: expected str instance, float found

In [63]: ' '.join(df['A'].astype(str))
Out[63]: 'blah asd 10.1'
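
One thing to watch for with astype(str): if the floats are missing values (NaN), which is a common cause of this error, they become the literal string 'nan' and get counted as a word. Below is a minimal sketch, assuming a hypothetical frame named result with a text column, that drops missing rows first and splits on any whitespace rather than on a single space:

```python
from collections import Counter

import numpy as np
import pandas as pd

# Hypothetical sample data; the NaN stands in for a missing tweet.
result = pd.DataFrame({"text": ["this is some text", np.nan, "It is unicode text"]})

# Drop missing values, then convert anything left to str before joining.
texts = result["text"].dropna().astype(str)

# split() with no argument splits on any run of whitespace,
# so double spaces and newlines do not produce empty "words".
word_counts = Counter(" ".join(texts).split())

print(word_counts.most_common(3))
```

Whether to drop the missing rows or keep them as text depends on the data; dropping is only an assumption that suits word counting.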
+6

Another option is to build a Series of the words and call value_counts() on it:

pd.set_option('display.max_rows', 100)
words = pd.Series(' '.join(result['text'].astype(str)).lower().split(" ")).value_counts()[:100]
words

This lowercases the text before splitting and keeps the 100 most frequent words.
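
A similar count can be sketched with pandas string methods alone, without joining everything into one large string. This assumes pandas 0.25 or later (for Series.explode); the frame and column names are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the NaN stands in for a missing tweet.
result = pd.DataFrame({"text": ["This is some text", np.nan, "it IS unicode text"]})

words = (
    result["text"]
    .dropna()        # skip missing tweets instead of counting 'nan'
    .astype(str)     # guard against any remaining non-string values
    .str.lower()     # count 'IS' and 'is' as the same word
    .str.split()     # split each tweet on any whitespace
    .explode()       # one row per word
)

# value_counts() sorts by frequency, so head(100) keeps the top 100.
top_words = words.value_counts().head(100)
print(top_words)
```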

+2

Source: https://habr.com/ru/post/1612472/

