Remove punctuation in pandas

code: df['review'].head()

output:
index  review
0      These flannel wipes are OK, but in my opinion

I want to remove punctuation from a DataFrame column and store the result in a new column.

code: import string 
      def remove_punctuations(text):
          return text.translate(None,string.punctuation)

      df["new_column"] = df['review'].apply(remove_punctuations)

Error:
  return text.translate(None,string.punctuation)
  AttributeError: 'float' object has no attribute 'translate'

I am using Python 2.7. Any suggestions would be helpful.

3 answers

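Use the pandas str.replace method with a regular expression. The .str accessor leaves missing values (NaN, which pandas stores as a float) untouched instead of raising AttributeError. On pandas 0.23 or later, pass regex=True explicitly, because since pandas 2.0 the pattern is treated as a literal string by default:

df["new_column"] = df['review'].str.replace(r'[^\w\s]', '', regex=True)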
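
To see why this avoids the error from the question, here is a minimal sketch. The sample DataFrame is made up for illustration (the NaN row stands in for a missing review), and it assumes a pandas version that accepts the regex keyword:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the second row imitates a missing review.
df = pd.DataFrame(
    {"review": ["These flannel wipes are OK, but in my opinion", np.nan]}
)

# [^\w\s] matches any character that is neither a word character nor whitespace.
# The .str accessor propagates NaN instead of calling a string method on a float.
df["new_column"] = df["review"].str.replace(r"[^\w\s]", "", regex=True)

print(df["new_column"].tolist())
```

Note that \w treats the underscore as a word character, so underscores are kept by this pattern.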

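You can build a regular expression character class from string.punctuation. Escape it with re.escape first; otherwise the "\]" inside string.punctuation is read as an escaped "]" and backslashes are not removed:

import re, string
df["new_column"] = df['review'].str.replace('[{}]'.format(re.escape(string.punctuation)), '', regex=True)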
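
The effect of escaping the character class can be checked with the standard re module alone. This is only a sketch; the sample string is invented for illustration:

```python
import re
import string

# Unescaped: the "\]" inside string.punctuation is parsed as an escaped "]",
# so the backslash itself never becomes part of the character class.
naive = re.compile("[{}]".format(string.punctuation))

# Escaped: every punctuation character is matched literally.
escaped = re.compile("[{}]".format(re.escape(string.punctuation)))

sample = "path\\to [file], ok?"  # hypothetical input containing a backslash
print(naive.sub("", sample))     # the backslash survives
print(escaped.sub("", sample))   # the backslash is removed as well
```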

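I solved the problem by looping over string.punctuation and removing each character in turn:

import string

def remove_punctuations(text):
    # Strip every punctuation character, one at a time.
    for punctuation in string.punctuation:
        text = text.replace(punctuation, '')
    return text

You can apply the function the same way you did, and it should work as long as every value in the column is a string. A missing value (NaN, which is a float) has no replace method either and would raise the same kind of AttributeError.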

df["new_column"] = df['review'].apply(remove_punctuations)
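
If the column may contain missing values, one option is to pass non-string values through unchanged. The sketch below assumes Python 3 (the question uses Python 2.7, where str.translate has a different signature) and is a variant for illustration, not the original poster's code:

```python
import string

# Translation table that deletes every punctuation character (Python 3 API).
_DELETE_PUNCTUATION = str.maketrans("", "", string.punctuation)

def remove_punctuations(text):
    # pandas represents missing values as float NaN; leave non-strings as they are.
    if not isinstance(text, str):
        return text
    return text.translate(_DELETE_PUNCTUATION)

print(remove_punctuations("These flannel wipes are OK, but in my opinion"))
print(remove_punctuations(float("nan")))

# With a DataFrame, it would be applied the same way as before:
# df["new_column"] = df["review"].apply(remove_punctuations)
```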

Source: https://habr.com/ru/post/1656294/

