How to delete lines not starting with 'x' in Pandas, or save lines starting with 'x'

Question


I've been at this all morning and have slowly pieced things together. But for the life of me, I can't figure out how to use the .str.startswith() function in Pandas.

My XLSX table is as follows:

    1 Name, Registration Date, Phone number
    2 John Doe, 2015-11-20T19:54:45Z, 1.1112223333
    3 Jane Doe, 2015-11-20T20:44:26Z, 65.1112223333
    etc...

So I import it as a data frame, clean up the headers so that there are no spaces, etc., and then I want to delete any rows whose phone number does not start with "1" (put differently, keep the rows starting with "1" and delete all the others). In this short example, that means deleting the entire "Jane Doe" entry, since its phone number starts with "65".

    import pandas as pd

    df = pd.read_excel('testingpanda.xlsx', sheetname = 'Export 1')

    def colHeaderCleaner():
        cols = df.columns
        cols = cols.map(lambda x: x.replace(' ', '_') if isinstance(x, (str, unicode)) else x)
        df.columns = cols
        df.columns = [x.lower() for x in df.columns]

    colHeaderCleaner()

    #by default it sets the values in 'registrant_phone' as float64, so this is fixing that...
    df['registrant_phone'] = df['registrant_phone'].astype('object')

The closest I've gotten, and by that I mean the only line I could run without annoying tracebacks and other errors, is:

 df['registrant_phone'] = df['registrant_phone'].str.startswith('1')

But all that does is convert every phone value to "NaN"; it keeps all the rows and everything, as shown below:

    print df
    [output] name, registration_date, phone_number
    [output] John Doe, 2015-11-20T19:54:45Z, NaN
    [output] Jane Doe, 2015-11-20T20:44:26Z, NaN

I've searched too many places to even try to list, and I've tried different versions of df.drop, but I just can't figure it out. Where do I go from here?


python pandas

Mxracer888 Feb 03 '16 at 19:47


1 answer

Ami Tavory · Accepted Answer · 2016-02-03T20:00:45+0000

I am a little confused by your question. In any case, if you have a DataFrame df with a column 'c', and you want to remove the rows whose values start with 1, then the safest way would be to use something like:

 df = df[~df['c'].astype(str).str.startswith('1')]
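To make the line above concrete, here is a minimal sketch using made-up sample data modeled on the table in the question (the DataFrame contents and the variable names starts_with_1, kept and removed are assumptions for illustration, not part of the original post). It builds the boolean mask once and uses it both ways: without ~ to keep the rows starting with '1', which is what the question asks for, and with ~ to drop them. Note that the mask is used to index df, not assigned back to the column as in the question's attempt, which overwrites the phone values instead of filtering rows.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's table; the phone
# numbers are floats, as they are after reading the spreadsheet.
df = pd.DataFrame({
    'name': ['John Doe', 'Jane Doe'],
    'registrant_phone': [1.1112223333, 65.1112223333],
})

# Cast to str first so that .str.startswith works on text, not floats.
starts_with_1 = df['registrant_phone'].astype(str).str.startswith('1')

# Index the DataFrame with the boolean mask to filter rows.
kept = df[starts_with_1]       # rows whose phone starts with '1'
removed = df[~starts_with_1]   # the complement: everything else

print(kept)
```

The astype(str) cast is what makes this safe: the string methods are only meaningful on string values, so applying them to a column that still holds floats is likely what produced the NaN values in the question.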
