I've been at this all morning and have slowly pieced things together. But for the life of me, I can't figure out how to use the .str.startswith() function in Pandas.
My XLSX table is as follows:
1 Name, Registration Date, Phone number
2 John Doe, 2015-11-20T19:54:45Z, 1.1112223333
3 Jane Doe, 2015-11-20T20:44:26Z, 65.1112223333
etc...
So, I import it as a data frame and clean up the headers so that there are no spaces, etc. Then I want to delete any rows whose phone number does not start with "1" (or keep the rows starting with "1" and delete all the others). In this short example, that means deleting the entire "Jane Doe" entry, as its phone number starts with "65".
import pandas as pd

df = pd.read_excel('testingpanda.xlsx', sheetname = 'Export 1')

def colHeaderCleaner():
    cols = df.columns
    cols = cols.map(lambda x: x.replace(' ', '_') if isinstance(x, (str, unicode)) else x)
    df.columns = cols
    df.columns = [x.lower() for x in df.columns]

colHeaderCleaner()
The closest I've gotten, and by that I mean the only line I could get to execute without annoying tracebacks and other errors, is:
df['registrant_phone'] = df['registrant_phone'].str.startswith('1')
But all that does is convert all the phone values to "NaN". It keeps every row, as shown below:
print df
[output] name, registration_date, phone_number
[output] John Doe, 2015-11-20T19:54:45Z, NaN
[output] Jane Doe, 2015-11-20T20:44:26Z, NaN
I've searched too many places to even try to list, tried different versions of df.drop, and just can't figure it out. Where do I go from here?
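For reference, here is a minimal sketch of the result I'm after. It rests on assumptions not confirmed above: the phone column may have been read in as numbers rather than strings (which might be why .str.startswith gives NaN), and the small in-memory frame below is a hypothetical stand-in for the spreadsheet. The idea is to cast the column to str, build a True/False Series, and use it as a row mask rather than assigning it back to the column.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet. The phone values are floats
# here on the assumption that read_excel parsed them as numbers.
df = pd.DataFrame({
    "name": ["John Doe", "Jane Doe"],
    "registration_date": ["2015-11-20T19:54:45Z", "2015-11-20T20:44:26Z"],
    "phone_number": [1.1112223333, 65.1112223333],
})

# Cast to str first so .str.startswith compares text rather than numbers.
starts_with_1 = df["phone_number"].astype(str).str.startswith("1")

# Use the boolean Series as a row mask; assigning it back to the column
# would overwrite the phone values instead of filtering the rows.
filtered = df[starts_with_1]
print(filtered)
```

With this sample data only the John Doe row should remain. Note that a plain startswith("1") would also keep a number such as 12.5, so matching on "1." may be closer to the intent if the values really do contain a dot after the country code.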