I'm working with a large DataFrame and I'm struggling to find an efficient way to remove specific dates. Note that I want to remove every measurement taken on a given date.
Pandas has this great feature where you can call:
df.ix['2016-04-22']
and pull out all the rows from that day. But what if I want to delete all rows from "2016-04-22"?
I need a function like this:
df.ix[~'2016-04-22']
(but it does not work)
Also, what if I want to delete a whole list of dates?
Now I have the following solution:
import numpy as np
import pandas as pd
from numpy import random

# This is the list of dates I want to remove
removelist = ['2016-04-22', '2016-04-24']
This for loop grabs the index entries for each date I want to delete, removes them from the main DataFrame's index, and then positively selects the remaining (i.e. good) dates from the DataFrame.
for r in removelist:
    elimlist = df.ix[r].index.tolist()
    ind = df.index.tolist()
    culind = [i for i in ind if i not in elimlist]
    df = df.ix[culind]
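To make this reproducible, here is a minimal self-contained sketch of the same loop. The sample frame, its `value` column, and the hourly index are made up for illustration, and I use `.loc` in place of `.ix` because `.ix` is not available in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: hourly measurements over five days,
# from 2016-04-21 00:00 through 2016-04-25 23:00
idx = pd.date_range("2016-04-21", periods=120, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"value": np.arange(len(idx))}, index=idx)

# This is the list of dates I want to remove
removelist = ["2016-04-22", "2016-04-24"]

for r in removelist:
    # Partial string indexing: every row whose timestamp falls on date r
    elimlist = df.loc[r].index.tolist()
    ind = df.index.tolist()
    # Keep only the index entries that are not on the date being removed
    culind = [i for i in ind if i not in elimlist]
    df = df.loc[culind]
```

With this sample data, only the rows from April 21, 23, and 25 remain afterwards.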
Is there anything better?
I also tried indexing with a rounded date + 1 day, so something like this:
df[~((df['Timestamp'] < r+pd.Timedelta("1 day")) & (df['Timestamp'] > r))]
But it gets very cumbersome and, at the end of the day, I would still need a for loop whenever I have to remove n specific dates.
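Spelled out for a list of dates, the mask version looks roughly like this. It is only a sketch: it assumes the timestamps live in a `Timestamp` column, the sample data is made up, and I use `>=` on the lower bound so that a measurement taken at exactly midnight is also dropped.

```python
import pandas as pd

# Hypothetical sample data: one measurement every 6 hours over four days,
# starting at 2016-04-22 00:00
stamps = pd.date_range("2016-04-22", periods=16, freq=pd.Timedelta(hours=6))
df = pd.DataFrame({"Timestamp": stamps, "value": range(16)})

removelist = ["2016-04-22", "2016-04-24"]

for r in removelist:
    start = pd.Timestamp(r)              # midnight at the start of the date
    end = start + pd.Timedelta("1 day")  # midnight at the start of the next day
    # True for every row whose timestamp falls on date r
    on_date = (df["Timestamp"] >= start) & (df["Timestamp"] < end)
    df = df[~on_date]
```

It works on the sample data, but it still needs one pass over the frame per date.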
There has to be a better way! Right? Maybe?