Pandas: remove NaN only at the beginning and at the end of the data frame

Question

I have a pandas DataFrame that looks like this:

      sum
1948  NaN
1949  NaN
1950    5
1951    3
1952  NaN
1953    4
1954    8
1955  NaN

and I would like to remove the NaN values at the beginning and at the end ONLY (i.e. only the rows from 1950 to 1954 should remain, including the NaN in between). I already tried .isnull() and dropna(), but somehow I could not find the right solution. Can anyone help?
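
To see why a plain dropna() does not fit here, a minimal sketch that rebuilds the example frame (values taken from the data reproduced in the answers, with the years as the index and a single column named 'sum') and compares dropna() with the desired 1950 to 1954 slice. dropna() also throws away the interior NaN in 1952.

```python
import numpy as np
import pandas as pd

# Example frame from the question: years as index, one column named 'sum'
df = pd.DataFrame(
    {"sum": [np.nan, np.nan, 5, 3, np.nan, 4, 8, np.nan]},
    index=range(1948, 1956),
)

# dropna() removes every NaN row, including the interior one in 1952
dropped = df.dropna()
print(dropped.index.tolist())   # [1950, 1951, 1953, 1954]

# The desired result keeps 1950..1954, including the NaN in 1952
wanted = df.loc[1950:1954]
print(wanted.index.tolist())    # [1950, 1951, 1952, 1953, 1954]
```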

python pandas time-series nan dataframe

user3017048 Jul 20 '15 at 6:55

3 answers

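Below is an approach with NumPy:

import numpy as np
import pandas as pd

x    = np.logical_not(pd.isnull(df))   # True where a value is present
# Keep a row if some value appears at or before it AND at or after it
mask = np.logical_and(np.cumsum(x) != 0, np.cumsum(x[::-1])[::-1] != 0)

In [313]: df.loc[mask['sum'].tolist()]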

Out[313]:
      sum
1950    5
1951    3
1952  NaN
1953    4
1954    8
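
A self-contained sketch of the same cumulative-sum idea, written as a function for a single column. The name trim_nan_edges is hypothetical, and it uses Series.notna() on a plain NumPy array instead of np.logical_not(pd.isnull(...)) on the DataFrame. A row is kept when at least one valid value appears at or before it and at least one appears at or after it; an all-NaN input gives an empty result.

```python
import numpy as np
import pandas as pd

def trim_nan_edges(s: pd.Series) -> pd.Series:
    """Drop leading and trailing NaNs from a Series, keeping interior NaNs.

    Hypothetical helper illustrating the cumulative-sum mask.
    """
    valid = s.notna().to_numpy()
    # True once the first valid value has been seen (scanning forwards)
    seen_before = np.cumsum(valid) > 0
    # True up to the last valid value (scanning backwards, then re-reversed)
    seen_after = np.cumsum(valid[::-1])[::-1] > 0
    return s[seen_before & seen_after]

s = pd.Series([np.nan, np.nan, 5, 3, np.nan, 4, 8, np.nan],
              index=range(1948, 1956), name="sum")
print(trim_nan_edges(s))
```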

Colonel beauvel Jul 20 '15 at 8:20

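import numpy as np
import pandas as pd

# your data
# ==============================
df = pd.DataFrame({'sum': [np.nan, np.nan, 5, 3, np.nan, 4, 8, np.nan]},
                  index=range(1948, 1956))
df

      sum
1948  NaN
1949  NaN
1950    5
1951    3
1952  NaN
1953    4
1954    8
1955  NaN

# processing
# ===============================
# forward fill: only the leading NaNs stay NaN, so dropna() removes them
idx = df.ffill().dropna().index
# back fill the remaining rows: only the trailing NaNs stay NaN, so dropna() removes them
res_idx = df.loc[idx].bfill().dropna().index
df.loc[res_idx]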

      sum
1950    5
1951    3
1952  NaN
1953    4
1954    8
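
The same forward-fill/back-fill idea can also be written as one boolean mask, which avoids the two intermediate .loc lookups. A minimal sketch for the single 'sum' column, using the .ffill()/.bfill() methods rather than fillna(method=...), which recent pandas versions deprecate: after a forward fill only the leading NaNs remain missing, after a back fill only the trailing NaNs remain missing, so a row is kept when it is non-missing in both.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"sum": [np.nan, np.nan, 5, 3, np.nan, 4, 8, np.nan]},
    index=range(1948, 1956),
)

col = df["sum"]
# Forward fill: only the leading NaNs (before the first value) stay NaN
after_first = col.ffill().notna()
# Back fill: only the trailing NaNs (after the last value) stay NaN
before_last = col.bfill().notna()

res = df[after_first & before_last]
print(res)
```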

Jianxun Li Jul 20 '15 at 7:06

Edchum · Accepted Answer · 2015-07-20T08:12:25+0000

Use the built-in first_valid_index and last_valid_index, which are designed specifically for this, and slice your df with them:

In [5]:

first_idx = df.first_valid_index()
last_idx = df.last_valid_index()
print(first_idx, last_idx)
df.loc[first_idx:last_idx]
1950 1954
Out[5]:
      sum
1950    5
1951    3
1952  NaN
1953    4
1954    8
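
One caveat worth checking with this approach: if every value is NaN, first_valid_index() and last_valid_index() both return None, and df.loc[None:None] is the whole, unsliced frame rather than an empty one. A minimal sketch with a guard; the helper name trim_to_valid is hypothetical, and the label slice assumes a sorted, unique index such as the years here.

```python
import numpy as np
import pandas as pd

def trim_to_valid(df: pd.DataFrame) -> pd.DataFrame:
    """Slice df from its first to its last row containing a non-NaN value.

    Hypothetical helper: returns an empty frame when there is no valid row.
    """
    first_idx = df.first_valid_index()
    last_idx = df.last_valid_index()
    if first_idx is None:          # all values are NaN (or df has no rows)
        return df.iloc[0:0]
    # .loc slicing is label-based and includes both end points
    return df.loc[first_idx:last_idx]

df = pd.DataFrame(
    {"sum": [np.nan, np.nan, 5, 3, np.nan, 4, 8, np.nan]},
    index=range(1948, 1956),
)
print(trim_to_valid(df))
print(trim_to_valid(pd.DataFrame({"sum": [np.nan, np.nan]})))  # empty frame
```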
