Pandas DataFrame is slow to show shape or dtypes

I am very new to Python and pandas. Any recommendations, comments and suggestions are appreciated!

Here is my problem: it takes several minutes to return the result after calling df.shape or df.dtypes. The DataFrame has 1,610,658 rows and 5 columns. Three columns are stored as int64, one as float64 and one as datetime64.
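One way to measure how long each call takes is sketched below. The frame is a small synthetic stand-in (made-up values, only the column names and kinds mirror the real data), and the helper name `timed` is hypothetical, so this shows the measurement and does not reproduce the slowdown.

```python
import time

import numpy as np
import pandas as pd

# Synthetic stand-in for the real frame (assumption: values are made up;
# only the column names and kinds follow the description above).
n = 1000
rng = np.random.RandomState(0)
df = pd.DataFrame({
    "site": rng.randint(0, 2, size=n),
    "acct_key": np.arange(n),
    "sold_date": pd.to_datetime("2017-01-01")
    + pd.to_timedelta(rng.randint(0, 365, size=n), unit="D"),
    "amount": rng.rand(n),
    "payment": rng.randint(0, 3, size=n),
})


def timed(label, fn):
    """Call fn(), print the elapsed wall-clock time, return (result, seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.4f}s")
    return result, elapsed


shape, shape_secs = timed("df.shape", lambda: df.shape)
dtypes, dtypes_secs = timed("df.dtypes", lambda: df.dtypes)
```

Running the same two `timed` calls in a plain script and in the console would show whether the delay comes from pandas or from the environment displaying the result.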

I used the following code to load the data and transform it in Python. Both loading and transformation perform well, but I ran into this problem when I checked the output.

Update 1:

After setting some columns as the index, the time for df.shape drops from 80+ s to 1.7 s, but df.dtypes still takes 80+ s.

import pandas as pd

###############
# Load
###############
raw = pd.read_csv("data.zip", compression='zip')

###############
# Transform
###############

payment_method = {
    "Cash": 1,
    "Card": 2,
}

df = raw.assign(
    # Encode site ids to int. Only two sites in this data
    site=(raw.site == "A").astype(int),
    # Encode payment types to int (0 for any other method)
    payment=[payment_method.get(k, 0) for k in raw.payment],
    # Rescale values
    amount=raw.amount / 1e6,
    # Convert integer date key to datetime
    sold_date=pd.to_datetime(
        [str(dt) for dt in raw.sold_date],
        format="%Y%m%d"),
)

###############
# Check point
###############

df.shape # pain point I. Took minutes to return
# Out[9]: (1610658, 5)

df.dtypes # pain point II
# Out[10]: 
# site                       int64
# acct_key                   int64
# sold_date         datetime64[ns]
# amount                   float64
# payment                    int64
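
The two list comprehensions in the transform can also be written with Series methods. Below is a minimal sketch on a hypothetical three-row stand-in for `raw` (the values are made up, including the "Voucher" payment type). It is meant to produce the same columns as the transform above; whether it has any effect on the df.shape / df.dtypes timings is not something this sketch establishes.

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents (assumption: made-up values).
raw = pd.DataFrame({
    "site": ["A", "B", "A"],
    "acct_key": [101, 102, 103],
    "sold_date": [20170101, 20170215, 20170330],
    "amount": [1500000, 250000, 3000000],
    "payment": ["Cash", "Card", "Voucher"],
})

payment_method = {"Cash": 1, "Card": 2}

df = raw.assign(
    # Encode site ids to int
    site=(raw["site"] == "A").astype(int),
    # Map payment types to int; unmapped values become NaN, then 0
    payment=raw["payment"].map(payment_method).fillna(0).astype(int),
    # Rescale values
    amount=raw["amount"] / 1e6,
    # Convert integer date key to datetime via a string Series
    sold_date=pd.to_datetime(raw["sold_date"].astype(str), format="%Y%m%d"),
)

print(df.dtypes)
```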

If I convert the data frame to a numpy.ndarray, I get the result immediately. I think I must be missing something. Please point me in the right direction.
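
For clarity, the NumPy comparison mentioned above can be sketched as follows, on a hypothetical two-row frame with the same column kinds (the values are made up). Note that with mixed column types the resulting array has a single object dtype, so it does not report the per-column types that df.dtypes does.

```python
import pandas as pd

# Hypothetical two-row frame with the same column kinds as the real data.
df = pd.DataFrame({
    "site": [1, 0],
    "acct_key": [101, 102],
    "sold_date": pd.to_datetime(["20170101", "20170215"], format="%Y%m%d"),
    "amount": [1.5, 0.25],
    "payment": [1, 2],
})

# .values returns the frame's data as a numpy.ndarray; because the columns
# mix integers, floats and datetimes, the array falls back to object dtype.
arr = df.values

print(arr.shape)
print(arr.dtype)
```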

Thank you so much!

System: OS X 10.12, Python: 3.6.1, NumPy: 1.12, Pandas: 0.20.2, Jupyter Console: 5.1.0



