Jupyter notebook kernel dies when creating dummy variables using pandas

I am working on a Walmart Kaggle competition and am trying to create dummy columns from the column "FinelineNumber". For context, df.shape returns (647054, 7). I am trying to make dummy columns for df['FinelineNumber'], which has 5,196 unique values. The result should be a DataFrame of shape (647054, 5196), which I then plan to concat onto the original DataFrame.

Almost every time I run fineline_dummies = pd.get_dummies(df['FinelineNumber'], prefix='fl'), the following error message appears: The kernel appears to have died. It will restart automatically. I am running Python 2.7 in a Jupyter notebook on a MacBook Pro with 16 GB of RAM.

Can someone explain why this happens (and why it happens most of the time, but not every time)? Is it a Jupyter notebook bug or a pandas bug? I also thought it might be due to insufficient RAM, but I get the same error on a Microsoft Azure Machine Learning notebook with > 100 GB of RAM. On Azure ML, the kernel dies every time, almost immediately.

+4
1 answer

- A DataFrame of shape (647054, 5196) contains 3,362,092,584 cells. At 8 bytes per 64-bit value, that is roughly 25 GB, well beyond the 16 GB of RAM on your MacBook Pro. On AzureML, even though the VM itself has far more memory, the notebook process is capped at a small memory quota (on the order of 2 to 4 GB), which is why the kernel there dies every time, almost immediately. Locally it depends on how much memory happens to be free, which is why it only fails most of the time.
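The arithmetic in the answer can be checked directly. A minimal sketch; the 8-bytes-per-cell figure assumes get_dummies produces float64 columns, which pandas versions of that era did:

```python
rows, cols = 647054, 5196

# Total number of cells in the dense dummy DataFrame.
cells = rows * cols  # 3,362,092,584

# Assuming 8 bytes per cell (64-bit float, the historical get_dummies default),
# convert to GiB to compare against available RAM.
gib = cells * 8 / 2**30
print(cells, round(gib, 1))  # roughly 25 GiB, well above 16 GB of RAM
```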

Consider using to_sparse(), or building the dummy matrix in a sparse format some other way. Pandas has support for sparse data structures.
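Note that to_sparse() belongs to older pandas and was removed in pandas 1.0; in current versions the same idea is expressed via get_dummies(..., sparse=True), which backs each dummy column with a SparseArray that stores only the non-zero entries. A minimal sketch, using made-up FinelineNumber values:

```python
import pandas as pd

# Tiny stand-in for the Walmart data; these FinelineNumber values are hypothetical.
df = pd.DataFrame({'FinelineNumber': [1000, 8931, 1000, 4504]})

# sparse=True returns SparseArray-backed columns instead of a dense block,
# so memory grows with the number of 1s, not with rows * unique_values.
dummies = pd.get_dummies(df['FinelineNumber'], prefix='fl', sparse=True)

print(dummies.shape)           # (4, 3): one column per unique value
print(dummies.sparse.density)  # fraction of values actually stored
```

Since each row contributes exactly one 1, the density of the real (647054, 5196) dummy frame would be 1/5196, so the sparse representation stays small even though the dense one does not.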

+7

Source: https://habr.com/ru/post/1618309/
