I have about 0.8 million 256x256 RGB images, which is over 7 GB.
I want to use them as training data for a neural network (a CNN), and I want to store them in a cPickle file along with their labels.
Right now this takes so much memory that the machine has to swap to the hard drive, and nearly all of the swap space gets consumed.
Is this a bad idea?
What would be a smarter / more practical way to load the images into the CNN, or to preprocess them, without causing too many memory problems?
The code looks like this:
    import numpy as np
    import cPickle
    from PIL import Image
    import sys, os

    pixels = []
    labels = []
    traindata = []
    data = []
    for subdir, dirs, files in os.walk('images'):
        curdir = ''
        for file in files:
            if file.endswith(".jpg"):
                floc = str(subdir) + '/' + str(file)
                im = Image.open(floc)
                pix = np.array(im.getdata())
                pixels.append(pix)
                labels.append(1)
    pixels = np.array(pixels)
    labels = np.array(labels)
    traindata.append(pixels)
    traindata.append(labels)
    traindata = np.array(traindata)
    .....
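For context, here is a minimal sketch (Python 3, toy sizes, dummy pixel data instead of real JPEG decoding) of one alternative I have been considering: writing the images into a disk-backed `numpy.memmap` array instead of one giant in-memory pickle, and then slicing mini-batches from it lazily. The file name `train.dat` and the shapes are made up for illustration.

```python
import numpy as np

# Hypothetical sizes for illustration; the real dataset is ~0.8M 256x256 RGB images.
n_images, h, w, c = 100, 8, 8, 3

# Create a disk-backed array; writes go to the file, not to RAM.
arr = np.memmap('train.dat', dtype=np.uint8, mode='w+',
                shape=(n_images, h, w, c))
labels = np.zeros(n_images, dtype=np.int64)

for i in range(n_images):
    # In the real pipeline this would be np.asarray(Image.open(path));
    # here we fill each image with a constant so the sketch is self-contained.
    arr[i] = i % 256
    labels[i] = 1
arr.flush()

# Later: reopen read-only and slice mini-batches without loading everything.
train = np.memmap('train.dat', dtype=np.uint8, mode='r',
                  shape=(n_images, h, w, c))
batch = train[0:32]
print(batch.shape)  # (32, 8, 8, 3)
```

Only the slices you actually touch are paged into memory, so the working set stays at roughly one mini-batch rather than the whole 7 GB+ dataset.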