I am trying to do some statistics on my data using SciPy, but my input dataset is quite large (~1.9 GB) and in dbf format. The file is large enough that NumPy returns an error message when I try to create an array with genfromtxt. (I have 3 GB of RAM, but I'm running win32.)
i.e.:
Traceback (most recent call last):
File "<pyshell#5>", line 1, in <module>
ind_sum = numpy.genfromtxt(r"W:\RACER_Analyses\Terrestrial_Heterogeneity\IND_SUM.dbf", dtype = (int, int, int, float, float, int), names = True, usecols = (5))
File "C:\Python26\ArcGIS10.0\lib\site-packages\numpy\lib\npyio.py", line 1335, in genfromtxt
for (i, line) in enumerate(itertools.chain([first_line, ], fhd)):
MemoryError
From other posts, I have seen that the chunked array provided by PyTables may be useful, but my problem is getting the data read in to begin with. Or, in other words, PyTables (or PyHDF) can easily create the HDF5 output that I want, but what should I do to get my data into an array first?
For instance:
import numpy, scipy, tables
h5file = tables.openFile(r"W:\RACER_Analyses\Terrestrial_Heterogeneity\HET_IND_SUM2.h5", mode = "w", title = "Diversity Index Results")
group = h5file.createGroup("/", "IND_SUM", "Aggregated Index Values")
and then I could create a table or an array, but how do I refer back to the original dbf data? In the description?
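What I imagine next is something like the sketch below (continuing from the snippet above): define a table that mirrors the dbf's columns and append the records one at a time, so the whole file never has to sit in memory. I have not tried this, and the dbfread import, the IndSum description, and the column names in it are just placeholders for whatever the right reader and layout turn out to be:

from dbfread import DBF  # placeholder: any reader that yields one dbf record at a time

class IndSum(tables.IsDescription):
    # column names and types are guesses based on the dtype I passed to genfromtxt
    col_a = tables.Int32Col()
    col_b = tables.Int32Col()
    col_c = tables.Int32Col()
    col_d = tables.Float64Col()
    col_e = tables.Float64Col()
    col_f = tables.Int32Col()

table = h5file.createTable(group, "ind_sum", IndSum, "IND_SUM records")
row = table.row
for rec in DBF(r"W:\RACER_Analyses\Terrestrial_Heterogeneity\IND_SUM.dbf"):
    # copy one record at a time so the whole dbf never sits in memory
    for name in table.colnames:
        row[name] = rec[name]  # assumes the dbf field names match the column names above
    row.append()
table.flush()
h5file.close()

If something along those lines is the right approach, what I mainly need to know is what to use in place of dbfread and how to build the table description from the dbf's own field definitions.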
Thanks!