Add a metadata comment to a NumPy ndarray

I have a NumPy ndarray made of three large arrays, and I just want to save the path to the file that generated the data somewhere alongside it. Some information about the array:

A = array([[ 6.52479351e-01,  6.54686928e-01,  6.56884432e-01, ...,
             2.55901861e+00,  2.56199503e+00,  2.56498647e+00],
           [            nan,             nan,  9.37914686e-17, ...,
             1.01366425e-16,  3.20371075e-16, -6.33655223e-17],
           [            nan,             nan,  8.52057308e-17, ...,
             4.26943463e-16,  1.51422386e-16,  1.55097437e-16]], dtype=float32)

I can't just add the path as another row of the ndarray, because it would have to be the same length as the other three.

I could add a row of np.zeros(len(A[0])) and make its first value the string, so that I could get it back with A[-1][0], but that seems ridiculous.

Is there some kind of metadata field I can use to store a string like '/Documents/Data/foobar.txt', so that I can get it back with something like A.metadata.comment?

Thanks!

2 answers

TobiasR's comment is the easiest way, but you can also subclass ndarray. See the numpy documentation on subclassing or this question.

import numpy as np

class MetaArray(np.ndarray):
    """Array with metadata."""

    def __new__(cls, array, dtype=None, order=None, **kwargs):
        # View the input as a MetaArray and attach the metadata dict.
        obj = np.asarray(array, dtype=dtype, order=order).view(cls)
        obj.metadata = kwargs
        return obj

    def __array_finalize__(self, obj):
        # Called whenever a new MetaArray is created (e.g. by slicing);
        # copy the metadata over from the source array, if it has any.
        if obj is None:
            return
        self.metadata = getattr(obj, 'metadata', None)

Usage example:

>>> a = MetaArray([1, 2, 3], comment='/Documents/Data/foobar.txt')
>>> a.metadata
{'comment': '/Documents/Data/foobar.txt'}
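Not part of the original answer, but worth noting: because __array_finalize__ copies the attribute, the metadata survives slicing and other operations that create views of the array:

>>> b = a[1:]            # slicing goes through __array_finalize__
>>> b.metadata
{'comment': '/Documents/Data/foobar.txt'}

Keep in mind that the attribute only lives in memory; np.save() writes just the array data, so the comment is lost on disk. That is where the HDF5 answer below comes in.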

It sounds like you might want to store the metadata persistently alongside your array. If so, HDF5 is a great option for a storage container.

For example, create an array and save it to an HDF5 file with some metadata using h5py:

import numpy as np
import h5py

some_data = np.random.random((100, 100))

with h5py.File('data.hdf', 'w') as outfile:
    dataset = outfile.create_dataset('my data', data=some_data)
    dataset.attrs['an arbitrary key'] = 'arbitrary values'
    dataset.attrs['foo'] = 10.2

Then we can read it:

import h5py

with h5py.File('data.hdf', 'r') as infile:
    dataset = infile['my data']
    some_data = dataset[...]  # Load it all into memory. Could also slice a subset.
    print(dataset.attrs['an arbitrary key'])
    print(dataset.attrs['foo'])
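Purely as an illustration tied back to the question (and assuming the 'data.hdf' file written above already exists), the path string can be stored and read back the same way; a dataset's attrs behaves like a dict:

import h5py

# Add the provenance string from the question as one more attribute.
with h5py.File('data.hdf', 'a') as f:
    f['my data'].attrs['comment'] = '/Documents/Data/foobar.txt'

# Read every attribute back as a plain dict.
with h5py.File('data.hdf', 'r') as f:
    print(dict(f['my data'].attrs))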

As already mentioned, if you only need to keep the data + metadata together in memory, the best option is a dict or a simple wrapper class. For instance:

class Container:
    def __init__(self, data, **kwargs):
        self.data = data
        self.metadata = kwargs
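A quick, hypothetical usage sketch, reusing the path from the question:

>>> import numpy as np
>>> c = Container(np.zeros((3, 1000)), comment='/Documents/Data/foobar.txt')
>>> c.metadata['comment']
'/Documents/Data/foobar.txt'
>>> c.data.shape
(3, 1000)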

Of course, this won't behave like a numpy array directly, but subclassing ndarray is usually a bad idea. (You can do it, but it's easy to get wrong. You're almost always better off writing a class that holds the array as an attribute.)

Better yet, do whatever operations you perform on the data inside that same class, similar to the example above. For instance:

import scipy.signal
import numpy as np

class SeismicCube(object):
    def __init__(self, data, bounds, metadata=None):
        self.data = data
        self.x0, self.x1, self.y0, self.y1, self.z0, self.z1 = bounds
        self.bounds = bounds
        self.metadata = {} if metadata is None else metadata

    def inside(self, x, y, z):
        """Test if a point is inside the cube."""
        inx = self.x0 <= x <= self.x1
        iny = self.y0 <= y <= self.y1
        inz = self.z0 <= z <= self.z1
        return inx and iny and inz

    def inst_amp(self):
        """Calculate instantaneous amplitude and return a new SeismicCube."""
        hilb = scipy.signal.hilbert(self.data, axis=2)
        data = np.hypot(hilb.real, hilb.imag)
        return type(self)(data, self.bounds, self.metadata)
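A hypothetical usage sketch (the bounds and data here are made up just to show the calls):

>>> import numpy as np
>>> cube = SeismicCube(np.random.random((10, 10, 50)),
...                    bounds=(0, 100, 0, 200, 0, 500))
>>> cube.inside(50, 150, 250)
True
>>> amp = cube.inst_amp()   # new SeismicCube with the same bounds and metadata
>>> amp.data.shape
(10, 10, 50)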

Source: https://habr.com/ru/post/1241321/

