I have a DataFrame with columns for x, y, z coordinates and a value at that position, and I want to convert this to a 3D ndarray.
To make things more complex, not every coordinate combination has a value in the DataFrame (the missing ones can simply be NaN in the ndarray).
A simple example:
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 1, 3, 1, 2, 3, 1, 2],
                   'y': [1, 1, 2, 2, 1, 1, 1, 2, 2],
                   'z': [1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'value': [1, 2, 3, 4, 5, 6, 7, 8, 9]})
The resulting ndarray should look like this:
array([[[ 1., 2., nan],
[ 3., nan, 4.]],
[[ 5., 6., 7.],
[ 8., 9., nan]]])
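To spell out the intended layout: the axes are ordered (z, y, x), and coordinate 1 maps to index 0. A slow reference loop that produces this array might look like the sketch below. It assumes integer coordinates starting at 1 and at most one value per position, and `dense_reference` is just a name made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 1, 3, 1, 2, 3, 1, 2],
                   'y': [1, 1, 2, 2, 1, 1, 1, 2, 2],
                   'z': [1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'value': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

def dense_reference(frame):
    # Hypothetical helper: assumes integer coordinates starting at 1
    # and at most one value per (x, y, z) position.
    shape = (frame['z'].max(), frame['y'].max(), frame['x'].max())
    out = np.full(shape, np.nan)  # positions without a row stay NaN
    for row in frame.itertuples(index=False):
        out[row.z - 1, row.y - 1, row.x - 1] = row.value
    return out

result = dense_reference(df)
```

This is only meant to pin down the expected output; it iterates row by row, so it would not be a good fit for large inputs.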
For two dimensions, this is easy:
array = df.pivot_table(index="y", columns="x", values="value").values
However, this approach does not extend to three or more dimensions.
Could you give me some suggestions?
Bonus points if this also works for more than three dimensions, handles multiple values for the same coordinates (by taking the average), and ensures that all x, y, z coordinates are consecutive (by inserting rows/columns of NaN where a coordinate is absent).
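To make the bonus requirements concrete, here is a rough sketch of the behaviour I have in mind, not a tuned solution. It averages duplicates with `groupby`, then writes the means into a NaN-filled array by integer index. It assumes integer-valued coordinates; `to_dense` and its parameters are names made up for illustration.

```python
import numpy as np
import pandas as pd

def to_dense(frame, coords, value):
    # Hypothetical helper; assumes integer-valued coordinate columns.
    # Average duplicates: one row per distinct coordinate tuple.
    means = frame.groupby(coords)[value].mean().reset_index()
    # Shift each coordinate so its minimum maps to index 0. Coordinates
    # that never occur between min and max simply stay NaN.
    idx = [(means[c] - means[c].min()).to_numpy().astype(int) for c in coords]
    shape = tuple(i.max() + 1 for i in idx)
    out = np.full(shape, np.nan)
    out[tuple(idx)] = means[value].to_numpy()
    return out

# Small demo: (x=1, y=1, z=1) appears twice, and x=2 never appears.
demo = pd.DataFrame({'x': [1, 1, 3],
                     'y': [1, 1, 2],
                     'z': [1, 1, 1],
                     'value': [2.0, 4.0, 5.0]})
arr = to_dense(demo, ['z', 'y', 'x'], 'value')
```

The order of `coords` sets the axis order of the result, so any number of coordinate columns can be passed. In the demo the duplicated position is averaged and the missing x=2 column is left as NaN.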
EDIT: A few more explanations:
CSV, x, y, z, . (, 0,1 ) ndarray, () . . .
EDIT: Timings:
jakevdp 1.598s, Divikars 7.405s, JohnE 7.867s, Wens 6.286s.