I have 3 numpy recarrays with the following structure. The first column is a position (integer), and the second column is a score (float).
Input:
a = np.rec.array([(1, 5.41), (2, 5.42), (3, 12.32)], dtype=[('position', '<i4'), ('score', '<f4')])
b = np.rec.array([(3, 8.41), (6, 7.42), (4, 6.32)], dtype=[('position', '<i4'), ('score', '<f4')])
c = np.rec.array([(3, 7.41), (7, 6.42), (1, 5.32)], dtype=[('position', '<i4'), ('score', '<f4')])
All 3 arrays contain the same number of elements.
I am looking for an efficient way to combine these three 2d arrays into one array based on the position column.
The output array for the above example should look like this:
Output:
output = np.rec.array([(3, 12.32, 8.41, 7.41)], dtype=[('position', '<i4'), ('score1', '<f4'), ('score2', '<f4'), ('score3', '<f4')])
Only the row with position 3 is in the output array, because it is the only position present in all three input arrays.
Update: My naive approach would follow these steps:
- create a vector from the first column (position) of each of my 3 input arrays.
- use intersect1d to get the intersection of these three vectors.
- somehow extract the indices of the intersection values in each of the 3 input arrays.
- create a new array from the filtered rows of the 3 input arrays.
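The steps above could be sketched roughly as follows. This is only a minimal sketch, and it assumes that position values are unique within each input array. The names `arrays`, `common`, `out_dtype`, `order`, and `idx` are illustrative, and the lookup uses `np.argsort` with `np.searchsorted`, which is one possible way to get the row indices, not necessarily the most efficient one.

```python
from functools import reduce

import numpy as np

dtype = [('position', '<i4'), ('score', '<f4')]
a = np.rec.array([(1, 5.41), (2, 5.42), (3, 12.32)], dtype=dtype)
b = np.rec.array([(3, 8.41), (6, 7.42), (4, 6.32)], dtype=dtype)
c = np.rec.array([(3, 7.41), (7, 6.42), (1, 5.32)], dtype=dtype)
arrays = [a, b, c]

# Steps 1 and 2: intersect the position columns of all inputs.
# np.intersect1d returns the sorted, unique common values.
common = reduce(np.intersect1d, [arr['position'] for arr in arrays])

# Output has one position field plus one score field per input array.
out_dtype = [('position', '<i4')] + [
    (f'score{i}', '<f4') for i in range(1, len(arrays) + 1)
]
output = np.empty(common.size, dtype=out_dtype)
output['position'] = common

# Steps 3 and 4: for each input, find the row index of every common
# position (assumes positions are unique per array) and copy its score.
for i, arr in enumerate(arrays, start=1):
    order = np.argsort(arr['position'])
    idx = order[np.searchsorted(arr['position'], common, sorter=order)]
    output[f'score{i}'] = arr['score'][idx]

print(output)
```

Here `output` is a plain structured array; `output.view(np.recarray)` would give attribute-style field access if a recarray is needed.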
Update 2: Each position value can appear in one, two, or all three input arrays. In my output array, I only want to include rows for position values that appear in all 3 input arrays.