Fill holes with the most frequent surrounding value (Python)

I use Python and have an array with the values 1.0, 2.0, 3.0, 4.0, 5.0 and 6.0, and np.nan as NoData.

I want to fill every "nan" value with the value that occurs most often among its surrounding cells.

For instance:

    1 1 1 1 1
    1 n 1 2 2
    1 3 3 2 1
    1 3 2 3 1

"n" stands for "nan" in this example. Most of its neighbors have the value 1, so the "nan" should be replaced with 1.

Note that a hole of "nan" cells can be 1 to 5 cells in size. For example (the maximum size, 5 "nan" cells):

    1 1 1 1 1
    1 n n n 2
    1 n n 2 1
    1 3 2 3 1

Here, the cells surrounding the "nan" hole have the following values:

 surrounding_values = [1,1,1,1,1,2,1,2,3,2,3,1,1,1] -> Majority = 1 
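To make the neighbourhood definition concrete, here is a small sketch that reproduces the 14 values above. It assumes the flattened example is 4 rows of 5 values (which yields exactly this list) and that diagonal cells count as neighbours (8-connectivity); the names `arr`, `hole` and `ring` are illustrative only.

```python
import numpy as np
from collections import Counter
from scipy.ndimage import binary_dilation

n = np.nan
# The 4 x 5 example from the question, with a 5-cell hole
arr = np.array([
    [1, 1, 1, 1, 1],
    [1, n, n, n, 2],
    [1, n, n, 2, 1],
    [1, 3, 2, 3, 1],
])

hole = np.isnan(arr)
# A 3x3 block of ones makes the dilation include diagonal neighbours (8-connectivity)
ring = binary_dilation(hole, structure=np.ones((3, 3), dtype=bool)) & ~hole
surrounding_values = arr[ring]

print(len(surrounding_values))                           # 14
print(Counter(surrounding_values).most_common(1)[0][0])  # 1.0
```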

I tried the following code:

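```python
import numpy as np
from sklearn.preprocessing import Imputer  # removed in scikit-learn 0.22 (see sklearn.impute.SimpleImputer)

array = np.array(.......)  # consisting of 1.0-6.0 & np.nan
imp = Imputer(strategy="most_frequent")
fill = imp.fit_transform(array)
```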

It works very well. However, it only works along one axis (0 = column, 1 = row). The default is 0 (column), so it uses the most frequent value of the same column rather than of the surrounding cells. For instance:

Array:

    2 1 2 1 1
    2 n 2 2 2
    2 1 2 2 1
    1 3 2 3 1

Filled array:

    2 1 2 1 1
    2 1 2 2 2
    2 1 2 2 1
    1 3 2 3 1

As you can see, although most of the surrounding values are 2, the most frequent value in the column is 1, so the "nan" becomes 1 instead of 2.
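A minimal NumPy sketch of the difference, assuming the flattened array is 4 rows of 5 values with the NaN at row 1, column 1: the column-wise mode (the per-column statistic the question describes for `Imputer` with its default axis) gives 1, while the mode of the 3x3 window around the cell gives 2. `most_frequent` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np
from collections import Counter

n = np.nan
# The 4 x 5 example array from the question; the NaN sits at row 1, column 1
arr = np.array([
    [2, 1, 2, 1, 1],
    [2, n, 2, 2, 2],
    [2, 1, 2, 2, 1],
    [1, 3, 2, 3, 1],
])

def most_frequent(values):
    """Most common non-NaN value in a 1-D array (hypothetical helper)."""
    values = values[~np.isnan(values)]
    return Counter(values.tolist()).most_common(1)[0][0]

i, j = 1, 1
column_mode = most_frequent(arr[:, j])                              # column-wise statistic
window_mode = most_frequent(arr[i - 1:i + 2, j - 1:j + 2].ravel())  # 3x3 neighbourhood

print(column_mode, window_mode)  # 1.0 2.0
```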

So I need a different method in Python. Any suggestions or ideas?


ADDITION:

Here you can see the result after I added an improvement to Martin Valgur's very helpful answer.

[Image: resulting land/sea map]

Think of "0" as the sea (blue) and all other values (> 0) as land (red).

If a "small" sea (again 1-5 px in size) is surrounded by land, it becomes land, as you can see in the resulting image. If the enclosed sea is larger than 5 px, or lies outside the land, it stays sea (this case does not occur in the image).

A 1 px "nan" with more sea than land around it still becomes land (in this example, the split is 50/50).

The following figure shows what I need. On the border between sea (value = 0) and land (value > 0), a "nan" pixel should get the most frequent of the surrounding land values, ignoring the sea.

[Image: desired result]

It sounds complicated; I hope I explained it clearly.

3 answers

A possible solution using label() and binary_dilation() from scipy.ndimage:

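```python
import numpy as np
from scipy.ndimage import label, binary_dilation
from collections import Counter

def impute(arr):
    imputed_array = np.copy(arr)

    # Label each connected group of NaN cells as a separate hole
    # (label() and binary_dilation() use a cross-shaped structure by default,
    # so diagonal cells are not counted as neighbours)
    mask = np.isnan(arr)
    labels, count = label(mask)
    for idx in range(1, count + 1):
        hole = labels == idx
        # Cells touching the hole: the dilated hole minus the hole itself
        surrounding_values = arr[binary_dilation(hole) & ~hole]
        most_frequent = Counter(surrounding_values).most_common(1)[0][0]
        imputed_array[hole] = most_frequent

    return imputed_array
```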
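By default, `label()` and `binary_dilation()` use a cross-shaped structuring element, so only the four edge-adjacent cells count as neighbours; for the 5-cell example in the question the ring then has 9 cells instead of the 14 listed there (the majority is still 1 in that case). If diagonal cells should count too, both functions accept a `structure` argument. The following is a sketch under that assumption; `impute8` and `EIGHT` are hypothetical names.

```python
import numpy as np
from collections import Counter
from scipy.ndimage import label, binary_dilation

# 3x3 block of ones: diagonal cells count as neighbours (8-connectivity)
EIGHT = np.ones((3, 3), dtype=bool)

def impute8(arr):
    """Fill every NaN hole with the most frequent value in its 8-connected ring."""
    imputed_array = np.copy(arr)
    # Using the same structure for labelling and dilation keeps NaN out of the ring
    labels, count = label(np.isnan(arr), structure=EIGHT)
    for idx in range(1, count + 1):
        hole = labels == idx
        ring = binary_dilation(hole, structure=EIGHT) & ~hole
        surrounding_values = arr[ring]
        imputed_array[hole] = Counter(surrounding_values.tolist()).most_common(1)[0][0]
    return imputed_array

n = np.nan
example = np.array([
    [1, 1, 1, 1, 1],
    [1, n, n, n, 2],
    [1, n, n, 2, 1],
    [1, 3, 2, 3, 1],
])
print(impute8(example))
```

Labelling with the same 8-connected structure matters: with the default cross, two diagonally touching NaN groups would be treated as separate holes, and the 8-connected ring of one would then contain NaN cells of the other.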

EDIT: Regarding your follow-up question, you can extend the code above to get what you need:

```python
import numpy as np
from scipy.ndimage import label, binary_dilation, binary_closing

def fill_land(arr):
    output = np.copy(arr)

    # Fill NaN-s: a hole becomes land (1.0) if any neighbouring cell is land (> 0),
    # otherwise sea (0.0)
    mask = np.isnan(arr)
    labels, count = label(mask)
    for idx in range(1, count + 1):
        hole = labels == idx
        surrounding_values = arr[binary_dilation(hole) & ~hole]
        output[hole] = any(surrounding_values)

    # Fill lakes: sea pixels that the morphological closing of the land turns
    # into land; lakes of at most 5 px become land (1.0), larger ones stay sea (0.0)
    land = output.astype(bool)
    lakes = binary_closing(land) & ~land
    labels, count = label(lakes)
    for idx in range(1, count + 1):
        lake = labels == idx
        output[lake] = lake.sum() < 6

    return output
```
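A caveat on the lake step, if I read the SciPy defaults correctly: `binary_closing()` runs one dilation and one erosion with a 3x3 cross, so it only closes narrow gaps. In a 3 x 3 lake, for instance, it marks just the four corner pixels, which are then labelled as four 1-px lakes and filled. A possible alternative is `scipy.ndimage.binary_fill_holes()`, which fills every sea region not connected to the array border; the sketch below swaps that in. `fill_small_lakes` and `MAX_LAKE_SIZE` are hypothetical names, and the sketch assumes the NaN cells have already been filled.

```python
import numpy as np
from scipy.ndimage import binary_fill_holes, label

MAX_LAKE_SIZE = 5  # lakes of 1-5 px become land, as described in the question

def fill_small_lakes(arr, max_size=MAX_LAKE_SIZE):
    """Turn enclosed sea regions (0) of at most max_size pixels into land (1).

    Assumes arr contains no NaN (NaN > 0 is False, so NaN would count as sea).
    """
    output = np.copy(arr)
    land = output > 0
    # Sea pixels filled by binary_fill_holes are not connected to the array border,
    # so sea touching the edge of the array is never treated as a lake
    lakes = binary_fill_holes(land) & ~land
    labels, count = label(lakes)
    for idx in range(1, count + 1):
        lake = labels == idx
        if lake.sum() <= max_size:
            output[lake] = 1
    return output
```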

I did not find a library for this, so I wrote a function. You can use it when all the None values are in the interior of the array (not on its edges):

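```python
import numpy as np
from collections import Counter

def getModulusSurround(data):
    # Drop the missing cells (None); 0 is kept, since it is a valid value
    tempdata = [x for x in data if x is not None]
    c = Counter(tempdata)
    # Most frequent neighbour value (the mode)
    return c.most_common(1)[0][0]

def main():
    array = [[1, 2, 2, 4, 5],
             [2, 3, 4, 5, 6],
             [3, 4, None, 6, 7],
             [1, 4, 2, 3, 4],
             [4, 6, 2, 2, 4]]
    array = np.array(array, dtype=object)  # None requires an object array
    rows, cols = array.shape
    for i in range(rows):
        for j in range(cols):
            if array[i, j] is None:
                # 3x3 window around (i, j), clipped so negative indices do not wrap
                temparray = array[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
                array[i, j] = getModulusSurround(temparray.flatten())
    print(array)

main()
```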
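The same 3x3-window idea can also be written without the explicit double loop, using `scipy.ndimage.generic_filter()`. This is a sketch, not a drop-in replacement: every NaN cell gets the mode of its own window computed from the original values, so the cells of a multi-pixel hole may get different values, unlike the per-hole majority in the `label()`-based answer. `window_mode` and `fill_with_window_mode` are hypothetical names.

```python
import numpy as np
from collections import Counter
from scipy.ndimage import generic_filter

def window_mode(values):
    """Most frequent non-NaN value in a flattened 3x3 window (NaN if there is none)."""
    values = values[~np.isnan(values)]
    if values.size == 0:
        return np.nan
    return Counter(values.tolist()).most_common(1)[0][0]

def fill_with_window_mode(arr):
    # mode="constant" with cval=np.nan pads the edges with NaN, which
    # window_mode ignores, so border cells need no special handling
    modes = generic_filter(arr, window_mode, size=3, mode="constant", cval=np.nan)
    # Keep the original values and take the window mode only where arr is NaN
    return np.where(np.isnan(arr), modes, arr)

data = np.array([[1, 2, 2, 4, 5],
                 [2, 3, 4, 5, 6],
                 [3, 4, np.nan, 6, 7],
                 [1, 4, 2, 3, 4],
                 [4, 6, 2, 2, 4]])
print(fill_with_window_mode(data))
```

Because `generic_filter` calls the Python function once per pixel, this may be slow on large arrays; its advantage is mainly brevity and the edge handling.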

After the incredible help of Martin Valgur, I have the result that I need.

So I added the following lines (marked with #!) to Martin's code:

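```python
import numpy as np
from scipy.ndimage import label, binary_dilation
from scipy.stats import mode

def impute(arr):
    imputed_array = np.copy(arr)

    mask = np.isnan(arr)
    labels, count = label(mask)
    for idx in range(1, count + 1):
        hole = labels == idx
        surrounding_values = arr[binary_dilation(hole) & ~hole]
        # Drop the sea (0) values; removing items from a list while iterating
        # over it skips elements, so a boolean mask is used instead
        surrounding_values = surrounding_values[surrounding_values != 0]  #!
        if surrounding_values.size == 0:
            continue  # only sea around this hole: leave it as NaN
        # np.ravel() handles both return shapes of mode(...).mode
        # (a 1-element array in older SciPy, a scalar in newer releases)
        imputed_array[hole] = np.ravel(mode(surrounding_values).mode)[0]

    return imputed_array
```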

Source: https://habr.com/ru/post/1013914/

