Are there feature selection algorithms that can be applied to categorical data inputs?

I am training a neural network that has 10 or so categorical inputs. After one-hot encoding these categorical inputs, I end up with around 500 inputs feeding into the network.

I would like to be able to determine the importance of each of my categorical inputs. Scikit-learn has numerous feature importance algorithms, but can any of them be applied to categorical input data? All the examples use numerical inputs.

I could apply these methods to the one-hot encoded inputs, but how would I extract the meaning after applying them to the binarized inputs? How does one judge feature importance for categorical inputs?

1 answer

Using feature selection algorithms on one-hot encoded data can be misleading because of the relations between the encoded features. For example, if you encode a feature with n values into n one-hot features and select n-1 of them, the last feature is not needed: it is fully determined by the others.
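A minimal sketch of that redundancy, using pandas with a made-up `color` column: any single one-hot column can be reconstructed from the remaining ones.

```python
import pandas as pd

# Hypothetical categorical feature with 3 values.
colors = pd.Series(["red", "green", "blue", "green"], name="color")
onehot = pd.get_dummies(colors, dtype=int)  # columns: blue, green, red

# Each row sums to 1, so the "red" column carries no new information
# once "blue" and "green" are known:
reconstructed = 1 - (onehot["blue"] + onehot["green"])
assert (reconstructed == onehot["red"]).all()
```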

Since your number of features is quite small (~10), feature selection won't help you much, since you could probably drop only a few of them without losing too much information.

You wrote that one-hot encoding turns the 10 features into 500, which means each feature has about 50 values. In that case, you may be more interested in techniques that manipulate the values themselves. If there is an implied order on the values, you can treat the feature as ordinal and use algorithms designed for continuous features. Another option is simply to omit rare values, or values without a strong correlation with the concept.
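A minimal sketch of the rare-value option (the `city` column and the count threshold are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"city": ["NYC", "LA", "NYC", "SF", "LA", "NYC", "Boise"]})

# Collapse values seen fewer than 2 times into a single "OTHER" bucket,
# shrinking the one-hot dimensionality.
counts = df["city"].value_counts()
rare = counts[counts < 2].index
df["city_grouped"] = df["city"].where(~df["city"].isin(rare), "OTHER")

print(sorted(pd.get_dummies(df["city_grouped"]).columns))  # ['LA', 'NYC', 'OTHER']
```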

If you do use feature selection, most algorithms will work on categorical data, but you should beware of corner cases. For example, mutual information, as suggested by @Igor Raush, is a great measure. However, features with many values tend to have higher entropy than features with few values. That, in turn, can lead to higher mutual information and a bias in favor of high-cardinality features. One way to deal with this problem is to normalize by dividing the mutual information by the feature's entropy.
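A sketch of that normalization using scikit-learn and scipy (the toy `x` and `y` arrays are made up):

```python
import numpy as np
from scipy.stats import entropy
from sklearn.metrics import mutual_info_score

def normalized_mi(x, y):
    """Mutual information divided by the feature's entropy (both in nats),
    so high-cardinality features are not automatically favored."""
    mi = mutual_info_score(x, y)
    _, counts = np.unique(x, return_counts=True)
    h = entropy(counts)  # entropy() normalizes the counts itself
    return mi / h if h > 0 else 0.0

x = ["a", "b", "a", "c", "b", "a"]  # categorical feature
y = [0, 1, 0, 1, 1, 0]              # target
print(normalized_mi(x, y))
```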

Another set of feature selection algorithms that can help you are wrappers. They delegate the learning itself to a classification algorithm, and are therefore indifferent to the representation as long as the classifier can handle it.
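For instance, a hedged sketch of a wrapper using scikit-learn's RFE (recursive feature elimination); the random data here is just a stand-in for your one-hot encoded inputs:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 8))  # stand-in for one-hot encoded inputs
y = rng.integers(0, 2, size=200)       # stand-in binary target

# RFE repeatedly fits the classifier and drops the weakest features,
# so the selection criterion is the classifier's own fitted coefficients.
selector = RFE(LogisticRegression(), n_features_to_select=4)
selector.fit(X, y)
print(selector.support_)   # boolean mask of the kept columns
print(selector.ranking_)   # 1 = selected, higher = dropped earlier
```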



